Updating a row in a Parquet file on S3

Hello, we created an AWS Glue job that extracts a table over a JDBC connection and writes the data to an S3 bucket in Parquet format. Our problem: if a row changes at the source (JDBC), can the change also be applied to the file in S3? For example, the Customers table has a Status column. When we first loaded the table, Status was 1, but after an update at the source it became 2. I would like the corresponding row in S3 to be updated so that it matches the source.

Asked 3 months ago, 178 views
2 Answers
Accepted Answer

You cannot update an object in place on S3, and Parquet in particular does not let you modify a single row inside a file. You have to recreate the file and upload it again, either by reading the old file and applying the changes before writing it back, or by regenerating it from the source.
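
To make the "read the old file, apply the changes, write it back" option concrete, here is a minimal sketch of the merge step in plain Python. It assumes the rows have already been read into dictionaries and that `customer_id` is the primary key; both the helper name `upsert_rows` and the column names are made up for illustration. In a real Glue job the same logic would run on DataFrames, and the result would be written to S3 as a new Parquet file that replaces the old one.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def upsert_rows(existing: Iterable[Row], changes: Iterable[Row], key: str) -> List[Row]:
    """Return a full replacement dataset: existing rows with changes applied by key."""
    # Index the old rows by primary key (dicts keep insertion order).
    merged = {row[key]: row for row in existing}
    for row in changes:
        # Overwrites the row if the key exists, appends it otherwise.
        merged[row[key]] = row
    return list(merged.values())


# Hypothetical contents of the Parquet file currently in S3.
old_rows = [
    {"customer_id": 1, "name": "Ana", "status": 1},
    {"customer_id": 2, "name": "Bruno", "status": 1},
]
# Hypothetical rows that changed at the JDBC source since the last load.
changed_rows = [{"customer_id": 1, "name": "Ana", "status": 2}]

# This list is what would be written out as the new file.
new_rows = upsert_rows(old_rows, changed_rows, key="customer_id")
print(new_rows)
```

The whole file is rewritten even though only one row changed, which is why this approach gets expensive on large tables.
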
Formats like Iceberg and Hudi were created for use cases like this: you can choose to keep the updates in new files that are merged when the table is read (Merge on Read), or have the affected files rewritten for you (Copy on Write).
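
The difference between the two strategies can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not the Iceberg or Hudi API: the class names, the `customer_id` key, and the in-memory lists standing in for files are all assumptions made for the example.

```python
from typing import Dict, List

Row = Dict[str, object]


class CopyOnWriteTable:
    """Each update rewrites the base data, so a read is a plain scan."""

    def __init__(self, rows: List[Row], key: str) -> None:
        self.key = key
        self.base = list(rows)

    def update(self, row: Row) -> None:
        # Rewrite the whole "file" with the matching row swapped out
        # (updates only; inserts are left out to keep the sketch short).
        self.base = [row if r[self.key] == row[self.key] else r for r in self.base]

    def read(self) -> List[Row]:
        return list(self.base)


class MergeOnReadTable:
    """Updates are appended to a delta log; reads merge base and deltas."""

    def __init__(self, rows: List[Row], key: str) -> None:
        self.key = key
        self.base = list(rows)
        self.deltas: List[Row] = []

    def update(self, row: Row) -> None:
        # Cheap write: the base "file" is left untouched.
        self.deltas.append(row)

    def read(self) -> List[Row]:
        # More work at read time: later deltas win over the base rows.
        merged = {r[self.key]: r for r in self.base}
        for r in self.deltas:
            merged[r[self.key]] = r
        return list(merged.values())


rows = [{"customer_id": 1, "status": 1}, {"customer_id": 2, "status": 1}]
cow = CopyOnWriteTable(rows, "customer_id")
mor = MergeOnReadTable(rows, "customer_id")

change = {"customer_id": 1, "status": 2}
cow.update(change)
mor.update(change)

print(cow.read() == mor.read())  # both strategies expose the same rows
```

Copy on Write pays the cost when writing and keeps reads simple; Merge on Read keeps writes cheap and pays the cost when reading, until the deltas are compacted into the base.
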

AWS Expert, answered 3 months ago
Expert, reviewed 2 months ago

Yes, it is possible to achieve the behavior you described using AWS Glue along with some additional AWS services. One common approach to accomplish this is by using AWS Glue with AWS Lambda and Amazon S3 event notifications.

Here's a high-level overview of how you could set this up:

  1. Configure your AWS Glue job to extract data from the JDBC connection and write it to S3 in Parquet format, as you've already done.
  2. Create a Lambda function that triggers on S3 events, specifically on the ObjectCreated event. This Lambda function will be responsible for updating the S3 object whenever changes are detected in the source data.
  3. Configure S3 event notifications to trigger the Lambda function whenever new objects are created in the S3 bucket where your Glue job outputs the data.
  4. Within the Lambda function, implement logic to compare the updated data in the source (JDBC) with the data in the corresponding S3 object. If any changes are detected, update the S3 object accordingly.
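
As a rough sketch of steps 2 to 4, the Lambda handler below only extracts the bucket and key of each newly created object from the S3 event notification. The comparison against the JDBC source is left as a comment because it depends on your database and drivers. The function name `created_objects` and the sample bucket and key are assumptions made for illustration. Note that if the function writes its result back to the same bucket and prefix, it will trigger itself again, so the output location or the event filter needs to be chosen with that in mind.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def created_objects(event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for ObjectCreated records in an S3 event."""
    results = []
    for record in event.get("Records", []):
        # Event names look like "ObjectCreated:Put".
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification payload.
        key = unquote_plus(record["s3"]["object"]["key"])
        results.append((bucket, key))
    return results


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, List[str]]:
    objects = created_objects(event)
    # Placeholder: here you would read each object, compare it with the
    # source (JDBC) data, and write a corrected file if they differ.
    return {"objects": [f"s3://{bucket}/{key}" for bucket, key in objects]}


# Hypothetical, trimmed-down event for local testing.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "customers/dt%3D2024-01-01/part-0000.parquet"},
            },
        }
    ]
}

result = lambda_handler(sample_event, None)
print(result)
```
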
Expert, answered 3 months ago
