Updating a row in a Parquet file on S3


Hello, we created a job in AWS Glue that extracts a table from a JDBC connection and writes the data to an S3 bucket in Parquet format. We have the following problem: if a row changes at the source (JDBC), is it possible to have that change reflected in the S3 file? For example, the Customers table has a Status column. When we first loaded the data, Status was 1, but after an update at the source it became 2. I would like the corresponding row in S3 to be updated so that it matches the source.

Asked 3 months ago, viewed 180 times
2 answers

Accepted answer

You cannot really update objects in S3 (and even less so a Parquet file, which cannot simply be edited in place). You have to recreate the file and upload it again, either by reading the old file, applying the changes, and writing it back, or by regenerating it entirely from the source.
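The "read the old file, apply the changes, write it back" option amounts to an upsert over the whole dataset followed by a full rewrite. Below is a minimal sketch that uses plain Python lists of dicts to stand in for the Parquet contents; the `rewrite_with_updates` helper and the `id`/`status` column names are hypothetical, and the actual Parquet reading and writing (for example with Spark DataFrames inside the Glue job) is deliberately left out.

```python
from typing import Dict, List

Row = Dict[str, object]


def rewrite_with_updates(old_rows: List[Row], changed_rows: List[Row], key: str) -> List[Row]:
    """Hypothetical helper: return a complete new row set with changes upserted by key.

    Parquet files cannot be patched in place, so the result is meant to be
    written out as a brand-new file that replaces the old object in S3.
    """
    # Index the changes by primary key for O(1) lookup.
    changes = {row[key]: row for row in changed_rows}
    new_rows: List[Row] = []
    for row in old_rows:
        # Take the source version if the row changed, otherwise keep the old one.
        new_rows.append(changes.pop(row[key], row))
    # Whatever is left over are rows that are new at the source (inserts).
    new_rows.extend(changes.values())
    return new_rows


if __name__ == "__main__":
    s3_rows = [{"id": 1, "status": 1}, {"id": 2, "status": 1}]
    source_changes = [{"id": 1, "status": 2}, {"id": 3, "status": 1}]
    print(rewrite_with_updates(s3_rows, source_changes, key="id"))
    # [{'id': 1, 'status': 2}, {'id': 2, 'status': 1}, {'id': 3, 'status': 1}]
```

Note that nothing in the original file is modified: the job produces the full row set again and overwrites the object, which is exactly why this gets expensive as the table grows.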
Use cases like this are exactly why table formats such as Iceberg and Hudi were created: they let you either keep the updates in new files (Merge on Read) or easily rewrite the affected files (Copy on Write).
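To make the two strategies concrete, here is a toy, in-memory model; it is not how Iceberg or Hudi are actually implemented. Each "file" is an immutable list of rows: Copy on Write merges the changes immediately and replaces the base file, while Merge on Read appends a delta file that is reconciled every time the table is read. The `ToyTable` class and its method names are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


class ToyTable:
    """Hypothetical in-memory model of a table made of immutable files.

    A "file" is a list of rows that is never edited after it is written,
    mirroring how objects in S3 behave.
    """

    def __init__(self, rows: List[Row], key: str) -> None:
        self.key = key
        self.base_files: List[List[Row]] = [list(rows)]
        self.delta_files: List[List[Row]] = []

    def _merged(self) -> Dict[object, Row]:
        # Apply base files first, then delta files in write order,
        # so the most recent version of each key wins.
        merged = {row[self.key]: row for f in self.base_files for row in f}
        for delta in self.delta_files:
            merged.update({row[self.key]: row for row in delta})
        return merged

    def update_copy_on_write(self, changes: List[Row]) -> None:
        # Copy on Write: merge now and replace the base file with a new one.
        # Writes are expensive, reads only see already-merged data.
        merged = self._merged()
        merged.update({row[self.key]: row for row in changes})
        self.base_files = [list(merged.values())]
        self.delta_files = []

    def update_merge_on_read(self, changes: List[Row]) -> None:
        # Merge on Read: only append a small delta file.
        # Writes are cheap, but every read must reconcile base and deltas.
        self.delta_files.append(list(changes))

    def read(self) -> List[Row]:
        return sorted(self._merged().values(), key=lambda row: row[self.key])


if __name__ == "__main__":
    cow = ToyTable([{"id": 1, "status": 1}], key="id")
    cow.update_copy_on_write([{"id": 1, "status": 2}])
    mor = ToyTable([{"id": 1, "status": 1}], key="id")
    mor.update_merge_on_read([{"id": 1, "status": 2}])
    print(cow.read() == mor.read())  # True
    print(len(cow.delta_files), len(mor.delta_files))  # 0 1
```

Both tables return the same rows after the update; the difference is where the merge cost is paid, at write time for Copy on Write and at read time for Merge on Read.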

AWS Expert
Answered 3 months ago
Reviewed by an Expert 2 months ago

Yes, it is possible to achieve the behavior you described using AWS Glue together with some additional AWS services. One common approach is to combine AWS Glue with AWS Lambda and Amazon S3 event notifications.

Here's a high-level overview of how you could set this up:

  1. Configure your AWS Glue job to extract data from the JDBC connection and write it to S3 in Parquet format, as you've already done.
  2. Create a Lambda function that is triggered by S3 events, specifically the ObjectCreated event. This Lambda function is responsible for refreshing the S3 data whenever changes are detected in the source.
  3. Configure S3 event notifications to invoke the Lambda function whenever new objects are created in the S3 bucket where your Glue job writes its output.
  4. Within the Lambda function, implement logic that compares the current data in the source (JDBC) with the data in the corresponding S3 object. If any changes are detected, write a replacement S3 object containing the updated data.
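Steps 2 to 4 could be sketched roughly as follows, assuming the standard S3 event notification payload (`Records[].s3.bucket.name` and `Records[].s3.object.key`). Reading the Parquet object and querying the JDBC source are left as comments; `objects_from_s3_event` and `find_changed_rows` are hypothetical helpers, and rows are plain dicts keyed by an assumed primary-key column.

```python
from typing import Dict, List, Tuple
from urllib.parse import unquote_plus

Row = Dict[str, object]


def objects_from_s3_event(event: dict) -> List[Tuple[str, str]]:
    """Hypothetical helper: extract (bucket, key) pairs from an S3 event payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in the event (e.g. spaces become '+').
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def find_changed_rows(source_rows: List[Row], s3_rows: List[Row], key: str) -> List[Row]:
    """Hypothetical helper: return source rows that are new or differ from S3."""
    stored = {row[key]: row for row in s3_rows}
    return [row for row in source_rows if stored.get(row[key]) != row]


def lambda_handler(event, context):
    objects = objects_from_s3_event(event)
    for bucket, key in objects:
        # Assumed next steps (not implemented here): read the Parquet object at
        # s3://bucket/key, query the JDBC source for the same rows, call
        # find_changed_rows, and write a replacement object if anything changed.
        print(f"New object: s3://{bucket}/{key}")
    return {"processed": len(objects)}
```

One design point worth checking: if the function writes its replacement object back under the same bucket and prefix, that write is itself an ObjectCreated event and can invoke the function again, so the output should go to a different prefix or bucket (or the notification should be filtered accordingly).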
Expert
Answered 3 months ago
