Updating a row in a Parquet file on S3

Hello. We created an AWS Glue job that extracts a table over a JDBC connection and writes the data to an S3 bucket in Parquet format. Our problem is this: if a row changes at the source (JDBC), can the same change be applied to the file in S3? For example, the Customers table has a Status column. When we first loaded the table, Status was 1, but after an update at the source it became 2. I would like the corresponding row in S3 to be updated so that it matches the source.

Asked 3 months ago, 178 views
2 Answers

Accepted Answer

You cannot really update objects in S3 in place, and even less so with Parquet, where you cannot simply modify part of the file. You would have to recreate the file and upload it again, either by reading the old file and applying the changes before writing it back, or by regenerating it from the source.
Table formats like Iceberg and Hudi were created for use cases like this: you can choose to keep the updates in new files (Merge on Read) or have the affected files rewritten for you on each update (Copy on Write).
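The "recreate the file" route boils down to a keyed merge: take the rows from the old file, overlay the changed rows by primary key, and write the result out as a complete replacement file. Below is a minimal sketch of just that merge step in plain Python. It assumes `customer_id` is the primary key (a hypothetical column name, as is the sample data), and it leaves out reading and writing Parquet, which would be done with Spark in a Glue job.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def upsert_rows(existing: Iterable[Row], changes: Iterable[Row], key: str) -> List[Row]:
    """Return the full set of rows for the replacement file.

    Rows in `changes` overwrite rows in `existing` that share the same key;
    rows whose key is not yet present are appended.
    """
    merged: Dict[Any, Row] = {row[key]: row for row in existing}
    for row in changes:
        merged[row[key]] = row
    return list(merged.values())


# Rows as they were first loaded into S3 (hypothetical sample data).
old_file_rows = [
    {"customer_id": 1, "name": "Ana", "status": 1},
    {"customer_id": 2, "name": "Bo", "status": 1},
]
# Rows re-read from the JDBC source after the update.
changed_rows = [{"customer_id": 1, "name": "Ana", "status": 2}]

# This list is what would be written out as the new Parquet file.
new_file_rows = upsert_rows(old_file_rows, changed_rows, key="customer_id")
```

Conceptually, Copy on Write in Hudi or Iceberg automates this same rewrite for the files that contain the affected keys, so you do not have to manage it by hand.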

Answered 3 months ago by an AWS expert
Reviewed 2 months ago by an expert

Yes, it is possible to achieve the behavior you described using AWS Glue along with some additional AWS services. One common approach to accomplish this is by using AWS Glue with AWS Lambda and Amazon S3 event notifications.

Here's a high-level overview of how you could set this up:

  1. Configure your AWS Glue job to extract data from the JDBC connection and write it to S3 in Parquet format, as you've already done.
  2. Create a Lambda function that triggers on S3 events, specifically on the ObjectCreated event. This Lambda function will be responsible for updating the S3 object whenever changes are detected in the source data.
  3. Configure S3 event notifications to trigger the Lambda function whenever new objects are created in the S3 bucket where your Glue job outputs the data.
  4. Within the Lambda function, implement logic to compare the current data in the source (JDBC) with the data in the corresponding S3 object. If any changes are detected, write a replacement Parquet object containing the updated rows, since an S3 object cannot be modified in place.
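The comparison in step 4 could look roughly like the sketch below. It only shows the diff logic in plain Python: fetching rows over JDBC and reading the Parquet object are not shown, and `customer_id` as the key column and the sample rows are assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def find_changed_rows(source_rows: Iterable[Row], s3_rows: Iterable[Row], key: str) -> List[Row]:
    """Return source rows that are missing from, or differ from, the S3 copy."""
    s3_by_key = {row[key]: row for row in s3_rows}
    return [row for row in source_rows if s3_by_key.get(row[key]) != row]


# Current state of the source table (hypothetical sample data).
source_rows = [
    {"customer_id": 1, "status": 2},  # status changed from 1 to 2 at the source
    {"customer_id": 2, "status": 1},  # unchanged
    {"customer_id": 3, "status": 1},  # inserted at the source after the first load
]
# Rows currently stored in the S3 object.
s3_rows = [
    {"customer_id": 1, "status": 1},
    {"customer_id": 2, "status": 1},
]

changed = find_changed_rows(source_rows, s3_rows, key="customer_id")
```

Note that this sketch does not detect rows deleted at the source, and it needs both full row sets in memory, which may not fit within a Lambda function for large tables.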
Answered 3 months ago by an expert
