How to transform a large volume of data in a DynamoDB table

0

I have a lot of data in a DynamoDB table and I would like to update the records without scanning each one. What is the best way to handle this scenario? Is it wise to use AWS Glue on top of DynamoDB to transform the records and then update the same target DynamoDB table?

Ram
Asked 2 months ago · 125 views
2 Answers
2

You have to Scan the items one way or another. If you are avoiding the Scan only to avoid impacting production traffic, you can use the Export to S3 feature, which is built into AWS Glue. This only saves on capacity consumption, not cost, so if cost is your reason for not Scanning, I suggest you just Scan, as it's the most cost-effective way.
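If you do take the export route, the export can also be started programmatically; a rough sketch using the boto3 ExportTableToPointInTime API (the table ARN and bucket name are placeholders, and point-in-time recovery must be enabled on the table):

import boto3

# Sketch: trigger a DynamoDB export to S3. The table ARN and bucket name are
# placeholders; point-in-time recovery must be enabled on the table.
dynamodb = boto3.client("dynamodb")

response = dynamodb.export_table_to_point_in_time(
    TableArn="arn:aws:dynamodb:us-east-1:123456789012:table/MyTable",
    S3Bucket="my-export-bucket",
    ExportFormat="DYNAMODB_JSON",
)
print(response["ExportDescription"]["ExportStatus"])  # e.g. IN_PROGRESS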

My only concern with Glue is that by default it performs an overwrite operation. That means if you read at time T1, then transform and write to DDB at time T3, any updates made at time T2 will be lost.

One way to overcome that is to use Glue to read the data and, instead of using the Spark write, issue Boto3 UpdateItem calls in a distributed manner, so you only update values rather than overwrite items. This also allows you to use conditions on your writes: accept the update only if nothing has been written since the read.
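A minimal sketch of such a conditional update, assuming a hypothetical version attribute for optimistic locking (the table name and attribute names are illustrative, not from the answer):

import boto3
from botocore.exceptions import ClientError

# Sketch: apply the transformation only if the item has not changed since it
# was read. "version" is a hypothetical attribute used for optimistic locking.
table = boto3.resource("dynamodb").Table("MyTable")

def conditional_update(item_id, read_version, new_value):
    try:
        table.update_item(
            Key={"id": item_id},
            UpdateExpression="SET attr1 = :val1, version = version + :one",
            ConditionExpression="version = :read_version",
            ExpressionAttributeValues={
                ":val1": new_value,
                ":one": 1,
                ":read_version": read_version,
            },
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            pass  # The item changed after the read; re-read and retry, or skip.
        else:
            raise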

AWS
Expert
Answered 2 months ago
  • Adding to Lee's excellent response, consider the possibility of having your regular writes/updates transform the data over time. If you can adjust your reads to accept the data in old or new form, then you can just start making all your regular application writes use the new form. In some cases this approach will mean that you don't have to do the bulk update - and at the very least it will reduce the number of writes required to do that "backfill" transformation.
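As a rough illustration of that read-side tolerance, assuming hypothetical old and new attribute names:

# Sketch: tolerate both the old and new attribute layout on read, so regular
# writes can migrate items lazily. "full_name" (new) and "name" (old) are
# hypothetical attribute names.
def get_display_name(item):
    if "full_name" in item:           # item already migrated to the new form
        return item["full_name"]
    return item.get("name", "")       # fall back to the old form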

-1

Using AWS Glue to transform and update records in DynamoDB would work but may not be the most efficient approach for large datasets. A few better options to consider:

  • Use DynamoDB Streams to capture item updates and writes to the table, and have a Lambda function process the stream records and apply the necessary transformations before writing the items to the new target table (see the sketch after the code example below).
  • Use AWS Data Pipeline to automate a periodic data migration job. It can fetch records from the source table, transform them as needed, and write to the target table in batches.
  • Export the table data to S3 using DynamoDB Export, then use AWS Glue to perform ETL on the exported files. The transformed data can then be written back to the target DynamoDB table.
  • For simple attribute updates, use the DynamoDB UpdateItem API directly without scanning the whole table. Note that BatchWriteItem only supports put and delete operations, so to update many items you would issue UpdateItem calls in parallel, for example:

import boto3

# Update a single attribute on one item by its primary key; no scan needed.
# item_id and new_value are placeholders for your own values.
table = boto3.resource("dynamodb").Table("MyTable")

table.update_item(
    Key={"id": item_id},
    UpdateExpression="SET attr1 = :val1",
    ExpressionAttributeValues={":val1": new_value},
)
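Returning to the first bullet, a minimal sketch of a Lambda handler attached to the table's stream; the target table name, key and attribute names, and the transformation itself are assumptions for illustration:

import boto3

# Sketch: process DynamoDB Streams records in Lambda and write transformed
# items to a separate target table. Names and the transformation are placeholders.
target_table = boto3.resource("dynamodb").Table("TargetTable")

def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue
        new_image = record["dynamodb"]["NewImage"]       # typed attribute values
        item_id = new_image["id"]["S"]
        old_value = new_image.get("attr1", {}).get("S", "")

        transformed = old_value.upper()                  # hypothetical transformation

        target_table.put_item(Item={"id": item_id, "attr1": transformed})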
Expert
Answered 2 months ago
