AWS Glue Studio node: long run time for data preview

Hi, I am using AWS Glue Studio to read from a DynamoDB (DDB) table through a direct DDB connection. So far my visual diagram has two nodes:

  1. Source DDB table node -> The preview takes 5 minutes for a dataset of only 2 rows, but it at least shows a result.
  2. Transform - SelectFields -> The session runs for a long time (>20 minutes) and then fails with a 'session not ready' error.

My DDB table is 691 bytes in size, with provisioned capacity of 5 RCU and 5 WCU. The Glue job details have the following config:

  1. Glue version -> 4.0
  2. Language -> Python 3
  3. Worker type -> G.1X (automatic scaling of the number of workers is enabled)
  4. Max number of workers -> 11
  5. Job timeout -> 2880

Considering this is such a small dataset, can you please let me know why it takes so long to run, or where to look for related insights? I am hoping to use this as part of my production data pipeline, which will transform the data and move it to Redshift for data warehousing purposes. Unfortunately, there isn't much information available about Glue Studio.

Asked 3 months ago, 262 views
1 answer

First of all, I would suggest using on-demand mode in DynamoDB, at least until you get it working correctly. When the table has 5 RCU, Glue takes that number as a limit and rate-limits its requests so as not to exceed it. But I suspect you may have other issues.
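To make the rate limiting concrete, here is a minimal sketch of the read options involved, assuming the job reads the table through the Glue DynamoDB connector in a script rather than through the visual node. The option keys are the ones I know from the Glue DynamoDB connection documentation, and the 0.5 default for the read percentage is my understanding of that documentation. The table name `my_table` and both helper functions are hypothetical, and the budget calculation is only a rough estimate, not how Glue computes it internally.

```python
# Sketch only: builds the options dict for a Glue DynamoDB source and
# estimates the read budget Glue would allow itself on a provisioned table.

def ddb_read_options(table_name, read_percent=0.5, splits=1):
    """Build connection_options for connection_type="dynamodb" (hypothetical helper)."""
    return {
        "dynamodb.input.tableName": table_name,
        # Fraction of the table's RCU that Glue may consume (assumed default: 0.5).
        "dynamodb.throughput.read.percent": str(read_percent),
        # Number of parallel read partitions.
        "dynamodb.splits": str(splits),
    }


def read_budget_rcu(provisioned_rcu, read_percent=0.5):
    """Rough estimate of the RCU per second Glue targets (hypothetical helper)."""
    return provisioned_rcu * read_percent


# "my_table" is a placeholder, not the table from the question.
options = ddb_read_options("my_table", read_percent=1.0)

print(read_budget_rcu(5))       # 2.5 RCU with the assumed default
print(read_budget_rcu(5, 1.0))  # 5.0 RCU when using the full provisioned capacity

# Inside a Glue job script, the dict would be passed roughly like this:
# glueContext.create_dynamic_frame.from_options(
#     connection_type="dynamodb", connection_options=options)
```

With only 5 RCU the budget is tiny either way, but a 691-byte table should still be readable within seconds, which is why the throughput limit alone probably does not explain a 5-minute preview.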

Moreover, DynamoDB is releasing a zero-ETL integration with Redshift, which is now in private preview, so it may be advisable not to spend too much time reinventing the wheel. https://aws.amazon.com/about-aws/whats-new/2023/11/amazon-dynamodb-zero-etl-integration-redshift/

AWS Expert
Answered 3 months ago
  • Hi Leeroy, thanks for the prompt response and for pointing me to the zero-ETL with Redshift blog. While we wait for our account to be allow-listed for the preview, can you please let me know what other parts of the config I should look at to speed up the preview of the sample dataset? I have changed the DDB tables to on-demand mode, but it hasn't really sped things up yet.
