
AWS Glue Studio node: long run time for data preview


Hi, I am using AWS Glue Studio to read from a DynamoDB (DDB) table through a direct DDB connection. So far my visual diagram has two nodes:

  1. Source DDB table node -> The preview takes 5 minutes for a dataset of only 2 rows, but it does at least show a result.
  2. Transform - SelectFields -> The session runs for a long time (>20 minutes) and then fails with the error 'session not ready'.

My DDB table is 691 bytes in size, with provisioned capacity of 5 RCU and 5 WCU. The Glue job details have the following config:

  1. Glue version -> 4.0
  2. Language -> Python 3
  3. Worker type -> G.1X (automatic scaling of the number of workers is enabled)
  4. Max number of workers -> 11
  5. Job timeout -> 2880

Considering that this is a small dataset, can you please let me know why it is taking so long to run, or where to look for related insights? I am hoping to use this as part of my production data pipeline that will transform and move data to Redshift for data warehousing purposes. Unfortunately, there isn't much information available about Glue Studio.

1 Answer

First of all, I would suggest using on-demand mode in DynamoDB, at least until you get it working correctly. When the table has 5 RCU, Glue takes that number as a limit and rate-limits its requests so as not to exceed it. But I suspect you may have other issues.
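
As a minimal sketch of the on-demand suggestion above, the billing mode can be changed with the boto3 `update_table` call. The helper names and the table name `my_table` are hypothetical, and the sketch assumes credentials and a region are already configured.

```python
def on_demand_update_params(table_name: str) -> dict:
    """Build the update_table arguments that switch a table to on-demand billing."""
    return {"TableName": table_name, "BillingMode": "PAY_PER_REQUEST"}


def switch_to_on_demand(table_name: str, client=None):
    """Switch a DynamoDB table to on-demand mode (hypothetical helper)."""
    if client is None:
        # Imported lazily so the parameter builder can be used without boto3.
        import boto3
        client = boto3.client("dynamodb")
    return client.update_table(**on_demand_update_params(table_name))


if __name__ == "__main__":
    # "my_table" is a placeholder, not the table from the question.
    print(on_demand_update_params("my_table"))
```

Calling `switch_to_on_demand("my_table")` would issue the actual request. The same change can also be made from the DynamoDB console.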

Moreover, DynamoDB is releasing a zero-ETL integration with Redshift, which is now in private preview, so perhaps it's advisable not to spend too much time reinventing the wheel. https://aws.amazon.com/about-aws/whats-new/2023/11/amazon-dynamodb-zero-etl-integration-redshift/

AWS
EXPERT
answered 9 months ago
  • Hi Leeroy, thanks for the prompt response and for pointing me to the zero-ETL with Redshift blog. While we wait for our account to be allow-listed for the preview, can you please let me know what other parts of the config I should look at to speed up the preview of the sample dataset? I have changed the DDB tables to on-demand mode, but it hasn't really sped things up yet.
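
One place to look on the read side is the set of DynamoDB reader options in a Glue script, which govern how much of the table's read capacity the job uses. The sketch below only builds the connection options; the helper name and `my_table` are hypothetical, and whether these options explain the slow preview in this case is not established by the thread.

```python
def dynamodb_read_options(table_name: str, read_percent: float = 0.5, splits: int = 1) -> dict:
    """Build connection options for a Glue DynamoDB source (hypothetical helper).

    read_percent maps to "dynamodb.throughput.read.percent" and splits to
    "dynamodb.splits"; Glue expects both as strings.
    """
    if not 0.1 <= read_percent <= 1.5:
        raise ValueError("read_percent is expected to be between 0.1 and 1.5")
    if splits < 1:
        raise ValueError("splits must be at least 1")
    return {
        "dynamodb.input.tableName": table_name,
        "dynamodb.throughput.read.percent": str(read_percent),
        "dynamodb.splits": str(splits),
    }


# Inside a Glue job, the options would be passed roughly like this:
# dyf = glueContext.create_dynamic_frame.from_options(
#     connection_type="dynamodb",
#     connection_options=dynamodb_read_options("my_table", read_percent=1.0),
# )
```

The defaults used here (0.5 and 1) are assumed to mirror the documented Glue defaults; the Glue connection documentation is the place to confirm them.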
