AWS Glue DataBrew Job crashing


Hello, we are receiving the below error on one of our jobs:

RecipeStepError: Failed at step 2. Parameters: {'operation': 'JSON_TO_STRUCTS', 'sourceColumns': '["data"]', 'unnestLevel': '120'}. Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 10.192.20.37, executor 6): java.io.IOException: No space left on device

The step it's failing on changes the data type of a column to a struct, and this dataset has a large number of rows. What can I do to fix this? I have tried increasing the number of nodes from 5 to 75, but it's still not working. This was previously working fine and only stopped last week.
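The failing step is `JSON_TO_STRUCTS` with `unnestLevel` set to `'120'`, which, as we understand the parameter, asks DataBrew to expand the `data` column up to 120 levels deep. Before changing infrastructure, it may be worth checking how deep the JSON really is. The following is a minimal sketch. It assumes you can export a sample of raw `data` values as strings. The helper names are hypothetical, and the depth convention (each object or array adds one level) is our own and may not match exactly how DataBrew counts levels.

```python
import json
from typing import Any


def json_depth(value: Any) -> int:
    """Return the nesting depth of a parsed JSON value (scalars count as 0)."""
    if isinstance(value, dict):
        return 1 + max((json_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((json_depth(v) for v in value), default=0)
    return 0


def max_depth_of_sample(json_strings: list[str]) -> int:
    """Parse each JSON string and return the deepest nesting seen in the sample."""
    return max((json_depth(json.loads(s)) for s in json_strings), default=0)


# Hypothetical sample of values from the `data` column.
sample = [
    '{"id": 1, "payload": {"items": [{"sku": "a"}]}}',
    '{"id": 2, "payload": {}}',
]
print(max_depth_of_sample(sample))  # prints 4
```

If the real depth is far below 120, a smaller `unnestLevel` may produce a much narrower schema for Spark to build and write.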

Asked 8 months ago · 258 views
1 Answer
Accepted answer

Hello,

I understand that your DataBrew recipe job is failing with a 'No space left on device' error. Note that this means an executor has run out of local disk space, not memory. From your description, it seems that an excessive amount of data is being loaded onto an individual node, which cannot hold it, so the job fails. Because that work is carried out by a single task on a single executor, adding more nodes does not make that task any smaller, which is why increasing the node count has not helped.
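One way to check whether a single task is carrying most of the data is to look at how many rows land in each Spark partition. In a PySpark session (for example, inside a Glue ETL job) those counts can be collected without materializing the rows, using `df.rdd.mapPartitions(lambda rows: [sum(1 for _ in rows)]).collect()`. The helper below is a hypothetical sketch that summarizes such a list. The counts in the example are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SkewReport:
    partitions: int
    total_rows: int
    largest: int
    largest_share: float  # fraction of all rows held by the largest partition


def skew_report(rows_per_partition: list[int]) -> SkewReport:
    """Summarize how evenly rows are spread across Spark partitions."""
    if not rows_per_partition:
        raise ValueError("need at least one partition count")
    total = sum(rows_per_partition)
    largest = max(rows_per_partition)
    share = largest / total if total else 0.0
    return SkewReport(len(rows_per_partition), total, largest, share)


# Hypothetical counts: one partition holds almost all of the rows.
report = skew_report([9_600_000, 120_000, 140_000, 140_000])
print(report.largest_share)
```

A share close to 1.0 suggests the input is being read into one task, for example from a single large file that Spark cannot split, such as a gzip-compressed JSON file. In that case, splitting the input into several smaller files lets Spark read it in parallel, whereas adding nodes alone does not.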

A better solution would be to move to an AWS Glue ETL job, which gives you more flexibility in the choice of worker type. You could then choose workers with greater capacity.
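To act on the suggestion of moving to Glue ETL with larger workers, you can create the job programmatically with boto3's `glue.create_job`. Its `WorkerType` and `NumberOfWorkers` parameters control worker size and count. This minimal sketch only builds the request parameters. The job name, IAM role, script location, worker type, and worker count are placeholders, not recommendations.

```python
from typing import Any


def glue_etl_job_params(
    name: str,
    role_arn: str,
    script_s3_path: str,
    worker_type: str = "G.2X",
    number_of_workers: int = 10,
    glue_version: str = "4.0",
) -> dict[str, Any]:
    """Build keyword arguments for boto3's glue.create_job for a Spark ETL job."""
    if number_of_workers <= 0:
        raise ValueError("number_of_workers must be positive")
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        "GlueVersion": glue_version,
        # G.2X workers have twice the memory of G.1X; check the AWS Glue
        # documentation for current sizes and for larger worker types.
        "WorkerType": worker_type,
        "NumberOfWorkers": number_of_workers,
    }


params = glue_etl_job_params(
    name="json-to-structs-job",  # hypothetical
    role_arn="arn:aws:iam::123456789012:role/GlueJobRole",  # hypothetical
    script_s3_path="s3://my-bucket/scripts/job.py",  # hypothetical
)
# To actually create the job (requires AWS credentials):
# import boto3
# boto3.client("glue").create_job(**params)
```

A larger worker type mainly helps when one task needs more memory or disk than a smaller worker provides. If the data is concentrated in a single task, splitting the input is still worth considering.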

Chaitu, AWS Support Engineer · answered 8 months ago
Reviewed by an expert 5 days ago
  • Thank you for your answer. I am looking at using Glue ETL now. Do you think avoiding the JSON_TO_STRUCTS step in the recipe would help? I think we could work around it.
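If only a handful of fields inside `data` are used downstream, one possible workaround is to pull out just those fields instead of expanding the whole document 120 levels deep. In a Glue ETL (PySpark) job, this idea corresponds to `pyspark.sql.functions.get_json_object` with a JSONPath such as `$.payload.status`, or to `from_json` with a narrow schema. The following is a pure-Python sketch of the same idea. The function name and the field paths are hypothetical.

```python
import json
from typing import Any


def extract_paths(json_string: str, paths: dict[str, list[str]]) -> dict[str, Any]:
    """Pull selected nested fields out of a JSON string.

    `paths` maps an output column name to the list of keys to follow. A missing
    key, or a non-object reached along the way, yields None instead of an error.
    """
    doc = json.loads(json_string)
    out: dict[str, Any] = {}
    for column, keys in paths.items():
        node: Any = doc
        for key in keys:
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        out[column] = node
    return out


# Hypothetical field paths; replace them with the fields your pipeline needs.
wanted = {"status": ["payload", "status"], "user_id": ["user", "id"]}
row = '{"user": {"id": 42}, "payload": {"status": "ok", "deep": {"a": {"b": 1}}}}'
print(extract_paths(row, wanted))  # prints {'status': 'ok', 'user_id': 42}
```

The deeply nested `deep` branch is never expanded, which is the point: the output schema stays as small as the set of fields you ask for.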
