AWS Glue DataBrew Job crashing

Hello, we are receiving the below error on one of our jobs:

RecipeStepError: Failed at step 2. Parameters: {'operation': 'JSON_TO_STRUCTS', 'sourceColumns': '["data"]', 'unnestLevel': '120'}. Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 10.192.20.37, executor 6): java.io.IOException: No space left on device

The step it's failing on changes the data type of a column to struct, and this dataset has a large number of rows. What can I do to fix this? I have tried increasing the number of nodes from 5 to 75, but it's still not working. This was previously working okay and only stopped last week.

Asked 8 months ago, 258 views
1 Answer
Accepted Answer

Hello,

I understand that your DataBrew recipe job is failing with a 'No space left on device' error. This points to a node running out of local disk space rather than memory. From your description, it seems that an excessive amount of data is being loaded onto a single node, and that node cannot handle it. This is why increasing the number of nodes may not be helping.

The better solution I would suggest is to move to a Glue ETL job, where you get more flexibility in the type of executors. That way, you could choose a node with greater capacity.
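As a rough sketch of what choosing a larger node could look like, the Python below builds the arguments for boto3's Glue `create_job` call with an explicit `WorkerType`. The job name, role ARN, script path, Glue version, and the `G.2X` / 10-worker defaults are all illustrative assumptions, not values from this thread; check the current Glue documentation for the worker types available to you.

```python
from typing import Any, Dict


def build_glue_job_args(
    name: str,
    role_arn: str,
    script_s3_path: str,
    worker_type: str = "G.2X",      # assumption: a larger worker type than G.1X
    number_of_workers: int = 10,    # assumption: tune for your dataset
) -> Dict[str, Any]:
    """Build keyword arguments for the boto3 Glue client's create_job call."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",                 # Spark ETL job type
            "ScriptLocation": script_s3_path,  # S3 path to your ETL script
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",  # assumption: use whichever version you target
        "WorkerType": worker_type,
        "NumberOfWorkers": number_of_workers,
    }


def create_glue_job(job_args: Dict[str, Any]) -> str:
    """Create the job in AWS; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    glue = boto3.client("glue")
    return glue.create_job(**job_args)["Name"]
```

To use it, you would call `create_glue_job(build_glue_job_args(...))` with your own job name, IAM role, and script location. Keeping the argument builder separate from the AWS call makes it easy to inspect the configuration before anything is created.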

Chaitu, AWS Support Engineer
Answered 8 months ago
Reviewed by an Expert 5 days ago
  • Thank you for your answer. I am looking at using Glue ETL now. Do you think avoiding the JSON-to-structs step in the recipe would help? I think we could work around it.
