AWS Glue DataBrew Job crashing

Hello, we are receiving the below error on one of our jobs:

RecipeStepError: Failed at step 2. Parameters: {'operation': 'JSON_TO_STRUCTS', 'sourceColumns': '["data"]', 'unnestLevel': '120'}. Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 10.192.20.37, executor 6): java.io.IOException: No space left on device

The step it's failing on changes the data type of a column to struct, and this dataset has a large number of rows. What can I do to fix this? I have tried increasing the number of nodes from 5 to 75, but it's still not working. This was previously working fine and only stopped last week.

asked 8 months ago, 258 views
1 Answer
Accepted Answer

Hello,

I understand that your DataBrew recipe job is failing with a 'No space left on device' exception. From your description, it seems that an excessive amount of data is being loaded onto an individual node, which cannot handle that much data and runs out of local disk space. That is why increasing the number of nodes has not helped: the total capacity grows, but the capacity of each node stays the same.

A better solution, I would suggest, is to move to a Glue ETL job, where you get more flexibility in the type of executors. That way, you can choose a node with greater capacity.
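
As a rough illustration of choosing a larger node, here is a minimal sketch that builds the arguments for boto3's `glue.create_job` with a bigger worker type. The job name, role ARN, script location, the `GlueVersion` value, and the default of 10 workers are all hypothetical placeholders, and the helper functions are illustrative rather than part of any AWS library. G.2X is used here only as an example of a worker type with more memory and disk per worker than G.1X.

```python
def build_glue_job_args(name, role_arn, script_location,
                        worker_type="G.2X", number_of_workers=10):
    """Build keyword arguments for boto3's glue.create_job().

    All values passed in are placeholders chosen by the caller.
    """
    # Worker types assumed suitable for Spark ETL jobs; larger types give
    # each worker more memory and local disk.
    allowed = {"G.1X", "G.2X", "G.4X", "G.8X"}
    if worker_type not in allowed:
        raise ValueError(f"unsupported worker type: {worker_type}")
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",  # assumption; use the version you target
        "WorkerType": worker_type,
        "NumberOfWorkers": number_of_workers,
    }


def create_job(args):
    """Create the job in AWS; needs boto3 and valid credentials."""
    import boto3  # imported lazily so the builder works without boto3
    return boto3.client("glue").create_job(**args)


# Hypothetical usage:
# args = build_glue_job_args(
#     "json-to-structs-etl",
#     "arn:aws:iam::123456789012:role/MyGlueRole",
#     "s3://my-bucket/scripts/job.py",
# )
# create_job(args)
```

The point of the sketch is that per-worker capacity is set through `WorkerType`, independently of `NumberOfWorkers`, which is the flexibility a DataBrew job does not expose.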

Chaitu (AWS Support Engineer), answered 8 months ago
Reviewed by an Expert 5 days ago
  • Thank you for your answer. I am looking at using Glue ETL now. Do you think avoiding the JSON-to-structs step in the recipe would help? I think we could work around it.
