How to deal with zombie Airflow BackfillJobs


In our MWAA v2.2.2 environment, we are seeing several BackfillJobs with live heartbeats but no corresponding DAG runs in a running state. On the Browse > Jobs page, these jobs still show as running. They may have been caused by running a backfill for the same DAG multiple times. For some reason they never failed or completed, which left them in a zombie state.

From the Airflow docs, it looks like we would need access to the underlying machines to terminate the processes for these backfill jobs. Given that MWAA is a managed service running on Fargate, how would we go about terminating these jobs? Is there a way to forcibly cycle all the containers?

ukayani
asked a year ago · 641 views
1 Answer


You can programmatically force a hard reset of the environment following the recommendations provided here (in the top answer). The items listed in the answer will restart MWAA and its containers.
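Before forcing a restart, it may help to confirm which jobs actually match the zombie description in the question. The following is only a minimal sketch of that check in plain Python. `JobRow`, `DagRunRow`, and `find_zombie_backfills` are hypothetical names invented for illustration, not Airflow APIs. It assumes you have already exported the relevant rows from the metadata database's `job` and `dag_run` tables by some means of your own, and the 5-minute heartbeat window is an arbitrary assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class JobRow:
    """Hypothetical stand-in for one row of the Airflow `job` table."""
    id: int
    dag_id: str
    job_type: str            # e.g. "BackfillJob", "SchedulerJob"
    state: str               # e.g. "running", "success", "failed"
    latest_heartbeat: datetime


@dataclass(frozen=True)
class DagRunRow:
    """Hypothetical stand-in for one row of the Airflow `dag_run` table."""
    dag_id: str
    run_type: str            # e.g. "backfill", "scheduled", "manual"
    state: str


def find_zombie_backfills(
    jobs: Iterable[JobRow],
    dag_runs: Iterable[DagRunRow],
    now: datetime,
    heartbeat_window: timedelta = timedelta(minutes=5),  # assumed threshold
) -> List[JobRow]:
    """Return running BackfillJobs that still heartbeat but have no
    running backfill DAG run for the same DAG."""
    # DAGs that legitimately have a backfill run in progress.
    dags_with_running_backfill = {
        run.dag_id
        for run in dag_runs
        if run.run_type == "backfill" and run.state == "running"
    }
    return [
        job
        for job in jobs
        if job.job_type == "BackfillJob"
        and job.state == "running"
        and now - job.latest_heartbeat <= heartbeat_window
        and job.dag_id not in dags_with_running_backfill
    ]
```

The function only reports candidates. It does not terminate anything, so the restart described above is still what clears the processes.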

AWS
Andrew
answered a year ago
