How to deal with zombie Airflow BackfillJobs


In our MWAA v2.2.2 environment, we are seeing several BackfillJobs that have live heartbeats but no corresponding DAG runs in a running state. On the Browse > Jobs page, these jobs are still listed as running. These particular backfill jobs may have been caused by running a backfill for the same DAG multiple times. For some reason they never failed or completed, which left them in a zombie state.
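To make the symptom concrete, here is a minimal sketch of how one might flag such jobs. It rests on assumptions: `JobRow` and `DagRunRow` are hypothetical stand-ins for rows of Airflow's `job` and `dag_run` metadata tables, and the five-minute heartbeat window is an arbitrary choice. A job is flagged when it is a `BackfillJob` in the `running` state with a recent heartbeat, while its DAG has no running DAG run.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class JobRow:
    # Hypothetical stand-in for a row of Airflow's `job` table.
    id: int
    dag_id: str
    job_type: str
    state: str
    latest_heartbeat: datetime


@dataclass
class DagRunRow:
    # Hypothetical stand-in for a row of Airflow's `dag_run` table.
    dag_id: str
    state: str


def find_zombie_backfills(
    jobs: Iterable[JobRow],
    dag_runs: Iterable[DagRunRow],
    now: datetime,
    heartbeat_window: timedelta = timedelta(minutes=5),  # assumed threshold
) -> List[int]:
    """Return ids of BackfillJobs that look alive but have no running DAG run."""
    # DAGs that currently have at least one running DAG run.
    dags_with_running_runs = {r.dag_id for r in dag_runs if r.state == "running"}
    zombies = []
    for job in jobs:
        is_backfill = job.job_type == "BackfillJob"
        is_running = job.state == "running"
        # A recent heartbeat means the job still reports itself as alive.
        has_live_heartbeat = now - job.latest_heartbeat <= heartbeat_window
        if (
            is_backfill
            and is_running
            and has_live_heartbeat
            and job.dag_id not in dags_with_running_runs
        ):
            zombies.append(job.id)
    return zombies
```

On MWAA there is no direct access to the metadata database, so in practice the rows would have to be fetched by something running inside the environment (for example, a task that queries the metadata database). That wiring is not shown here.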

Reading the Airflow docs, it looks like we would need access to the underlying machines to terminate the processes for these backfill jobs. Given that MWAA is a managed service running on Fargate, how would we go about terminating these jobs? Is there a way to forcibly cycle all the containers?

ukayani
Asked 1 year ago, 641 views
1 answer


You can programmatically force a hard reset of the environment by following the recommendations in the top answer here. The steps listed in that answer restart MWAA and its containers.
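Here is a minimal sketch of one way to trigger such a reset with boto3. It assumes that any environment update (here, changing an Airflow configuration option) makes MWAA replace the environment's containers, and it is not necessarily the exact procedure from the linked answer. The option name `custom.restart_token` and the environment name `my-mwaa-environment` are hypothetical. The call needs credentials that allow `mwaa:GetEnvironment` and `mwaa:UpdateEnvironment`.

```python
import time
from typing import Any, Dict

# Hypothetical throwaway option, used only to force a configuration change.
RESTART_OPTION = "custom.restart_token"


def build_restart_options(current: Dict[str, str], token: str) -> Dict[str, str]:
    """Copy the existing Airflow config options and set a new restart token."""
    options = dict(current)  # copy so the caller's dict is not mutated
    options[RESTART_OPTION] = token
    return options


def force_environment_update(client: Any, env_name: str) -> Dict[str, str]:
    """Request an MWAA environment update, assumed to cycle its containers."""
    env = client.get_environment(Name=env_name)["Environment"]
    current = env.get("AirflowConfigurationOptions", {})
    # Send the full option map (assumed to replace, not merge), with a fresh token.
    options = build_restart_options(current, str(int(time.time())))
    client.update_environment(Name=env_name, AirflowConfigurationOptions=options)
    return options


if __name__ == "__main__":
    import boto3  # imported here so the helpers above work without boto3

    mwaa = boto3.client("mwaa")
    force_environment_update(mwaa, "my-mwaa-environment")  # hypothetical name
```

Changing a throwaway custom option leaves the real configuration intact while still forcing an update. The update is asynchronous; the environment should return to the `AVAILABLE` status once the new containers are up.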

AWS
Andrew
Answered 1 year ago
