Please help interpreting error message: Fail number 1/5 | Exception: An error occurred (ThrottlingException) when calling the GetWorkGroup operation (reached max retries: 5): Rate exceeded


I have a parallelized Python script running in containers that transforms data and writes it to S3, with updates to the Glue catalog. Each container runs several tasks in parallel, and the overall data-processing job is scaled horizontally by running several containers at once. In total, more than 50 independent tasks are running at any given time.

As I scale the number of independent tasks, I occasionally see this error in the logs:

ERROR:awswrangler._utils:Retrying <bound method ClientCreator._create_api_method.<locals>._api_call of <botocore.client.Athena object at 0x7fc04041f200>> | Fail number 1/5 | Exception: An error occurred (ThrottlingException) when calling the GetWorkGroup operation (reached max retries: 5): Rate exceeded

The code I believe this error comes from is:

wr.s3.to_parquet(
    df=ac1_dataset_writeable,
    path=f"s3://{bucket_name}/{ac1_prefix}/",
    dataset=True,
    mode=parquetwritemode,
    database="f-options-ac1",
    table="ac1",
    partition_cols=["ac1_symbol"],
)

I understand this is caused by attempted simultaneous writes to the catalog. What I am struggling with is that the error message itself is somewhat confusing, specifically as it says:

  • Retrying
  • Fail number 1/5
  • reached max retries: 5

all in the same error message. It is not evident to me whether the function will try again to write my data (Retrying), or whether it has exhausted the available retries and will not try again (reached max retries: 5). If anyone has experience with how to interpret this message, I would be grateful for the benefit of your experience.

Malcolm
Asked 7 months ago · 254 views
2 Answers

The error you're seeing indicates that you are exceeding a throttling limit for one of the AWS services, in this case Amazon Athena. Besides quotas on the number of concurrent queries, Athena enforces per-account rate limits on its API calls (such as GetWorkGroup, which awswrangler is calling here) to prevent any single user from dominating shared resources.

A few things you could try:

  • Add delays or exponential backoff between Athena calls in your tasks to smooth out the request rate over time.
  • Check the Service Quotas console for Athena's API call rate limits in your account and request an increase if the defaults are too low for your workload.
  • Consider Athena provisioned capacity, or spreading the load across multiple workgroups, if query concurrency rather than the API call rate turns out to be the bottleneck.
  • For very high throughput needs, look into AWS Glue ETL jobs, which may have higher limits than Athena alone.
The key is to distribute the load over time rather than having all tasks attempt Athena operations at the exact same time. Some throttling is expected with serverless services, so adding retries and exponential backoff is a best practice; a minimal sketch of that pattern follows below. Let me know if any of these suggestions help or if you have additional questions!
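
To make the backoff point concrete, here is a minimal sketch, not awswrangler's built-in retry mechanism. It assumes your awswrangler version exposes the global wr.config.botocore_config setting, that the throttling eventually surfaces as a botocore ClientError with error code ThrottlingException, and that write_with_backoff and its retry parameters are illustrative names only; the to_parquet call itself is the one from your question.

import random
import time

import awswrangler as wr
import botocore.config
import botocore.exceptions

# Assumption: wr.config.botocore_config passes a botocore Config to the clients
# awswrangler creates; "adaptive" retry mode adds client-side rate limiting when
# throttling is detected. Check this against the awswrangler version you run.
wr.config.botocore_config = botocore.config.Config(
    retries={"max_attempts": 10, "mode": "adaptive"}
)

def write_with_backoff(df, path, database, table, mode, partition_cols,
                       attempts=5, base_delay=1.0):
    """Retry a catalog-updating write with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return wr.s3.to_parquet(
                df=df,
                path=path,
                dataset=True,
                mode=mode,
                database=database,
                table=table,
                partition_cols=partition_cols,
            )
        except botocore.exceptions.ClientError as exc:
            if exc.response["Error"]["Code"] != "ThrottlingException":
                raise
            # Sleep 1s, 2s, 4s, ... plus random jitter so that parallel tasks
            # do not all retry at the same instant.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    raise RuntimeError(f"{table}: still throttled after {attempts} attempts")

# Usage with the values from your question (bucket_name, ac1_prefix, and
# parquetwritemode are the variables already defined in your script):
# write_with_backoff(ac1_dataset_writeable, f"s3://{bucket_name}/{ac1_prefix}/",
#                    "f-options-ac1", "ac1", parquetwritemode, ["ac1_symbol"])

The jitter matters as much as the exact delays: with 50+ tasks, randomizing when each one retries keeps them from hitting GetWorkGroup in lockstep.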
AWS — answered 7 months ago
Reviewed by an Expert 7 months ago

That is helpful, but I am still left uncertain whether the wrangler function will retry again, or whether this means it has failed and will not retry.

Malcolm
Answered 7 months ago
