Timeouts on access to S3 by Redshift after maintenance update



Last weekend (20/01/2024), after the automatic maintenance upgrade of our provisioned Redshift cluster (in eu-west-1a) to version 1.0.61678, all our LOAD commands started to fail. Queries against "external" tables on S3 also time out 100% of the time. Regular queries still work normally, so it really looks like a problem with Redshift's connectivity to S3 only.

Fetching data from S3 from regular EC2 instances in the same AZ still works as before, which shows that the VPC and routing tables are still fine (also, nothing else has changed besides this Redshift maintenance update).

In the STL_ERROR table, entries such as the following can be seen:

  • Problem reading manifest file - S3CurlException: Resolving timed out after 50001 milliseconds, CurlE
  • S3CurlException: Resolving timed out after 50001 milliseconds, CurlError 28, multiCurlError 0, CanRe
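The "Resolving timed out" wording, together with CurlError 28 (libcurl's operation-timeout code), suggests the failure happens during DNS name resolution rather than during the data transfer itself. For comparison, here is a minimal Python sketch of a DNS check that could be run from an EC2 instance in the same subnet. The helper name `check_dns` and the endpoint `s3.eu-west-1.amazonaws.com` are assumptions for illustration, and the check only reflects the resolver the EC2 instance uses, not the one inside the managed Redshift cluster.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def check_dns(host, timeout_s=5.0, resolver=socket.getaddrinfo):
    """Resolve `host` and report the addresses found, or why resolution failed.

    `resolver` defaults to socket.getaddrinfo; it is a parameter only so the
    function can be exercised without network access.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    # getaddrinfo has no timeout argument, so run it in a worker thread
    # and stop waiting for the result after timeout_s seconds.
    future = pool.submit(resolver, host, 443, proto=socket.IPPROTO_TCP)
    result = {"host": host, "ok": False}
    try:
        infos = future.result(timeout=timeout_s)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address.
        result["addresses"] = sorted({info[4][0] for info in infos})
        result["ok"] = True
    except FutureTimeout:
        result["error"] = "timed out"
    except socket.gaierror as exc:
        result["error"] = str(exc)
    finally:
        # Do not block on a resolver call that is still hanging.
        pool.shutdown(wait=False)
    result["seconds"] = time.monotonic() - start
    return result
```

Calling `check_dns("s3.eu-west-1.amazonaws.com")` on the EC2 instance returns a dictionary with `ok`, the resolved `addresses` or an `error`, and the elapsed `seconds`. A quick, successful result there would be consistent with the problem being specific to the cluster.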

As there is no further visibility into this managed service, I have reached out to AWS Support to investigate this issue.

Has anybody else experienced the same?

Wkr, Bert

Asked 4 months ago, 211 views
1 Answer

It is unfortunate, Bert, that you have encountered this issue after a routine upgrade; Support should be able to assist you with a resolution.

I just wanted to share a general recommendation: run the Production cluster on the trailing track and keep non-prod clusters such as UAT, QA, and Dev on the latest, i.e. current, track. This way you catch such issues before they impact Production workloads.
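As a rough illustration of switching tracks, here is a minimal Python sketch built on the boto3 Redshift `modify_cluster` call and its `MaintenanceTrackName` parameter. The helper name `set_maintenance_track` and the cluster identifier `my-prod-cluster` are hypothetical. The track change is asynchronous and is applied during a later maintenance window, so it does not roll back an upgrade that has already happened.

```python
VALID_TRACKS = ("current", "trailing")


def set_maintenance_track(client, cluster_id, track):
    """Request a maintenance track change for a provisioned Redshift cluster.

    `client` is expected to be a boto3 Redshift client (or any object with a
    compatible modify_cluster method). Returns the API response unchanged.
    """
    if track not in VALID_TRACKS:
        raise ValueError(f"track must be one of {VALID_TRACKS}, got {track!r}")
    return client.modify_cluster(
        ClusterIdentifier=cluster_id,
        MaintenanceTrackName=track,
    )


# Example usage (requires boto3 and AWS credentials; identifiers are hypothetical):
#
#   import boto3
#   redshift = boto3.client("redshift", region_name="eu-west-1")
#   set_maintenance_track(redshift, "my-prod-cluster", "trailing")
```

The same setting is also available in the console and CLI; the point of the sketch is only that Production and non-prod clusters can be assigned to different tracks explicitly.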

Best of luck!

AWS
Answered 4 months ago
Reviewed 1 month ago
