Timeouts on access to S3 by Redshift after maintenance update



Last weekend (20/01/2024), after the automatic maintenance upgrade of our provisioned Redshift cluster (in eu-west-1a) to version 1.0.61678, all our LOAD commands started to fail. Queries against "external" tables on S3 also time out 100% of the time. Regular queries still work normally, so this really looks like a problem with Redshift's connectivity to S3 only.

Fetching data from S3 from regular EC2 instances in the same AZ still works as before, which shows that the VPC and routing tables are still fine (also, nothing else has changed besides this Redshift maintenance update).

In the STL_ERROR table, entries like the following can be seen:

  • Problem reading manifest file - S3CurlException: Resolving timed out after 50001 milliseconds, CurlE
  • S3CurlException: Resolving timed out after 50001 milliseconds, CurlError 28, multiCurlError 0, CanRe
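
For reference, curl error 28 is a timeout, and the "Resolving timed out" wording points at the DNS lookup step rather than the data transfer. Below is a minimal sketch of a DNS check that could be run from an EC2 instance in the same subnet for comparison. The hostname `s3.eu-west-1.amazonaws.com` (the regional S3 endpoint) and the helper name `resolve_with_timeout` are my assumptions and do not come from the error log. A successful lookup from EC2 does not prove what the Redshift nodes themselves see.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def resolve_with_timeout(host, timeout_s=5.0, resolver=socket.getaddrinfo):
    """Resolve host and return its sorted unique IP addresses.

    Raises TimeoutError if the lookup does not finish within timeout_s.
    The resolver argument is injectable so the helper can be tested offline.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(resolver, host, 443)
        try:
            infos = future.result(timeout=timeout_s)
        except FutureTimeout:
            raise TimeoutError(
                f"Resolving {host} timed out after {timeout_s}s"
            ) from None
    finally:
        # Do not block on a lookup that is still hanging.
        pool.shutdown(wait=False)
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Assumed endpoint: the regional S3 hostname for eu-west-1.
    print(resolve_with_timeout("s3.eu-west-1.amazonaws.com"))
```

If this resolves quickly on EC2 while Redshift keeps reporting resolution timeouts, that is consistent with the problem being on the cluster side.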

As there is no further visibility into this managed service, I have reached out to AWS Support to investigate the issue.

Has anybody else experienced the same?

Wkr, Bert

asked 4 months ago, 211 views
1 Answer

It is unfortunate, Bert, that you have encountered this issue after a routine upgrade. Support should be able to assist you with a resolution.

I just wanted to bring up a general recommendation: run the Production cluster on the trailing track and keep non-prod environments such as UAT, QA, and Dev on the latest (i.e. current) track. This way you catch such issues before they impact Production workloads.
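
As a rough illustration of the recommendation above, here is a minimal sketch of switching a cluster's maintenance track with boto3's Redshift `modify_cluster` call and its `MaintenanceTrackName` parameter. The cluster identifier `my-prod-cluster`, the region default, and the helper names are hypothetical; treat this as a sketch under those assumptions and check it against your own setup before use.

```python
def maintenance_track_request(cluster_id, track="trailing"):
    """Build the keyword arguments for redshift.modify_cluster."""
    if track not in ("current", "trailing"):
        raise ValueError(f"unexpected maintenance track: {track!r}")
    return {"ClusterIdentifier": cluster_id, "MaintenanceTrackName": track}


def switch_maintenance_track(cluster_id, track="trailing", region="eu-west-1"):
    """Ask Redshift to move the cluster to the given maintenance track."""
    import boto3  # third-party; imported here so the builder above works without it

    client = boto3.client("redshift", region_name=region)
    return client.modify_cluster(**maintenance_track_request(cluster_id, track))


if __name__ == "__main__":
    # Hypothetical identifier; requires AWS credentials with Redshift permissions.
    switch_maintenance_track("my-prod-cluster", "trailing")
```

Non-prod clusters would use `"current"` in the same call, so that they receive new releases ahead of Production.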

Best of luck!

AWS
answered 4 months ago
verified 2 months ago
