Receiving socket connection reset in one of our ECS environments when attempting to download a file from S3

I have a Spring Batch application that reads large CSV files (about 200 MB each) from AWS S3 and writes the data into our database. We recently updated the process slightly to accommodate a new CSV schema. When we run the new import process against LocalStack, everything works flawlessly. As soon as we introduce a real network, we start having issues, but only in one of our ECS environments (prod). The process failed once in a lower environment and then succeeded on every subsequent run. We get the following exception when pulling down a large file from S3:

java.net.SocketException: Connection reset
    at sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:328)
    at sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:355)
    at sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:808)
    at java.net.Socket$SocketInputStream.read(Socket.java:966)
    at sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:484)
    at sun.security.ssl.SSLSocketInputRecord.readFully(SSLSocketInputRecord.java:467)
    at sun.security.ssl.SSLSocketInputRecord.decodeInputRecord(SSLSocketInputRecord.java:243)
    at sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:181)
    at sun.security.ssl.SSLTransport.decode(SSLTransport.java:111)
    at sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1513)
    at sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1484)
    at sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1069)
    at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
    at org.apache.http.impl.io.SessionInputBufferImpl.read(SessionInputBufferImpl.java:197)
    at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:176)
    at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
    at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
    at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
    at com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:107)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
    at com.amazonaws.services.s3.internal.S3AbortableInputStream.read(S3AbortableInputStream.java:125)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:270)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:313)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:188)
    at java.io.InputStreamReader.read(InputStreamReader.java:177)
    at java.io.BufferedReader.fill(BufferedReader.java:162)
    at java.io.BufferedReader.readLine(BufferedReader.java:329)
    at java.io.BufferedReader.readLine(BufferedReader.java:396)
    at org.springframework.batch.item.file.FlatFileItemReader.readLine(FlatFileItemReader.java:207)
    ... 33 common frames omitted
Wrapped by: org.springframework.batch.item.file.NonTransientFlatFileException: Unable to read from resource: [Amazon s3 resource [bucket='bucket-name-here' and object='import/test.csv']]
    at org.springframework.batch.item.file.FlatFileItemReader.readLine(FlatFileItemReader.java:226)
    at org.springframework.batch.item.file.FlatFileItemReader.doRead(FlatFileItemReader.java:178)
    at org.springframework.batch.item.support.AbstractItemCountingItemStreamItemReader.read(AbstractItemCountingItemStreamItemReader.java:93)
    at org.springframework.batch.item.support.SynchronizedItemStreamReader.read(SynchronizedItemStreamReader.java:57)
    at org.springframework.batch.item.support.SynchronizedItemStreamReader$$FastClassBySpringCGLIB$$987ea09.invoke(<generated>)
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:793)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763)
    at org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:137)
    at org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:124)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763)
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:708)
    at org.springframework.batch.item.support.SynchronizedItemStreamReader$$EnhancerBySpringCGLIB$$2a78c69a.read(<generated>)
    at org.springframework.batch.core.step.item.SimpleChunkProvider.doRead(SimpleChunkProvider.java:99)
    at org.springframework.batch.core.step.item.FaultTolerantChunkProvider.read(FaultTolerantChunkProvider.java:87)
    ... 17 common frames omitted
Wrapped by: org.springframework.batch.core.step.skip.NonSkippableReadException: Non-skippable exception during read
    at org.springframework.batch.core.step.item.FaultTolerantChunkProvider.read(FaultTolerantChunkProvider.java:105)
    at org.springframework.batch.core.step.item.SimpleChunkProvider$1.doInIteration(SimpleChunkProvider.java:126)
    at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:375)
    at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:215)
    at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:145)
    at org.springframework.batch.core.step.item.SimpleChunkProvider.provide(SimpleChunkProvider.java:118)
    at org.springframework.batch.core.step.item.ChunkOrientedTasklet.execute(ChunkOrientedTasklet.java:71)
    at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:407)
    at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:331)
    at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140)
    at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:273)
    at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:82)
    at org.springframework.batch.repeat.support.TaskExecutorRepeatTemplate$ExecutingRunnable.run(TaskExecutorRepeatTemplate.java:262)
    at org.springframework.security.concurrent.DelegatingSecurityContextRunnable.run(DelegatingSecurityContextRunnable.java:82)
    at io.github.jhipster.async.ExceptionHandlingAsyncTaskExecutor.lambda$createWrappedRunnable$1(ExceptionHandlingAsyncTaskExecutor.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.lang.Thread.run(Thread.java:840)

This has never been a problem before: we have run this process multiple times over the past year with no issues until now. We suspect an issue with the network in AWS ECS or some sort of timeout imposed on the AWS side. We have checked the timeouts on our side, and everything runs fine in our other ECS environments. We are not sure where to go from here. Any help would be appreciated!

EDIT: Additionally, the connection seems to drop after about 8 minutes every single time. We have run this 4 times in prod now, and all 4 times the first connection reset exception was thrown after about 8 minutes. As mentioned earlier, we thoroughly tested the timeouts defined in our application by lowering them dramatically and running locally, and none of them caused a problem.

d-viso
asked 5 months ago, 205 views
1 Answer
Hello,

This could have many causes. Consider the AWS Knowledge Center article [1], 'How do I troubleshoot ConnectionReset errors when I upload or download objects from Amazon S3?', which walks through troubleshooting steps for this error.

According to the documentation [2], there are several other timeouts that may be related to this condition. You may try setting withRequestTimeout or withClientExecutionTimeout.
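As a rough sketch of what setting those options could look like with the AWS SDK for Java 1.x (the com.amazonaws packages seen in the stack trace): the class name S3TimeoutConfig and every numeric value below are hypothetical placeholders, not recommendations, and it assumes the aws-java-sdk-s3 dependency is on the classpath and that a region is available from the default provider chain. Whether these settings change anything for a reset that happens mid-stream is not established here, so treat it as one thing to try.

```java
import com.amazonaws.ClientConfiguration;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class S3TimeoutConfig {

    // Builds a client configuration with explicit timeouts.
    // All values are in milliseconds and are illustrative assumptions only.
    static ClientConfiguration buildConfig() {
        return new ClientConfiguration()
            // Upper bound for a single HTTP request attempt (0, the default, disables it).
            .withRequestTimeout(10 * 60 * 1000)
            // Upper bound for the whole client call, including retries (0, the default, disables it).
            .withClientExecutionTimeout(30 * 60 * 1000)
            // How long to wait for data on an established connection before timing out.
            .withSocketTimeout(60 * 1000)
            // Number of retries the SDK attempts for retryable failures.
            .withMaxErrorRetry(5);
    }

    public static void main(String[] args) {
        // Credentials and region come from the default provider chains.
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
            .withClientConfiguration(buildConfig())
            .build();
        System.out.println("S3 client created with custom timeouts: " + s3);
    }
}
```

If the application obtains its S3 client through Spring rather than building it directly, the same ClientConfiguration object would need to be supplied wherever that client bean is defined.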

[1] https://repost.aws/knowledge-center/s3-connect-reset-error
[2] https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/ClientConfiguration.html

AWS
sanju_s
answered 5 months ago
