We keep getting "Connection reset by peer" errors from the S3 table

We intermittently get the following error when writing data to an S3 table. Can anyone tell me what the issue is and whether there is a workaround? It's not happening for all tables, only for specific ones.

/iceberg/v1/arn%3Aaws%3As3tables%3Aus-east-2%3A{accountId}%3Abucket%2F{bucketName}/namespaces/{namespace}/tables/{tableName} POST b'{"identifier":{"namespace":["{namespace}"],"name":"{table}"},"requirements":[{"type":"assert-ref-snapshot-id","ref":"main","snapshot-id":"{snapshot-id}"},{"type":"assert-table-uuid","uuid":"{assert-table-uuid}"}],"updates":[{"action":"add-snapshot","snapshot":{"snapshot-id":{add-snapshot},"parent-snapshot-id":{parent-snapshot-id},"sequence-number":470,"timestamp-ms":1758178935908,"manifest-list":"s3://{manifest-list}"
('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
1 Answer

The "Connection reset by peer" error you're seeing when writing to S3 tables is most likely a network connectivity issue between your application and the S3 service. It is raised when the remote end, or a network device along the path, unexpectedly terminates the TCP connection.

There are several potential causes and workarounds for this issue:

  1. Idle Connection Timeout: VPC endpoints, including those for S3, have a fixed idle timeout of 350 seconds (about 5.8 minutes). After this period of inactivity the connection is terminated, and the next request sent on it can fail with "Connection reset by peer".

  2. TCP Keepalive Configuration: By default, many systems only start sending TCP keepalive probes after 7200 seconds (2 hours) of idle time. That is far longer than the 350-second idle timeout, so an idle connection is dropped long before the first probe is sent.
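To make the interaction between these two timers concrete, here is a minimal sketch that compares a keepalive idle time against the 350-second timeout mentioned above. The function names are made up for illustration, and the `/proc` path only exists on Linux, so the reader function returns `None` elsewhere.

```python
from pathlib import Path

# Idle timeout quoted in the answer above (seconds).
VPC_ENDPOINT_IDLE_TIMEOUT_S = 350


def first_probe_is_too_late(keepalive_time_s, idle_timeout_s=VPC_ENDPOINT_IDLE_TIMEOUT_S):
    """Return True if the first keepalive probe would be sent only after
    the idle timeout has already dropped the connection."""
    return keepalive_time_s >= idle_timeout_s


def read_linux_keepalive_time(path="/proc/sys/net/ipv4/tcp_keepalive_time"):
    """Read the system-wide keepalive idle time on Linux, or None if the
    file is not available (for example on macOS or Windows)."""
    p = Path(path)
    return int(p.read_text().strip()) if p.exists() else None


if __name__ == "__main__":
    current = read_linux_keepalive_time()
    if current is None:
        print("tcp_keepalive_time not readable on this platform")
    else:
        print(f"tcp_keepalive_time={current}s, too late: {first_probe_is_too_late(current)}")
```

With the common default of 7200 seconds the check reports that the probe comes too late; with 300 seconds it does not.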

Possible workarounds include:

  1. Implement TCP Keepalive: Configure your application or operating system to send TCP keepalive probes before the 350-second idle timeout expires:
     • Set tcp_keepalive_time to around 300 seconds (5 minutes)
     • Set tcp_keepalive_intvl to 60 seconds
     • Set tcp_keepalive_probes to 5

  2. Implement Application-Level Keepalive: Periodically send small requests to S3 so the connection never stays idle long enough to be dropped.

  3. Add Retry Logic: Implement robust retry mechanisms in your application to handle connection resets gracefully. When a reset occurs, re-establish the connection and resume the operation.

  4. Use AWS SDK: If you're not already using it, the AWS SDK provides built-in retry mechanisms and can handle many connection issues automatically.

  5. Check Concurrency Settings: If you're making many concurrent connections, you might need to increase the fs.s3.maxConnections value to accommodate your application's concurrency needs. Note that this is an EMRFS setting, so it only applies if you're running on Amazon EMR.
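The keepalive and retry workarounds above can be sketched with the Python standard library alone. This is an illustration under assumptions, not the asker's actual client code: `enable_keepalive` and `call_with_retries` are hypothetical helper names, the `TCP_KEEP*` socket constants are platform-specific (they exist on Linux), and the backoff values are arbitrary starting points.

```python
import socket
import time


def enable_keepalive(sock, idle_s=300, interval_s=60, probes=5):
    """Turn on TCP keepalive for one socket, mirroring the values above."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning constants are not defined on every platform, so each one
    # is applied only if this Python build exposes it.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),       # idle time before the first probe
        ("TCP_KEEPINTVL", interval_s),  # gap between unanswered probes
        ("TCP_KEEPCNT", probes),        # probes sent before giving up
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock


def call_with_retries(operation, max_attempts=4, base_delay_s=0.5,
                      retry_on=(ConnectionError,), sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on connection errors.

    ConnectionResetError is a subclass of the built-in ConnectionError.
    HTTP libraries often wrap it in their own exception type, so pass that
    type via retry_on if needed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay_s * 2 ** (attempt - 1))
```

A caller would wrap the failing write, for example `call_with_retries(lambda: commit_table_update())`, where `commit_table_update` stands in for whatever function issues the POST shown in the question. The operation passed in should open a fresh connection on each attempt, since the reset one cannot be reused.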

Since this is happening only with specific tables, also check if those tables have different characteristics (size, access patterns, etc.) that might be triggering the issue.
Sources
Network issue to S3: Timeouts in S3? | AWS re:Post
OpenSSL SSL_read: Connection reset by peer, errno 104 | AWS re:Post

answered 2 months ago
