How do I troubleshoot connection and socket timeout issues with OpenSearch Service?

My client application gets connection timeout or socket errors when I try to connect to Amazon OpenSearch Service.

Short description

Your client application might receive a connection or socket timeout error similar to these example outputs:

Connection timeout error output

curl: (7) Failed to connect to vpc-xxxxxxxx.us-east-1.es.amazonaws.com port 443: Connection timed out
curl: (28) Operation timed out after 1001 milliseconds with 0 out of 0 bytes received

Socket timeout error output

java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-5083 [ACTIVE]

java.net.SocketTimeoutException: Read timed out
Caused by: java.net.SocketTimeoutException: Read timed out

Resolution

Connection timeout errors

Public domains

Public domains are accessible over the internet when the client has connectivity or routes to the internet.

If the client doesn't have connectivity or routes to the internet, then you might get these responses:

telnet search-domain-name-someid.aws-region.es.amazonaws.com 443

OUTPUT:

Trying xyz.xyz.xyz.xyz...
telnet: connect to address xyz.xyz.xyz.xyz: Connection timed out

-or-

curl -v https://search-domain-name-someid.aws-region.es.amazonaws.com:443

OUTPUT:

*   Trying xyz.xyz.xyz.xyz:443...
* connect to xyz.xyz.xyz.xyz port 443 failed: Operation timed out
* Failed to connect to search-domain-name-someid.aws-region.es.amazonaws.com port 443 after 75243 ms: Couldn't connect to server
* Closing connection 0
curl: (28) Failed to connect to search-domain-name-someid.aws-region.es.amazonaws.com port 443 after 75243 ms: Couldn't connect to server

To resolve this issue, make sure that the client has routes to the internet and doesn't block outgoing requests to the search endpoint. 

Example of a successful response from the search endpoint:

telnet search-domain-name-someid.aws-region.es.amazonaws.com 443

OUTPUT:

Trying xyz.xyz.xyz.xyz...
Connected to search-domain-name-someid.aws-region.es.amazonaws.com.
Escape character is '^]'.
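
The same reachability check can be scripted from the client host. The following Python sketch is a minimal illustration, not an official AWS tool. It opens a TCP connection to port 443 and reports whether the attempt connected, timed out, or failed for another reason such as a DNS failure or a refused connection. The function name check_tcp and the 5-second timeout are assumptions chosen for illustration.

```python
import socket
import sys


def check_tcp(host, port=443, timeout=5.0):
    """Try a TCP connection and return 'connected', 'timeout', or 'error: ...'."""
    try:
        # create_connection resolves the host name and connects within `timeout` seconds.
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except (socket.timeout, TimeoutError):
        # No answer within the timeout: typically a routing or firewall problem.
        return "timeout"
    except OSError as exc:
        # DNS failures, refused connections, unreachable networks, and similar.
        return f"error: {exc}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_tcp.py <your-domain-endpoint>
    print(sys.argv[1], "->", check_tcp(sys.argv[1]))
```

A result of "timeout" corresponds to the "Connection timed out" output shown earlier, and "connected" corresponds to the successful telnet response.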

Domains inside a virtual private cloud (VPC)

For OpenSearch Service domains that you create inside a virtual private cloud (VPC), an elastic network interface is placed in the VPC for each data node. The network interfaces forward network traffic to your domain.

To check connectivity with the network interfaces to your domain inside the VPC, complete the following steps:

  1. Run one of the following commands to get the network interface IP addresses in your VPC:

    nslookup -q=A vpc-domain-name-id.aws-region.es.amazonaws.com

    -or-

    dig +short vpc-domain-name-id.aws-region.es.amazonaws.com

  2. Run one of the following commands for each data node IP address:

    telnet <ip-address-of-ENI-from-step-1> 443

    -or-

    curl -v telnet://<ip-address-of-ENI-from-step-1>:443

Example timeout response:

Trying xyz.xyz.xyz.xyz...
telnet: connect to address xyz.xyz.xyz.xyz: Connection timed out

If the connection timed out, then check your VPC configuration: security groups, route tables, and network access control lists (network ACLs).

Example successful response:

Trying xyz.xyz.xyz.xyz...
Connected to xyz.xyz.xyz.xyz.
Escape character is '^]'.

If you can connect to some of the network interfaces but others time out, then contact AWS Support for further assistance.
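
The two steps above can also be combined in a short script that you run from a host inside the VPC. The following Python sketch is illustrative only. It assumes that the DNS answer for the domain endpoint lists the network interface IP addresses, as in step 1. The function names resolve_ipv4, probe, and check_domain are hypothetical, as is the 5-second timeout.

```python
import socket
import sys


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses that the host name resolves to."""
    infos = socket.getaddrinfo(hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})


def probe(ip, port=443, timeout=5.0):
    """Return True if a TCP connection to ip:port succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:  # includes timeouts and refused connections
        return False


def check_domain(hostname):
    """Map each resolved network interface IP address to its probe result."""
    return {ip: probe(ip) for ip in resolve_ipv4(hostname)}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_domain.py <your-vpc-domain-endpoint>
    for ip, reachable in check_domain(sys.argv[1]).items():
        print(ip, "connected" if reachable else "failed or timed out")
```

If every address fails, the VPC configuration is the likely cause. A mix of reachable and unreachable addresses matches the case described above.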

Note: You can't directly access OpenSearch Service domains that reside in a VPC from outside that VPC. For more information, see Launching your OpenSearch Service domains within a VPC.

Socket timeout errors

Socket timeout errors usually occur when a client sends too many requests or requests that are too complex. As a result, the OpenSearch Service domain might experience high resource utilization and respond to clients slowly.

To troubleshoot socket timeout errors, take the following actions:

  • Activate slow logs for your OpenSearch Service index, and then specify log thresholds. Use slow logs to determine if a query takes a long time to complete. You can also speed up the query with query tuning. For more information, see Viewing OpenSearch Service slow logs.
  • Reduce the amount of data that OpenSearch Service queries for the requests. This reduces the amount of time that's required for the requests to complete.
  • Use a larger instance type. For more information, see Choosing instance types and testing.
  • Configure exponential backoff and retry mechanisms in your application so that requests that time out are sent again.
  • Use the Profile API debugging tool to get detailed timing information for the individual components of a search request. For more information, see Profile API on the Elastic website.
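
The exponential backoff and retry recommendation in the list above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the behavior of any specific OpenSearch client. The retry_with_backoff helper, its default values, and the send_request callable (which stands in for whatever call your application makes to the domain) are all hypothetical. The sketch uses random "full jitter" so that many clients don't retry at the same moment.

```python
import random
import socket
import time


def retry_with_backoff(send_request, max_attempts=5, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Call send_request(), retrying on timeouts with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send_request()
        except (socket.timeout, TimeoutError):
            if attempt == max_attempts:
                raise  # give up and surface the last timeout to the caller
            # The delay ceiling doubles on each attempt (0.5 s, 1 s, 2 s, ...),
            # capped at max_delay; the actual wait is a random value below it.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

Keep max_attempts bounded, because every retry adds load to a domain that might already be under pressure. Retry only requests that are safe to send more than once.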

Related information

Troubleshooting Amazon OpenSearch Service

How do I troubleshoot search latency spikes in my Amazon OpenSearch Service cluster?

How can I improve the indexing performance on my Amazon OpenSearch Service cluster?

Analyzing Amazon OpenSearch Service slow logs using Amazon CloudWatch Logs streaming and Kibana

AWS OFFICIAL
Updated 8 months ago