I experience connectivity issues with my Amazon VPC-based AWS Lambda function that result in an error. How do I troubleshoot this?
Short description
You receive a connection-related error when your AWS Lambda function that's connected to an Amazon Virtual Private Cloud (Amazon VPC) tries to reach a remote endpoint or service. The error is often caused by a network connectivity issue. For example, the function tries to create an Amazon DynamoDB table, but the operation times out.
Resolution
Create a test function that replicates the network configuration of the target function that you want to test. This is useful when you can't edit the target function to add troubleshooting logic. See Configuring a Lambda function to access resources in a VPC for further information on how to configure your Lambda function for VPC access.
The following is an example test function:
import socket

def connect_tcp(event, context):
    # Create a TCP socket and bound the connection attempt to 8 seconds
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(8)
    hostname = "www.amazon.com"
    port = 443
    server_address = (hostname, port) # Server address and port
    try:
        # Resolve the hostname first so that DNS failures are reported separately
        IPAddr = socket.gethostbyname(hostname)
        print("Hostname: " + hostname)
        print("Host IP:" + IPAddr)
        print("Attempting to connect ..")
        sock.connect(server_address)
        sock.shutdown(socket.SHUT_RDWR)
        print("connected")
    except Exception as e:
        # Any DNS, timeout, or connection error ends up in the function's log output
        print("-- Error --")
        print(e)
    finally:
        sock.close()
In this example, the socket timeout is set to 8 seconds, so the connection attempt fails if the connection isn't established within that time. You can adjust this value if necessary.
The socket module is part of the Python standard library and is available in the Lambda Python runtimes, so you don't need to package it. If you add third-party dependencies to the test function, then include them in the deployment package or in a layer that's associated with the function. See Deploy Python Lambda functions with .zip file archives for information on how to deploy dependencies with a .zip archive. See Using layers with your Lambda function for information on how to deploy dependencies as layers.
Note: It's a best practice to attempt to replicate the runtime of the target function. The test function is written in Python, but it can be ported to other runtimes.
With the test function in place, troubleshoot by using these steps:
- Set the hostname and port variables to match the ones to which the target function is attempting to establish a connection.
- Mirror the network configuration (subnet and security group) of the target function.
- Set a function timeout value to accommodate any overhead. It's a best practice to make sure that the function timeout is higher than the socket connection timeout, so that the connection attempt can complete or time out before the function itself times out.
- Run the test.
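To avoid editing the code for each target, the hostname, port, and timeout in the steps above can be passed in through the invocation event. The following is a minimal sketch rather than part of the original test function: the event keys hostname, port, and timeout and the helper name pick_socket_timeout are assumptions chosen for illustration. It caps the socket timeout below the function's remaining execution time, which Lambda exposes through context.get_remaining_time_in_millis().

```python
import socket


def pick_socket_timeout(requested_seconds, remaining_ms, margin_seconds=1.0):
    """Return a socket timeout that leaves margin_seconds before the function times out."""
    available = remaining_ms / 1000.0 - margin_seconds
    # Never return a non-positive value: 0 would switch the socket to non-blocking mode.
    return max(0.1, min(requested_seconds, available))


def connect_tcp(event, context):
    # Hypothetical event shape: {"hostname": "...", "port": 443, "timeout": 8}
    hostname = event.get("hostname", "www.amazon.com")
    port = int(event.get("port", 443))
    timeout = pick_socket_timeout(
        float(event.get("timeout", 8)), context.get_remaining_time_in_millis()
    )
    try:
        # create_connection resolves the hostname and applies the timeout to the connect.
        with socket.create_connection((hostname, port), timeout=timeout):
            return {"connected": True, "hostname": hostname, "port": port}
    except OSError as e:
        # DNS errors, timeouts, and refused connections are all subclasses of OSError.
        return {"connected": False, "hostname": hostname, "port": port, "error": str(e)}
```

Returning the result instead of only printing it makes the outcome visible in the invocation response as well as in the logs.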
If the test fails, then there is likely a connectivity issue that must be investigated. If the test succeeds, then there is likely connectivity between the Lambda Amazon VPC environment (including the security group) and the endpoint. In this case, the issue is probably with the target function or one of its dependencies.
Note: When multiple subnets with different routing profiles are involved, it's a best practice to launch the test function in the same subnet as the failing target function.
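When the test fails, it can help to know at which stage it failed. The following sketch is one possible way to separate name resolution from the TCP handshake; the function name diagnose and the returned labels are illustrative assumptions, not part of the original test function. As a rough guide, a timeout often points to dropped traffic (for example, a security group, network ACL, or missing route), while a refused connection usually means that the host was reached but nothing is listening on that port.

```python
import socket


def diagnose(hostname, port, timeout=8):
    """Return a short label for the stage at which the connection attempt fails."""
    try:
        # Stage 1: name resolution (not bounded by the socket timeout below).
        infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror as e:
        return "dns_failure: " + str(e)

    # Use the first resolved address for the connection attempt.
    family, socktype, proto, _, address = infos[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        # Stage 2: TCP handshake.
        sock.connect(address)
        return "connected"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as e:
        return "other_error: " + str(e)
    finally:
        sock.close()
```

For example, calling diagnose("www.amazon.com", 443) from inside the handler and printing the result writes the failure stage to the function's log.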
If the failing subnet isn't known, rotate through the subnets to identify the failing subnet by following these steps:
- Specify the first subnet, and ignore the availability warning because you won't deploy this to a production platform.
- Test the function.
- Specify the next subnet, and test again.
- Repeat the previous step until all subnets are checked.
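The rotation above can also be scripted. The following is a minimal sketch under stated assumptions: client is a boto3 Lambda client, the function name, subnet IDs, and security group IDs are placeholders that you supply, and the deployed test function is the original one that prints "-- Error --" on failure. The helper name rotate_subnets is hypothetical. It uses the update_function_configuration and invoke operations and the function_updated waiter, and reads the log tail that invoke returns when LogType is set to "Tail".

```python
import base64


def rotate_subnets(client, function_name, subnet_ids, security_group_ids):
    """Invoke the test function once per subnet and return {subnet_id: passed}."""
    results = {}
    for subnet_id in subnet_ids:
        # Attach the function to a single subnet so that the result is unambiguous.
        client.update_function_configuration(
            FunctionName=function_name,
            VpcConfig={"SubnetIds": [subnet_id], "SecurityGroupIds": security_group_ids},
        )
        # Wait until the configuration update has finished before invoking.
        client.get_waiter("function_updated").wait(FunctionName=function_name)
        response = client.invoke(FunctionName=function_name, LogType="Tail")
        # LogResult holds the base64-encoded tail of the execution log.
        log_tail = base64.b64decode(response.get("LogResult", "")).decode("utf-8", "replace")
        # Treat a FunctionError (for example, a function timeout) or the
        # test function's "-- Error --" marker as a failed subnet.
        results[subnet_id] = "FunctionError" not in response and "-- Error --" not in log_tail
    return results
```

A client for this sketch could be created with boto3.client("lambda"). Because the log tail is truncated, a very chatty function might push the error marker out of view; in that case, check the full log in Amazon CloudWatch Logs.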