Resolution
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.
To troubleshoot your failed test connection in AWS Glue, check your network and authentication connections.
Network issues
Check connectivity to JDBC data stores
AWS Glue creates elastic network interfaces with private IP addresses in the connection's subnet. To reach a data store outside the Amazon Virtual Private Cloud (Amazon VPC), the subnet's route table must include a route to a NAT gateway in a public subnet. Otherwise, the connection times out.
Note: A data store outside the Amazon VPC is, for example, an on-premises data store or an Amazon Relational Database Service (Amazon RDS) resource with a public hostname.
Check that the connection's security groups and network access control list (network ACL) allow traffic to the data in the VPC. Then, use the AWSSupport-TroubleshootGlueConnection runbook in AWS Systems Manager. For more information, see How do I troubleshoot errors with an AWS Glue connection that has a JDBC source?
If the connection requires a NAT gateway or access to AWS Secrets Manager and AWS Security Token Service (AWS STS), then attach the endpoints. For more information, see Connecting to data.
Check the connection's security groups
One of the security groups that's associated with the connection must have a self-referenced inbound rule that's open to all TCP ports. One of the security groups must be open to all outbound traffic. You can use a self-referenced rule to restrict outbound traffic to the VPC. For more information, see Setting up Amazon VPC for JDBC connections to Amazon RDS data stores from AWS Glue.
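The security group requirements above can be checked against the JSON that `aws ec2 describe-security-groups` returns. The following is a minimal sketch, not an official AWS tool. The function names are hypothetical, and it assumes that an "all TCP" rule appears as protocol `tcp` with ports 0-65535 or as protocol `-1` (all traffic).

```python
from typing import Any, Dict

# Assumption: "all TCP ports" is expressed as FromPort 0 and ToPort 65535.
ALL_TCP_PORTS = (0, 65535)


def has_self_referencing_tcp_rule(group: Dict[str, Any]) -> bool:
    """Return True if the group has an inbound rule that allows all TCP
    ports (or all traffic) from the group itself."""
    group_id = group["GroupId"]
    for perm in group.get("IpPermissions", []):
        protocol = perm.get("IpProtocol")
        all_traffic = protocol == "-1"
        all_tcp = (
            protocol == "tcp"
            and (perm.get("FromPort"), perm.get("ToPort")) == ALL_TCP_PORTS
        )
        if not (all_traffic or all_tcp):
            continue
        pairs = perm.get("UserIdGroupPairs", [])
        if any(pair.get("GroupId") == group_id for pair in pairs):
            return True
    return False


def allows_all_outbound(group: Dict[str, Any]) -> bool:
    """Return True if the group has an egress rule for all protocols
    to 0.0.0.0/0."""
    for perm in group.get("IpPermissionsEgress", []):
        if perm.get("IpProtocol") != "-1":
            continue
        ranges = perm.get("IpRanges", [])
        if any(r.get("CidrIp") == "0.0.0.0/0" for r in ranges):
            return True
    return False
```

Pass each entry of the `SecurityGroups` list from the command output to these functions. At least one group that's attached to the connection should satisfy each check.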
Check the number of free IP addresses
The number of free IP addresses in the subnet must be greater than the number of workers that you specify for the job. This allows AWS Glue to create network interfaces in the specified subnet.
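As a rough illustration of this check, the following sketch reads the `AvailableIpAddressCount` field from the JSON that `aws ec2 describe-subnets` returns and compares it with the job's worker count. The function names are hypothetical, and the comparison only mirrors the rule stated above.

```python
from typing import Any, Dict


def free_ip_count(describe_subnets_output: Dict[str, Any], subnet_id: str) -> int:
    """Return AvailableIpAddressCount for the given subnet ID."""
    for subnet in describe_subnets_output.get("Subnets", []):
        if subnet.get("SubnetId") == subnet_id:
            return subnet["AvailableIpAddressCount"]
    raise KeyError(f"Subnet {subnet_id} not found in the output")


def has_enough_free_ips(available_ips: int, number_of_workers: int) -> bool:
    """The subnet needs more free IP addresses than the job has workers."""
    return available_ips > number_of_workers
```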
Check that the subnet can access Amazon S3
Provide an Amazon Simple Storage Service (Amazon S3) endpoint or a route to a NAT gateway in your subnet's route table. For more information, see Error: Could not find S3 endpoint or NAT Gateway for subnetId in VPC.
Check whether you have an AWS KMS VPC endpoint
To encrypt connections for your AWS Glue Data Catalog, be sure that you have a route to AWS Key Management Service (AWS KMS). For example, the route can be an AWS KMS VPC interface endpoint. For more information, see Connecting to AWS KMS through a VPC endpoint.
Check whether the AWS Glue connection and the database use different VPCs
Your test connection fails with a timeout error when the following conditions are true:
- The database isn't publicly accessible.
- You attached the AWS Glue job to a connection that uses a different VPC without VPC peering.
To resolve a test connection failure, create a dedicated AWS Glue VPC and set up the associated VPC peering with your other VPCs.
Check the connectivity to the on-premises data store
To check the AWS Glue connection to an on-premises database that connects to an Amazon Elastic Compute Cloud (Amazon EC2) instance, run the following commands:
$ telnet hostname port
$ nc -zv hostname port
$ dig hostname
$ traceroute -AnT -p port IP
Check your VPN, VPC, subnet, security group, and network ACL configurations. Be sure that the configurations don't block connectivity from the VPC to your on-premises database or create firewall issues from the on-premises database. For more information, see How to access and analyze on-premises data stores using AWS Glue.
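If telnet and nc aren't installed on the instance, a TCP reachability test similar to `nc -zv hostname port` can be approximated with Python's standard library. This is a minimal sketch. The function name `can_connect` and the 5-second default timeout are assumptions for illustration.

```python
import socket
import sys


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Try to open a TCP connection to host:port.

    Returns False on a timeout, a refused connection, or a DNS failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Usage: python check_tcp.py hostname port
    if len(sys.argv) == 3:
        target_host, target_port = sys.argv[1], int(sys.argv[2])
        reachable = can_connect(target_host, target_port)
        status = "reachable" if reachable else "not reachable"
        print(f"{target_host}:{target_port} is {status}")
```

A False result doesn't show which layer blocked the traffic, so the route table, security group, and network ACL checks still apply.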
Authentication issues
Choose the correct IAM role
The AWS Identity and Access Management (IAM) role that you select for the test connection must have a trust relationship with AWS Glue. Choose a service-linked role that has the AWSGlueServiceRole policy attached to it to create the trust relationship.
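To confirm the trust relationship, you can inspect the role's trust policy document, for example the `AssumeRolePolicyDocument` field that `aws iam get-role` returns. The following sketch checks whether a trust policy allows the `glue.amazonaws.com` service principal to call `sts:AssumeRole`. The function names are hypothetical, and the check ignores policy conditions.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> List[Any]:
    """IAM policy fields can be a single value or a list of values."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def trusts_glue(trust_policy: Dict[str, Any]) -> bool:
    """Return True if an Allow statement lets glue.amazonaws.com assume the role."""
    for statement in _as_list(trust_policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal")
        if not isinstance(principal, dict):
            continue
        services = _as_list(principal.get("Service"))
        actions = _as_list(statement.get("Action"))
        if "glue.amazonaws.com" in services and "sts:AssumeRole" in actions:
            return True
    return False
```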
Check the connection's IAM role
If you encrypted the connection password with AWS KMS, then check that the connection's IAM role allows the kms:Decrypt action for the key. For more information, see Setting up encryption in AWS Glue.
Check the connection logs
Check the logs for error messages. You can find logs from test connections in Amazon CloudWatch Logs under /aws-glue/testconnection/output.
Check the SSL settings
If the data store requires SSL connectivity for the specified user, then select Require SSL connection when you create the connection on the console. Select this option only when the data store supports SSL.
Check the JDBC username and password
Users must have sufficient permissions to access the Java Database Connectivity (JDBC) data store. For example, AWS Glue crawlers require the SELECT permission. A job that writes to a data store requires INSERT, UPDATE, and DELETE permissions.
Check the JDBC URL syntax
Syntax requirements vary by database engine. For more information, see AWS Glue JDBC connection properties and review the examples under JDBC URL.
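As an illustration only, the following sketch parses the common `jdbc:engine://host:port/db_name` URL form so that you can confirm the host, port, and database name before you test the connection. The names `JdbcTarget` and `parse_simple_jdbc_url` are hypothetical. The pattern deliberately doesn't cover engine-specific forms, such as Oracle thin URLs, and returns None for anything that it doesn't recognize.

```python
import re
from typing import NamedTuple, Optional


class JdbcTarget(NamedTuple):
    engine: str
    host: str
    port: int
    database: Optional[str]


# Matches only the simple form: jdbc:<engine>://<host>:<port>[/<database>]
_SIMPLE_JDBC_URL = re.compile(
    r"^jdbc:(?P<engine>[a-z0-9]+)://(?P<host>[^:/;]+):(?P<port>\d+)"
    r"(?:/(?P<database>[^?;]+))?"
)


def parse_simple_jdbc_url(url: str) -> Optional[JdbcTarget]:
    """Extract engine, host, port, and database, or return None if no match."""
    match = _SIMPLE_JDBC_URL.match(url)
    if match is None:
        return None
    return JdbcTarget(
        engine=match.group("engine"),
        host=match.group("host"),
        port=int(match.group("port")),
        database=match.group("database"),
    )
```

A None result means only that the URL doesn't follow the simple form. It doesn't mean that the URL is invalid for its database engine.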
Additional tips
Check the connection type
Be sure to choose the correct connection type. When you choose Amazon RDS or Amazon Redshift for Connection type, AWS Glue automatically populates the VPC, subnet, and security group.
The test connection feature works only for MySQL 5.x versions because the built-in AWS Glue JDBC driver doesn't support MySQL version 8. If you test the connection against a MySQL version that's later than version 5.x, then you might get a connection timeout error. You can still use your AWS Glue connection to connect to MySQL version 8 from a job. To use the connection in an extract, transform, and load (ETL) job, provide the compatible driver Java Archive (JAR) file for MySQL versions 8 and later, and then load the JAR file into your job. For more information, see Connection types and options for ETL in AWS Glue for Apache Spark.
Verify that DNS isn't causing the issues
To verify that DNS isn't causing the issues, use the data store's public or private IP address as the JDBC URL for the AWS Glue connection. Clear the Require SSL connection field because a domain name is no longer used.
Check whether the driver is incompatible
Provide the correct driver as an extra JAR file in the job properties along with the failed connection name. When you specify the connection name as a job property, AWS Glue uses the connection's network settings, such as the VPC and subnets. Create the Spark DataFrame with the JAR file in the job properties to override the default AWS Glue data store drivers.
You can also convert the DataFrame into an AWS Glue DynamicFrame. For more information, see fromDF.
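The approach above might look like the following sketch inside a job script. It assumes that the driver JAR is already supplied to the job (for example, through the `--extra-jars` job parameter) and that the job runs in the AWS Glue runtime, where the `awsglue` library is available. The function names, the frame name `jdbc_source`, and the values in the usage comment are placeholders.

```python
from typing import Any, Dict


def build_jdbc_options(url: str, table: str, user: str, password: str,
                       driver_class: str) -> Dict[str, str]:
    """Build the options for Spark's JDBC data source.

    Setting "driver" tells Spark which driver class from the extra JAR
    to use instead of the default AWS Glue driver.
    """
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": driver_class,
    }


def read_as_dynamic_frame(glue_context: Any, options: Dict[str, str],
                          name: str = "jdbc_source") -> Any:
    """Read the table into a Spark DataFrame, then convert it with fromDF."""
    # Imported here because awsglue exists only in the AWS Glue job runtime.
    from awsglue.dynamicframe import DynamicFrame

    data_frame = (
        glue_context.spark_session.read.format("jdbc")
        .options(**options)
        .load()
    )
    return DynamicFrame.fromDF(data_frame, glue_context, name)


# Example usage in a job script (placeholder values):
# options = build_jdbc_options(
#     "jdbc:mysql://hostname:3306/db_name", "table_name",
#     "username", "password", "com.mysql.cj.jdbc.Driver")
# dynamic_frame = read_as_dynamic_frame(glueContext, options)
```

In a real job, retrieve the credentials from the connection or from AWS Secrets Manager rather than hardcoding them in the script.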
Check whether the JDBC data store is publicly accessible
Use MySQL Workbench and the JDBC URL to connect to the data store. Or, launch an Amazon EC2 instance that has SSH access to the same subnet and security groups that you use for the connection. Then, use SSH to connect to the instance, and run the following commands to test connectivity:
$ dig hostname
$ nc -zv hostname port
Related information
Troubleshooting Spark errors