Based on the information provided, it appears you're experiencing connectivity issues between AWS Glue and Amazon Redshift when trying to move data from RDS (PostgreSQL) to Redshift across different VPCs. Here are some potential causes and solutions to consider:
- VPC Configuration: A peering connection alone is not enough. The route tables in both VPCs must send the other VPC's CIDR block through the peering connection. Check that the route tables associated with the Glue and Redshift subnets each contain a route to the other VPC's CIDR block with the peering connection as the target.
- Security Group Configuration: You've allowed all traffic in the Redshift and RDS security groups. Also make sure the security group attached to the AWS Glue connection allows outbound traffic to Redshift's port (5439 by default) and has the self-referencing inbound rule for all TCP ports that AWS Glue requires.
- Network ACLs: Check whether any network ACLs are blocking traffic between the VPCs or to and from AWS Glue.
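- Redshift Public Accessibility: The cluster does not need to be publicly accessible. AWS Glue reaches it through elastic network interfaces that it creates in the Glue connection's subnet, and those interfaces have only private IP addresses, so what matters is that the cluster is reachable privately over the peering connection.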
- IAM Role Permissions: Verify that the IAM role used by the Glue job has the permissions it needs to access both the RDS and Redshift resources.
- Temporary Credentials: When moving data to and from Amazon Redshift, AWS Glue uses temporary Amazon S3 credentials that expire after 1 hour, so long-running jobs can fail when they expire. Consider splitting the job into smaller chunks or refreshing the credentials. Note that expired credentials produce credential errors rather than a connect timeout.
- Connection Configuration: Double-check the connection details used by your AWS Glue job, making sure the Redshift endpoint, port, database name, and credentials are correct.
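- Network Connectivity: A "connect timed out" error points to a network-level problem, typically packets being dropped somewhere between the Glue subnet and the cluster. Test connectivity from the Glue connection's subnet to the Redshift cluster (for example with the Test connection action on the Glue connection) to confirm where traffic stops.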
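For the connection-details check, here is a minimal Python sketch that sanity-checks the JDBC URL stored in a Glue connection. It assumes the usual `jdbc:redshift://<endpoint>:<port>/<database>` form and the default port 5439; `check_redshift_jdbc_url` is a hypothetical helper, not part of any AWS SDK. With boto3, the URL can be read from `glue.get_connection(Name=...)["Connection"]["ConnectionProperties"]["JDBC_CONNECTION_URL"]`.

```python
from urllib.parse import urlparse


def check_redshift_jdbc_url(url: str, expected_port: int = 5439) -> list[str]:
    """Return problems found in a Redshift JDBC URL (an empty list means none).

    Hypothetical helper; assumes the form jdbc:redshift://<endpoint>:<port>/<database>.
    """
    problems: list[str] = []
    if not url.startswith("jdbc:redshift://"):
        return ["URL should start with 'jdbc:redshift://'"]

    # Drop the 'jdbc:' prefix so urlparse sees 'redshift://host:port/db'.
    parsed = urlparse(url[len("jdbc:"):])
    if not parsed.hostname:
        problems.append("missing cluster endpoint")

    try:
        port = parsed.port  # raises ValueError for non-numeric or out-of-range ports
    except ValueError:
        problems.append("port is not a valid number")
    else:
        if port is None:
            problems.append("no port given")
        elif port != expected_port:
            problems.append(f"port {port} differs from expected {expected_port}")

    # The path is '/<database>'; anything after '?' lands in parsed.query instead.
    if not parsed.path.lstrip("/"):
        problems.append("missing database name")
    return problems
```

An empty result only means the URL is well-formed; it says nothing about whether the endpoint is reachable from the Glue subnet.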
If these steps don't resolve the issue, you may need to engage AWS support for further investigation, as there could be more complex networking issues at play.
Sources
Troubleshooting errors Spark errors - AWS Glue
Troubleshooting Amazon Redshift - Amazon Data Firehose
Connecting to Amazon Redshift in AWS Glue Studio - AWS Glue
Hello.
To resolve the problem transferring data from RDS to Redshift through AWS Glue, work through the steps below.
1. Verify VPC Peering
- Check peering status: Ensure the VPC peering connection is active.
- Route tables: Make sure the route tables in both VPCs have a route to the other VPC's CIDR block that targets the peering connection.
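To check the route tables programmatically, the sketch below applies longest-prefix matching to the routes of one route table and reports whether traffic to a given IP would leave through the peering connection. It assumes the input is the `Routes` list of one entry from EC2 `DescribeRouteTables` (for example via boto3's `ec2.describe_route_tables()`); the helper names and the IPs and `pcx-` IDs in any example are hypothetical. Run it twice: once for the Glue subnet's route table with a private IP of the Redshift cluster, and once for the Redshift subnet's route table with an IP from the Glue subnet.

```python
import ipaddress
from typing import Optional


def select_route(routes: list[dict], destination_ip: str) -> Optional[dict]:
    """Return the route a VPC would use for destination_ip (longest prefix match).

    `routes` is assumed to be the 'Routes' list of one route table from
    EC2 DescribeRouteTables; only IPv4 'DestinationCidrBlock' routes are used.
    """
    ip = ipaddress.ip_address(destination_ip)
    best, best_len = None, -1
    for route in routes:
        cidr = route.get("DestinationCidrBlock")
        if cidr is None:
            continue  # IPv6 or prefix-list route
        net = ipaddress.ip_network(cidr, strict=False)
        if ip in net and net.prefixlen > best_len:
            best, best_len = route, net.prefixlen
    return best


def routes_via_peering(routes: list[dict], destination_ip: str, pcx_id: str) -> bool:
    """True if traffic to destination_ip leaves through the given peering connection."""
    route = select_route(routes, destination_ip)
    return (
        route is not None
        and route.get("VpcPeeringConnectionId") == pcx_id
        and route.get("State") == "active"  # 'blackhole' routes drop traffic
    )
```

Longest-prefix matching matters here: a more specific route to the same range (for example through a NAT or transit gateway) would win over the peering route.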
2. Check Security Groups
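RDS Security Group
- Allow inbound traffic on port 5432 (PostgreSQL) from the Glue connection's security group or subnet CIDR.
Redshift Security Group
- Allow inbound traffic on port 5439 (Redshift) from the Glue connection's security group or subnet CIDR.
- All traffic: An "all traffic" rule is fine for testing, but narrow the rules to the required ports and sources once the connection works.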
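A similar sketch for security groups checks whether the ingress rules of the Redshift (or RDS) security group admit TCP traffic on a given port from the Glue side, either through a source CIDR covering a Glue IP or through a rule referencing the Glue connection's security group. It assumes the input is the `IpPermissions` list of one group from EC2 `DescribeSecurityGroups`; the helper name and example values are hypothetical.

```python
import ipaddress
from typing import Optional


def ingress_allows(ip_permissions: list[dict], port: int,
                   source_ip: Optional[str] = None,
                   source_group_id: Optional[str] = None) -> bool:
    """Return True if any ingress rule admits TCP traffic on `port` from the source.

    `ip_permissions` is assumed to be the 'IpPermissions' list of one group from
    EC2 DescribeSecurityGroups. The source may be an IPv4 address (e.g. an IP in
    the Glue subnet) and/or a security group ID (e.g. the Glue connection's group).
    """
    for perm in ip_permissions:
        proto = perm.get("IpProtocol")
        if proto == "-1":
            port_ok = True  # "All traffic" rule: no port range is stored
        elif proto in ("tcp", "6"):
            port_ok = perm.get("FromPort", 0) <= port <= perm.get("ToPort", 65535)
        else:
            port_ok = False  # UDP, ICMP, etc.
        if not port_ok:
            continue
        if source_group_id and any(
            pair.get("GroupId") == source_group_id
            for pair in perm.get("UserIdGroupPairs", [])
        ):
            return True
        if source_ip and any(
            ipaddress.ip_address(source_ip)
            in ipaddress.ip_network(r["CidrIp"], strict=False)
            for r in perm.get("IpRanges", [])
        ):
            return True
    return False
```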
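3. Review Network ACLs
- Allow traffic: Check that the network ACLs on both the Glue and the RDS/Redshift subnets allow inbound and outbound traffic on the required ports. Network ACLs are stateless, so return traffic on the ephemeral port range (1024-65535) must be allowed as well.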
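Network ACLs are evaluated in rule-number order and the first matching rule wins, so a broad deny with a low number can hide a later allow. The sketch below mimics that evaluation for a single TCP packet. It assumes the input is the `Entries` list of one ACL from EC2 `DescribeNetworkAcls` and handles only IPv4 `CidrBlock` rules; the helper name is hypothetical.

```python
import ipaddress


def nacl_allows(entries: list[dict], egress: bool, remote_ip: str, port: int) -> bool:
    """Evaluate network ACL entries for one TCP packet: lowest rule number first,
    first matching rule wins, anything unmatched is denied.

    `entries` is assumed to be the 'Entries' list of one ACL from EC2
    DescribeNetworkAcls. `remote_ip` is the source for inbound traffic and the
    destination for outbound traffic; `port` is the packet's destination port.
    """
    addr = ipaddress.ip_address(remote_ip)
    for entry in sorted(entries, key=lambda e: e["RuleNumber"]):
        if entry["Egress"] != egress:
            continue
        cidr = entry.get("CidrBlock")  # IPv6 rules use 'Ipv6CidrBlock' and are skipped
        if cidr is None or addr not in ipaddress.ip_network(cidr, strict=False):
            continue
        proto = entry["Protocol"]
        if proto == "6":  # TCP rules carry a port range
            rng = entry.get("PortRange", {})
            if not rng.get("From", 0) <= port <= rng.get("To", 65535):
                continue
        elif proto != "-1":  # skip UDP, ICMP, etc.
            continue
        return entry["RuleAction"] == "allow"
    return False  # the default '*' rule denies anything unmatched
```

For the Glue subnet's ACL, check outbound traffic to the Redshift IP on 5439 and inbound return traffic from it on an ephemeral port; for the Redshift subnet's ACL, check the mirror image.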
4. Test Connectivity
- Use an EC2 instance: Launch a temporary EC2 instance in the subnet used by the Glue connection and test connectivity from it:
```
telnet <RDS-Endpoint> 5432
telnet <Redshift-Endpoint> 5439
```
- Ensure you can connect to both databases.
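If telnet is not installed on the test instance, a few lines of standard-library Python perform the same TCP handshake and separate a timeout (packets silently dropped, typically by routes, security groups, or NACLs) from a refused connection (the host answered, but nothing is listening on that port). `check_tcp` is a hypothetical helper; pass the real RDS and Redshift endpoints in place of the placeholders.

```python
import socket


def check_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a TCP handshake, like `telnet host port`, and classify the result.

    'timeout' usually means packets are silently dropped (routes, security
    groups, NACLs); 'refused' means the host answered but nothing is listening.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except (socket.timeout, TimeoutError):
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:  # e.g. DNS resolution failure (socket.gaierror)
        return f"error: {exc}"
```

For example, `check_tcp("<Redshift-Endpoint>", 5439)` returning `"timeout"` matches the "connect timed out" symptom seen in the Glue job.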
5. Check Glue Job Configuration
- VPC settings: The Glue job runs in the VPC of the Glue connection attached to it, so make sure that connection uses a subnet in a VPC that can reach both RDS and Redshift.
- Subnet and security group: Verify the subnet and security group specified in that connection.
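AWS Glue also expects a security group attached to the connection to have a self-referencing inbound rule covering all TCP ports, so that its components can communicate. The sketch checks one group for such a rule. It assumes the input is one entry of `SecurityGroups` from EC2 `DescribeSecurityGroups`, looked up for an ID in the Glue connection's `PhysicalConnectionRequirements["SecurityGroupIdList"]`; the helper name is hypothetical, and a rule split across several port ranges is not recognized.

```python
def has_self_referencing_tcp_rule(group: dict) -> bool:
    """True if the group has an inbound rule allowing all TCP ports from itself.

    `group` is assumed to be one entry of 'SecurityGroups' from EC2
    DescribeSecurityGroups. A rule split over several port ranges is not
    recognized by this simple check.
    """
    own_id = group["GroupId"]
    for perm in group.get("IpPermissions", []):
        proto = perm.get("IpProtocol")
        covers_all_tcp = proto == "-1" or (
            proto == "tcp" and perm.get("FromPort") == 0 and perm.get("ToPort") == 65535
        )
        from_self = any(
            pair.get("GroupId") == own_id for pair in perm.get("UserIdGroupPairs", [])
        )
        if covers_all_tcp and from_self:
            return True
    return False
```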
6. Review Logs
- CloudWatch Logs: Check the Glue job's CloudWatch logs for any additional error messages related to the timeout.
7. Retry the Job
- Run the Glue job again: After making the changes above, rerun the Glue job to see whether the issue is resolved.