How do I route traffic for an AWS Glue ETL job to a database through a static IP address?

I want to allowlist a specific public or private IP address in an on-premises or Amazon Relational Database Service (Amazon RDS) database. When I use an AWS Glue extract-transform-and-load (ETL) job, I can’t access the database.

Resolution

An AWS Glue ETL job has multiple workers, and the workers' IP addresses change for each job that you run. To route AWS Glue ETL job traffic through a static IP address, add an AWS Glue connection that routes the workers' traffic to the database through a NAT gateway. You can then add the NAT gateway's IP address to the database's allowlist. If the target database IP address is public, then add the NAT gateway's public IP address to the allowlist. If the target database IP address is private, then add the NAT gateway's private IP address to the allowlist.

Assign a static public IP address

If you can allowlist only one static public IP address on the database side, then use a public IP address to reach the on-premises database. Create the AWS Glue connection in a private subnet, and route that subnet's traffic to a NAT gateway in a public subnet. Traffic to the on-premises database then uses the public IP address of the NAT gateway.

In the following example configuration, the virtual private cloud (VPC) has two subnets:

Item        CIDR
VPC         172.31.0.0/16
subnet_1    172.31.0.0/20
subnet_2    172.31.16.0/20
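
Before you add route tables, it can help to confirm that the subnet ranges fit inside the VPC range and don't overlap. The following is a minimal sketch that uses only the Python standard library. The function name check_layout is hypothetical and isn't part of any AWS tool.

```python
import ipaddress

# Example layout from this article.
vpc = ipaddress.ip_network("172.31.0.0/16")
subnets = {
    "subnet_1": ipaddress.ip_network("172.31.0.0/20"),
    "subnet_2": ipaddress.ip_network("172.31.16.0/20"),
}


def check_layout(vpc, subnets):
    """Return a list of problems. An empty list means the layout is consistent."""
    problems = []
    names = list(subnets)
    # Every subnet must sit inside the VPC range.
    for name in names:
        if not subnets[name].subnet_of(vpc):
            problems.append(f"{name} is outside the VPC range")
    # No two subnets may share addresses.
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if subnets[first].overlaps(subnets[second]):
                problems.append(f"{first} overlaps {second}")
    return problems


print(check_layout(vpc, subnets) or "layout is consistent")
```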

The subnets use a default route table with the following routes:

Destination      Target
172.31.0.0/16    local
0.0.0.0/0        igw-xxxxxxxxxxxx

To route the traffic to an on-premises database through a static public IP address for an AWS Glue ETL job, complete the following steps:

1.     Create a NAT gateway with a public IP address in subnet_1.
Note: In this example, the NAT gateway ID is nat-xxxxxxxxxx1.

2.    Create a new route table with the following routes, and then associate it with subnet_2.

Destination      Target
172.31.0.0/16    local
0.0.0.0/0        nat-xxxxxxxxxx1

3.     Create a security group with a self-referencing inbound rule. The security group must allow all outbound traffic for destination 0.0.0.0/0.

4.    To connect to the on-premises database, add an AWS Glue connection. For the network setting, choose subnet_2 and the security group that you created.

5.    Add the public IP address of the NAT gateway to the allowlist for the on-premises database.

6.    Attach the AWS Glue connection to the target AWS Glue ETL job. On the AWS Glue console, for Job details, choose the connection that you created for the AWS Glue ETL job.
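
If you prefer to script the setup, the following sketch shows how steps 1 through 4 might map to AWS SDK for Python (Boto3) calls. It's an illustration under stated assumptions, not a definitive implementation. The function names and the default security group name are hypothetical. The ec2 and glue arguments are assumed to be Boto3 clients. The NETWORK connection type and the all-TCP self-referencing rule are also assumptions, so use the connection type and rule that match your database.

```python
def route_subnet_through_public_nat(ec2, vpc_id, public_subnet_id, private_subnet_id):
    """Steps 1 and 2: create a public NAT gateway and route the private subnet through it."""
    # A public NAT gateway needs an Elastic IP address. This is the address to allowlist.
    eip = ec2.allocate_address(Domain="vpc")
    nat = ec2.create_nat_gateway(
        SubnetId=public_subnet_id,
        AllocationId=eip["AllocationId"],
        ConnectivityType="public",
    )
    nat_id = nat["NatGateway"]["NatGatewayId"]
    # Wait for the NAT gateway to become available before a route points at it.
    ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])

    # A new route table already contains the local route for the VPC.
    route_table = ec2.create_route_table(VpcId=vpc_id)
    route_table_id = route_table["RouteTable"]["RouteTableId"]
    ec2.create_route(
        RouteTableId=route_table_id,
        DestinationCidrBlock="0.0.0.0/0",
        NatGatewayId=nat_id,
    )
    ec2.associate_route_table(RouteTableId=route_table_id, SubnetId=private_subnet_id)
    return eip["PublicIp"], nat_id


def create_glue_security_group(ec2, vpc_id, name="glue-connection-sg"):
    """Step 3: create a security group with a self-referencing inbound rule."""
    group = ec2.create_security_group(
        GroupName=name,  # hypothetical name
        Description="AWS Glue connection",
        VpcId=vpc_id,
    )
    group_id = group["GroupId"]
    # Assumption: allow all TCP ports from members of the same security group.
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 0,
            "ToPort": 65535,
            "UserIdGroupPairs": [{"GroupId": group_id}],
        }],
    )
    # A new security group in a VPC allows all outbound traffic by default.
    return group_id


def create_network_connection(glue, name, subnet_id, security_group_id, availability_zone):
    """Step 4: create an AWS Glue connection that places workers in the private subnet."""
    glue.create_connection(
        ConnectionInput={
            "Name": name,
            "ConnectionType": "NETWORK",  # assumption: a JDBC connection also works
            "ConnectionProperties": {},
            "PhysicalConnectionRequirements": {
                "SubnetId": subnet_id,
                "SecurityGroupIdList": [security_group_id],
                "AvailabilityZone": availability_zone,
            },
        }
    )
```

With Boto3 installed and credentials configured, you would pass boto3.client("ec2") and boto3.client("glue") to these functions. The public IP address that the first function returns is the address to add to the database's allowlist in step 5.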

Assign a static private IP address

If you can allowlist only one private IP address to connect to a database, then use a private IP address to reach the database. For example, to connect to an Amazon RDS database through the same VPC, map the Amazon RDS endpoint domain name to a private IP address. Create a private NAT gateway to route AWS Glue ETL job traffic. Then, allowlist the NAT gateway's private IP address for the database.

Note: The following is an example configuration of a database that's hosted on an Amazon RDS instance.

In this example, the VPC has two subnets that belong to the Amazon RDS instance's subnet group.

Item        CIDR
VPC         172.31.0.0/16
subnet_1    172.31.0.0/20
subnet_2    172.31.16.0/20

The subnets use a default route table with the following routes. In the route table, the prefix list is for an Amazon Simple Storage Service (Amazon S3) gateway endpoint.

Destination      Target
172.31.0.0/16    local
pl-xxxxx         vpce-xxxxxxxx

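To route the traffic to the Amazon RDS database through a static private IP address for an AWS Glue ETL job, complete the following steps:

1.    Create a NAT gateway with a private IP address in subnet_1.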
Note: In this example, the NAT gateway ID is nat-xxxxxxxxxx1.

2.    Create a new private subnet with a CIDR range of 172.31.32.0/20.

3.    Create a new route table with the following routes. Associate it with the private subnet that you created.

Destination       Target
172.31.0.0/16     local
172.31.0.0/20     nat-xxxxxxxxxx1
172.31.16.0/20    nat-xxxxxxxxxx1
pl-xxxxx          vpce-xxxxxxxx

4.    Create a security group with a self-referencing inbound rule. The security group must allow all outbound traffic for destination 0.0.0.0/0.

5.    To connect to the Amazon RDS database, create an AWS Glue connection. For the network setting, choose the private subnet and the security group that you created.

6.    Add the private IP address of the NAT gateway to the inbound rule for the Amazon RDS database security group.

7.    Attach the AWS Glue connection that you created to an AWS Glue ETL job.
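
The route table in step 3 works because Amazon VPC uses the most specific route that matches the destination. Traffic from the new subnet to the Amazon RDS subnets matches the /20 routes instead of the /16 local route, so it passes through the NAT gateway and reaches the database from the NAT gateway's private IP address. The following sketch illustrates that selection with the Python standard library. The function name resolve_target and the sample destination addresses are hypothetical, and the prefix list route is omitted because this example doesn't give its CIDR ranges.

```python
import ipaddress

# Route table from step 3, without the prefix list route for the S3 gateway endpoint.
ROUTES = [
    ("172.31.0.0/16", "local"),
    ("172.31.0.0/20", "nat-xxxxxxxxxx1"),
    ("172.31.16.0/20", "nat-xxxxxxxxxx1"),
]


def resolve_target(destination_ip, routes=ROUTES):
    """Return the target of the most specific route that contains destination_ip."""
    address = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if address in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # The most specific route is the one with the longest prefix.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# Hypothetical database address in subnet_2: the /20 route wins.
print(resolve_target("172.31.20.15"))  # nat-xxxxxxxxxx1
# Address in the new 172.31.32.0/20 subnet: only the local route matches.
print(resolve_target("172.31.40.7"))   # local
```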

Note: When you provision a NAT gateway, you're charged for each hour that your NAT gateway is available. You're also charged for each gigabyte of data that it processes. For more information, see Amazon VPC pricing.

Related information

Defining connections in the AWS Glue Data Catalog

NAT gateways

Control traffic to your AWS resources using security groups

AWS OFFICIAL
Updated 8 months ago