Skip to content

Implementing SSL Connections to RDS Oracle DB in AWS Glue: A Guide for DynamicFrame and DataFrame

5 minute read
Content level: Advanced
0

This article provides a comprehensive guide for establishing secure SSL connections to RDS Oracle databases from AWS Glue jobs. We'll explore both DynamicFrame and DataFrame approaches, covering the necessary configuration steps and solutions to ensure successful SSL connectivity.

Connecting AWS Glue jobs to RDS Oracle databases using SSL requires proper configuration of connection parameters. This guide demonstrates how to implement secure connections using both Spark DataFrame and Glue DynamicFrame approaches, with special attention to SSL certificate configuration and connection string formatting.

Connection Mechanism Overview

The Oracle connection flow follows this path:

  1. Glue job (act as client) sends a connection request to the Oracle TNS listener
  2. Oracle Listener (TNS) processes the request
  3. Oracle Listener forwards the connection to the Oracle Database

When troubleshooting connection issues, also check the Oracle TNS logs along with database alert logs, as connection failures often occur during the ssl negotiation phase on Oracle TNS Listener.

Create the Glue connection for Oracle database :

  • In the AWS Glue console, create or update your Oracle connection.
  • Add the JDBC and networking details of the database.
  • Enable Enforce SSL or Require SSL connection
  • Provide S3 path for Custom JDBC certificate if using different certificate then RDS bundle. (Choose your X.509 certificate. Must be DER-encoded Base64 PEM format.)
  • Add the SSL_SERVER_CERT_DN in the "Custom JDBC certificate string" parameter:
    C=US,ST=Washington,L=Seattle,O=Amazon.com,OU=RDS,CN=your-rds-endpoint.region.rds.amazonaws.com
    
  • Create or save changes.
  • Add this connection to Glue ETL Job.
  • Test connection may fail as connection uses the 11.2 driver oracle version whereas 12.x to 23.x version Oracle database required updated driver, however it will work with Glue jobs.

Method 1: Using DynamicFrame Approach

  1. Set up the basic Glue job structure:

    import sys
    from awsglue.transforms import *
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job
    
    args = getResolvedOptions(sys.argv, ['JOB_NAME'])
    sc = SparkContext()
    glueContext = GlueContext(sc)
    spark = glueContext.spark_session
    job = Job(glueContext)
    job.init(args['JOB_NAME'], args)
  2. Create the DynamicFrame with connection options:

    query='select * from demo_table'
    
    OracleSQL_node = glueContext.create_dynamic_frame.from_options(
        connection_type = "oracle",
        connection_options = {
            "useConnectionProperties": "true",
            "dbtable": "demo_table",
            "connectionName": "Oracle connection",
            "query": query
        },
        transformation_ctx = "OracleSQL_node"
    )
    OracleSQL_node.show()

Method 2: Using DataFrame Approach

To connect using Spark DataFrame with SSL:

  1. Extract connection details from Glue connection:

    connection_name = "Oracle connection"
    conn_details = glueContext.extract_jdbc_conf(connection_name)
    username = conn_details.get("user")
    password = conn_details.get("password")
  2. Construct the proper JDBC URL with SSL parameters:

    jdbc_url = """jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCPS)(HOST=your-rds-endpoint.region.rds.amazonaws.com)(PORT=1688))(CONNECT_DATA=(SERVICE_NAME=DBSERVICENAME))(SECURITY = (SSL_SERVER_CERT_DN = "C=US,ST=Washington,L=Seattle,O=Amazon.com,OU=RDS,CN=your-rds-endpoint.region.rds.amazonaws.com"))))"""
  3. Create the DataFrame connection:

    query='select * from demo_table'
    data = glueContext.spark_session.read \
        .format("jdbc") \
        .option("url", jdbc_url) \
        .option("user", username) \
        .option("password", password) \
        .option("query", query) \
        .option("driver", "oracle.jdbc.driver.OracleDriver") \
        .load()

Please update the following placeholders in jdbc_url:

  • Replace DBSERVICENAME with your actual database service name
  • Replace default port number if you're using a different port
  • Replace your-rds-endpoint.region.rds.amazonaws.com with your actual RDS endpoint (for example: mydb.123456789012.us-east-1.rds.amazonaws.com)
  • Make sure to use TCPS protocol with its corresponding port and configure the server's DN for mutual authentication in SSL_SERVER_CERT_DN.
  • In JDBC url, we are using Oracle TNS string and it must match exactly same present in the Oracle database Server side tnsnames.ora [Set the Database Connection String].
    TNS string : (DESCRIPTION=(ADDRESS=(PROTOCOL=TCPS)(HOST=your-rds-endpoint.region.rds.amazonaws.com)(PORT=1688))(CONNECT_DATA=(SERVICE_NAME=DBSERVICENAME))(SECURITY = (SSL_SERVER_CERT_DN = "C=US,ST=Washington,L=Seattle,O=Amazon.com,OU=RDS,CN=your-rds-endpoint.region.rds.amazonaws.com"))))
    

Driver Compatibility

When connecting to Oracle databases from AWS Glue, it's important to use the appropriate JDBC driver version for compatibility. Below is the compatibility matrix for Oracle JDBC drivers across different AWS Glue versions:

ProductJDBC driver versions for Glue 5.0JDBC driver versions for Glue 4.0JDBC driver versions for Glue 3.0JDBC driver versions for Glue 0.9, 1.0, 2.0
Oracle Database23.3.0.23.0921.721.111.2

Important Note: Test connections may fail when using the default 11.2 driver version with Oracle database versions 12.x to 23.x, as these newer database versions require updated drivers. While Glue jobs can still connect to and read from Oracle databases due to updated JARs in the job runtime.

For guidance on selecting the correct Oracle JDBC driver for your specific database version, refer to the Oracle JDBC FAQ.

Conclusion

Successfully implementing SSL connections to RDS Oracle databases from AWS Glue requires careful attention to several key factors:

  • Proper TNS string configuration - The connection string must include the correct PROTOCOL=TCPS parameter and match the Oracle server-side tnsnames.ora configuration.
  • SSL certificate validation - The SSL_SERVER_CERT_DN parameter must be correctly specified to match the RDS Oracle instance's certificate.
  • Driver compatibility - Ensure your Oracle JDBC driver version is compatible with both your JDK version and Oracle database version.
  • Connection method differences - While both DataFrame and DynamicFrame approaches can work with SSL, they require slightly different configurations.

By following the steps outlined in this guide, you can establish secure SSL connections to your RDS Oracle databases from AWS Glue jobs, ensuring data is transmitted securely while maintaining compliance with security requirements.

AWS
EXPERT
published 7 months ago505 views