Load CSV from S3 to Aurora MySQL using the MySQL JDBC driver


Hi team,

I'm creating an AWS Glue job to load data from a CSV file on S3 into Aurora MySQL 8.

I'm using a custom JDBC driver because, as I understand it, the built-in Glue connection doesn't support MySQL 8.

Is there an example script showing how to load data from S3 to RDS (NOT RDS to RDS)?

I found this helpful link: https://aws.amazon.com/blogs/big-data/building-aws-glue-spark-etl-jobs-by-bringing-your-own-jdbc-drivers-for-amazon-rds/

but it loads from RDS to RDS; I'm not sure how to apply the same logic to load from S3 to RDS. My rough understanding, adapted from that post, is sketched below.
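Here is roughly what I think the S3 side should look like, reading the CSV directly with from_options instead of going through the Data Catalog (the bucket, path, and format options here are placeholders, not my real values):

# Rough sketch (placeholders throughout): read the CSV straight from S3,
# then write via the custom JDBC driver options, as in the blog post.
datasource0 = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://myBucket/input/myfile.csv"]},
    format="csv",
    format_options={"withHeader": True, "separator": ","},
    transformation_ctx="datasource0",
)

datasink = glueContext.write_dynamic_frame.from_options(
    frame=datasource0,
    connection_type="mysql",
    connection_options=connection_mysql8_options,  # dict shown below
    transformation_ctx="datasink",
)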

I'm currently using the code below, but it doesn't work; the Glue job stops with this error:

An error occurred while calling o96.pyWriteDynamicFrame. The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: HJsdDCNsdP96DsdT; S3 Extended Request ID: Fvv72asdLoYsdKCUT9UndlsdRosdfgddup+niZem3RP3sXo4Gp0Fsd5H6sd8TrKMysdanEk=; Proxy: null)

Code used:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
# NOTE: this creates a DataFrameReader and discards the result; on its own it has no effect.
spark.read.option("escapeChar", "®")

connection_mysql8_options = {
    "url": "jdbc:mysql://databhddfd8bb-180ewlrdhdfhi3ew.cluster-cqdhdfhddvbc.region.rds.amazonaws.com:3306/mydb",
    "dbtable": "mydbTable",
    "user": "root",
    "password": "WsdtbasdLjZasdVrsadtgGHDNJasd,
    "customJdbcDriverS3Path": "s3://myBucket/mysql-connector-java-8.0.28.jar",
    "customJdbcDriverClassName": "com.mysql.cj.jdbc.Driver"}

datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "csv_db", table_name = "mytable_csv", transformation_ctx = "datasource0")

applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("id", "string", "id", "string"), ("created", "string", "created", "timestamp"), .....], transformation_ctx = "applymapping1")

selectfields2 = SelectFields.apply(frame = applymapping1, paths = ["created", "id", .....], transformation_ctx = "selectfields2")


##datasink5 = glueContext.write_dynamic_frame.from_options(frame = selectfields2, connection_type="mysql",  connection_options=connection_mysql8_options, transformation_ctx = "datasink5")

datasink5 = glueContext.write_dynamic_frame.from_jdbc_conf(frame = selectfields2, catalog_connection = "myaccount-rds-edwextract-connection", connection_options = {
    "customJdbcDriverS3Path": "s3://myBucket/mysql-connector-java-8.0.21.jar",
    "customJdbcDriverClassName": "com.mysql.cj.jdbc.Driver",
    "user": "root",
    "password": "GJDMTRasdasdassdasd1AtsdasdLasdadasd",
    "url": "jdbc:mysql://daafb269d8bb-1asd0ewasfasfew.cluster-cqtsafasf.region.rds.amazonaws.com:3306/mydb",
    "connectionType": "mysql",
    "dbtable": "mydbTable",
    "database": "mydb"}, transformation_ctx = "datasink5")

job.commit()

I would like to know the correct syntax to load a CSV from S3 into Amazon Aurora MySQL 8 via a JDBC driver.

Thank you!!!

1 Answer

Hello AWS Customer,

For loading data from S3 into Aurora, you can use the 'LOAD DATA FROM S3' statement. You can refer to this document: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.LoadFromS3.html. Note, however, that there is no exact example script available that uses JDBC.
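As a minimal sketch of that approach (the endpoint, credentials, bucket, and table names here are all placeholders, and it assumes the cluster has an IAM role attached that allows reading the bucket, as described in the linked document), the statement could be executed from Python with pymysql:

# Minimal sketch: run LOAD DATA FROM S3 against Aurora MySQL via pymysql.
# Placeholder names throughout; the cluster must have an associated IAM role
# granting s3:GetObject on the bucket.
import pymysql

conn = pymysql.connect(
    host="mycluster.cluster-xxxx.region.rds.amazonaws.com",  # placeholder endpoint
    user="root",
    password="...",
    database="mydb",
)
try:
    with conn.cursor() as cur:
        # Aurora MySQL extension: load the CSV directly from S3.
        cur.execute(
            """
            LOAD DATA FROM S3 's3://myBucket/input/myfile.csv'
            INTO TABLE mydbTable
            FIELDS TERMINATED BY ','
            LINES TERMINATED BY '\\n'
            IGNORE 1 LINES
            """
        )
    conn.commit()
finally:
    conn.close()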

For the error you received, 'The specified bucket does not exist': it can occur for multiple reasons, such as a mistyped bucket name, missing IAM permissions, an incorrect path, an incorrectly specified bucket region, or inappropriate IAM permissions for a cross-Region setup. It would be great if you could open a case with Premium Support so the support team can look at the S3/Glue logs internally to see where the specific issue is in real time.
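As a quick first check (a hypothetical snippet; substitute the bucket and key from your customJdbcDriverS3Path), you can verify that the driver JAR's bucket and key actually exist and are reachable with the job's credentials:

# Hypothetical sanity check for the bucket/key behind customJdbcDriverS3Path.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
try:
    s3.head_bucket(Bucket="myBucket")  # a 404 here matches NoSuchBucket
    s3.head_object(Bucket="myBucket", Key="mysql-connector-java-8.0.28.jar")
    print("Bucket and driver JAR are reachable.")
except ClientError as err:
    print("S3 check failed:", err.response["Error"]["Code"])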

AWS
Kunal_M
answered 2 years ago
