I have a JSON file in S3 (sample below) in JSON Lines format. I create a crawler in AWS Glue to read this file, which creates a table definition and produces a table schema like this:
schema:
#  column_name   data type
1  product_code  String
2  name          String
I plan to read this schema and data via a script, something like the one below: read the catalog into a Spark DataFrame and write that DataFrame to RDS (PostgreSQL).
When writing to RDS, is there a setting I can use so that it generates an auto-incrementing primary key while writing to the database? Alternatively, if I could generate a key while the crawler creates the schema and then write that to the database, that would also work.
Any suggestions? Some working examples would help.
schema = glueContext.create_dynamic_frame.from_catalog(database='product-catalog' ...)
spark_dataframe = schema.toDF()
# write to RDS (connection details below are placeholders)
spark_dataframe.write.format("jdbc") \
    .option("url", "jdbc:postgresql://<host>:5432/<db>") \
    .option("dbtable", "products") \
    .option("user", "<user>") \
    .option("password", "<password>") \
    .option("driver", "org.postgresql.Driver") \
    .mode("append") \
    .save()
{"product_code" : "electronics" , "name": "TV" , ...}
{"product_code": "Kitchen" , "name" : "oven" , ....}
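The other direction is to let PostgreSQL generate the key itself: pre-create the target table with a `SERIAL` (or identity) column, and have Spark append only the data columns. A sketch, using a hypothetical `products` table with the column names from the sample:

```sql
-- Hypothetical target table; PostgreSQL fills in id on each insert
CREATE TABLE products (
    id           SERIAL PRIMARY KEY,  -- or: INT GENERATED ALWAYS AS IDENTITY
    product_code TEXT,
    name         TEXT
);
```

With the table created up front, a JDBC write in `append` mode that carries only `product_code` and `name` lets the database assign the auto-incrementing ids, which also keeps the sequence consistent across repeated job runs.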
@Gonzalo - thanks, and apologies in advance for this; being new to this, are there any examples you can point me to for the second option?