How to create an index/primary key in AWS Glue?


I have a JSON file in S3 (sample below) in JSON Lines format. I created a crawler in AWS Glue to read this file, which creates a table definition and produces a table schema like this:

schema:

#    column_name     data_type
1    product_code    String
2    name            String

I plan to read this schema and data via a script, something like the snippet below: read the catalog table into a Spark DataFrame and write that DataFrame to RDS (PostgreSQL).

When writing to RDS, is there a setting I can use so that it generates an auto-incrementing primary key as it writes to the database? Alternatively, if I can generate a key while the crawler creates the schema and then write it to the database, that would also work.

Any suggestions? Some working examples would help.

# read the catalog table into a DynamicFrame, then convert it to a Spark DataFrame
schema = glueContext.create_dynamic_frame.from_catalog(database='product-catalog', table_name='...')

spark_dataframe = schema.toDF()

# write to RDS over JDBC (connection details elided)
(spark_dataframe.write.format("jdbc")
    .option("url", "jdbc:postgresql://...").option("dbtable", "...")
    .option("user", "...").option("password", "...")
    .mode("append").save())
{"product_code" : "electornics" , "name": "TV" , ...}
{"product_code": "Kitchen" , "name" : "oven" , ....}
1 Answer

That's tricky, because Spark avoids features that are DB-specific and it does distributed writes.
I can think of two solutions:

  • Generate the sequence in Spark, if you are OK with having gaps and with using a 64-bit number: monotonically_increasing_id (see the first sketch below)
  • Assign the sequence value using a DB trigger instead of passing the id from Spark (see the second sketch below)
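A minimal sketch of the first option, assuming the job already has spark_dataframe as in the question; the JDBC connection details and the "products" table name are placeholders, not values from the original post:

from pyspark.sql.functions import monotonically_increasing_id

# add a 64-bit surrogate key; values are unique but not consecutive (gaps are expected)
df_with_id = spark_dataframe.withColumn("id", monotonically_increasing_id())

(df_with_id.write.format("jdbc")
    .option("url", "jdbc:postgresql://<host>:5432/<db>")
    .option("dbtable", "products")
    .option("user", "<user>").option("password", "<password>")
    .mode("append").save())

For the second option, the table itself owns the key: a sequence plus a BEFORE INSERT trigger fills in id whenever a row arrives without one, so Spark never has to pass it. This is only a sketch of the one-time setup, assuming a products table that was created beforehand with an id bigint column; psycopg2 is used here purely for illustration, any SQL client would do:

import psycopg2

# one-time DDL run against the RDS instance (names are illustrative)
ddl = """
CREATE SEQUENCE IF NOT EXISTS products_id_seq;

CREATE OR REPLACE FUNCTION set_product_id() RETURNS trigger AS $$
BEGIN
    IF NEW.id IS NULL THEN
        NEW.id := nextval('products_id_seq');
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER products_set_id
    BEFORE INSERT ON products
    FOR EACH ROW EXECUTE FUNCTION set_product_id();
"""

with psycopg2.connect(host="<host>", dbname="<db>", user="<user>", password="<password>") as conn:
    with conn.cursor() as cur:
        cur.execute(ddl)

After that, the existing spark_dataframe.write (without an id column) can append rows and PostgreSQL assigns the key from the sequence.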
answered 24 days ago
  • @Gonzalo - thanks. Apologies in advance for this, being new to it: are there any examples you can point me to for the second option?
