Change file format written to temp folder when using glue write_dynamic_frame.from_options


Hi, I am reading a table from MySQL RDS and trying to write it to Redshift. Glue's write_dynamic_frame.from_options implicitly writes the table data to a temporary S3 folder before inserting it into Redshift. The job fails; checking the error in Redshift's stl_load_errors table, it says 'unknown boolean format'.

I went to check those files in the temporary folder. They are written in CSV format, and when I opened one of them, the data in some rows is shifted between the columns, so some text data ends up in the boolean column.
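(For illustration, a minimal, self-contained sketch of how this symptom arises: if a free-text field contains an embedded newline or delimiter and the writer and loader disagree on quoting, one logical row splits across physical lines and a text fragment lands in the wrong column. The sample values are made up.)

```python
import csv
import io

# One logical row: an id, a free-text column containing an embedded
# newline and a comma, and a boolean column.
row = ["42", "line one\nline two, extra", "true"]

# Naive serialization without quoting, similar to what happens when the
# writer and the loader disagree on quoting/escaping rules.
naive = ",".join(row)
parsed = list(csv.reader(io.StringIO(naive)))
# The single logical row splits into two physical rows, so "true" no
# longer lands in the boolean column.
print(parsed)

# With proper CSV quoting the row round-trips intact.
buf = io.StringIO()
csv.writer(buf).writerow(row)
roundtrip = next(csv.reader(io.StringIO(buf.getvalue())))
print(roundtrip == row)
```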

Is there a way to change how write_dynamic_frame.from_options writes data to the temporary folder? Maybe changing CSV to another format like JSON or Parquet, something that does not mix up the data.

I tried adding format="json" to write_dynamic_frame.from_options, but it gave an error (com.amazonaws.services.glue.util.NonFatalException: Formats not supported for SparkSQL data sinks. Got json).

Then I tried adding "format": "json" to the connection_options parameter, but it gave me an error (TypeError: getSink() got multiple values for argument 'format'), although format is not specified anywhere else.

Code below. Thanks!

AmazonRedshift_node3 = glueContext.write_dynamic_frame.from_options(
    frame=dyn_frame,
    connection_type="redshift",
    connection_options={
        "redshiftTmpDir": "s3://aws-glue-assets-8942440245-eu-west-1/temporary/",
        "useConnectionProperties": "true",
        "dbtable": "dev.udt_order_temp",
        "connectionName": "Glue-to-redshift",
        "aws_iam_user": "arn:aws:iam:80423242422:role/glue_iam",
    },
    transformation_ctx="AmazonRedshift_node3",
)

asked a year ago
1 Answer

No, you cannot change that, because the COPY command the connector issues on Redshift will expect CSV.
What you can do is fix the data before writing so it respects the CSV format and Redshift can load it.
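For example, a common culprit is free-text columns containing embedded carriage returns or newlines. A minimal sketch of the kind of cleaning meant here (the function name is made up; in a Glue job you would apply the same replacement to each string column, e.g. with Spark's regexp_replace, before calling write_dynamic_frame.from_options):

```python
def sanitize_text(value):
    """Replace characters that commonly break delimited staging files."""
    if value is None:
        return None
    # Collapse embedded newlines/carriage returns into spaces so each
    # record stays on a single physical line in the staged CSV.
    return value.replace("\r\n", " ").replace("\r", " ").replace("\n", " ")

print(sanitize_text("line one\nline two"))  # -> "line one line two"
```

In PySpark the equivalent would be something like applying `regexp_replace(col, r'[\r\n]+', ' ')` to each string column, then converting the result back to a DynamicFrame before the write.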

AWS
EXPERT
answered a year ago
  • The data in the DynamicFrame is accurate; I inspected the distinct values of the boolean fields and they are true or false (nothing for me to correct there). But once the write_dynamic_frame.from_options job starts, it writes the data to the temporary folder as CSV, and the data in the CSV is not accurate (this is the part I cannot control). I opened one CSV file and, for some rows, the data does not appear in the correct columns (some rows have text in the boolean field).

  • That's odd. I wonder if something is breaking the CSV, or if the DynamicFrame is not really correct. Have you checked the schema to make sure there are no choice types? What happens if you print or fetch the specific rows that have issues in the CSV?
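  • To check for choice types as suggested above, something like this inside the Glue job is a reasonable sketch (`dyn_frame` is the frame from the question; the column name is a placeholder, and casting via resolveChoice is just one strategy):

    ```python
    # Inspect the inferred schema; choice types show up as entries like
    # choice(boolean, string) when Glue saw mixed values in a column.
    dyn_frame.printSchema()

    # If a boolean column is really a choice, force a single type before
    # writing ("my_bool_col" is a placeholder name):
    resolved = dyn_frame.resolveChoice(specs=[("my_bool_col", "cast:boolean")])
    ```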


