Hi, i am reading a table from MySQL RDS and trying to write it to Redshift. Glue write_dynamic_frame.from_options writes the table data implicitly into temporary folder before trying to insert it to redshift.
The job fails , it says that 'unknow boolean format' when i check the error from redshift stl_load_errors table.
I went to check those files in the temporary folder, the issue is that those files are written in CSV format, when i opened one of them , the data in some rows are mixed between the columns, thus, having some text data in the boolean column.
is there a way to change how write_dynamic_frame.from_options writes data to temporary folder? maybe changing csv to another format like json or parquet , something that does not miss up the data.
i tried adding the (format= "json") to the wirte_dynamic_frame.from_options but it gave an error (com.amazonaws.services.glue.util.NonFatalException: Formats not supported for SparkSQL data sinks. Got json).
then i tried adding the "format":"json" in the connection_options parameter but it gave me an error (TypeError: getSink() got multiple values for argument 'format') although format is not specified anywhere else.
code below:
Thanks
AmazonRedshift_node3=glueContext.write_dynamic_frame.from_options(
frame= dyn_frame,
connection_type="redshift",
connection_options= {"redshiftTmpDir":"s3://aws-glue-assets-8942440245-eu-west-1/temporary/",
"useConnectionProperties":"true",
"dbtable":"dev.udt_order_temp",
"connectionName":"Glue-to-redshift",
"aws_iam_user":"arn:aws:iam:80423242422:role/glue_iam",
},
transformation_ctx="AmazonRedshift_node3
)
Data in the dynamic data frame is accurate , i inspected the distinct values of the boolean fields which gives true or false (no need for me to correct anything here). But once the write_dynamic_frame.from_options job starts, it writes the data into temporary folder in csv, the data in the csv is not accurate (this is the thing i cannot control). I opened one csv file and data is not appearing in the correct columns for some rows (some rows have text in the boolean field).
That's odd, I wonder if something is breaking the csv or the DynamicFrame is not really correct, have you checked the schema to make sure there are no choices?, what happens if you print/get the specific rows that have issues in the csv