Is the data already present in the MySQL table?
AWS Glue is based on Spark: one node is the Spark driver and the other nodes host the executors (which do the real work). In your case, with just 2 workers, you have only 1 executor in your cluster.
Even with multiple executors, Spark partitions the data without replicating it. This means that even parallel writes will never try to insert the same row twice, unless, as already mentioned, your code has introduced the duplicates.
Glue works in append mode only, no updates. Have you checked whether that key is, by any chance, already present in the target database?
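Since parallel writes of disjoint partitions cannot create duplicates on their own, the first thing worth checking is whether the key is already duplicated in the source data. A plain-Python sketch of that check (in an actual Glue job you would do the equivalent with `df.groupBy(key).count().filter("count > 1")`; the `id` key and sample rows here are made up):

```python
from collections import Counter

def find_duplicate_keys(rows, key="id"):
    """Return the key values that appear more than once in the input rows."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)

# Hypothetical sample: key 42 appears twice in the source data.
rows = [{"id": 1}, {"id": 42}, {"id": 7}, {"id": 42}]
print(find_duplicate_keys(rows))  # [42]
```

If this reports duplicates, the append-only write will hit the unique constraint no matter how many workers you use.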
Hope this helps.
Are you doing some transformation or join that could produce duplicate data? Is it happening for all the keys or a single key only? I would suggest removing the PK or unique constraint and letting the job complete once. Then you can evaluate in RDS whether it is a true duplicate or some transformation is generating duplicates.
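The "load without the constraint, then inspect" step above can be sketched like this, using SQLite as a stand-in for RDS MySQL (the table and column names are made up; the same `GROUP BY ... HAVING` query works on MySQL):

```python
import sqlite3

# In-memory SQLite standing in for the RDS target; note: no PK constraint,
# so the load completes even if duplicates are present.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (pk INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO target VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, "b"), (3, "c")],  # pk 2 loaded twice
)

# After the job completes, list the keys that were inserted more than once.
dupes = conn.execute(
    "SELECT pk, COUNT(*) AS n FROM target GROUP BY pk HAVING n > 1"
).fetchall()
print(dupes)  # [(2, 2)]
```

If the query returns rows, the duplicates really are in the data being written, not an artifact of the constraint check.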
No, it's a direct mapping, no transformation or join. It happened on a single key each run.
I truncate my SQL table every time and make sure it's empty before each run. I tried with 2 workers and with 4 workers, always the same error. The previous file passes successfully; this file always causes this error from Glue.
@Jess could you please post an anonymized snippet of the generated code? I have not experienced that and I'm not sure how to reproduce it.