AWS Glue - java.sql.SQLIntegrityConstraintViolationException: Duplicate entry error


I was doing a POC using AWS Glue to migrate data from RDS MySQL to RDS PostgreSQL. I created connectors to both the source and the target, and a crawler that connected to the source. Then I created a job to migrate the data without any transformation, and it started failing with a java.sql.SQLIntegrityConstraintViolationException: Duplicate entry error (full stack trace below, followed by a sketch of the job script):

java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at com.mysql.cj.util.Util.handleNewInstance(Util.java:192)
	at com.mysql.cj.util.Util.getInstance(Util.java:167)
	at com.mysql.cj.util.Util.getInstance(Util.java:174)
	at com.mysql.cj.jdbc.exceptions.SQLError.createBatchUpdateException(SQLError.java:224)
	at com.mysql.cj.jdbc.ServerPreparedStatement.executeBatchSerially(ServerPreparedStatement.java:385)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeBatchInternal(ClientPreparedStatement.java:435)
	at com.mysql.cj.jdbc.StatementImpl.executeBatch(StatementImpl.java:794)
	at org.apache.spark.sql.jdbc.glue.GlueJDBCSink$.saveBatch(GlueJDBCSink.scala:400)
	at org.apache.spark.sql.jdbc.glue.GlueJDBCSink$.$anonfun$saveTable$4(GlueJDBCSink.scala:77)
	at org.apache.spark.sql.jdbc.glue.GlueJDBCSink$.$anonfun$saveTable$4$adapted(GlueJDBCSink.scala:77)
	at org.apache.spark.sql.jdbc.glue.GlueJDBCSink$.savePartition(GlueJDBCSink.scala:261)
	at org.apache.spark.sql.jdbc.glue.GlueJDBCSink$.$anonfun$saveTable$3(GlueJDBCSink.scala:77)
	at org.apache.spark.sql.jdbc.glue.GlueJDBCSink$.$anonfun$saveTable$3$adapted(GlueJDBCSink.scala:76)
	at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1020)
	at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1020)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2278)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.sql.SQLIntegrityConstraintViolationException: Duplicate entry '00002b0b-1a34-3319-b003-fb073fb8248d' for key 'transformer_fragment.PRIMARY'
	at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:117)
	at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
	at com.mysql.cj.jdbc.ServerPreparedStatement.serverExecute(ServerPreparedStatement.java:637)
	at com.mysql.cj.jdbc.ServerPreparedStatement.executeInternal(ServerPreparedStatement.java:418)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1092)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1040)
	at com.mysql.cj.jdbc.ServerPreparedStatement.executeBatchSerially(ServerPreparedStatement.java:357)
	... 19 more

(The exception is followed by truncated Spark event-log JSON showing that task 13, attempt 1, on executor 1 failed.)
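For context, the job is essentially the boilerplate script that Glue Studio generates for a catalog-to-JDBC migration; a minimal sketch of what it does is below. The catalog database name and the target connection name are placeholders, not my actual values (the table name comes from the error message):

# Minimal sketch of the no-transform migration job (Glue 3.0, Python 3).
# "mysql_source_db" and "rds-postgres-target" are placeholder names.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the source table that the crawler added to the Data Catalog.
source = glue_context.create_dynamic_frame.from_catalog(
    database="mysql_source_db",          # placeholder catalog database
    table_name="transformer_fragment",   # table named in the error
    transformation_ctx="source",         # required for job bookmarks
)

# Write straight to the target over the Glue JDBC connection,
# with no transformation in between.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=source,
    catalog_connection="rds-postgres-target",  # placeholder connection name
    connection_options={"dbtable": "transformer_fragment", "database": "targetdb"},
    transformation_ctx="sink",
)

job.commit()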

The following are the values I used while creating the job:

Type: Spark
Glue version: Glue 3.0 (supports Spark 3.1, Scala 2, Python 3)
Language: Python 3
Worker type: G.1X
Automatically scale the number of workers: False
Requested number of workers: 2
Generate job insights: True
Job bookmark: Enable
Number of retries: 1
Job timeout: 2880
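For reproducibility, here is the same configuration expressed as a boto3 create_job call. This is only a sketch: the job name, role ARN, and script location are placeholders, and I am assuming --enable-job-insights is the right default argument for the job insights toggle.

# Sketch: the job settings above as a boto3 create_job call.
# Job name, IAM role, and script location are placeholders.
import boto3

glue = boto3.client("glue")
glue.create_job(
    Name="mysql-to-postgres-poc",                       # placeholder
    Role="arn:aws:iam::123456789012:role/GlueJobRole",  # placeholder
    Command={
        "Name": "glueetl",                              # Type: Spark
        "ScriptLocation": "s3://my-bucket/scripts/migrate.py",  # placeholder
        "PythonVersion": "3",                           # Language: Python 3
    },
    GlueVersion="3.0",            # Spark 3.1, Scala 2, Python 3
    WorkerType="G.1X",
    NumberOfWorkers=2,            # autoscaling off, fixed at 2 workers
    MaxRetries=1,
    Timeout=2880,                 # minutes
    DefaultArguments={
        "--job-bookmark-option": "job-bookmark-enable",
        "--enable-job-insights": "true",  # assumed flag for job insights
    },
)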

I would really appreciate any help!

Asked 2 years ago · 71 views
No answers
