AWS Glue - java.sql.SQLIntegrityConstraintViolationException: Duplicate entry error


I was doing a POC using AWS Glue to migrate data from RDS MySQL to RDS PostgreSQL. I created connectors to both the source and the target, and a crawler that connected to the source. I then created a job and tried to migrate the data without any transformation, and started getting a java.sql.SQLIntegrityConstraintViolationException: Duplicate entry error:

    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at com.mysql.cj.util.Util.handleNewInstance(Util.java:192)
    at com.mysql.cj.util.Util.getInstance(Util.java:167)
    at com.mysql.cj.util.Util.getInstance(Util.java:174)
    at com.mysql.cj.jdbc.exceptions.SQLError.createBatchUpdateException(SQLError.java:224)
    at com.mysql.cj.jdbc.ServerPreparedStatement.executeBatchSerially(ServerPreparedStatement.java:385)
    at com.mysql.cj.jdbc.ClientPreparedStatement.executeBatchInternal(ClientPreparedStatement.java:435)
    at com.mysql.cj.jdbc.StatementImpl.executeBatch(StatementImpl.java:794)
    at org.apache.spark.sql.jdbc.glue.GlueJDBCSink$.saveBatch(GlueJDBCSink.scala:400)
    at org.apache.spark.sql.jdbc.glue.GlueJDBCSink$.$anonfun$saveTable$4(GlueJDBCSink.scala:77)
    at org.apache.spark.sql.jdbc.glue.GlueJDBCSink$.$anonfun$saveTable$4$adapted(GlueJDBCSink.scala:77)
    at org.apache.spark.sql.jdbc.glue.GlueJDBCSink$.savePartition(GlueJDBCSink.scala:261)
    at org.apache.spark.sql.jdbc.glue.GlueJDBCSink$.$anonfun$saveTable$3(GlueJDBCSink.scala:77)
    at org.apache.spark.sql.jdbc.glue.GlueJDBCSink$.$anonfun$saveTable$3$adapted(GlueJDBCSink.scala:76)
    at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1020)
    at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1020)
    at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2278)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.sql.SQLIntegrityConstraintViolationException: Duplicate entry '00002b0b-1a34-3319-b003-fb073fb8248d' for key 'transformer_fragment.PRIMARY'
    at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:117)
    at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
    at com.mysql.cj.jdbc.ServerPreparedStatement.serverExecute(ServerPreparedStatement.java:637)
    at com.mysql.cj.jdbc.ServerPreparedStatement.executeInternal(ServerPreparedStatement.java:418)
    at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1092)
    at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1040)
    at com.mysql.cj.jdbc.ServerPreparedStatement.executeBatchSerially(ServerPreparedStatement.java:357)
    ... 19 more

Task info from the Spark event log: Task ID 13, Attempt 1, Executor ID 1, Host 172.31.22.88, Locality NODE_LOCAL, Failed: true.
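The "Caused by" line shows the write side rejecting a row whose primary-key value already exists in the target table. As a minimal illustration of the mechanism (using SQLite from the Python standard library rather than MySQL, with a made-up key value), a plain batch INSERT aborts on a repeated primary key, while an idempotent upsert does not:

```python
import sqlite3

# Illustrative only: SQLite stand-in for the MySQL target table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transformer_fragment (id TEXT PRIMARY KEY, payload TEXT)"
)
conn.execute("INSERT INTO transformer_fragment VALUES ('00002b0b', 'v1')")

# A second plain INSERT with the same key raises IntegrityError,
# SQLite's analogue of SQLIntegrityConstraintViolationException.
try:
    conn.execute("INSERT INTO transformer_fragment VALUES ('00002b0b', 'v2')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

# An upsert (INSERT ... ON CONFLICT in SQLite/PostgreSQL,
# INSERT ... ON DUPLICATE KEY UPDATE in MySQL) is idempotent,
# so re-running the same write succeeds.
conn.execute(
    "INSERT INTO transformer_fragment VALUES ('00002b0b', 'v2') "
    "ON CONFLICT(id) DO UPDATE SET payload = excluded.payload"
)
row = conn.execute(
    "SELECT payload FROM transformer_fragment WHERE id = '00002b0b'"
).fetchone()
print(row[0])  # → v2
```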

The following are the values I used while creating the job:

Type: Spark
Glue version: Glue 3.0 (supports Spark 3.1, Scala 2, Python 3)
Language: Python 3
Worker type: G.1X
Automatically scale the number of workers: False
Requested number of workers: 2
Generate job insights: True
Job bookmark: Enable
Number of retries: 1
Job timeout: 2880
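One hedged guess, given that one retry is configured: a retried task may re-insert rows that a failed attempt had already committed, or the extracted batch itself may contain repeated key values. Within a single batch, this can be sketched by de-duplicating on the primary key before the write. The snippet below is plain Python, not Glue code (in the actual job, the rough equivalent would be dropping duplicates on the key column before the JDBC write); duplicates against rows already committed to the target would instead need an upsert or a truncated target table:

```python
def dedupe_on_key(rows, key):
    """Keep only the first row seen for each primary-key value.

    Plain-Python stand-in for de-duplicating a batch on its key
    column before a JDBC write, so the batch never tries to
    insert the same key twice.
    """
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique

# Hypothetical batch in which a retry re-delivered one row.
rows = [
    {"id": "00002b0b", "payload": "a"},
    {"id": "00002b0c", "payload": "b"},
    {"id": "00002b0b", "payload": "a"},  # duplicate key
]
print(dedupe_on_key(rows, "id"))  # two rows remain
```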

Really appreciate any help!

asked 2 years ago · 71 views
No Answers
