AWS Glue - java.sql.SQLIntegrityConstraintViolationException: Duplicate entry error


I was doing a POC using AWS Glue to migrate data from RDS MySQL to RDS PostgreSQL. I created connectors to both the source and the target, and a crawler that connected to the source. Then I created a job to migrate the data without any transformation, and it started failing with a java.sql.SQLIntegrityConstraintViolationException: Duplicate entry error (full stack trace below, followed by a sketch of the job script):

java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at com.mysql.cj.util.Util.handleNewInstance(Util.java:192)
	at com.mysql.cj.util.Util.getInstance(Util.java:167)
	at com.mysql.cj.util.Util.getInstance(Util.java:174)
	at com.mysql.cj.jdbc.exceptions.SQLError.createBatchUpdateException(SQLError.java:224)
	at com.mysql.cj.jdbc.ServerPreparedStatement.executeBatchSerially(ServerPreparedStatement.java:385)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeBatchInternal(ClientPreparedStatement.java:435)
	at com.mysql.cj.jdbc.StatementImpl.executeBatch(StatementImpl.java:794)
	at org.apache.spark.sql.jdbc.glue.GlueJDBCSink$.saveBatch(GlueJDBCSink.scala:400)
	at org.apache.spark.sql.jdbc.glue.GlueJDBCSink$.$anonfun$saveTable$4(GlueJDBCSink.scala:77)
	at org.apache.spark.sql.jdbc.glue.GlueJDBCSink$.$anonfun$saveTable$4$adapted(GlueJDBCSink.scala:77)
	at org.apache.spark.sql.jdbc.glue.GlueJDBCSink$.savePartition(GlueJDBCSink.scala:261)
	at org.apache.spark.sql.jdbc.glue.GlueJDBCSink$.$anonfun$saveTable$3(GlueJDBCSink.scala:77)
	at org.apache.spark.sql.jdbc.glue.GlueJDBCSink$.$anonfun$saveTable$3$adapted(GlueJDBCSink.scala:76)
	at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1020)
	at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1020)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2278)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.sql.SQLIntegrityConstraintViolationException: Duplicate entry '00002b0b-1a34-3319-b003-fb073fb8248d' for key 'transformer_fragment.PRIMARY'
	at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:117)
	at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
	at com.mysql.cj.jdbc.ServerPreparedStatement.serverExecute(ServerPreparedStatement.java:637)
	at com.mysql.cj.jdbc.ServerPreparedStatement.executeInternal(ServerPreparedStatement.java:418)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1092)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1040)
	at com.mysql.cj.jdbc.ServerPreparedStatement.executeBatchSerially(ServerPreparedStatement.java:357)
	... 19 more

(The exception is followed by truncated Spark event-log JSON showing that task 13, attempt 1, on executor 1 failed.)
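For context, the job is essentially the boilerplate script that Glue Studio generates for a catalog-to-JDBC migration; a minimal sketch of what it does is below. The catalog database name and the target connection name are placeholders, not my actual values (the table name comes from the error message):

# Minimal sketch of the no-transform migration job (Glue 3.0, Python 3).
# "mysql_source_db" and "rds-postgres-target" are placeholder names.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the source table that the crawler added to the Data Catalog.
source = glue_context.create_dynamic_frame.from_catalog(
    database="mysql_source_db",          # placeholder catalog database
    table_name="transformer_fragment",   # table named in the error
    transformation_ctx="source",         # required for job bookmarks
)

# Write straight to the target over the Glue JDBC connection,
# with no transformation in between.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=source,
    catalog_connection="rds-postgres-target",  # placeholder connection name
    connection_options={"dbtable": "transformer_fragment", "database": "targetdb"},
    transformation_ctx="sink",
)

job.commit()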

The following are the values I used while creating the job:

Type: Spark
Glue version: Glue 3.0 (supports Spark 3.1, Scala 2, Python 3)
Language: Python 3
Worker type: G.1X
Automatically scale the number of workers: False
Requested number of workers: 2
Generate job insights: True
Job bookmark: Enable
Number of retries: 1
Job timeout: 2880
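For reproducibility, here is the same configuration expressed as a boto3 create_job call. This is only a sketch: the job name, role ARN, and script location are placeholders, and I am assuming --enable-job-insights is the right default argument for the job insights toggle.

# Sketch: the job settings above as a boto3 create_job call.
# Job name, IAM role, and script location are placeholders.
import boto3

glue = boto3.client("glue")
glue.create_job(
    Name="mysql-to-postgres-poc",                       # placeholder
    Role="arn:aws:iam::123456789012:role/GlueJobRole",  # placeholder
    Command={
        "Name": "glueetl",                              # Type: Spark
        "ScriptLocation": "s3://my-bucket/scripts/migrate.py",  # placeholder
        "PythonVersion": "3",                           # Language: Python 3
    },
    GlueVersion="3.0",            # Spark 3.1, Scala 2, Python 3
    WorkerType="G.1X",
    NumberOfWorkers=2,            # autoscaling off, fixed at 2 workers
    MaxRetries=1,
    Timeout=2880,                 # minutes
    DefaultArguments={
        "--job-bookmark-option": "job-bookmark-enable",
        "--enable-job-insights": "true",  # assumed flag for job insights
    },
)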

I would really appreciate any help!

Asked 2 years ago · 71 views
No answers
