Glue job keeps running and does not write results

I have created a Glue job to migrate my Postgres data to S3. I am implementing the full load right now. The table contains a **lot of records** (count: 17496724), which is why I added 10 workers and enabled the auto scaling option, but I keep getting the errors below. The job runs for a long time and does not generate any output. I have tried other worker counts too, such as 5 and 10, with the same result. Below are the errors from the logs:

  1. ERROR [dispatcher-CoarseGrainedScheduler] scheduler.TaskSchedulerImpl (Logging.scala:logError(73)): Lost executor 1 on 10.0.4.209: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

  2. ERROR [dispatcher-CoarseGrainedScheduler] scheduler.TaskSchedulerImpl (Logging.scala:logError(73)): Lost executor 2 on 10.0.4.209: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

About the job: it creates a JDBC connection to the Postgres database and writes the data to S3 as Parquet files.

How should I perform the full load? I will need to add incremental logic later.

1 Answer

Hello,

The above error occurs when an executor needs more memory to process the data than it has been configured with. For a G.1X worker Glue uses 10 GB, and for G.2X it uses 20 GB. By default, Spark reads from JDBC sources are not parallel: a single connection reads the entire table, so all of the rows land on one executor regardless of how many workers the job has. To resolve this issue, read the JDBC table in parallel.

You can use `hashexpression` or `hashfield`, together with `hashpartitions`, to read the data in parallel using a Glue dynamic frame. Please refer to this article for a detailed explanation:

https://aws.amazon.com/premiumsupport/knowledge-center/glue-lost-nodes-rds-s3-migration/
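As a rough illustration of the parallel read described above, here is a minimal sketch. It assumes the Postgres table has already been crawled into the Glue Data Catalog and has a reasonably evenly distributed column to split on. The database, table, column, and bucket names are placeholders, and the helper function names are made up for this example.

```python
def parallel_read_options(hashfield, hashpartitions):
    """Build additional_options that ask Glue to split a JDBC read."""
    if hashpartitions < 1:
        raise ValueError("hashpartitions must be at least 1")
    # The Glue documentation examples pass the partition count as a string.
    return {"hashfield": hashfield, "hashpartitions": str(hashpartitions)}


def migrate_table(glue_context, database, table_name, s3_path, options):
    """Read a catalog table over JDBC in parallel and write it to S3 as Parquet."""
    dyf = glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
        additional_options=options,
    )
    glue_context.write_dynamic_frame.from_options(
        frame=dyf,
        connection_type="s3",
        connection_options={"path": s3_path},
        format="parquet",
    )


def run_job():
    """Entry point for a Glue script; the imports resolve only inside Glue."""
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())
    migrate_table(
        glue_context,
        database="my_database",              # placeholder catalog database
        table_name="my_table",               # placeholder catalog table
        s3_path="s3://my-bucket/my-prefix/",  # placeholder output location
        options=parallel_read_options("id", 20),  # "id" is an assumed column
    )
```

In a real Glue script you would call `run_job()` at the bottom of the file. The partition count of 20 is only a starting point; it should be tuned to the number of workers and to how many concurrent connections the database can tolerate.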

Regarding the incremental load, you can use the Glue job bookmark feature to read only new data from JDBC sources. Please refer to the following article, as there are some prerequisites for using bookmarks with JDBC sources:

https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html
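For the bookmark approach, a minimal sketch could look like the following. It assumes bookmarks are enabled on the job (the `--job-bookmark-option job-bookmark-enable` job parameter) and that the table has a monotonically increasing key column. The column `id`, the catalog names, the bucket path, and the helper function names are placeholders for illustration.

```python
def bookmark_read_options(keys, sort_order="asc"):
    """Build additional_options naming the bookmark key columns for a JDBC read."""
    if not keys:
        raise ValueError("at least one bookmark key is required")
    if sort_order not in ("asc", "desc"):
        raise ValueError("sort_order must be 'asc' or 'desc'")
    return {"jobBookmarkKeys": list(keys), "jobBookmarkKeysSortOrder": sort_order}


def run_incremental_job(argv):
    """Entry point for a Glue script; the imports resolve only inside Glue."""
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # transformation_ctx is what bookmark state is stored against.
    dyf = glue_context.create_dynamic_frame.from_catalog(
        database="my_database",   # placeholder catalog database
        table_name="my_table",    # placeholder catalog table
        transformation_ctx="read_source",
        additional_options=bookmark_read_options(["id"]),  # assumed key column
    )
    glue_context.write_dynamic_frame.from_options(
        frame=dyf,
        connection_type="s3",
        connection_options={"path": "s3://my-bucket/my-prefix/"},  # placeholder
        format="parquet",
        transformation_ctx="write_target",
    )

    # Bookmark state is saved only when the job commits.
    job.commit()
```

In a Glue script you would call `run_incremental_job(sys.argv)` at the bottom of the file, after `import sys`.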

If you still face any issues, please feel free to reach out to AWS Premium Support with your script and job run ID, and we will be happy to help.

AWS
answered 2 years ago
EXPERT
reviewed a month ago
