Hello,
The above error occurs when an executor requires more memory to process the data than is configured. Glue allocates 10 GB of executor memory for G.1X workers and 20 GB for G.2X workers. By default, Spark reads from JDBC sources are not parallel and use a single connection to read the entire table. To resolve this issue, read the JDBC table in parallel.
You can use hashfield (or hashexpression) together with hashpartitions to read the data in parallel using a Glue DynamicFrame. Please refer to this article for a detailed explanation:
https://aws.amazon.com/premiumsupport/knowledge-center/glue-lost-nodes-rds-s3-migration/
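As a minimal sketch, a parallel JDBC read with a Glue DynamicFrame looks like the following (this is a Glue job-script fragment; the database, table, and column names are hypothetical placeholders, and hashpartitions should be tuned to your data volume and what the source database can handle):

```python
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glue_context = GlueContext(SparkContext.getOrCreate())

# hashfield names a column to split the read on; hashpartitions sets the
# number of parallel JDBC connections. Names below are examples only.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="my_database",          # hypothetical catalog database
    table_name="my_jdbc_table",      # hypothetical catalog table
    additional_options={
        "hashfield": "id",           # column used to partition the read
        "hashpartitions": "10",      # number of concurrent connections
    },
)
```

If the values in the chosen column are not evenly distributed, you can pass hashexpression (a SQL expression that returns an integer) instead of hashfield to spread the partitions more evenly.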
Regarding the incremental load, you can use the Glue job bookmark feature to read data from JDBC sources. Please refer to this article, as there are some prerequisites for using bookmarks with JDBC sources:
https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html
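As a sketch of the JDBC prerequisites: the job must run with bookmarks enabled (job argument --job-bookmark-option job-bookmark-enable), each read needs a transformation_ctx, and the bookmark key column's values must increase monotonically for new rows. The names below are hypothetical:

```python
# Glue job-script fragment; assumes glue_context is an existing GlueContext
# and bookmarks are enabled on the job. Names below are examples only.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="my_database",
    table_name="my_jdbc_table",
    transformation_ctx="read_my_jdbc_table",   # required for bookmarking
    additional_options={
        "jobBookmarkKeys": ["id"],             # monotonically increasing key
        "jobBookmarkKeysSortOrder": "asc",
    },
)
```

On each run, Glue records the highest key value seen and the next run reads only rows beyond it, which gives you the incremental load.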
If you still face any issues, please feel free to reach out to AWS Premium Support with your script and job run ID, and we will be happy to help.