The above error occurs when an executor needs more memory to process the data than is configured. For G.1X workers, Glue allocates 10 GB of memory, and for G.2X workers it allocates 20 GB. By default, Spark read operations from JDBC sources are not parallel and use a single connection to read the entire table. To resolve this issue, read the JDBC table in parallel.
You can use the hashexpression or hashfield option, along with hashpartitions, to read the data in parallel using a Glue dynamic frame. Please refer to the article for a detailed explanation.
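As a minimal sketch of what this can look like, the script below reads a catalog table over JDBC in parallel by passing hashfield and hashpartitions in additional_options. The database name, table name, and the "id" column are placeholders for illustration; pick a column with evenly distributed values for hashfield.

```python
import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read the JDBC table in parallel: "hashfield" names the column used to split
# the table, and "hashpartitions" sets the number of parallel connections.
# Database, table, and column names below are placeholders.
dyf = glueContext.create_dynamic_frame.from_catalog(
    database="my_database",
    table_name="my_jdbc_table",
    additional_options={
        "hashfield": "id",
        "hashpartitions": "10",
    },
    transformation_ctx="dyf",
)

job.commit()
```

Spreading the read across 10 partitions keeps any single executor from pulling the entire table into memory, which is what triggers the error above.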
Regarding the incremental load, you can use the Glue job bookmark feature to read data from JDBC sources. Please refer to the article on this topic, as there are some prerequisites for using bookmarks with JDBC sources.
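As a sketch of an incremental read (assuming the same job setup as above and a table with a monotonically increasing key; the "id" column is again a placeholder), bookmarks must also be enabled on the job with --job-bookmark-option job-bookmark-enable:

```python
# Incremental read with job bookmarks. For JDBC sources, Glue tracks progress
# using the bookmark keys; "jobBookmarkKeys" below names a strictly increasing
# column (placeholder). transformation_ctx is required for bookmark state.
incremental = glueContext.create_dynamic_frame.from_catalog(
    database="my_database",
    table_name="my_jdbc_table",
    additional_options={
        "jobBookmarkKeys": ["id"],
        "jobBookmarkKeysSortOrder": "asc",
    },
    transformation_ctx="incremental",
)

# job.commit() persists the bookmark state, so the next run reads only rows
# added since this run completed successfully.
job.commit()
```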
If you still face any issues, please feel free to reach out to AWS Premium Support with the script and the job run ID, and we will be happy to help.