Questions tagged with AWS Glue
AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.
1734 results
I am having some inconsistencies with my Glue crawler and Glue ETL job and I need some help to figure out the best setup. At the moment, I have an S3 bucket where I store CSV files and I partition the...
I have the Python script below, in which I have enabled job bookmarks and also provided a path in the temp directory for creating bookmark files. The problem is that it creates a bookmark JSON file that is empty. I don't...
Hello AWS Experts,
I’m new to AWS Glue bookmarking and need some assistance with an issue I’m facing.
Scenario: I have a testing Glue job that processes CSV files from the Landing/ folder, converts...
I have been testing the direct write to Iceberg feature in Firehose and I have come to realize that there is a problem with the feature. It does not always work. Let me elaborate:
- I created an IAM...
[Extracting key insights from Amazon S3 access logs with AWS Glue for Ray](https://aws.amazon.com/blogs/big-data/extracting-key-insights-from-amazon-s3-access-logs-with-aws-glue-for-ray/) introduces a...
**Issue:**
Our aim is to reduce logging in order to control data ingestion, as measured by the 'PutLogEvent' metrics of CloudWatch. In the past, when we ran our Glue job against 35 GB of data, we were billed ~2K for CloudWatch, most of...
I have the Python script below, which currently generates several gz files of 4 MB each in an S3 bucket. This is what AWS Glue creates by default. But now I want to create multiple files of each file size...
I have the Python script below in an AWS Glue job. For the incremental load logic, I have now set the job bookmark option to Enable. I then tried to run the Glue job again, but it did not create any temporary...
We have data stored in Cosmos DB NoSQL and need to migrate it to Snowflake using AWS Glue with a Change Data Capture (CDC) approach.
Our objective is to perform CRUD operations based on CDC to handle...
Hi Team,
I have an AWS Glue job with a connection to MongoDB Atlas. I am getting this error: ServerSelectionTimeoutError: xyz.mongodb.net:27017: timed out. How can I resolve this using AWS PrivateLink and...
Steps taken:
1. Select existing ETL Job (let's call it "sample-job").
2. Clone job.
3. New job created, called "sample-job-copy".
4. Rename job.
5. Hit enter immediately after renaming.
Outcome: New...
We are new to the Glue environment and are dealing with a huge CloudWatch bill, so we changed the log level in our PySpark script from INFO to ERROR. We are using both the Python logger and the Spark logger, as below, in PySpark (Glue...