Questions tagged with AWS Glue

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.

1734 results
I am having some inconsistencies with my Glue crawler and Glue ETL job and I need some help to figure out the best setup. At the moment, I have an S3 bucket where I store CSV files and I partition the...
1
answers
0
votes
47
views
Vas
asked a month ago
I have the Python script below, where I have enabled job bookmarks and also provided a path in the temp directory to create bookmark files. The problem is that it's creating a bookmark JSON file which is empty. I don't...
1
answers
0
votes
36
views
RahulD
asked a month ago
Hello AWS Experts, I’m new to AWS Glue bookmarking and need some assistance with an issue I’m facing. Scenario: I have a testing Glue job that processes CSV files from the Landing/ folder, converts...
0
answers
0
votes
20
views
Sampath
asked a month ago
I have been testing the direct write to Iceberg feature in Firehose and I have come to realize that there is a problem with the feature. It does not always work. Let me elaborate: - I created an IAM...
1
answers
0
votes
41
views
Humaid
asked a month ago
[Extracting key insights from Amazon S3 access logs with AWS Glue for Ray](https://aws.amazon.com/blogs/big-data/extracting-key-insights-from-amazon-s3-access-logs-with-aws-glue-for-ray/) introduces a...
2
answers
1
votes
57
views
asked a month ago
**Issue:** Our aim is to reduce logging to control data ingestion as reported by the 'PutLogEvent' metrics of CloudWatch. In the past, when we ran our Glue job against 35 GB of data, we were billed ~2K for CloudWatch, most of...
1
answers
0
votes
57
views
asked a month ago
I have the Python script below, which currently generates several gz files of 4 MB each in an S3 bucket. That is what AWS Glue creates by default. But now I want to create multiple files of each file size...
2
answers
0
votes
56
views
RahulD
asked a month ago
I have the Python script below in an AWS Glue job. For the incremental load logic, I have now set the job bookmark option to enabled. I then tried to run the Glue job again, but it did not create any temporary...
1
answers
0
votes
48
views
RahulD
asked a month ago
We have data stored in Cosmos DB NoSQL and need to migrate it to Snowflake using AWS Glue with a Change Data Capture (CDC) approach. Our objective is to perform CRUD operations based on CDC to handle...
1
answers
0
votes
29
views
sowndar
asked a month ago
Hi Team, I have an AWS Glue job connecting to MongoDB Atlas. I am getting this error: ServerSelectionTimeoutError: xyz.mongodb.net:27017: timed out. How can I resolve this using AWS PrivateLink and...
1
answers
0
votes
40
views
MD
asked a month ago
Steps taken: 1. Select existing ETL Job (let's call it "sample-job"). 2. Clone job. 3. New job created, called "sample-job-copy". 4. Rename job. 5. Hit enter immediately after renaming. Outcome: New...
Accepted Answer
AWS Glue
2
answers
0
votes
58
views
asked a month ago
We are new to the Glue environment and are dealing with a huge CloudWatch bill, so we changed the log level in our PySpark script from INFO to ERROR. We are using both the Python logger and the Spark logger as below in PySpark (Glue...
0
answers
0
votes
23
views
asked a month ago