All Content tagged with AWS Glue

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.

Hello AWS Experts, I’m new to AWS Glue bookmarking and need some assistance with an issue I’m facing. Scenario: I have a testing Glue job that processes CSV files from the Landing/ folder, converts...
0
answers
0
votes
10
views
Sampath
asked a day ago
I have been testing the direct write to Iceberg feature in Firehose and I have come to realize that there is a problem with the feature. It does not always work. Let me elaborate: - I created an IAM...
0
answers
0
votes
15
views
Humaid
asked a day ago
[Extracting key insights from Amazon S3 access logs with AWS Glue for Ray](https://aws.amazon.com/blogs/big-data/extracting-key-insights-from-amazon-s3-access-logs-with-aws-glue-for-ray/) introduces a...
2
answers
1
votes
25
views
asked 2 days ago
**Issue:** Our aim is to reduce logging in order to control the data ingestion reported by the 'PutLogEvent' metrics in CloudWatch. In the past, when we ran our Glue job against 35 GB of data, we were billed ~2K for CloudWatch, most of...
1
answers
0
votes
40
views
asked 2 days ago
I have the Python script below, which currently generates several .gz files of 4 MB each in an S3 bucket. This is the default that AWS Glue creates. But now I want to create multiple files of each file size...
2
answers
0
votes
32
views
RahulD
asked 5 days ago
I have the Python script below in an AWS Glue job. For the incremental load logic, I have now set the job bookmark option to Enable. I then tried to run the Glue job again, but it did not create any temporary...
1
answers
0
votes
25
views
RahulD
asked 5 days ago
We have data stored in Cosmos DB NoSQL and need to migrate it to Snowflake using AWS Glue with a Change Data Capture (CDC) approach. Our objective is to perform CRUD operations based on CDC to handle...
1
answers
0
votes
12
views
sowndar
asked 6 days ago
I am trying to create a Kinesis Firehose stream that can directly write to Iceberg tables in S3. I have defined the Glue Data Catalog in the same account and created a bucket to hold the metadata. ...
3
answers
0
votes
42
views
Humaid
asked 7 days ago
Hi Team, I have an AWS Glue job with a connection to MongoDB Atlas and am getting this error: ServerSelectionTimeoutError: xyz.mongodb.net:27017: timed out. How can I resolve this using AWS PrivateLink and...
1
answers
0
votes
23
views
MD
asked 8 days ago
Steps taken: 1. Select existing ETL Job (let's call it "sample-job"). 2. Clone job. 3. New job created, called "sample-job-copy". 4. Rename job. 5. Hit enter immediately after renaming. Outcome: New...
Accepted Answer
AWS Glue
2
answers
0
votes
40
views
asked 9 days ago
We are new to the Glue environment and are dealing with a huge CloudWatch bill. We changed the log level in our PySpark script from INFO to ERROR. We are using both the Python logger and the Spark logger as below in PySpark (Glue...
0
answers
0
votes
19
views
asked 9 days ago