Questions tagged with AWS Glue

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.

1736 results
Hello, I am facing this weird issue from AWS Glue. I do have a NAT Gateway in the VPC, which should take care of the network issues. So I am not sure why the networking issue exists/persists. I...
Accepted answer. Tags: Amazon VPC, AWS Glue
1 answer, 0 votes, 294 views, asked by pbocan 2 months ago
I'm running a Visual ETL job in AWS Glue. I'm testing the service through the visual editor, and I started with a data source pointing to a DynamoDB table (beforehand I made a crawler, ran it, then I aws...
2 answers, 0 votes, 570 views, asked by cfabres 2 months ago
I have an EventBridge rule that triggers when a new file is added to an S3 bucket, with a Glue workflow as the EventBridge target. Now I want to pass event data from EventBridge to my Glue workflow...
2 answers, 0 votes, 567 views, asked by Toby 2 months ago
I zipped my modules into a zip file, uploaded it to S3, and added it to PySpark and Shell jobs under the `Python library path` parameter: ...
1 answer, 0 votes, 275 views, asked 2 months ago
Hi, I have been using the Docker image amazon/aws-glue-libs:glue_libs_4.0.0_image_01 to run Glue Spark jobs locally. I also want to test Ray locally in the same manner, by inspecting the image...
0 answers, 0 votes, 230 views, asked 2 months ago
We are using Step Functions for our ETL pipeline. The first step kicks off 21 jobs that each take about 1-3 minutes and consume 2 DPUs. The Step Function fails with the below error when trying to...
1 answer, 0 votes, 289 views, asked 2 months ago
We are trying to read a CSV file to process the data using AWS Glue and we are getting an error message as below: Py4JJavaError: An error occurred while calling o91.schema. :...
1 answer, 0 votes, 305 views, asked by sravan 2 months ago
How can I monitor who is querying which Glue tables? After some trial and error I found that the BatchGetTable Glue API event is recorded by CloudTrail every time I run an Athena query, and...
2 answers, 0 votes, 508 views, asked by Soumaya 3 months ago
I have an Athena Iceberg table. The table has 2 partitions. Each hour I update it with `MERGE` and `DELETE` commands. `SELECT count(*) FROM "my_table$files"` now gives 16. Meanwhile data...
1 answer, 0 votes, 458 views, asked by Smotrov 3 months ago
I have a bunch of parquet files in a flat S3 folder, no partitions:...
1 answer, 0 votes, 190 views, asked by ecmons 3 months ago
Hi. I had a table that was created by a crawler, then I deleted the table (in Athena) and recreated it with DDL. After running the crawler, it could not find the table and create a new table. Note: The S3...
1 answer, 0 votes, 371 views, asked by gh02 3 months ago
I have a few text files on S3 that I need to add to the Glue Catalog in order to use them in a job. None of them have separators; they are all fixed-width files. I have the schemas, but the crawler...
Accepted answer. Tags: AWS Glue
1 answer, 0 votes, 111 views, asked 3 months ago