Questions tagged with AWS Glue

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.

I zipped my modules into a zip file, uploaded it to S3, and added it to PySpark and Shell jobs under the `Python library path` parameter: ![Enter image description...
1
answers
0
votes
245
views
asked a month ago
Hi, I have been using the Docker image amazon/aws-glue-libs:glue_libs_4.0.0_image_01 to run Glue Spark jobs locally. I also want to test Ray locally in the same manner, by inspecting the image...
0
answers
0
votes
223
views
asked a month ago
We are using Step Functions for our ETL pipeline. The first step kicks off 21 jobs that each take about 1-3 minutes and consume 2 DPUs. The Step Function fails with the error below when trying to...
1
answers
0
votes
264
views
asked a month ago
We are trying to read a CSV file and process the data using AWS Glue, and we are getting the error message below: Py4JJavaError: An error occurred while calling o91.schema. :...
1
answers
0
votes
287
views
sravan
asked a month ago
How can I monitor who is querying which Glue tables? After some trial and error I found that the BatchGetTable Glue API event is recorded by CloudTrail every time I run an Athena query, and...
2
answers
0
votes
480
views
Soumaya
asked a month ago
I have an Athena Iceberg table. The table has 2 partitions. Each hour I update it with `MERGE` and `DELETE` commands. `SELECT count(*) FROM "my_table$files"` now gives 16. Meanwhile data...
1
answers
0
votes
435
views
Smotrov
asked a month ago
I have a bunch of parquet files in a flat S3 folder, no partitions:...
1
answers
0
votes
161
views
ecmons
asked a month ago
Hi. I had a table that was created by a crawler. I then deleted the table (in Athena) and recreated it with DDL. After I ran the crawler, it could not find the table and created a new table. Note: The S3...
1
answers
0
votes
341
views
gh02
asked a month ago
I have a few text files on S3 that I need to add to the Glue Catalog in order to use them in a job. None of them have separators; they are all fixed-width files. I have the schemas, but the crawler...
Accepted Answer
AWS Glue
1
answers
0
votes
85
views
asked 2 months ago
Hello, I am currently working with AWS Glue ETL Jobs and encountered an issue where the "Push to repository" and "Pull from repository" options are disabled when trying to push the script/job to...
1
answers
0
votes
198
views
asked 2 months ago
I created a custom visual transform component and put the required JSON and Python files in S3. The component loaded as expected. Later, I needed to make some more adjustments to the parameters...
2
answers
0
votes
147
views
EdwardR
asked 2 months ago
I have a Glue PySpark script that processes DDB data exported to S3 and writes it to Redshift. Initially, it used the logic below: ``` redshiftConnectionOptions = { "postactions": "BEGIN; MERGE...
1
answers
0
votes
221
views
asked 2 months ago