Questions tagged with AWS Glue
AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.
Hi, I have been using the Docker image amazon/aws-glue-libs:glue_libs_4.0.0_image_01 to run Glue Spark jobs locally.
I also want to test Ray locally in the same manner, by inspecting the image...
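For anyone attempting the same local smoke test: the sketch below is generic Ray rather than a Glue API, and it assumes Ray is available in the environment (e.g. via `pip install ray`), which is exactly what inspecting the image should confirm.
```
import ray

ray.init()  # start a throwaway local Ray runtime

@ray.remote
def square(x):
    return x * x

# Run four tasks in parallel on the local runtime
print(ray.get([square.remote(i) for i in range(4)]))  # [0, 1, 4, 9]
```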
We are using Step Functions for our ETL pipeline. The first step kicks off 21 jobs, each taking about 1-3 minutes and consuming 2 DPUs. The Step Function fails with the below error when trying to...
We are trying to read a CSV file with AWS Glue to process the data, and we are getting the error below:
Py4JJavaError: An error occurred while calling o91.schema.
:...
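For context, a minimal sketch of the two usual ways a Glue job reads CSV; the bucket and prefix are hypothetical, and the real fix depends on the full stack trace behind `o91.schema`:
```
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# Option 1: a Glue DynamicFrame (path is hypothetical)
dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-bucket/input/"]},
    format="csv",
    format_options={"withHeader": True, "separator": ","},
)

# Option 2: plain Spark
df = spark.read.option("header", True).csv("s3://my-bucket/input/")
df.printSchema()
```
Errors raised at the schema step often trace back to an unreadable path or a permissions problem rather than the CSV options themselves.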
How can I monitor who is querying which Glue tables?
After some trial and error, I found that the BatchGetTable Glue API event is recorded in CloudTrail every time I run an Athena query, and...
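Building on that finding, a sketch of pulling those events with boto3 (default credentials and region assumed):
```
import boto3

cloudtrail = boto3.client("cloudtrail")
resp = cloudtrail.lookup_events(
    LookupAttributes=[
        {"AttributeKey": "EventName", "AttributeValue": "BatchGetTable"}
    ],
    MaxResults=50,
)
for event in resp["Events"]:
    # Username can be absent for some principals, hence .get()
    print(event["EventTime"], event.get("Username"))
```
Note that `lookup_events` only searches the last 90 days of management events; for longer-term auditing, delivering CloudTrail logs to S3 and querying them with Athena is the usual route.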
I have an Athena Iceberg table.
The table has 2 partitions.
Each hour I update it with `MERGE` and `DELETE` commands.
```
SELECT count(*) FROM "my_table$files"
```
now gives **16**. Meanwhile data...
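If the file count keeps growing because the hourly `MERGE`/`DELETE` cycle leaves small files behind, one option (a sketch, not necessarily the asker's exact issue; database and output location are hypothetical) is Athena's Iceberg compaction, submitted here via boto3:
```
import boto3

athena = boto3.client("athena")
athena.start_query_execution(
    QueryString="OPTIMIZE my_table REWRITE DATA USING BIN_PACK",
    QueryExecutionContext={"Database": "my_database"},           # hypothetical
    ResultConfiguration={"OutputLocation": "s3://my-results/"},  # hypothetical
)
```
A follow-up `VACUUM my_table` expires old snapshots and removes the files they referenced from S3.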
I have a bunch of parquet files in a flat S3 folder, no partitions:...
Hi. I had a table that was created by a crawler. I then deleted the table (in Athena) and recreated it with DDL. After running the crawler again, it could not find the table and created a new one.
Note: the S3...
I have a few text files on S3 that I need to add to the Glue Catalog in order to use them in a job. None of them have separators; they are all fixed-width files. I have the schemas, but the crawler...
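Crawlers don't infer fixed-width layouts on their own (a custom grok classifier is one route); another workaround is to skip catalog-level parsing and slice the columns inside the job. The widths, column names, and path below are hypothetical:
```
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

raw = spark.read.text("s3://my-bucket/fixed-width/")  # hypothetical path

# substring() is 1-based: (column, start position, length)
parsed = raw.select(
    F.substring("value", 1, 10).alias("customer_id"),  # hypothetical widths
    F.substring("value", 11, 20).alias("name"),
    F.substring("value", 31, 8).alias("order_date"),
)
parsed.show()
```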
Hello,
I am currently working with AWS Glue ETL jobs and have encountered an issue where the "Push to repository" and "Pull from repository" options are disabled when trying to push the script/job to...
I created a custom visual transform component and put the needed JSON and Python files in S3. The component loaded as expected. Later, I needed to make some further adjustments to the parameters...
I have a Glue PySpark script that processes DDB data exported to S3 and writes it to Redshift. Initially, it used the logic below:
```
redshiftConnectionOptions = {
"postactions": "BEGIN; MERGE...
I just can't understand what I'm doing wrong.
I have a table.
```
CREATE EXTERNAL TABLE test (
originalrequest string,
requeststarted string
)
PARTITIONED BY (
  req_start_partition...
```
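The DDL is cut off above, but a frequent pitfall with partitioned external tables is that partitions must be registered before queries return any rows. If the S3 layout is Hive-style (`req_start_partition=.../`), `MSCK REPAIR TABLE` can discover them; a sketch via boto3, with the database and output location hypothetical:
```
import boto3

athena = boto3.client("athena")
athena.start_query_execution(
    QueryString="MSCK REPAIR TABLE test",
    QueryExecutionContext={"Database": "my_database"},           # hypothetical
    ResultConfiguration={"OutputLocation": "s3://my-results/"},  # hypothetical
)
```
For non-Hive layouts, `ALTER TABLE test ADD PARTITION ... LOCATION ...` or partition projection are the alternatives.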