Questions tagged with AWS Glue
Browse through the questions and answers listed below.
We have a file that we crawled with the default XML crawler, and it correctly created a table and schema for the data (relevant column shown):
[Image: Correct...]
0 answers · 0 votes · 145 views · asked 5 months ago
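For schema questions like this one, the crawler's XML handling is usually driven by a custom XML classifier that names the element treated as one row. A minimal boto3 sketch, assuming hypothetical classifier, crawler, role, and bucket names; adapt the `RowTag` to your file's actual layout:

```python
import boto3

glue = boto3.client("glue")

# Classifier telling the crawler which XML element is one row
# (all names and the row tag here are assumptions)
glue.create_classifier(
    XMLClassifier={
        "Name": "my-xml-classifier",
        "Classification": "xml",
        "RowTag": "item",
    }
)

# Crawler that uses the custom classifier instead of the default one
glue.create_crawler(
    Name="xml-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # placeholder role
    DatabaseName="my_db",
    Classifiers=["my-xml-classifier"],
    Targets={"S3Targets": [{"Path": "s3://my-bucket/xml-data/"}]},
)
glue.start_crawler(Name="xml-crawler")
```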
Hello,
I am using PySpark in a Glue job to do ETL on a table sourced from S3, with the S3 data itself sourced from MySQL via DMS (table schema below; the columns 'op', 'row_updated_timestamp' & 'row_commit_timestamp' are...
1 answer · 0 votes · 141 views · asked 5 months ago
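A common way to collapse DMS change rows like these into the latest image per key is a window over the update timestamp. A sketch assuming `df` is the DataFrame read from the S3 source and the primary-key column is named `id` (both assumptions; the real key comes from the MySQL table):

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Keep only the most recent change row per key, then drop deletes
w = Window.partitionBy("id").orderBy(F.col("row_updated_timestamp").desc())

latest = (
    df.withColumn("rn", F.row_number().over(w))
      .filter(F.col("rn") == 1)
      .filter(F.col("op") != "D")   # 'D' marks a DMS delete record
      .drop("rn")
)
```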
There was a data source (JSON files) in S3. The JSON structure is as follows.
I used AWS Glue Crawler to build the Glue table based on this S3 data source.
I think the "data" column should be "Struct"...
Accepted Answer
2 answers · 0 votes · 434 views · asked 5 months ago
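When a crawler infers the wrong type for a nested column, one workaround at read time is to parse the raw column with an explicit schema via `from_json`. A sketch, assuming hypothetical field names inside `data` and that the column was read as a plain string:

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType

# Explicit schema for the nested "data" column (field names are assumptions)
data_schema = StructType([
    StructField("name", StringType()),
    StructField("value", LongType()),
])

# Parse the string column into a proper struct
parsed = df.withColumn("data", F.from_json(F.col("data"), data_schema))
parsed.printSchema()
```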
Crawler Error:
> Insufficient Lake Formation permission(s) on mock_data_patient (Database name: crawl_db, Table Name: mock_data_patient) (Service: AWSGlue; Status Code: 400; Error Code:...
1 answer · 0 votes · 210 views · asked 5 months ago
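Errors like this usually mean the crawler's IAM role has no Lake Formation grant on the table named in the message. A minimal boto3 sketch of granting one, assuming a placeholder role ARN; the permission set you actually need may be narrower than `ALL`:

```python
import boto3

lf = boto3.client("lakeformation")

# Grant the crawler role access to the table from the error message
lf.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier":
            "arn:aws:iam::123456789012:role/GlueCrawlerRole"  # placeholder
    },
    Resource={"Table": {"DatabaseName": "crawl_db", "Name": "mock_data_patient"}},
    Permissions=["ALL"],
)
```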
I'm trying to build an ETL pipeline with AWS Glue, and the first step is to copy raw data from the original source to a staging bucket. The job is rather simple: source is a data catalog table (from...
1 answer · 0 votes · 295 views · asked 5 months ago
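For a plain catalog-table-to-staging-bucket copy, the job body can stay very small. A sketch assuming placeholder database, table, and bucket names:

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the source from the Data Catalog (names are placeholders)
frame = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="raw_table")

# Write it unchanged to the staging bucket
glue_context.write_dynamic_frame.from_options(
    frame=frame,
    connection_type="s3",
    connection_options={"path": "s3://my-staging-bucket/raw_table/"},
    format="parquet",
)
job.commit()
```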
Hello,
In a Glue ETL job made of the nodes Amazon S3, Change Schema, and AWS Glue Data Catalog, with the table "us_spending" backed by S3, I get the following error:
> Error Category: PERMISSION_ERROR;...
1 answer · 0 votes · 235 views · asked 5 months ago
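A PERMISSION_ERROR at an S3-backed catalog node typically points at the job role's S3 access. One way to attach a minimal inline policy with boto3, assuming placeholder role and bucket names (if Lake Formation governs the table, a Lake Formation grant may also be needed):

```python
import json
import boto3

iam = boto3.client("iam")

# Inline policy letting the Glue job role read and write the table's bucket
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::my-table-bucket",       # placeholder bucket
            "arn:aws:s3:::my-table-bucket/*",
        ],
    }],
}

iam.put_role_policy(
    RoleName="GlueJobRole",            # placeholder role name
    PolicyName="us-spending-s3-access",
    PolicyDocument=json.dumps(policy),
)
```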
I am looking for the best way to pass a parameter from one Glue job to another within a Step Function.
Each day, I will receive a file. In the file there will be data for certain dates. The first...
1 answer · 0 votes · 948 views · asked 5 months ago
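One pattern that works here: the first job writes the derived value somewhere durable, such as SSM Parameter Store, and the second job reads it back, so the state machine itself stays simple. A sketch with a hypothetical parameter name:

```python
import boto3

ssm = boto3.client("ssm")

# --- In the first Glue job: publish the value the next job needs ---
ssm.put_parameter(
    Name="/etl/daily/processing-date",   # hypothetical parameter name
    Value="2024-01-15",
    Type="String",
    Overwrite=True,
)

# --- In the second Glue job: read it back ---
processing_date = ssm.get_parameter(
    Name="/etl/daily/processing-date")["Parameter"]["Value"]
```

Alternatively, the Step Functions Glue task can forward a value from the state input through the job's `Arguments` map, which the job then reads with `getResolvedOptions`.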
We have a use case where we want to export ~500 TB of DynamoDB data to S3; one of the possible approaches I found was to use an AWS Glue job.
Also, while exporting the data to S3, we need to...
2 answers · 0 votes · 334 views · asked 5 months ago
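For volumes in this range, the Glue DynamoDB export connector, which drives a native DynamoDB export to S3 rather than scanning the table, is one option. A sketch assuming a `glue_context` already exists and that all ARNs and bucket names are placeholders:

```python
# Read via a native DynamoDB export to S3 (no table scan)
frame = glue_context.create_dynamic_frame.from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.export": "ddb",
        "dynamodb.tableArn":
            "arn:aws:dynamodb:us-east-1:123456789012:table/my-table",
        "dynamodb.s3.bucket": "my-export-bucket",
        "dynamodb.s3.prefix": "ddb-export/",
        "dynamodb.unnestDDBJson": True,  # flatten DynamoDB JSON attributes
    },
)

# Transform as needed, then persist in a columnar format
glue_context.write_dynamic_frame.from_options(
    frame=frame,
    connection_type="s3",
    connection_options={"path": "s3://my-export-bucket/parquet/"},
    format="parquet",
)
```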
I have an issue trying to set up a custom query in Glue Studio for BigQuery. For example, the query below works in BigQuery, but doesn't work as a custom query in Glue Studio.
```
SELECT * FROM...
```
1 answer · 0 votes · 71 views · asked 5 months ago
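With the Spark BigQuery connector, running an arbitrary SQL string (rather than reading a whole table) generally requires views to be enabled plus a materialization dataset that BigQuery can write temporary results into. A hedged sketch of the options, assuming a `glue_context`, a Glue connection named `my-bq-connection`, and a scratch dataset `tmp_dataset`; verify the option names against the connector's documentation:

```python
frame = glue_context.create_dynamic_frame.from_options(
    connection_type="bigquery",
    connection_options={
        "connectionName": "my-bq-connection",     # assumption: your Glue connection
        "parentProject": "my-gcp-project",        # placeholder GCP project
        "viewsEnabled": "true",
        "materializationDataset": "tmp_dataset",  # scratch dataset for query results
        "query": "SELECT col_a, col_b FROM my_dataset.my_table WHERE col_a > 0",
    },
)
```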
I need to load data from my DataFrame into a BigQuery table's JSON-type field.
There is a connector that, according to the documentation, supports this feature:...
0 answers · 0 votes · 57 views · asked 5 months ago
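For reference, a minimal sketch of a DataFrame write through the Spark BigQuery connector; whether a string column lands in a BigQuery JSON-typed field depends on the connector version, so treat the table name and options here as assumptions to check against the connector's docs:

```python
(
    df.write.format("bigquery")
      .option("table", "my_dataset.my_table")  # placeholder dataset.table
      .option("writeMethod", "direct")         # direct write API, no staging bucket
      .mode("append")
      .save()
)
```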
I'm working on a project that makes use of the Glue Record Matching transform which, by my best research through the AWS docs, is only supported in Glue 2.0 jobs (and additionally, the maximum Glue version I...
0 answers · 0 votes · 58 views · asked 5 months ago
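For context, applying a trained record-matching (FindMatches) ML transform inside a job looks roughly like this; the DynamicFrame and transform ID are placeholders for values from your own job and the Glue console:

```python
from awsglueml.transforms import FindMatches

# Apply a trained FindMatches ML transform to a DynamicFrame
matched = FindMatches.apply(
    frame=dyf,                           # assumption: an existing DynamicFrame
    transformId="tfm-0123456789abcdef",  # placeholder ML transform ID
)
```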
I am trying to write a pyspark dataframe to S3 and the AWS data catalog using the Iceberg format and the pyspark.sql.DataFrameWriterV2 with the createOrReplace function. When I write the same...
1 answer · 0 votes · 669 views · asked 5 months ago
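A minimal sketch of that write path, assuming a Spark session configured for Iceberg with the Glue Data Catalog; the catalog name, warehouse path, and database/table names are placeholders:

```python
from pyspark.sql import SparkSession

# Spark session wired up for Iceberg tables in the Glue Data Catalog
spark = (
    SparkSession.builder
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog",
            "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse",
            "s3://my-bucket/warehouse/")          # placeholder warehouse path
    .config("spark.sql.catalog.glue_catalog.io-impl",
            "org.apache.iceberg.aws.s3.S3FileIO")
    .getOrCreate()
)

df = spark.createDataFrame([(1, "a")], ["id", "val"])

# DataFrameWriterV2: create or replace the Iceberg table in the Glue catalog
df.writeTo("glue_catalog.my_db.my_table").using("iceberg").createOrReplace()
```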