Questions tagged with AWS Glue
Browse through the questions and answers listed below.
I want to use Glue Studio to create a Glue ETL job. This job needs to filter the data in its first step based on input parameters given to it at run time. Is there a way with visual ETL...
Accepted Answer · AWS Glue
2 answers · 0 votes · 314 views · asked 3 months ago
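One way to approach this (a sketch, not the asker's actual setup): Glue jobs receive run-time parameters as `--NAME value` arguments, which a script can read with `getResolvedOptions` from `awsglue.utils` and then apply through a `Filter` transform. The parameter name `region_filter` and the field `region` below are hypothetical examples.

```python
# Pure predicate used by the Filter transform; kept Glue-free so it is easy to test.
# The field and value names are hypothetical examples, not from the question.
def row_matches(row, field, value):
    """Return True when the record's `field` equals the run-time `value`."""
    return row[field] == value

# Inside the actual Glue job (requires the Glue runtime, shown only as a sketch):
# import sys
# from awsglue.utils import getResolvedOptions
# from awsglue.transforms import Filter
# args = getResolvedOptions(sys.argv, ["region_filter"])
# filtered = Filter.apply(frame=dyf,
#                         f=lambda r: row_matches(r, "region", args["region_filter"]))
```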
I have data currently partitioned on a key (say cluster) and I'm repartitioning to a new key 'date'. So I do (in Python)
```
df = glueContext.create_dynamic_frame.from_options(...)
df =...
```
1 answer · 0 votes · 158 views · asked 3 months ago
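For repartitioning on a new key like `date`, the usual Glue pattern is to pass `partitionKeys` in the S3 sink's `connection_options`. A minimal sketch, assuming a Parquet output and a hypothetical bucket path:

```python
def s3_sink_options(path, partition_keys):
    """connection_options for an S3 sink that writes one folder per partition-key value."""
    return {"path": path, "partitionKeys": list(partition_keys)}

# In the Glue job (requires the Glue runtime, sketched only; the bucket path is hypothetical):
# glueContext.write_dynamic_frame.from_options(
#     frame=df,
#     connection_type="s3",
#     connection_options=s3_sink_options("s3://my-bucket/repartitioned/", ["date"]),
#     format="parquet",
# )
```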
Hello,
For an AWS Data Catalog table, I ran Glue (structure: Amazon S3 -> Change Schema -> AWS Glue Data Catalog) and populated the table with only string records. All the actions were done from the...
1 answer · 0 votes · 142 views · asked 3 months ago
We have a file that we used the default XML crawler to crawl the data for, and it correctly created a table and schema for the data (relevant column shown):
![Correct...
0 answers · 0 votes · 126 views · asked 3 months ago
Hello
I am using PySpark in a Glue job to do ETL on a table sourced from S3, with S3 sourced from MySQL via DMS (table schema as below; columns 'op', 'row_updated_timestamp' & 'row_commit_timestamp' are...
1 answer · 0 votes · 110 views · asked 3 months ago
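A common step in ETL over DMS change data is collapsing the change rows to the newest version per key and dropping deletes. Below is a minimal sketch of that logic in plain Python, mirroring the usual Spark window pattern; the key column name `id` is a hypothetical example.

```python
def latest_per_key(rows, key="id", ts="row_updated_timestamp"):
    """Collapse DMS change rows to the newest version per key and drop deletes.

    Mirrors the usual Spark pattern: row_number() over a window partitioned by
    the key and ordered by the timestamp descending, keeping row 1 where op != 'D'.
    The key column name 'id' is a hypothetical example, not from the question.
    """
    best = {}
    for r in rows:
        k = r[key]
        if k not in best or r[ts] > best[k][ts]:
            best[k] = r
    return [r for r in best.values() if r["op"] != "D"]

# Equivalent idea in PySpark (requires a SparkSession, sketched only):
# from pyspark.sql import Window, functions as F
# w = Window.partitionBy("id").orderBy(F.col("row_updated_timestamp").desc())
# latest = (df.withColumn("rn", F.row_number().over(w))
#             .filter("rn = 1 AND op != 'D'").drop("rn"))
```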
There was a data source (JSON files) in S3. The JSON structure is as follows.
I used AWS Glue Crawler to build the Glue table based on this S3 data source.
I think the "data" column should be "Struct"...
Accepted Answer · AWS Glue
2 answers · 0 votes · 297 views · asked 3 months ago
Crawler Error:
Insufficient Lake Formation permission(s) on mock_data_patient (Database name: crawl_db, Table Name: mock_data_patient) (Service: AWSGlue; Status Code: 400; Error Code:...
1 answer · 0 votes · 166 views · asked 3 months ago
I'm trying to build an ETL pipeline with AWS Glue, and the first step is to copy raw data from the original source to a staging bucket. The job is rather simple: source is a data catalog table (from...
1 answer · 0 votes · 223 views · asked 3 months ago
Hello,
In a Glue ETL made of nodes: Amazon S3, Change Schema, AWS Glue Data Catalog with the table "us_spending" backed by S3, I have the following error:
> Error Category: PERMISSION_ERROR;...
1 answer · 0 votes · 185 views · asked 3 months ago
I am looking for the best way to pass a parameter from one Glue job to another within a Step Functions state machine.
Each day, I will receive a file. The file will contain data for certain dates. The first...
1 answer · 0 votes · 616 views · asked 3 months ago
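One common pattern for this: the Step Functions Glue task (`glue:startJobRun.sync`) forwards values from the state input through its `Arguments` map, and the downstream job reads them with `getResolvedOptions`. A sketch of the shape, assuming a hypothetical `--date` argument:

```python
def glue_task_parameters(job_name, date_value):
    """Parameters block for a Step Functions arn:aws:states:::glue:startJobRun.sync task.

    In the state machine definition the value would normally be a JsonPath
    reference, e.g. "Arguments": {"--date.$": "$.date"}; the literal form here
    keeps the helper testable. The '--date' argument name is hypothetical.
    """
    return {"JobName": job_name, "Arguments": {"--date": date_value}}

# The downstream Glue job then reads the value (requires the Glue runtime):
# import sys
# from awsglue.utils import getResolvedOptions
# args = getResolvedOptions(sys.argv, ["date"])  # args["date"] holds the passed value
```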
We have a use case where we want to export ~500 TB of DynamoDB data to S3; one of the possible approaches I found was using an AWS Glue job.
Also, while exporting the data to S3, we need to...
2 answers · 0 votes · 250 views · asked 3 months ago
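For a Glue-job export, the DynamoDB source is configured through `connection_options`; at this scale the read-throughput percentage and scan parallelism settings matter. A minimal sketch using documented connector keys, with a hypothetical table name and tuning values:

```python
def dynamodb_source_options(table_name, read_percent="0.5", splits="100"):
    """connection_options for Glue's DynamoDB source connector.

    dynamodb.throughput.read.percent throttles how much of the table's read
    capacity the scan may consume; dynamodb.splits controls scan parallelism.
    The table name and tuning values here are hypothetical examples.
    """
    return {
        "dynamodb.input.tableName": table_name,
        "dynamodb.throughput.read.percent": read_percent,
        "dynamodb.splits": splits,
    }

# In the Glue job (requires the Glue runtime, sketched only; bucket path is hypothetical):
# dyf = glueContext.create_dynamic_frame.from_options(
#     connection_type="dynamodb",
#     connection_options=dynamodb_source_options("my_table"),
# )
# glueContext.write_dynamic_frame.from_options(
#     frame=dyf, connection_type="s3",
#     connection_options={"path": "s3://my-export-bucket/"}, format="parquet")
```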
I have an issue trying to set up a custom query in Glue Studio for BigQuery. For example, the query below works in BQ, but doesn't work as a custom query in Glue Studio.
```
SELECT * FROM...
```
0 answers · 0 votes · 54 views · asked 3 months ago