Questions tagged with AWS Glue

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.

1734 results
I have been implementing a small ETL job using PySpark. **Once it is ready, I plan to deploy it to AWS Glue and read and write my files in an S3 bucket instead of the local file system.** This ETL...
Answers: 1 | Votes: 1 | Views: 346
asked 3 months ago
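One way to make the same PySpark script run against local files during development and against S3 once deployed is to pass the storage root in as a job argument and build every path from it. The sketch below is plain Python and rests on assumptions: the `--base_path` argument, its `./data` default, and the helper names are hypothetical, and the bucket and file names are placeholders. Glue job parameters arrive as `--name value` pairs on the command line next to Glue's own arguments (such as `--JOB_NAME`), which `parse_known_args` ignores. The resolved strings would then be passed to `spark.read.csv(...)` or `df.write.parquet(...)`.

```python
import argparse
import posixpath


def parse_base_path(argv):
    """Read a hypothetical --base_path argument, ignoring any other
    arguments on the command line (e.g. --JOB_NAME passed by Glue)."""
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--base_path", default="./data")
    args, _unknown = parser.parse_known_args(argv)
    return args.base_path


def resolve_path(base_path, *parts):
    """Join path segments onto a local directory or an s3:// prefix.

    posixpath always joins with "/", which is what S3 object keys use,
    regardless of the operating system the script runs on.
    """
    return posixpath.join(base_path.rstrip("/"), *parts)


if __name__ == "__main__":
    # Locally: no --base_path given, so files are read from ./data.
    print(resolve_path(parse_base_path([]), "input", "orders.csv"))
    # In Glue: the job parameter points the same code at S3.
    base = parse_base_path(["--JOB_NAME", "my-job", "--base_path", "s3://my-bucket/etl"])
    print(resolve_path(base, "input", "orders.csv"))  # s3://my-bucket/etl/input/orders.csv
```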
I am using an AWS Glue ETL job that fetches data from MongoDB and writes it to an AWS Glue catalog table, but every time the job runs it creates duplicate entries. (If there are 1000...
Answers: 2 | Votes: 0 | Views: 489
asked 3 months ago
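If the job reads the whole collection and appends it on every run, each run adds another copy of the same documents. One way to avoid that is to merge each new batch into the existing data by a unique key (for MongoDB, `_id` is the natural candidate) instead of appending. The sketch below shows only that merge logic, in plain Python on lists of dicts; the `updated_at` version field and the helper name are assumptions. In a real Glue job the same idea would be expressed in Spark (for example, deduplicating on the key before an overwrite) or with a table format that supports upserts.

```python
from itertools import chain
from typing import Dict, Iterable, List

Record = Dict[str, object]


def merge_by_key(existing: Iterable[Record], incoming: Iterable[Record],
                 key: str = "_id", version: str = "updated_at") -> List[Record]:
    """Return one record per key from existing + incoming.

    The record with the larger `version` value wins; on a tie the later
    record (i.e. the incoming one) replaces the earlier one.
    """
    latest: Dict[object, Record] = {}
    for rec in chain(existing, incoming):
        current = latest.get(rec[key])
        if current is None or rec[version] >= current[version]:
            latest[rec[key]] = rec
    return list(latest.values())


existing = [{"_id": 1, "updated_at": 1, "name": "a"},
            {"_id": 2, "updated_at": 1, "name": "b"}]
# A re-run delivers document 2 again unchanged, plus an update and a new document.
incoming = [{"_id": 1, "updated_at": 2, "name": "a2"},
            {"_id": 2, "updated_at": 1, "name": "b"},
            {"_id": 3, "updated_at": 2, "name": "c"}]

merged = merge_by_key(existing, incoming)
print([r["name"] for r in merged])  # ['a2', 'b', 'c']
```

Running the merge again with the same batch leaves the result unchanged, which is the property an append-only load lacks.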
Hello AWS Community, I am currently storing event logs in an RDS Postgres database and am looking for an efficient way to manage the growing size of our tables. Here's what I am aiming to...
Answers: 1 | Votes: 0 | Views: 494
Alan
asked 3 months ago
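One common way to keep a growing PostgreSQL log table manageable (independent of Glue) is declarative range partitioning by time: the parent table is created with `PARTITION BY RANGE (created_at)`, one child table holds each month, and old months are removed with `DROP TABLE` or `ALTER TABLE ... DETACH PARTITION` instead of a large `DELETE`. The sketch below only generates the monthly `CREATE TABLE ... PARTITION OF` statements; the table name `event_logs`, the column `created_at`, and the naming scheme are assumptions, not details from the question.

```python
from datetime import date
from typing import List


def month_start(d: date, offset: int = 0) -> date:
    """First day of the month that is `offset` months after d's month."""
    total = d.year * 12 + (d.month - 1) + offset
    return date(total // 12, total % 12 + 1, 1)


def monthly_partition_ddl(parent: str, first_month: date, count: int) -> List[str]:
    """Build DDL for `count` monthly partitions of a parent table that is
    assumed to be declared as: PARTITION BY RANGE (created_at).

    Range bounds in PostgreSQL are inclusive at FROM and exclusive at TO,
    so consecutive months do not overlap.
    """
    statements = []
    for i in range(count):
        lo = month_start(first_month, i)
        hi = month_start(first_month, i + 1)
        name = f"{parent}_{lo:%Y_%m}"
        statements.append(
            f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{lo.isoformat()}') TO ('{hi.isoformat()}');"
        )
    return statements


for stmt in monthly_partition_ddl("event_logs", date(2024, 11, 1), 3):
    print(stmt)
```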
Say I have some data in an S3 bucket and an AWS Glue crawler that reads it and creates a few AWS Glue catalog tables. I want to read the data in these tables and push it to some other database like...
Answers: 1 | Votes: 0 | Views: 686
asked 3 months ago
I have been trying to send data from Salesforce to Redshift using AppFlow. Every time I set up the flow, I get the error 'Connector timed out'. I have tried both Serverless and a cluster. I am...
Answers: 2 | Votes: 0 | Views: 714
asked 3 months ago
I have a Glue database called `edb_iris_iceberg_test`. It has an Iceberg table called `sample_data_iceberg_2`. Below is the table DDL: ``` CREATE TABLE...
Answers: 1 | Votes: 0 | Views: 278
suraj
asked 3 months ago
I have a PySpark script where I read data from an ETL table and post it to RDS; sample code is below. I read the data into a DataFrame and use overwrite mode to update the data. This deletes the old...
Answers: 1 | Votes: 0 | Views: 274
asked 3 months ago
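Assuming the script writes with Spark's JDBC data source, `mode("overwrite")` replaces the whole target table: by default Spark drops and re-creates it, and with the `truncate` option set to `true` it truncates it instead, so rows missing from the new DataFrame are lost either way. Spark's JDBC writer has no update or upsert save mode, so a common workaround is to append the DataFrame to a staging table and then merge it into the target with SQL on the database side. The sketch below runs that merge step with Python's built-in `sqlite3` so it is self-contained (it needs SQLite 3.24 or newer); the table and column names are hypothetical. The same `INSERT ... ON CONFLICT ... DO UPDATE` form is valid in PostgreSQL 9.5 and later, while MySQL uses `ON DUPLICATE KEY UPDATE` instead.

```python
import sqlite3


def upsert_from_staging(conn: sqlite3.Connection) -> None:
    """Merge rows from staging_orders into orders by primary key,
    then empty the staging table. Rows not in staging are left alone."""
    with conn:  # commits on success, rolls back if a statement fails
        conn.execute(
            """
            INSERT INTO orders (order_id, status, amount)
            SELECT order_id, status, amount FROM staging_orders
            WHERE true  -- SQLite needs a WHERE here to parse the upsert after INSERT ... SELECT
            ON CONFLICT (order_id) DO UPDATE SET
                status = excluded.status,
                amount = excluded.amount
            """
        )
        conn.execute("DELETE FROM staging_orders")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT, amount REAL)")
conn.execute("CREATE TABLE staging_orders (order_id TEXT PRIMARY KEY, status TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [("o1", "new", 10.0), ("o2", "new", 20.0)])
conn.commit()

# Stand-in for the Spark job appending its DataFrame to the staging table.
conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)",
                 [("o2", "shipped", 20.0), ("o3", "new", 30.0)])
conn.commit()

upsert_from_staging(conn)
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
# [('o1', 'new', 10.0), ('o2', 'shipped', 20.0), ('o3', 'new', 30.0)]
```

Row `o1` survives because it is never touched, `o2` is updated in place, and `o3` is inserted, which is the behavior overwrite mode cannot provide.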
I have a Glue job that is supposed to read from a DynamoDB table of size 1.4 GB, process it, and write to Redshift. The job always fails with: **'An error occurred while calling o181.pyWriteDynamicFrame....
Answers: 0 | Votes: 0 | Views: 472
asked 3 months ago
Hi, I am trying to read a CSV file and then write it to a Delta table in S3 from an AWS Glue notebook. I am getting the error: Caused by: java.lang.ClassNotFoundException: delta.DefaultSource I am using the code below: from...
Accepted Answer | AWS Glue
Answers: 1 | Votes: 0 | Views: 284
asked 3 months ago
I have an AWS Glue table with one partition column named dt. Using Athena with this Glue table, I can add data to my S3 bucket and also query it. But I am not able to query the data using the Redshift query editor. I...
Answers: 1 | Votes: 0 | Views: 555
shivank
asked 3 months ago
I'm trying to find out whether Trino on EMR supports access controls maintained in Lake Formation. My catalog is AWS Glue. I couldn't find any documentation on the Lake Formation or EMR side that would talk...
Answers: 1 | Votes: 0 | Views: 487
Saawgr
asked 3 months ago
Hi everyone, I changed the KMS key in the Glue Catalog settings, so I need to delete my tables and then re-create them by running crawlers. It seems that deleting and recreating tables causes the bookmark...
Accepted Answer | AWS Glue
Answers: 1 | Votes: 0 | Views: 160
gh02
asked 3 months ago