Questions tagged with AWS Glue
AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.
1734 results
I have been implementing a small ETL job using PySpark.
**I plan to deploy it to AWS Glue and, once it is ready, use an S3 bucket to read and write my files instead of local files.**
This ETL...
I am using an AWS Glue ETL job that fetches data from MongoDB and writes it to an AWS Glue catalog table, but every time the job runs it creates duplicate entries. (If there are 1000...
Hello AWS Community,
I am currently storing event logs in an RDS Postgres database and am looking for an efficient way to manage the growing size of our tables. Here's what I am aiming to...
Say I have some data in an S3 bucket and an AWS Glue crawler job that reads it and creates a few AWS Glue catalog tables. I want to read the data in these tables and push it to some other database like...
I have been trying to send data from Salesforce to Redshift using Amazon AppFlow. Every time I set up the flow, I get a 'Connector timed out' error. I have tried both serverless and cluster. I am...
I have a Glue database called `edb_iris_iceberg_test`. It has an Iceberg table called `sample_data_iceberg_2`.
The table DDL is below:
```
CREATE TABLE...
```
I have a PySpark script where I read data from an ETL table and post it to RDS (sample code below). I read the data into a DataFrame and use overwrite mode to update the data. This deletes the old...
I have a Glue job that is supposed to read from a 1.4 GB DynamoDB table, process it, and write to Redshift. The job always fails with:
**'An error occurred while calling o181.pyWriteDynamicFrame....
Hi,
I am trying to read a CSV file and then write it in Delta format to S3 from an AWS Glue notebook. I am getting this error:
Caused by: java.lang.ClassNotFoundException: delta.DefaultSource
I am using the following:
from...
I have an AWS Glue table with one partition named dt. I can add data to my S3 bucket using Athena via this Glue table and can also query it.
However, I am not able to query the data using the Redshift query editor.
I...
I'm trying to find out if Trino on EMR supports access controls maintained in Lake Formation. My catalog is AWS Glue. I couldn't find any documentation on the Lake Formation or EMR side that would talk...
Hi everyone, I changed the KMS key in the Glue Catalog settings, so I need to delete my tables and then re-create them by running crawlers. It seems that deleting and recreating tables causes the bookmark...