All Content tagged with AWS Glue

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.

2145 results
Hi, the AWS docs say that data used by Glue jobs is encrypted with KMS keys at rest and in transit, which is ambiguous, since Glue jobs might use S3 instead of their local disk. ...
1 answer, 0 votes, 22 views, asked 5 days ago
While using AWS DataZone Catalog, I observed an issue with glossary term navigation from assets. Observed behavior: assets (e.g., Glue Tables) show associated Glossary Terms as ...
1 answer, 0 votes, 27 views, asked 9 days ago
I have an IAM role for my bucket that has the following permissions: "s3:PutObject", "s3:GetObject", "iam:PassRole", "kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey", "s3:ListBucket", "s3:DeleteObj...
1 answer, 0 votes, 40 views, asked 10 days ago
Production streaming applications across Apache Flink, Spark Streaming, Kafka Streams, Kinesis Client Library and AWS Lambda might experience throttling on AWS Glue Schema Registry's GetSchemaVersion ...
I am attempting to use Hudi 1.1.1 in an AWS Glue 5.0 job. To avoid conflicts with the pre-installed Glue Hudi libraries, I have set the job parameter `--datalake-formats` to an empty string (""). Howe...
2 answers, 0 votes, 63 views, asked 13 days ago
Apache Iceberg is a leading open table format for analytics. When paired with Kafka Connect, teams can build streaming pipelines into Iceberg tables. However, a known issue with the Kafka Connect Ice...
Looking to implement sort compaction on an S3 Tables Iceberg table following the blog post: > [https://aws.amazon.com/blogs/aws/new-improve-apache-iceberg-query-performance-in-amazon-s3-with-sort-and...
4 answers, 0 votes, 104 views, asked 19 days ago
I'm using Amazon Kinesis Data Firehose to deliver streaming data into an Apache Iceberg table registered in the AWS Glue Data Catalog, which is part of a non-default catalog (s3tablescatalog/pulse-pro...
1 answer, 0 votes, 79 views, asked 25 days ago
I am building a Lakehouse solution using AWS Glue Visual ETL. When writing the dataset using the target S3 node in the visual editor, there is no option to specify writemode() to overwrite. When I checked ...
1 answer, 0 votes, 80 views, asked 25 days ago
I'm trying to install Python modules in my AWS Glue Python Shell job using wheel files stored in Amazon Simple Storage Service (Amazon S3). My job runs in a private Virtual Private Cloud (VPC) with Am...
We have a large-scale S3 data lake with the following characteristics: - Source: AWS Flink application writing Parquet files directly to S3 - Volume: ~4000 Parquet files per hour, ranging from 200GB ...
1 answer, 0 votes, 69 views, asked a month ago
We have a Glue crawler that runs on an S3 bucket. The bucket contains only a single file. This file gets replaced with new data daily. Currently we run the Glue crawler every time this data is replaced. After which w...
2 answers, 0 votes, 26 views, asked a month ago