AWS DataBrew - Delta Lake source


Hello Team,

How can we read a Delta Lake (Delta Parquet) source with DataBrew? Currently, I get an error when I try to run a job:

An error occurred while calling o104.getDynamicFrame. s3://[REDACTED]/manifest is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [117, 101, 116, 10]

The dataset type is a Data Catalog table. Athena works fine and can fetch the data.
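The byte values in the error message can be decoded as ASCII to show what went wrong. Parquet files end with the 4-byte magic "PAR1" ([80, 65, 82, 49]). The file that was read ends with "uet\n" instead, which is what you would see at the end of a text line such as a path ending in ".parquet". A minimal Python sketch, assuming the manifest is a plain-text list of Parquet file paths (as a Delta Lake symlink manifest is); the helper names and the sample path are hypothetical:

```python
# Parquet files start and end with the 4-byte magic "PAR1".
PARQUET_MAGIC = b"PAR1"


def decode_tail(byte_values):
    """Turn the byte list printed in the error message into text."""
    return bytes(byte_values).decode("ascii")


def is_parquet_tail(data: bytes) -> bool:
    """Return True if the last four bytes of `data` are the Parquet magic."""
    return data[-4:] == PARQUET_MAGIC


print(repr(decode_tail([80, 65, 82, 49])))    # the expected tail
print(repr(decode_tail([117, 101, 116, 10]))) # the tail that was found

# Hypothetical manifest content: one Parquet path per line, newline-terminated.
manifest = b"s3://example-bucket/table/part-00000.snappy.parquet\n"
print(manifest[-4:], is_parquet_tail(manifest))
```

The last four bytes of the sample manifest are b"uet\n", matching the error. This suggests the reader opened the manifest file itself as Parquet, rather than following the paths listed inside it.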

asked a year ago · 393 views
1 answer

Hello,

Glue Data Catalog tables based on Delta Lake are not currently supported by Glue DataBrew. As a workaround, you can use Glue ETL jobs to read and write Delta Lake tables. For more details and examples, refer to: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-delta-lake.html

I hope this information helps.

AWS
Ethan_H
answered a year ago
  • I would like to use DataBrew features such as visual data profiling and data quality checks, so I will look for another solution. Thanks.
