AWS DataBrew - DeltaLake source


Hello Team,

How can we read a Delta Lake (Delta Parquet) source with DataBrew? Currently, I get the following error when I try to run a job:

An error occurred while calling o104.getDynamicFrame. s3://[REDACTED]/manifest is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [117, 101, 116, 10]

The dataset type is a Data Catalog table. The same table works fine in Athena, where I can fetch the data.
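
For readers who hit the same error, the two byte arrays in the message can be decoded to see what the reader found. The short Python sketch below only decodes the values quoted above; the names `EXPECTED_TAIL`, `FOUND_TAIL`, and `decode_tail` are made up for illustration. The expected tail is the Parquet magic marker, while the tail actually found is plain text ending in a newline. That is consistent with the manifest being a text file whose last line ends in `.parquet` (for example a list of data file paths), though that interpretation is an assumption based on the bytes alone.

```python
# Byte values copied from the error message above.
EXPECTED_TAIL = [80, 65, 82, 49]
FOUND_TAIL = [117, 101, 116, 10]


def decode_tail(values):
    """Turn a list of byte values into the ASCII text they encode."""
    return bytes(values).decode("ascii")


expected_text = decode_tail(EXPECTED_TAIL)  # magic marker a Parquet reader looks for
found_text = decode_tail(FOUND_TAIL)        # what the manifest file actually ends with

print(repr(expected_text))
print(repr(found_text))
```

In other words, the job appears to be treating the manifest as a Parquet data file rather than as a pointer to the data files.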

asked a year ago · 391 views
1 Answer

Hello,

Glue Data Catalog tables based on Delta Lake are not supported by AWS Glue DataBrew at this time. As a workaround, you can use AWS Glue ETL jobs to read and write Delta Lake tables. For more details and examples, refer to: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-delta-lake.html
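
As a rough illustration of that workaround, here is a minimal PySpark sketch, not an official AWS sample. It reads a Delta table with the standard `delta` data source and rewrites the current snapshot as plain Parquet, which is one possible way to get the data into a format that DataBrew can read. The bucket paths and the function names `export_delta_as_parquet` and `build_spark_session` are hypothetical, and the sketch assumes the job runs in an environment where the Delta Lake libraries are available (in AWS Glue this is typically enabled through the `--datalake-formats` job parameter set to `delta`, as described on the linked page).

```python
# Hypothetical placeholder locations: replace with your own.
DELTA_TABLE_PATH = "s3://example-bucket/delta/my_table/"
PARQUET_EXPORT_PATH = "s3://example-bucket/export/my_table_parquet/"

# Spark settings commonly used to enable Delta Lake in a Spark session.
DELTA_SPARK_CONF = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def build_spark_session(app_name="delta-to-parquet-export"):
    """Create a Spark session with the Delta Lake settings applied."""
    # Imported lazily so this module can be loaded without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in DELTA_SPARK_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def export_delta_as_parquet(spark, delta_path, parquet_path):
    """Read the current snapshot of a Delta table and rewrite it as plain Parquet."""
    df = spark.read.format("delta").load(delta_path)
    df.write.mode("overwrite").parquet(parquet_path)
    return df


# Typical use inside the ETL job script (assumes Spark and Delta Lake are available):
#   spark = build_spark_session()
#   export_delta_as_parquet(spark, DELTA_TABLE_PATH, PARQUET_EXPORT_PATH)
```

The exported Parquet location could then be registered as a regular dataset. The trade-off is that the export is a point-in-time copy, so the job has to be rerun whenever the Delta table changes.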

I hope this information helps.

AWS
Ethan_H
answered a year ago
  • I would like to use DataBrew features such as visual data profiling and quality assurance, so I will look for another solution. Thanks.
