AWS DataBrew - DeltaLake source


Hello Team,

How can we read a Delta Lake (Delta Parquet) source with DataBrew? Currently, I get the following error when I run a job:

An error occurred while calling o104.getDynamicFrame. s3://[REDACTED]/manifest is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [117, 101, 116, 10]
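For anyone reading the error above, the two byte lists can be decoded to see what the Parquet reader was complaining about. The snippet below is only a small sketch; the helper name `looks_like_parquet` is made up for illustration and is not part of any AWS library.

```python
# Byte values copied from the error message above.
expected_tail = bytes([80, 65, 82, 49])    # what a Parquet file must end with
found_tail = bytes([117, 101, 116, 10])    # what the reader actually found

print(expected_tail)  # b'PAR1'
print(found_tail)     # b'uet\n'


def looks_like_parquet(tail: bytes) -> bool:
    """Hypothetical helper: True if the last four bytes are the Parquet magic number."""
    return tail[-4:] == b"PAR1"
```

The file ends in `uet` followed by a newline, which is consistent with a plain-text manifest whose last line is a path ending in `.parquet` rather than a Parquet data file. That is an inference from the bytes only, since the full path is redacted.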

The dataset type is a Data Catalog table. Athena works fine with the same table and can fetch the data.

Asked a year ago, 393 views
1 Answer

Hello,

Glue DataBrew does not currently support Glue Data Catalog tables that are based on Delta Lake. As a workaround, you can use Glue ETL jobs to read and write Delta Lake tables. For details and examples, see: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-delta-lake.html
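A minimal sketch of what reading a Delta table from a Glue ETL job could look like is shown here. It assumes the job has Delta Lake enabled through the `--datalake-formats` job parameter described on the linked page (that page also lists additional Spark configuration, which is not repeated here). The function names and the table path argument are placeholders for illustration, not values from this thread.

```python
# Job parameter that enables Delta Lake support in a Glue job (see the linked
# documentation for the extra Spark settings it recommends).
DELTA_JOB_ARGS = {"--datalake-formats": "delta"}


def read_delta_table(spark, table_path):
    """Load the Delta table stored at table_path into a Spark DataFrame."""
    return spark.read.format("delta").load(table_path)


def run_in_glue_job(table_path):
    """Hypothetical entry point; only works inside a Glue job run with DELTA_JOB_ARGS."""
    # Imported here because these modules exist only in the Glue/Spark runtime.
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    spark = GlueContext(SparkContext.getOrCreate()).spark_session
    df = read_delta_table(spark, table_path)
    df.printSchema()
    return df
```

`read_delta_table` takes the Spark session as an argument so the read logic can be checked without a Glue runtime; inside a real job, `run_in_glue_job` would be called with the S3 location of your own Delta table.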

I hope this information helps.

AWS
Ethan_H
answered a year ago
  • I would like to use DataBrew features such as visual data profiling and quality assurance, so I am looking for another solution. Thanks.
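One possible way to keep using DataBrew profiling, offered only as an untested sketch, is to have a Spark job with Delta Lake enabled export a snapshot of the Delta table as plain Parquet and then point a DataBrew dataset at that copy. The function name and both path arguments below are hypothetical.

```python
def export_delta_snapshot(spark, delta_path, parquet_path):
    """Copy the current state of a Delta table to plain Parquet files.

    Assumes `spark` is a SparkSession with Delta Lake support enabled.
    Existing files under parquet_path are overwritten.
    """
    df = spark.read.format("delta").load(delta_path)
    df.write.mode("overwrite").parquet(parquet_path)
    return parquet_path
```

The exported copy is a point-in-time snapshot, so it would need to be refreshed (for example on a schedule) whenever the Delta table changes.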
