AWS DataBrew - DeltaLake source

Hello Team,

How can we read a Delta Lake (Delta Parquet) source with DataBrew? Currently, I get the following error when I try to run a job:

An error occurred while calling o104.getDynamicFrame. s3://[REDACTED]/manifest is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [117, 101, 116, 10]

The dataset type is a Data Catalog table. Athena works fine against it and can fetch the data.
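The byte values in the error message above can be decoded to see what the reader found at the end of the manifest file. The sketch below only uses the numbers quoted in the error; the helper names are made up for illustration. The result is consistent with the manifest being a plain-text list of `.parquet` paths rather than a Parquet file, though that is an inference from the trailing bytes, not something the error states.

```python
# Byte values copied from the error message.
EXPECTED_TAIL = [80, 65, 82, 49]
FOUND_TAIL = [117, 101, 116, 10]


def decode_tail(byte_values):
    """Turn a list of byte values into the ASCII text they spell."""
    return bytes(byte_values).decode("ascii")


def looks_like_parquet(tail_bytes):
    """A Parquet file ends with the 4-byte magic number b'PAR1'."""
    return bytes(tail_bytes) == b"PAR1"


print(repr(decode_tail(EXPECTED_TAIL)))   # 'PAR1'
print(repr(decode_tail(FOUND_TAIL)))      # 'uet\n'
print(looks_like_parquet(FOUND_TAIL))     # False
```

The found bytes spell `uet` followed by a newline, which is what the last line of a text file ending in `.parquet` would look like.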

Asked a year ago, 393 views
1 Answer
Hello,

Glue DataBrew does not currently support Glue Data Catalog tables based on Delta Lake. As a workaround, you can use Glue ETL jobs to read and write Delta Lake tables. For more details and examples, refer to: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-delta-lake.html
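A minimal sketch of the Glue ETL workaround might look like the following. It assumes a Glue version that accepts the `--datalake-formats delta` job parameter described in the linked documentation. The job parameter names `delta_path` and `parquet_path` and all function names are hypothetical. The export to plain Parquet is only one possible way to give DataBrew a source it can read; it is an assumption, not something the linked page prescribes.

```python
def glue_job_arguments():
    """Job parameter that enables Delta Lake support in a Glue job."""
    return {"--datalake-formats": "delta"}


def read_delta_table(spark, table_path):
    """Read a Delta Lake table from its S3 location into a Spark DataFrame."""
    return spark.read.format("delta").load(table_path)


def export_as_parquet(df, output_path):
    """Write the DataFrame as plain Parquet files (assumed DataBrew input)."""
    df.write.mode("overwrite").parquet(output_path)


def main():
    # These imports resolve only inside a Glue job, so they are kept local.
    import sys
    from awsglue.context import GlueContext
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    # Hypothetical job parameters: --delta_path and --parquet_path.
    args = getResolvedOptions(sys.argv, ["delta_path", "parquet_path"])
    spark = GlueContext(SparkContext.getOrCreate()).spark_session

    df = read_delta_table(spark, args["delta_path"])
    export_as_parquet(df, args["parquet_path"])
```

In a real job script, `main()` would be called at the bottom of the file, and a DataBrew dataset could then be pointed at the Parquet output location instead of the Delta Lake catalog table.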

I hope this information helps.

AWS
Ethan_H
Answered a year ago
  • I would like to use DataBrew features such as visual data profiling and quality assurance, so I am looking for another solution. Thanks.
