ICEBERG_CANNOT_OPEN_SPLIT: Error opening Iceberg split s3

Recurring Issue with AWS Athena When Running Queries on Iceberg Table.

Description

I am trying to run queries on an Iceberg table using AWS Athena. The data is stored in S3, and I am using EMR 6.12.0, Iceberg 1.3.0-amzn-0, and Spark 3.4.0. The ingestion process runs on EMR; it consumes data from a Kafka topic and writes it to my Iceberg table in S3. The query sometimes runs successfully, but at other times I encounter the following error in Athena:

ICEBERG_CANNOT_OPEN_SPLIT: Error opening Iceberg split my_s3_path/data/id_pk_bucket=2/created_at_month=2023-08/my_parquet.parquet (offset=4, length=16038): Incorrect file size (16042) for file (end of stream not reached): my_s3_path/data/id_pk_bucket=2/created_at_month=2023-08/my_parquet.parquet
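The numbers in this message can be checked against each other. The following is a minimal sketch, not part of the original post: the function name `parse_split_error` and the regular expression are illustrative assumptions written against this one message only. It extracts the offset, length, and reported file size, then checks whether the split ends exactly at the reported size.

```python
import re

# Pattern written against the single error message quoted above; other
# Athena versions may word the message differently (assumption).
SPLIT_ERROR = re.compile(
    r"\(offset=(?P<offset>\d+), length=(?P<length>\d+)\): "
    r"Incorrect file size \((?P<size>\d+)\)"
)


def parse_split_error(message: str) -> dict:
    """Extract offset, length and reported file size from the error text."""
    match = SPLIT_ERROR.search(message)
    if match is None:
        raise ValueError("message does not match the expected format")
    return {name: int(value) for name, value in match.groupdict().items()}


# The error message exactly as quoted in the question.
message = (
    "ICEBERG_CANNOT_OPEN_SPLIT: Error opening Iceberg split "
    "my_s3_path/data/id_pk_bucket=2/created_at_month=2023-08/my_parquet.parquet "
    "(offset=4, length=16038): Incorrect file size (16042) for file "
    "(end of stream not reached): "
    "my_s3_path/data/id_pk_bucket=2/created_at_month=2023-08/my_parquet.parquet"
)

fields = parse_split_error(message)
# End of the split = offset + length; compare it with the reported size.
split_end = fields["offset"] + fields["length"]
print(fields, split_end == fields["size"])
```

Here the split ends exactly at the reported size (4 + 16038 = 16042), so the split itself matches the size Athena was given. That points toward comparing the reported size with the object's actual size in S3, although the message alone does not prove where the mismatch comes from.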

The error occurs only in Athena; running a query on the table using Spark works fine.

Steps to Reproduce

Configured EMR with version 6.12.0 and Spark 3.4.0.

Set up an ingestion process on EMR to consume data from a Kafka topic and insert it into an Iceberg table on S3.

Created an Iceberg table on S3 using Iceberg version 1.3.0-amzn-0 and the following properties:

    OPTIONS (
        'format-version'='2',
        'write.target-file-size-bytes'='124217728',
        'history.expire.max-snapshot-age-ms'='172800000'
    )
    PARTITIONED BY (bucket(10, my_pk), months(created_at))

Data write process executed in Spark:

    query = (
        df.writeStream.format("iceberg")
        .outputMode("append")
        .trigger(once=True)
        .option("path", iceberg_table)
        .option("fanout-enabled", "true")
        .option("checkpointLocation", checkpoint_location)
    )

    query.toTable(iceberg_table).awaitTermination()

Tried running a query in AWS Athena:

    SELECT * FROM "db"."table" limit 10;

Expected Result

I expected the query in AWS Athena to run without any issues.

Actual Result

I am receiving a recurring error, ICEBERG_CANNOT_OPEN_SPLIT, which appears to indicate there is an issue with the file size or with the data streaming from S3.
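One way to test the file-size hypothesis is to compare the size Iceberg recorded for each data file with the size of the object actually stored in S3. The following is a minimal sketch under stated assumptions, not a confirmed fix: the helper names (`split_s3_uri`, `find_size_mismatches`, `load_sizes`) are hypothetical, `load_sizes` assumes a Spark session with the Iceberg catalog configured plus `boto3` credentials, and the demo paths and sizes at the bottom are made up for illustration. It relies on Iceberg's `files` metadata table (columns `file_path` and `file_size_in_bytes`) and on the `ContentLength` field returned by S3 `head_object`.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/key' (or 's3a://...') into (bucket, key)."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip("/")


def find_size_mismatches(
    recorded: dict[str, int], actual: dict[str, int]
) -> list[tuple[str, int, int]]:
    """Return (path, recorded_size, actual_size) for files whose sizes differ."""
    return [
        (path, size, actual[path])
        for path, size in sorted(recorded.items())
        if path in actual and actual[path] != size
    ]


def load_sizes(spark, table: str) -> tuple[dict[str, int], dict[str, int]]:
    """Collect recorded sizes from Iceberg metadata and actual sizes from S3.

    Assumes `spark` can resolve `table` (for example a Glue-backed catalog
    name) and that boto3 can read the bucket. Not executed in this sketch.
    """
    import boto3  # imported lazily so the pure helpers work without it

    s3 = boto3.client("s3")
    rows = spark.sql(
        f"SELECT file_path, file_size_in_bytes FROM {table}.files"
    ).collect()
    recorded = {row["file_path"]: row["file_size_in_bytes"] for row in rows}
    actual = {}
    for path in recorded:
        bucket, key = split_s3_uri(path)
        actual[path] = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
    return recorded, actual


# Made-up demo data standing in for the output of load_sizes().
demo_recorded = {
    "s3://example-bucket/data/a.parquet": 16042,
    "s3://example-bucket/data/b.parquet": 20000,
}
demo_actual = {
    "s3://example-bucket/data/a.parquet": 16100,
    "s3://example-bucket/data/b.parquet": 20000,
}
mismatches = find_size_mismatches(demo_recorded, demo_actual)
print(mismatches)
```

If a real run reports mismatches, the affected paths show which data files to inspect, for example when they were written and by which job. An empty result would suggest the cause lies elsewhere.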

Additional Information

EMR Version: 6.12.0
Iceberg Version: 1.3.0-amzn-0
Spark Version: 3.4.0
We are using Glue as the catalog.

I am open to providing more information as needed. Thank you!

  • Did you find the answer? Could you please let me know?

1 Answer

Thank you for your question. However, troubleshooting this further requires us to look at your cluster and logs. Please open a case with Premium Support so that we can debug the issue and look into your resources accordingly. We are unable to share details on the cluster here due to privacy and security concerns.

AWS
Support Engineer
Answered 9 months ago
