HIVE_FILESYSTEM_ERROR: Incorrect fileSize

This question has come up before on the AWS developer forums, but without any resolution.

In our case, we have a step function that does the following:

  • Copy a Parquet file (generated by a separate process) from a temporary location in S3 to a permanent location via a lambda. This lambda blocks until the copy is complete.
  • Run an Athena query on the copied file with the arn:aws:states:::athena:startQueryExecution state.
  • Invoke a lambda that waits for the query to finish. This lambda checks the query status with the get_query_execution API call.
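
For reference, the waiting step can be sketched roughly as below. This is a minimal illustration, not our exact lambda: the function name `wait_for_query`, the polling interval, and the poll limit are assumptions. The `athena` argument is expected to be a boto3 Athena client (for example `boto3.client("athena")`), and the response fields follow the documented shape of `get_query_execution`.

```python
import time

# Athena reports one of these states once a query has stopped running.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(athena, query_execution_id, poll_seconds=2.0,
                   max_polls=150, sleep=time.sleep):
    """Poll get_query_execution until the query reaches a terminal state.

    Returns (state, reason), where reason is the StateChangeReason string
    (empty when Athena does not supply one). The name and defaults are
    illustrative assumptions.
    """
    for _ in range(max_polls):
        response = athena.get_query_execution(
            QueryExecutionId=query_execution_id
        )
        status = response["QueryExecution"]["Status"]
        state = status["State"]
        if state in TERMINAL_STATES:
            return state, status.get("StateChangeReason", "")
        sleep(poll_seconds)
    raise TimeoutError(
        f"query {query_execution_id} did not finish after {max_polls} polls"
    )
```

When the query fails, the error text shown below is what comes back in `StateChangeReason`.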

We're receiving the following error from the get_query_execution call in ~4% of our step function executions:

HIVE_FILESYSTEM_ERROR: Incorrect fileSize 150486 for file s3://<redacted>.parquet, end of stream not reached

As one of the commenters on the original issue also observed, it looks to us as though the copy is still in progress when the Athena query starts, even though our lambda blocks until the copy has finished.

We plan to work around this issue for now by retrying on this error. But I would still like to know:
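
A rough sketch of the retry workaround, assuming the failure can be recognized by matching the error text in the query's failure reason. Everything here is hypothetical: `run_query` stands in for whatever starts the Athena query and blocks until it reaches a terminal state, returning a `(state, reason)` pair, and the attempt count and backoff values are arbitrary.

```python
import time

# Substring of the failure reason we treat as retryable (from the error above).
RETRYABLE_MARKER = "HIVE_FILESYSTEM_ERROR: Incorrect fileSize"


def is_retryable(state, reason):
    """True only for failed queries whose reason mentions the fileSize error."""
    return state == "FAILED" and RETRYABLE_MARKER in reason


def run_query_with_retry(run_query, max_attempts=3, base_delay=1.0,
                         sleep=time.sleep):
    """Call run_query up to max_attempts times, backing off between tries.

    run_query is a hypothetical zero-argument callable returning
    (state, reason). The last result is returned whether or not it succeeded.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        state, reason = run_query()
        if not is_retryable(state, reason) or attempt == max_attempts:
            return state, reason
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
```

Inside a step function, the same effect may be achievable with a state-level Retry block, provided the waiting lambda raises a distinct error for this case.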

  • Does anyone have any specific insight on why this happens?
  • Does AWS offer any guidance on how to prevent this issue?

Thanks in advance!
