HIVE_FILESYSTEM_ERROR: Incorrect fileSize

This question has come up before on the AWS developer forums, but without any resolution.

In our case, we have a step function that does the following:

  • Copy a Parquet file (generated by a separate process) from a temporary location in S3 to a permanent location via a lambda. This lambda blocks until the copy is complete.
  • Run an Athena query on the copied file with the arn:aws:states:::athena:startQueryExecution state.
  • Invoke a lambda that waits for the query to finish. This lambda checks the query status with the get_query_execution API call.
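
For readers who want to picture the waiting step, here is a minimal sketch of a polling loop around get_query_execution. It is an illustration only, not our actual lambda: the function name wait_for_query, the polling interval, and the timeout are assumptions. The athena argument is assumed to be a boto3 Athena client (or anything exposing the same get_query_execution method).

```python
import time

# Athena reports one of QUEUED, RUNNING, SUCCEEDED, FAILED, CANCELLED.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(athena, query_execution_id, poll_seconds=2.0,
                   timeout_seconds=300.0, sleep=time.sleep):
    """Poll get_query_execution until the query reaches a terminal state.

    Returns the Status dict of the query execution. `wait_for_query` and
    its defaults are hypothetical; only get_query_execution is a real API.
    """
    waited = 0.0
    while True:
        response = athena.get_query_execution(
            QueryExecutionId=query_execution_id
        )
        status = response["QueryExecution"]["Status"]
        if status["State"] in TERMINAL_STATES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"query {query_execution_id} still {status['State']} "
                f"after {waited:.0f}s"
            )
        sleep(poll_seconds)
        waited += poll_seconds
```

With boto3 this would be called as wait_for_query(boto3.client("athena"), query_execution_id). When the query fails, the error text appears in the StateChangeReason field of the returned status.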

We're receiving the following error from the get_query_execution call in ~4% of our step function executions:

HIVE_FILESYSTEM_ERROR: Incorrect fileSize 150486 for file s3://<redacted>.parquet, end of stream not reached

Like one of the commenters on the original issue, we suspect that the copy is still in progress when the Athena query starts, even though our lambda blocks until the copy has finished.
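
One way to test that suspicion is to compare the object sizes S3 reports for the source and the copy just before the query starts. The sketch below is a diagnostic under assumptions, not a confirmed fix: the helper name copy_matches_source and its parameters are hypothetical, and s3 is assumed to be a boto3 S3 client. It relies only on head_object and its ContentLength field.

```python
def copy_matches_source(s3, src_bucket, src_key, dst_bucket, dst_key):
    """Return True if S3 reports the same byte size for source and copy.

    Hypothetical helper: a size match does not prove that Athena will see
    a consistent file, it only rules out an obviously incomplete copy.
    """
    src = s3.head_object(Bucket=src_bucket, Key=src_key)
    dst = s3.head_object(Bucket=dst_bucket, Key=dst_key)
    return src["ContentLength"] == dst["ContentLength"]
```

Logging the result alongside the fileSize from the error message would show whether the size Athena complains about matches either object.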

We plan to work around this issue for now by retrying on this error. However, I would still like to know:
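
The retry workaround could look roughly like the sketch below. It is a sketch under assumptions rather than our production code: run_query is a hypothetical callable that starts the Athena query, waits for it, and returns the Status dict from get_query_execution; the attempt count and backoff are arbitrary. Only failures whose StateChangeReason contains the error text quoted above are retried.

```python
import time

# Substring taken from the error message in this question.
RETRYABLE_MARKER = "HIVE_FILESYSTEM_ERROR: Incorrect fileSize"


def is_retryable_failure(status):
    """True only for FAILED queries whose reason matches the known error."""
    return (
        status.get("State") == "FAILED"
        and RETRYABLE_MARKER in status.get("StateChangeReason", "")
    )


def run_with_retry(run_query, max_attempts=3, backoff_seconds=5.0,
                   sleep=time.sleep):
    """Call run_query until it stops failing with the retryable error.

    Returns the last Status dict, whether it succeeded or not, so the
    caller can still inspect a non-retryable or final failure.
    """
    for attempt in range(1, max_attempts + 1):
        status = run_query()
        if not is_retryable_failure(status) or attempt == max_attempts:
            return status
        # Linear backoff: wait longer after each failed attempt.
        sleep(backoff_seconds * attempt)
```

Matching on the specific message keeps genuine query failures, such as syntax errors, from being retried.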

  • Does anyone have any specific insight on why this happens?
  • Does AWS offer any guidance on how to prevent this issue?

Thanks in advance!

No Answers
