Athena query consistently fails with HIVE_CURSOR_ERROR: Failed to read ORC file


Athena queries over a bucket of ORC files are failing with the error message 'HIVE_CURSOR_ERROR: Failed to read ORC file'. Any query that reads all of the data in the bucket fails. One example is SELECT * FROM reachcounts_outbound WHERE calculation='a8d9458d-83e2-4e94-b272-3dbcd91296a0', where calculation is a partition column of the reachcounts_outbound table (which is backed by the S3 bucket unscoreit-reachcounts-outbound).

I've validated that the file referenced by the error message is a valid ORC file by downloading it and running orc-tools data on it, and the contents are what I'd expect. I've downloaded other ORC files in the bucket and compared them. They have the same schema and that schema is what I'd expect it to be; it matches the schema I've defined for the table.

I tried deleting the file named in the first error message, but the query still fails with the same error, now referencing a different file in the bucket. However, if I add a LIMIT clause with any value below 1597894 to the query above, it succeeds.

I've tried running MSCK REPAIR TABLE on the reachcounts_outbound table. This did not change anything.

The query id of a request that caused a failure is 54480f27-1992-40f7-8240-17cc622f91db.

Thanks!

Update: The rejected ORC files all appear to contain exactly 10,000 rows, which is the row index stride size for the file.

mattlaq
Asked 2 years ago · 2163 views
1 Answer

I had a similar problem. In my case, the CREATE TABLE statement I used to define the Athena table did not list every field present in the ORC files as a column. Once I made sure all the fields were listed, the query worked fine again.
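One way to check for this mismatch is to compare the top-level field names in an ORC file's type against the columns declared in your CREATE TABLE statement. The sketch below is plain Python and assumes you have copied the file's type string by hand (for example, the struct<...> type that orc-tools meta reports) along with the table's non-partition column names. The function names and the sample schema are hypothetical, and the comparison is case-insensitive because Athena column names are lowercase.

```python
from typing import List


def split_top_level(fields: str) -> List[str]:
    """Split 'a:int,b:struct<x:int,y:string>' on commas not nested inside <...>."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in fields:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        if ch == "," and depth == 0:
            # A comma at depth 0 separates two top-level fields.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def orc_top_level_fields(orc_type: str) -> List[str]:
    """Return the top-level field names of a type string like 'struct<a:int,b:string>'."""
    orc_type = orc_type.strip()
    if not (orc_type.startswith("struct<") and orc_type.endswith(">")):
        raise ValueError("expected a top-level struct<...> type")
    inner = orc_type[len("struct<"):-1]
    # Each field looks like 'name:type'; keep only the name.
    return [part.split(":", 1)[0].strip() for part in split_top_level(inner) if part.strip()]


def missing_table_columns(orc_type: str, table_columns: List[str]) -> List[str]:
    """List ORC fields that the table DDL does not declare (case-insensitive)."""
    declared = {c.strip().lower() for c in table_columns}
    return [f for f in orc_top_level_fields(orc_type) if f.lower() not in declared]
```

For example, with a hypothetical file type and a table that declares only two of its three fields:

```python
orc_type = "struct<id:string,reach:bigint,meta:struct<src:string,ts:timestamp>>"
print(missing_table_columns(orc_type, ["id", "reach"]))  # ['meta']
```

A non-empty result means the table definition is missing fields that the files contain.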

muloem
Answered 2 years ago
