1 個回答
- 最新
- 最多得票
- 最多評論
0
The HIVE_CANNOT_OPEN_SPLIT
error is typically seen when there is an issue reading data from a partition and its associated S3 prefix. There are several things to check here:
- How many files are in the date prefix where you’re seeing the error?
- There’s an S3 limitation on how many requests can be made per prefix. If you have many (e.g. thousands) smaller files in the partition you’re trying to query from, it is possible to hit this S3 limitation from Athena. I recommend merging those smaller files into larger files and/or distributing your data over another partition field to avoid this issue when querying in the future.
- I would also view any S3 metrics in CloudWatch to see if S3 requests to your bucket and the specific prefix are being throttled.
- Are other applications also reading from the same S3 prefix?
- Here you will run into the same S3 limitation so I recommend, again, merging smaller files into larger files within the prefix and/or distributing your data over another partition field to avoid this issue when querying in the future.
- Is versioning available on your S3 bucket?
- If there’s a high number of files that have been deleted, you might also see this error. This is because S3 places a delete marker instead of deleting the objects when versioning is enabled. Athena will check the different versions to determine which objects to include in the query result. I recommend creating an S3 lifecycle configuration policy to automatically delete any object versions marked for deletion.
Hopefully, this helps resolves your issue.
相關內容
- AWS 官方已更新 2 年前
- AWS 官方已更新 2 年前
Thank you for sharing possible reasons for this issue.
I've checked the all things that you mentioned but it's still not clear.
This bucket is not using versioning and the number of prefix is less than 1,500 while I heard that the limitation for GET requests per prefix is about 5,500.
Fortunately, although I did nothing for this issue yet, the schedule is working well since the issue occur...
Thank you very much anyway, and I should monitor some other metrics.