HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split

I am trying to extract the current month's data from a database that holds three months of data, and I get the following error: HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split

SELECT fc, pl, bucket, scheduled_qty, snapshot_date
FROM "self_serve_database"."inverted_index_without_fc_dayresource_type_pl"
WHERE fc IN ('AMD2', 'BLR8', 'BLR4', 'BOM5', 'CCX2', 'DEL5', 'DED5', 'DED3', 'HYD8', 'ISK3', 'JPX2', 'MAA4', 'LKO1', 'PNQ3')
  AND pl IN ('ARIPL Grocery', 'ARIPL Drugstore', 'ARIPL Pantry', 'ARIPL Prescription Drugs', 'ARIPL Pet Products', 'ARIPL Wine', 'ARIPL Beauty', 'ARIPL Baby', 'ARIPL Gourmet')
  AND regexp_extract(snapshot_date, '^(\\d{4})', 1) = TO_CHAR(CURRENT_DATE, 'YYYY')
  AND regexp_extract(snapshot_date, '^\\d{4}-(\\d{2})', 1) = TO_CHAR(CURRENT_DATE, 'MM')
  AND snapshot_date BETWEEN CAST(DATE_ADD('day', -13, CURRENT_DATE) AS VARCHAR) AND CAST(CURRENT_DATE AS VARCHAR)
  --AND snapshot_date >= CAST(DATE_ADD('day', -13, CURRENT_DATE) AS VARCHAR)
  --AND snapshot_date <= CAST(CURRENT_DATE AS VARCHAR)
ORDER BY snapshot_date

Error:

HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://self-serve-fc-resources-snapshot-eu-in-prod/inverted_index_without_fc_day/resource_type=pl/snapshot_date=2024-09-24/snapshot_time=04-53/10810d7d92994b1fa3854ee423b91209.snappy.parquet (offset=0, length=26414):

asked a month ago, 56 views
1 Answer
The error you're encountering, "HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split," typically occurs when the query engine cannot access or read one of the underlying data files. In this case, the failing split is a Snappy-compressed Parquet file in your S3 bucket, under the partition snapshot_date=2024-09-24.

There are a few potential reasons for this error:

  1. Permissions: Ensure that the IAM role or user executing the query has the necessary permissions to access the S3 bucket and read the files (for example, s3:ListBucket on the bucket and s3:GetObject on the objects).

  2. File corruption: The Parquet file mentioned in the error message might be corrupted. You may need to check the integrity of the file or recreate it.

  3. S3 access issues: There could be temporary S3 access problems. Try running the query again after a short wait.

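  4. Incompatible Parquet versions: Ensure that the Parquet file version is compatible with the query engine you're using. The error code and SQL dialect suggest a Presto/Trino-based engine such as Amazon Athena rather than Hive itself.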

  5. Partition mismatch: The error points at the partition snapshot_date=2024-09-24. Check that this partition is one you expect the query to read, and that the partition values registered for the table match the data actually stored under that path.
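
For the file-corruption case in the list above, one rough first check is to download the object named in the error message and confirm it still has the basic Parquet layout. The sketch below is a minimal illustration rather than a full validator: it assumes a local copy of the file, and the function name `looks_like_parquet` is made up for this example. It relies only on the fact that a standard Parquet file starts and ends with the 4-byte marker `PAR1`.

```python
import os

# A standard Parquet file starts and ends with this 4-byte marker.
PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path):
    """Return True if the local file at `path` starts and ends with b"PAR1".

    Passing this check does not prove the file is fully readable; failing it
    suggests a truncated upload or an object that is not a Parquet file.
    """
    size = os.path.getsize(path)
    # Smallest possible layout: magic (4) + footer length (4) + magic (4).
    if size < 12:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)  # jump to the last 4 bytes of the file
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

A file that fails this check is a strong candidate for being rewritten or excluded from the table. A file that passes may still be unreadable for other reasons, so treat the result as a quick filter only.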

To troubleshoot:

  1. Verify S3 permissions and access.
  2. Check the integrity of the Parquet files in your S3 bucket.
  3. Ensure your partition values are correct and within the expected date range.
  4. If possible, try accessing a smaller subset of data or a specific partition to isolate the issue.
  5. Review your query engine configuration (for example Athena, or Hive on EMR) to ensure it is set up correctly for handling Parquet files and S3 access.
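
To isolate the issue to a specific partition, it can help to pull the bucket, object key, and partition values out of the error text first. The following is a small sketch under assumptions: the helper name `parse_split_error` and the returned dictionary layout are invented for illustration, and the regular expression only targets messages shaped like the one quoted in the question.

```python
import re
from urllib.parse import urlparse

# Matches the S3 URI plus the offset/length pair reported in the error text.
SPLIT_RE = re.compile(
    r"Error opening Hive split (s3://\S+) \(offset=(\d+), length=(\d+)\)"
)


def parse_split_error(message):
    """Extract bucket, key, partition values, offset and length from the error.

    Returns None if the message does not match the expected shape.
    """
    match = SPLIT_RE.search(message)
    if match is None:
        return None
    parsed = urlparse(match.group(1))
    key = parsed.path.lstrip("/")
    # Hive-style partitions appear as name=value directory segments in the key.
    partitions = dict(
        segment.split("=", 1)
        for segment in key.split("/")[:-1]
        if "=" in segment
    )
    return {
        "bucket": parsed.netloc,
        "key": key,
        "partitions": partitions,
        "offset": int(match.group(2)),
        "length": int(match.group(3)),
    }
```

With the partition values in hand, you can rerun the query restricted to that single snapshot_date to see whether the failure is confined to one file, and use the bucket and key to inspect that object directly in S3.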

If the issue persists, you may need to involve your system administrator or AWS support to investigate further, as they can access more detailed logs and configuration information.

answered a month ago
EXPERT reviewed a month ago
