Querying an LZ4-compressed file on S3 using AWS Athena


Hi, I have created an external table in an AWS Glue Data Catalog database.

The table points to an LZ4-compressed file in S3.

The table definition looks like this:

CREATE EXTERNAL TABLE `myapplogs`(
  `timestamp` string COMMENT 'from deserializer', 
  `num` string COMMENT 'from deserializer', 
  `num2` string COMMENT 'from deserializer', 
  `num3` string COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'com.amazonaws.glue.serde.GrokSerDe' 
WITH SERDEPROPERTIES ( 
  'input.format'='%{TIMESTAMP_ISO8601:timestamp} %{INT:num} %{INT:num2} %{INT:num3}') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://tesbucket/original/'
TBLPROPERTIES (
  'grokPattern'='%{TIMESTAMP_ISO8601:timestamp} %{INT:num} %{INT:num2} %{INT:num3}', 
  'typeOfData'='file')

The table is created successfully, but SELECT queries do not return any data.

Pradeep
asked 17 days ago, 273 views
1 Answer

The issue could be with how the file was compressed with LZ4.

LZ4 can store compressed data in two formats:

  1. Block format: a simple block-based compression format in which each block of compressed data is standalone and carries no header or framing information. It is just the compressed data, with no additional metadata.

  2. Frame format: the standard format, also known as LZ4 framing, wraps the compressed blocks in a frame. The frame begins with a small header containing metadata and framing information, and each block inside it is prefixed with its size. The frame header enables features such as recording the size of the original uncompressed data, optional content checksums, and other parameters.
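To make the frame header in item 2 concrete, here is a minimal Python sketch that decodes it, based on my reading of the LZ4 frame format specification: a 4-byte magic number (0x184D2204), an FLG byte and a BD byte, an optional 8-byte uncompressed content size, an optional 4-byte dictionary ID, and a 1-byte header checksum. The function name `parse_lz4_frame_header` is hypothetical, and the sketch skips the header checksum rather than verifying it.

```python
import struct

# Every LZ4 frame opens with the magic number 0x184D2204 (stored little-endian).
LZ4_FRAME_MAGIC = 0x184D2204

# Bits 6-4 of the BD byte select the maximum block size.
BLOCK_MAX_SIZES = {4: 64 * 1024, 5: 256 * 1024, 6: 1024 * 1024, 7: 4 * 1024 * 1024}


def parse_lz4_frame_header(data: bytes) -> dict:
    """Decode the frame header at the start of an LZ4 frame (hypothetical helper).

    Raises ValueError if the data does not start with the frame magic number.
    The trailing header-checksum byte is skipped, not verified.
    """
    # Smallest possible header: magic (4) + FLG (1) + BD (1) + checksum (1).
    if len(data) < 7:
        raise ValueError("too short to hold an LZ4 frame header")
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic != LZ4_FRAME_MAGIC:
        raise ValueError(f"not an LZ4 frame (magic 0x{magic:08X})")

    flg, bd = data[4], data[5]
    header = {
        "version": (flg >> 6) & 0b11,             # spec requires 0b01
        "independent_blocks": bool(flg & 0x20),   # FLG bit 5
        "block_checksums": bool(flg & 0x10),      # FLG bit 4
        "content_checksum": bool(flg & 0x04),     # FLG bit 2
        "block_max_size": BLOCK_MAX_SIZES.get((bd >> 4) & 0b111),
        "content_size": None,  # set only when FLG bit 3 is 1
        "dict_id": None,       # set only when FLG bit 0 is 1
    }
    offset = 6
    if flg & 0x08:
        # 8-byte little-endian size of the original uncompressed data.
        (header["content_size"],) = struct.unpack_from("<Q", data, offset)
        offset += 8
    if flg & 0x01:
        # 4-byte little-endian dictionary ID.
        (header["dict_id"],) = struct.unpack_from("<I", data, offset)
        offset += 4
    # One header-checksum byte follows; the first data block starts after it.
    header["header_length"] = offset + 1
    return header
```

For example, a header whose FLG byte is 0x68 and BD byte is 0x40 declares independent blocks, a 64 KB maximum block size, and an 8-byte content size field holding the original data size. Block-format data has none of these fields, which is why it starts directly with compressed bytes.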

The lz4 command-line utility writes the frame format by default, and Athena does not currently support the frame format. Please check which format was used to compress your file.
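One way to check which format a file uses is to look at its first four bytes: per the LZ4 frame format specification, a frame starts with the magic number 0x184D2204, while block-format data has no magic number at all. Below is a minimal Python sketch of that check, assuming you have first downloaded the object to a local file. The function names `detect_lz4_container` and `detect_lz4_file` are hypothetical, and a "no-frame-magic" result only rules out the frame formats; it does not prove the data is valid LZ4 block data.

```python
# LZ4 magic numbers as they appear on disk (little-endian byte order).
FRAME_MAGIC = b"\x04\x22\x4d\x18"         # 0x184D2204: standard LZ4 frame
LEGACY_FRAME_MAGIC = b"\x02\x21\x4c\x18"  # 0x184C2102: legacy frame (lz4 -l)


def detect_lz4_container(first_bytes: bytes) -> str:
    """Guess how LZ4 data was packaged from its first four bytes (hypothetical helper)."""
    head = first_bytes[:4]
    if head == FRAME_MAGIC:
        return "frame"
    if head == LEGACY_FRAME_MAGIC:
        return "legacy-frame"
    # Skippable frames use magic numbers 0x184D2A50 through 0x184D2A5F.
    if len(head) == 4 and head[1:] == b"\x2a\x4d\x18" and 0x50 <= head[0] <= 0x5F:
        return "skippable-frame"
    # No frame magic number: possibly raw block-format data, or not LZ4 at all.
    return "no-frame-magic"


def detect_lz4_file(path: str) -> str:
    """Read the first four bytes of a local file and classify them."""
    with open(path, "rb") as f:
        return detect_lz4_container(f.read(4))
```

For example, after copying the object locally with `aws s3 cp s3://tesbucket/original/<file> .`, a result of `"frame"` from `detect_lz4_file` would indicate the frame format that Athena does not support.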

AWS
Anu_C
answered 15 days ago
