Partition Project leading pad leading zeros on dates

0

My create table SQL looks like the below (table structure greatly simplified). I'm using partition projection.

I am trying to figure out how to handle date strings. For easier SQL querying purposes, I want the dates to have leading zeros, eg "2022/09/06". However the S3 buckets unfortunately are using non padded zero, eg, "2022/9/6".

Is there a way to alter the below query to accomplish this? If I simply change the line 'projection.dt.format'='yyyy/M/d', to 'projection.dt.format'='yyyy/MM/dd', Then the table partitions don't pick up a date like "2022/9/6".

 CREATE EXTERNAL TABLE IF NOT EXISTS TableName (
  userkey string,
  id string,
  createdon string,
  siteid int,
 --snipped---
)
PARTITIONED BY ( 
  `dt` string)
ROW FORMAT SERDE 
  'org.openx.data.jsonserde.JsonSerDe' 
WITH SERDEPROPERTIES ( 
  'ignore.malformed.json'='true') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://company-log-split/{{NetworkIdHandler.value}}/company-log'
TBLPROPERTIES (
  'projection.enabled'='true', 
  'projection.dt.format'='yyyy/M/d', 
  'projection.dt.interval'='1', 
  'projection.dt.interval.unit'='DAYS', 
  'projection.dt.range'='NOW - 3 MONTHS,NOW + 5 YEARS', 
  'projection.dt.type'='date',
  'storage.location.template'='s3://company-log-split/{{NetworkIdHandler.value}}/company-log/${dt}'
) 
질문됨 2년 전267회 조회
2개 답변
0
수락된 답변

Hello,

As you have correctly pointed out, Athena query can only pick up the underlying date partitions and data when the date partition format in 'storage.location.template' matches the actual path in the S3 bucket. Because the '${dt}' path format in 'storage.location.template' depends on the date format defined in 'projection.dt.format', it is not possible to define or map the zero padded format to non zero padded format with Athena Partition Projection.

A possible workaround is using conventional Hive partitions instead of Partition Projection, for example, after created the table without Partition Projection, you can use ALTER TABLE ADD PARTITION query to add and map date partitions to the s3 path manually or programmatically:

ALTER TABLE my_table_name ADD PARTITION (dt = '2022/09/06') LOCATION 's3://mystorage/path/to/2022/9/6/';

To add partitions in batch programmatically, you can construct the Athena queries using a custom python script and execute them via start_query_execution API. Reference: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/athena.html#Athena.Client.start_query_execution

AWS
Ethan_H
답변함 2년 전
0

Thanks very much for the answer. I was afraid of that. We switched from the old "ADD PARTITION" method to "PARTITION PROJECTION" so as to make things easier, but in the case of leading zeros, it didn't. :(

답변함 2년 전

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인