Partition Project leading pad leading zeros on dates

0

My create table SQL looks like the below (table structure greatly simplified). I'm using partition projection.

I am trying to figure out how to handle date strings. For easier SQL querying purposes, I want the dates to have leading zeros, eg "2022/09/06". However the S3 buckets unfortunately are using non padded zero, eg, "2022/9/6".

Is there a way to alter the below query to accomplish this? If I simply change the line 'projection.dt.format'='yyyy/M/d', to 'projection.dt.format'='yyyy/MM/dd', Then the table partitions don't pick up a date like "2022/9/6".

 CREATE EXTERNAL TABLE IF NOT EXISTS TableName (
  userkey string,
  id string,
  createdon string,
  siteid int,
 --snipped---
)
PARTITIONED BY ( 
  `dt` string)
ROW FORMAT SERDE 
  'org.openx.data.jsonserde.JsonSerDe' 
WITH SERDEPROPERTIES ( 
  'ignore.malformed.json'='true') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://company-log-split/{{NetworkIdHandler.value}}/company-log'
TBLPROPERTIES (
  'projection.enabled'='true', 
  'projection.dt.format'='yyyy/M/d', 
  'projection.dt.interval'='1', 
  'projection.dt.interval.unit'='DAYS', 
  'projection.dt.range'='NOW - 3 MONTHS,NOW + 5 YEARS', 
  'projection.dt.type'='date',
  'storage.location.template'='s3://company-log-split/{{NetworkIdHandler.value}}/company-log/${dt}'
) 
gefragt vor 2 Jahren267 Aufrufe
2 Antworten
0
Akzeptierte Antwort

Hello,

As you have correctly pointed out, Athena query can only pick up the underlying date partitions and data when the date partition format in 'storage.location.template' matches the actual path in the S3 bucket. Because the '${dt}' path format in 'storage.location.template' depends on the date format defined in 'projection.dt.format', it is not possible to define or map the zero padded format to non zero padded format with Athena Partition Projection.

A possible workaround is using conventional Hive partitions instead of Partition Projection, for example, after created the table without Partition Projection, you can use ALTER TABLE ADD PARTITION query to add and map date partitions to the s3 path manually or programmatically:

ALTER TABLE my_table_name ADD PARTITION (dt = '2022/09/06') LOCATION 's3://mystorage/path/to/2022/9/6/';

To add partitions in batch programmatically, you can construct the Athena queries using a custom python script and execute them via start_query_execution API. Reference: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/athena.html#Athena.Client.start_query_execution

AWS
Ethan_H
beantwortet vor 2 Jahren
0

Thanks very much for the answer. I was afraid of that. We switched from the old "ADD PARTITION" method to "PARTITION PROJECTION" so as to make things easier, but in the case of leading zeros, it didn't. :(

beantwortet vor 2 Jahren

Du bist nicht angemeldet. Anmelden um eine Antwort zu veröffentlichen.

Eine gute Antwort beantwortet die Frage klar, gibt konstruktives Feedback und fördert die berufliche Weiterentwicklung des Fragenstellers.

Richtlinien für die Beantwortung von Fragen