I understand that you want to use partition projection on your table. Partition projection works on top of Parquet data; the documentation does not mention any limitation on that. The partition layout you are using is also fine: Athena supports Hive-style partitioning, so the way you are defining your partitions is correct.
As for the issue you are facing: you do not get any data back when you run the query. Please have a look at this documentation, which lists the considerations and limitations that apply when using partition projection. An Athena query will not return any rows if the projected partition does not exist in the S3 bucket. But it seems you have that part right as well.
I see that your S3 bucket contains another partition, 'aws-service', which is not registered as a partition key in your Glue table. Please try adding it and check whether the query then returns data.
Please also go through this documentation on the data types supported for partition projection. You can use the date type as the projection type for a partition column.
I hope this information is of use to you! If you need an in-depth investigation and analysis of an Athena issue, I would suggest raising a ticket with the Athena technical support team.
Hello buddy. I ended up in the same situation as you today and found your question here. So I created this public Gist with what is working for me right now.
I'm using hourly partitions in the Flow Logs, but if you don't use them, just remove the occurrences of the "hour" field from "PARTITIONED BY" and "TBLPROPERTIES".
Hope this helps you too. =)
Thanks for your response @chaitu. Regarding 'aws-service' not being a partition: I added it as a partition and also removed the storage location template, since partition projection should recognize the column values automatically without being told where they are. With those changes I am now getting back data as expected! But I am still unclear on how to format a date partition in this layout. Would this be correct for the data type format with Hive-compatible prefixes?
"projection.day.format" = "year=yyyy/month=MM/day=dd"
The projection format can look like any of these: projection.columnName.format set to yyyyMM, dd-MM-yyyy, or dd-MM-yyyy-HH-mm-ss.
Note: The value is a date format string based on the Java date format DateTimeFormatter. It can be any supported java.time.* type.
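Since the format string follows Java's DateTimeFormatter rules, one way to sanity-check a candidate pattern is to run it through DateTimeFormatter locally. In DateTimeFormatter every unquoted ASCII letter is reserved as a pattern letter, so the words "year", "month" and "day" in the pattern from the question are not treated as literal text unless they are wrapped in single quotes. The sketch below is only a local check of that Java behavior: the class name, method names, and sample date are made up for illustration, and whether Athena accepts quoted literals in projection.day.format is an assumption that this thread does not confirm.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper class, not part of any AWS SDK.
public class ProjectionFormatCheck {

    // Hive-style layout with the literal text quoted, so DateTimeFormatter
    // does not read the letters in "year", "month" and "day" as pattern letters.
    static final String QUOTED_PATTERN = "'year='yyyy/'month='MM/'day='dd";

    // The pattern exactly as written in the question, with no quoting.
    static final String UNQUOTED_PATTERN = "year=yyyy/month=MM/day=dd";

    // Formats a date with the quoted pattern.
    static String formatHiveStyle(LocalDate date) {
        return DateTimeFormatter.ofPattern(QUOTED_PATTERN).format(date);
    }

    // Reports whether DateTimeFormatter can build a formatter from the
    // pattern and use it to format the given date without throwing.
    static boolean isUsable(String pattern, LocalDate date) {
        try {
            DateTimeFormatter.ofPattern(pattern).format(date);
            return true;
        } catch (IllegalArgumentException | DateTimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        LocalDate sample = LocalDate.of(2023, 1, 5); // arbitrary sample date
        System.out.println(formatHiveStyle(sample));
        System.out.println("quoted pattern usable:   " + isUsable(QUOTED_PATTERN, sample));
        System.out.println("unquoted pattern usable: " + isUsable(UNQUOTED_PATTERN, sample));
    }
}
```

With the quoted pattern, the sample date comes out as year=2023/month=01/day=05. If Athena does not accept such a pattern for a single date column, an alternative worth checking in the documentation is to declare year, month and day as separate partition columns.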