I can't find a proper way of setting the correct data type for a timestamp attribute on my Athena table parquet in order to query for time intervals.
im creating the table via a crawler on parquet files resultant from a glue interactive job (PySpark)
**below my Athena table and the timestamp formats im trying **
- ts (datatype timestamp) : 2023-02-07 23:59:59.460000
- ts_iso_stamp (datatype timestamp) : 2023-02-07 23:59:59.460000
- ts_iso_str (datatype timestamp) : "2023-02-07T23:59:59.460000+00:00"
im applying this mapping in my glue job to create the various formats
ApplyMapping.apply(frame = dyf, mappings = [ ("date", "date", "date", "date")
,("ts", "string", "ts", "timestamp")
,("ts_iso", "string", "ts_iso_stamp", "timestamp")
,("ts_iso", "string", "ts_iso_str", "string") ]
ts_iso looks like this when you run .show() on the Dynamic Dataframe. a valid iso format as far as I understand.
"ts_iso": "2023-02-07T23:59:59.460000+00:00"
query examples failing same error
select * from searchdb.queries_lu where appid = 'aHs7sPcX66ytv2FqmRnv'
AND ts >= TIMESTAMP '2023-02-07 23:59:59.405000'
AND ts <= TIMESTAMP '2023-02-08 00:00:00.637000';
select * from searchdb.queries_lu where appid = 'aHs7sPcX66ytv2FqmRnv'
AND ts_iso_stamp >= TIMESTAMP '2023-02-07 23:59:59.405000'
AND ts_iso_stamp <= TIMESTAMP '2023-02-08 00:00:00.637000';
select * from searchdb.queries_lu where appid = 'aHs7sPcX66ytv2FqmRnv'
AND ts_iso_str >= TIMESTAMP '2023-02-07T23:59:59.405000+00:00'
AND ts_iso_str <= TIMESTAMP '2023-02-08T00:00:00.637000+00:00';
Athena DDL
create table searchdb.queries_ac
(
ts timestamp(23, 3) ,
ts_iso_stamp timestamp(23,3),
ts_iso_str string(255),
endpoint string(255),
appid string(255),
requestid string(255),
"in" string(255),
q string(255),
qq string(255),
results_id string(255),
results_count string(255),
requestline string(255),
"date" string(255)
);
Thanks Gonzalo , that was the missing part to dominate the Timestamp challenge . Regards