Using SQL Alchemy ORM to query an AWS athena database. First off, here is the raw SQL query, which is working perfectly
WITH zi AS (SELECT
trip.start_dt,
address.zone_id
FROM view_midtable_esc_trip_details_cleaned AS trip
INNER JOIN view_midtable_esc_address_expanded AS address
ON trip.start_address_ref = address.address_ref),
all_trips AS (SELECT
date_parse(
concat(
date_format(zi.start_dt, '%Y-%m-%d %H:'),
-- The following line causing issues when converting to ORM
CAST(FLOOR(extract(minute from zi.start_dt)/15)*15 AS VARCHAR),
':00'),
'%Y-%m-%d %H:%i:%s'
) AS modified_timestamp,
zi.zone_id
FROM zi)
SELECT COUNT(*) as demand,
modified_timestamp,
zone_id FROM all_trips GROUP BY modified_timestamp, zone_id;
Now, here is the relevant part of the code where I am doing the minutes extraction from the timestamp, and replacing it with the following logic
floor(original_minutes/15)*15
minutes_interval:int=15
quantised_minute:BinaryExpression=func.FLOOR(func.extract('minute',
zi.columns[original_timestamp])/minutes_interval)*minutes_interval
modified_timestamp = func.date_format(func.concat(
func.date_format(zi.columns[original_timestamp], '%Y-%m-%d %H:'),
func.CAST(quantised_minute, VARCHAR)),
text("'%Y-%m-%d %H:%i'")).label(name=original_timestamp)
query:Query=session.query(modified_timestamp,
zi.columns[zone_col],
func.count().label(name=LABEL)).group_by(modified_timestamp, zi.columns[zone_col])
When I print the raw query that is sent to the engine, this is getting translated by ORM as (comments and formatting mine)
SELECT date_format(concat(date_format(anon_1.start_dt, '%Y-%m-%d %H:'),
-- What is this extra casting?
CAST(FLOOR(EXTRACT(minute FROM anon_1.start_dt) / CAST(15 AS NUMERIC)) * 15 AS VARCHAR)),
'%Y-%m-%d %H:%i') AS start_dt, anon_1.zone_id, count(*) AS total_unique_demand
FROM ...
and the python error message is
OperationalError: (pyathena.error.OperationalError) TYPE_MISMATCH: line 1:123: Unknown type: NUMERIC
[SQL: SELECT date_format(concat(date_format(anon_1.start_dt, '%%Y-%%m-%%d %%H:'), CAST(FLOOR(EXTRACT(minute FROM anon_1.start_dt) / CAST(15 AS NUMERIC)) * 15 AS VARCHAR)), '%%Y-%%m-%%d %%H:%%i') AS start_dt, anon_1.zone_id, count(*) AS total_unique_demand
So, to verify it, I put this in Athena
SELECT CAST(15 AS NUMERIC);
and, sure enough, it is throwing the exact same error.
So, the question is how to divide by 15 in SQL Alchemy in a way that is compatible with athena?
If it is important, I am using pyathena for engine.