Problem Overview
It's not clear from the documentation how to dynamically pass parameters to query an Amazon Redshift database with the create_dynamic_frame.from_options method. When using the AWS SDK for pandas, we can call `awswrangler.redshift.read_sql_query()` and pass these values through its `params` argument (see the sketch after the sample below for comparison). I am looking for something similar. Every time my Glue job runs, I need to dynamically pass in a filter on a specific column of my Redshift table. In the example below, "dynamic_value" is the value I want to pass into the WHERE clause of the query against my Redshift table:
Sample query to Redshift table
def get_queried_results(dynamic_value):
    """Gets queried results from an Amazon Redshift database."""
    dyf = glueContext.create_dynamic_frame.from_options(
        connection_type="redshift",
        connection_options={
            # dynamic_value is not actually substituted here -- that is the question
            "sampleQuery": """select * from public.test
                              where column1 = dynamic_value""",
            "redshiftTmpDir": f"s3://{s3_bucket}/processed/temp/redshift/",
            "useConnectionProperties": "true",
            "dbtable": "public.test",
            "connectionName": "redshift1",
        },
    )
    return dyf
dyf = get_queried_results(dynamic_value='2023-02')
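For comparison, this is roughly what the AWS SDK for pandas approach mentioned above looks like. This is only a sketch: it assumes the same Glue connection name, table, and column as the sample, and that redshift_connector's default %s paramstyle is in effect.

import awswrangler as wr

# Assumes the Glue connection "redshift1" can be opened with awswrangler.
con = wr.redshift.connect("redshift1")
df = wr.redshift.read_sql_query(
    sql="SELECT * FROM public.test WHERE column1 = %s",  # %s placeholder, bound by the driver
    con=con,
    params=["2023-02"],  # value passed separately instead of concatenated into the SQL
)
con.close()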
Question
I know the above example is not SQL-injection safe; I am just using it as an example. Is there a parameter on this method in PySpark that we should pass our values to, so it can filter the table based on the value of a column?
Hi Gonzalo,
When attempting this answer using a common table expression in the "sampleQuery", for example one where the parameter appears twice, the first usage of the parameter is not being substituted, while the second one is. I have tried adding and removing the {} around both usages, but to no effect; the parameters are not being picked up correctly. Do you have any suggestions?
Note that this is Python string interpolation (notice the f at the beginning of the string), so it is resolved by Python before the query is passed on. Print the resulting string to verify it contains the correct value.
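To make that concrete, here is a minimal sketch of building the sampleQuery with an f-string before handing it to create_dynamic_frame.from_options. The CTE is only a stand-in for the one described above (table and column names follow the earlier sample), and, as in the original question, this interpolation is not SQL-injection safe.

def get_queried_results(dynamic_value):
    """Builds sampleQuery with an f-string so every usage of the parameter is substituted by Python."""
    # Both occurrences are resolved by Python before Glue ever sees the query.
    sample_query = f"""
        with filtered as (
            select * from public.test
            where column1 = '{dynamic_value}'
        )
        select * from filtered
        where column1 = '{dynamic_value}'
    """
    print(sample_query)  # verify both values were substituted before the job queries Redshift

    return glueContext.create_dynamic_frame.from_options(
        connection_type="redshift",
        connection_options={
            "sampleQuery": sample_query,
            "redshiftTmpDir": f"s3://{s3_bucket}/processed/temp/redshift/",
            "useConnectionProperties": "true",
            "dbtable": "public.test",
            "connectionName": "redshift1",
        },
    )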