Pass glue job parameters dynamically to a redshift table query

0

Problem Overview

It's not clear from the documentation how to dynamically pass parameters to query a Amazon Redshift database with the create_dynamic_frame.from_options method. When using the AWS SDK For pandas, we can use the `awswrangler.redshift.read_sql_query() and use the "params" method to pass these values. I am looking for something similar. Every time my glue job runs, I need to dynamically pass in a filter on a specific column on my redshift table. In the example below "dynamic_value" is the value I want to pass to my where statement to my redshift table:

Sample query to redshift table

def get_queried_results(dynamic_value):
   """gets queried results from an Amazon Redshift database"""
   dyf= glueContext.create_dynamic_frame.from_options(
   connection_type="redshift",
   connection_options={
   "sampleQuery": """select * from public.test
   where column1= dynamic_value""",
   "redshiftTmpDir": f"s3://{s3_bucket}/processed/temp/redshift/",
   "useConnectionProperties": "true",
   "dbtable": "public.test",
   "connectionName": "redshift1"})
   return dyf

dyf= get_queried_results(dynamic_value='2023-02')

Question I know the above example is no SQL injection safe but just using it as an example. Is there a parameter on this method that we should be passing our values to in pyspark so it can filter our table based on a value of a column?

Adriano
asked a year ago873 views
1 Answer
0

In this case, just put that inside the query, you can use the Python string interpolation to that easily:

dynamic_value = "value"
....
"sampleQuery": f"""select * from public.test  where column1= {dynamic_value}"""
profile pictureAWS
EXPERT
answered a year ago
  • Hi Gonzalo,

    When attempting this answer using a common table expression in the "sampleQuery" for example

    with cte_exp as
    (
    select column
    from table1
    where
    column = {parameter}
    )
    select * from table2
    left join cte_exp
    on cte_exp.field1 = table2.field2
    where table2.column = parameter
    
    

    The first usage of the parameter is not being detected, while the second usage is. I have tried adding and removing the {} for both uses, but to no effect. It's not detecting the parameters correctly. Do you have any suggetions?

  • Notice this is Python string interpolation (notice the f at the beginning), it has to be translated before it's passed on (print it to verify it has the correct value)

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions