While loading Redshift with a Glue job, NULL records are getting dropped. How do I handle this?

0

Trying to replace values with NULL in a Redshift table through a Glue job, but the values are not being replaced with NULL.

What am I missing here?

  • Do you mean you are uploading rows to Redshift from Glue and the table then contains empty strings instead of NULL values?

Manisha
asked a year ago · 375 views
1 Answer
0

If you are trying to replace values with NULL in a Redshift table using a Glue job, you can apply a Map transform that sets those values to None (note that Glue has no built-in replace_null() function). Here's an example of how to do it:

import sys

from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

## Create a DynamicFrame from a Glue Catalog table
dynamic_frame = glueContext.create_dynamic_frame.from_catalog(
    database="database_name", table_name="table_name")

## Replace the sentinel value (here an empty string) with None,
## so it is written to Redshift as NULL
def replace_empty_with_null(rec):
    if rec["column_with_value_to_replace"] == "":
        rec["column_with_value_to_replace"] = None
    return rec

dynamic_frame = Map.apply(frame=dynamic_frame, f=replace_empty_with_null)

## Convert the DynamicFrame back to a DataFrame and write to Redshift
data_frame = dynamic_frame.toDF()
(data_frame.write
    .format("com.databricks.spark.redshift")
    .option("url", "jdbc:redshift://<redshift_endpoint>:<port>/<database>?user=<user>&password=<password>")
    .option("dbtable", "table_name")
    .option("tempdir", "s3://<s3_bucket_path>/temp")
    .mode("overwrite")
    .save())

In the above code, the Map transform calls replace_empty_with_null() on each record, so any empty-string value in the "column_with_value_to_replace" column becomes None and is written to Redshift as NULL. You can extend the function to handle more columns or other sentinel values as needed.
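Since Map.apply simply calls a plain Python function on each record (passed as a dict), the null-replacement step can be unit-tested locally without a Glue environment. A minimal sketch, assuming the record carries a "column_with_value_to_replace" field where an empty string stands for the missing value:

```python
def replace_empty_with_null(rec):
    # Convert an empty-string sentinel to None (loaded into Redshift as NULL).
    # The column name is an illustrative assumption, not a Glue built-in.
    if rec.get("column_with_value_to_replace") == "":
        rec["column_with_value_to_replace"] = None
    return rec

row = replace_empty_with_null({"column_with_value_to_replace": ""})
print(row["column_with_value_to_replace"])  # → None
```

Running this locally first makes it easy to confirm the transform really emits None before the job writes anything to Redshift.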

Make sure to replace the placeholders in the option() calls with your own values for the Redshift endpoint, port, database, user, password, and S3 bucket path.
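If the values still land in Redshift as empty strings rather than NULL, the COPY step itself may be converting them. The spark-redshift connector accepts an "extracopyoptions" setting that is passed through to the Redshift COPY command; a hedged sketch of the relevant options (the dbtable and tempdir values are placeholders, as above):

```python
# "extracopyoptions" is the spark-redshift connector's pass-through setting;
# EMPTYASNULL tells the Redshift COPY command to load empty VARCHAR
# fields as NULL instead of empty strings.
redshift_options = {
    "dbtable": "table_name",
    "tempdir": "s3://<s3_bucket_path>/temp",
    "extracopyoptions": "EMPTYASNULL",
}
# Applied to the writer as: .option("extracopyoptions", "EMPTYASNULL")
print(redshift_options["extracopyoptions"])  # → EMPTYASNULL
```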

If your Glue job is still dropping null records after implementing this, please provide more details on your Glue job and the data you are trying to load.

hash
answered a year ago
