NULL records are getting dropped while loading Redshift with a Glue job. How do I handle this?


I am trying to replace values with NULL in a Redshift table through a Glue job, but the values are not being replaced with NULL.

What am I missing here?

  • You mean you are uploading rows to Redshift from Glue, and in the table they end up with empty strings instead of NULL values?

Manisha
Asked a year ago · 379 views
1 Answer

If you are trying to replace values with NULL in a Redshift table using a Glue job, you can apply the Glue Map transform with a small per-record function (called replace_null() in the example below) that converts empty strings to None. Here's an example of how to do it:

import sys

from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

## Create a DynamicFrame from a Glue Catalog table
dynamic_frame = glueContext.create_dynamic_frame.from_catalog(database="database_name", table_name="table_name")

## Function applied to each record: convert empty strings in the target column to None
def replace_null(record):
    if record["column_with_value_to_replace"] == "":
        record["column_with_value_to_replace"] = None
    return record

## Keep the columns you need, then apply the function to every record
dynamic_frame = ApplyMapping.apply(frame=dynamic_frame, mappings=[("column_name", "string", "column_name", "string"), ("column_with_value_to_replace", "string", "column_with_value_to_replace", "string")])
dynamic_frame = Map.apply(frame=dynamic_frame, f=replace_null)

## Convert the DynamicFrame back to a DataFrame and write to Redshift
## (the com.databricks.spark.redshift format requires the spark-redshift connector on the job's classpath)
data_frame = dynamic_frame.toDF()
(data_frame.write
    .format("com.databricks.spark.redshift")
    .option("url", "jdbc:redshift://<redshift_endpoint>:<port>/<database>?user=<user>&password=<password>")
    .option("dbtable", "table_name")
    .option("tempdir", "s3://<s3_bucket_path>/temp")
    .mode("overwrite")
    .save())

In the code above, the Map transform applies the replace_null() function to each record, converting any empty string in the "column_with_value_to_replace" column to None so that it is loaded into Redshift as NULL. You can add more columns to the mappings list as needed.
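If you would rather do the replacement on the Spark DataFrame instead of with a Glue Map transform, the same effect can be achieved with pyspark.sql.functions.when. This is only a sketch that reuses the glueContext and dynamic_frame from the example above and the same placeholder column name:

from pyspark.sql.functions import col, when, lit
from awsglue.dynamicframe import DynamicFrame

## Replace empty strings with NULL in the target column on the DataFrame side
df = dynamic_frame.toDF()
df = df.withColumn("column_with_value_to_replace", when(col("column_with_value_to_replace") == "", lit(None)).otherwise(col("column_with_value_to_replace")))

## Convert back to a DynamicFrame if you want to keep using Glue writers
dynamic_frame = DynamicFrame.fromDF(df, glueContext, "frame_with_nulls")

Once the values are real NULLs in the frame, they should arrive in Redshift as NULL rather than as empty strings.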

Make sure to replace the placeholders in the option() calls with your own values for the Redshift endpoint, port, database, user, password, and S3 bucket path.
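Also note that the com.databricks.spark.redshift format is not bundled with Glue, so the spark-redshift connector has to be available to the job. An alternative that stays within Glue's own API is to write the DynamicFrame through a Glue catalog connection; the connection name below is a placeholder, so treat this as a sketch rather than a drop-in replacement:

## Write the DynamicFrame to Redshift via a Glue catalog connection
## ("my-redshift-connection" is a placeholder; the connection must already exist in Glue)
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=dynamic_frame,
    catalog_connection="my-redshift-connection",
    connection_options={"dbtable": "table_name", "database": "database_name"},
    redshift_tmp_dir="s3://<s3_bucket_path>/temp"
)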

If your Glue job is still dropping null records after implementing this, please provide more details on your Glue job and the data you are trying to load.

hash
Answered a year ago
