Getting Error Category: UNCLASSIFIED_ERROR; An error occurred while calling o107.pyWriteDynamicFrame. Exception thrown in awaitResult: when running Glue job to transfer data from S3 to Redshift


Hi. I am trying to run an AWS Glue job where I transfer data from S3 to Amazon Redshift. However, I am receiving the following error:

Error Category: UNCLASSIFIED_ERROR; An error occurred while calling o107.pyWriteDynamicFrame. Exception thrown in awaitResult:

I'm really not sure where to begin to try to resolve this error.

Here is the script for the Glue job that I am trying to run:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue import DynamicFrame

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Script generated for node Amazon S3
AmazonS3_node1704977097218 = glueContext.create_dynamic_frame.from_options(
    format_options={"quoteChar": '"', "withHeader": True, "separator": ","},
    connection_type="s3",
    format="csv",
    connection_options={"paths": ["s3://smedia-data-raw-dev/google/"], "recurse": True},
    transformation_ctx="AmazonS3_node1704977097218",
)

# Script generated for node Amazon Redshift
AmazonRedshift_node1704977113638 = glueContext.write_dynamic_frame.from_options(
    frame=AmazonS3_node1704977097218,
    connection_type="redshift",
    connection_options={
        "redshiftTmpDir": "s3://aws-glue-assets-191965435652-us-east-1/temporary/",
        "useConnectionProperties": "true",
        "dbtable": "public.google_test",
        "connectionName": "Redshift connection",
        "preactions": "DROP TABLE IF EXISTS public.google_test; CREATE TABLE IF NOT EXISTS public.google_test (resourcename VARCHAR, status VARCHAR, basecampaign VARCHAR, name VARCHAR, id VARCHAR, campaignbudget VARCHAR, startdate VARCHAR, enddate VARCHAR, adservingoptimizationstatus VARCHAR, advertisingchanneltype VARCHAR, advertisingchannelsubtype VARCHAR, experimenttype VARCHAR, servingstatus VARCHAR, biddingstrategytype VARCHAR, domainname VARCHAR, languagecode VARCHAR, usesuppliedurlsonly VARCHAR, positivegeotargettype VARCHAR, negativegeotargettype VARCHAR, paymentmode VARCHAR, optimizationgoaltypes VARCHAR, date VARCHAR, averagecost VARCHAR, clicks VARCHAR, costmicros VARCHAR, impressions VARCHAR, useaudiencegrouped VARCHAR, activeviewmeasurablecostmicros VARCHAR, costperallconversions VARCHAR, costperconversion VARCHAR, invalidclicks VARCHAR, publisherpurchasedclicks VARCHAR, averagepageviews VARCHAR, videoviews VARCHAR, allconversionsbyconversiondate VARCHAR, allconversionsvaluebyconversiondate VARCHAR, conversionsbyconversiondate VARCHAR, conversionsvaluebyconversiondate VARCHAR, valueperallconversionsbyconversiondate VARCHAR, valueperconversionsbyconversiondate VARCHAR, allconversions VARCHAR, absolutetopimpressionpercentage VARCHAR, searchabsolutetopimpressionshare VARCHAR, averagecpc VARCHAR, searchimpressionshare VARCHAR, searchtopimpressionshare VARCHAR, activeviewctr VARCHAR, ctr VARCHAR, relativectr VARCHAR);",
    },
    transformation_ctx="AmazonRedshift_node1704977113638",
)

job.commit()

Any help on what I can do to resolve this issue would be greatly appreciated. Thank you!

Matt_J
asked 3 months ago · 805 views
2 Answers

Check the full stack trace for that error in the job logs, along with any surrounding errors that tell you what went wrong.
Often it's an issue with parsing the CSV data when Redshift runs COPY internally; check the Redshift system table stl_load_errors for details.
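
If your SQL client returns no rows from stl_load_errors, keep in mind that regular users only see their own rows in STL system tables, so run the query as the same database user the Glue connection uses (or as a superuser). As a rough sketch, you can also pull the most recent load errors from inside the Glue job itself, assuming the "Redshift connection" catalog connection from your script and that GlueContext.extract_jdbc_conf returns the user, password, and fullUrl keys (verify the exact key names for your Glue version):

from pyspark.context import SparkContext
from awsglue.context import GlueContext

# Reuse the job's Glue/Spark context (same setup as in the question's script)
glueContext = GlueContext(SparkContext.getOrCreate())
spark = glueContext.spark_session

# Pull the JDBC settings from the "Redshift connection" catalog connection
# instead of hard-coding them; key names may differ by Glue version.
conf = glueContext.extract_jdbc_conf("Redshift connection")

# Read the 20 most recent COPY errors over JDBC
load_errors = spark.read.jdbc(
    url=conf["fullUrl"],  # Glue 3.0+ includes the database name; older versions expose only "url"
    table="""(
        SELECT query, starttime, filename, line_number, colname,
               err_code, err_reason, raw_field_value
        FROM stl_load_errors
        ORDER BY starttime DESC
        LIMIT 20
    ) AS recent_load_errors""",
    properties={"user": conf["user"], "password": conf["password"]},
)
load_errors.show(truncate=False)

The same SELECT can of course be run directly from any SQL client connected to the cluster.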

AWS
EXPERT
answered 3 months ago
  • Hi. Thank you for your response.

    I checked the logs but couldn't find any errors around the one I was getting; I couldn't even find the specific error itself. Also, I'm using DBeaver to connect to my Redshift cluster and queried stl_load_errors, but I get zero results. Maybe I am doing something wrong. Do you have any suggestions?


Hi there, I have tested and written a script for you:

from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder.getOrCreate()

# Redshift connection properties (hard-coded here for simplicity;
# replace with your own user name and password)
redshift_properties = {
    "user": "admin",
    "password": "<Redshift password>",
}

# Define the path to the CSV file in S3
csv_file_path = "s3://<your-bucket>/employees.csv"

# Read the CSV file from S3 into a DataFrame
df = spark.read.format("csv").option("header", "true").load(csv_file_path)

# Write the DataFrame to the Redshift table
df.write.jdbc(
    url="jdbc:redshift://<your-workgroup>.<your-account>.ap-southeast-1.redshift-serverless.amazonaws.com:5439/dev",
    table="public.<your-table>",
    mode="overwrite",
    properties=redshift_properties,
)

Use "Author code with a script editor" instead of Visual ETL; it gives you more control over your code. Remember to create a connection to Redshift and test that connection. Also, attach an IAM role to your job, the same IAM role you used when testing the connection.
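
If you'd rather not hard-code the user name and password, one option (just a sketch, assuming the job has a Glue catalog connection named "Redshift connection" like in the question, and that extract_jdbc_conf exposes user, password, and fullUrl keys) is to look those properties up at run time:

from pyspark.context import SparkContext
from awsglue.context import GlueContext

glueContext = GlueContext(SparkContext.getOrCreate())
spark = glueContext.spark_session

# Look up user, password and JDBC URL from the catalog connection
# ("Redshift connection" is the connection name used in the question's script)
conf = glueContext.extract_jdbc_conf("Redshift connection")

# Read the CSV from S3, then write it to Redshift over JDBC
df = spark.read.format("csv").option("header", "true").load("s3://<your-bucket>/employees.csv")

df.write.jdbc(
    url=conf["fullUrl"],  # includes the database name on Glue 3.0+; otherwise build the URL from conf["url"]
    table="public.<your-table>",
    mode="overwrite",
    properties={"user": conf["user"], "password": conf["password"]},
)

For this to work, the job's IAM role will also need permission to read the connection from the Data Catalog.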

Here is the result:

[screenshot of the successful job run]

I have tested it and it works. If you need more help, please let me know, and if this was helpful, please upvote. 🤣

answered 3 months ago
