Getting Error Category: UNCLASSIFIED_ERROR; An error occurred while calling o107.pyWriteDynamicFrame. Exception thrown in awaitResult: when running Glue job to transfer data from S3 to Redshift

0

Hi. I am trying to run an AWS Glue job where I transfer data from S3 to Amazon Redshift. However, I am receiving the following error:

Error Category: UNCLASSIFIED_ERROR; An error occurred while calling o107.pyWriteDynamicFrame. Exception thrown in awaitResult:

I'm really not sure where to begin to try to resolve this error.

Here is the script for the Glue job that I am trying to run:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue import DynamicFrame

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Script generated for node Amazon S3
AmazonS3_node1704977097218 = glueContext.create_dynamic_frame.from_options(
    format_options={"quoteChar": '"', "withHeader": True, "separator": ","},
    connection_type="s3",
    format="csv",
    connection_options={"paths": ["s3://smedia-data-raw-dev/google/"], "recurse": True},
    transformation_ctx="AmazonS3_node1704977097218",
)

# Script generated for node Amazon Redshift
AmazonRedshift_node1704977113638 = glueContext.write_dynamic_frame.from_options(
    frame=AmazonS3_node1704977097218,
    connection_type="redshift",
    connection_options={
        "redshiftTmpDir": "s3://aws-glue-assets-191965435652-us-east-1/temporary/",
        "useConnectionProperties": "true",
        "dbtable": "public.google_test",
        "connectionName": "Redshift connection",
        "preactions": "DROP TABLE IF EXISTS public.google_test; CREATE TABLE IF NOT EXISTS public.google_test (resourcename VARCHAR, status VARCHAR, basecampaign VARCHAR, name VARCHAR, id VARCHAR, campaignbudget VARCHAR, startdate VARCHAR, enddate VARCHAR, adservingoptimizationstatus VARCHAR, advertisingchanneltype VARCHAR, advertisingchannelsubtype VARCHAR, experimenttype VARCHAR, servingstatus VARCHAR, biddingstrategytype VARCHAR, domainname VARCHAR, languagecode VARCHAR, usesuppliedurlsonly VARCHAR, positivegeotargettype VARCHAR, negativegeotargettype VARCHAR, paymentmode VARCHAR, optimizationgoaltypes VARCHAR, date VARCHAR, averagecost VARCHAR, clicks VARCHAR, costmicros VARCHAR, impressions VARCHAR, useaudiencegrouped VARCHAR, activeviewmeasurablecostmicros VARCHAR, costperallconversions VARCHAR, costperconversion VARCHAR, invalidclicks VARCHAR, publisherpurchasedclicks VARCHAR, averagepageviews VARCHAR, videoviews VARCHAR, allconversionsbyconversiondate VARCHAR, allconversionsvaluebyconversiondate VARCHAR, conversionsbyconversiondate VARCHAR, conversionsvaluebyconversiondate VARCHAR, valueperallconversionsbyconversiondate VARCHAR, valueperconversionsbyconversiondate VARCHAR, allconversions VARCHAR, absolutetopimpressionpercentage VARCHAR, searchabsolutetopimpressionshare VARCHAR, averagecpc VARCHAR, searchimpressionshare VARCHAR, searchtopimpressionshare VARCHAR, activeviewctr VARCHAR, ctr VARCHAR, relativectr VARCHAR);",
    },
    transformation_ctx="AmazonRedshift_node1704977113638",
)

job.commit()

Any help on what I can do to resolve this issue would be greatly appreciated. Thank you!

Matt_J
Asked 4 months ago · 927 views
2 Answers
0

Check the full stack trace and any errors around it that tell you what went wrong.
Often it's an issue with parsing the CSV data when Redshift runs COPY internally; check the Redshift system table stl_load_errors for details.
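For example, a query along the following lines (the column list and LIMIT are just illustrative) surfaces the most recent COPY failures:

-- Most recent load errors, newest first
SELECT query, starttime, filename, line_number, colname, err_code, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 20;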

AWS
EXPERT
Answered 4 months ago
  • Hi. Thank you for your response.

    I checked the logs but couldn't find any errors around the one I was getting; I couldn't even find that specific error itself. Also, I'm using DBeaver to connect to my Redshift cluster, and I queried stl_load_errors, but I am getting zero results. Maybe I am doing something wrong. Do you have any suggestions?

0

Hi there, I have tested and written a script for you:

from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder.getOrCreate()

# Redshift connection properties (replace with your own user and password)
redshift_properties = {
    "user": "admin",
    "password": "<Redshift password>",
}

# Define the path to the CSV file in S3
csv_file_path = "s3://<your-bucket>/employees.csv"

# Read the CSV file from S3 into a DataFrame
df = spark.read.format("csv").option("header", "true").load(csv_file_path)

# Write the DataFrame to the Redshift table
df.write.jdbc(
    url="jdbc:redshift://<your-workgroup>.<your-account>.ap-southeast-1.redshift-serverless.amazonaws.com:5439/dev",
    table="public.<your-table>",
    mode="overwrite",
    properties=redshift_properties,
)

Using "Author code with a script editor" instead of Visual ETL gives you more control over your code. Remember to create a connection to Redshift and test that connection. Also attach an IAM role to your job, using the same IAM role you tested the connection with.
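If you prefer not to hard-code credentials, one option is to read them from the Glue connection at runtime. Below is a minimal sketch, assuming a JDBC-type Glue connection named "Redshift connection" (the name used in the question's script) whose properties include USERNAME, PASSWORD, and JDBC_CONNECTION_URL, and a job role allowed to call glue:GetConnection:

import boto3

# Look up the Glue connection by name; keep the password visible so it can be reused
# ("Redshift connection" is assumed to match the connection configured for the job)
glue_client = boto3.client("glue")
connection = glue_client.get_connection(Name="Redshift connection", HidePassword=False)
props = connection["Connection"]["ConnectionProperties"]

jdbc_url = props["JDBC_CONNECTION_URL"]
redshift_properties = {
    "user": props["USERNAME"],
    "password": props["PASSWORD"],
}

# Reuse jdbc_url and redshift_properties in df.write.jdbc(...) instead of hard-coded values

This keeps the script in sync with whatever is configured on the connection you tested.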

Here is the result:

(screenshot of the result)

I have tested this and it works. If you need more help, let me know. If this was helpful, please upvote. 🤣

Answered 4 months ago
