IAM identity of an AWS Glue job - AccessDenied exception


Hello - I have a Glue job that reads data from a Glue Catalog table and writes it back to S3 in Delta format.

The IAM role of the Glue job has s3:PutObject, List, Describe, and all the other permissions needed to interact with S3 (read and write). However, I keep running into this error:

2022-12-14 13:48:09,274 ERROR [Thread-9] output.FileOutputCommitter (FileOutputCommitter.java:setupJob(360)): Mkdirs failed to create glue-d-xxx-data-catalog-t-<dataset-name>-m-w://<s3-prefix>/_temporary/0

2022-12-14 13:48:13,875 WARN [task-result-getter-2] scheduler.TaskSetManager (Logging.scala:logWarning(73)): Lost task 5.0 in stage 1.0 (TID 6) (172.34.113.239 executor 2): java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: HG7ST1B44A6G30JC; S3 Extended Request ID: tR1CgoC1RcXZHEEZZ1DuDOvwIAmqC0+flXRd1ccdsY3C8PyjkEpS4wHDaosFoKpRskfH1Del/NA=; Proxy: null)

This error does not appear when I open up S3 bucket access with a wildcard principal ("Principal": "*") in the bucket policy. The job still fails even if I change the principal to the very same role that the Glue job is associated with.
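
For reference, a minimal sketch of the kind of bucket-policy statement being described; the bucket name and role ARN are placeholders, and the policy is applied here with boto3 purely for illustration:

import json
import boto3

BUCKET = "my-delta-bucket"  # hypothetical bucket name
GLUE_ROLE_ARN = "arn:aws:iam::111122223333:role/my-glue-job-role"  # hypothetical role ARN

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        # The job succeeds with {"AWS": "*"} here, but fails when the
        # principal is scoped down to the Glue job's role, as described above.
        "Principal": {"AWS": GLUE_ROLE_ARN},
        "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"],
        "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
    }],
}

boto3.client("s3").put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))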

Now, my question is: is there a different identity that AWS Glue assumes to run the job? The IAM role associated with the job has all the permissions needed to interact with S3, yet the job throws the AccessDenied exception above (failed to create directory). It only succeeds with the wildcard (*) principal on the bucket policy.
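
One quick way to check which identity the job actually runs under is to log the caller identity from inside the job itself. A minimal sketch, assuming boto3 (which ships with Glue Python jobs):

import boto3

# Print the identity this Glue job is actually executing as. For a Glue job
# this is normally an assumed-role session on the job's IAM role, e.g.
# arn:aws:sts::<account-id>:assumed-role/<job-role-name>/<session-name>.
identity = boto3.client("sts").get_caller_identity()
print(f"Account: {identity['Account']}, ARN: {identity['Arn']}")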

Just to add some more context: this error does not happen when I use native Glue constructs such as dynamic frames or Spark data frames to read, process, and persist data into S3. It only happens with the Delta format.

Below is the sample code:

src_dyf = glueContext.create_dynamic_frame.from_catalog(
    database="<db_name>",
    table_name="<table_name>"
)
dset_df = src_dyf.toDF()  # DynamicFrame to Spark DataFrame conversion

# Write the data frame into the S3 prefix in Delta format.
additional_options = {"path": "s3://<bucket>/<prefix>/"}  # target S3 path under key "path"
glueContext.write_data_frame_from_catalog(
    frame=dset_df,
    database="xxx_data_catalog",
    table_name="<table_name>",
    additional_options=additional_options,
)
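
As an aside, the Mkdirs failure on the _temporary/0 prefix suggests the Hadoop output committer used for the Delta write also needs list and delete access under the target prefix, not just s3:PutObject. This is only a hypothesis, but a minimal sketch of an inline role policy that would cover it (bucket, prefix, and role names are placeholders):

import json
import boto3

BUCKET = "my-delta-bucket"      # hypothetical bucket name
PREFIX = "my/delta/prefix"      # hypothetical target prefix
ROLE_NAME = "my-glue-job-role"  # hypothetical Glue job role name

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Listing the bucket lets the committer enumerate _temporary/ objects.
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {
            # Read/write/delete everything under the prefix, which covers the
            # _temporary/ staging objects and Delta's _delta_log/ directory.
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/{PREFIX}/*",
        },
    ],
}

boto3.client("iam").put_role_policy(
    RoleName=ROLE_NAME,
    PolicyName="delta-write-access",
    PolicyDocument=json.dumps(policy),
)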

1 Answer

Hello. The error you are facing generally occurs due to incomplete permissions in your IAM role or S3 bucket policy. To pinpoint the exact issue, we would need the resource IDs of your IAM role and S3 bucket, along with your Glue job details. To troubleshoot further, please feel free to open a support case with AWS using the following link, including those resource IDs, and we would be happy to help.

https://console.aws.amazon.com/support/home#/case/create

AWS
SUPPORT ENGINEER
answered 2 years ago
