IAM identity of AWS Glue job - AccessDenied exception


Hello - I have a Glue job that reads data from a Glue Data Catalog table and writes it back to S3 in Delta format.

The IAM role of the Glue job has s3:PutObject, List, Describe, and all the other permissions needed to read from and write to S3. However, I keep running into this error:

2022-12-14 13:48:09,274 ERROR [Thread-9] output.FileOutputCommitter (FileOutputCommitter.java:setupJob(360)): Mkdirs failed to create glue-d-xxx-data-catalog-t-<dataset-name>-m-w://<s3-prefix>/_temporary/0

2022-12-14 13:48:13,875 WARN [task-result-getter-2] scheduler.TaskSetManager (Logging.scala:logWarning(73)): Lost task 5.0 in stage 1.0 (TID 6) (172.34.113.239 executor 2): java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: HG7ST1B44A6G30JC; S3 Extended Request ID: tR1CgoC1RcXZHEEZZ1DuDOvwIAmqC0+flXRd1ccdsY3C8PyjkEpS4wHDaosFoKpRskfH1Del/NA=; Proxy: null)

This error goes away when I open up S3 bucket access with a wildcard principal (Principal: "*") in the bucket policy. The job fails even if I set the principal to the same IAM role that the Glue job is associated with.

Now, my question is: does AWS Glue assume a different identity to run the job? The IAM role associated with the job has all the permissions needed to interact with S3, yet it throws the AccessDenied exception above (failed to create directory). The job succeeds only with a wildcard (*) principal in the S3 bucket policy.
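One way I can think of to check this is to print the caller identity from inside the job. Below is a minimal sketch (boto3 ships with Glue; for an assumed role, the printed ARN should show the assumed-role session):

import boto3

# Print the identity the job is actually running as.
# For an assumed role, the ARN looks like:
# arn:aws:sts::<account-id>:assumed-role/<glue-job-role-name>/<session-name>
identity = boto3.client("sts").get_caller_identity()
print("Account:", identity["Account"])
print("Caller ARN:", identity["Arn"])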

To add some more context: this error does not happen when I use native Glue constructs such as DynamicFrames or Spark DataFrames to read, process, and persist data into S3. It only happens with the Delta format.

Below is the sample code:

src_dyf = glueContext.create_dynamic_frame.from_catalog(database="<db_name>", table_name="<table_name_glue_catalog>")
dset_df = src_dyf.toDF()  # DynamicFrame to DataFrame conversion

# Write the DataFrame to the S3 prefix in Delta format.
glueContext.write_data_frame_from_catalog(
    frame=dset_df,
    database="xxx_data_catalog",
    table_name="<table_name>",
    additional_options=additional_options  # key-value pairs, including the "path" key with the target S3 path
)
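
For comparison, the same write can also be expressed with the plain Spark DataFrame writer. This is just a sketch with a placeholder path, and it assumes the job was created with the --datalake-formats job parameter set to delta so the Delta Lake libraries are on the classpath:

# Assumes --datalake-formats=delta on the job; the path is a placeholder.
dset_df.write.format("delta").mode("overwrite").save("s3://<bucket>/<s3-prefix>/")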

1 Answer

Hello. The error you are facing generally occurs due to incomplete permissions in your IAM role or S3 bucket policy. To pin down the exact issue, we would need the resource IDs of your IAM role and S3 bucket, along with your Glue job details. To troubleshoot further, please feel free to open a support case with AWS using the following link, including those resource IDs, and we would be happy to help.

https://console.aws.amazon.com/support/home#/case/create
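
As a rough illustration only (all identifiers below are placeholders, and your actual policy may need additional actions), a bucket policy that names the job's role explicitly instead of a wildcard principal could be applied like this:

import json
import boto3

# Placeholder account ID, role name, and bucket name.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowGlueJobRole",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::<account-id>:role/<glue-job-role>"},
        "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::<bucket-name>",
            "arn:aws:s3:::<bucket-name>/*"
        ]
    }]
}
boto3.client("s3").put_bucket_policy(Bucket="<bucket-name>", Policy=json.dumps(policy))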

AWS
Support Engineer
answered a year ago
