What are the necessary permissions to query S3 from Athena + Spark?

I'm trying to play with the new Athena enhancement that provides support for Spark notebooks.

I have data in S3 that follows a partition scheme like this:

s3://my_bucket/path1/p1=1642/p2=431/p3=2023-02-02 00:00:00/*.parquet

I want to read it into a spark data frame by prefix. Something like:

df = spark.read.option("recursiveFileLookup","true").parquet('s3://my_bucket/path1/p1=1642/p2=431/*')

When I try that, I get an error that "path doesn't exist"

Even if I put the full path to a .parquet file, I still get an error related to the p3 partition:

pyspark.sql.utils.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: fdatehb=2023-02-02 00:00:00

I even tried to simply read a file from S3 using straight python:

import io

def read_file(bucket, key, encoding="utf-8") -> str:
    # Download the object into an in-memory buffer.
    file_obj = io.BytesIO()
    bucket.download_fileobj(key, file_obj)
    # Rewind to the start before decoding the bytes as text.
    file_obj.seek(0)
    wrapper = io.TextIOWrapper(file_obj, encoding=encoding)
    return wrapper.read()

And I still get errors related to permissions, which suggests that I have to set up some kind of relationship between Athena and the source bucket. I can already query this same data using "traditional" Athena (SQL), so it must be something additional (?)

But I can't find any documentation about this.

Who can point me in the right direction?

zack
asked a year ago · 514 views
2 Answers
Hello,

Have you set the region, or tried using the regional endpoint listed at https://docs.aws.amazon.com/general/latest/gr/s3.html ?

Another thing to check: how are you authenticating? See https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Authentication_properties (e.g. fs.s3a.aws.credentials.provider).

Third, is there any S3 resource-based policy or ACL in place for the bucket?

And one more thing: is the bucket encrypted with KMS? If it is, you need to check the policy attached to the KMS key.
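
To narrow down which of these checks is failing, it can help to run one small list call from the notebook and look at the error code that comes back. The following is only a sketch: `check_s3_access` is a hypothetical helper (not part of boto3 or Athena), and the bucket and prefix in the commented usage are taken from the question.

```python
def check_s3_access(s3_client, bucket: str, prefix: str) -> dict:
    """Try a small list call and report either the keys found or the error code."""
    try:
        response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=5)
    except Exception as exc:
        # botocore's ClientError carries a .response dict with Error.Code / Error.Message;
        # any other exception falls back to its class name and string form.
        error = getattr(exc, "response", {}).get("Error", {})
        return {
            "ok": False,
            "code": error.get("Code", type(exc).__name__),
            "message": error.get("Message", str(exc)),
        }
    keys = [obj["Key"] for obj in response.get("Contents", [])]
    return {"ok": True, "keys": keys}


# Usage inside the notebook (assumes boto3 is available in the session):
#   import boto3
#   print(boto3.client("sts").get_caller_identity()["Arn"])  # which role is in use
#   print(check_s3_access(boto3.client("s3"), "my_bucket", "path1/p1=1642/p2=431/"))
```

An `AccessDenied` code here would point at the role or bucket policy rather than at the Spark path syntax, while a successful listing with no keys would suggest the prefix itself is wrong.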

answered a year ago
When you configure the Athena workgroup for Spark, you select an IAM role for it to use. That role needs permission to read those S3 files.
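
As a rough illustration of what "permissions to read those S3 files" might look like, here is a sketch that builds a read-only policy document for one bucket and prefix. The helper name `build_s3_read_policy` and the statement IDs are made up for this example, the bucket and prefix come from the question, and the exact set of actions your role needs may differ (for instance, a KMS-encrypted bucket would also need access to the key).

```python
import json


def build_s3_read_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy document allowing list and read on one bucket/prefix."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself, not on objects.
                "Sid": "ListSourceBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Reading is granted on the objects under the prefix.
                "Sid": "ReadSourceObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/{prefix}*"],
            },
        ],
    }


policy = build_s3_read_policy("my_bucket", "path1/")
print(json.dumps(policy, indent=2))
```

The resulting JSON could then be attached to the workgroup's role, for example through the IAM console or as an inline policy.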

AWS
EXPERT
answered a year ago
