What are the necessary permissions to query S3 from Athena + Spark?


I'm trying out the new Athena enhancement that provides support for Spark notebooks.

I have data in S3 that follows a partition scheme like this:

s3://my_bucket/path1/p1=1642/p2=431/p3=2023-02-02 00:00:00/*.parquet

I want to read it into a Spark DataFrame by prefix. Something like:

df = spark.read.option("recursiveFileLookup","true").parquet('s3://my_bucket/path1/p1=1642/p2=431/*')

When I try that, I get a "path doesn't exist" error.

Even if I put the full path to a .parquet file, I still get an error related to the p3 partition:

pyspark.sql.utils.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: fdatehb=2023-02-02 00:00:00

I even tried to simply read a file from S3 using plain Python:

    import io

    def read_file(bucket, key, encoding="utf-8") -> str:
        # Download the object into an in-memory buffer.
        file_obj = io.BytesIO()
        bucket.download_fileobj(key, file_obj)
        # Rewind the buffer before decoding its bytes as text.
        file_obj.seek(0)
        wrapper = io.TextIOWrapper(file_obj, encoding=encoding)
        return wrapper.read()

And I still get permission-related errors, which suggests to me that I have to set up some kind of relationship between Athena and the source bucket. I can already query this same data using "traditional" Athena (SQL), so it must be something additional (?)

But I can't find any documentation about this.

Who can point me in the right direction?

zack
asked a year ago, 525 views
2 Answers

Hello,

Have you set the region, or tried using the regional endpoint listed at https://docs.aws.amazon.com/general/latest/gr/s3.html ?

Another thing to check: how are you authenticating? See https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Authentication_properties (e.g. fs.s3a.aws.credentials.provider).

Third, is there an S3 resource-based policy or ACL in place for the bucket?

One more thing: is the bucket encrypted with KMS? If it is, check the policy attached to the KMS key.

answered a year ago

When you configure the Athena workgroup for Spark, you select a role to use. That role needs permission to read those S3 files.
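
As a rough illustration of what "permissions to read those S3 files" could look like, here is a minimal sketch that builds a read-only IAM policy document for one bucket prefix. The helper name `build_s3_read_policy` is hypothetical, the bucket and prefix are taken from the question, and the statement layout is an assumption about a typical setup rather than the exact policy Athena requires. A KMS-encrypted bucket would likely need additional key permissions that are not shown here.

```python
import json


def build_s3_read_policy(bucket: str, prefix: str = "") -> dict:
    """Build a read-only IAM policy document for one bucket (hypothetical helper)."""
    prefix = prefix.strip("/")
    bucket_arn = f"arn:aws:s3:::{bucket}"
    # Object-level reads are scoped to the prefix when one is given.
    object_arn = f"{bucket_arn}/{prefix}/*" if prefix else f"{bucket_arn}/*"

    # Listing is a bucket-level action, so it targets the bucket ARN itself.
    list_statement = {
        "Effect": "Allow",
        "Action": ["s3:ListBucket"],
        "Resource": [bucket_arn],
    }
    if prefix:
        # Assumption: restrict listing to the prefix and everything beneath it.
        list_statement["Condition"] = {
            "StringLike": {"s3:prefix": [prefix, f"{prefix}/*"]}
        }

    get_statement = {
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": [object_arn],
    }
    return {"Version": "2012-10-17", "Statement": [list_statement, get_statement]}


policy = build_s3_read_policy("my_bucket", "path1")
print(json.dumps(policy, indent=2))
```

The printed JSON could be attached to the role selected for the Spark-enabled workgroup. Listing and reading are kept as separate statements because `s3:ListBucket` applies to the bucket ARN while `s3:GetObject` applies to object ARNs.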

AWS
EXPERT
answered a year ago
