What permissions are necessary to query S3 from Athena + Spark?


I'm trying out the new Athena enhancement that provides support for Spark notebooks.

I have data in S3 that follows a partition scheme like this:

s3://my_bucket/path1/p1=1642/p2=431/p3=2023-02-02 00:00:00/*.parquet

I want to read it into a Spark DataFrame by prefix. Something like:

df = spark.read.option("recursiveFileLookup","true").parquet('s3://my_bucket/path1/p1=1642/p2=431/*')

When I try that, I get a "path doesn't exist" error.

Even if I put the full path to a .parquet file, I still get an error related to the p3 partition:

pyspark.sql.utils.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: fdatehb=2023-02-02 00:00:00

I even tried to simply read a file from S3 using plain Python:

    import io

    def read_file(bucket, key, encoding="utf-8") -> str:
        # bucket is presumably a boto3 S3 Bucket resource
        file_obj = io.BytesIO()
        bucket.download_fileobj(key, file_obj)
        # Rewind the buffer before decoding it as text
        file_obj.seek(0)
        wrapper = io.TextIOWrapper(file_obj, encoding=encoding)
        return wrapper.read()

And I still get errors related to permissions, which suggests to me that I have to set up some kind of relationship between Athena and the source bucket. I can already query this same data using "traditional" Athena (SQL), so it must be something additional (?)

But I can't find any documentation about this.

Can anyone point me in the right direction?

zack
asked a year ago · 525 views
2 Answers

Hello,

Have you set the region, or tried using the regional endpoint listed at https://docs.aws.amazon.com/general/latest/gr/s3.html ?

Another thing to check: how are you authenticating? See https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Authentication_properties (for example, fs.s3a.aws.credentials.provider).

Third, is there an S3 resource-based policy or ACL in place on the bucket?

And one more thing: is the bucket encrypted with KMS? If it is, you need to check the policy attached to the KMS key.
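To make the first two checks concrete, here is a minimal sketch of a helper that reports which identity the notebook is actually running as and which region the bucket lives in. The function name `describe_access` is made up for illustration; it only assumes it is handed boto3 STS and S3 clients, and whether those calls succeed from your session depends on the role's permissions.

```python
def describe_access(sts_client, s3_client, bucket):
    """Return the caller's ARN and the bucket's region (hypothetical helper)."""
    # Who am I? For an assumed role this is the session ARN of that role.
    identity = sts_client.get_caller_identity()
    # Where is the bucket? S3 reports None for buckets in us-east-1.
    location = s3_client.get_bucket_location(Bucket=bucket)
    return {
        "caller_arn": identity["Arn"],
        "bucket_region": location.get("LocationConstraint") or "us-east-1",
    }
```

In a notebook cell this could be called as `describe_access(boto3.client("sts"), boto3.client("s3"), "my_bucket")` after `import boto3`. If the reported ARN is not the role you expected, or the region differs from the one your session uses, that is where I would look first.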

answered a year ago

When you configure the Athena workgroup for Spark, you select an IAM role for it to use. That role needs to have permission to read those S3 files.
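As a rough illustration of what "permission to read those S3 files" could look like, the sketch below builds a read-only policy document for the bucket and prefix from the question. The function name `build_read_policy` and the optional KMS key ARN are assumptions for the example, not something taken from your setup; the KMS statement only matters if the bucket is encrypted with a KMS key.

```python
import json

def build_read_policy(bucket, prefix, kms_key_arn=None):
    """Build a read-only IAM policy document for one S3 prefix (illustrative)."""
    statements = [
        {
            # Listing is granted on the bucket itself, not on the objects.
            "Sid": "ListBucket",
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{bucket}"],
        },
        {
            # Reading is granted on the objects under the prefix.
            "Sid": "ReadObjects",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
        },
    ]
    if kms_key_arn:
        # Only needed when the objects are encrypted with this KMS key.
        statements.append({
            "Sid": "DecryptObjects",
            "Effect": "Allow",
            "Action": ["kms:Decrypt"],
            "Resource": [kms_key_arn],
        })
    return {"Version": "2012-10-17", "Statement": statements}

# Bucket and prefix taken from the question's example path.
print(json.dumps(build_read_policy("my_bucket", "path1"), indent=2))
```

The printed JSON would then be attached to the role selected for the Spark-enabled workgroup, alongside whatever other permissions that role already needs.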

AWS
EXPERT
answered a year ago
