Running a Spark job from AWS Lambda


I would like to read data from an Iceberg table using AWS Lambda. I was able to create all the code and containers, only to discover that AWS Lambda doesn't allow the process substitution that Spark uses here: https://github.com/apache/spark/blob/121f9338cefbb1c800fabfea5152899a58176b00/bin/spark-class#L92

The error is: /usr/local/lib/python3.10/dist-packages/pyspark/bin/spark-class: line 92: /dev/fd/63: No such file or directory

Does anyone have an idea how this can be solved?
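For context, the failing line in spark-class reads the launcher's output through `< <(build_command "$@")`, which bash implements via `/dev/fd`. A workaround some people patch in (this is a sketch of the pattern, not an official fix, and `build_command` below is a simplified stand-in for the real launcher call) is to write the launcher output to a temp file and read it back with a plain redirection, which needs no `/dev/fd`:

```shell
#!/bin/bash
# Sketch: the same read loop spark-class uses, but fed from a temp file
# instead of process substitution, so it works where /dev/fd is absent.

build_command() {   # stand-in for the real launcher invocation in spark-class
  printf '%s\n' java -cp app.jar Main
}

TMP="$(mktemp)"
build_command > "$TMP"   # materialise the launcher output on disk

CMD=()
while IFS= read -r ARG; do
  CMD+=("$ARG")
done < "$TMP"            # plain file redirection: no /dev/fd needed
rm -f "$TMP"

printf '%s\n' "${CMD[@]}"
```

The trade-off is an extra file write per invocation, which is negligible next to JVM startup.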

Asked 1 year ago · Viewed 1,017 times
2 Answers

Hi,

You can certainly run it. The thing is that you can't initialise the environment that way.

See this post for inspiration.

https://plainenglish.io/blog/spark-on-aws-lambda-c65877c0ac96

If not, my suggestion here is to change to a Java-based Lambda (Spark and Iceberg are built on the Java Virtual Machine).
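One way to avoid initialising the environment through spark-submit (and therefore spark-class) is to build the SparkSession directly inside the Lambda handler. A minimal sketch, assuming a container image with pyspark and the Iceberg runtime jar; the catalog name, warehouse path, and table are placeholders, and the conf keys follow Iceberg's documented Spark integration:

```python
def iceberg_spark_conf(warehouse: str) -> dict:
    """Spark conf for a local-mode session with a Hadoop-backed Iceberg catalog."""
    return {
        "spark.master": "local[*]",  # everything runs inside the Lambda container
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        "spark.sql.catalog.local": "org.apache.iceberg.spark.SparkCatalog",
        "spark.sql.catalog.local.type": "hadoop",
        "spark.sql.catalog.local.warehouse": warehouse,
    }

def handler(event, context):
    # Import inside the handler: pyspark only exists in the container image.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("lambda-iceberg")
    for key, value in iceberg_spark_conf("s3a://my-bucket/warehouse").items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()
    try:
        # Placeholder table; Spark never goes through spark-class this way.
        return spark.sql("SELECT count(*) FROM local.db.my_table").collect()[0][0]
    finally:
        spark.stop()
```

Because the JVM is launched in-process via py4j rather than through the spark-class shell script, the process-substitution line is never executed.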

Best.

AWS
Answered 1 year ago

Hi,

Not sure I understand your use case. If you just need to read the Iceberg table from the Lambda function, a better option might be to invoke a query in Athena using pyAthena or aws-sdk-pandas, which is also provided as a Lambda Layer.
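As a sketch of that option, assuming the aws-sdk-pandas (awswrangler) Lambda Layer is attached and using placeholder database/table names (Athena can query Iceberg tables natively):

```python
# Placeholder query against an Iceberg table registered in the Glue catalog.
QUERY = "SELECT * FROM my_iceberg_table LIMIT 100"

def handler(event, context):
    # Import inside the handler: awswrangler comes from the Lambda Layer.
    import awswrangler as wr

    # Runs the query in Athena and returns the result as a pandas DataFrame.
    df = wr.athena.read_sql_query(QUERY, database="my_database")
    return df.to_dict(orient="records")
```

This keeps the Lambda package tiny, since all the heavy lifting (and the Spark/Iceberg machinery) stays on the Athena side.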

If you are trying to run some specific Spark jobs and you would like the simplicity of a serverless service, you might want to have a look at AWS Glue.
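In that setup the Lambda function only triggers an existing Glue job and the Spark work runs in Glue. A minimal sketch, where the job name and argument key are placeholders:

```python
def glue_job_arguments(table: str) -> dict:
    """Glue exposes job parameters as '--name' keys; build them here."""
    return {"--iceberg_table": table}

def handler(event, context):
    # Import inside the handler; boto3 is bundled in the Lambda runtime.
    import boto3

    glue = boto3.client("glue")
    run = glue.start_job_run(
        JobName="my-iceberg-spark-job",              # placeholder job name
        Arguments=glue_job_arguments("db.my_table"), # placeholder table
    )
    return run["JobRunId"]
```

Note that `start_job_run` is asynchronous: Lambda returns immediately with the run id, and you would poll `get_job_run` or use an EventBridge rule if you need the result.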

Hope this helps.

AWS
Expert
Answered 1 year ago
