Running a Spark job from AWS Lambda


I would like to read data from an Iceberg table using AWS Lambda. I was able to create all the code and containers, only to discover that AWS Lambda doesn't allow the process substitution that Spark uses here: https://github.com/apache/spark/blob/121f9338cefbb1c800fabfea5152899a58176b00/bin/spark-class#L92

The error is: /usr/local/lib/python3.10/dist-packages/pyspark/bin/spark-class: line 92: /dev/fd/63: No such file or directory

Does anyone have an idea how this can be solved?

asked a year ago · 994 views
2 Answers

Hi,

You can certainly run it. The issue is that you can't initialise the environment that way.

See this post for inspiration: https://plainenglish.io/blog/spark-on-aws-lambda-c65877c0ac96
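
For example, here is a minimal sketch of what the handler could look like once the environment is prepared along the lines of that post (a container image with pyspark, a bundled JRE, the Iceberg runtime JAR, and a writable SPARK_HOME); the catalog name, warehouse bucket, and table below are placeholders for your own setup:

    # Minimal sketch: build the SparkSession directly in local mode inside
    # the Lambda handler. Catalog name, warehouse bucket, and table are
    # placeholders; the image must already have pyspark, a JRE, and the
    # Iceberg runtime JAR installed.
    from pyspark.sql import SparkSession

    def handler(event, context):
        spark = (
            SparkSession.builder
            .master("local[*]")  # single-node Spark inside the Lambda sandbox
            .appName("lambda-iceberg-read")
            .config("spark.sql.extensions",
                    "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
            .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
            .config("spark.sql.catalog.glue.catalog-impl",
                    "org.apache.iceberg.aws.glue.GlueCatalog")
            .config("spark.sql.catalog.glue.warehouse", "s3://my-bucket/warehouse")
            .getOrCreate()
        )
        df = spark.sql("SELECT * FROM glue.my_db.my_table LIMIT 10")
        rows = [row.asDict() for row in df.collect()]
        spark.stop()
        return {"rows": rows}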

If that doesn't work for you, my suggestion is to switch to a Java-based Lambda (Spark and Iceberg both run on the Java Virtual Machine).

Best,

AWS
answered a year ago

Hi,

I'm not sure I understand your use case. If you just need to read the Iceberg table from the Lambda function, a better option might be to invoke a query in Athena using PyAthena or aws-sdk-pandas, which is also provided as a Lambda Layer.
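
As a rough sketch (assuming the aws-sdk-pandas Lambda Layer is attached; the database and table names are placeholders):

    # Rough sketch: query the Iceberg table through Athena with
    # aws-sdk-pandas (awswrangler). Database and table names are placeholders.
    import awswrangler as wr

    def handler(event, context):
        df = wr.athena.read_sql_query(
            sql="SELECT * FROM my_iceberg_table LIMIT 10",
            database="my_database",
        )
        return df.to_dict(orient="records")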

If you are trying to run specific Spark jobs and would like the simplicity of a serverless function, you might want to have a look at AWS Glue.
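
If the Lambda only needs to trigger the Spark work, a sketch like the following can start a Glue job with boto3 (the job name and arguments are placeholders for your own Glue job definition):

    # Sketch: start an existing Glue job from Lambda. "my-spark-job" and
    # the job arguments are placeholders.
    import boto3

    glue = boto3.client("glue")

    def handler(event, context):
        run = glue.start_job_run(
            JobName="my-spark-job",
            Arguments={"--table": "my_db.my_iceberg_table"},
        )
        return {"JobRunId": run["JobRunId"]}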

Hope this helps.

AWS Expert
answered a year ago
