Running spark job from AWS Lambda


I would like to read data from an Iceberg table using AWS Lambda. I built all the code and containers, only to discover that AWS Lambda doesn't support the process substitution that Spark uses here: https://github.com/apache/spark/blob/121f9338cefbb1c800fabfea5152899a58176b00/bin/spark-class#L92

The error is: /usr/local/lib/python3.10/dist-packages/pyspark/bin/spark-class: line 92: /dev/fd/63: No such file or directory

Does anyone have an idea how this could be solved?

Asked a year ago · 1017 views
2 Answers

Hi,

You can certainly run it. The issue is that you can't initialise the environment that way.

See this post for inspiration.

https://plainenglish.io/blog/spark-on-aws-lambda-c65877c0ac96
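For context on the error itself: the failing spark-class line feeds the launcher output into a `while read` loop via process substitution (`< <(build_command ...)`), which relies on `/dev/fd` being present. A minimal sketch of the same pattern rewritten with a temp file, which avoids `/dev/fd` entirely (simplified: the `build_command` stub stands in for Spark's Java launcher invocation, and the real script also checks an exit-status flag):

```shell
#!/usr/bin/env bash
# Stand-in for Spark's launcher call, which prints the command one token per line.
build_command() {
  printf 'arg1\narg2\n'
}

CMD=()
TMPFILE="$(mktemp)"
build_command > "$TMPFILE"     # capture output to a file instead of < <(...)
while IFS= read -r ARG; do
  CMD+=("$ARG")
done < "$TMPFILE"              # plain redirection, no /dev/fd needed
rm -f "$TMPFILE"

echo "parsed ${#CMD[@]} args: ${CMD[*]}"   # → parsed 2 args: arg1 arg2
```

Patching the installed spark-class in your container image along these lines (or following the blog post above, which wraps the Spark invocation in its own entrypoint script) sidesteps the `/dev/fd/63` failure.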

If not, my suggestion is to switch to a Java-based Lambda (Spark and Iceberg both run on the Java Virtual Machine).

Best,

AWS
answered a year ago

Hi,

Not sure I understand your use case. If you just need to read the Iceberg table from the Lambda function, a better option might be to run a query in Athena using PyAthena or aws-sdk-pandas, which is also provided as a Lambda Layer.
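A minimal sketch of that approach with aws-sdk-pandas (awswrangler), assuming the Layer is attached to the function; the database and table names are illustrative placeholders, not from the question:

```python
# Sketch: read an Iceberg table from Lambda by querying it through Athena
# with aws-sdk-pandas (awswrangler), instead of running Spark in the function.
def lambda_handler(event, context):
    import awswrangler as wr  # provided by the AWS-managed aws-sdk-pandas Lambda Layer

    df = wr.athena.read_sql_query(
        sql="SELECT * FROM my_iceberg_table LIMIT 100",  # hypothetical table name
        database="my_database",                          # hypothetical Glue database
    )
    return {"rows": len(df)}
```

Athena reads Iceberg tables natively, so no Spark runtime is needed in the Lambda at all.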

If you are trying to run specific Spark jobs and would like the simplicity of a serverless function, you might want to have a look at AWS Glue.
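In that setup the Lambda can stay a thin trigger that starts the Glue job, which then runs the Spark/Iceberg work. A sketch with boto3; the job name is an illustrative assumption:

```python
# Sketch: Lambda triggers a Glue job that does the actual Spark/Iceberg processing.
def lambda_handler(event, context):
    import boto3  # bundled in the Lambda Python runtime

    glue = boto3.client("glue")
    run = glue.start_job_run(JobName="iceberg-spark-job")  # hypothetical job name
    return {"JobRunId": run["JobRunId"]}
```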

Hope this helps.

AWS
EXPERT
answered a year ago
