Running spark job from AWS Lambda


I would like to read data from an Iceberg table using AWS Lambda. I was able to create all the code and containers, only to discover that AWS Lambda doesn't allow the process substitution that Spark uses here: https://github.com/apache/spark/blob/121f9338cefbb1c800fabfea5152899a58176b00/bin/spark-class#L92

The error is: /usr/local/lib/python3.10/dist-packages/pyspark/bin/spark-class: line 92: /dev/fd/63: No such file or directory

Does anyone have an idea how this can be solved?

posted a year ago · 1,016 views
2 Answers

Hi,

You can certainly run it. The issue is that you can't initialise the environment that way.

See this post for inspiration.

https://plainenglish.io/blog/spark-on-aws-lambda-c65877c0ac96
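
One way to "initialise the env" yourself is to shell out to spark-submit from the handler rather than importing pyspark directly. Below is a minimal sketch of that idea, not necessarily the post's exact approach; the job script name and env values are placeholders, and the SPARK_HOME path is taken from the error message in the question.

```python
# Hypothetical sketch: the Lambda handler runs spark-submit as a subprocess.
import os
import subprocess

def handler(event, context):
    env = dict(
        os.environ,
        SPARK_HOME="/usr/local/lib/python3.10/dist-packages/pyspark",
        # Lambda only allows writes under /tmp.
        SPARK_LOCAL_DIRS="/tmp/spark",
        HOME="/tmp",
    )
    # Note: spark-submit still goes through bin/spark-class, so the
    # process-substitution loop there also has to be dealt with at image
    # build time (e.g. rewritten to read build_command's output from a
    # file under /tmp); otherwise the /dev/fd/63 error will come back.
    result = subprocess.run(
        [os.path.join(env["SPARK_HOME"], "bin", "spark-submit"),
         "--master", "local[*]",
         "/var/task/read_iceberg_table.py"],  # placeholder job script
        env=env,
        capture_output=True,
        text=True,
        check=False,
    )
    return {
        "returncode": result.returncode,
        "stdout": result.stdout[-4000:],
        "stderr": result.stderr[-4000:],
    }
```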

If not, my suggestion here is to switch to a Java-based Lambda (Spark and Iceberg both run on the Java Virtual Machine).

Best regards.

AWS
answered a year ago

Hi,

Not sure I understand your use case. If you just need to read the Iceberg table from the Lambda function, a better option might be to invoke a query in Athena using pyAthena or aws-sdk-pandas, which is also provided as a Lambda Layer.
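
For example, with the aws-sdk-pandas (awswrangler) layer attached, reading the table through Athena takes only a few lines. This is a sketch with placeholder database and table names:

```python
import awswrangler as wr

def handler(event, context):
    # Athena reads the Iceberg table directly; no Spark runtime is needed
    # inside the Lambda function.
    df = wr.athena.read_sql_query(
        sql="SELECT * FROM my_iceberg_table LIMIT 10",  # placeholder table
        database="my_db",                               # placeholder database
        ctas_approach=False,  # run a plain query, no CTAS temp table
    )
    return df.to_dict(orient="records")
```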

If you are trying to run specific Spark jobs and would like the simplicity of a serverless service, you might want to have a look at AWS Glue.

Hope this helps.

AWS
EXPERT
answered a year ago
