Call AWS Lambda service in a Glue script

0

Hello All,

I am working on a Glue PySpark script. In this script I read data from a table and store it in a PySpark DataFrame. Now I want to add a new column whose value is calculated by passing existing columns to a Lambda function and using the result it returns.

So is it possible to call the Lambda service in a Glue script?

2 Answers
0

Hello Gonzalo, yes, I was thinking the same: call Lambda as part of a UDF. Thanks for confirming this. Just one more thing I would like to ask: will these calls to Lambda be synchronous?

Let's say I have 100 rows in the DataFrame. Will Lambda then be called 100 times in parallel, once per row, with the whole process completing once we get a result for each row (be it a correct result or a failure)?

answered a year ago
  • The parallelism depends on the number of partitions in Spark. The job won't complete until all partitions and rows are complete.
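To illustrate the point about partitions, here is a minimal sketch of invoking Lambda once per row while reusing one boto3 client per partition. The names `invoke_for_partition`, `make_lambda_client`, `lambda_result`, and `lambda_error` are hypothetical and not from the thread; the sketch assumes the function accepts a JSON object of the row's columns and returns JSON. `InvocationType="RequestResponse"` makes each call synchronous, so rows within a partition are processed one after another, and only separate partitions run in parallel.

```python
import json


def make_lambda_client():
    # Imported lazily so this module also loads where boto3 is not installed.
    import boto3
    return boto3.client("lambda")


def invoke_for_partition(rows, function_name, client_factory=make_lambda_client):
    """Invoke the Lambda function once per row, with one client per partition."""
    client = client_factory()
    for row in rows:
        # Spark Row objects expose asDict(); plain dicts are accepted too.
        record = row.asDict() if hasattr(row, "asDict") else dict(row)
        try:
            response = client.invoke(
                FunctionName=function_name,
                InvocationType="RequestResponse",  # synchronous: waits for the result
                # default=str is an assumption to serialize dates, decimals, etc.
                Payload=json.dumps(record, default=str).encode("utf-8"),
            )
            body = response["Payload"].read()
            if response.get("FunctionError"):
                # The function ran but raised an error; keep the row, flag the failure.
                record["lambda_result"] = None
                record["lambda_error"] = body.decode("utf-8")
            else:
                record["lambda_result"] = json.loads(body)
                record["lambda_error"] = None
        except Exception as exc:
            # Throttling or network errors end up here instead of failing the job.
            record["lambda_result"] = None
            record["lambda_error"] = str(exc)
        yield record
```

In a Glue for Spark job this could be applied with something like `df.rdd.mapPartitions(lambda rows: invoke_for_partition(rows, "my-function"))`, where `"my-function"` is a placeholder. Calling `df.repartition(n)` beforehand would cap the number of concurrent callers at roughly `n`, which may help with throttling.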

0

Yes, you can call Lambda via boto3 inside your Glue code.
The issue is that doing it distributed over the data (if you mean Glue for Spark) is a bit more complicated, and you are likely to get throttling errors and a much higher cost than if you ran that same Lambda code inside a Glue UDF (or, even better, Spark SQL).
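As a rough sketch of the alternative suggested here, the logic that would live in the Lambda handler can be moved into the Glue job itself, either as a Python UDF or as a native column expression. Everything named below is hypothetical: the rule `score` and the columns `price`, `quantity`, and `score` stand in for whatever the real Lambda computes.

```python
def score(price, quantity):
    """Hypothetical rule that would otherwise run inside the Lambda handler."""
    if price is None or quantity is None:
        return None
    # Return a float so the value matches the DoubleType declared for the UDF.
    return float(price) * quantity


def add_score_column_udf(df):
    # Imported lazily; pyspark is provided by the Glue for Spark runtime.
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType

    score_udf = F.udf(score, DoubleType())
    return df.withColumn("score", score_udf(F.col("price"), F.col("quantity")))


def add_score_column_native(df):
    # Same rule as a built-in column expression: no Python call per row,
    # and nulls propagate automatically through the multiplication.
    from pyspark.sql import functions as F

    return df.withColumn(
        "score", F.col("price").cast("double") * F.col("quantity")
    )
```

Both variants avoid one network call per row. The native expression is usually the cheaper of the two because Spark can evaluate it without handing each row to a Python worker.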

AWS
EXPERT
answered a year ago
