Call AWS Lambda service in Glue script

Hello All,

I am working on a Glue PySpark script. In this script I read data from a table and store it in a PySpark DataFrame. Now I want to add a new column whose value is calculated by passing existing columns to a Lambda function and taking the result it returns.

So is it possible to call the Lambda service in a Glue script?

2 answers
Hello Gonzalo, yes, I was thinking the same: calling Lambda as part of a UDF. Thanks for confirming this. One more thing I would like to ask: will these calls to Lambda be synchronous?

Let's say I have 100 rows in the DataFrame. Will Lambda then be called 100 times in parallel, once per row, with the whole process completing once we get a result for each row (be it a correct result or a failure)?

answered a year ago
  • The parallelism depends on the number of partitions in Spark. The job won't complete until all partitions and rows are complete.
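
To illustrate the per-partition behaviour described in the comment above, here is a minimal sketch, not a definitive implementation. The function name `add_lambda_result`, the `client_factory` argument, the output key `lambda_result`, and the choice to record failures as `None` are all assumptions made for illustration. Rows are assumed to be plain dicts (for example via `Row.asDict()`). Each call uses `InvocationType="RequestResponse"`, which is synchronous, so rows inside one partition are processed one after another and concurrency comes only from partitions running at the same time.

```python
import json


def add_lambda_result(rows, client_factory, function_name, input_cols):
    """Process one partition: invoke Lambda once per row, sequentially.

    rows           -- iterator of dicts (one dict per row)
    client_factory -- hypothetical callable returning a Lambda client,
                      e.g. lambda: boto3.client("lambda")
    function_name  -- name of the Lambda function to invoke
    input_cols     -- columns whose values are sent as the payload
    """
    client = client_factory()  # one client per partition, created on the executor
    for row in rows:
        payload = {col: row[col] for col in input_cols}
        try:
            response = client.invoke(
                FunctionName=function_name,
                InvocationType="RequestResponse",  # blocks until Lambda returns
                Payload=json.dumps(payload).encode("utf-8"),
            )
            body = response["Payload"].read()
            # "FunctionError" appears in the response only if the function failed.
            result = None if "FunctionError" in response else json.loads(body)
        except Exception:
            # Assumption: a failed call is recorded as None instead of failing the job.
            result = None
        yield {**row, "lambda_result": result}
```

In a Glue for Spark job this could be wired up roughly as `df.rdd.map(lambda r: r.asDict()).mapPartitions(lambda it: add_lambda_result(it, lambda: boto3.client("lambda"), "my-function", ["col_a", "col_b"]))`, where `my-function`, `col_a`, and `col_b` are placeholder names. Repartitioning the DataFrame beforehand would then control how many invocations can be in flight at once.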

Yes, you can call Lambda via boto3 inside your Glue code.
The issue is that doing it distributed over the data (if you mean Glue for Spark) is a bit more complicated, and you are likely to get throttling errors and a much higher cost than if you ran that same Lambda code inside a Glue UDF (or, even better, Spark SQL).
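
As a rough sketch of the boto3 call mentioned above, the helper below invokes a Lambda function synchronously and decodes its JSON result. The helper name `invoke_lambda` is hypothetical, and the client is passed in as an argument (an assumption made so the function can be exercised without AWS credentials). In a real Glue job the client would typically come from `boto3.client("lambda")`.

```python
import json


def invoke_lambda(client, function_name, payload):
    """Synchronously invoke a Lambda function and return its decoded JSON result.

    client        -- a Lambda client, e.g. boto3.client("lambda")
    function_name -- name or ARN of the function to call
    payload       -- JSON-serialisable dict sent as the event
    """
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # synchronous: waits for the result
        Payload=json.dumps(payload).encode("utf-8"),
    )
    body = response["Payload"].read()
    # "FunctionError" is present in the response only when the function itself failed.
    if "FunctionError" in response:
        raise RuntimeError(body.decode("utf-8"))
    return json.loads(body)
```

Called as `invoke_lambda(boto3.client("lambda"), "my-function", {"col_a": 1})`, with `my-function` and `col_a` as placeholder names, this makes one request per call. Applied to every row of a large DataFrame, that is where the throttling and cost concerns above come from.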

AWS EXPERT
answered a year ago
