Call AWS Lambda service in Glue script

Hello All,

I am working on a Glue PySpark script. In this script I read data from a table and store it in a PySpark DataFrame. Now I want to add a new column whose value is calculated by passing existing columns to a Lambda function and using the result it returns.

So, is it possible to call the Lambda service from a Glue script?

2 Answers

Hello Gonzalo, yes, I was thinking the same: calling Lambda as part of a UDF. Thanks for confirming this. Just one more thing I would like to ask: will these calls to Lambda be synchronous?

Let's say I have 100 rows in the DataFrame. Will Lambda then be called 100 times in parallel, once per row, with the whole process completing once we get a result for each row (be it a correct result or a failure)?

answered a year ago
  • The parallelism depends on the number of partitions in Spark. The job won't complete until all partitions and rows have been processed.
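To illustrate the point about partitions, here is a minimal sketch of per-partition processing. The names `enrich_partition`, `invoke_fn`, `new_col`, and `error` are hypothetical and not from the thread. Within one partition the rows are handled one after another, so the number of concurrent Lambda calls would roughly follow the number of partitions being processed at the same time, not the number of rows.

```python
def enrich_partition(rows, invoke_fn):
    """Process one partition: call invoke_fn once per row, sequentially.

    rows      -- iterable of dicts (one dict per DataFrame row)
    invoke_fn -- hypothetical callable taking a row dict and returning the new value
    """
    for row in rows:
        try:
            result, error = invoke_fn(row), None
        except Exception as exc:
            # Record the failure on the row instead of failing the whole task.
            result, error = None, str(exc)
        yield {**row, "new_col": result, "error": error}
```

In a Glue for Spark job this could plausibly be wired up as `df.rdd.mapPartitions(lambda it: enrich_partition((r.asDict() for r in it), invoke_fn))`, with `df.repartition(n)` beforehand to cap how many tasks call Lambda at once. Creating the boto3 client inside the partition function, not on the driver, avoids having to serialize it.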

Yes, you can call Lambda via boto3 inside your Glue code.
The issue is that calling it in a distributed way over the data (if you mean Glue for Spark) is a bit more complicated, and you are likely to get throttling errors and a much higher cost than if you ran that same Lambda code inside a Glue UDF (or, even better, Spark SQL).
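As a rough sketch of what the boto3 call might look like: the helper names `make_lambda_client` and `invoke_lambda`, the default region, and the JSON payload shape are assumptions for illustration only. `InvocationType="RequestResponse"` makes the call synchronous, so the caller blocks until the function returns.

```python
import json


def make_lambda_client(region_name="us-east-1"):
    """Create a boto3 Lambda client (the region is an assumed placeholder)."""
    import boto3  # imported lazily; boto3 is available in the Glue runtime
    return boto3.client("lambda", region_name=region_name)


def invoke_lambda(client, function_name, payload):
    """Invoke a Lambda function synchronously and return its decoded JSON result."""
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # synchronous: wait for the result
        Payload=json.dumps(payload).encode("utf-8"),
    )
    body = response["Payload"].read()
    # "FunctionError" is present in the response when the function itself failed.
    if "FunctionError" in response:
        raise RuntimeError(f"Lambda {function_name} failed: {body.decode('utf-8')}")
    return json.loads(body)
```

Passing the client in as an argument keeps the helper easy to stub in tests. Throttled calls would still need retry or backoff handling, which this sketch leaves out.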

AWS EXPERT
answered a year ago
