Call AWS Lambda service in a Glue script


Hello All,

I am working on a Glue PySpark script. In this script I read data from a table and store it in a PySpark DataFrame. Now I want to add a new column whose value is calculated by passing existing columns to a Lambda function and using the result it returns.

So, is it possible to call the Lambda service from a Glue script?

Asked 1 year ago · 326 views
2 Answers

Hello Gonzalo, yes, I was thinking the same: calling Lambda as part of a UDF. Thanks for confirming this. One more thing I would like to ask: will these calls to Lambda be synchronous?

Let's say I have 100 rows in the DataFrame. Will Lambda then be called 100 times in parallel, once per row, with the whole process completing once we have a result for each row (whether a correct result or a failure)?

Answered 1 year ago
  • The parallelism depends on the number of partitions in Spark. The job won't complete until all partitions and rows have been processed.
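
To illustrate the point about partitions, here is a minimal sketch of per-partition processing. It assumes each row can be represented as a plain dict and that `invoke_one` is some callable that sends one row to Lambda and returns the result. The names `enrich_partition`, `invoke_one`, `call_lambda` and `my-function` are hypothetical, not from the thread. Within a partition the rows are handled one after another, so the number of calls in flight at once is bounded by how many partitions Spark is processing concurrently, not by the row count.

```python
from typing import Any, Callable, Dict, Iterable, Iterator


def enrich_partition(
    rows: Iterable[Dict[str, Any]],
    invoke_one: Callable[[Dict[str, Any]], Any],
) -> Iterator[Dict[str, Any]]:
    """Call invoke_one for every row of one partition, sequentially.

    A failure for one row is recorded in the "error" field instead of
    aborting the whole partition, so every row ends with a result or an error.
    """
    for row in rows:
        try:
            result, error = invoke_one(row), None
        except Exception as exc:  # keep going; report the failure per row
            result, error = None, str(exc)
        yield {**row, "result": result, "error": error}


# Hypothetical wiring inside the Glue job (needs Spark and AWS credentials,
# so it is shown as a comment only):
#
#   import json, boto3
#
#   def call_lambda(partition):
#       client = boto3.client("lambda")  # created on the executor, once per partition
#       def invoke_one(row):
#           resp = client.invoke(FunctionName="my-function",
#                                InvocationType="RequestResponse",
#                                Payload=json.dumps(row).encode("utf-8"))
#           return json.loads(resp["Payload"].read())
#       return enrich_partition((r.asDict() for r in partition), invoke_one)
#
#   enriched_rdd = df.repartition(8).rdd.mapPartitions(call_lambda)
```

With this shape, changing the partition count (for example via `repartition`) is one way to influence how many invocations run side by side, which may help if throttling becomes a problem.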


Yes, you can call Lambda via boto3 inside your Glue code.
The issue is that doing it distributed over the data (if you mean Glue for Spark) is a bit more complicated: you are likely to get throttling errors and a much higher cost than if you ran that same Lambda code inside a Glue UDF (or, even better, Spark SQL).
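
As a rough sketch of what a synchronous call through boto3 might look like, the helper below wraps the Lambda client's `invoke` call with `InvocationType="RequestResponse"`, which blocks until the function returns. The helper name `invoke_lambda`, the function name `my-function` and the payload fields are assumptions for illustration; it also assumes the payload is JSON-serializable and that the function returns JSON.

```python
import json
from typing import Any, Dict


def invoke_lambda(client: Any, function_name: str, payload: Dict[str, Any]) -> Any:
    """Invoke a Lambda function synchronously and return its decoded JSON result."""
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the function to finish
        Payload=json.dumps(payload).encode("utf-8"),
    )
    body = response["Payload"].read()
    if response.get("FunctionError"):
        # Handler errors are reported in the response rather than raised by boto3.
        raise RuntimeError(f"{function_name} failed: {body.decode('utf-8')}")
    return json.loads(body)


# Hypothetical usage in the Glue script (requires AWS credentials, not run here):
#
#   import boto3
#   client = boto3.client("lambda")
#   value = invoke_lambda(client, "my-function", {"col_a": 1, "col_b": 2})
```

Passing the client in as a parameter keeps the helper easy to test with a stub. The Glue job's IAM role would also need permission to invoke the function.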

AWS Expert
Answered 1 year ago
