2 Risposte
- Più recenti
- Maggior numero di voti
- Maggior numero di commenti
0
Hello Gonzalo, Yes I was thinking same to call lambda as part of UDF . Thanks for confirming this. Just one more thing I would like to ask , will these call to lambda be synchronous .
Lets say if I have 100 rows in dataframe . Then lambda will called 100 times in parallel for each row and whole process gets completed once we get result for each row ( be it correct result or failure ).
con risposta un anno fa
0
Yes, you can call lambda via boto3 inside your Glue code.
The issue is that if you do it distributed on the data (if you mean Glue for Spark) is a bit more complicated and you are likely to get throttling errors and much higher cost than if you did that same lambda code inside a Glue udf (or even better SparkSQL)
Contenuto pertinente
- AWS UFFICIALEAggiornata 8 mesi fa
- AWS UFFICIALEAggiornata 2 anni fa
- AWS UFFICIALEAggiornata 2 anni fa
- AWS UFFICIALEAggiornata 2 anni fa
The parallelism depends on the number of partitions in Spark. It won't complete the job until all are partitions and rows are complete