Using Lambda functions rather than GPUs for cost reduction


Current scenario: Our company has a web application where users can upload documents for various processing tasks. This processing currently runs on GPUs and on services hosted on ECS. To reduce costs, our company wants to modify the architecture to use Lambda functions on the backend. Our architects believe that sending a huge number of calls to Lambda in parallel would achieve the same performance as the GPU/ECS services while bringing large cost reductions.

However, during our initial load testing, we found that the Lambda function shows a significant error rate. With provisioned concurrency the success rate improves. Below, I am attaching the results both without and with provisioned concurrency.

Without provisioned concurrency: 15% error rate

With provisioned concurrency: 2% error rate

During load testing we receive the following four types of errors:

- Gateway Timeout
- Remote host terminated the handshake
- Failed to respond
- Timeout error

Requirement: Our company wants to avoid spending too much on provisioned concurrency while still achieving good throughput and a good success rate.

Example case: 10,000 users using the web application. Each user uploads 1 document of 500 pages. Each page has around 10 segments, and each segment requires one back-end Lambda function call.

Total Lambda function calls required for this case: 10,000 x 500 x 10 = 50,000,000
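As a sanity check on the scale involved, here is a rough back-of-the-envelope estimate of how long these invocations would take at a fixed concurrency (the 2-second average segment duration is purely a hypothetical assumption; substitute your measured value):

```python
# Back-of-the-envelope throughput estimate (all values illustrative).
users = 10_000
pages_per_doc = 500
segments_per_page = 10
total_invocations = users * pages_per_doc * segments_per_page  # 50,000,000

# Hypothetical average duration per segment, in seconds.
avg_duration_s = 2
# Default per-account concurrent execution limit.
concurrency = 1_000

total_compute_s = total_invocations * avg_duration_s
wall_clock_s = total_compute_s / concurrency

print(f"Total invocations: {total_invocations:,}")
print(f"Wall-clock time at {concurrency} concurrency: {wall_clock_s / 3600:.1f} h")
```

Under these assumptions, 50 million 2-second invocations at the default concurrency of 1,000 keep the account fully saturated for over a day, which is why the limits discussed in the answers below matter so much.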

We have the following questions for this scenario:

1. How do we configure the Lambda function to support this many calls?
2. Can a Lambda function support this many calls without provisioned concurrency?
3. What would be a cost-effective, good-throughput value for provisioned concurrency?
4. How does the provisioned concurrency cost compare to the GPU compute service?
5. Is a significant cost reduction possible by moving to Lambda functions?

asked a year ago · 294 views
2 Answers

Given the volume of calls and the architecture you're implementing, I would strongly recommend reaching out to your local AWS Solutions Architect, who can help here. It's highly likely you're bumping into some soft limits in the Lambda service, which are designed to protect you from runaway processes but in this case are not letting you scale the way you would like.

I've seen much higher Lambda invocation rates than this with other customers, but it did take some work to get there in order to avoid overwhelming legacy back-end systems that the Lambda functions depended on.

answered a year ago

Lambda has two relevant limits: the concurrency limit, which is the maximum number of function instances that can run at once, and the burst limit, which is how fast you can scale up to the first limit. By default the concurrency limit is 1,000 per account (it can easily be increased), and the burst limit is 3,000/1,000/500 depending on the region, with an additional 500 instances per minute; this one is more complex to increase.
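As a sketch of how these limits are adjusted (the function name, alias, and numbers below are placeholders, and you should verify the quota code in your own account before filing the request):

```shell
# Request a higher account-level concurrency limit via Service Quotas.
# L-B99A9384 is commonly cited as the Lambda "Concurrent executions"
# quota code; confirm it with `aws service-quotas list-service-quotas
# --service-code lambda` first.
aws service-quotas request-service-quota-increase \
    --service-code lambda \
    --quota-code L-B99A9384 \
    --desired-value 5000

# Reserve a slice of that concurrency for the processing function
# (placeholder function name).
aws lambda put-function-concurrency \
    --function-name segment-processor \
    --reserved-concurrent-executions 2000

# Optionally add provisioned concurrency on a published version or
# alias, if cold starts remain a problem at the chosen scale.
aws lambda put-provisioned-concurrency-config \
    --function-name segment-processor \
    --qualifier prod \
    --provisioned-concurrent-executions 500
```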

To overcome these limits I would recommend moving to an asynchronous architecture, in which you send all segments to an SQS queue and let Lambda process the queue. The way Lambda scales when consuming from a queue will resolve all the scaling issues you are currently facing, and requests that cannot be served immediately simply wait in the queue instead of failing.
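A minimal sketch of that consumer side, assuming each SQS message body is a JSON description of one segment (the message shape and the `process_segment` stub are hypothetical stand-ins for your real processing logic):

```python
import json

def process_segment(msg):
    """Hypothetical stand-in for the real per-segment processing."""
    return {
        "document_id": msg["document_id"],
        "segment": msg["segment"],
        "status": "done",
    }

def handler(event, context):
    """Lambda handler invoked by the SQS event source mapping.

    Lambda delivers a batch of SQS records in `event["Records"]`;
    each record's `body` is assumed to be JSON like
    {"document_id": "...", "page": 1, "segment": 3, "text": "..."}.
    """
    processed = []
    for record in event["Records"]:
        msg = json.loads(record["body"])
        processed.append(process_segment(msg))
    return {"processed": len(processed)}
```

With this pattern the upload path only enqueues messages, and Lambda's SQS poller scales the number of concurrent consumers up to the function's concurrency limit, so bursts above that limit queue up rather than erroring out.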

Also, as @Brettski-AWS recommended, talk to an AWS SA.

answered a year ago
