Fargate vs. Lambda Performance Issue


Hi! We have some .NET 6 code that we run on both Lambda and Fargate (we can configure which container we use for each of our workloads, based on criteria linked to how long it usually takes). So the code is identical: we've built a NuGet package to encapsulate our logic, and then just wrap it for each container.

This has worked fine for the last few years. Nearly 2 weeks ago, Fargate started to run at < 1% of the performance of Lambda. We've narrowed it down to the points where we execute SQL commands, run asynchronously, against MS SQL running on EC2. These are execute-only SPs (no significant data is changing hands), and because all the work is being done on SQL, the container is pretty much just waiting. On Lambda, they run in a matter of milliseconds. On Fargate, it's minutes. And it seems to scale (so a 10 sec SP on Lambda takes many hours on Fargate).

Does anyone have any ideas of where to look, or what we could do to diagnose this issue? We don't think anything has changed, and Lambda continues to perform. We've rolled back code, and Fargate continues to not perform. We've put debug versions out, changed our Docker layers, and used the .NET 8 runtime; nothing makes any difference.

We're not really sure at this point how we examine the container for how it is performing or what the problem might be. It doesn't fail, so there is nothing to debug. The container is in a waiting state, so CPU is idle. Memory is minimal at the start of the process, and it's happening there, so it's not memory based. I have gut feelings (something to do with CPU allocation, thread priority / allocation), but I've no idea how I go about proving or disproving them, or what related container configuration we could try to see if it fixes it. Anyone have any ideas? Has anyone seen this behaviour as well?

Any help would be hugely appreciated! Thank you

asked 5 months ago, 628 views
1 Answer

Hi David Markham,

Please try the steps below; they should help you resolve your issue.

Review Networking Configuration:

  • VPC Configuration: Ensure that both Lambda and Fargate tasks are configured within the same VPC and subnet. Check for any differences in networking setups that might cause latency issues.
  • Security Groups: Ensure the security groups attached to your Fargate tasks allow traffic to and from the MS SQL server.
  • NAT Gateway/Bastion Host: Verify that the Fargate tasks have a proper route to reach the internet or the MS SQL server if needed through NAT gateways or bastion hosts.

Resource Allocation:

  • Fargate Task Definition: Check the CPU and memory settings for your Fargate tasks. Ensure that they are appropriately allocated and match the performance requirements.
  • Scaling Policies: Ensure that the scaling policies and configurations are suitable for your workload. Sometimes under-provisioning resources can lead to significant performance degradation.

Check for Throttling or Limits:

  • EC2 Instance Limits: Verify that the EC2 instances hosting the SQL server are not hitting any limits (CPU, IO, Network). Check CloudWatch metrics for the EC2 instances to ensure they are not the bottleneck.
  • RDS Instance Limits: If you are using RDS, check the instance class and ensure it is not under-provisioned.

SQL Server Performance:

  • Execution Plans: Ensure the execution plans for stored procedures are optimized. Differences in execution plans might cause significant performance variations.
  • Index Fragmentation: Check for index fragmentation on the tables involved and defragment if necessary.
  • Parameter Sniffing: Ensure parameter sniffing is not causing issues with the stored procedure execution.

Container Insights and Monitoring:

  • CloudWatch Logs: Enable and review detailed logs from Fargate tasks. Look for any anomalies or patterns that might indicate the cause of the delay.
  • Container Insights: Use AWS CloudWatch Container Insights to monitor CPU, memory, disk, and network metrics for your Fargate tasks.

Database Connection Configuration:

  • Connection Pooling: Ensure that the database connection pooling is configured properly for Fargate tasks. Misconfigured connection pooling can lead to performance issues.
  • Connection Strings: Review the connection strings and ensure they are consistent and optimized for both Lambda and Fargate.
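
As a rough sketch of the connection string review above, the snippet below compares two strings key by key and reports only the settings that differ. It is a simplification, not a full parser: it assumes plain `Key=Value;` pairs with no quoted values, it does not normalize keyword synonyms (for example `Server` vs `Data Source`), and every value in the sample strings is hypothetical.

```python
def parse_connection_string(raw):
    """Split a 'Key=Value;Key=Value' string into a dict with lower-cased keys.

    Simplified on purpose: quoted values containing ';' or '=' are not handled.
    """
    settings = {}
    for part in raw.split(";"):
        if not part.strip():
            continue  # skip empty segments such as a trailing ';'
        key, _, value = part.partition("=")
        settings[key.strip().lower()] = value.strip()
    return settings


def diff_settings(left, right, secret_keys=("password", "pwd")):
    """Return {key: (left_value, right_value)} for keys that differ or are missing."""
    a = parse_connection_string(left)
    b = parse_connection_string(right)
    differences = {}
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            if key in secret_keys:
                # Never echo credentials into logs; only flag that they differ.
                differences[key] = ("<redacted>", "<redacted>")
            else:
                differences[key] = (a.get(key), b.get(key))
    return differences


# Hypothetical sample values, for illustration only.
lambda_cs = "Server=10.0.0.10;Database=Jobs;Max Pool Size=100"
fargate_cs = "Server=10.0.0.10;Database=Jobs;Connect Timeout=30"
print(diff_settings(lambda_cs, fargate_cs))
```

A key that appears on only one side shows up with `None` for the other, which makes it easy to spot a pooling or timeout setting that exists in one environment but not the other.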

Compare Fargate Task Versions:

  • Docker Image Differences: Ensure there are no differences in the Docker images used for Lambda and Fargate. Rebuild and deploy a fresh image if necessary.
  • Environment Variables: Ensure that the environment variables are configured correctly and consistently between Lambda and Fargate tasks.

Network Latency and Throughput:

  • Latency Testing: Conduct network latency testing between Fargate tasks and the SQL server. Tools like ping, traceroute, or custom scripts can help identify latency issues.
  • Bandwidth Utilization: Ensure the network bandwidth is not a bottleneck. High latency or low bandwidth can significantly affect performance.
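
If ping or traceroute are not available in the container image, or ICMP is blocked by the security groups, timing a plain TCP connect to the SQL Server port gives a comparable latency signal. The sketch below assumes the default SQL Server port 1433 and a host address you would supply yourself; the function names are illustrative only.

```python
import socket
import statistics
import time


def tcp_connect_times(host, port, attempts=5, timeout=3.0):
    """Return TCP connect durations in milliseconds, one entry per attempt.

    A failed or timed-out attempt is recorded as None so it stays visible.
    """
    results = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results.append((time.perf_counter() - start) * 1000.0)
        except OSError:
            results.append(None)
    return results


def summarize_connects(samples):
    """Summarize connect times, counting failures separately."""
    ok = [s for s in samples if s is not None]
    return {
        "attempts": len(samples),
        "failures": len(samples) - len(ok),
        "min_ms": min(ok) if ok else None,
        "median_ms": statistics.median(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }
```

Running `summarize_connects(tcp_connect_times("<sql-server-host>", 1433))` from both the Fargate task and the Lambda function, and comparing the two summaries, would show whether the network path itself differs between the two environments.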

Use CloudWatch and X-Ray for Monitoring: Enable AWS X-Ray for your Fargate tasks to trace the SQL execution path and identify bottlenecks. Use CloudWatch metrics to monitor the detailed performance of your Fargate tasks and SQL server.

Performance Benchmarks: Create a simple benchmark test that runs SQL commands from both Lambda and Fargate. Compare the results to isolate the problem.
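
A minimal sketch of such a benchmark harness is shown here in Python to illustrate the shape; in your case the same idea would be ported to .NET (for example with `System.Diagnostics.Stopwatch`). The `fake_stored_procedure` function is a hypothetical stand-in that only sleeps; you would replace it with the real call that executes one stored procedure.

```python
import math
import statistics
import time


def benchmark(operation, runs=20):
    """Call `operation` `runs` times; return each wall-clock duration in ms."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def summarize_timings(timings):
    """Return min, median, nearest-rank 95th percentile and max of the durations."""
    ordered = sorted(timings)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "runs": len(ordered),
        "min_ms": ordered[0],
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[p95_index],
        "max_ms": ordered[-1],
    }


def fake_stored_procedure():
    """Hypothetical stand-in for the real database call; it only sleeps."""
    time.sleep(0.01)


print(summarize_timings(benchmark(fake_stored_procedure, runs=5)))
```

Logging the same summary from Lambda and from Fargate, against the same stored procedure, gives a like-for-like comparison. Looking at the median and the maximum together also helps tell a uniformly slow path apart from occasional long stalls.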

Network Performance Testing: Use tools like iperf to test network throughput between Fargate tasks and the SQL server.

Isolation and Testing: Run Fargate tasks in isolation (with minimal other workloads) to see if performance improves. This can help determine if the issue is related to resource contention.

EXPERT
answered 5 months ago
EXPERT
reviewed 4 months ago
