Managing runtime updates at scale for Lambda
This article explores the challenges of runtime management in large-scale environments and provides actionable solutions for your organizational needs. It also provides insights into how Technical Account Managers (TAMs) can help you throughout this process.
Introduction
In today's rapidly evolving cloud landscape, organizations are adopting Serverless on AWS to enhance agility, reduce operational overhead, and deliver value to their customers. AWS Lambda allows organizations to run code without provisioning or managing servers. This service supports diverse runtime environments, such as Python, Node.js, Java, and Go, to power critical workloads. These workloads include real-time analytics, customer-facing APIs, and internal automation tools.
However, as serverless deployments grow in scale and complexity across multiple accounts and AWS Regions, organizations find it harder to manage Lambda runtime updates efficiently across their cloud estate. Organizations must carefully manage Lambda's runtime lifecycle phases, from active (fully supported) through deprecated (end-of-support announced) to end-of-support (no longer maintained). If organizations don't manage these transitions, then their systems might be exposed to security vulnerabilities, operational inefficiencies, and disruption of critical business functions.
This article examines runtime management challenges in large-scale environments and provides actionable guidance. The article covers how a global EdTech company successfully migrated from manual processes to systematic runtime management across more than 500 AWS accounts and more than 2,000 Lambda functions. It also highlights how TAMs in AWS Enterprise Support can guide organizations to maintain secure, compliant, and efficient serverless operations.
Common runtime management challenges
Organizations that manage extensive Lambda deployments across multiple AWS accounts and Regions face three challenges in runtime lifecycle management:
- Monitoring resources across multiple AWS accounts and Regions can present challenges. When organizations manage thousands of functions that are distributed across hundreds of accounts, it might be complex to track function states and maintain accurate inventories. This is especially true as runtimes transition through the lifecycle phases.
- Managing runtime updates requires nuanced approaches based on function criticality. Business-critical operations demand rigorous testing and staged deployments. Lower-priority workloads can follow a more automated update process.
- Balancing centralized control with the autonomy of the development team might create organizational tension. Security and operations teams enforce standards and compliance requirements. However, development teams need rapid development cycles and want to avoid bottlenecks in runtime management.
Prerequisites
To implement this solution effectively, you need the following AWS services and configurations in place:
- AWS Organizations provides the foundation for centralized visibility and governance across your accounts. Your AWS Organizations setup must include a management account that serves as the central point for monitoring and control.
- To track and audit Lambda function configurations, turn on AWS Config across your accounts. This service maintains an inventory of your Lambda functions and their runtime states.
- You need access to AWS Trusted Advisor that comes with Business, Enterprise On-Ramp, or Enterprise Support plans. Trusted Advisor provides insights about deprecated runtimes and potential security vulnerabilities.
- Your AWS Identity and Access Management (IAM) configuration must include permissions to view and modify Lambda functions. It must also manage AWS Config rules, work with AWS CloudFormation, and configure CI/CD pipelines. You must have these permissions to implement the governance and automation aspects of the solution in this article.
- If you plan to implement the Cloud Intelligence Dashboards for enhanced visibility, then you need Amazon QuickSight Enterprise Edition. Also, set up an Amazon Simple Storage Service (Amazon S3) bucket to store your AWS Config data, and configure Amazon Athena to query this data. This combination provides comprehensive reporting and analysis of your Lambda environment.
- To identify functions with layer dependencies and runtime-specific code, conduct an inventory of Lambda layers across your environment. Test and update layer versions before runtime upgrades to prevent compatibility issues. This sequencing promotes successful deployments.
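The layer inventory described in the last prerequisite can be sketched as a small grouping step. The following is a minimal sketch, not a definitive implementation. It assumes that you already collected function configurations shaped like the entries that the Lambda ListFunctions API returns (for example, through the boto3 `list_functions` paginator in each account and Region). The function names, account ID, and layer ARN in the sample data are hypothetical.

```python
from collections import defaultdict


def functions_by_layer(functions):
    """Group (function name, runtime) pairs by the layer version ARN they use.

    `functions` is a list of dicts shaped like the entries in the
    `Functions` list of a Lambda ListFunctions response.
    """
    index = defaultdict(list)
    for fn in functions:
        # Functions without layers have no "Layers" key.
        for layer in fn.get("Layers", []):
            index[layer["Arn"]].append((fn["FunctionName"], fn.get("Runtime")))
    return dict(index)


# Hypothetical sample data for illustration only.
SHARED_LAYER = "arn:aws:lambda:us-east-1:111122223333:layer:shared-utils:4"
sample_functions = [
    {"FunctionName": "orders-api", "Runtime": "python3.9",
     "Layers": [{"Arn": SHARED_LAYER}]},
    {"FunctionName": "report-job", "Runtime": "python3.12",
     "Layers": [{"Arn": SHARED_LAYER}]},
    {"FunctionName": "no-layers", "Runtime": "nodejs20.x"},
]

layer_index = functions_by_layer(sample_functions)
for arn, users in layer_index.items():
    print(arn, users)
```

A layer that is shared by functions on different runtimes, as in the sample, is a candidate for testing before any of those functions is upgraded.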
If you're an Enterprise Support customer, then work with your TAM before implementation. Your TAM can help assess your current Lambda environment and develop a strategy that's tailored to your organization's specific needs and compliance requirements. They can also help identify any additional prerequisites based on your unique architectural requirements.
Solution
To address the runtime management challenges, organizations can implement a comprehensive runtime management strategy that’s focused on three key pillars.
Centralized runtime visibility
AWS offers the following monitoring tools that analyze runtime configurations and identify the functions and versions that you actively deploy in your environment:
- Identify functions that use deprecated or soon-to-be deprecated runtimes with Trusted Advisor. In the Trusted Advisor console, select Security from the navigation pane. Then, choose AWS Lambda Functions Using Deprecated Runtimes. For more information on this check, see AWS Lambda functions using deprecated runtimes.
- Use AWS Config to create a view of the configuration of resources in your account and store the configuration snapshot data in Amazon S3. AWS Config queries don't support published function versions. They can only query the $LATEST version. To create dynamic compliance dashboards and analyze AWS Config data, combine Athena's querying capabilities with QuickSight's visualization features. For more information, see Implementing governance in depth for serverless applications on the Serverless Land website.
- Deploy the Trusted Advisor Organizational dashboard. To get organization-wide visibility across all of your accounts, refer to the Lambda Functions Using Deprecated (or about to be deprecated) Runtimes section.
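As one illustration of the AWS Config option above, the following sketch builds an AWS Config advanced query for functions on a given set of runtimes and tallies the results per account. It is a sketch under stated assumptions: it assumes that the runtime is exposed as `configuration.runtime` for the `AWS::Lambda::Function` resource type, and that you run the query through a configuration aggregator (for example, with the boto3 `select_aggregate_resource_config` call, whose `Results` are JSON strings). The sample rows and account IDs are hypothetical. As noted above, the results cover only the $LATEST version of each function.

```python
import json


def build_runtime_query(runtimes):
    """Return an AWS Config advanced query for functions on the given runtimes."""
    if not runtimes:
        raise ValueError("at least one runtime identifier is required")
    quoted = ", ".join("'{}'".format(r) for r in sorted(runtimes))
    return (
        "SELECT accountId, awsRegion, resourceId, configuration.runtime "
        "WHERE resourceType = 'AWS::Lambda::Function' "
        "AND configuration.runtime IN ({})".format(quoted)
    )


def count_by_account(results):
    """Tally matching functions per account from JSON-string result rows."""
    counts = {}
    for raw in results:
        row = json.loads(raw)
        counts[row["accountId"]] = counts.get(row["accountId"], 0) + 1
    return counts


query = build_runtime_query({"python3.7", "nodejs14.x"})
print(query)

# Hypothetical rows, shaped like the JSON strings an aggregator query returns.
sample_results = [
    '{"accountId": "111122223333", "awsRegion": "us-east-1", '
    '"resourceId": "orders-api", "configuration": {"runtime": "python3.7"}}',
    '{"accountId": "111122223333", "awsRegion": "eu-west-1", '
    '"resourceId": "report-job", "configuration": {"runtime": "nodejs14.x"}}',
    '{"accountId": "444455556666", "awsRegion": "us-east-1", '
    '"resourceId": "legacy-sync", "configuration": {"runtime": "python3.7"}}',
]
per_account = count_by_account(sample_results)
print(per_account)
```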
If you're an Enterprise Support customer, then work with your TAM to maximize the value of these tools. Your TAM can guide you through tool implementation, help design custom dashboards that align with your compliance requirements, and suggest integrated monitoring strategies. TAMs provide you with best practices for runtime monitoring and help you adapt your visibility strategy as your Lambda environment evolves. This combination of AWS tools and TAM expertise creates a real-time view of your Lambda runtime landscape. That way, you can proactively upgrade and minimize the risk of running deprecated runtimes.
Risk-based runtime management strategy
A successful runtime management strategy requires careful consideration of function criticality and appropriate controls. Lambda provides flexible runtime management options through three distinct control modes that help you customize the timing and implementation of updates. The Lambda layer dependencies that you identified during your prerequisite assessment might affect your upgrade approach and timeline for each function category.
Understanding runtime management controls
Lambda's runtime management system offers three control modes to help you manage updates effectively. Auto mode automatically applies patch-level updates through a gradual rollout process. You can't determine the timing of these updates. Function update mode gives you more control. Patch-level updates occur only when you modify the function code or configuration, which gives you indirect control over update timelines. Manual mode offers the highest level of control. It doesn't apply automatic updates, and any update requires an explicit change to the runtime version.
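The three modes correspond to the `UpdateRuntimeOn` values `Auto`, `FunctionUpdate`, and `Manual` of the Lambda PutRuntimeManagementConfig API, where Manual mode also requires a `RuntimeVersionArn`. The following is a minimal sketch that builds and validates the request parameters without calling AWS. The helper name, function names, and the runtime version ARN are hypothetical placeholders.

```python
VALID_MODES = {"Auto", "FunctionUpdate", "Manual"}


def runtime_management_request(function_name, mode, runtime_version_arn=None):
    """Build keyword arguments for Lambda's PutRuntimeManagementConfig call."""
    if mode not in VALID_MODES:
        raise ValueError("unknown mode: {}".format(mode))
    request = {"FunctionName": function_name, "UpdateRuntimeOn": mode}
    if mode == "Manual":
        # Manual mode pins the function to one specific runtime version.
        if not runtime_version_arn:
            raise ValueError("Manual mode requires a runtime version ARN")
        request["RuntimeVersionArn"] = runtime_version_arn
    elif runtime_version_arn:
        raise ValueError("a runtime version ARN applies only to Manual mode")
    return request


auto_request = runtime_management_request("thumbnail-dev", "Auto")
manual_request = runtime_management_request(
    "checkout-api",
    "Manual",
    runtime_version_arn="arn:aws:lambda:us-east-1::runtime:EXAMPLE",  # placeholder
)
print(auto_request)
print(manual_request)

# With boto3 installed and credentials configured, you could apply a request
# with: boto3.client("lambda").put_runtime_management_config(**manual_request)
```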
Function classification and controls
Effective implementation of these controls requires careful classification of Lambda functions based on their business impact and operational requirements.
Critical functions: Manual mode deployment provides optimal control and risk management when handling business-critical functions that directly affect customer experience and revenue streams. This approach allows you to coordinate runtime updates with scheduled maintenance windows and perform thorough testing before implementation. To make sure that you validate the layer compatibility in staging environments before production deployment, coordinate Lambda layer updates with runtime upgrades. While this method provides precise control, it also introduces the risk of delayed security patches. To mitigate this risk, establish monthly patch review cycles, and create clear procedures for emergency updates, when necessary. Use blue/green deployments with weighted traffic shifting (10% to 100%) for controlled rollouts. Make sure that you support the rollouts with comprehensive monitoring and automated rollback triggers based on error rates and latency thresholds.
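Weighted traffic shifting for a critical function can be expressed through a Lambda alias and its routing configuration. The following sketch generates the sequence of UpdateAlias parameters for a shift from 10% to 100%. It is an illustration only: the intermediate 50% step, the function name, the alias name, and the version numbers are assumptions, and a real rollout would wait and check monitoring between steps.

```python
def traffic_shift_steps(function_name, alias, old_version, new_version,
                        weights=(0.10, 0.50, 1.00)):
    """Return UpdateAlias keyword arguments for each step of a weighted shift."""
    steps = []
    for weight in weights:
        if not 0 < weight <= 1:
            raise ValueError("weights must be in the range (0, 1]")
        if weight < 1:
            # The alias still points at the old version; a share of traffic
            # is routed to the new version as an additional version weight.
            steps.append({
                "FunctionName": function_name,
                "Name": alias,
                "FunctionVersion": old_version,
                "RoutingConfig": {"AdditionalVersionWeights": {new_version: weight}},
            })
        else:
            # Final step: point the alias at the new version and clear routing.
            steps.append({
                "FunctionName": function_name,
                "Name": alias,
                "FunctionVersion": new_version,
                "RoutingConfig": {"AdditionalVersionWeights": {}},
            })
    return steps


steps = traffic_shift_steps("checkout-api", "live", old_version="7", new_version="8")
for step in steps:
    print(step["FunctionVersion"], step["RoutingConfig"])

# Each dict could be passed to boto3.client("lambda").update_alias(**step).
```

To roll back at any intermediate step, set the routing configuration back to an empty `AdditionalVersionWeights` map while the alias still points at the old version.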
Important functions: Internal business operations must use the Function update mode to synchronize runtime updates with existing deployment cycles. This helps organizations create predictable patterns that align with established change management processes and minimize compatibility risks through coordinated Lambda layer version updates. Organizations must establish emergency deployment procedures for critical security patches. Also, they must streamline recovery processes through Lambda aliases, Amazon CloudWatch alarms, and automated rollback triggers. CloudWatch alarms detect a breach when error rates exceed the baseline by 50%, or when the function duration surpasses acceptable limits. Then, automated triggers initiate the rollback procedure, and Lambda aliases shift traffic back to the previous version.
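The rollback condition described above can be written as a small decision function. This is a minimal sketch that reads "exceed the baseline by 50%" as an error rate greater than 1.5 times the baseline. The parameter names and sample values are hypothetical, and in practice a CloudWatch alarm would evaluate the equivalent condition.

```python
def should_roll_back(error_rate, baseline_error_rate,
                     duration_ms, duration_limit_ms, tolerance=0.50):
    """Return True when either rollback condition is breached.

    tolerance=0.50 means the error rate may exceed the baseline by up to 50%
    before a rollback is triggered.
    """
    error_breach = error_rate > baseline_error_rate * (1 + tolerance)
    duration_breach = duration_ms > duration_limit_ms
    return error_breach or duration_breach


# Baseline error rate of 1% and a 1,000 ms duration limit (illustrative values).
print(should_roll_back(0.016, 0.01, 800, 1000))   # error rate breach
print(should_roll_back(0.012, 0.01, 800, 1000))   # within both limits
print(should_roll_back(0.005, 0.01, 1200, 1000))  # duration breach
```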
Non-critical functions: The Auto mode deployment balances immediate security updates with minimal operational overhead and delivers optimal efficiency in testing and development environments. Accept higher layer compatibility risk for faster runtime updates, but monitor for layer-related errors post-upgrade. To manage the risk that automatic updates affect function stability, implement comprehensive monitoring of the function's health metrics and error rates after updates occur. Development teams can store function configurations in version control and use Infrastructure as Code templates for lower-impact environments. This process helps prioritize rapid restoration through simple redeployment procedures rather than complex zero-downtime rollbacks.
Runtime management strategy matrix
When you implement runtime management at scale, it's crucial to understand how different aspects of the strategy interconnect. The following matrix provides a view of how function criticality levels affect layer management approaches, deployment strategies, and rollback procedures. This serves as a quick reference guide for teams that implement runtime updates and supports consistent decision-making across your Lambda environment. The matrix outlines key operational parameters and requirements for each criticality level, from monitoring thresholds to recovery time objectives.
You can use this matrix as a baseline framework and customize it based on your organization's specific requirements and constraints. Work with your teams to adjust thresholds and procedures and maintain the core principles of graduated control and risk management. If you’re an Enterprise Support customer, then your TAM can help adapt these strategies to align with your operational maturity and compliance requirements.
Managed runtime implementation
Across all function categories, Lambda managed runtimes provide essential capabilities for maintaining secure and compliant environments. These runtimes automatically deliver security patches and bug fixes within the same major version. They also offer visibility through the AWS Health Dashboard for monitoring runtime health and tracking patch applications.
While managed runtimes handle routine security patches automatically, they don't manage major version upgrades. For instance, if you upgrade from Node.js 14 to 16, then you need manual intervention. To manage these transitions effectively, implement a tiered upgrade schedule based on function classification:
- Critical functions: Plan 6-month migration cycles with extensive testing.
- Important functions: Align with quarterly release cycles.
- Non-critical functions: Upgrade within 30 days of availability.
It's a best practice to actively monitor AWS deprecation announcements and maintain a detailed upgrade calendar. This proactive approach helps with smooth transitions before runtime end-of-life dates. It also helps prevent disruptions and maintain the security and efficiency of your Lambda environment.
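An upgrade calendar based on the tiered schedule can start from a simple date calculation. The following sketch assumes one possible reading of the schedule: each tier gets a target date counted from the day that the new runtime becomes available, with 6 months approximated as 182 days and a quarter as 91 days. These day counts and the sample date are assumptions for illustration, not AWS guidance.

```python
from datetime import date, timedelta

# Assumed upgrade windows per tier, in days from runtime availability.
UPGRADE_WINDOW_DAYS = {
    "critical": 182,      # about 6 months
    "important": 91,      # about one quarter
    "non-critical": 30,
}


def upgrade_due_date(tier, runtime_available_on):
    """Return the target upgrade date for a function in the given tier."""
    if tier not in UPGRADE_WINDOW_DAYS:
        raise ValueError("unknown tier: {}".format(tier))
    return runtime_available_on + timedelta(days=UPGRADE_WINDOW_DAYS[tier])


available = date(2025, 1, 1)  # hypothetical availability date
calendar = {tier: upgrade_due_date(tier, available) for tier in UPGRADE_WINDOW_DAYS}
for tier, due in calendar.items():
    print(tier, due.isoformat())
```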
Preventive and automated governance
It's important for organizations to implement robust preventive controls and automated governance mechanisms to maintain a secure and compliant Lambda environment at scale. This approach combines CI/CD pipeline validation, AWS Config rules, and CloudFormation hooks to create multiple layers of runtime governance:
- CI/CD pipeline validation: Implement pre-deployment validation scripts in your CI/CD pipeline. These scripts must automatically scan Infrastructure as Code templates for deprecated Lambda runtimes before deployment. This proactive approach catches runtime issues early in the development cycle and reduces later maintenance overhead.
- AWS Config rules: Use the managed rule lambda-function-settings-check to continuously monitor Lambda functions for deprecated runtimes. Or, you can create custom rules for organization-specific runtime policies. To handle non-compliant functions that the rules detect, configure automatic remediation actions or Amazon Simple Notification Service (Amazon SNS) notifications.
- CloudFormation hooks: Use CloudFormation hooks to validate runtime versions before deployment. Implement pre-create and pre-update hooks that check Lambda runtime versions against your approved list and prevent stack deployment if deprecated runtimes are detected. For more information, see Validate your Lambda runtime with CloudFormation Lambda hooks.
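The CI/CD pipeline validation described in the first item can be sketched as a short template scan. This is a minimal sketch that assumes JSON-format CloudFormation templates and a deny list that you maintain from AWS deprecation announcements. The runtimes in the deny list and the sample template are illustrative, and runtimes set through intrinsic functions such as Ref are skipped rather than resolved.

```python
import json

# Assumption: a deny list that your team keeps current. Illustrative values.
DEPRECATED_RUNTIMES = {"nodejs14.x", "python3.7"}
FUNCTION_TYPES = ("AWS::Lambda::Function", "AWS::Serverless::Function")


def find_deprecated_runtimes(template, deprecated=DEPRECATED_RUNTIMES):
    """Return sorted (logical ID, runtime) pairs that use a deprecated runtime."""
    findings = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") not in FUNCTION_TYPES:
            continue
        runtime = resource.get("Properties", {}).get("Runtime")
        # Skip runtimes that are not plain strings (for example, {"Ref": ...}).
        if isinstance(runtime, str) and runtime in deprecated:
            findings.append((logical_id, runtime))
    return sorted(findings)


# Hypothetical template for illustration.
sample_template = json.loads("""
{
  "Resources": {
    "OrdersFunction": {"Type": "AWS::Lambda::Function",
                       "Properties": {"Runtime": "python3.7"}},
    "ReportFunction": {"Type": "AWS::Lambda::Function",
                       "Properties": {"Runtime": "python3.12"}},
    "ParamFunction":  {"Type": "AWS::Lambda::Function",
                       "Properties": {"Runtime": {"Ref": "RuntimeParam"}}},
    "DataBucket":     {"Type": "AWS::S3::Bucket"}
  }
}
""")

findings = find_deprecated_runtimes(sample_template)
for logical_id, runtime in findings:
    print("{} uses deprecated runtime {}".format(logical_id, runtime))
```

In a pipeline, a non-empty findings list would typically fail the build step so that the template can't be deployed until the runtime is updated.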
If you're an Enterprise Support customer, then TAMs provide tailored technical guidance for your governance frameworks under two critical areas:
- TAMs work with your cloud architecture team to design governance frameworks that align with your organization's operational maturity and compliance requirements. This includes collaboration with your architecture team to design automated validation workflows that integrate runtime checks into your existing CI/CD pipelines. TAMs also provide guidance on custom AWS Config rules to manage your specific runtime policies.
- TAMs share proven implementation patterns from similar Enterprise Support customers that can help you avoid common pitfalls in governance automation. For instance, they can suggest effective approaches for handling exceptions in regulated environments, or strategies for managing runtime updates across multi-account architectures.
Benefits
TAMs helped the EdTech company improve their Lambda runtime management. This led to the following key improvements:
- Operational efficiency: The organization worked with their TAM to automate runtime updates for 50% of their Lambda functions. This improvement reduced the maintenance time from hours to minutes, and allowed the operations team to save up to 15 hours weekly to focus on strategic initiatives.
- Enhanced security posture: TAMs helped create a comprehensive governance framework that improved the company's security posture. This framework prevented deprecated runtime deployments through automated pipeline checks, and successfully blocked dozens of potential vulnerabilities in the first few months. It also made sure that the Lambda functions remained secure and compliant across the organization.
- Compliance and governance: The TAM established a multi-Region governance framework that featured comprehensive dashboards. These dashboards provide real-time visibility across AWS accounts, transform manual compliance processes into automated workflows, and streamline audit preparation.
The EdTech company's successful transformation demonstrates how strategic Lambda runtime management, supported by AWS Enterprise Support, can significantly improve operational efficiency and security at scale.
Conclusion
Managing Lambda runtime updates at scale requires a strategic balance of automation, governance, and rigorous testing protocols. To handle this challenge, organizations can implement comprehensive management frameworks and use AWS tools and services. With this approach, they can maintain secure, compliant, and efficient serverless operations while minimizing risk and maximizing value from their AWS investments.
Work with your TAM to develop and use a comprehensive Lambda runtime management strategy. This collaboration can help you navigate the complexities of large-scale serverless environments, establish security and compliance, and drive operational excellence. To learn more about how our plans and offerings can help you get the most out of your AWS environment, see AWS Support.
About the authors
Sangram Thorat
Sangram Thorat is a Senior TAM for EdTech customers at AWS, where he helps enterprises optimize their workloads for security, reliability and cost efficiency. With his expertise in edge services, such as content delivery network (CDN) and edge security, he helps customers optimize application performance and security. Based out of Boston, Massachusetts, Sangram is passionate about transforming education through technology.
Woody Steinhoff
Woody Steinhoff is a Senior TAM at AWS. As an expert in cost optimization and monitoring, he helps enterprise customers implement strategies to maximize their cloud investments. Woody works closely with customers to align their cloud spending with business objectives, optimize resource allocation, and develop long-term cost management strategies.