
Securing EC2 Capacity on Short Notice: The ASG-to-ODCR Strategy

8 minute read
Content level: Expert

Practical strategy that leverages Auto Scaling Groups (ASG) to collect and hold capacity before converting it to proper reservations

When planning blue/green deployments or other capacity-intensive operations, you might find yourself needing EC2 capacity on short notice.

Important: Before exploring capacity reservation strategies, ensure you've implemented proper scaling best practices for your applications. These practices include designing for elasticity, using multiple Availability Zones, implementing gradual scaling patterns, and leveraging diverse instance types. Capacity reservations should only be considered when standard scaling approaches don't meet your specific requirements due to timing constraints or capacity availability concerns.

While future-dated On-Demand Capacity Reservations (ODCRs) are the ideal way to secure capacity for a future date, they aren't always available for your specific instance types. They also require a minimum lead time of several days and a minimum commitment period.

Here's a practical strategy that leverages Auto Scaling Groups (ASG) to collect capacity over a period of time and hold it before converting it to proper reservations.

The Strategy

  1. Create a temporary ASG with the same instance type and AZ you'll need for your actual workload
  2. Use an AMI with the same billing code as your production AMI (critical for ODCR matching)
  3. Apply the same billing tags as your production workload for cost tracking
  4. Once the ASG reaches the desired capacity, create matching ODCRs
  5. Delete the ASG (terminating its instances); your capacity is now held by the ODCR

Manual Implementation Steps

Step 0: Verify AMI Billing Code

Run the following command to look up the AMI's billing code (its platform details). You'll pass this value as --instance-platform in Steps 1 and 4.
Make sure to change the <ami-id> and <region-code> parameters according to your needs.

aws ec2 describe-images \
    --image-ids <ami-id> \
    --query 'Images[0].PlatformDetails' \
    --output text \
    --region <region-code>
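If you launch the temporary ASG from a different AMI than production, it's worth confirming that both AMIs report the same platform details before you continue. The following is a minimal sketch, assuming bash and a configured AWS CLI; the helper names ami_platform and same_platform are hypothetical and not part of any AWS tool.

```shell
# Hypothetical helper: print an AMI's PlatformDetails (its billing platform),
# e.g. "Linux/UNIX" or "Red Hat Enterprise Linux".
ami_platform() {
  local ami_id="$1" region="$2"
  aws ec2 describe-images \
    --image-ids "$ami_id" \
    --query 'Images[0].PlatformDetails' \
    --output text \
    --region "$region"
}

# Hypothetical helper: succeed only when the production AMI and the AMI used by
# the capacity-holder launch template report the same PlatformDetails.
same_platform() {
  local prod_ami="$1" holder_ami="$2" region="$3"
  local prod holder
  prod=$(ami_platform "$prod_ami" "$region") || return 1
  holder=$(ami_platform "$holder_ami" "$region") || return 1
  if [ "$prod" = "$holder" ]; then
    echo "match: $prod"
  else
    echo "mismatch: production='$prod' holder='$holder'" >&2
    return 1
  fi
}
```

For example, same_platform <production-ami-id> <ami-id> <region-code> prints "match:" followed by the platform, or exits non-zero on a mismatch.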

Step 1: Direct ODCR Creation

Always try creating ODCRs directly first.
Make sure to change the <instance-type>, <instance-platform>, <required-az>, <required-quantity> and <region-code> parameters according to your needs.

aws ec2 create-capacity-reservation \
    --instance-type <instance-type> \
    --instance-platform "<instance-platform>" \
    --availability-zone <required-az> \
    --instance-count <required-quantity> \
    --region <region-code>

Use the strategy below only when direct ODCR creation fails due to insufficient capacity.
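To script the "direct first, fall back second" decision, you can tell a capacity shortfall apart from other failures. This is a sketch that assumes the AWS CLI reports the shortfall with the InsufficientInstanceCapacity error code in its error message; try_direct_odcr and its exit-code convention (0, 1, 2) are hypothetical.

```shell
# Hypothetical helper: attempt a direct ODCR.
# Exit codes (our own convention):
#   0 = created (prints the reservation ID)
#   2 = insufficient capacity (fall back to the ASG strategy)
#   1 = any other error (message forwarded to stderr)
try_direct_odcr() {
  local instance_type="$1" platform="$2" az="$3" count="$4" region="$5"
  local out err_file rc
  err_file=$(mktemp)
  out=$(aws ec2 create-capacity-reservation \
    --instance-type "$instance_type" \
    --instance-platform "$platform" \
    --availability-zone "$az" \
    --instance-count "$count" \
    --query 'CapacityReservation.CapacityReservationId' \
    --output text \
    --region "$region" 2>"$err_file")
  rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "$out"
    rm -f "$err_file"
    return 0
  fi
  # Assumption: a capacity shortfall surfaces as the InsufficientInstanceCapacity error code.
  if grep -q 'InsufficientInstanceCapacity' "$err_file"; then
    rm -f "$err_file"
    return 2
  fi
  cat "$err_file" >&2
  rm -f "$err_file"
  return 1
}
```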

Step 2: Create the Temporary ASG

  • Create a launch template that uses an AMI with the same instance platform (billing code) as your production workload.
    Make sure to change the <ami-id>, <instance-type>, tag values and <region-code> parameters according to your needs.
aws ec2 create-launch-template \
    --launch-template-name capacity-holder-template \
    --launch-template-data '{
        "ImageId": "<ami-id>",
        "InstanceType": "<instance-type>",
        "TagSpecifications": [{
            "ResourceType": "instance",
            "Tags": [
                {"Key": "Project", "Value": "BlueGreenDeployment"},
                {"Key": "Purpose", "Value": "CapacityReservation"}
            ]
        }]
    }' \
    --region <region-code>
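  • Create the ASG in the target AZ. Make sure to change the <required-quantity>, <required-az> and <region-code> parameters according to your needs. With --availability-zones, the ASG launches into the default VPC's subnet in that AZ. If your account has no default VPC, pass --vpc-zone-identifier <subnet-id> with a subnet in the target AZ instead.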
aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name capacity-holder-asg \
    --launch-template LaunchTemplateName=capacity-holder-template,Version='$Latest' \
    --min-size 0 \
    --max-size <required-quantity> \
    --desired-capacity <required-quantity> \
    --availability-zones <required-az> \
    --region <region-code>
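If your account has no default VPC, the ASG needs a subnet in the target AZ passed via --vpc-zone-identifier rather than --availability-zones. The following is a minimal sketch for looking one up, assuming you already know the VPC ID; subnet_in_az is a hypothetical helper.

```shell
# Hypothetical helper: print the ID of one subnet in the given VPC and AZ.
subnet_in_az() {
  local vpc_id="$1" az="$2" region="$3"
  local subnet
  subnet=$(aws ec2 describe-subnets \
    --filters "Name=vpc-id,Values=$vpc_id" "Name=availability-zone,Values=$az" \
    --query 'Subnets[0].SubnetId' \
    --output text \
    --region "$region") || return 1
  # With --output text, a missing value is printed as "None".
  if [ -z "$subnet" ] || [ "$subnet" = "None" ]; then
    echo "no subnet found in $az for $vpc_id" >&2
    return 1
  fi
  echo "$subnet"
}
```

You could then replace --availability-zones <required-az> with --vpc-zone-identifier "$(subnet_in_az <vpc-id> <required-az> <region-code>)" in the create-auto-scaling-group command.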

Step 3: Monitor ASG Capacity

Wait for the instances to launch, then check how many are InService.
Make sure to change the <region-code> parameter according to your needs.

Depending on current demand for the instance type in the selected AZ, it can take hours for the desired capacity to become available, if it becomes available at all. Define a timeout in advance. When it expires, either keep the capacity the ASG has collected so far by moving to Step 4 (using the current InService count as <required-quantity>), or release everything as described in Step 5.

aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names capacity-holder-asg \
    --query 'AutoScalingGroups[0].Instances[?LifecycleState==`InService`] | length(@)' \
    --region <region-code>
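You can implement the timeout recommended above as a simple polling loop around the same describe-auto-scaling-groups query. This sketch assumes bash, because it uses the built-in SECONDS counter; wait_for_capacity and its arguments are hypothetical. It prints the last InService count. It returns 0 if the desired capacity was reached, or 1 if the timeout expired first.

```shell
# Hypothetical helper: poll the ASG until <desired> instances are InService or
# <timeout> seconds pass. Prints the last InService count.
wait_for_capacity() {
  local asg="$1" desired="$2" timeout="$3" interval="$4" region="$5"
  local deadline=$((SECONDS + timeout))
  local count
  while :; do
    count=$(aws autoscaling describe-auto-scaling-groups \
      --auto-scaling-group-names "$asg" \
      --query 'AutoScalingGroups[0].Instances[?LifecycleState==`InService`] | length(@)' \
      --output text \
      --region "$region")
    if [ "$count" -ge "$desired" ] 2>/dev/null; then
      echo "$count"
      return 0
    fi
    if [ "$SECONDS" -ge "$deadline" ]; then
      # Timed out: report what has been collected so far.
      echo "$count"
      return 1
    fi
    sleep "$interval"
  done
}
```

For example, wait_for_capacity capacity-holder-asg <required-quantity> 14400 60 <region-code> polls every minute for up to four hours. If it returns 1, the printed count is the capacity collected so far, which you can use as --instance-count in Step 4.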

Step 4: Create Matching ODCRs

Once the Auto Scaling group reaches the desired capacity, create an On-Demand Capacity Reservation that matches the attributes of the instances the ASG launched.
With open instance match criteria (the default), an ODCR automatically matches running instances that have the same attributes, so the ASG's instances run in the new reservation.
Make sure to change the <instance-type>, <instance-platform>, <required-az>, <required-quantity> and <region-code> parameters according to your needs.

aws ec2 create-capacity-reservation \
    --instance-type <instance-type> \
    --instance-platform "<instance-platform>" \
    --availability-zone <required-az> \
    --instance-count <required-quantity> \
    --instance-match-criteria open \
    --tag-specifications 'ResourceType=capacity-reservation,Tags=[{Key=Project,Value=BlueGreenDeployment},{Key=Purpose,Value=CapacityReservation}]' \
    --region <region-code>
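Before you delete the ASG in Step 5, you may want to confirm that the running instances were matched into the new open reservation. A successful match shows up as the reservation's AvailableInstanceCount dropping to 0. The following sketch assumes you captured the reservation ID, for example by adding --query 'CapacityReservation.CapacityReservationId' --output text to the command above; odcr_absorbed is a hypothetical helper. Any other matching instances running in the same account and AZ can also be matched, so treat this check as a signal rather than proof.

```shell
# Hypothetical helper: succeed once the reservation has no unused slots left,
# i.e. running instances (such as the ASG's) have been matched into it.
odcr_absorbed() {
  local cr_id="$1" region="$2" attempts="${3:-10}" interval="${4:-15}"
  local available="" i
  for ((i = 1; i <= attempts; i++)); do
    available=$(aws ec2 describe-capacity-reservations \
      --capacity-reservation-ids "$cr_id" \
      --query 'CapacityReservations[0].AvailableInstanceCount' \
      --output text \
      --region "$region") || return 1
    if [ "$available" = "0" ]; then
      echo "$cr_id: all reserved slots are in use"
      return 0
    fi
    # Matching may not be instantaneous; wait before the next attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
  done
  echo "$cr_id still has $available unmatched slot(s)" >&2
  return 1
}
```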

Step 5: Clean Up ASG and Launch Template

Make sure to change the <region-code> parameter according to your needs.

# Scale down and delete ASG
aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name capacity-holder-asg \
    --min-size 0 \
    --desired-capacity 0 \
    --region <region-code>

aws autoscaling delete-auto-scaling-group \
    --auto-scaling-group-name capacity-holder-asg \
    --force-delete \
    --region <region-code>

# Delete launch template
aws ec2 delete-launch-template \
    --launch-template-name capacity-holder-template \
    --region <region-code>
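Once the ASG's instances have finished terminating, the freed slots should appear as unused capacity in the reservation. The following is a sketch of a final check, assuming bash and the reservation ID from Step 4. odcr_fully_available is a hypothetical helper that succeeds when the reservation is active and AvailableInstanceCount equals TotalInstanceCount. If other matching instances in your account run in the open reservation, the available count will be lower than the total even though the reservation is healthy.

```shell
# Hypothetical helper: print the reservation's state and counts, and succeed
# only if it is active with every reserved slot currently unused.
odcr_fully_available() {
  local cr_id="$1" region="$2"
  local state total available
  # --output text prints the three fields tab-separated on one line.
  read -r state total available < <(aws ec2 describe-capacity-reservations \
    --capacity-reservation-ids "$cr_id" \
    --query 'CapacityReservations[0].[State,TotalInstanceCount,AvailableInstanceCount]' \
    --output text \
    --region "$region")
  echo "state=$state total=$total available=$available"
  if [ "$state" = "active" ] && [ "$total" = "$available" ]; then
    return 0
  fi
  return 1
}
```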

Key Benefits

  • Capacity validation - If the ASG can't reach the desired capacity over an extended period of time, you know capacity isn't readily available and you must find alternatives
  • Seamless transition - Convert running instances to proper reservations
  • Billing consistency - Same AMI and tags ensure proper cost allocation and ODCR matching

Important Considerations

Timing

  • In some cases, the Auto Scaling group can take a few hours to launch the desired instance count, or may launch fewer instances than desired. Plan accordingly for time-sensitive deployments

Costs

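  • You'll pay On-Demand rates for the temporary instances while they run, including any software charges tied to your production AMI's billing code
  • You can't use a cheaper AMI; it must match the production billing code for ODCR compatibility
  • After the ASG is deleted, the ODCR itself is billed at the On-Demand rate whether or not instances run in it, until you use or cancel it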
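For budgeting, a rough estimate is instance count × hours × hourly rate. Apply it once for each period you hold capacity, whether as running ASG instances or as an unused reservation. The following is a minimal sketch using awk. holding_cost is hypothetical, and the hourly rate is an input you look up yourself; the sketch does not fetch it from any pricing API.

```shell
# Hypothetical helper: rough holding cost = instances x hours x hourly rate,
# printed with two decimals. The rate must be supplied by the caller.
holding_cost() {
  local instances="$1" hours="$2" hourly_rate="$3"
  awk -v n="$instances" -v h="$hours" -v r="$hourly_rate" \
    'BEGIN { printf "%.2f\n", n * h * r }'
}
```

For example, with a hypothetical rate of 0.096 per hour, holding_cost 10 6 0.096 prints 5.76 for 10 instances held for 6 hours.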

Critical Requirements

  • AMI billing code must match - Use the exact same AMI (or one with identical billing code) as your production workload
  • ODCRs only match instances with the same instance attributes and platform/billing characteristics

Limitations

  • This strategy does not guarantee full collection of the desired quantity

Conclusion

This ASG-to-ODCR strategy provides a method to secure EC2 capacity when traditional reservation methods aren't readily available. While it adds complexity and minor costs, it can be invaluable for ensuring successful blue/green deployments and other capacity-intensive operations in capacity-constrained scenarios.

Remember to always test this process in your specific environment and consider the timing requirements of your deployment pipeline.

Automated Implementation

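For hands-off operation, use Step Functions to wait until the ASG reaches its desired capacity, create the ODCR, and only then delete the ASG and launch template, so the instances aren't released before the reservation exists. As written, the state machine polls every 30 seconds and has no overall timeout.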

Step 1: Create IAM Role for Step Functions

aws iam create-role \
    --role-name stepfunctions-odcr-execution-role \
    --assume-role-policy-document '{
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "states.amazonaws.com"},
            "Action": "sts:AssumeRole"
        }]
    }'

# Attach an inline policy with the EC2 and Auto Scaling permissions the state machine uses
aws iam put-role-policy \
    --role-name stepfunctions-odcr-execution-role \
    --policy-name EC2-ASG-Permissions \
    --policy-document '{
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:DescribeImages",
                "ec2:CreateCapacityReservation",
                "ec2:CreateTags",
                "ec2:DeleteLaunchTemplate",
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:UpdateAutoScalingGroup",
                "autoscaling:DeleteAutoScalingGroup"
            ],
            "Resource": "*"
        }]
    }'
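The ARNs in the following steps contain an <account> placeholder. Rather than typing it by hand, you can look it up with the CLI. The following is a minimal sketch, assuming the standard aws partition used in the ARNs below; account_id, role_arn and state_machine_arn are hypothetical helper names.

```shell
# Hypothetical helper: the 12-digit account ID of the current credentials.
account_id() {
  aws sts get-caller-identity --query Account --output text
}

# Hypothetical helper: ARN of the execution role created above.
role_arn() {
  aws iam get-role \
    --role-name stepfunctions-odcr-execution-role \
    --query 'Role.Arn' \
    --output text
}

# Hypothetical helper: ARN of the odcr-capacity-automation state machine
# (assumes the standard "aws" partition).
state_machine_arn() {
  local region="$1"
  local account
  account=$(account_id) || return 1
  echo "arn:aws:states:${region}:${account}:stateMachine:odcr-capacity-automation"
}
```

For example, use --role-arn "$(role_arn)" when creating the state machine, or --state-machine-arn "$(state_machine_arn <region-code>)" when starting or listing executions.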

Step 2: Create Step Functions State Machine

  • Create the state machine definition JSON
cat > odcr-state-machine.json << 'EOF'
{
  "Comment": "Wait for ASG capacity and create ODCR",
  "StartAt": "CheckASGCapacity",
  "States": {
    "CheckASGCapacity": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:autoscaling:describeAutoScalingGroups",
      "Parameters": {
        "AutoScalingGroupNames.$": "$.asgName"
      },
      "ResultPath": "$.asgResponse",
      "Next": "CheckInstancesExist"
    },
    "CheckInstancesExist": {
      "Type": "Choice",
      "Choices": [{
        "Variable": "$.asgResponse.AutoScalingGroups[0].Instances",
        "IsPresent": true,
        "Next": "CountRunningInstances"
      }],
      "Default": "WaitForCapacity"
    },
    "CountRunningInstances": {
      "Type": "Pass",
      "Parameters": {
        "runningInstances.$": "States.ArrayLength($.asgResponse.AutoScalingGroups[0].Instances)",
        "desiredCapacity.$": "$.asgResponse.AutoScalingGroups[0].DesiredCapacity",
        "asgName.$": "$.asgName",
        "asgResponse.$": "$.asgResponse"
      },
      "Next": "EvaluateCapacity"
    },
    "EvaluateCapacity": {
      "Type": "Choice",
      "Choices": [{
        "And": [
          {
            "Variable": "$.runningInstances",
            "NumericGreaterThan": 0
          },
          {
            "Variable": "$.runningInstances",
            "NumericGreaterThanEqualsPath": "$.desiredCapacity"
          }
        ],
        "Next": "PrepareInstanceId"
      }],
      "Default": "WaitForCapacity"
    },
    "WaitForCapacity": {
      "Type": "Wait",
      "Seconds": 30,
      "Next": "CheckASGCapacity"
    },
    "PrepareInstanceId": {
      "Type": "Pass",
      "Parameters": {
        "instanceId.$": "$.asgResponse.AutoScalingGroups[0].Instances[0].InstanceId",
        "asgResponse.$": "$.asgResponse"
      },
      "Next": "GetInstanceDetails"
    },
    "GetInstanceDetails": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:ec2:describeInstances",
      "Parameters": {
        "InstanceIds.$": "States.Array($.instanceId)"
      },
      "ResultPath": "$.instanceResponse",
      "Next": "PrepareImageId"
    },
    "PrepareImageId": {
      "Type": "Pass",
      "Parameters": {
        "imageId.$": "$.instanceResponse.Reservations[0].Instances[0].ImageId",
        "instanceResponse.$": "$.instanceResponse",
        "asgResponse.$": "$.asgResponse"
      },
      "Next": "GetAMIDetails"
    },
    "GetAMIDetails": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:ec2:describeImages",
      "Parameters": {
        "ImageIds.$": "States.Array($.imageId)"
      },
      "ResultPath": "$.amiResponse",
      "Next": "CreateODCR"
    },
    "CreateODCR": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:ec2:createCapacityReservation",
      "Parameters": {
        "InstanceType.$": "$.instanceResponse.Reservations[0].Instances[0].InstanceType",
        "InstancePlatform.$": "$.amiResponse.Images[0].PlatformDetails",
        "AvailabilityZone.$": "$.instanceResponse.Reservations[0].Instances[0].Placement.AvailabilityZone",
        "InstanceCount.$": "$.asgResponse.AutoScalingGroups[0].DesiredCapacity",
        "TagSpecifications": [{
          "ResourceType": "capacity-reservation",
          "Tags": [{"Key": "CreatedBy", "Value": "StepFunctions-Automation"}]
        }]
      },
      "ResultPath": "$.odcrResponse",
      "Next": "PrepareLaunchTemplateName"
    },
    "PrepareLaunchTemplateName": {
      "Type": "Pass",
      "Parameters": {
        "launchTemplateName.$": "$.asgResponse.AutoScalingGroups[0].LaunchTemplate.LaunchTemplateName",
        "asgName.$": "$.asgResponse.AutoScalingGroups[0].AutoScalingGroupName"
      },
      "Next": "DeleteASG"
    },
    "DeleteASG": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:autoscaling:deleteAutoScalingGroup",
      "Parameters": {
        "AutoScalingGroupName.$": "$.asgName",
        "ForceDelete": true
      },
      "ResultPath": null,
      "Next": "WaitForASGDeletion"
    },
    "WaitForASGDeletion": {
      "Type": "Wait",
      "Seconds": 30,
      "Next": "DeleteLaunchTemplate"
    },
    "DeleteLaunchTemplate": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:ec2:deleteLaunchTemplate",
      "Parameters": {
        "LaunchTemplateName.$": "$.launchTemplateName"
      },
      "End": true
    }
  }
}
EOF
  • Create the state machine
    Make sure to change the <account> and <region-code> parameters according to your needs.
aws stepfunctions create-state-machine \
    --name odcr-capacity-automation \
    --definition file://odcr-state-machine.json \
    --role-arn arn:aws:iam::<account>:role/stepfunctions-odcr-execution-role \
    --region <region-code>
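If you later edit odcr-state-machine.json, for example to change the 30-second polling interval, you can check the definition before creating or updating the state machine. This sketch assumes a recent AWS CLI that includes the stepfunctions validate-state-machine-definition command; validate_definition is a hypothetical helper.

```shell
# Hypothetical helper: validate an ASL definition file; prints OK on success.
validate_definition() {
  local file="$1" region="$2"
  local result
  result=$(aws stepfunctions validate-state-machine-definition \
    --definition "file://$file" \
    --query result \
    --output text \
    --region "$region") || return 1
  if [ "$result" != "OK" ]; then
    # Rerun the CLI command without --query to see the diagnostics list.
    echo "validation result: $result" >&2
    return 1
  fi
  echo "OK"
}
```

Run it as validate_definition odcr-state-machine.json <region-code>.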

Step 3: Execute Step Functions After Creating ASG

Make sure to change the <ami-id>, <instance-type>, <required-quantity>, <required-az>, <account> and <region-code> parameters according to your needs.

# Create Launch template
aws ec2 create-launch-template \
    --launch-template-name capacity-holder-template \
    --launch-template-data '{
        "ImageId": "<ami-id>",
        "InstanceType": "<instance-type>",
        "TagSpecifications": [{
            "ResourceType": "instance",
            "Tags": [
                {"Key": "Project", "Value": "BlueGreenDeployment"},
                {"Key": "Purpose", "Value": "CapacityReservation"}
            ]
        }]
    }' \
    --region <region-code>

# Create ASG
aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name capacity-holder-asg \
    --launch-template LaunchTemplateName=capacity-holder-template,Version='$Latest' \
    --min-size 0 \
    --max-size <required-quantity> \
    --desired-capacity <required-quantity> \
    --availability-zones <required-az> \
    --region <region-code>

# Start Step Functions execution
aws stepfunctions start-execution \
    --state-machine-arn arn:aws:states:<region-code>:<account>:stateMachine:odcr-capacity-automation \
    --input '{"asgName": ["capacity-holder-asg"], "runningInstances": 0}' \
    --region <region-code>
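To feed the execution into later monitoring, you can capture the execution ARN that start-execution returns. This sketch uses the same input as the command above. start_odcr_automation is a hypothetical helper, and the state machine ARN assumes the standard aws partition.

```shell
# Hypothetical helper: start the automation for one ASG and print the execution ARN.
start_odcr_automation() {
  local asg_name="$1" region="$2" account="$3"
  aws stepfunctions start-execution \
    --state-machine-arn "arn:aws:states:${region}:${account}:stateMachine:odcr-capacity-automation" \
    --input "{\"asgName\": [\"${asg_name}\"], \"runningInstances\": 0}" \
    --query executionArn \
    --output text \
    --region "$region"
}
```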

Step 4: Monitor Progress

Make sure to change the <account> and <region-code> parameters according to your needs.

# Check Step Functions execution status
aws stepfunctions list-executions \
    --state-machine-arn arn:aws:states:<region-code>:<account>:stateMachine:odcr-capacity-automation \
    --region <region-code>

# Check ODCR creation
aws ec2 describe-capacity-reservations \
    --filters Name=tag:CreatedBy,Values=StepFunctions-Automation \
    --query 'CapacityReservations[*].[CapacityReservationId,State,InstanceCount]' \
    --region <region-code>
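Because the state machine keeps polling until the desired capacity is reached, you may want an outer timeout. This sketch polls describe-execution and calls stop-execution once a deadline passes. wait_for_execution is a hypothetical helper, and the timeout and interval are values you choose. Stopping the execution skips the cleanup states, so the ASG and its instances keep running. You can then either create an ODCR for the collected instances (manual Step 4) or clean up (manual Step 5).

```shell
# Hypothetical helper: wait for an execution to leave RUNNING, or stop it at a deadline.
# Prints the final status (or STOPPED_BY_TIMEOUT); returns 0 only on SUCCEEDED.
wait_for_execution() {
  local execution_arn="$1" region="$2" timeout="$3" interval="${4:-60}"
  local deadline=$((SECONDS + timeout))
  local status
  while :; do
    status=$(aws stepfunctions describe-execution \
      --execution-arn "$execution_arn" \
      --query status \
      --output text \
      --region "$region") || return 1
    if [ "$status" != "RUNNING" ]; then
      echo "$status"
      if [ "$status" = "SUCCEEDED" ]; then return 0; else return 1; fi
    fi
    if [ "$SECONDS" -ge "$deadline" ]; then
      aws stepfunctions stop-execution \
        --execution-arn "$execution_arn" \
        --cause "Desired ASG capacity not reached before timeout" \
        --region "$region" >/dev/null || return 1
      echo "STOPPED_BY_TIMEOUT"
      return 1
    fi
    sleep "$interval"
  done
}
```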
AWS EXPERT
published 7 months ago