How do I recover AWS resources that were affected by the CrowdStrike Falcon agent?

I can't connect to AWS resources that the CrowdStrike Falcon agent is installed on. I want to troubleshoot how to recover the resources.

Short description

On July 19, 2024, at 04:09 UTC, an update to the CrowdStrike Falcon agent (csagent.sys) caused Windows-based devices to experience unplanned stop errors or a blue screen. Affected devices included Amazon Elastic Compute Cloud (Amazon EC2) instances and Amazon WorkSpaces Personal virtual desktops. This issue affected only Amazon EC2 Windows instances and personal WorkSpaces with CrowdStrike installed.

For more information, see Remediation and Guidance Hub: Channel File 291 Incident on the CrowdStrike website.

Often, a reboot of your instance or WorkSpace allows the CrowdStrike Falcon agent to successfully update.

Note: If your instance uses instance store volumes, then data that's stored on the volumes doesn't persist when you stop, hibernate, or terminate the instance. When you stop, hibernate, or terminate the instance, the instance store volume is cryptographically erased. For more information, see Instance store temporary block storage for EC2 instances.

Resolution

Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.

If a reboot doesn't restore the instance to a healthy state, then either use the AWS Systems Manager Automation runbook or manually restore your instances.

To use the runbook, first review the following prerequisites:

  • If your root Amazon Elastic Block Store (Amazon EBS) volume is encrypted, then make sure that the encryption key exists in your account. Also, make sure that you have permission to use it.
  • The AWSSupport-StartEC2RescueWorkflow runbook stops your instance. If your instance uses instance store volumes, then use the manual recovery method to avoid data loss.
  • Before you start the AWSSupport-StartEC2RescueWorkflow runbook, make sure that your AWS Identity and Access Management (IAM) user or role has the required permissions. For more information, see the Required IAM permissions section of AWSSupport-StartEC2RescueWorkflow. You must also add the kms:CreateGrant permission to the IAM role.

Identify impaired instances

To identify failed instances, run the describe-instance-status AWS CLI command:

aws ec2 describe-instance-status --filters Name=instance-status.status,Values=impaired --query "InstanceStatuses[*].InstanceId" --region your-region

Note: Replace your-region with your AWS Region.
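
To pass the impaired instance IDs to later steps, it can help to capture them as one comma-separated list. The following is a minimal shell sketch and not part of the original procedure. The function name impaired_ids is hypothetical, and the sketch assumes that the AWS CLI is configured with credentials. The impaired filter matches any instance that fails an instance status check, not only instances affected by CrowdStrike, so review the list before you act on it.

```shell
# Hypothetical helper: print the impaired instance IDs in a Region as a
# comma-separated list. It uses the same describe-instance-status filter
# as the command above.
impaired_ids() {
  region=$1
  aws ec2 describe-instance-status \
    --filters Name=instance-status.status,Values=impaired \
    --query "InstanceStatuses[*].InstanceId" \
    --output text \
    --region "$region" |
    # Tabs and newlines both become commas, so multi-line output
    # also collapses to one list. sed drops the trailing comma.
    tr -s '\t\n' ',' | sed 's/,$//'
}
```

For example, run impaired_ids your-region and replace your-region with your AWS Region.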

Use the Systems Manager Automation runbook to restore a single EC2 instance

To use AWSSupport-StartEC2RescueWorkflow to automate recovery, open the runbook on the Systems Manager console. Then select the Region and instance that you want to recover. If your Amazon EBS root volume is encrypted, then set AllowEncryptedVolume to True.

The runbook workflow launches a temporary EC2 instance (helper instance) in a virtual private cloud (VPC). The workflow automatically associates the helper instance with the default security group of the VPC. The helper instance must allow outbound HTTPS (TCP port 443) communication to both Amazon Simple Storage Service (Amazon S3) and Systems Manager endpoints.

You must launch the helper instance in one of the following subnets so that it can reach the AWS services that are required to complete the workflow tasks:

  • Public subnet with the AssociatePublicIpAddress parameter set to True.
  • Private subnet with internet access through NAT.

The helper instance mounts the root volume of the selected instance, and then runs the following command to delete the affected file:

get-childitem -path "$env:EC2RESCUE_OFFLINE_DRIVE\Windows\System32\drivers\CrowdStrike\" -Include C-00000291*.sys -Recurse | foreach { $_.Delete()}

The runbook receives the preceding command as a Base64-encoded OfflineScript payload. To verify the content of the payload, run the following command to decode it:

PS C:\Windows\system32> [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String("REPLACE_WITH_BASE64_HERE"))
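
The same check can be done from a Linux or macOS shell. The following sketch encodes the delete command shown earlier as Base64 and decodes it again to confirm the round trip. It assumes UTF-8 text (to match the UTF8 decoding in the PowerShell command) and a base64 utility that accepts -d. The variable names are illustrative only.

```shell
# The PowerShell command from this article, stored verbatim.
# Single quotes keep $env, $_ and the backslashes literal.
offline_script='get-childitem -path "$env:EC2RESCUE_OFFLINE_DRIVE\Windows\System32\drivers\CrowdStrike\" -Include C-00000291*.sys -Recurse | foreach { $_.Delete()}'

# Encode to Base64 on a single line (tr removes any line wrapping).
encoded=$(printf '%s' "$offline_script" | base64 | tr -d '\n')

# Decode again. Assumption: your base64 supports -d (some older
# macOS versions use -D instead).
decoded=$(printf '%s' "$encoded" | base64 -d)

# Report whether the decoded text matches the original command.
if [ "$decoded" = "$offline_script" ]; then
  echo "Payload matches the expected command"
else
  echo "Payload differs from the expected command"
fi
```

To inspect a payload that you didn't create, set encoded to that value and print decoded instead of comparing it.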

Use the Systems Manager Automation runbook to restore multiple EC2 instances

To use the runbook on multiple EC2 instances, target the instances by instance IDs, tags, or resource groups.

The runbook launches one helper instance for each selected instance. Make sure that you have enough IP addresses in the selected subnet for your instances. Also, make sure that you have enough instance quotas and EBS volume quotas.

Note: The amount of time that the automation runbook takes to complete depends on the concurrency value that you select.
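
The console procedures in this section also have an AWS CLI counterpart in the start-automation-execution command. The following is a hedged sketch, not the article's documented method. The wrapper name start_rescue, the tag name, and the resource group name are hypothetical. The sketch assumes that you already have a Base64-encoded OfflineScript payload and that your credentials have the required IAM permissions.

```shell
# Hypothetical wrapper around "aws ssm start-automation-execution".
#   $1  targets string, for example one of:
#         Key=ParameterValues,Values=i-0123456789abcdef0,i-0fedcba9876543210
#         Key=tag:RecoveryBatch,Values=crowdstrike     (hypothetical tag)
#         Key=ResourceGroup,Values=my-recovery-group   (hypothetical group)
#   $2  Base64-encoded OfflineScript payload
#   $3  maximum number of instances to process at the same time
#   $4  AWS Region
start_rescue() {
  aws ssm start-automation-execution \
    --document-name "AWSSupport-StartEC2RescueWorkflow" \
    --parameters "{\"OfflineScript\":[\"$2\"]}" \
    --target-parameter-name "InstanceId" \
    --targets "$1" \
    --max-concurrency "$3" \
    --region "$4"
}
```

The JSON form of --parameters is used because Base64 padding contains the = character. To pass other runbook parameters, such as AllowEncryptedVolume or SubnetId, add them to the same JSON object.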

Use instance IDs

Complete the following steps:

  1. Open the AWSSupport-StartEC2RescueWorkflow runbook in the Systems Manager console.
  2. Under Execute automation runbook, choose Rate control.
  3. In the Targets section, for Parameter, choose InstanceId. For Targets, choose Parameter Values.
  4. For Input parameters, select the instances that you want to restore.
  5. For Rate Control, choose your concurrency option for the number of resources that can simultaneously run the automation. For more information, see Control automations at scale.
  6. Choose Execute.

Use tags

Complete the following steps:

  1. Create a new, unique tag to use only for the instances that you want to restore. The automation targets all instances that have this tag, so a tag that's shared with other instances can cause unintentional data loss or affect instance availability. For more information, see Tag your Amazon EC2 resources and What is Tag Editor?
  2. To verify that only the affected instances share the new tag, use AWS Resource Explorer or the Tag Editor.
  3. Open the AWSSupport-StartEC2RescueWorkflow runbook on the Systems Manager console.
  4. Under Execute automation runbook, choose Rate control.
  5. In the Targets section, for Parameter, choose InstanceId. For Targets, choose Tags.
  6. For Tags, choose Edit. Enter the tag key and value, and then choose Save.
  7. For Input parameters, select the instances that you want to restore.
  8. For Rate Control, choose your concurrency option for the number of resources that can simultaneously run the automation. For more information, see Control automations at scale.
  9. Choose Execute.

Use resource groups

Complete the following steps:

  1. Create a new resource group to use only for the instances that you want to restore. The automation targets all instances in this resource group, so a group that includes other instances can cause unintentional data loss or affect instance availability. For more information, see Creating query-based groups in AWS Resource Groups.
  2. To verify that only the affected instances belong to the new resource group, use the AWS Resource Groups console.
  3. Open the AWSSupport-StartEC2RescueWorkflow runbook on the Systems Manager console.
  4. Under Execute automation runbook, choose Rate control.
  5. In the Targets section, for Parameter, choose InstanceId. For Targets, choose Resource Group.
  6. For Resource Groups, select the new resource group.
  7. For Input parameters, select the instances that you want to restore.
  8. For Rate Control, choose your concurrency option for the number of resources that can simultaneously run the automation. For more information, see Control automations at scale.
  9. Choose Execute.

Manually restore your instances

Complete the following steps:

  1. Create a snapshot of the instance's EBS root volume.
  2. Create a new EBS volume from the snapshot in the same Availability Zone.
  3. Launch a new Windows instance (helper instance) in the same Availability Zone.
  4. Attach the new EBS volume to the new instance as a data volume.
  5. Download the EC2Rescue for Windows Server tool to the helper instance.
  6. Right-click on EC2Rescue.exe, and then choose Run As Administrator.
    Select I agree on the License Agreement.
    On the Welcome screen, choose Next.
    On the Select Mode screen, choose Offline Instance.
    Select the offline disk, and then choose Next. When prompted, choose Yes and then choose OK.
    Keep EC2Rescue running.
    Note: If your original instance uses BitLocker to encrypt the EBS root volume, then follow the onscreen prompts to provide your password or BitLocker recovery key. Or, use manage-bde unlock from the command line. For more information, see manage-bde unlock on the Microsoft website. After you unlock the drive, repeat step 6.
  7. Navigate to the X:\Windows\System32\drivers\CrowdStrike\ folder on the attached volume, and then delete C-00000291*.sys.
    Note: In this example, X: is the drive letter that's assigned to the secondary EBS volume from the affected instance. It might be a different letter in your environment.
  8. Return to EC2Rescue.
    Choose Diagnose and Rescue, and then choose Next.
    Keep all options as default.
    Choose Next, and then choose Next again.
    When prompted, choose Rescue. Choose OK, and then choose Next.
    Choose Finish.
    On the pop-up window, choose Fix disk signature, and then choose OK.
    If Fix disk signature is greyed out, then choose OK.
  9. Detach the EBS volume from the new instance.
  10. Create a snapshot of the detached EBS volume.
  11. Select the same volume type (for example, gp2 or gp3) as the affected instance, and then create an Amazon Machine Image (AMI) from the snapshot.
  12. Replace the root volume on the original EC2 instance and specify the AMI.
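
Steps 1, 2, and 4 of the manual procedure can also be run from the AWS CLI. The following sketch shows one possible way to do that under stated assumptions: the helper instance from step 3 already exists, the function name attach_root_copy is hypothetical, and xvdf is an assumed free device name on the helper instance. The EC2Rescue steps still run on the helper instance itself.

```shell
# Hypothetical helper for manual steps 1, 2, and 4.
#   $1  root EBS volume ID of the affected instance
#   $2  Availability Zone of the helper instance
#   $3  helper instance ID
attach_root_copy() {
  root_volume_id=$1
  az=$2
  helper_id=$3

  # Step 1: create a snapshot of the root volume and wait for it to complete.
  snapshot_id=$(aws ec2 create-snapshot --volume-id "$root_volume_id" \
    --description "Pre-recovery copy of $root_volume_id" \
    --query SnapshotId --output text) || return 1
  aws ec2 wait snapshot-completed --snapshot-ids "$snapshot_id" || return 1

  # Step 2: create a new volume from the snapshot in the same Availability Zone.
  volume_id=$(aws ec2 create-volume --snapshot-id "$snapshot_id" \
    --availability-zone "$az" --query VolumeId --output text) || return 1
  aws ec2 wait volume-available --volume-ids "$volume_id" || return 1

  # Step 4: attach the new volume to the helper instance as a data volume.
  # Assumption: xvdf is not already in use on the helper instance.
  aws ec2 attach-volume --volume-id "$volume_id" \
    --instance-id "$helper_id" --device xvdf
}
```

Each command stops the function on failure so that a later step never runs against a missing snapshot or volume.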

WorkSpaces

If multiple reboots don't return your WorkSpace to a healthy state, then restore the WorkSpace to a previous snapshot. If your WorkSpace restore doesn't return your WorkSpace to a healthy state, then rebuild the WorkSpace.
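
The reboot, restore, and rebuild actions each have an AWS CLI command. The following is a minimal sketch with hypothetical wrapper names. Each wrapper takes a WorkSpace ID that you supply. Restore and rebuild replace the current volume contents with snapshot or image data, so use them only after reboots fail.

```shell
# Hypothetical wrappers. $1 is the WorkSpace ID in each case.

# Show the current state of the WorkSpace.
workspace_state() {
  aws workspaces describe-workspaces --workspace-ids "$1" \
    --query "Workspaces[0].State" --output text
}

# First attempt: reboot the WorkSpace.
reboot_workspace() {
  aws workspaces reboot-workspaces --reboot-workspace-requests "WorkspaceId=$1"
}

# Second attempt: restore the WorkSpace to its previous snapshot.
restore_workspace() {
  aws workspaces restore-workspace --workspace-id "$1"
}

# Last resort: rebuild the WorkSpace.
rebuild_workspace() {
  aws workspaces rebuild-workspaces --rebuild-workspace-requests "WorkspaceId=$1"
}
```

Run workspace_state between attempts to check the WorkSpace state before you move to the next action.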

Troubleshooting

If the preceding steps don't resolve your connectivity issues, then use the following troubleshooting steps:

Systems Manager Automation runbook

Issue: The helper instance can't connect to the AWS service endpoints. This issue can cause a failure in the waitForEc2RescueInstanceToBeManaged automation workflow step.

Solution: Confirm that the default security group allows outbound traffic (TCP port 443) to reach the Systems Manager and S3 endpoints. Also, make sure that the selected subnet can connect to these endpoints. To use a custom security group, update the HelperInstanceSecurityGroupId parameter value with your security group ID. If you chose a public subnet, then set the AssociatePublicIpAddress parameter to True. You can also set the SubnetId parameter as CreateNewVPC for the automation to create a new VPC with the required connectivity.

Issue: The affected instance fails to stop because stop protection is turned on.

Solution: Turn off stop protection for the affected instance, and then run the automation again.

Note: If your instance uses instance store volumes, then data that's stored on those volumes doesn't persist when you stop the instance.
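
Stop protection can also be checked and turned off from the AWS CLI. The following sketch uses hypothetical function names and assumes that your credentials can describe and modify the instance attribute.

```shell
# Hypothetical helpers. $1 is the ID of the affected instance.

# Print True if stop protection is turned on, False otherwise.
stop_protection_status() {
  aws ec2 describe-instance-attribute --instance-id "$1" \
    --attribute disableApiStop \
    --query "DisableApiStop.Value" --output text
}

# Turn off stop protection so that the runbook can stop the instance.
disable_stop_protection() {
  aws ec2 modify-instance-attribute --instance-id "$1" --no-disable-api-stop
}
```

If you rely on stop protection, turn it back on with the --disable-api-stop option after the recovery completes.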

Issue: The helper instance fails to launch.

Solution: The instance type that's selected for the EC2Rescue instance might be unavailable in the Availability Zone of the helper instance's subnet. Use a supported instance type in the same Availability Zone as the helper instance.

Issue: When the automation verifies that the AWS CloudFormation stack creation is complete, the automation fails with the error, "Stack AWSSupport-EC2Rescue-{UUID} entered unexpected state: DELETE_IN_PROGRESS".

Solution: To determine what caused the stack to fail, get the UUID and use the CloudFormation console. Stack failure might occur when you don't have permissions to create the stack resources. For more information, see the Required IAM permissions section of AWSSupport-StartEC2RescueWorkflow and How can I troubleshoot access denied or unauthorized operation errors with an IAM policy?

Issue: The runbook fails in the assertInstanceRootVolumeIsNotEncrypted automation workflow step because of an encrypted EBS volume.

Solution: If the volume uses EBS encryption, then set the AllowEncryptedVolume parameter to True.

Issue: The default VPC was deleted.

Solution: Set the SubnetId parameter as CreateNewVPC to create a new VPC that allows the instance to successfully recover.

Issue: The detachInstanceRootVolume automation workflow step fails with the error, "error occurred (IncorrectState) when calling the DetachVolume operation: Unable to detach root volume".

Solution: When you run the automation, keep the instance in the Stopped state.

Manual instance restore

Issue: The instance fails to boot with the error, "The application or operating system couldn't be loaded because a registered file is missing or contains errors".

Solution: If you didn't select Fix disk signature, then you might have a disk signature collision. To resolve this issue, follow step 8 in the manual restore steps. If you restored the instance without EC2Rescue, then see Troubleshoot issues with Amazon EC2 Windows instances.

Note: If the preceding troubleshooting steps don't resolve connectivity to your EC2 instance, then contact AWS Support. Make sure that you capture a screenshot of the unreachable instance.

Related information

Run the EC2Rescue tool on unreachable instances

3 Comments

I am using Windows 2019 R2 and getting Null when clicking "Choose Diagnose and Rescue, and then choose Next."

replied a year ago

The last portion of the cmd line immediately above "Option 2" is:

("[REPLACE_WITH_BASE64_HERE"))

Is this a missing right bracket? ]

replied a year ago

Hello @M, thank you for reporting the syntax error, it has been corrected now.

AWS
SUPPORT ENGINEER
replied a year ago