How do I recover AWS resources that were affected by the CrowdStrike Falcon agent?
I can't connect to AWS resources that the CrowdStrike Falcon agent is installed on.
Short description
On July 19, 2024 at 04:09 UTC, an update to the CrowdStrike Falcon agent (csagent.sys) caused Windows-based devices to experience unplanned stop errors or a blue screen. Affected devices include Amazon Elastic Compute Cloud (Amazon EC2) instances and Amazon WorkSpaces Personal virtual desktops. This issue affects only Windows Amazon EC2 instances and personal WorkSpaces with CrowdStrike installed.
For more information, see Remediation and Guidance Hub: Falcon Content Update for Windows Hosts on the CrowdStrike website.
Usually, a reboot of your instance or WorkSpace allows the CrowdStrike Falcon agent to successfully update.
Note: If your instance uses instance store volumes, then data that's stored on the volumes doesn't persist when the instance is stopped, hibernated, or terminated. When the instance is stopped, hibernated, or terminated, the instance store volume is cryptographically erased. For more information, see Amazon EC2 instance store temporary block storage for Amazon EC2 instances.
Resolution
If a reboot doesn't restore the instance to a healthy state, then either use the AWS Systems Manager Automation runbook to restore your instances, or manually restore your instances.
If you choose to use the runbook, then first take the following actions:
- If your root Amazon Elastic Block Store (Amazon EBS) volume is encrypted, then make sure that the encryption key exists in your account. Also, you must have permission to use it.
- The AWSSupport-StartEC2RescueWorkflow runbook stops your instance. If your instance uses instance store volumes, then use the manual recovery method to avoid data loss.
- Before you start the AWSSupport-StartEC2RescueWorkflow runbook, make sure that your AWS Identity and Access Management (IAM) user or role has the required permissions. For more information, see the Required IAM permissions section of AWSSupport-StartEC2RescueWorkflow. You must also add the kms:CreateGrant permission to the IAM role.
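The first two checks in the preceding list can be scripted with the AWS CLI. The following is a minimal sketch, not an official procedure. It assumes that the AWS CLI is configured for the account and Region of the instance. The function names are hypothetical.

```shell
# Hypothetical helper: print the root EBS volume ID of an instance.
root_volume_id() {
  root_device="$(aws ec2 describe-instances --instance-ids "$1" \
    --query "Reservations[0].Instances[0].RootDeviceName" --output text)"
  aws ec2 describe-instances --instance-ids "$1" \
    --query "Reservations[0].Instances[0].BlockDeviceMappings[?DeviceName=='${root_device}'].Ebs.VolumeId" \
    --output text
}

# Hypothetical helper: print the Encrypted flag and the KMS key ID of a volume.
volume_encryption() {
  aws ec2 describe-volumes --volume-ids "$1" \
    --query "Volumes[0].[Encrypted,KmsKeyId]" --output text
}

# Hypothetical helper: print whether the instance type supports instance store volumes.
has_instance_store() {
  itype="$(aws ec2 describe-instances --instance-ids "$1" \
    --query "Reservations[0].Instances[0].InstanceType" --output text)"
  aws ec2 describe-instance-types --instance-types "$itype" \
    --query "InstanceTypes[0].InstanceStorageSupported" --output text
}

# Example (replace the ID with your instance ID):
#   volume_encryption "$(root_volume_id i-0123456789abcdef0)"
#   has_instance_store i-0123456789abcdef0
```

If the volume is encrypted, then confirm separately that you can use the key that's reported. If the instance type supports instance store volumes, then consider the manual recovery method.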
Identify impaired instances
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshoot AWS CLI errors. Also, make sure that you're using the most recent AWS CLI version.
To identify failed instances, run the describe-instance-status AWS CLI command:
aws ec2 describe-instance-status --filters Name=instance-status.status,Values=impaired --query "InstanceStatuses[*].InstanceId" --region your-region
Note: Replace your-region with your AWS Region.
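Because the issue affects only Windows instances, it can help to narrow the list of impaired instances to Windows. The following is a minimal sketch that assumes a configured AWS CLI. The function name impaired_windows_instances is hypothetical.

```shell
# Hypothetical helper: list impaired instance IDs in a Region that run Windows.
impaired_windows_instances() {
  region="$1"
  ids="$(aws ec2 describe-instance-status --region "$region" \
    --filters Name=instance-status.status,Values=impaired \
    --query "InstanceStatuses[*].InstanceId" --output text)"
  # Nothing is impaired: return without calling describe-instances with no IDs.
  [ -z "$ids" ] && return 0
  # The platform filter accepts only the value "windows".
  # $ids is intentionally unquoted so that each ID becomes its own argument.
  aws ec2 describe-instances --region "$region" --instance-ids $ids \
    --filters Name=platform,Values=windows \
    --query "Reservations[].Instances[].InstanceId" --output text
}

# Example: impaired_windows_instances us-east-1
```

An impaired status doesn't prove that the CrowdStrike update is the cause, so review each instance before you recover it.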
Use the AWS Systems Manager automation runbook to restore a single EC2 instance
To use AWSSupport-StartEC2RescueWorkflow to automate recovery, open the runbook on the Systems Manager console, and select the Region and instance that you're recovering. If your EBS root volume is encrypted, then set AllowEncryptedVolume to True.
The runbook workflow launches a temporary EC2 instance (helper instance) in a virtual private cloud (VPC). The helper instance is automatically associated with the default security group of the VPC. The helper instance must allow outbound HTTPS (TCP port 443) communication to both Amazon Simple Storage Service (Amazon S3) and Systems Manager endpoints.
You must launch the instance in one of the following subnets so that the instance reaches the required AWS services to complete the workflow tasks:
- Public subnet with the AssociatePublicIpAddress parameter set to True.
- Private subnet with internet access through NAT.
The helper instance mounts the root volume of the selected instances, and then runs the following command to delete the affected file:
get-childitem -path "$env:EC2RESCUE_OFFLINE_DRIVE\Windows\System32\drivers\CrowdStrike\" -Include C-00000291*.sys -Recurse | foreach { $_.Delete()}
To verify the content of the Base64 OfflineScript payload from the preceding command, run the following command:
PS C:\Windows\system32> [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String("REPLACE_WITH_BASE64_HERE"))
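On a Linux or macOS workstation, the same check can be done from a shell. The following is a minimal sketch that encodes the delete command and decodes it again to confirm the round trip. It assumes a base64 utility that accepts --decode (GNU coreutils and macOS do). The payload that the runbook actually receives might differ in whitespace from the command shown in this article, so decode the real payload to compare.

```shell
# The PowerShell command from this article, stored verbatim in a shell variable.
# Single quotes keep the $ signs and backslashes unchanged.
offline_script='get-childitem -path "$env:EC2RESCUE_OFFLINE_DRIVE\Windows\System32\drivers\CrowdStrike\" -Include C-00000291*.sys -Recurse | foreach { $_.Delete()}'

# Encode to a single-line Base64 string (tr removes line wraps that some base64 builds add).
encoded="$(printf '%s' "$offline_script" | base64 | tr -d '\n')"

# Decode again and compare with the original command.
decoded="$(printf '%s' "$encoded" | base64 --decode)"
if [ "$decoded" = "$offline_script" ]; then
  echo "payload verified"
fi
```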
Use the AWS Systems Manager automation runbook to restore multiple EC2 instances
To use the runbook on multiple EC2 instances, target the instances by instance IDs, tags, or resource groups.
Note:
- The runbook launches one helper instance for each selected instance.
- Make sure that you have enough IP addresses in the selected subnet for your instances.
- Make sure that you have enough instance quotas and EBS volume quotas.
- The amount of time that the automation runbook takes to complete depends on the concurrency amount selected.
Use instance IDs
Complete the following steps:
- Open the AWSSupport-StartEC2RescueWorkflow runbook on the Systems Manager console.
- Under Execute automation runbook, choose Rate control.
- In the Targets section, choose InstanceId for Parameter, and then choose Parameter Values for Targets.
- For Input parameters, select the instances that you want to restore.
- For Rate Control, choose your concurrency option for the number of resources that are allowed to simultaneously run the automation. For more information, see Control automations at scale.
- Choose Execute.
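The console steps above have a rough AWS CLI equivalent. The following is a minimal sketch, not the documented procedure. It assumes that OfflineScript is the runbook parameter that carries the Base64 payload mentioned earlier, and that you supply that payload yourself. The function name start_rescue_by_ids is hypothetical.

```shell
# Hypothetical helper: start the runbook for several instances with rate control.
# Usage: start_rescue_by_ids REGION MAX_CONCURRENCY OFFLINE_SCRIPT_BASE64 INSTANCE_ID...
start_rescue_by_ids() {
  region="$1"; concurrency="$2"; payload="$3"; shift 3
  # Join the remaining arguments with commas: i-1 i-2 becomes i-1,i-2
  ids="$(printf '%s,' "$@")"; ids="${ids%,}"
  aws ssm start-automation-execution --region "$region" \
    --document-name AWSSupport-StartEC2RescueWorkflow \
    --target-parameter-name InstanceId \
    --targets "Key=ParameterValues,Values=${ids}" \
    --parameters "{\"OfflineScript\":[\"${payload}\"]}" \
    --max-concurrency "$concurrency" \
    --query AutomationExecutionId --output text
}

# Example (assumed values):
#   start_rescue_by_ids us-east-1 5 "$encoded_payload" i-0aaa i-0bbb
```

A low concurrency value limits how many helper instances run at the same time, which helps when subnet IP addresses or instance quotas are tight.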
Use tags
Complete the following steps:
- Create a new, unique tag to use only for the instances that you want to restore. The automation targets every instance that has this tag, so a tag that's applied to other instances can cause unintentional data loss or affect instance availability. For more information, see Tag your Amazon EC2 resources and Using Tag Editor.
- To verify that only the affected instances share the new tag, use AWS Resource Explorer or the Tag Editor.
- Open the AWSSupport-StartEC2RescueWorkflow runbook on the Systems Manager console.
- Under Execute automation runbook, choose Rate control.
- In the Targets section, choose InstanceId for Parameter, and then choose Parameter Values for Targets.
- For Tags, choose Edit, enter the tag key and value, and then choose Save.
- For Input parameters, select the instances that you want to restore.
- For Rate Control, choose your concurrency option for the number of resources that are allowed to simultaneously run the automation. For more information, see Control automations at scale.
- Choose Execute.
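Before you run the automation, you can also list the tagged instances from the AWS CLI. The following is a minimal sketch that assumes a configured AWS CLI. The function name instances_with_tag and the example tag are hypothetical.

```shell
# Hypothetical helper: list the IDs of all instances that carry a given tag.
# Usage: instances_with_tag REGION TAG_KEY TAG_VALUE
instances_with_tag() {
  aws ec2 describe-instances --region "$1" \
    --filters "Name=tag:$2,Values=$3" \
    --query "Reservations[].Instances[].InstanceId" --output text
}

# Example (assumed tag): instances_with_tag us-east-1 RescueBatch crowdstrike-1
# When you start the automation from the AWS CLI, the matching target is
#   --targets "Key=tag:RescueBatch,Values=crowdstrike-1"
```

Compare the output with your list of affected instances. If an unexpected ID appears, then remove the tag from that instance first.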
Use resource groups
Complete the following steps:
- Create a new resource group to use only for the instances that you want to restore. The automation targets every instance in this resource group, so a group that includes other instances can cause unintentional data loss or affect instance availability. For more information, see Creating query-based groups in AWS Resource Groups.
- To verify that only the affected instances belong to the new resource group, use the AWS Resource Groups console.
- Open the AWSSupport-StartEC2RescueWorkflow runbook on the Systems Manager console.
- Under Execute automation runbook, choose Rate control.
- In the Targets section, choose InstanceId for Parameter, and then choose Parameter Values for Targets.
- For Resource Groups, select the new resource group.
- For Input parameters, select the instances that you want to restore.
- For Rate Control, choose your concurrency option for the number of resources that are allowed to simultaneously run the automation. For more information, see Control automations at scale.
- Choose Execute.
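The membership of the resource group can also be checked from the AWS CLI. The following is a minimal sketch that assumes a configured AWS CLI. The function name group_resource_arns is hypothetical.

```shell
# Hypothetical helper: list the ARNs of the resources in a resource group.
# Usage: group_resource_arns GROUP_NAME_OR_ARN
group_resource_arns() {
  aws resource-groups list-group-resources --group "$1" \
    --query "Resources[].Identifier.ResourceArn" --output text
}

# Example (assumed group name): group_resource_arns crowdstrike-rescue
# When you start the automation from the AWS CLI, the matching target is
#   --targets "Key=ResourceGroup,Values=crowdstrike-rescue"
```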
Manually restore your instances
Complete the following steps:
- Create a snapshot of the instance's EBS root volume.
- Create a new EBS volume from the snapshot in the same Availability Zone.
- Launch a new Windows instance in the same Availability Zone.
- Attach the new EBS volume to the new instance as a data volume.
- Download the EC2Rescue for Windows Server tool to the new instance (the helper instance).
- Right-click EC2Rescue.exe, and then choose Run As Administrator.
  Select I agree on the License Agreement.
  On the Welcome screen, choose Next.
  On the Select Mode screen, choose Offline Instance.
  Select the offline disk, and then choose Next. When prompted, select Yes and then OK.
  Keep EC2Rescue running.
  Note: If your original instance uses BitLocker to encrypt the EBS root volume, then follow the on-screen prompts to provide your password or BitLocker recovery key. Or, you can use manage-bde unlock from the command line. For more information, see manage-bde unlock on the Microsoft website. After the drive is unlocked, repeat step 6.
- Navigate to the X:\Windows\System32\drivers\CrowdStrike\ folder on the attached volume, and then delete C-00000291*.sys.
  Note: In this example, X: is the drive letter that's assigned to the secondary EBS volume from the affected instance. It might be a different letter in your environment.
- Return to EC2Rescue.
  Choose Diagnose and Rescue, and then choose Next.
  Keep all options as default.
  Choose Next, and then choose Next again.
  When prompted, choose Rescue, choose OK, and then choose Next.
  Choose Finish.
  On the pop-up window that appears, select Fix disk signature, and then choose OK.
  If Fix disk signature is greyed out, then choose OK.
- Detach the EBS volume from the new instance.
- Create a snapshot of the detached EBS volume.
- Select the same volume type (for example, gp2 or gp3) as the affected instance, and then create an Amazon Machine Image (AMI) from the snapshot.
- Replace the root volume on the original EC2 instance and specify the AMI.
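The snapshot, volume, and attach steps at the start of the preceding procedure can be run from the AWS CLI. The following is a minimal sketch that assumes a configured AWS CLI and a helper instance that already runs in the same Availability Zone. The function name attach_root_copy is hypothetical, and the device name xvdf is an assumed free device on the helper instance.

```shell
# Hypothetical helper: copy a root volume through a snapshot and attach the
# copy to a helper instance as a data volume. Prints the new volume ID.
# Usage: attach_root_copy ROOT_VOLUME_ID AVAILABILITY_ZONE HELPER_INSTANCE_ID
attach_root_copy() {
  snapshot_id="$(aws ec2 create-snapshot --volume-id "$1" \
    --description "Pre-recovery copy of $1" \
    --query SnapshotId --output text)"
  aws ec2 wait snapshot-completed --snapshot-ids "$snapshot_id"

  volume_id="$(aws ec2 create-volume --snapshot-id "$snapshot_id" \
    --availability-zone "$2" --query VolumeId --output text)"
  aws ec2 wait volume-available --volume-ids "$volume_id"

  # xvdf is an assumption. Choose a device name that's free on your helper instance.
  aws ec2 attach-volume --volume-id "$volume_id" --instance-id "$3" \
    --device xvdf > /dev/null
  echo "$volume_id"
}

# Example (assumed IDs):
#   attach_root_copy vol-0123456789abcdef0 us-east-1a i-0fedcba9876543210
```

The sketch doesn't set a volume type, so pass the type that you need if the default doesn't match the affected instance. The EC2Rescue steps and the file deletion still run on the helper instance.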
Amazon WorkSpaces
If multiple reboots don't return your WorkSpace to a healthy state, then restore the WorkSpace to a previous snapshot. If your WorkSpace restore doesn't return your WorkSpace to a healthy state, then rebuild the WorkSpace.
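The reboot, restore, and rebuild actions are also available in the AWS CLI. The following is a minimal sketch that assumes a configured AWS CLI. The function names are hypothetical wrappers around the Amazon WorkSpaces commands.

```shell
# Hypothetical helper: print the current state of a WorkSpace.
workspace_state() {
  aws workspaces describe-workspaces --workspace-ids "$1" \
    --query "Workspaces[0].State" --output text
}

# Hypothetical helper: reboot a WorkSpace.
reboot_workspace() {
  aws workspaces reboot-workspaces --reboot-workspace-requests "WorkspaceId=$1"
}

# Hypothetical helper: restore a WorkSpace to its last available snapshot.
restore_workspace() {
  aws workspaces restore-workspace --workspace-id "$1"
}

# Hypothetical helper: rebuild a WorkSpace.
rebuild_workspace() {
  aws workspaces rebuild-workspaces --rebuild-workspace-requests "WorkspaceId=$1"
}

# Example (assumed ID): reboot_workspace ws-0123456789 && workspace_state ws-0123456789
```

Restore and rebuild can discard recent data on the WorkSpace, so try them in the order that's described above.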
Troubleshooting
If the preceding steps don't resolve connectivity to your EC2 instance, then follow these troubleshooting steps for your recovery option.
Systems Manager automation runbook
Issue: The helper instance can't connect to the AWS service endpoints. This issue can cause a failure in the waitForEc2RescueInstanceToBeManaged automation workflow step.
Solution: Confirm that the default security group allows outbound traffic (TCP port 443) to reach the Systems Manager and S3 endpoints. Also, make sure that the selected subnet can connect to these endpoints. To use a custom security group, update the HelperInstanceSecurityGroupId parameter value with your security group ID. If you chose a public subnet, then set the AssociatePublicIpAddress parameter to True. You can also set the SubnetId parameter as CreateNewVPC for the automation to create a new VPC with the required connectivity.
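To review the outbound rules of the default security group from the AWS CLI, a minimal sketch such as the following might help. It assumes a configured AWS CLI, and the function name default_sg_egress is hypothetical.

```shell
# Hypothetical helper: print the outbound rules of a VPC's default security group.
# Usage: default_sg_egress VPC_ID
default_sg_egress() {
  aws ec2 describe-security-groups \
    --filters "Name=vpc-id,Values=$1" "Name=group-name,Values=default" \
    --query "SecurityGroups[0].IpPermissionsEgress" --output json
}

# Example (assumed ID): default_sg_egress vpc-0123456789abcdef0
```

Look for a rule that allows TCP port 443, or all traffic, to the required destinations. Network ACLs and route tables on the subnet aren't covered by this check.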
Issue: The affected instance fails to stop because stop protection is turned on.
Solution: Turn off stop protection for the affected instance, and then run the automation again.
Note: If your instance uses instance store volumes, then data that's stored on those volumes doesn't persist when the instance is stopped.
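Stop protection can be checked and turned off from the AWS CLI. The following is a minimal sketch that assumes a configured AWS CLI. The function names are hypothetical.

```shell
# Hypothetical helper: print whether stop protection is turned on for an instance.
stop_protection_status() {
  aws ec2 describe-instance-attribute --instance-id "$1" \
    --attribute disableApiStop \
    --query "DisableApiStop.Value" --output text
}

# Hypothetical helper: turn off stop protection for an instance.
disable_stop_protection() {
  aws ec2 modify-instance-attribute --instance-id "$1" --no-disable-api-stop
}

# Example (assumed ID):
#   stop_protection_status i-0123456789abcdef0
#   disable_stop_protection i-0123456789abcdef0
```

If you rely on stop protection, then turn it on again after the recovery with the --disable-api-stop option.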
Issue: The helper instance fails to launch.
Solution: The instance type that's selected for the EC2Rescue instance might not be available in the Availability Zone of the subnet of the helper instance. Use a supported instance type in the same Availability Zone as the helper instance.
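To check whether an instance type is offered in an Availability Zone, a minimal sketch such as the following might help. It assumes a configured AWS CLI, and the function name instance_type_offered is hypothetical.

```shell
# Hypothetical helper: succeed if the instance type is offered in the Availability Zone.
# Usage: instance_type_offered INSTANCE_TYPE AVAILABILITY_ZONE
instance_type_offered() {
  count="$(aws ec2 describe-instance-type-offerings \
    --location-type availability-zone \
    --filters "Name=location,Values=$2" "Name=instance-type,Values=$1" \
    --query "length(InstanceTypeOfferings)" --output text)"
  [ "$count" -gt 0 ]
}

# Example (assumed values):
#   instance_type_offered t3.medium us-east-1a && echo "offered"
```

An offering doesn't guarantee capacity at launch time, so a launch can still fail for other reasons.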
Issue: When the automation verifies that the AWS CloudFormation stack creation is completed, the automation fails with the error "Stack AWSSupport-EC2Rescue-{UUID} entered unexpected state: DELETE_IN_PROGRESS".
Solution: To determine what caused the stack to fail, note the UUID from the stack name, and then review the stack events in the CloudFormation console. Stack failure might occur when you don't have permissions to create the stack resources. For more information, see the Required IAM permissions section of AWSSupport-StartEC2RescueWorkflow and How can I troubleshoot access denied or unauthorized operation errors with an IAM policy?
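The stack events can also be read from the AWS CLI. The following is a minimal sketch that assumes a configured AWS CLI. The function name failed_stack_events is hypothetical. If the stack is already deleted, then pass the stack ID (ARN) instead of the stack name.

```shell
# Hypothetical helper: list the failed events of a CloudFormation stack
# with the logical resource ID, status, and failure reason.
# Usage: failed_stack_events STACK_NAME_OR_ID
failed_stack_events() {
  aws cloudformation describe-stack-events --stack-name "$1" \
    --query "StackEvents[?contains(ResourceStatus, 'FAILED')].[LogicalResourceId,ResourceStatus,ResourceStatusReason]" \
    --output text
}

# Example: failed_stack_events AWSSupport-EC2Rescue-{UUID}
# Replace {UUID} with the value from the error message.
```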
Issue: The runbook fails in the assertInstanceRootVolumeIsNotEncrypted automation workflow step because of an encrypted EBS volume.
Solution: If the volume uses EBS encryption, then set the AllowEncryptedVolume parameter to True.
Issue: The default VPC was deleted.
Solution: Set the SubnetId parameter as CreateNewVPC to create a new VPC that allows the instance to successfully recover.
Issue: The detachInstanceRootVolume automation workflow step fails with the error message "error occurred (IncorrectState) when calling the DetachVolume operation: Unable to detach root volume".
Solution: When you're running the automation, keep the instance in the Stopped state.
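To confirm the instance state before you run the automation again, a minimal sketch such as the following might help. It assumes a configured AWS CLI, and the function names are hypothetical. The sketch only reads the state and waits. It doesn't stop the instance.

```shell
# Hypothetical helper: print the current state of an instance (for example, stopped).
instance_state() {
  aws ec2 describe-instances --instance-ids "$1" \
    --query "Reservations[0].Instances[0].State.Name" --output text
}

# Hypothetical helper: block until the instance is stopped, then print the state.
wait_until_stopped() {
  aws ec2 wait instance-stopped --instance-ids "$1" && instance_state "$1"
}

# Example (assumed ID): wait_until_stopped i-0123456789abcdef0
```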
Manual instance restore
Issue: The instance fails to boot with the error: "The application or operating system couldn't be loaded because a registered file is missing or contains errors".
Solution: If you didn't select Fix disk signature, then you might have a disk signature collision.
To resolve this issue, follow step 8 in the manual restore steps. If you restored the instance without EC2 Rescue, then see Troubleshoot issues with Amazon EC2 Windows instances.
Note: If the preceding troubleshooting steps don't resolve connectivity to your EC2 instance, then contact AWS Support. Make sure that you capture a screenshot of the unreachable instance.
Related information
I am using Windows 2019 R2 and getting Null when clicking "Choose Diagnose and Rescue, and then choose Next."
The last portion of the cmd line immediately above "Option 2" is:
("[REPLACE_WITH_BASE64_HERE"))
Is this a missing right bracket? ]
Hello @M, thank you for reporting the syntax error, it has been corrected now.