How do I troubleshoot failed backup jobs in AWS Backup?

My backup job fails and enters the FAILED state in AWS Backup.

Short description

You can't retry a failed backup job.

If a scheduled backup plan activates a backup job, then AWS Backup creates a new backup job for the resource during the next scheduled runtime. If you activate a manual or on-demand backup job, then you must make a new StartBackupJob request to back up the resource.
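
The following Python sketch shows one way to submit a new on-demand request for the same resource. It assumes that the AWS SDK for Python (Boto3) is installed and that credentials are configured. The helper names build_retry_request and resubmit_backup_job are hypothetical. The sketch reuses the vault, resource, and IAM role from the failed job's DescribeBackupJob response, which might not be what you want if the role or vault caused the failure.

```python
def build_retry_request(failed_job: dict) -> dict:
    """Build StartBackupJob parameters from a DescribeBackupJob response.

    Only the three required parameters are copied. Optional settings such as
    Lifecycle or CompleteWindowMinutes are intentionally left out.
    """
    return {
        "BackupVaultName": failed_job["BackupVaultName"],
        "ResourceArn": failed_job["ResourceArn"],
        "IamRoleArn": failed_job["IamRoleArn"],
    }


def resubmit_backup_job(backup_job_id: str) -> str:
    """Look up a failed job and start a new on-demand job for its resource."""
    import boto3  # imported lazily so build_retry_request works without the SDK

    client = boto3.client("backup")
    failed_job = client.describe_backup_job(BackupJobId=backup_job_id)
    response = client.start_backup_job(**build_retry_request(failed_job))
    return response["BackupJobId"]  # ID of the new backup job
```

Fix the cause of the failure before you call resubmit_backup_job. Otherwise, the new job is likely to fail for the same reason.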

Note: If the backup job for your resource fails to run, then see Why are my scheduled backup plans in AWS Backup not running?

To view a backup job's status, use monitoring tools or run the describe-backup-job AWS Command Line Interface (AWS CLI) command. The status message provides troubleshooting information for the failed backup job.
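
If you prefer the AWS SDK for Python (Boto3) over the AWS CLI, then the same information is available from the describe_backup_job call. The following is a minimal sketch that assumes Boto3 is installed and credentials are configured. The function names are hypothetical.

```python
def summarize_backup_job(job: dict) -> str:
    """Format the state and status message of a DescribeBackupJob response."""
    job_id = job.get("BackupJobId", "?")
    state = job.get("State", "UNKNOWN")
    # StatusMessage can be absent, for example on jobs that are still running.
    message = job.get("StatusMessage", "(no status message)")
    return f"{job_id}: {state} - {message}"


def describe_and_summarize(backup_job_id: str) -> str:
    """Fetch a backup job and return a one-line summary."""
    import boto3  # imported lazily so summarize_backup_job works without the SDK

    client = boto3.client("backup")
    job = client.describe_backup_job(BackupJobId=backup_job_id)
    return summarize_backup_job(job)
```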

Note: If you receive errors when you run AWS CLI commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you use the most recent AWS CLI version.

Resolution

Resolve IAM permission issues

You receive one of the following error messages when there's a permissions issue with an AWS Identity and Access Management (IAM) role:

  • "You are not authorized to perform this operation"
  • "Backup job failed because of insufficient privileges"
  • Other permissions-related errors

To resolve a permissions issue, make sure that you meet the following requirements for the IAM role that creates your backups:

Note: If you use the default IAM role that AWS Backup creates, then the role has permissions through the AWSBackupServiceRolePolicyForBackup and AWSBackupServiceRolePolicyForRestores AWS managed policies.

AWS Backup creates the AWSBackupDefaultServiceRole and AWSServiceRoleForBackup IAM roles with different permissions and uses. Make sure that you use the correct IAM role. When you create a backup plan or activate a manual backup job, choose the default AWSBackupDefaultServiceRole role.
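
To confirm that a role has the two AWS managed policies attached, you can list its attached policies. The following Python sketch assumes that the AWS SDK for Python (Boto3) is installed and that your credentials can call iam:ListAttachedRolePolicies. The function names are hypothetical. A custom role can grant equivalent permissions through other policies, so a missing managed policy isn't proof of a permissions gap.

```python
REQUIRED_POLICIES = (
    "AWSBackupServiceRolePolicyForBackup",
    "AWSBackupServiceRolePolicyForRestores",
)


def missing_backup_policies(attached_policies: list) -> list:
    """Return the required managed policy names that aren't in the list.

    attached_policies uses the shape of the AttachedPolicies list that
    ListAttachedRolePolicies returns: dicts with a PolicyName key.
    """
    attached = {policy["PolicyName"] for policy in attached_policies}
    return [name for name in REQUIRED_POLICIES if name not in attached]


def check_role(role_name: str = "AWSBackupDefaultServiceRole") -> list:
    """List the role's attached policies and report the missing ones."""
    import boto3  # imported lazily so the helper above works without the SDK

    iam = boto3.client("iam")
    attached = []
    paginator = iam.get_paginator("list_attached_role_policies")
    for page in paginator.paginate(RoleName=role_name):
        attached.extend(page["AttachedPolicies"])
    return missing_backup_policies(attached)
```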

Note: It's a best practice to use AWSBackupDefaultServiceRole to create an Amazon Simple Storage Service (Amazon S3) backup. For the default role to perform actions on S3 resources, see Permissions and policies for Amazon S3 backup and restore.

Resolve issues with backup job completion

You receive one of the following error messages when a backup job didn't complete on time:

  • "Backup Job did not complete within completion window"
  • "An AWS Backup job failed to complete in time"

When a backup job fails because of one of the preceding errors, the job also shows the EXPIRED status. To resolve these issues, see Why is my backup job in the EXPIRED status in AWS Backup?

You can also use the Complete within parameter to specify a backup window in your backup rule configuration. Set a longer duration for the Complete within parameter to give your backup jobs enough time to complete.

Note: The Complete within parameter sets the time window that your backup must complete in. The time that it takes for AWS Backup to complete a backup job varies. If the data transfer that backs up your resources doesn't complete during the Complete within time window, then AWS Backup stops the backup job. Then, the backup job shows the EXPIRED status.
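
In the AWS Backup API, the Complete within setting corresponds to the CompletionWindowMinutes field of a backup rule. The following Python sketch changes that field on a copy of a backup plan document, such as the BackupPlan value that GetBackupPlan returns. The function name is hypothetical. The sketch also removes RuleId from each rule on the assumption that UpdateBackupPlan doesn't accept that field as input. Other read-only fields might need the same treatment.

```python
import copy


def extend_completion_window(plan: dict, rule_name: str, minutes: int) -> dict:
    """Return a copy of a backup plan with a longer completion window.

    plan follows the shape of the BackupPlan structure: a dict with a
    Rules list, where each rule has a RuleName.
    """
    updated = copy.deepcopy(plan)  # leave the caller's plan untouched
    matched = False
    for rule in updated.get("Rules", []):
        # Assumption: RuleId is output-only and must not be sent back.
        rule.pop("RuleId", None)
        if rule.get("RuleName") == rule_name:
            rule["CompletionWindowMinutes"] = minutes
            matched = True
    if not matched:
        raise ValueError(f"No rule named {rule_name!r} in the backup plan")
    return updated
```

You can then review the returned document and pass it as the BackupPlan argument of an update_backup_plan call.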

Resolve lifecycle issues

You receive the following error message when the backup vault has a vault lock with MaxRetentionDays and MinRetentionDays:

"Backup job failed because the lifecycle is outside the valid range for backup vault"

When the retention period of a backup falls outside the vault lock's minimum and maximum retention periods, AWS Backup doesn't create the backup in the vault.

To resolve this issue, change the backup retention in your backup plan to occur within the specified range. Or, update the vault lock retention period configuration.
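
The range check can be sketched in Python as follows. The function name is hypothetical. The values correspond to DeleteAfterDays in the backup rule's lifecycle and to MinRetentionDays and MaxRetentionDays on the vault. The sketch assumes that a lifecycle without DeleteAfterDays means indefinite retention, which fits only when the vault has no maximum.

```python
from typing import Optional


def retention_within_vault_lock(
    delete_after_days: Optional[int],
    min_retention_days: Optional[int],
    max_retention_days: Optional[int],
) -> bool:
    """Return True when the retention period fits the vault lock range.

    None for a vault value means that the limit isn't set.
    """
    if delete_after_days is None:
        # Assumption: no DeleteAfterDays means the backup never expires,
        # so it can fit only when the vault lock sets no maximum.
        return max_retention_days is None
    if min_retention_days is not None and delete_after_days < min_retention_days:
        return False
    if max_retention_days is not None and delete_after_days > max_retention_days:
        return False
    return True
```

For example, a rule that deletes backups after 7 days doesn't fit a vault with MinRetentionDays set to 30, so the rule's retention or the vault lock configuration must change.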

Resolve VMware backup issues

You receive one of the following error messages when you perform a VMware backup:

  • "Unsupported disk size detected during backup creation. Aborted backup job"
  • "Failed to process backup data during backup data processing. Aborted backup job"

To resolve the preceding error messages, take the following actions:

  • Make sure that AWS Backup supports your VMware virtual machines (VMs).
  • Open all the required ports so that the AWS Backup gateway can connect to the host and back up your VM. Then, confirm that you correctly configured the DNS servers on the gateway appliance.
  • If you set the disk mode to independent-persistent or independent-nonpersistent on your VMs, then change the mode of all disks to dependent. AWS Backup supports only the dependent disk mode.
  • Check the virtual disk size of your VMs. AWS Backup supports only VM virtual disk sizes that are a multiple of 1 KiB. To troubleshoot a VM with a failed backup job, run the fdisk -l command. For more information, see Managing partitions in Linux with fdisk on the Red Hat website.
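
To check the 1 KiB requirement against fdisk -l output from a Linux VM, you can parse the byte count on each "Disk" line. The following Python sketch is one way to do this. The function name is hypothetical, and the sketch assumes the common fdisk line format "Disk /dev/sda: 20 GiB, 21474836480 bytes, 41943040 sectors". The size that the guest operating system reports might differ from the size that's configured in VMware, so treat the result as a hint.

```python
import re

# Matches lines such as:
#   Disk /dev/sda: 20 GiB, 21474836480 bytes, 41943040 sectors
DISK_LINE = re.compile(r"^Disk (\S+): .*?(\d+) bytes", re.MULTILINE)


def unaligned_disks(fdisk_output: str) -> list:
    """Return (device, size_in_bytes) for disks that aren't a multiple of 1 KiB."""
    results = []
    for device, size_text in DISK_LINE.findall(fdisk_output):
        size = int(size_text)
        if size % 1024 != 0:  # 1 KiB = 1024 bytes
            results.append((device, size))
    return results
```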

Resolve AWS KMS errors

You receive one of the following AWS KMS key error messages:

  • "FAILED - KMS key is either disabled or pending deletion or access to KMS key is denied"
  • "KMS key access denied error"
  • "KMS validation error"

To resolve the preceding issues, take the following actions:

  • For resources that don't support full AWS Backup management, check whether you deleted the AWS KMS key that encrypts the resource. If you deleted the key, then create a new AWS KMS key.
  • For resources that support full AWS Backup management, check whether you deleted the AWS KMS key that encrypts the destination backup vault. If you deleted the key, then create a new AWS KMS key.
  • Check whether you deactivated the AWS KMS key. If you deactivated the key, then reactivate it and create the backup job again.
  • If the AWS KMS key is pending deletion, then cancel the deletion.
  • Verify that the IAM role that's associated with the backup job has the required permissions on the AWS KMS key. The IAM role must have at least the kms:Decrypt, kms:CreateGrant, and kms:GenerateDataKey permissions.
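
The key state checks in the preceding list can be sketched in Python as follows. The sketch assumes that the AWS SDK for Python (Boto3) is installed and that your credentials can call kms:DescribeKey. The function names and the wording of the returned messages are hypothetical. The sketch doesn't check the role's permissions on the key, and a DescribeKey call for a key that no longer exists raises an error instead of returning a state.

```python
def kms_key_action(key_metadata: dict) -> str:
    """Map the KeyState from a DescribeKey response to a suggested action."""
    state = key_metadata.get("KeyState")
    if state == "Enabled":
        return "Key is enabled. Check the backup role's permissions on the key."
    if state == "Disabled":
        return "Reactivate the key, then create the backup job again."
    if state == "PendingDeletion":
        return "Cancel the key deletion."
    return f"Unexpected key state {state!r}. Review the key in AWS KMS."


def suggest_action_for_key(key_id: str) -> str:
    """Fetch the key metadata and return the suggested action."""
    import boto3  # imported lazily so kms_key_action works without the SDK

    kms = boto3.client("kms")
    metadata = kms.describe_key(KeyId=key_id)["KeyMetadata"]
    return kms_key_action(metadata)
```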

Resolve Windows VSS backup failure errors for Amazon EC2

To resolve Windows Volume Shadow Copy Service (VSS) backup failures for Amazon Elastic Compute Cloud (Amazon EC2), take the following actions:

  • Make sure that your EC2 instance profile role or backup role has the required permissions.
  • Install the latest versions of AWS Systems Manager Agent (SSM Agent) and AWS Tools for PowerShell on your EC2 instances. To install SSM Agent, see amazon-ssm-agent on the GitHub website.
  • Check your system resources to determine whether the Windows VSS backup timed out because the system was too busy when you took the snapshot.
  • Check the SSM Agent logs at C:\ProgramData\Amazon\SSM\Logs to troubleshoot issues that occur during the snapshot process.
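
To scan the SSM Agent logs for problems, you can search the log directory for error lines. The following Python sketch is one way to do this on the instance. The function name is hypothetical. The sketch assumes that the log files end in .log and that relevant lines contain the text ERROR, neither of which is confirmed here.

```python
from pathlib import Path

DEFAULT_LOG_DIR = Path(r"C:\ProgramData\Amazon\SSM\Logs")


def find_error_lines(log_dir=DEFAULT_LOG_DIR, marker="ERROR", limit=50):
    """Return up to `limit` (file name, line number, text) tuples.

    Assumptions: log files match *.log and error lines contain `marker`.
    """
    hits = []
    for log_file in sorted(Path(log_dir).glob("*.log")):
        # errors="replace" keeps the scan going if a file has odd bytes.
        with log_file.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if marker in line:
                    hits.append((log_file.name, number, line.rstrip()))
                    if len(hits) >= limit:
                        return hits
    return hits
```

Compare the timestamps of the matching lines with the time of the failed backup job to narrow down the cause.
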
AWS OFFICIAL | Updated 3 months ago