My Amazon OpenSearch Service domain is stuck or can't complete an upgrade process.
Short description
OpenSearch Service uses a blue/green deployment process for domain upgrades. During this process, issues with shard relocation can cause the domain to get stuck in the Modifying state, or the domain might fail the upgrade validation checks.
Note: You can't cancel an upgrade after you initiate it. If the upgrade gets stuck, then you must wait for AWS to resolve the underlying issue before the process can complete.
Resolution
The domain is stuck in the Modifying state during shard relocation
During the blue/green deployment, OpenSearch Service copies data from the existing nodes to the new nodes. If the shard relocation process gets stuck, then the domain remains in the Modifying state.
This issue occurs for the following reasons:
- Large shard sizes of over 50 GB increase the shard copy time.
- An index or search load causes high resource usage on the cluster.
- There isn't enough disk space to copy shards to the new nodes.
- You reached a shard count quota.
To check for disk space issues or high cluster load, use Amazon CloudWatch to view the FreeStorageSpace, ClusterStatus, and ClusterIndexWritesBlocked OpenSearch Service metrics.
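The following Python sketch shows one way to pull these metrics programmatically. It assumes that boto3 is installed and that credentials are configured. The domain name, account ID, three-hour window, and five-minute period are illustrative values, not recommendations from this article. OpenSearch Service publishes its metrics in the AWS/ES namespace with the DomainName and ClientId dimensions, and the cluster status is reported as separate metrics such as ClusterStatus.red.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(domain_name, account_id, metric_name,
                       hours=3, statistic="Minimum"):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    The window (hours) and the 300-second period are illustrative defaults.
    """
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ES",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,
        "Statistics": [statistic],
    }


def fetch_datapoints(query):
    """Run the query and return datapoints sorted by time.

    boto3 is imported here so that the query builder stays usable without it.
    """
    import boto3  # assumption: installed, with credentials configured

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(**query)
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])


if __name__ == "__main__":
    # Hypothetical domain name and account ID; replace with your own values.
    checks = [
        ("FreeStorageSpace", "Minimum"),
        ("ClusterStatus.red", "Maximum"),
        ("ClusterIndexWritesBlocked", "Maximum"),
    ]
    for metric, statistic in checks:
        query = build_metric_query("my-domain", "123456789012", metric,
                                   statistic=statistic)
        for point in fetch_datapoints(query):
            print(metric, point["Timestamp"], point[statistic])
```

A falling FreeStorageSpace minimum, or a value of 1 for ClusterStatus.red or ClusterIndexWritesBlocked, points to the disk space and load issues that are listed in the preceding section.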
Based on the issues that you identify, take corrective action. For example, if the domain is low on disk space, then delete indexes that you no longer need. For more information about how to delete indexes, see DeleteIndex or Delete Index API on the OpenSearch website.
To monitor the shard migration process, run the following command:
GET _cat/recovery?active_only=true
The command output shows the active shard recovery status, progress percentage, recovery time, failure status, and data transfer size. If shards aren't progressing or you receive an empty output, then there might be issues with the upgrade process. To troubleshoot upgrade issues, create an AWS Support case.
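To decide whether shards are progressing, you can compare two snapshots of the recovery output taken a few minutes apart. The following Python sketch assumes that you request the output as JSON (GET _cat/recovery?active_only=true&format=json) and that each row includes the index, shard, stage, and bytes_percent columns. The helper names are hypothetical, and how you fetch the rows (for example, with a signed HTTP request) is left out.

```python
def summarize_recoveries(rows):
    """Summarize unfinished recoveries from _cat/recovery JSON rows.

    Returns a list of dicts sorted by progress, slowest first.
    """
    summaries = []
    for row in rows:
        if row.get("stage") == "done":
            continue
        # The JSON format reports percentages as strings such as "40.0%".
        percent = float(row.get("bytes_percent", "0%").rstrip("%"))
        summaries.append({
            "index": row["index"],
            "shard": row["shard"],
            "stage": row.get("stage"),
            "bytes_percent": percent,
        })
    return sorted(summaries, key=lambda item: item["bytes_percent"])


def stalled_shards(previous, current):
    """Return (index, shard) pairs whose progress did not change between polls."""
    before = {(item["index"], item["shard"]): item["bytes_percent"]
              for item in previous}
    return [
        (item["index"], item["shard"])
        for item in current
        if before.get((item["index"], item["shard"])) == item["bytes_percent"]
    ]
```

If the same shards show no change across several polls, include that list when you create the AWS Support case.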
If your shard sizes are over 50 GB, then reindex your data to create more shards with a smaller size. For instructions, see Reindex data on the OpenSearch website.
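To find which indexes need to be reindexed, you can inspect the shard sizes first. The following Python sketch assumes that you retrieve the shard list as JSON with sizes in bytes (GET _cat/shards?format=json&bytes=b). The 30 GiB target size used to suggest a new primary shard count is an illustrative assumption, not a value from this article.

```python
import math

GIB = 1024 ** 3


def oversized_primaries(rows, limit_gib=50):
    """Map each index to its largest primary shard size (GiB) above the limit."""
    result = {}
    for row in rows:
        # Skip replicas and unassigned shards, which report no store size.
        if row.get("prirep") != "p" or not row.get("store"):
            continue
        size_gib = int(row["store"]) / GIB
        if size_gib > limit_gib:
            result[row["index"]] = max(result.get(row["index"], 0.0), size_gib)
    return result


def suggested_primary_count(total_primary_bytes, target_gib=30):
    """Estimate the primary shard count needed to reach the target shard size.

    target_gib is an assumed value; choose one that fits your workload.
    """
    return max(1, math.ceil(total_primary_bytes / (target_gib * GIB)))
```

For example, an index with 120 GiB of primary data and a 30 GiB target yields a suggestion of four primary shards for the destination index.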
If none of the preceding troubleshooting actions resolve the issue, then create an AWS Support case.
The domain fails the upgrade validation checks
During an upgrade, OpenSearch Service validates that your domain configuration is compatible with the new version. If the validation fails, then the domain becomes stuck in the Modifying state.
To resolve this issue, check the domain description for failed activities or validation-related error messages. Complete the troubleshooting steps for your error, and then retry your configuration change.
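One way to look for failed activities is the DescribeDomainChangeProgress API. The following Python sketch assumes that boto3 is installed with credentials configured, and that failed stages report a status of FAILED (matched without regard to case). Verify the stage status values against the responses that your domain returns. The domain name is hypothetical.

```python
def failed_stages(response):
    """Return (name, description) for each stage reported as failed.

    Expects the response shape of describe_domain_change_progress.
    Assumption: a failed stage has a Status of "FAILED" in any letter case.
    """
    status = response.get("ChangeProgressStatus", {})
    return [
        (stage.get("Name"), stage.get("Description"))
        for stage in status.get("ChangeProgressStages", [])
        if str(stage.get("Status", "")).upper() == "FAILED"
    ]


def fetch_change_progress(domain_name):
    """Fetch the progress of the most recent configuration change."""
    import boto3  # assumption: installed, with credentials configured

    client = boto3.client("opensearch")
    return client.describe_domain_change_progress(DomainName=domain_name)


if __name__ == "__main__":
    # Hypothetical domain name; replace with your own.
    for name, description in failed_stages(fetch_change_progress("my-domain")):
        print(f"{name}: {description}")
```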
Best practices to prevent upgrade issues
Take the following actions:
Related information
Why is my OpenSearch Service domain stuck in the "Modifying" state?