OpenSearch index migration from hot to warm taking a few hours


Data instances (hot)

  • Nodes: 1
  • r6g.large.search
  • vCPU: 2, Memory (GiB): 16
  • SSD (gp3)
  • Storage per volume (gp3): 2 TB

Dedicated master instances

  • Nodes: 3
  • r6g.xlarge.search
  • vCPU: 4, Memory (GiB): 8

UltraWarm instances

  • Nodes: 14
  • ultrawarm1.medium.search

I have one index with ~150 GB of total storage, and the UltraWarm migration has been stuck on RUNNING_FORCE_MERGE for several hours now. Is there anything I can do to make the process more transparent?

GET _ultrawarm/migration/my-index/_status
{
  "migration_status" : {
    "index" : "nextgen_hot",
    "state" : "RUNNING_FORCE_MERGE",
    "migration_type" : "HOT_TO_WARM"
  }
}

The cluster allocation explain endpoint returns an error:

GET _cluster/allocation/explain
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "unable to find any unassigned shards to explain [ClusterAllocationExplainRequest[useAnyUnassignedShard=true,includeYesDecisions?=false]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "unable to find any unassigned shards to explain [ClusterAllocationExplainRequest[useAnyUnassignedShard=true,includeYesDecisions?=false]"
  },
  "status" : 400
}
1 Answer

Hello,

Thank you for reaching out to us. Please find information regarding migrating from Hot to Warm below.

Index migrations to UltraWarm storage require a force merge, which OpenSearch Service runs automatically after validating that there is enough hot disk space. Each OpenSearch index is composed of some number of shards, and each shard is composed of some number of Lucene segments. The force merge operation purges documents that were marked for deletion and conserves disk space.

If low hot disk space is detected, the force merge request is blocked. The force merge operation processes one shard at a time, and each shard requires free hot storage equal to three times its occupied size, plus an additional 20 GB of free hot storage per node, until the index has been migrated to warm storage. In short, before indices can move from hot to warm storage, the hot tier must have enough free space for the force merge operation that runs first.

Run the following command to see the available disk space per node:

GET _cat/allocation?v

The nodes must have enough free storage space to fulfill the disk space requirement for a migration (three times the shard size plus 20 GB).
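The "three times the shard size plus 20 GB" rule can be turned into a quick sanity check. The sketch below is illustrative Python (not an AWS tool); the shard sizes and free-space figures are placeholders you would read from `_cat/allocation` and `_cat/shards`:

```python
def required_free_gb(largest_shard_gb: float, per_node_overhead_gb: float = 20.0) -> float:
    """Free hot storage needed to force merge one shard: three times
    the shard's size plus a fixed 20 GB per-node buffer."""
    return 3 * largest_shard_gb + per_node_overhead_gb

def can_migrate(largest_shard_gb: float, free_gb_per_node: list[float]) -> bool:
    """True if every hot node has enough free space for the migration."""
    need = required_free_gb(largest_shard_gb)
    return all(free >= need for free in free_gb_per_node)

# Example: a 150 GB index split into 5 shards of ~30 GB each
# needs 3 * 30 + 20 = 110 GB of free hot storage per node.
print(required_free_gb(30))       # 110.0
print(can_migrate(30, [250.0]))   # True: 250 GB free exceeds the 110 GB needed
```

With a single r6g.large data node holding a 2 TB gp3 volume, a 150 GB index should normally clear this check unless the disk is already heavily used.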

If you don't have enough disk space, you can perform the following [1]:

  • Delete unused indices
  • Increase the EBS volume on hot nodes

Once the warm migration succeeds, you can reduce the EBS volume size to fit your requirements.

Force merge is a slow operation; how long it takes depends on the number of segments in each shard and on the shard size. The UltraWarm migration process involves snapshot, force merge, and shard relocation operations, and overall index size is one of the main factors that determines how long they take.

To note, the migration process has the following states:

  • PENDING_INCREMENTAL_SNAPSHOT
  • RUNNING_INCREMENTAL_SNAPSHOT
  • FAILED_INCREMENTAL_SNAPSHOT
  • PENDING_FORCE_MERGE
  • RUNNING_FORCE_MERGE
  • FAILED_FORCE_MERGE
  • PENDING_FULL_SNAPSHOT
  • RUNNING_FULL_SNAPSHOT
  • FAILED_FULL_SNAPSHOT
  • PENDING_SHARD_RELOCATION
  • RUNNING_SHARD_RELOCATION
  • FINISHED_SHARD_RELOCATION

Is there anything you can do to make the process more transparent?

You can monitor the migration progress on your end by running the following command until it completes:

GET _cat/recovery
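If you want to poll the migration status rather than re-run the command by hand, a loop like the following sketch works. This is illustrative Python; `get_state` is a placeholder for whatever call you use to fetch the `state` field from `GET _ultrawarm/migration/<index>/_status`:

```python
import time

def wait_for_migration(get_state, poll_seconds: int = 60, max_polls: int = 720) -> str:
    """Poll a callable returning the migration 'state' string until the
    migration finishes, fails, or the polling budget runs out."""
    for _ in range(max_polls):
        state = get_state()
        if state == "FINISHED_SHARD_RELOCATION":
            return state
        if state.startswith("FAILED_"):
            raise RuntimeError(f"migration failed in state {state}")
        time.sleep(poll_seconds)
    raise TimeoutError("migration did not finish within the polling budget")

# Example with a canned sequence instead of a live cluster:
states = iter(["RUNNING_FORCE_MERGE", "RUNNING_SHARD_RELOCATION",
               "FINISHED_SHARD_RELOCATION"])
print(wait_for_migration(lambda: next(states), poll_seconds=0))
# FINISHED_SHARD_RELOCATION
```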

By default, UltraWarm merges indexes into one segment. You can increase this value up to 1,000 segments using the index.ultrawarm.migration.force_merge.max_num_segments setting. Higher values speed up the migration process but increase query latency for the warm index after the migration finishes [2].

To change the setting, make the following request:

PUT my-index/_settings
{
  "index": {
    "ultrawarm": {
      "migration": {
        "force_merge": {
          "max_num_segments": <a value between 1 and 1000>
        }
      }
    }
  }
}

Regarding the error from the cluster allocation explain API call: there are currently no unassigned shards on the domain, which is why the API returned "unable to find any unassigned shards to explain". By default the API picks an arbitrary unassigned shard to explain; to explain an assigned shard instead, specify the index, shard, and primary fields in the request body.

However, for a more technical explanation specific to your domain, please open a support case with AWS Premium Support and we will help you further.

I hope this information was helpful to you.

References:

[1] How do I troubleshoot an UltraWarm storage migration issue in Amazon OpenSearch Service? https://aws.amazon.com/premiumsupport/knowledge-center/opensearch-ultrawarm-troubleshoot/

[2] UltraWarm storage for Amazon OpenSearch Service - Migration tuning https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ultrawarm.html#ultrawarm-settings

AWS
answered a year ago
  • I'm sorry Cameron, but you literally just copied/pasted content from those pages and read my error message back to me without offering any additional insight. I'll open a support case with AWS.
