
How do I troubleshoot a circuit breaker exception in OpenSearch Service?


I want to send data to my Amazon OpenSearch Service cluster. However, I receive a circuit breaking exception error that states that my data is too large.

Short description

When a request reaches OpenSearch Service nodes, circuit breakers estimate the amount of memory required to load the data. OpenSearch Service then compares the estimated size with the configured heap size quota. If the estimated size of the data is greater than the available heap size, then OpenSearch Service terminates the query. To prevent overloading the node, OpenSearch Service shows a CircuitBreakerException error.
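The decision described above can be modeled as a short sketch. This is an illustrative model only, not OpenSearch's actual implementation; the class and function names are hypothetical.

```python
# Minimal sketch of the circuit breaker decision: estimate the memory a
# request needs, and reject it if the projected heap usage would exceed
# the configured limit. Illustrative only, not OpenSearch's implementation.

class CircuitBreakerException(Exception):
    pass

def check_request(estimated_bytes: int, used_bytes: int, limit_bytes: int) -> None:
    """Reject the request if projected heap usage would exceed the limit."""
    projected = used_bytes + estimated_bytes
    if projected > limit_bytes:
        raise CircuitBreakerException(
            f"Data too large: would be [{projected}], "
            f"which is larger than the limit of [{limit_bytes}]"
        )

# A request that fits is allowed; one that does not trips the breaker.
check_request(estimated_bytes=10_000, used_bytes=100_000, limit_bytes=200_000)
```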

To resolve circuit breaker exception errors, first identify the circuit breaker that raised the exception. Then, to reduce the load on your data nodes, reduce high Java virtual machine (JVM) memory pressure.

Resolution

Identify the circuit breaker that raised the exception

OpenSearch Service uses the following circuit breakers to prevent JVM OutOfMemoryError exceptions:

  • Request
  • Field data
  • In flight requests
  • Accounting
  • Parent

Important: Identify which of the five circuit breakers raised the exception. Each circuit breaker has its own tuning needs. For more information about circuit breaker types, see Circuit breaker settings on the Elasticsearch website.

To get the current memory usage per node and per breaker, run the following command:

GET _nodes/stats/breaker
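A response from this endpoint nests per-breaker statistics under each node. The following sketch summarizes usage against each breaker's limit in Python; the field names follow the `_nodes/stats/breaker` response format, but the sample payload below is illustrative, not from a real cluster.

```python
# Summarize per-breaker memory usage from a _nodes/stats/breaker response.
# The sample payload is illustrative, not taken from a real cluster.
sample_response = {
    "nodes": {
        "node-1": {
            "breakers": {
                "parent": {"limit_size_in_bytes": 16213167308,
                           "estimated_size_in_bytes": 15283269136,
                           "tripped": 1},
                "fielddata": {"limit_size_in_bytes": 6485266923,
                              "estimated_size_in_bytes": 1024,
                              "tripped": 0},
            }
        }
    }
}

def summarize_breakers(response: dict) -> list[str]:
    """Return one line per node and breaker showing usage against the limit."""
    lines = []
    for node_id, node in response["nodes"].items():
        for name, stats in node["breakers"].items():
            pct = 100 * stats["estimated_size_in_bytes"] / stats["limit_size_in_bytes"]
            lines.append(f"{node_id} {name}: {pct:.1f}% of limit, "
                         f"tripped {stats['tripped']} time(s)")
    return lines

for line in summarize_breakers(sample_response):
    print(line)
```

A breaker that reports a high percentage of its limit, or a nonzero `tripped` count, is the one to tune.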

Note that circuit breakers are a best-effort mechanism. Circuit breakers provide limited resiliency against overloading nodes. However, you might still receive an OutOfMemoryError. Circuit breakers can track memory only if it's explicitly reserved. It's not always possible to estimate the exact memory usage in advance. For example, if you have a small amount of memory heap, then the relative overhead of untracked memory is larger. For more information about circuit breakers and node resiliency, see Improving node resiliency with the real memory circuit breaker on the Elasticsearch website.

If you reach the circuit breaker quota on a cluster that runs Elasticsearch version 7.x or later with 16 GB of heap, then you receive an error similar to the following:

{
    "error": {
        "root_cause": [{
            "type": "circuit_breaking_exception",
            "reason": "[parent] Data too large, data for [<http_request>] would be [16355096754/15.2gb], which is larger than the limit of [16213167308/15gb]",
            "bytes_wanted": 16355096754,
            "bytes_limit": 16213167308
        }],
        "type": "circuit_breaking_exception",
        "reason": "[parent] Data too large, data for [<http_request>] would be [16355096754/15.2gb], which is larger than the limit of [16213167308/15gb]",
        "bytes_wanted": 16355096754,
        "bytes_limit": 16213167308
    },
    "status": 503
}

The preceding example output shows that the request is too large for the parent circuit breaker to process. The parent circuit breaker manages the overall memory usage of your cluster. When a parent circuit breaker exception occurs, the total memory used across all circuit breakers exceeds the configured limit. In this example, the parent breaker throws an exception because the cluster exceeds 95% of the 16 GB heap (15.2 GB).

To verify the logic, calculate the difference between the real heap usage and the circuit breaker limit. In this example, the real usage is [15283269136/14.2gb] and the limit is [16213167308/15gb], so the node has only about 0.87 GB of free heap. However, the request needs about 1 GB of new bytes reserved memory, which is more than the available headroom. As a result, the circuit breaker trips.
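The arithmetic can be checked directly. The byte values below come from the example error message and the real usage figure quoted above.

```python
# Verify the parent circuit breaker arithmetic from the example error.
GB = 1024 ** 3

bytes_limit = 16213167308    # limit of [16213167308/15gb]
real_usage = 15283269136     # real usage of [15283269136/14.2gb]
bytes_wanted = 16355096754   # would be [16355096754/15.2gb]

free_heap = bytes_limit - real_usage             # headroom before the request
new_bytes_reserved = bytes_wanted - real_usage   # memory the request needs

print(f"free heap: {free_heap / GB:.2f} GB")
print(f"request needs: {new_bytes_reserved / GB:.2f} GB")
print(f"breaker trips: {new_bytes_reserved > free_heap}")
```

Because the request needs more memory than the free heap, the node rejects it instead of risking an OutOfMemoryError.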

To interpret the circuit breaker exception message, use the following guidelines:

  • data for [<http_request>]: The client sends an HTTP request to a node in your cluster. OpenSearch Service either processes the request locally or passes it to another node for additional processing.
  • would be [#]: The total heap usage that would result if the request were processed.
  • limit of [#]: The current circuit breaker limit.
  • real usage: The actual use of the JVM heap.
  • new bytes reserved: The actual memory needed to process the request.

Reduce high JVM memory pressure

High JVM pressure often causes circuit breaker exceptions. JVM memory pressure is the percentage of Java heap that all data nodes in your cluster use. Check the JVMMemoryPressure metric in Amazon CloudWatch to determine your current usage.

Note: The JVM heap size of a data node is set to half the size of physical memory (RAM) per node, up to 32 GB. For example, if the physical memory is 30 GB per node, then the heap size is 15 GB. If the physical memory is 128 GB per node, then the heap size is 32 GB.
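The rule in the preceding note can be written as a small helper. This is a sketch of the sizing rule as stated above; OpenSearch Service sets the actual heap size for each instance type.

```python
def data_node_heap_gb(physical_memory_gb: float) -> float:
    """JVM heap is half of physical RAM per node, capped at 32 GB."""
    return min(physical_memory_gb / 2, 32)

print(data_node_heap_gb(30))   # 15.0
print(data_node_heap_gb(128))  # 32
```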

To resolve high JVM memory pressure, take one or more of the following actions:

  • Reduce incoming traffic to your cluster, especially if you have a heavy workload. An increase in the number of requests to the cluster can cause high JVM pressure. Check the IndexRate and SearchRate metrics in CloudWatch to determine your current load.
  • Scale the cluster to get more JVM memory to support your workload.
  • If you can't scale the cluster, then delete old or unused indices to reduce the number of shards. Shard metadata is stored in memory. When you reduce the number of shards, you reduce the overall memory usage.
  • To identify faulty requests, turn on slow logs.
    Note: Before you make configuration changes, to avoid additional overhead to existing resources, verify that JVM memory pressure is below 85%.
  • Optimize search and indexing requests, and choose the correct number of shards for your use case. Unbalanced shard allocation across nodes or too many shards in a cluster can cause high JVM pressure. For more information about indexing and shard count, see Get started with Amazon OpenSearch Service: How many shards do I need?
  • Turn off and don't use the fielddata data structure to query data. By default, fielddata is set to false on a text field unless explicitly defined otherwise in index mappings. Fielddata can consume a large amount of heap space, and remains in the heap for the lifetime of a segment. As a result, when you use fielddata, JVM memory pressure remains high on the cluster. For more information, see fielddata on the Elasticsearch website.
  • Change your field's mapping type from text to keyword. For more information, see Reindex API or Create or update index template API on the Elasticsearch website. The keyword type supports aggregations and sorting on the field without fielddata.
  • To prevent increases in field data, avoid aggregating on text fields. The more field data you use, the more heap space you consume. To check your field data usage, use the cluster stats API operation. For more information, see Cluster stats API on the Elasticsearch website.
  • Remove aggregation, wildcards, and wide time ranges from your queries.
  • Clear the fielddata cache. Run the following API call:
    POST /index_name/_cache/clear?fielddata=true (index-level cache)
    POST */_cache/clear?fielddata=true (cluster-level cache)
    Important: If you clear the fielddata cache, then you might disrupt in-progress queries.
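For the mapping change mentioned above, a keyword field supports aggregations and sorting without fielddata. The following is a hedged sketch of such a mapping; the index and field names are hypothetical.

```python
import json

# Hypothetical mapping: "status" is a keyword field, so it supports
# aggregation and sorting without fielddata; "message" stays text for
# full-text search, with fielddata disabled by default.
# Send this body with a request such as PUT /my-index (name is illustrative).
index_mapping = {
    "mappings": {
        "properties": {
            "status": {"type": "keyword"},
            "message": {"type": "text"},
        }
    }
}

print(json.dumps(index_mapping, indent=2))
```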

For more information, see How do I troubleshoot high JVM memory pressure on my OpenSearch Service cluster?

Related information

Operational best practices for Amazon OpenSearch Service

How can I improve the indexing performance on my Amazon OpenSearch Service cluster?

AWS OFFICIAL
Updated 5 months ago