Load Balancing Issue with OTEL Collector Gateways

I'm seeking assistance with a load balancing problem I'm encountering with my OTEL (OpenTelemetry) collector gateways. Despite using a Route 53 weighted routing policy of 50/50 and a Network Load Balancer (NLB) with a load balancing algorithm, the sticky nature of OTEL data seems to create a bias toward one of the collector gateways, resulting in an uneven distribution of traffic.

I'm looking for a way to ensure a more balanced load across the two collector gateways. Additionally, I have a couple of specific challenges:

  1. If one of the collector gateways goes offline and comes back online later, how can I ensure the traffic rebalances across the two gateways without losing any data?
  2. Is there a recommended approach or best practice for managing this load balancing issue with OTEL collector gateways?

Any insights or suggestions from those with experience in this area would be greatly appreciated. I'm open to exploring different solutions or configurations to address this problem effectively.

1 Answer

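You have two collector gateways for OpenTelemetry (OTEL) data, but traffic isn't splitting evenly (50/50) despite your load balancer settings. The most likely cause is that OTLP/gRPC clients open long-lived HTTP/2 connections. Route 53 weighting only applies when a client resolves the name, and an NLB picks a target once per connection, so every request sent over an established connection keeps landing on the same gateway.

Two options: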

  1. Use the Load Balancing Exporter: This component from the opentelemetry-collector-contrib distribution runs in a collector (or agent) tier in front of your gateways and distributes data across them. You choose the routing key (for example, by trace ID or by service), so that all spans of a trace still reach the same gateway. See the official docs for details: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/loadbalancingexporter/README.md

  2. Consider a Different Load Balancer: Network Load Balancers (NLBs) operate at layer 4 and keep each TCP connection on one target for its lifetime, which does not suit long-lived gRPC connections. Explore options that balance per request, like:

a. AWS Application Load Balancer (ALB): https://aws.amazon.com/elasticloadbalancing/application-load-balancer/ (routes HTTP/2 and gRPC requests individually; session stickiness is configurable and can stay disabled)

b. HAProxy: https://www.haproxy.com/ (open-source load balancer with various distribution algorithms)
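
For option 1, here is a minimal sketch of what the Load Balancing Exporter configuration might look like on the collector tier in front of your two gateways. The hostnames and port are placeholders rather than values from your setup, and `tls.insecure: true` is only there to keep the sketch short. As far as I can tell from the README, the `protocol` section takes the settings of the OTLP (gRPC) exporter, so treat OTLP/HTTP support on that hop as something to verify there.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  loadbalancing:
    # Keep all spans of one trace on the same gateway.
    routing_key: "traceID"
    protocol:
      otlp:
        tls:
          insecure: true   # assumption: plaintext inside a private network
    resolver:
      static:
        hostnames:
          # Placeholder gateway addresses; replace with your own.
          - gateway-1.example.internal:4317
          - gateway-2.example.internal:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]
```

With the `static` resolver the backend list is fixed. The README also describes a `dns` resolver that re-resolves a hostname periodically, which may be a better fit if gateways come and go.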

To minimize data loss during gateway outages, consider:

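=> Enabling the exporter sending queue: This acts as temporary storage within the collector while the destination is unavailable. It is held in memory by default and can be persisted to disk through a storage extension.
=> Implementing retries: Configure retries with backoff for failed deliveries to the backend.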
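
In the Collector, buffering and retries are configured on the exporter side through the `sending_queue` and `retry_on_failure` settings. The following is a sketch under assumptions: the endpoint and directory are placeholders, the numbers are illustrative rather than tuned recommendations, and the `file_storage` extension requires the contrib distribution.

```yaml
extensions:
  file_storage:
    # Placeholder path; it must exist and be writable by the collector.
    directory: /var/lib/otelcol/queue

exporters:
  otlp:
    endpoint: backend.example.internal:4317   # placeholder destination
    retry_on_failure:
      enabled: true
      initial_interval: 5s      # first wait before retrying
      max_interval: 30s         # upper bound for the backoff
      max_elapsed_time: 300s    # give up on a batch after this long
    sending_queue:
      enabled: true
      queue_size: 5000          # illustrative; size it for your expected outage window
      storage: file_storage     # persist the queue so a restart does not drop it

service:
  extensions: [file_storage]
```

Data is still dropped once the queue is full or `max_elapsed_time` is exceeded, so size both for the longest outage you want to survive.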

answered 25 days ago
  • Thanks for the suggestions. Will the load balancing exporter work with OTLP/HTTP, or only with OTLP/gRPC?
