Amazon MSK Prometheus JMX metrics endpoint returns 429 errors


I am running into an issue with my MSK cluster's broker Prometheus metrics. The JMX metrics endpoint constantly returns 429 (Too Many Requests) errors when Prometheus attempts to scrape the /metrics endpoint on port 11001 (JMX Exporter).

This does not seem to be related to the broker instance type (3 m5.large brokers), because neither the Node Exporter metrics endpoint on port 11002 nor my consumers ever run into any throttling.
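To confirm that only the JMX Exporter endpoint is throttled, one option is to probe both ports from a host inside the cluster's VPC and tally the HTTP status codes. The following is a minimal sketch using only the Python standard library. The helper names `fetch_status` and `tally_statuses` are hypothetical, and the port constants simply mirror the ports mentioned above (11001 for JMX, 11002 for the Node Exporter).

```python
import urllib.error
import urllib.request
from collections import Counter
from typing import Callable, Iterable

# Ports mentioned in the question: JMX Exporter and Node Exporter.
JMX_PORT = 11001
NODE_PORT = 11002


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET on url (e.g. 200 or 429)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; the status code is on the exception.
        return err.code


def tally_statuses(
    brokers: Iterable[str],
    port: int,
    attempts: int,
    fetch: Callable[[str], int] = fetch_status,
) -> Counter:
    """Request /metrics on each broker `attempts` times and count status codes."""
    counts: Counter = Counter()
    for host in brokers:
        url = f"http://{host}:{port}/metrics"
        for _ in range(attempts):
            counts[fetch(url)] += 1
    return counts
```

For example, `tally_statuses(broker_hosts, JMX_PORT, attempts=5)` and the same call with `NODE_PORT` should show whether 429s appear only on port 11001. Here `broker_hosts` is a placeholder for your brokers' DNS names. The `fetch` parameter is injectable so the tallying logic can be exercised without network access.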

This is a problem because I want to monitor OffsetLag and other broker-specific metrics, and the inconsistent JMX metrics scrapes make this nearly impossible. I have found no reports anywhere else of anyone running into this particular error. As I mentioned, I only get 429 errors on this JMX metrics endpoint, nowhere else.

I have even increased the scrape interval to 2+ minutes, and this does not solve the problem.

  • I'm having the same issue. The throttling on the endpoint is extreme and I'm not sure how to reliably get metrics. If the metrics are too expensive to calculate, it should serve cached metrics rather than a 429.
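The "serve cached metrics instead of a 429" idea from the comment above can be sketched as a small wrapper placed between Prometheus and one broker's JMX endpoint. This is a hypothetical helper and not an MSK feature. It assumes a fetcher callable that returns `(status, body)`. When the endpoint answers 200, the wrapper stores the body. When the endpoint answers 429, it returns the last good body, but only if that body is no older than `max_age` seconds.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical interface: a fetcher returns (HTTP status, response body).
Fetcher = Callable[[], Tuple[int, bytes]]


class StaleOnThrottleCache:
    """Serve the last successful scrape when the upstream answers 429.

    Hypothetical helper for illustration: it hides throttled responses
    as long as the cached copy is younger than max_age seconds.
    """

    def __init__(
        self,
        fetch: Fetcher,
        max_age: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._max_age = max_age
        self._clock = clock
        self._body: Optional[bytes] = None
        self._stored_at = 0.0

    def get(self) -> Tuple[int, bytes]:
        status, body = self._fetch()
        now = self._clock()
        if status == 200:
            # Fresh scrape: remember it and pass it through unchanged.
            self._body = body
            self._stored_at = now
            return status, body
        cache_is_fresh = (
            self._body is not None and now - self._stored_at <= self._max_age
        )
        if status == 429 and cache_is_fresh:
            # Throttled: fall back to the most recent good copy.
            return 200, self._body
        # No usable cache, or a different error: surface the upstream result.
        return status, body
```

Because Prometheus records the cached values as if they were new samples, `max_age` should be kept short (for example, a few scrape intervals) so that stale data is not masked for long. A variation would skip the upstream request entirely while the cache is still fresh, which would also lower the request rate that presumably triggers the throttling.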

Steven
posted 6 months ago · 519 views
1 Answer

@Frank,

The only fix that worked for me was deleting and recreating the MSK cluster itself; the new one does not seem to throttle.

Not the best solution, just another AWS managed-service quirk. I spent a lot of time and energy trying to find a solution that did not involve recreating the cluster.

Steven
answered 4 months ago
