Amazon MSK Prometheus JMX metrics endpoint 429 errors

I am running into an issue with the broker Prometheus metrics on my MSK cluster. The JMX metrics endpoint constantly returns 429 (Too Many Requests) errors when Prometheus scrapes the /metrics endpoint on port 11001 (JMX).

This does not seem to be related to broker instance type (3 m5.large brokers): neither the Node metrics endpoint on port 11002 nor my consumers ever run into any throttling.

This is a problem because I want to monitor OffsetLag and other broker-specific metrics, and the inconsistent JMX scrapes make that nearly impossible. I have found no reports anywhere of anyone else running into this particular error. As mentioned, I only see 429 errors on the JMX metrics endpoint, not anywhere else.

I have even pushed the scrape interval back to 2+ minutes, and this does not solve the problem.
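To put a number on the throttling, the endpoint can be sampled outside of Prometheus and the status codes tallied. The following is only a minimal sketch under assumptions: it uses the Python standard library, the function names (`fetch_status`, `sample_statuses`, `throttle_ratio`) are made up for illustration, and it assumes the broker's port 11001 is reachable over plain HTTP from wherever the script runs.

```python
import time
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for a single GET of url (no retries)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code (e.g. 429) is on the exception.
        return err.code


def sample_statuses(url, attempts, interval_s, fetch=fetch_status, sleep=time.sleep):
    """Scrape url `attempts` times, `interval_s` seconds apart, and tally status codes."""
    counts = Counter()
    for i in range(attempts):
        counts[fetch(url)] += 1
        if i < attempts - 1:
            sleep(interval_s)  # no need to wait after the final attempt
    return counts


def throttle_ratio(counts):
    """Fraction of attempts that were answered with 429."""
    total = sum(counts.values())
    return counts[429] / total if total else 0.0
```

For example, `sample_statuses("http://<broker-dns-name>:11001/metrics", attempts=10, interval_s=120)` would mirror a 2-minute scrape interval; the broker DNS name is a placeholder to fill in. Connection failures (as opposed to HTTP error responses) are not caught here and will raise.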

  • I'm having the same issue. The throttling on the endpoint is extreme, and I'm not sure how to get metrics reliably. If the metrics are too expensive to calculate, the endpoint should serve cached metrics, not a 429.
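The caching idea in the comment above can be approximated on the client side. This is a rough sketch, not something MSK provides: the `LastGoodScrape` class and the `http_get` helper are hypothetical names, and the approach assumes that a slightly stale scrape is acceptable for the metrics being watched. It keeps the last successful response and hands it back while the endpoint answers 429, up to a maximum age.

```python
import time
import urllib.error
import urllib.request


def http_get(url, timeout=10.0):
    """Return (status, body) for one GET; 4xx/5xx responses are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


class LastGoodScrape:
    """Keeps the most recent 200 response and serves it while the upstream throttles."""

    def __init__(self, url, max_age_s=300.0, get=http_get, clock=time.monotonic):
        self.url = url
        self.max_age_s = max_age_s
        self._get = get
        self._clock = clock
        self._body = None
        self._fetched_at = None

    def scrape(self):
        """Return (body, is_stale); raise RuntimeError if no usable copy is cached."""
        status, body = self._get(self.url)
        now = self._clock()
        if status == 200:
            # Fresh scrape: remember it for later fallbacks.
            self._body, self._fetched_at = body, now
            return body, False
        if self._body is not None and now - self._fetched_at <= self.max_age_s:
            # Upstream refused (e.g. 429): fall back to the cached copy.
            return self._body, True
        raise RuntimeError(f"upstream returned {status} and no fresh cached scrape is available")
```

The trade-off is that a stale body makes the metrics look fresher than they are, so the `is_stale` flag and `max_age_s` limit are there to keep that visible and bounded.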

Steven
asked 6 months ago, 519 views
1 answer

@Frank,

The only fix that worked for me was deleting and recreating the MSK cluster itself; the new cluster does not seem to throttle.

Not the best solution, but it seems to be just another AWS managed-service quirk. I spent a lot of time and energy trying to find a solution that did not involve recreating the cluster.

Steven
answered 4 months ago