We have several producers and several consumers connected via Kafka. Essentially, batch processing jobs are created on demand and placed on several Kafka queues, and the batch processors pick them up from Kafka and process them one by one.
I want to visualize and monitor the lengths of these queues in Kafka. The queue lengths will serve as proxies for the "load" of the overall system. The more jobs waiting in queues, the more "loaded" the system is.
Our Kafka cluster runs on AWS MSK. I've enabled Prometheus JMX monitoring, and I scrape all metrics every 10 seconds.
Looking at the metrics, I don't see anything that obviously corresponds to queue length. Is it exposed as a metric by default?
If queue length is not exposed by default, what is a good way to collect that metric? Assume I can write a Python script with any libraries installed, and I have full access to the Kafka endpoints from it.
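To make concrete what I mean by "queue length": as far as I understand, the closest Kafka concept is consumer group lag, i.e. for each partition, the latest (end) offset minus the offset the consumer group has committed, summed per topic. Below is a minimal sketch of just that arithmetic. The function names, topic names and offsets are made up for illustration, and I'm assuming the two offset maps would be fetched from the cluster by whatever Kafka client library the script ends up using.

```python
from typing import Dict, Tuple

# A partition is identified by (topic name, partition number).
Partition = Tuple[str, int]


def partition_lag(
    end_offsets: Dict[Partition, int],
    committed_offsets: Dict[Partition, int],
) -> Dict[Partition, int]:
    """Return the number of not-yet-consumed records per partition.

    Assumption: a partition with no committed offset is treated as
    entirely unread, counting from offset 0. This overestimates the
    backlog if old records were already removed by retention.
    """
    lag = {}
    for tp, end in end_offsets.items():
        committed = committed_offsets.get(tp, 0)
        # Clamp at zero in case the two snapshots were taken at
        # slightly different times.
        lag[tp] = max(end - committed, 0)
    return lag


def topic_backlog(lag: Dict[Partition, int]) -> Dict[str, int]:
    """Sum per-partition lag into one "queue length" per topic."""
    totals: Dict[str, int] = {}
    for (topic, _partition), value in lag.items():
        totals[topic] = totals.get(topic, 0) + value
    return totals


if __name__ == "__main__":
    # Hypothetical offsets for two topics.
    end = {("jobs-a", 0): 120, ("jobs-a", 1): 80, ("jobs-b", 0): 50}
    committed = {("jobs-a", 0): 100, ("jobs-a", 1): 80, ("jobs-b", 0): 45}
    print(topic_backlog(partition_lag(end, committed)))
    # prints {'jobs-a': 20, 'jobs-b': 5}
```

If this is the right quantity, a script could compute it on each scrape interval and expose one gauge per topic and consumer group.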
Note: I understand the basic concepts, but I don't have much practical experience with Kafka (I've only used RabbitMQ before), so apologies if my vocabulary is imprecise. For example, what I called a "queue" is apparently called a "topic".