I've seen this question on the mailing list a few times but haven't had a satisfactory answer.
How best to monitor that the pipeline isn't stuck? Clients -> logstash -> elasticsearch.
Logstash and especially elasticsearch are prone to resource starvation. They are both fantastic at picking up where they left off but how, exactly, are people watching their watchers?
Opinions welcome.
Personally, I check that redis is still dequeuing on the central logging host, which is upstream of LS+ES.
i.e:
redis-cli llen logstash
is less than some fixed number. This may not indicate that logs are appearing in redis at all, though, but that could be checked too, I guess.
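A minimal sketch of that check, in a form a cron job or nagios-style plugin could use. The queue name and threshold are assumptions to adapt, and the redis-cli call is stubbed so the logic runs standalone:

```shell
#!/bin/sh
# Alert if the logstash queue in redis grows past a fixed threshold.
# QUEUE and THRESHOLD are assumptions -- adjust for your setup.
QUEUE="logstash"
THRESHOLD=10000

check_queue_depth() {
    # $1 = current queue length, $2 = threshold
    if [ "$1" -gt "$2" ]; then
        echo "CRITICAL: $QUEUE depth $1 exceeds $2"
        return 2
    fi
    echo "OK: $QUEUE depth $1"
    return 0
}

# In production: LEN=$(redis-cli llen "$QUEUE")
# (stubbed here so the logic can be exercised without a redis server)
LEN=42
check_queue_depth "$LEN" "$THRESHOLD"
```

The non-zero return code on breach is what most alerting tools key off.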
Something like checking that
redis-cli info | grep total_commands_processed
keeps increasing, maybe?

I use zabbix in my environment, but I suppose this method could work in other setups as well. I have configured the following command that zabbix is allowed to use:
This will return the total number of elasticsearch records committed. I take this value and divide by the number of seconds since I took the last sample (I check every minute); if that rate drops below an arbitrary limit, I can alert on it. I also use zabbix to check whether the logstash PID has died, and alert on that too, and run the following command:
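The command itself didn't make it into the post. As a sketch (not the poster's exact zabbix command), the count could come from elasticsearch's `_count` API, with the rate arithmetic done on two samples; host, port, and sample values below are assumptions:

```shell
#!/bin/sh
# Sketch: docs/sec between two samples of the elasticsearch document count.
# In a zabbix setup, zabbix itself stores the previous sample.
docs_per_second() {
    # $1 = previous count, $2 = current count, $3 = seconds between samples
    echo $(( ($2 - $1) / $3 ))
}

# In production, each sample would be something like (local node assumed):
#   COUNT=$(curl -s 'http://localhost:9200/_count' \
#             | sed 's/.*"count":\([0-9]*\).*/\1/')
# Stubbed samples: 6000 new docs over a one-minute interval.
docs_per_second 120000 126000 60
```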
This will return 1 if cluster health has gone red (yellow and green are okay), which I can also alert on.
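That command wasn't quoted either; a check with the same behaviour (print 1 on red, 0 otherwise) can be built on the `_cluster/health` API. This is a sketch, with host/port assumed and the curl stubbed:

```shell
#!/bin/sh
# Sketch: emit 1 if the cluster status is "red", 0 otherwise.
# _cluster/health returns JSON containing "status":"green"|"yellow"|"red".
health_is_red() {
    # $1 = JSON body from /_cluster/health
    case "$1" in
        *'"status":"red"'*) echo 1 ;;
        *)                  echo 0 ;;
    esac
}

# In production: BODY=$(curl -s 'http://localhost:9200/_cluster/health')
BODY='{"cluster_name":"demo","status":"green","number_of_nodes":1}'
health_is_red "$BODY"
```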
Check to see that the logs per second at your final endpoint (e.g. elasticsearch) are above some baseline.
That is, do an end-to-end check: if your end result is working correctly, you know that all the steps in the pipeline are working correctly.
If you frequently have problems, or need better introspection, start instrumenting each piece of the pipeline (like redis, as suggested above).
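The end-to-end check above can be sketched as two samples of the final endpoint's document count compared against a baseline rate. Baseline, interval, and the stubbed sample values are assumptions to tune per environment:

```shell
#!/bin/sh
# Sketch: alert if the logs-per-second rate at the final endpoint
# drops below a baseline. BASELINE and INTERVAL are assumptions.
BASELINE=50   # minimum acceptable docs/sec
INTERVAL=60   # seconds between the two samples

rate_ok() {
    # $1 = previous count, $2 = current count
    rate=$(( ($2 - $1) / INTERVAL ))
    if [ "$rate" -lt "$BASELINE" ]; then
        echo "ALERT: ${rate}/s below baseline ${BASELINE}/s"
        return 1
    fi
    echo "OK: ${rate}/s"
}

# In production the samples would come from the final endpoint, e.g.
#   curl -s 'http://localhost:9200/_count'
# taken INTERVAL seconds apart; stubbed values here:
rate_ok 100000 106000
```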
We use several approaches: