I'm currently using Munin to monitor a bunch of Linux servers (as well as a few WinXP clients). However, Munin collects its data sequentially and seems very susceptible to timeouts when clients disconnect in the middle of a run.
Are there any parallel versions?
Is there any way to handle a disconnected client more quickly?
Right now, many of my data collection tasks take longer than the 5 minutes until the next collection starts, leading to both warnings and missed data points.
By default, munin-update should use --fork, which "if set, will fork off one process for each host." Check whether your distribution has disabled it.
Also, to reduce the time spent waiting on dead clients, you can use --timeout to shorten the per-host timeout.
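
As a rough sketch, combining the two options might look like the following. The 60-second value is an arbitrary assumption, not a recommendation, and the exact option syntax can differ between Munin versions, so check the munin-update man page on your system first. The snippet only prints the command for review; on a typical install munin-update is started from the munin-cron wrapper rather than by hand, so that is probably where the options would end up.

```shell
# Assumed timeout in seconds (hypothetical value; tune it for your network).
TIMEOUT=60

# Build the command from the two options discussed above:
#   --fork     one process per host, so a dead client does not block the rest
#   --timeout  how long to wait on a host before giving up
CMD="munin-update --fork --timeout $TIMEOUT"

# Print the command instead of running it, so it can be checked first.
echo "$CMD"
```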