Question

sumek

Asked: 2011-01-22 01:53:13 +0800 CST

Nagios graphing solutions vs Munin/Cacti/Ganglia

I've got a nagios server set up for monitoring ~30 Windows servers. I want to add some trending charts. I've read that nagios graphing plugins are simple and that many people use separate, standalone charting/trending tools.

What are the restrictions of the nagios graphing plugins vs standalone products like ganglia/munin/cacti?

I'm interested in specific features and advantages that standalone packages offer and nagios graphing plugins don't.

6 Answers

MadHatter · Answer 1 · 2011-01-22T03:18:22+08:00

I concur with lynxman. NAGIOS is for immediate qualitative data (is X OK or not?); munin is for historical quantitative data (how full is X now, and how full has it been this year?). All my NAGIOS installations, some of which monitor several hundred services, are linked to munin systems to do the quantitative monitoring.

Note also that munin has specific hooks for feeding data into NAGIOS. It understands the concept of WARNING and CRITICAL thresholds, and where notification (and a view on the NAGIOS "big board") is required, it's very easy to have a single munin variable inform the state of a single NAGIOS service.
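To make the threshold idea concrete, here is a minimal Python sketch of how one measured value can be mapped onto a NAGIOS service state. It is not munin's own code: the function name `nagios_state` and the "higher is worse" comparison are assumptions for illustration, and munin's real threshold syntax is richer than a single number. The numeric state codes are the standard Nagios plugin exit codes.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def nagios_state(value, warning, critical):
    """Map a measured value to a Nagios state.

    Assumption for this sketch: higher values are worse, and a missing
    value (None) is reported as UNKNOWN.
    """
    if value is None:
        return UNKNOWN
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK


if __name__ == "__main__":
    # Hypothetical disk-usage percentages, warning at 90, critical at 95.
    for used in (42.0, 91.5, 97.0, None):
        print(used, nagios_state(used, warning=90, critical=95))
```

The point of the split is that the graphing side keeps every value, while the alerting side only ever sees the resulting state.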

The usual workflow is that no one looks at the munin graphs until NAGIOS alerts that a threshold has been breached, but then the munin graphs become invaluable for finding out whether something has been slowly ramping up over time, or this is an out-of-the-blue increase, or we have a weekly up-and-down cycle which is slowly increasing in amplitude, or what.

As lynxman says, the UNIX way is "one task, one tool". Making a toolchain of munin and NAGIOS works very well for me to provide quantitative and qualitative monitoring as well as notifications. It also has the distinct advantage of keeping the interfaces clean: when you look at NAGIOS, you see a simple view of how well things are working right now, with no historical data cluttering up the view; when you look at munin, you see historical information pertinent to the issue ready for your analysis, without "host is down" or "sshd won't talk to me" errors cluttering the view.

Matthew Wall · Answer 2 · 2011-02-05T18:11:17+08:00

given that you already have a nagios installation, consider nagiosgraph or pnp4nagios.

nagiosgraph and pnp4nagios do a pretty nice job of plotting nagios performance data. nagiosgraph has a parameter-based approach to configuration, pnp4nagios has a template-based approach.

both automatically detect new hosts/services whenever the nagios configuration changes
both do graph zooming
both provide graphs when you mouseover specific hosts/services
both provide many ways to slice and dice your data
both detect and graph the critical and warning levels you have already defined in nagios
both can be embedded directly into the nagios frame for seamless, uncluttered navigation from current status to history and back
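For context, the "performance data" these tools plot is the part of a plugin's output after the `|` character, in the form `label=value[UOM];[warn];[crit];[min];[max]`. The Python sketch below shows roughly what a grapher has to extract from it, including the warning and critical levels. It is an illustration only, not code from nagiosgraph or pnp4nagios; `parse_perfdata` is a hypothetical helper and the sample output line is made up.

```python
import re

# One perfdata item: label=value[UOM];[warn];[crit];[min];[max]
_PERF_ITEM = re.compile(
    r"(?P<label>'[^']+'|[^\s=]+)=(?P<value>-?\d+(?:\.\d+)?)"
    r"(?P<uom>[^;\s]*)(?P<rest>(?:;[^;\s]*)*)"
)


def parse_perfdata(plugin_output):
    """Return {label: {...}} for the perfdata after the '|' separator."""
    _, _, perf = plugin_output.partition("|")
    metrics = {}
    for match in _PERF_ITEM.finditer(perf):
        # 'rest' starts with ';', so drop the empty first field.
        fields = match.group("rest").split(";")[1:]
        fields += [""] * (4 - len(fields))
        metrics[match.group("label").strip("'")] = {
            "value": float(match.group("value")),
            "uom": match.group("uom"),
            # Thresholds stay as strings: they may be range expressions.
            "warn": fields[0] or None,
            "crit": fields[1] or None,
        }
    return metrics


if __name__ == "__main__":
    line = "DISK OK - free space: / 3326 MB | /=2643MB;5948;5958;0;5968"
    print(parse_perfdata(line))
```

Because the thresholds travel with every sample, a grapher can draw them without any extra configuration.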

slicing and dicing the data are pretty important, imho. for example, you can view all services on a single host, or view all hosts with a specific service, or view arbitrary collections of graphs for arbitrary hosts and services.

installation is not trivial, but not difficult. a lot depends on how much you want to customize things. for example, nagiosgraph is 'install.pl' or 'rpm -i nagiosgraph.rpm' or 'dpkg -i nagiosgraph.deb'. pnp4nagios is './configure; make; make install'.

n2rrd can do some of these things as well, but it is not as polished and requires more work to configure.

rrdtool has quirks wrt data storage, and any system will have sampling issues. rrdtool does some data smoothing by default, but you can capture (and graph) maximums and/or minimums in addition to averages if necessary.
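a small python sketch of why keeping maximums and minimums matters. it only imitates the idea behind rrdtool's AVERAGE, MAX and MIN consolidation functions; `consolidate` is a hypothetical helper, the sample values are invented, and rrdtool's real behaviour (interpolation, heartbeats, unknown values) is more involved.

```python
def consolidate(samples, steps_per_row, func):
    """Collapse each run of `steps_per_row` samples into one stored value."""
    return [
        func(samples[i:i + steps_per_row])
        for i in range(0, len(samples), steps_per_row)
    ]


def average(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    # Twelve made-up samples with a single spike in the fourth slot.
    samples = [10, 10, 10, 100, 10, 10, 10, 10, 10, 10, 10, 10]
    print(consolidate(samples, 6, average))  # [25.0, 10.0]
    print(consolidate(samples, 6, max))      # [100, 10]
    print(consolidate(samples, 6, min))      # [10, 10]
```

the averaged archive flattens the spike to 25, while the maximum archive still shows 100.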

every rrdtool-based approach suffers from data/graph staleness since the schema in each rrd file is static and most systems use the rrd filename to identify the data. data are typically never lost when a hostname or service name changes; the rrd files still exist on disk. some user interfaces provide ways to see 'stale' rrd files; others require manual housekeeping via the command line. on many installations this is only an issue when initially configuring the system, but in dynamic environments (e.g. monitoring virtual machines whose lifetime is only a few months) it can become tedious.

one final note. there are actually two parts to trending: data collection and data display. if you go with a standalone graphing system rather than extending your existing nagios installation, then you might have to install additional components on your windows machines in order to collect the data.

lynxman · Answer 3 · 2011-01-22T02:55:02+08:00

Nagios graphing plugins, as you say, are very restricted. They offer a very basic rrdtool interface and the UI design is a bit counterintuitive; it's basically a hack on top of nagios. I tried one just for fun, but it broke several times without warning.

Going for a standalone product (especially munin or ganglia) offers you a wide range of features that nagios can't provide. As the unix mantra goes, it's better to be good at just one thing than to try to be good at many: nagios is amazing for monitoring, and munin/ganglia/cacti are amazing at graphing.

Kyle Brandt · Answer 4 · 2011-01-22T05:25:26+08:00

At Stack Overflow we use n2rrd, which is a Nagios plugin for graphing performance data. To an extent I would agree with lynxman that it does have a bit of a hackish feel.

However:

With n2rrd you can have Cacti do the graphing based on the data instead of the rrd2graph.cgi that comes with n2rrd
n2rrd with the rrd2graph.cgi does support zooming
As for complicated aggregate graphs, you basically manipulate the rrd graphs by hand and can do whatever you want with them.

The rrd files are stored according to the server names, so if you change the name of something you sort of lose the data... You could always just rename the files or symlink them, though, and you won't lose the data.
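As a rough sketch of the symlink approach in Python: the helper below makes the new name point at the existing file so the history stays reachable. The flat `<name>.rrd` layout, the directory, and the function name `keep_history_on_rename` are all assumptions for illustration; check how your own installation actually names its rrd files before doing anything like this.

```python
import os
import tempfile


def keep_history_on_rename(rrd_dir, old_name, new_name, suffix=".rrd"):
    """Create <new_name>.rrd as a symlink to the existing <old_name>.rrd.

    Assumes one flat directory with files named after the server.
    """
    old_path = os.path.join(rrd_dir, old_name + suffix)
    new_path = os.path.join(rrd_dir, new_name + suffix)
    if not os.path.exists(old_path):
        raise FileNotFoundError(old_path)
    if os.path.lexists(new_path):
        raise FileExistsError(new_path)
    # A relative target keeps the link valid if the directory is moved.
    os.symlink(os.path.basename(old_path), new_path)
    return new_path


if __name__ == "__main__":
    # Demonstrate on a throwaway directory with a placeholder file.
    with tempfile.TemporaryDirectory() as rrd_dir:
        with open(os.path.join(rrd_dir, "web01.rrd"), "wb") as fh:
            fh.write(b"placeholder")
        link = keep_history_on_rename(rrd_dir, "web01", "web-frontend-01")
        print(link, "->", os.readlink(link))
```

Refusing to overwrite an existing target matters here, because the grapher may already have created a fresh, empty file under the new name.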

I have some examples of these graphs up at my recent Some Tips for Better RRD Graphs Server Fault Blog post. Also, the n2rrd page includes both the cacti demo as well as rrd2graph.

I think the bottom line is that going the Nagios route might be lacking in a feature or two but is pretty complete if you don't mind getting your hands dirty with the details of writing rrd templates yourself. It is probably going to take more of your time, but it will encourage you to develop more expertise in rrd.

mark seger · Answer 5 · 2011-02-05T04:33:22+08:00

I demand accurate data and rrd's data display is not accurate - it's normalized! For most users this is fine because they're not using very accurate data to begin with. They're using data whose sample rates are often at a minute or more and that isn't going to give you a very accurate description of what is happening. This also means that if you have a spike in your data somewhere you may never see it.

Consider this - say your Gb network is humming along at about 10MB/sec and all of a sudden there is a spike of 100MB/sec for a couple of minutes. If you look at the data for the day, that 'spike' may only show up as 15MB/sec, though the actual value depends on a number of other factors as well. Also note that if it was only a 30 second spike you might not even see it at sampling rates of a few minutes. It's also very likely you'll assume your network is happy when it isn't!

What's even more frustrating for me is that the data is normalized to the physical width of the graph and the range of the x-axis. What this means is that spike I mentioned you didn't see? If you zoom in, it magically appears! I'll stick to gnuplot - the graphs may not be as pretty but they're rock solid and gnuplot never modifies the data before displaying it.
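The effect can be reproduced with a toy model in Python. This is a simplified sketch, not rrdtool's actual algorithm: it assumes one sample per minute, a hypothetical `plot_values` helper that averages whatever samples fall under each pixel column, and a graph only 40 pixels wide, chosen so the numbers match the 15MB/sec figure above.

```python
def plot_values(samples, start, end, width_px):
    """Average the samples that share a pixel column (toy model)."""
    window = samples[start:end]
    per_px = max(1, len(window) // width_px)
    return [
        sum(window[i:i + per_px]) / len(window[i:i + per_px])
        for i in range(0, len(window), per_px)
    ]


if __name__ == "__main__":
    # One day of per-minute samples at 10 MB/sec with a 2-minute spike.
    day = [10.0] * 1440
    day[600:602] = [100.0, 100.0]

    full_day = plot_values(day, 0, 1440, width_px=40)  # 36 minutes per pixel
    zoomed = plot_values(day, 590, 620, width_px=30)   # 1 minute per pixel

    print(max(full_day))  # 15.0
    print(max(zoomed))    # 100.0
```

Over the whole day the spike shares a column with 34 quiet minutes, so (34 × 10 + 2 × 100) / 36 = 15; zoomed in, each column holds one sample and the full 100 shows up.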

-mark

Matthew Thode · Answer 6 · 2011-02-05T06:35:42+08:00

I find using pnp4nagios works quite well for graphing. It supports zoom as well. It is not the easiest to implement, but nothing with nagios ever is.
