I'm looking for suggestions for a good monitoring tool, or tools, to handle a mixed Linux (Red Hat 4-5) and HP-UX environment.
Currently we are using Hobbit, which works reasonably well, but it is becoming harder to keep track of which alerts are sent out for which servers.
Features I'd like to see:
- Easy configuration of servers.
- The ability to monitor CPU, network, memory, and specific processes.
I've looked into Nagios, but from what I have seen it won't be easy to set up the configuration for all of our servers (~200), and without installing a plugin on each server I won't be able to monitor processes.
Nagios may have a bit of a learning curve, but you can define templates in its configuration files and have other objects reuse them, which saves a lot of time. It's a great monitoring system. You typically don't need a client installed on each monitored host so long as the hosts have SNMP running.
Monitoring Windows systems with it can be a little different. For them NSClient++ works very well and is easy to install, even via a script, SMS, etc. http://nsclient.org/nscp/
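To illustrate the template reuse mentioned above, here is a rough sketch of generating Nagios host definitions for a long server list from a script. The inventory, the template name `base-server`, and the chosen settings are all made up for illustration, and a real template would need more directives (contacts, check periods, and so on). It also assumes a `check-host-alive` command is defined, as it is in the sample configuration shipped with Nagios.

```python
# Hypothetical inventory; replace with your real host names and addresses.
HOSTS = [
    ("web01", "10.0.0.11"),
    ("db01", "10.0.0.21"),
]

# A template object: "register 0" marks it as a template rather than a real
# host. The name "base-server" is an assumption made for this example.
TEMPLATE = """define host {
    name                base-server
    check_command       check-host-alive
    max_check_attempts  3
    register            0
}
"""


def host_definition(name, address, template="base-server"):
    """Return one host object that inherits its settings via 'use'."""
    return (
        "define host {\n"
        f"    use        {template}\n"
        f"    host_name  {name}\n"
        f"    address    {address}\n"
        "}\n"
    )


def render_config(hosts):
    """Return the template followed by one short definition per host."""
    return TEMPLATE + "\n" + "\n".join(host_definition(n, a) for n, a in hosts)


if __name__ == "__main__":
    print(render_config(HOSTS))
```

Each server then costs only a few lines of configuration, and any change to the shared settings is made once in the template.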
Set up SNMP on your servers, preferably via some configuration management tool like Puppet.
Then, use a monitoring tool like Zenoss Core to monitor them. Zenoss can scan a subnet for hosts, which makes it easy to add 200 servers, and you can group/organize the servers in various ways, to determine what exactly is monitored.
We're only monitoring a dozen devices so far, but Zenoss is very powerful yet user friendly. It has a friendly GUI, history graphs, alerts, etc.
My understanding is that Nagios is more suited for smaller installations. While I have not used it, it seems that OpenNMS is better suited for the scale of your installation.
Someone wrote a comparison between Nagios and OpenNMS
The good news is that there are many solutions that can handle your requirements; now you get to choose. I'd look into the following products:
- Zenoss
- GroundWork
- Zabbix
- Hyperic
If you are allowed to use SNMP, take a look at Cacti. It's easier to add and remove hosts than in Nagios, and I like its interface more. Cacti can monitor CPU, network interfaces, memory usage, disk space usage, and services.
I would recommend Zabbix. It can monitor your hosts with SNMP or via an agent installed on the servers, and it is very flexible and scalable. Zabbix provides host discovery, but you can also create an XML file to import your devices into its database. They recently released an API which makes it easy to integrate the monitoring data into other applications (we've successfully built an iPhone app on top of this API).
Hope this helps.
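As a rough idea of what talking to the Zabbix API looks like: it is a JSON-RPC 2.0 interface, so a request is just a small JSON document sent over HTTP. The sketch below only builds the request body for the `host.get` method and does not send anything. The token value is a placeholder, and passing it in an `auth` field is an assumption that matches older Zabbix releases; newer ones expect the token in an HTTP header instead.

```python
import json


def build_host_get_request(auth_token, request_id=1):
    """Build a JSON-RPC 2.0 body asking Zabbix for its list of hosts.

    Assumption: the token goes in the "auth" field, as in older Zabbix
    releases. Adjust this for the version you actually run.
    """
    return {
        "jsonrpc": "2.0",
        "method": "host.get",
        "params": {"output": ["hostid", "host"]},
        "auth": auth_token,
        "id": request_id,
    }


if __name__ == "__main__":
    # "PLACEHOLDER-TOKEN" is not a real credential, only an example value.
    body = build_host_get_request("PLACEHOLDER-TOKEN")
    print(json.dumps(body, indent=2))
```

The resulting body would be POSTed to the frontend's `api_jsonrpc.php` endpoint with a JSON content type, and the response carries the host list in its `result` field.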