I'd like to know which Linux system performance monitoring software is the most widely installed and used, and is compatible with 64-bit Ubuntu.
I already have Nagios in place for availability reporting, but for monitoring the performance of more than 10 servers it is impractical to open an SSH console and run top on each one.
What I'd like to monitor is: 1. Disk space issue. 2. Resource hog. 3. Failed root / sudo login attempt. 4. Anything else?
Any kind of suggestion would be greatly appreciated.
Thanks.
I use munin; they have a live demo here. Rather than opening an SSH connection, munin runs its own munin-node service on each machine you are monitoring. These can be configured to restrict access by IP.
The conversation between a node and the main machine gathering all of the logs is pretty lightweight, consisting only of RRD datapoints for each value being monitored, and occurs every 5 minutes. The plugin scripts run on the node, the stock ones being bash and maybe perl.
I haven't looked into doing your 3rd example with munin so I don't know if that is possible with the stock plugins. There is a repository of additional plugins at the munin exchange.
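For what it's worth, a munin plugin is just an executable that prints its graph description when called with the argument config and prints its current values otherwise, so the 3rd item could probably be covered with a small custom plugin. Below is a minimal sketch in Python, not a stock plugin. The log path, the matched substrings and the field name "failed" are all my assumptions; check them against what your own auth log actually contains.

```python
#!/usr/bin/env python3
"""Hypothetical Munin plugin: count failed-authentication lines in a log."""
import sys

# Assumption: Ubuntu's default authentication log location.
LOG_PATH = "/var/log/auth.log"
# Assumption: substrings that mark a failed sshd or sudo/PAM attempt.
PATTERNS = ("Failed password", "authentication failure")


def count_failures(lines):
    """Return how many lines contain at least one of PATTERNS."""
    return sum(1 for line in lines if any(p in line for p in PATTERNS))


def config_text():
    """Graph description printed when the plugin is called with 'config'."""
    return "\n".join([
        "graph_title Failed login attempts",
        "graph_vlabel matching lines in auth.log",
        "graph_category security",
        "failed.label failed attempts",
    ])


def main(argv):
    if len(argv) > 1 and argv[1] == "config":
        print(config_text())
        return 0
    try:
        with open(LOG_PATH, errors="replace") as fh:
            print(f"failed.value {count_failures(fh)}")
    except OSError:
        # 'U' tells Munin the value is unknown (e.g. log not readable).
        print("failed.value U")
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Note that this counts every matching line still in the current log file, so the graph drops back when the log rotates, and the plugin needs to run as a user that can read the log.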
http://studyhat.blogspot.com/2010/06/install-and-configure-munin-for-server.html
The link above may help you :)
You want an SNMP server on each monitored Ubuntu host for monitoring performance and disk space, and a central syslog server for monitoring log messages.
There are scads of tools that will collect and graph data from the SNMP server, and all syslog servers I'm aware of can collect syslog events from remote machines.
The metrics you cite are performance constraints - you're not measuring the server performance - although having said that, there are very few Nagios plugins available off-the-shelf for performance monitoring.
I would suggest Nagios as the tool for measuring and reporting performance problems. You say you've already got it installed, but that "it is impractical to open SSH console running top for each server." This doesn't make a lot of sense to me: Nagios is specifically designed to do that for you! Have a look at NRPE for details of how to manage monitoring from a central server.
"Disk space issue" - in the standard nagios plugins
"Resource Hog" - that's a rather meaningless metric. You can get current/cumulative CPU and memory usage, # open files and other stats per process from the /proc filesystem - wrapping them in a a script to create a nagios plugin is trivial. For measuring per-process disk I/O, this was always a bit of a problem on the 2.4 and early 2.6 kernels - but I understand its now possible in more recent kernels - see iotop for an implementation in Python.
"Failed root / sudo login attempt". As I've often said in the past, most of the security stuff written to logs tells you where the security is working properly - i.e. most of it is of no interest. The important things are where your security is compromised. What you should be looking at is successful root access. Nagios has plugins for log monitoring.
"Anything else" - well yes, performance monitoring. There are tools for injecting transactions on various services available as plugins for Nagios, bit without knowing what services you need to measure its hard to be more specific.