I run a few Windows servers, as well as Linux (Debian and Ubuntu) and AIX servers.
I would like to continuously monitor performance on these systems, both to identify bottlenecks easily and to get an overview of general activity on the servers.
On Windows, I use Windows Performance Monitor (perfmon) for this. I set up these counters:
For bottlenecks:
- Processor utilization : System\Processor Queue Length
- Memory utilization : Memory\Pages Input/Sec
- Disk Utilization : PhysicalDisk\Current Disk Queue Length\driveletter
- Network problems: Network Interface\Output Queue Length\nic name
For general activity:
- Processor utilization : Processor\% Processor Time\_Total
- Memory utilization : Process\Working Set\_Total (or per specific process)
- Memory utilization : Memory\Available MBytes
- Disk Utilization : PhysicalDisk\Bytes/sec\_Total (or per process)
- Network Utilization : Network Interface\Bytes Total/Sec\nic name
(More information on the choice of these counters on: http://itcookbook.net/blog/windows-perfmon-top-ten-counters )
This works really well. It allows me to look in one place and identify most common bottlenecks.
So my question is, how can I do something equivalent (or just very similar) on Linux servers?
I have looked a bit at nmon (http://www.ibm.com/developerworks/aix/library/au-analyze_aix/), which is a free performance monitoring tool developed for AIX but also available for Linux. However, I am not sure whether nmon allows me to set up the above counters. Maybe that is because Linux and AIX do not allow monitoring these exact same measures. If so, which ones should I choose and why?
If nmon is not the tool to use for this, then what do you recommend?
Looking at basic system metrics does not give a good indication of performance. It can indicate how performance is constrained - but if you want to measure the performance of your applications then you really need to look at real transactions.
Regardless, there is no end of tools for measuring performance. I use Nagios. It's a bit lacking in trending / capacity management but is amazingly flexible in reporting, escalation and fault isolation, and it lets you add custom scripts (which you'll need if you want to measure your transactions). Certainly there are probes available to cover all the metrics you've listed for both MS Windows and Linux.
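As a rough illustration of the kind of custom script meant here, the sketch below times a single HTTP request and reports it the way Nagios plugins conventionally do: exit code 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, and performance data after a `|` on the status line. The URL, the thresholds and all function names are assumptions made up for the example, not part of any particular setup.

```python
#!/usr/bin/env python3
"""Sketch of a Nagios-style check that times one HTTP transaction."""
import sys
import time
import urllib.request

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(elapsed, warn, crit):
    """Map a response time in seconds to a Nagios exit code."""
    if elapsed >= crit:
        return CRITICAL
    if elapsed >= warn:
        return WARNING
    return OK


def format_output(code, elapsed, warn, crit):
    """Build the status line; the text after '|' is performance data."""
    return "%s - response time %.3fs | time=%.3fs;%.3f;%.3f" % (
        LABELS[code], elapsed, elapsed, warn, crit)


def time_transaction(url, timeout=10.0):
    """Fetch url once and return the elapsed wall-clock time in seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.monotonic() - start


def main(argv):
    """Run the check and return the exit code Nagios should see."""
    # Hypothetical default URL and thresholds; adjust for a real transaction.
    url = argv[1] if len(argv) > 1 else "http://localhost/"
    warn, crit = 1.0, 3.0
    try:
        elapsed = time_transaction(url)
    except Exception as exc:
        # A failed transaction is reported as CRITICAL in this sketch.
        print("CRITICAL - request failed: %s" % exc)
        return CRITICAL
    code = classify(elapsed, warn, crit)
    print(format_output(code, elapsed, warn, crit))
    return code
```

To use this as an actual plugin, the file would end with `sys.exit(main(sys.argv))` so that Nagios receives the exit code.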
There are a number of good options for this: some F/OSS, some F/OSS with support contracts available, and some fully commercial.
I use http://collectd.org/ with my own script (based on this) to draw pretty pictures from the resulting data in rrd files and send me the occasional email. This may not be as practical for you though (I'm only monitoring a couple of machines).
For a larger install you probably want something like Zabbix (another open source option, but considered more "enterprise grade" than collectd).
You can find a fuller list at http://en.wikipedia.org/wiki/Comparison_of_network_monitoring_systems
I like munin because it's easy to install and use. (apt-get install munin munin-node)
We are using Nagios for basic monitoring and Graphite for performance monitoring. Graphite is a very scalable solution. In combination with the Diamond plugin you can measure almost anything without too much effort.
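Part of what makes Graphite easy to feed is that its Carbon daemon accepts a plaintext protocol: one `path value timestamp` line per data point, by default on TCP port 2003. The sketch below shows how a custom metric could be pushed that way. The host name and metric path used in the usage note are hypothetical, and the helper names are made up for the example.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Return one line of Carbon's plaintext protocol: 'path value timestamp'."""
    if timestamp is None:
        timestamp = time.time()
    return "%s %s %d\n" % (path, value, int(timestamp))


def send_metrics(lines, host, port=2003, timeout=5.0):
    """Send already formatted metric lines to a Carbon plaintext listener."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
```

For example, `send_metrics([format_metric("servers.web01.load.shortterm", 0.42)], "graphite.example.com")` would record one load value, assuming a Carbon listener is reachable at that (made-up) host.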
In general, there are some steps that I follow as a sysadmin to keep track of all the servers I use. System commands like top, free -m, vmstat, iostat, iotop, sar, netstat etc.: nothing comes close to these Linux utilities when you are analysing/debugging a problem. These commands give you a clear picture of what is going on inside your server.
Nagios: It tops all monitoring/alerting tools. It is highly customizable but very difficult for beginners to set up, although there are some Nagios plugins available.
Server Density: A cloud-based paid service that collects important Linux metrics and lets users write their own plugins.
New Relic, Zabbix and Munin are some other well-known services.
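On Linux, the figures that tools like top, free and vmstat show can also be read directly from /proc, which gives rough counterparts to two of the perfmon counters in the question: the run queue (compare System\Processor Queue Length) from /proc/loadavg and available memory (compare Memory\Available MBytes) from /proc/meminfo. This is a minimal, Linux-only sketch; the function names are made up for the example, `MemAvailable` is only present on reasonably recent kernels, and AIX would need a different approach.

```python
import os


def parse_loadavg(text):
    """Parse /proc/loadavg: three load averages, then running/total tasks."""
    fields = text.split()
    running, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "runnable": int(running),  # currently runnable scheduling entities
        "total": int(total),       # all scheduling entities on the system
    }


def parse_meminfo(text):
    """Parse /proc/meminfo into {name: value}; sizes are in kB as reported."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


# Only touch /proc when run directly on a system that has it.
if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as handle:
        load = parse_loadavg(handle.read())
    with open("/proc/meminfo") as handle:
        mem = parse_meminfo(handle.read())
    print("runnable tasks:", load["runnable"], "1-min load:", load["load1"])
    available_kb = mem.get("MemAvailable")  # missing on older kernels
    if available_kb is not None:
        print("available memory (MiB):", available_kb // 1024)
```

Sampling these values at a fixed interval and storing them gives a basic trend, which is essentially what the monitoring tools listed in this thread automate.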
I have come across a similar question earlier. You can see if the other answers help you.