I have been meaning to set up monitoring software for a while now, but never seem to make much headway.
I have heard Nagios is a pretty decent open-source solution for this, but have never managed to get it properly up and running.
Does anyone have any tips on good approaches to getting started with server monitoring? I am thinking of things like number of network connections, load average, maybe bandwidth used by the server, etc. Largely the basics (which may include basics that I do not know about).
The basics of Nagios monitoring are things like ping and SNMP. There's a whole host of packages available in Ubuntu to support Nagios monitoring (run apt-cache search nagios to list them).
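Beyond the packaged checks, it may help to know that a Nagios check is just a program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Here is a minimal sketch of a load average check in Python. The function names and the 4.0/8.0 thresholds are made up for illustration, so tune them to your hardware.

```python
"""Sketch of a Nagios-style load average check (illustrative thresholds)."""
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_load(load1, warn, crit):
    """Map a 1-minute load average to (exit_code, status_line)."""
    if load1 >= crit:
        return CRITICAL, f"CRITICAL - load average {load1:.2f} >= {crit:.2f}"
    if load1 >= warn:
        return WARNING, f"WARNING - load average {load1:.2f} >= {warn:.2f}"
    return OK, f"OK - load average {load1:.2f}"


def main(warn=4.0, crit=8.0):
    """Read the current load, print one status line, return the exit code."""
    try:
        load1, _, _ = os.getloadavg()  # available on Unix-like systems only
    except (OSError, AttributeError):
        print("UNKNOWN - load average not available on this system")
        return UNKNOWN
    code, line = evaluate_load(load1, warn, crit)
    print(line)
    return code

# When installed as a plugin script, finish with: sys.exit(main())
```

Nagios only looks at the exit code and the first line of output, so the same pattern works for pretty much anything you can measure.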
SNMP bears mentioning: it's typically deployed insecurely, so don't expose any write community strings and don't send anything that you don't want anyone or anything else on the network to know about.
UbuntuGeek publishes a walkthrough of setting up Nagios.
For graphing long term trends we use OpsView, which publishes apt repositories for their web frontend.
After you have installed it, I would recommend the following to get a quick head start:
This will get you useful results right away. Once you've got the problem areas covered, begin applying the checks to all known services (HTTP, HTTPS, SSL certificate checking, POP3, etc).
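To give an idea of what something like SSL certificate checking boils down to, here is a rough sketch using only the Python standard library. It is not the stock Nagios plugin; the function names and the 30/7 day thresholds are my own assumptions.

```python
"""Sketch of an SSL certificate expiry check (illustrative thresholds)."""
import socket
import ssl
from datetime import datetime, timezone

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def fetch_not_after(host, port=443, timeout=10):
    """Return the certificate's 'notAfter' string, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now):
    """Days between 'now' (an aware datetime) and the certificate's expiry."""
    expires = ssl.cert_time_to_seconds(not_after)
    return (expires - now.timestamp()) / 86400


def classify(days_left, warn_days=30, crit_days=7):
    """Map remaining days to (exit_code, status_line)."""
    if days_left <= crit_days:
        return CRITICAL, f"CRITICAL - certificate expires in {days_left:.0f} days"
    if days_left <= warn_days:
        return WARNING, f"WARNING - certificate expires in {days_left:.0f} days"
    return OK, f"OK - certificate expires in {days_left:.0f} days"


def check_host(host):
    """Run the whole check for one host; print a status line, return the exit code."""
    try:
        not_after = fetch_not_after(host)
    except OSError as exc:
        # Covers connection failures and TLS errors. An already expired
        # certificate fails verification and also lands here.
        print(f"CRITICAL - could not verify certificate for {host}: {exc}")
        return CRITICAL
    code, line = classify(days_until_expiry(not_after, datetime.now(timezone.utc)))
    print(line)
    return code
```

In a plugin script you would take the hostname from the command line and pass the return value of check_host to sys.exit.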
For long term trending, give serious consideration to a tool like Cacti. It is great for gathering SNMP info across Unix and Windows boxes (if using Windows, make sure that you install the free SNMP Informant) and lets you see how the values change over time.
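If you are wondering how bandwidth graphs come out of SNMP: interface counters such as ifInOctets are running totals, so a trending tool polls them at intervals and graphs the difference per second. The sketch below shows that arithmetic, assuming a 32-bit counter that wraps at 2^32. The function names are mine, not Cacti's.

```python
"""Sketch: turning two SNMP counter readings into a bandwidth figure."""

COUNTER32_MODULUS = 2 ** 32  # 32-bit SNMP counters wrap back to 0 here


def counter_rate(prev_value, prev_time, curr_value, curr_time,
                 modulus=COUNTER32_MODULUS):
    """Average increase per second between two readings of a wrapping counter.

    Times are in seconds. A single wrap between polls is handled by the
    modulo; a device reboot (counter reset) is indistinguishable from a wrap
    here and would show up as a bogus spike.
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("readings must be in increasing time order")
    delta = (curr_value - prev_value) % modulus
    return delta / elapsed


def octets_to_bits_per_second(octets_per_second):
    """Convert an octet (byte) rate to bits per second."""
    return octets_per_second * 8


# Two polls five minutes apart: 3,000,000 octets received in 300 seconds.
rate = counter_rate(1_000_000, 0, 4_000_000, 300)
print(octets_to_bits_per_second(rate))  # 80000.0
```

On a busy link a 32-bit counter can wrap more than once between polls, in which case the result is wrong, so poll often enough or use the 64-bit counters if the device offers them.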