My department maintains 6 servers running both Windows Server 2003 and Ubuntu Server.
We have to track and report our uptime. I believe the target is 95%, but we have no real way to measure or report it. Currently we do this manually, with a text file and estimates of downtime.
What tools are out there to help with this task or how do you currently report and track your server uptime?
Ah, one of my favourite topics.
First, you need to define 'uptime'.
Do you mean the server is running? (in which case, just ping it regularly in a script).
Or do you mean the application is running? (connect to the application's 'home page' regularly, assuming it's a web app).
Or do you mean the application is providing the business services it is supposed to? (in which case, you need to run some sort of synthetic transaction).
I think only the last one is in any sense correct. The others are technically easier to do, but don't really correlate with "is this server providing value to the business".
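If you do roll your own, the 'home page' style of check is only a few lines. This is a minimal sketch in Python, not a monitoring product: `check_http` and its `expected_text` parameter are names I made up for illustration, and looking for a known string in the response is only a crude first step towards a real synthetic transaction.

```python
import http.client
import urllib.request


def check_http(url, expected_text=None, timeout=5.0):
    """Return True if url answers with HTTP 200 and, optionally,
    the response body contains expected_text."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            if response.status != 200:
                return False
            if expected_text is None:
                return True
            body = response.read().decode("utf-8", errors="replace")
            return expected_text in body
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, timeouts and refused connections
        # are all subclasses of OSError; count them as "down".
        return False
```

Run something like this on a schedule from a machine other than the server being measured, and log the timestamp and result of every probe so you can work out the downtime afterwards.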
As you will see if you click on the link I added, there are many companies selling solutions that do this, or you can roll your own. I've experience with NetIQ's products and Microsoft MOM (the two have a shared history), but I'm sure others work as well.
When you do pick a tool, consider how to account for planned upgrades and maintenance periods - a naive approach will record these as downtime.
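To show what accounting for maintenance means in practice, here is a rough Python sketch. It assumes one particular convention (planned maintenance is removed from both the downtime and the measured period), which your SLA may or may not share. It also assumes outages do not overlap each other and maintenance windows do not overlap each other. The function names are hypothetical.

```python
from datetime import datetime, timedelta

ZERO = timedelta(0)


def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two intervals (zero if disjoint)."""
    return max(min(a_end, b_end) - max(a_start, b_start), ZERO)


def availability(period_start, period_end, outages, maintenance):
    """Fraction of non-maintenance time in the period with no outage.

    outages and maintenance are lists of (start, end) datetime pairs.
    """
    # Clip maintenance windows to the reporting period.
    windows = [(max(s, period_start), min(e, period_end)) for s, e in maintenance]
    windows = [(s, e) for s, e in windows if e > s]
    planned = sum((e - s for s, e in windows), ZERO)

    unplanned = ZERO
    for o_start, o_end in outages:
        # Clip each outage to the reporting period.
        o_start, o_end = max(o_start, period_start), min(o_end, period_end)
        if o_end <= o_start:
            continue
        # Downtime inside a maintenance window is excused.
        excused = sum((overlap(o_start, o_end, s, e) for s, e in windows), ZERO)
        unplanned += (o_end - o_start) - excused

    return 1 - unplanned / (period_end - period_start - planned)
```

For example, over one week with a 4-hour maintenance window, a 3-hour outage that starts in the last hour of that window counts as 2 hours of unplanned downtime out of 164 measured hours, or roughly 98.8% availability. A naive calculation would charge all 3 hours against 168.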
Also, 95% is very undemanding - it's equivalent to 72 minutes of downtime each day, or more than 8 hours a week. Try taking your server out of service for all of the working day each Thursday, say, and I think you'll discover your SLA is actually a bit more demanding than that ...
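The arithmetic behind those figures is easy to check, and worth re-running for whatever target you end up agreeing on. A small sketch in Python (`downtime_budget` is just an illustrative name):

```python
from datetime import timedelta


def downtime_budget(target, period):
    """Allowed downtime in `period` for an availability target such as 0.95."""
    return period * (1 - target)


per_day = downtime_budget(0.95, timedelta(days=1))
per_week = downtime_budget(0.95, timedelta(weeks=1))
print(per_day)   # 1:12:00, i.e. 72 minutes
print(per_week)  # 8:24:00, i.e. 8.4 hours
```

Swap in 0.99 or 0.999 to see how quickly the budget shrinks.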
I use http://mon.itor.us/ (but it is down at the moment).
Nagios will give you downtime reports, and is available in the standard Ubuntu repositories.