Can anyone recommend a website monitoring service?
My company builds and maintains sites for our clients. On any given day, we have 400-600 active sites. We build 5-10 per week and take down a similar number. We have basic "is the site live" monitoring enabled for each site, but I'd like to expand that.
Before we build our own, I'm looking for suggestions.
Monitoring would be centered on each site, not on servers (all sites run on a load-balanced pool of physical servers). For each site, we want to monitor different categories of information, such as:
- registration items: domain name expiration, SSL cert expiration. These data come from live checks (a rough sketch of such checks follows this list).
- implementation checklist: data checked by database queries and/or by making an HTTP request and comparing the resulting HTML against regular expressions.
- performance metrics: visits, conversions, etc. Data checked by database queries.
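To make "live checks" concrete, here's roughly the kind of thing I mean for the first two items. This is only a Python sketch; the site name, the regex, and the idea of printing results are placeholders, not anything we actually run:

```python
import re
import ssl
import socket
import urllib.request
from datetime import datetime, timezone

def ssl_days_remaining(hostname, port=443):
    """Connect to the live site and return days until its certificate expires."""
    ctx = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
    expires = datetime.strptime(cert["notAfter"], "%b %d %H:%M:%S %Y %Z")
    return (expires.replace(tzinfo=timezone.utc) - datetime.now(timezone.utc)).days

def page_matches(url, pattern):
    """Fetch the page and check the resulting HTML against a regular expression."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return re.search(pattern, html) is not None

if __name__ == "__main__":
    site = "www.example-client-site.com"  # placeholder site name
    print("SSL days left:", ssl_days_remaining(site))
    print("Checklist item:", page_matches(f"https://{site}/", r"Powered by OurCMS"))
```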
Because of how regularly we add and remove sites, we need to be able to add and remove them through an API or some other automated means.
I've looked a bit at Nagios, Icinga, and Zenoss. Although they have plugins and APIs for extension, each centers on a physical host. We really need an at-a-glance view that highlights sites having trouble, with the ability to drill down to see what the problem is.
Are there tools I've not found that address my needs? Has anybody used one of the above in a non-server-centric way?
There are plenty of costly tools to do this (or at least your 2nd and 3rd items). HP OVIS used to be one; now they've got something in their BA suite of tools. These will run synthetic transactions as an HTTP(S) user agent to verify certain values, keep logs of page load times, etc.
I'm sure there are inexpensive tools to do this as well; I'm just not that familiar with any of them. There are also commercial services you can pay for per month (often priced per site). The benefit of those is that they have multiple testing servers in geographically dispersed areas, so if one of their servers or ISPs is down, you won't mistakenly think that your customer's website is down.
If I were doing this, I would see whether any existing monitoring tools/frameworks (like the ones you mentioned) could use cURL or a similar tool as a plugin. You then set up checks whose results get parsed into the outputs you care about, store/track them, and alert on them as well. That also ought to work for your first item, which I think most people just try to track using static documentation. Querying the registrars directly and keeping it up to date is a great idea, though; if the correct actions are taken, you won't get bit in the butt as often as some folks do.
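For the registration item, a scheduled check can shell out to the standard whois client and parse the expiry date. This is only a rough sketch: the expiry field name and date format vary by TLD/registrar, so the patterns below are assumptions you'd need to extend:

```python
import re
import subprocess
from datetime import datetime, timezone

# Registries spell the expiry field differently; try a few common variants.
EXPIRY_PATTERNS = [
    r"Registry Expiry Date:\s*(\S+)",
    r"Expiration Date:\s*(\S+)",
    r"paid-till:\s*(\S+)",
]

def domain_days_remaining(domain):
    """Return days until the domain registration expires, or None if unknown."""
    output = subprocess.run(["whois", domain], capture_output=True,
                            text=True, timeout=30).stdout
    for pattern in EXPIRY_PATTERNS:
        match = re.search(pattern, output, re.IGNORECASE)
        if not match:
            continue
        expires = datetime.fromisoformat(match.group(1).replace("Z", "+00:00"))
        if expires.tzinfo is None:
            expires = expires.replace(tzinfo=timezone.utc)
        return (expires - datetime.now(timezone.utc)).days
    return None
```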
Zabbix (FOSS) and InterMapper (commercial) are other options to look into. I know the latter can be manipulated through an XML API (if you ask their support folks nicely they'll give you very good unofficial help).
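Zabbix in particular speaks JSON-RPC, so adding/removing sites on deploy can be scripted against its API. A rough sketch only: the method names (user.login, host.create) are real, but parameter names and auth handling differ between Zabbix versions, and the URL, credentials, group ID, and template ID below are placeholders:

```python
import json
import urllib.request

ZABBIX_URL = "https://monitor.example.com/zabbix/api_jsonrpc.php"  # placeholder

def zabbix_call(method, params, auth=None, req_id=1):
    """POST one JSON-RPC request to the Zabbix API and return its 'result' field."""
    body = {"jsonrpc": "2.0", "method": method, "params": params, "id": req_id}
    if auth:
        body["auth"] = auth  # older-style auth token; newer versions use an HTTP header
    req = urllib.request.Request(ZABBIX_URL, data=json.dumps(body).encode(),
                                 headers={"Content-Type": "application/json-rpc"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["result"]

# Log in, then register a newly launched site as a "host" keyed on its domain name.
token = zabbix_call("user.login", {"user": "api-user", "password": "secret"})
zabbix_call("host.create", {
    "host": "www.newclientsite.com",
    "interfaces": [{"type": 1, "main": 1, "useip": 0, "ip": "",
                    "dns": "www.newclientsite.com", "port": "10050"}],
    "groups": [{"groupid": "2"}],           # placeholder group id
    "templates": [{"templateid": "10001"}]  # placeholder template id
}, auth=token)
```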
Pretty much any monitoring system can do what you want. You may be monitoring "localhost" with some parameters that get magically passed to a script, but you can monitor the stuff you want to see.
From the standpoint of pretty much every monitoring system, each (web)site is a server: you refer to it by a domain name and URL/path, talk some HTTP(S) to it, and get a response that you can then analyze.
It doesn't matter whether it's a physical server; as far as monitoring goes, it's a discrete entity.
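Concretely, that usually means a small check script per site, using the standard plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL) so a Nagios/Icinga-style scheduler can run it with the site name as a parameter. A rough sketch; the HTTPS-only assumption, timeout, and 5-second threshold are arbitrary:

```python
#!/usr/bin/env python3
"""Nagios/Icinga-style check, e.g.: check_site.py www.example.com 'Powered by OurCMS'"""
import re
import sys
import time
import urllib.request

OK, WARNING, CRITICAL = 0, 1, 2

def main():
    site, pattern = sys.argv[1], sys.argv[2]
    start = time.monotonic()
    try:
        with urllib.request.urlopen(f"https://{site}/", timeout=15) as resp:
            html = resp.read().decode("utf-8", errors="replace")
    except Exception as exc:
        print(f"CRITICAL - {site} unreachable: {exc}")
        return CRITICAL
    elapsed = time.monotonic() - start
    if not re.search(pattern, html):
        print(f"CRITICAL - {site} is up but expected content is missing")
        return CRITICAL
    if elapsed > 5:
        print(f"WARNING - {site} responded in {elapsed:.1f}s")
        return WARNING
    print(f"OK - {site} responded in {elapsed:.1f}s")
    return OK

if __name__ == "__main__":
    sys.exit(main())
```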
Having said that, make sure you aren't lax in monitoring the actual underlying servers and the health of the load balancer. Having sickly servers in your LB pool sets you up for a terrible domino-effect scenario (Server1's disk fills. It blows up. The LB sends its load to servers 2 and 3, which fills up Server2's disk...etc...)