I just want to monitor a small handful of servers (fewer than 10).
From what I've read in various places, the leading contenders (for open source, at least) are:
- Nagios
- Munin
- Zabbix
From what I have read, a lot of people use Munin and Nagios together: Munin for history and graphs, and Nagios for alerting.
On the other hand, it sounds like Zabbix is a more complete solution and easier to configure than either of the other two, so I was thinking of going that route.
My thoughts right now are:
- What are the general disadvantages of Zabbix?
- Does Zabbix have a small footprint on boxes it is monitoring?
- Do I really need to set up an entire separate server for it? I currently have a server that is under very light load; can I make it dual-purpose?
I think it would be best to concentrate on answering the specific questions you had, taking into account the size of your planned deployment (~10 monitored hosts).
What are the general disadvantages of Zabbix?
Does Zabbix have a small footprint on boxes it is monitoring?
Yes, definitely. Zabbix can monitor using methods like SNMP and simple network checks (is a port open?), and it also has a native agent for many platforms. As the agent is written in C, it has an extremely small footprint (as opposed to a bunch of interpreted scripts). You can easily combine different checks on a single monitored host. Note that you are not limited to monitoring servers; you can also add network devices and other things.
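To make the "is a port open?" kind of check concrete, here is a minimal Python sketch of what such an agentless check boils down to: attempt a TCP connection and report whether it succeeds. It only illustrates the idea and is not how Zabbix implements its simple checks; the function name and the hosts/ports in the example are placeholders.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Nothing needs to be installed on the target; the checking machine only
    needs network access to it.
    """
    try:
        # create_connection resolves the host and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, DNS failure, etc.
        return False


if __name__ == "__main__":
    # Placeholder targets; replace with your own hosts and ports.
    for host, port in [("127.0.0.1", 22), ("127.0.0.1", 80)]:
        state = "open" if port_is_open(host, port) else "closed"
        print(f"{host}:{port} is {state}")
```

A check like this says nothing about what is happening inside the host, which is where an agent comes in.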
Do I really need to set up an entire separate server for it? I currently have a server that is under very light load; can I make it dual-purpose?
It depends: if that server runs one of the operating systems supported by the Zabbix server, definitely. For an environment of that size the requirements will be really low. Use the default templates only as a guideline; it's recommended to create your own with longer intervals between checks. Basically, Zabbix consists of 3 components: the database, the frontend, and the server. If you like, you can reuse an existing database server and web server in the company for the first two components, and then run the Zabbix server on any supported platform; that's a perfectly valid configuration.
Any specific queries would be very welcome in #zabbix on Freenode.
I've been using Zabbix for 2 years now; before that I used Nagios...
In my opinion, the big difference is this: with Nagios you get a status (OK/WARNING/CRITICAL), whereas with Zabbix you get data (integer, float, string...).
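A small Python sketch of why that difference matters, using hypothetical disk-usage samples and thresholds. The state names follow the Nagios plugin convention (exit codes 0 to 3); the helper `classify` is made up for illustration. Once a value has been collapsed into a state, the number is gone; if you keep the number, you can re-evaluate it later or look at trends.

```python
from enum import IntEnum


class Status(IntEnum):
    """Nagios plugin states and their exit codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def classify(value: float, warn: float, crit: float) -> Status:
    """Collapse a measured value into a Nagios-style state (higher is worse)."""
    if value >= crit:
        return Status.CRITICAL
    if value >= warn:
        return Status.WARNING
    return Status.OK


# Hypothetical disk-usage samples, in percent, collected over time.
samples = [61.0, 72.5, 84.0, 91.2]

# Status-only view: once collapsed, only the verdicts remain.
statuses = [classify(v, warn=80, crit=90) for v in samples]

# Data view: the raw values are still there, so they can be re-evaluated
# against different thresholds later...
retuned = [classify(v, warn=70, crit=85) for v in samples]

# ...or used to compute a trend, e.g. average growth per sample.
growth_per_sample = (samples[-1] - samples[0]) / (len(samples) - 1)

print([s.name for s in statuses])
print([s.name for s in retuned])
print(f"{growth_per_sample:.2f} percentage points per sample")
```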
It's a really good point for Zabbix because:
Using the agent to collect basic system data easily and quickly is also very nice.
Disadvantages of Zabbix:
What are your goals for monitoring? Uptime? Performance? Billing metrics? Some of the utilities you listed above are better for each of those uses, and some are worse.
For ensuring uptime, we use monit, which is free and simple to set up on Unix/Linux systems. It monitors whether a process is alive and ensures that it isn't using more than its fair share of resources (CPU, memory); if the process misbehaves, monit restarts it.
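For illustration, here is a Python sketch of the kind of rule monit applies: restart a dead process immediately, and restart a live one if CPU or memory stays over a limit for several consecutive cycles. This is not monit's code or configuration syntax; the `Watchdog` class, the limits and the cycle count are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Watchdog:
    """Decide when to restart a process (illustrative only)."""

    cpu_limit: float     # percent
    mem_limit_mb: float
    cycles: int          # consecutive bad samples before acting
    _recent: deque = field(init=False)

    def __post_init__(self) -> None:
        # Remember only the last `cycles` checks.
        self._recent = deque(maxlen=self.cycles)

    def check(self, alive: bool, cpu: float, mem_mb: float) -> str:
        if not alive:
            self._recent.clear()
            return "restart"  # a dead process is restarted right away
        over_limit = cpu > self.cpu_limit or mem_mb > self.mem_limit_mb
        self._recent.append(over_limit)
        if len(self._recent) == self.cycles and all(self._recent):
            self._recent.clear()
            return "restart"  # misbehaved for `cycles` checks in a row
        return "ok"


if __name__ == "__main__":
    wd = Watchdog(cpu_limit=80.0, mem_limit_mb=200.0, cycles=3)
    # (alive, cpu %, memory MB) readings from a hypothetical process
    readings = [(True, 50, 100), (True, 95, 100), (True, 97, 100),
                (True, 99, 100), (False, 0, 0)]
    print([wd.check(*r) for r in readings])
```

Requiring several bad cycles in a row avoids restarting a process over a single short spike.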
For performance monitoring, I suggest Munin. It is easy to configure and can use Perl, Bash, Python or whatever you like for data collection. Munin can collect performance data from multiple machines in one place and builds easy-to-understand graphs.
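As an example of how little a Munin data collector needs, here is a sketch of a plugin in Python. As I understand the plugin convention, munin-node runs the plugin with the argument `config` to get the graph metadata, and with no argument to get the current values as `<field>.value` lines; check the Munin plugin documentation before relying on the exact fields. This one reports the 1-minute load average via `os.getloadavg()`, which is Unix-only.

```python
#!/usr/bin/env python3
"""Minimal Munin-style plugin reporting the 1-minute load average."""
import os
import sys


def config() -> str:
    # Graph metadata plus a label for the single field "load".
    return "\n".join([
        "graph_title Load average",
        "graph_vlabel load",
        "graph_category system",
        "load.label load",
    ])


def fetch() -> str:
    one_minute, _, _ = os.getloadavg()  # Unix only
    return f"load.value {one_minute:.2f}"


if __name__ == "__main__":
    # "config" describes the graph; no argument means "report values now".
    if len(sys.argv) > 1 and sys.argv[1] == "config":
        print(config())
    else:
        print(fetch())
```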
For billing metrics (bandwidth consumption), I suggest PRTG. It's not free, but it provides professional-level reports and statistics that can easily be used as part of your customers' billing reports, if you do that sort of thing. We replaced our Zabbix installation, which required agents on each monitored machine, with PRTG, which uses SNMP, and we have never looked back.
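For context, one common way SNMP-based tools derive bandwidth is to poll an interface octet counter (for example IF-MIB's `ifInOctets`) twice and divide the difference by the elapsed time. The Python sketch below is not PRTG's implementation; it assumes 32-bit counters and at most one wrap between polls (on fast links the 64-bit `ifHCInOctets` counters are normally used instead), and the sample values are made up.

```python
# IF-MIB ifInOctets / ifOutOctets are Counter32 values that wrap at 2**32.
COUNTER32_MODULUS = 2**32


def bits_per_second(prev_octets: int, curr_octets: int, seconds: float,
                    modulus: int = COUNTER32_MODULUS) -> float:
    """Average bandwidth between two samples of an SNMP octet counter.

    The counter only increases and wraps back to zero at `modulus`, so taking
    the difference modulo `modulus` handles a single wrap between polls.
    A device reboot (counter reset) cannot be told apart from a wrap here.
    """
    if seconds <= 0:
        raise ValueError("sampling interval must be positive")
    delta_octets = (curr_octets - prev_octets) % modulus
    return delta_octets * 8 / seconds


if __name__ == "__main__":
    # Two hypothetical polls of ifInOctets.
    print(bits_per_second(1_000_000, 4_750_000, 300))  # no wrap, 300 s apart
    print(bits_per_second(2**32 - 1_000, 500, 1))      # wrapped once, 1 s apart
```

Storing these per-interval rates is what lets a billing report sum or rank a customer's usage over a month.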
I have also used Zenoss, which was very nice and simple to install and configure. However, Zenoss required a long training period before we learned how to get all the metrics we needed.
I use Zabbix to monitor our company's infrastructure (only 6 servers plus all the networking gear). I've had Zabbix for over two years and it works great. I like that it's all in one app and doesn't require installing tons of plugins. The interface won't win any design awards, but it is laid out surprisingly well in terms of functionality. We've had some intermittent hardware problems on our servers in the past, and having lots of historical data in Zabbix definitely helped a lot in straightening them out.
Some versions seemed to have stability issues and crashed once in a while, but monit took care of that.
I recommend putting Zabbix on a separate box (some decommissioned server hardware from 3-4 years ago will work pretty well). The application itself is not very heavy, but it does put a significant strain on the database (MySQL in my case); saving all the historical data doesn't come cheap.
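To get a feel for how quickly that history adds up, here is a rough Python estimate. Only the 6 servers come from this answer; the item count, polling interval, retention period and bytes per row are hypothetical placeholders, and the real per-row size depends on the database engine, indexes and value types.

```python
SECONDS_PER_DAY = 86_400


def history_rows(hosts: int, items_per_host: int, interval_s: int, days: int) -> int:
    """Rows kept in the history tables for a retention window of `days`."""
    samples_per_item = days * SECONDS_PER_DAY // interval_s
    return hosts * items_per_host * samples_per_item


def approx_size_gib(rows: int, bytes_per_row: int) -> float:
    """Very rough on-disk size in GiB for `rows` rows of `bytes_per_row` each."""
    return rows * bytes_per_row / 1024**3


if __name__ == "__main__":
    # 6 servers as above; everything else is a placeholder assumption.
    rows = history_rows(hosts=6, items_per_host=100, interval_s=60, days=90)
    print(f"{rows:,} rows, roughly {approx_size_gib(rows, bytes_per_row=50):.1f} GiB")
```

Tens of millions of rows that are constantly inserted and periodically purged is the kind of write load that makes a dedicated database box worthwhile.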
I've used both Zenoss and Zabbix. My one complaint about Zabbix is that it always seems to crash on me. At my old job we had an installation where we ran Zabbix and Zenoss side by side (Kamil can say more about this one), and I also have installations at home and at Free Geek Vancouver, where I do their sysadmin work. All three crashed on a regular basis, and the daemon needed to be restarted.
I find Zenoss nice because it is quite stable and has a much nicer UI; however, it's very resource-intensive.
With all that said, I would still choose Zabbix as a monitoring solution, simply because its standard procedure for setting up new devices fits the way I think. The best thing to do is to set up all of them and see which one you like best.
We've been using Zabbix for over 4 years now (currently version 1.6, running on RHEL5), and it still hasn't crashed even once. My only complaint in the past was the lack of documentation and of friendly support (I mean the free community support). I've noticed the documentation has improved since then.