I run a medium-sized Nagios server. It monitors roughly 40 servers with 180 services currently and is only growing by the day.
I migrated from an old Nagios setup that was configured in a very esoteric fashion, forcing me to reconfigure everything from scratch.
Now that the server is running and works for most of what we need it for, I'm looking into making it a bit more scalable. Currently each host is its own file in `/etc/nagios/hosts/`, and each host has all of its services in the same file. This is obviously not optimal, but neither is scattering all of my configuration across hundreds of different files.
So my question is this: to any experienced Nagios admins out there, what is the best way to make use of hostgroups/servicegroups without over-complicating the configuration?
Hostgroups and templates.
Templates let you define classes for your hosts and services, e.g. "normal service", "critical service", "low-priority host". They also serve as a useful way to divide responsibilities if you've got multiple teams with different responsibilities, so you can have a "linux host" template and a "windows host" template, with each one defining the appropriate contact info.
You can use multiple templates on a single resource, so you can compose appropriately orthogonal templates. For example, a host can use both a Windows-team template and a "normal host" template, which would pull in the contact info (and escalations) for the Windows team and the polling rates and thresholds for a "normal" host.
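A minimal sketch of that kind of composition. The template names (`windows-host`, `normal-host`), the `windows-admins` contactgroup, the host name, and the address are all made up for illustration, and `check-host-alive` and the `24x7` timeperiod are assumed to be defined elsewhere (as they are in the stock sample configuration):

```cfg
# Hypothetical template: who gets notified about Windows hosts.
define host {
    name            windows-host
    contact_groups  windows-admins     ; assumed contactgroup
    register        0                  ; template only, not a real host
}

# Hypothetical template: polling rates and thresholds for a "normal" host.
define host {
    name                   normal-host
    check_command          check-host-alive
    check_interval         5
    retry_interval         1
    max_check_attempts     3
    check_period           24x7
    notification_interval  60
    notification_period    24x7
    register               0
}

# A real host composed from both templates.
define host {
    use        windows-host,normal-host
    host_name  winserver01
    alias      winserver01
    address    192.0.2.10
}
```

When two templates set the same directive, the one listed first in `use` wins, which is one more reason to keep the templates orthogonal.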
Hostgroups let you group together all of the checks for a subset of your hosts. Have things like "baseline-linux-hosts" that check load, disk space, SSH availability, and whatever other things should be on every host you monitor. Add groups like "https-servers" with checks for HTTP connectivity, HTTPS connectivity, and SSL certificate expiration dates; "fileservers" with checks for NFS and SMB accessibility and maybe more aggressive disk checks; or "virtual-machines" with checks for whether the VM accessibility tools are running properly.
Put each host and hostgroup in its own file. That file should contain the host or hostgroup definition first, followed by the definitions of the services that apply to it.
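As a rough sketch, a hostgroup file along those lines might look like the following. The file path, the `normal-service` template, and the `check_ssh` and `check_nrpe` command definitions are assumptions here, not something Nagios provides under those names automatically:

```cfg
# hostgroups.d/baseline-linux-hosts.cfg (hypothetical path)

# The hostgroup definition comes first...
define hostgroup {
    hostgroup_name  baseline-linux-hosts
    alias           Baseline checks for every Linux host
}

# ...followed by the services that apply to every member of the group.
define service {
    use                  normal-service        ; assumed service template
    hostgroup_name       baseline-linux-hosts
    service_description  SSH
    check_command        check_ssh             ; assumed command definition
}

define service {
    use                  normal-service
    hostgroup_name       baseline-linux-hosts
    service_description  Load
    check_command        check_nrpe!check_load ; assumes NRPE on the clients
}
```

Because the services are attached with `hostgroup_name` rather than `host_name`, any host added to the group picks up both checks without further edits.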
If you use the `cfg_dir` directive in your `nagios.cfg` file, Nagios will search recursively through that directory. Make use of that. With a setting of `cfg_dir=/etc/nagios/conf.d`, you can build a whole directory tree underneath it.
I tend to make a directory for each resource type (commands, contactgroups, contacts, escalations, hostgroups, hosts, servicegroups, timeperiods) except for services, which get grouped in with the hosts or hostgroups that use them.
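One possible layout, sketched as comments next to the `cfg_dir` line. Only `hosts.d`, `services.d`, `templates.d`, `nrpe.cfg`, `http.cfg`, and the two `templates.cfg` files are named in this answer; every other directory and file name below is an assumption for illustration:

```cfg
# nagios.cfg
cfg_dir=/etc/nagios/conf.d

# Hypothetical tree under that directory. Nagios reads every file ending
# in .cfg that it finds there, including those in subdirectories:
#
#   /etc/nagios/conf.d/commands.d/http.cfg
#   /etc/nagios/conf.d/commands.d/nrpe.cfg
#   /etc/nagios/conf.d/contactgroups.d/windows-admins.cfg
#   /etc/nagios/conf.d/contacts.d/alice.cfg
#   /etc/nagios/conf.d/hostgroups.d/baseline-linux-hosts.cfg
#   /etc/nagios/conf.d/hostgroups.d/https-servers.cfg
#   /etc/nagios/conf.d/hosts.d/templates.cfg
#   /etc/nagios/conf.d/hosts.d/web01.cfg
#   /etc/nagios/conf.d/services.d/templates.cfg
#   /etc/nagios/conf.d/timeperiods.d/24x7.cfg
```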
The precise structure can vary according to your organizational needs. At a past job, I used subdirectories under `hosts.d` for each different site. At my current job, most of the Nagios host definitions are managed by Puppet, so there's one directory for Puppet-managed hosts and a separate one for hand-managed hosts.
Note that I also break out commands into multiple files, generally by protocol. Thus, the `nrpe.cfg` file would have the commands `check_nrpe` and `check_nrpe_1arg`, while `http.cfg` could have `check_http`, `check_http_port`, `check_https`, `check_https_port`, and `check_https_cert`. [1]
I don't typically have a tremendous number of templates, so I usually just have a `hosts.d/templates.cfg` file and a `services.d/templates.cfg` file. If you use them more heavily, they can go into appropriately named files in a `templates.d` directory.
[1] I like to also have a `check_http_blindly` command, which is basically `check_http -H $HOSTADDRESS$ -I $HOSTADDRESS$ -e HTTP/1.`; it returns OK even if it gets a 403 response code.
Make extensive use of servicegroups, hostgroups, and templating. Create hostgroups, and assign services to the hostgroups. Use servicegroups for dependencies, escalations, and logical grouping in the web UI.
If you have groups for everything, adding a new host is just 3 or 4 lines: name, address, template(s), and (optionally) hostgroups. Everything can be templated.
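For instance, assuming templates and hostgroups with these (hypothetical) names already exist and the templates supply every other required directive, a new host could be as short as:

```cfg
define host {
    use         linux-host,normal-host                ; hypothetical templates
    host_name   web01
    address     192.0.2.20
    hostgroups  baseline-linux-hosts,https-servers    ; hypothetical hostgroups
}
```

Add an `alias` line as well if your setup expects one; everything else, including all of the service checks, comes from the templates and the hostgroups.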
Be sure to read the docs on inheritance, and also the time-saving tricks page. Multiple inheritance can get tricky, but when used correctly it's a huge time-saver.
I used to configure my Nagios servers this way (before I switched to Icinga), and performance holds up until you reach more than 500 services, at least on a server with 512 MB of memory and 1 CPU. Hostgroups and servicegroups can be treated completely separately, and I would recommend this approach, since it allows one file per server (with the services for that server defined in the same file) and then one file per hostgroup or servicegroup. It is simply easier to understand.
If you run into scalability trouble, you may want to have a look at nagios-nrpe-server, which performs checks on the client side, so all your Nagios server does is ask for the results, which spares it the resource cost of the check. (Nagios launches `check_nrpe`, which sends a request to the client; the client performs the check locally and replies to Nagios.) Keep in mind that not all checks can be handled this way (SNMP, for instance).
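A minimal sketch of the server side of that exchange. It assumes the `check_nrpe` plugin is installed in the directory that `$USER1$` points to, that the client's NRPE daemon defines a command called `check_disk`, and that a `normal-service` template and a host named `web01` exist; those names are illustrative:

```cfg
# Command run on the Nagios server; it only forwards the request.
define command {
    command_name  check_nrpe
    command_line  $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

# The actual disk check runs on the client and only the result comes back.
define service {
    use                  normal-service          ; assumed service template
    host_name            web01                   ; hypothetical host
    service_description  Disk space
    check_command        check_nrpe!check_disk   ; check_disk must exist in the client's NRPE config
}
```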
To finish, and even if it may seem out of scope for your question, I would suggest switching to Icinga, which is far more scalable and is backed by a stronger community that really cares about implementing new features and supporting users. The configuration is the same (same configuration files, same syntax).
I am using this scheme:
Each entity has its own file. Besides, with templates you can always make your config cleaner and more readable. For instance, you might check load average, disk space, and memory on every host, so it's quite easy and handy to create a generic template and use it.
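A generic template of that kind might look roughly like this. The template name, the `admins` contactgroup, the host name, and the `check_nrpe!check_load` command are assumptions for the sake of the example, and the `24x7` timeperiod is assumed to be defined elsewhere:

```cfg
# Hypothetical generic service template holding the shared settings.
define service {
    name                   generic-check
    check_period           24x7
    check_interval         5
    retry_interval         1
    max_check_attempts     3
    notification_interval  60
    notification_period    24x7
    contact_groups         admins       ; assumed contactgroup
    register               0            ; template only
}

# Each real check then only states what is specific to it.
define service {
    use                  generic-check
    host_name            web01                  ; hypothetical host
    service_description  Load average
    check_command        check_nrpe!check_load  ; assumed command
}
```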
You cannot overcomplicate the configuration by making groups. As asciiphil says, you either create a new file for the groups or define them in one of the existing files (such as hosts.cfg). If you create a new file, you have to tell Nagios that it is active by adding its path to the nagios.cfg file, for example `cfg_file=/usr/local/nagios/etc/objects/NEW_FILE.cfg`; an existing file is already active.
The other thing is simply to make groups that match your infrastructure. For example, if I have Linux and Windows servers, I will make two different groups, one for Linux and the other for Windows. It is the same with services: group them according to how you would like to see them on the monitoring display.
As for the file, or the part of a file, that defines a group, it is simple.
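A sketch of what such group definitions could look like, with made-up group and host names:

```cfg
define hostgroup {
    hostgroup_name  linux-servers
    alias           Linux Servers
    members         web01,db01       ; hypothetical hosts, listed explicitly
}

define hostgroup {
    hostgroup_name  windows-servers
    alias           Windows Servers  ; members can instead be assigned from the host side
}
```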
In the host configuration, or in a host template if you have already defined one and pull it in with `use`, you can automatically make all of your hosts (or all of your Windows or Linux hosts) members of a hostgroup that you created.
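For example, assuming a `linux-servers` hostgroup exists and the template (or another one alongside it) supplies the remaining required directives, the membership can live in the template. The template and host names here are hypothetical:

```cfg
# Hypothetical template: every host that uses it joins linux-servers.
define host {
    name        linux-host
    hostgroups  linux-servers
    register    0
}

define host {
    use        linux-host
    host_name  db01
    address    192.0.2.30
}
```

If an individual host sets its own `hostgroups` line, that value replaces the one from the template unless it starts with `+`, in which case it is appended to the inherited list.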