Question

Michael Pobega

Asked: 2014-01-16 08:33:21 +0800 CST

Nagios server best practices?

772

I run a medium-sized Nagios server. It monitors roughly 40 servers with 180 services currently and is only growing by the day.

I migrated from an old Nagios setup that was configured in a very esoteric fashion, forcing me to reconfigure everything from scratch.

Now that the server is running and works for most of what we need it for, I'm looking into making it a bit more scalable. Currently each host has its own file in /etc/nagios/hosts/, and each host has all of its services in the same file. This is obviously not optimal, but neither is scattering my configuration across hundreds of different files.

So my question is this: to any experienced Nagios admins out there, what is the best way to make use of hostgroups/servicegroups without over-complicating the configuration?

5 Answers

asciiphil · Answer 1 · 2014-01-17T12:23:42+08:00

Hostgroups and templates.

Templates let you define classes for your hosts and services, e.g. "normal service", "critical service", "low-priority host". They also serve as a useful way to divide responsibilities if you've got multiple teams with different responsibilities, so you can have a "linux host" template and a "windows host" template, with each one defining the appropriate contact info.

You can use multiple templates on a single resource, so you can compose appropriately-orthogonal templates. For example, you can have

define host {
    host_name foo
    use       windows-host,normal-priority-host
    ...
}

which would pull in the contact info (and escalations) for the Windows team and the polling rates and thresholds for a "normal" host.
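
As a rough sketch of what those two templates might look like (the contact group name and all interval values here are illustrative assumptions, not taken from a real setup):

```cfg
# Template carrying the "who gets notified" side.
define host {
    name            windows-host        ; template name referenced by "use"
    register        0                   ; template only, not a real host
    contact_groups  windows-admins      ; hypothetical contact group
}

# Template carrying the "how often and how strictly" side.
define host {
    name                   normal-priority-host
    register               0
    check_interval         5            ; illustrative values
    retry_interval         1
    max_check_attempts     3
    notification_interval  60
}
```

Because the two templates set different directives, they do not conflict. If they did, Nagios would take the value from the template listed first in the use line.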

Hostgroups let you group together all of the checks for a subset of your hosts. Have things like "baseline-linux-hosts" that check load, disk space, sshability, and whatever other things should be on every host you monitor. Add groups like "https-servers" with checks for HTTP connectivity, HTTPS connectivity, and SSL certificate expiration dates; "fileservers" with checks for NFS and SMB accessibility and maybe more aggressive disk checks; or "virtual-machines" with checks for whether the VM accessibility tools are running properly.

Put each host and hostgroup in its own file. That file should contain the host or hostgroup definition first, followed by the definitions of the services that apply to it.
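
A hostgroup file following that layout might look roughly like this. The file name and the normal-service template are assumptions for illustration; check_https and check_https_cert are the locally defined commands mentioned further down, not stock Nagios commands.

```cfg
# /etc/nagios/conf.d/hostgroups.d/https-servers.cfg (hypothetical file name)

define hostgroup {
    hostgroup_name  https-servers
    alias           HTTPS servers
}

# Services attached to the group apply to every member host.
define service {
    use                  normal-service      ; hypothetical service template
    hostgroup_name       https-servers
    service_description  HTTPS
    check_command        check_https
}

define service {
    use                  normal-service
    hostgroup_name       https-servers
    service_description  SSL certificate
    check_command        check_https_cert
}
```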

If you use the cfg_dir directive in your nagios.cfg file, Nagios will search recursively through that directory. Make use of that. For a setting of cfg_dir=/etc/nagios/conf.d, you can have a directory tree like the following:

/etc/nagios/conf.d/
- commands.d/
  - http.cfg
  - nrpe.cfg
  - smtp.cfg
  - ssh.cfg
- hosts.d/
  - host1.cfg
  - host2.cfg
  - host3.cfg
- hostgroups.d/
  - hostgroup1.cfg
  - hostgroup2.cfg
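
On the nagios.cfg side, that whole tree needs only the one directive. A minimal fragment, assuming the path used above:

```cfg
# nagios.cfg (fragment)
# Every file ending in .cfg under this directory, including
# subdirectories, is read as object configuration.
cfg_dir=/etc/nagios/conf.d
```

After moving files around, it is worth running Nagios's configuration check (the -v option of the nagios binary, whose name and path vary by distribution) before reloading.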

I tend to make a directory for each resource type (commands, contactgroups, contacts, escalations, hostgroups, hosts, servicegroups, timeperiods) except for services, which get grouped in with the hosts or hostgroups that use them.

The precise structure can vary according to your organizational needs. At a past job, I used subdirectories under hosts.d for each different site. At my current job, most of the Nagios host definitions are managed by Puppet, so there's one directory for Puppet-managed hosts and a separate one for hand-managed hosts.

Note that the above also breaks out commands into multiple files, generally by protocol. Thus, the nrpe.cfg file would have the commands check_nrpe and check_nrpe_1arg, while http.cfg could have check_http, check_http_port, check_https, check_https_port, and check_https_cert.¹
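
For example, an nrpe.cfg along these lines would hold both NRPE commands. This is a sketch that assumes $USER1$ points at your plugin directory, as it does in the sample resource file; adjust it if your plugins live elsewhere.

```cfg
# /etc/nagios/conf.d/commands.d/nrpe.cfg

# Remote command that takes arguments: check_nrpe!<command>!<args>
define command {
    command_name  check_nrpe
    command_line  $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$
}

# Remote command with no arguments: check_nrpe_1arg!<command>
define command {
    command_name  check_nrpe_1arg
    command_line  $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
```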

I don't typically have a tremendous number of templates, so I usually just have a hosts.d/templates.cfg file and a services.d/templates.cfg file. If you use them more heavily, they can go into appropriately-named files in a templates.d directory.

¹ I like to also have a check_http_blindly command, which is basically check_http -H $HOSTADDRESS$ -I $HOSTADDRESS$ -e HTTP/1.; it returns OK even if it gets a 403 response code.
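
Written out as a command definition, that would be something like the following (again assuming $USER1$ is the plugin directory):

```cfg
define command {
    command_name  check_http_blindly
    # Accept any response whose status line starts with "HTTP/1.",
    # whatever the status code is.
    command_line  $USER1$/check_http -H $HOSTADDRESS$ -I $HOSTADDRESS$ -e HTTP/1.
}
```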

Keith · Answer 2 · 2014-01-17T09:42:08+08:00

Make extensive use of servicegroups, hostgroups, and templating. Create hostgroups, and assign services to the hostgroups. Use servicegroups for dependencies, escalations, and logical grouping in the web UI.

If you have groups for everything, adding a new host is just 3 or 4 lines: name, address, template(s), and (optionally) hostgroups. Everything can be templated.
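
A new host in that style could be as short as this. All names and the address are placeholders; the templates and hostgroups are assumed to be defined elsewhere.

```cfg
define host {
    host_name   web01
    address     192.0.2.10
    use         linux-host,normal-priority-host
    hostgroups  baseline-linux-hosts,https-servers
}
```

Every service attached to those hostgroups then applies to web01 without any further configuration.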

Be sure to read the docs on inheritance, and also the time-saving tricks page. Multiple inheritance can get tricky, but when used correctly it's a huge time-saver.

6

philippe · Answer 3 · 2014-01-16T12:57:46+08:00

I used to configure my Nagios servers this way (before I switched to Icinga), and I saw no performance problems until I reached more than 500 services, at least on a server with 512 MB of memory and 1 CPU. Hostgroups and servicegroups can be treated completely separately, and I would recommend this approach since it allows you to have one file per server (with the services for that server defined in that file) and then one file per hostgroup or servicegroup. This is simply easier to understand.

If you run into scalability trouble, you may want to have a look at nagios-nrpe-server, which performs the checks on the client side so that all your Nagios server does is ask for the results; this spares the server the resource cost of the check. (Nagios launches check_nrpe, the client receives the request, performs the checks locally, and replies to Nagios.) Keep in mind that not all checks can be handled this way (SNMP, for instance).
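
Roughly, the two halves of an NRPE check look like this. The plugin path, thresholds, hostgroup, and service template are assumptions, and check_nrpe_1arg is assumed to be defined on the server as a command that passes a single remote command name to check_nrpe.

```cfg
# On the monitored host, in nrpe.cfg: the check that runs locally.
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20

# On the Nagios server: ask the NRPE daemon to run it and return the result.
define service {
    use                  normal-service          ; hypothetical service template
    hostgroup_name       baseline-linux-hosts    ; hypothetical hostgroup
    service_description  Load
    check_command        check_nrpe_1arg!check_load
}
```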

Finally, even if it may seem out of scope for your question, I would suggest switching to Icinga, which is far more scalable and is backed by a stronger community that really cares about implementing new features and supporting users. The configuration is the same (same configuration files, same syntax).

user3120146 · Answer 4 · 2014-01-16T12:58:56+08:00

I am using this scheme:

hosts,
hostgroups,
remote services,
local services.

Each entity has its own file. Besides, with templates you can always make your config cleaner and more readable. For instance, you might check load average, disk space, and memory on every host, so it's quite easy and handy to create a generic template and use it.
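
A generic template of that kind might look like the sketch below. The values, the contact group, the hostgroup, the 24x7 time period, and the check commands are all assumptions for illustration.

```cfg
# Shared settings live in one template.
define service {
    name                   generic-service
    register               0               ; template only
    check_period           24x7
    notification_period    24x7
    check_interval         5
    retry_interval         1
    max_check_attempts     3
    notification_interval  60
    contact_groups         admins          ; hypothetical contact group
}

# Each real service then needs only what is specific to it.
define service {
    use                  generic-service
    hostgroup_name       linux-servers     ; hypothetical hostgroup
    service_description  Disk space
    check_command        check_nrpe_1arg!check_disk
}

define service {
    use                  generic-service
    hostgroup_name       linux-servers
    service_description  Memory
    check_command        check_nrpe_1arg!check_mem
}
```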

1

IvanAK · Answer 5 · 2014-01-17T12:52:39+08:00

You will not complicate the configuration by creating groups. As asciiphil says, you can create a new file, or you can define the groups in one of the existing files (hosts.cfg or whatever). If you create a new file, you have to tell Nagios that the file is active (an existing file is already active). You do this in the nagios.cfg file, where you add the path of the newly created file: "cfg_file=/usr/local/nagios/etc/objects/NEW_FILE.cfg"

The other thing is simply to create groups according to your infrastructure. For example, if I have Linux and Windows servers, I will make two different groups, one for Linux and the other for Windows. It is the same with services. It depends on how you would like to configure them and how you would like to see them grouped in the monitoring display.

As for how to define a group, it is simple.

    define hostgroup {
        hostgroup_name  novell-servers
        alias           Novell Servers
        members         netware1,netware2,netware3,netware4
    }

Then, in the host configuration (or in a host or service template, if you have already defined one and are applying it with use), you can automatically make all hosts, or all Windows or Linux hosts, members of a hostgroup that you created.
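
As a sketch of the template approach (group, template, and host names and the address are placeholders):

```cfg
define hostgroup {
    hostgroup_name  linux-servers
    alias           Linux Servers
}

# Every host that uses this template joins linux-servers.
define host {
    name        linux-host
    register    0                  ; template only, not a real host
    hostgroups  linux-servers
}

define host {
    use        linux-host
    host_name  web01
    address    192.0.2.10
}
```

Note that a host which sets its own hostgroups directive replaces the template's value unless the value is prefixed with + (for example, hostgroups +https-servers), which appends to the inherited list instead.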
