Question

Marcus Downing

Asked: 2014-04-26 01:36:39 +0800 CST

Delaying a Nagios/Icinga check

When monitoring the health of a server, some faults or warnings are immediately urgent but others only matter if they persist. I'm thinking of things like:

Some software needs to be updated
Time offset differs from NTP

If unaddressed these could become real problems, but there are already background services in place to take care of them - unattended upgrades, an NTP client service etc. There's always a short delay between the problem arising and these background processes kicking in to address it, and our monitor sends out a series of emails in that gap - then again a minute later when the issue is fixed. I generally wake up to a large pile of "PROBLEM" emails, each with a corresponding "RESOLUTION" email sent a minute later. The danger is that in dismissing a hundred irrelevant warnings, I could miss the one that's real.

So is there any way of instructing Icinga or Nagios to only report an issue if it's continued for more than a certain time, say 5 minutes?

2 Answers

MadHatter · Answer 1 · 2014-04-26T01:43:10+08:00

SvW is not wrong in what (s)he writes, but you should also investigate the variable max_check_attempts, which defines how many consecutive checks a service has to fail before it enters a HARD error state and notifies.
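To make the max_check_attempts behaviour concrete, here is a minimal Python sketch of the counting logic. It is a simplified model, not Nagios code: the function name and the list-of-booleans input are assumptions made for illustration, and it ignores flap detection, host state and notification settings.

```python
def first_hard_failure(results, max_check_attempts):
    """Return the index of the check at which the service goes HARD, or None.

    results: sequence of booleans, True meaning the check passed.
    Simplified model (an assumption, not Nagios source): the service goes
    HARD once max_check_attempts consecutive checks have failed.
    """
    consecutive_failures = 0
    for index, ok in enumerate(results):
        if ok:
            consecutive_failures = 0  # a passing check resets the counter
            continue
        consecutive_failures += 1
        if consecutive_failures >= max_check_attempts:
            return index
    return None


# A one-check blip never reaches HARD with max_check_attempts = 3.
blip = [True, False, True, True]
# A persistent fault does, on its third consecutive failure.
outage = [True, False, False, False, False]

print(first_hard_failure(blip, 3))
print(first_hard_failure(outage, 3))
```

In this model a fault that clears before the attempt limit is reached never produces a notification, which is the effect the question asks for.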

For some of my hair-trigger services, I have

max_check_attempts              2
check_interval                  2
retry_interval                  1

which means that NAGIOS will check more often than usual, and once it notices something's down, it'll wait 1 minute, check once more, then notify. For other services, where I don't care until it's been down a while, I have

max_check_attempts              12
check_interval                  5
retry_interval                  5

which means that once NAGIOS notices something's down, it'll carry on checking every 5 minutes as usual, and not tell me until it's been down for about an hour.
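The delay implied by each pair of settings can be worked out with a little arithmetic. The sketch below assumes the default interval_length of 60 seconds (so the intervals are in minutes), assumes checks run on schedule, and counts the first failed check as attempt 1; the function name is hypothetical.

```python
def minutes_until_hard(max_check_attempts, retry_interval):
    """Minutes between the first failed check and the HARD state.

    Assumption: the first failure counts as attempt 1, so only
    (max_check_attempts - 1) retries remain, each retry_interval apart.
    """
    return (max_check_attempts - 1) * retry_interval


# Hair-trigger settings from the answer: 2 attempts, 1-minute retries.
print(minutes_until_hard(2, 1))
# Patient settings from the answer: 12 attempts, 5-minute retries.
print(minutes_until_hard(12, 5))
# One possible choice for the 5-minute delay asked about in the question.
print(minutes_until_hard(6, 1))
```

The time measured here starts at the first failed check; the fault itself may have begun up to one check_interval earlier.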

It is definitely worth tuning your NAGIOS until it tells you about the things you care about, at the time you care about them, and nothing else; a monitoring system that emits a cloud of false positives (i.e., sends you loads of notifications you don't really care about) is nearly as useless as one that has false negatives (i.e., fails to notice a real problem).

Sven · Answer 2 · 2014-04-26T01:41:31+08:00

You can define detailed configurations to tell Nagios every detail about the check for a service.

Look up the check_interval and retry_interval config options, and while you are at it, learn about time periods in general.
