We use Nagios to monitor quite a few (~130) servers. We monitor CPU, disk, RAM and a few other things on each server. I've always used SSH to run the remote commands, purely because it requires little to no additional configuration on the remote server: just install nagios-plugins, create the nagios user and add the SSH key, all of which I've automated in a shell script. I've never actually considered the performance implications of using SSH over NRPE.
I'm not too bothered about the load on the Nagios server (it's probably over-specced for what it does; it has never gone above 10% CPU), but we run each remote check every 30 seconds and each server has 5 different checks performed. I assume SSH requires more resources per check, but is there a huge difference, i.e. enough of a difference to warrant switching to NRPE?
If it's any help, we monitor a mix of physical servers (normally with 8, 12 or 16 physical cores) and Amazon EC2 medium/large instances.
NRPE is a Nagios add-on; it's easy to install, and it runs the checks defined in its configuration file on the monitored host. There is only one bad thing about NRPE: you need to install it on every server you want to monitor. On Linux it's really simple (just yum/apt-get install the NRPE package), but on Windows Server you need to install it via an .exe, and sometimes you'll need to reboot the server.
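The NRPE configuration file (typically nrpe.cfg) maps each check name the Nagios server may request to a local command line through command[name]=... entries. As a rough illustration of that mapping, here is a minimal Python sketch that parses such entries; the sample lines and plugin paths are assumptions for illustration, not taken from any particular install.

```python
from __future__ import annotations

import re

# Matches NRPE-style command definitions such as:
#   command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
COMMAND_RE = re.compile(r"^\s*command\[([^\]]+)\]\s*=\s*(.+?)\s*$")


def parse_nrpe_commands(text: str) -> dict[str, str]:
    """Return {check_name: command_line} for every command[...] entry.

    Comment lines and other directives (server_port, allowed_hosts, ...)
    are ignored. In this sketch, a later entry with the same name
    overwrites an earlier one.
    """
    commands: dict[str, str] = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = COMMAND_RE.match(line)
        if match:
            name, cmdline = match.groups()
            commands[name] = cmdline
    return commands


# Hypothetical excerpt of an nrpe.cfg; real plugin paths depend on the distribution
SAMPLE_CFG = """
server_port=5666
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_disk]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
"""

if __name__ == "__main__":
    for name, cmdline in parse_nrpe_commands(SAMPLE_CFG).items():
        print(f"{name}: {cmdline}")
```

On the Nagios side, check_nrpe then only sends the check name (e.g. check_load), and the monitored host decides which command line that name runs.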
I don't think SSH is the most efficient way to run Nagios checks; NRPE is likely to be much more efficient.
The Nagios documentation contains this passage:
"Using SSH is more secure than the NRPE addon, but it also imposes a larger (CPU) overhead on both the monitoring and remote machines. This can become an issue when you start monitoring hundreds or thousands of machines. Many Nagios admins opt for using the NRPE addon because of the lower load it imposes."
That passage is from the Nagios documentation, which is a PDF.
Personally, I use SNMP, which is simple to use and doesn't require any third-party software to be installed on the servers.
I've always believed the administration advantage of SSH (I use push_check) outweighs any additional load. Modern CPUs are so fast that the cost of encrypting a handful of bytes is pretty minimal, so it comes down to running two processes (SSH and the check script) vs one (check script fired off by NRPE).
For check scripts written in an interpreted language, I would expect the overhead of firing up the interpreter (Perl, Python, Bash) to exceed the CPU cost of starting an SSH session. Given modern CPUs, your machines are more likely to be disk or memory limited rather than CPU limited.
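To get a feel for the process start-up cost discussed above, a rough Python sketch can time how long it takes to spawn a process and wait for it to exit. It only measures local process and interpreter start-up (it does not open an SSH session), and the choice of `true` and the Python interpreter as test commands is an assumption for illustration; absolute numbers will vary from machine to machine.

```python
from __future__ import annotations

import shutil
import subprocess
import sys
import time


def mean_spawn_seconds(argv: list[str], runs: int = 20) -> float:
    """Average wall-clock time to start argv and wait for it to exit."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    true_bin = shutil.which("true")  # trivial native binary, if present on this system
    candidates = {
        "true": [true_bin] if true_bin else None,
        # Starting a full interpreter, similar to launching a Python check script
        "python": [sys.executable, "-c", "pass"],
    }
    for label, argv in candidates.items():
        if argv is None:
            continue  # skip commands not available here
        print(f"{label}: {mean_spawn_seconds(argv) * 1000:.1f} ms per run")
```

To compare against SSH, the same function could be pointed at something like ["ssh", "monitored-host", "true"], where monitored-host is a placeholder for one of your servers with key-based login already set up.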
Provided your Nagios machine is coping (it has to set up roughly 20 SSH connections every second), I would err on the side of convenience.
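The figure of roughly 20 connections per second follows from the numbers in the question (about 130 servers, 5 checks each, every 30 seconds). A small Python sketch of that arithmetic, with the parameters taken from the question:

```python
def checks_per_second(servers: int, checks_per_server: int, interval_s: float) -> float:
    """Remote check executions per second across all monitored servers."""
    return servers * checks_per_server / interval_s


if __name__ == "__main__":
    rate = checks_per_second(servers=130, checks_per_server=5, interval_s=30)
    # 130 * 5 / 30 = 21.67 checks (i.e. SSH sessions) per second
    print(f"{rate:.1f} checks/s, {rate * 86400:,.0f} per day")
    # → 21.7 checks/s, 1,872,000 per day
```

Seen from a single monitored server, that is one check every 30 / 5 = 6 seconds, which is why the per-server cost matters less than the total on the Nagios host.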
Not really an answer to your question, more of an argument that life is too short to worry about it :)
Besides the suggestions given in other answers, have you considered enabling
ControlMaster
in the nagios user's .ssh/config file to take full advantage of SSH multiplexing? In other words, your SSH connection would stay open (together with ControlPersist, the master connection survives after the first session ends), so the overhead of establishing it is minimal because it happens only once. This still guarantees privacy through encryption and saves you from leaving extra TCP ports open on the servers (albeit firewalled). Plus, you can limit what the user can do over SSH by restricting the commands it can execute.
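As a hedged illustration of the multiplexing idea, the same OpenSSH options that would go in the nagios user's ~/.ssh/config (ControlMaster, ControlPath, ControlPersist) can also be passed with -o on the command line. The Python sketch below only builds such a command and does not run ssh; the helper name, the socket directory, the host name and the 10-minute persist time are assumptions for illustration.

```python
from __future__ import annotations

import os
import shlex


def multiplexed_ssh_argv(host: str, remote_cmd: list[str],
                         socket_dir: str = "~/.ssh/sockets",
                         persist: str = "10m") -> list[str]:
    """Build a hypothetical ssh command line that reuses a shared master connection.

    ControlMaster=auto : reuse an existing master connection, or become the master
    ControlPath        : location of the master's control socket (%r=user, %h=host, %p=port)
    ControlPersist     : keep the master open in the background after the first session exits
    """
    control_path = os.path.join(os.path.expanduser(socket_dir), "%r@%h:%p")
    return [
        "ssh",
        "-o", "BatchMode=yes",  # never prompt for a password from a monitoring job
        "-o", "ControlMaster=auto",
        "-o", f"ControlPath={control_path}",
        "-o", f"ControlPersist={persist}",
        host,
        shlex.join(remote_cmd),  # remote side receives one shell-quoted string
    ]


if __name__ == "__main__":
    argv = multiplexed_ssh_argv(
        "web01.example.com",  # hypothetical monitored host
        ["/usr/lib/nagios/plugins/check_load", "-w", "15,10,5", "-c", "30,25,20"],
    )
    print(shlex.join(argv))
```

The socket directory has to exist before ssh can create the control socket in it, so it would need to be created once (ideally readable only by the nagios user). Putting the same three options in .ssh/config has the advantage that every check picks them up without changing the Nagios command definitions.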
I've had nothing but issues trying to compile NRPE on various OSes. SSH has worked smoothly and efficiently and is much easier to script.