I have an old server (P4-based) running Nagios and all our other monitoring tools.
Over the last few weeks we have been seeing some strange behavior.
In /var/spool/pnp4nagios (where temporary files are stored before the pnp4nagios daemon processes them) many files like perfdata.1274949941-PID-18839 pile up, and we get this error in npcd.log:
[05-27-2010 11:17:46] NPCD: ThreadCounter 0/15 File is perfdata.1274951306-PID-27849
[05-27-2010 11:17:46] NPCD: File 'perfdata.1274951306-PID-27849' is an already in process PNP file. Leaving it untouched.
Sometimes some graphs are not drawn.
The server is pretty heavily loaded (load average normally around 5-6), and I suspect that npcd times out and leaves those files behind.
What can I do, apart from replacing the server?
Some information about the system:
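One way to check how many files are being left behind, and for how long, is to compare the number in each file name against the current time. This is only a rough sketch: it assumes the number after "perfdata." is the Unix epoch time at which the file was picked up (which matches the timestamps in the log above), and the function names and the 300-second threshold are made up for illustration.

```python
import os
import re
import time

# Matches names like perfdata.1274951306-PID-27849.
# Assumption: the first number is epoch seconds, the second is a PID.
PERFDATA_RE = re.compile(r"^perfdata\.(\d+)-PID-(\d+)$")


def parse_perfdata_name(name):
    """Return (epoch, pid) for an in-process perfdata file name, else None."""
    match = PERFDATA_RE.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def stale_perfdata_files(names, now, max_age=300):
    """Return (name, age_seconds, pid) for files older than max_age, oldest first."""
    stale = []
    for name in names:
        parsed = parse_perfdata_name(name)
        if parsed is None:
            continue  # not an in-process perfdata file, ignore it
        epoch, pid = parsed
        age = now - epoch
        if age > max_age:
            stale.append((name, age, pid))
    stale.sort(key=lambda item: item[1], reverse=True)
    return stale


def scan_spool(spool_dir, max_age=300):
    """Print the stale files found in spool_dir (hypothetical helper)."""
    now = int(time.time())
    for name, age, pid in stale_perfdata_files(os.listdir(spool_dir), now, max_age):
        print("%s age=%ds pid=%d" % (name, age, pid))
```

Calling scan_spool("/var/spool/pnp4nagios") from cron every few minutes would show whether the leftover files grow in step with the load on the box.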
CentOS 5.5
Nagios 3.2.1
pnp4nagios 0.6 (built from source)
Thanks
I am not sure if this is what you are looking for, but you could try increasing the timeout in process_perfdata.cfg. I found this suggestion on the nagios-portal site.
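For reference, the timeout is a single setting in that file. A minimal sketch, assuming the stock TIMEOUT key (in seconds) and a source install that puts the file under /usr/local/pnp4nagios/etc; the value 30 is only an example to try, not a recommended figure:

```
# /usr/local/pnp4nagios/etc/process_perfdata.cfg (path assumed, adjust to your install)
# Maximum run time in seconds for process_perfdata.pl before it gives up.
# 30 is an illustrative value only.
TIMEOUT = 30
```

On a box with a load of 5-6, raising it gradually and watching npcd.log is probably safer than picking one large value.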
This error went away when I moved from the classic pnp4nagios bulk configuration to the more efficient npcd mode with the Nagios broker module (npcdmod).
I was also able to speed the server up a bit by using Google perftools with Nagios and pnp4nagios. At least now we are not losing any perfdata.
The best solution is probably still to replace the server.
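Roughly, switching to npcdmod means loading the module from nagios.cfg instead of having Nagios write and rotate the perfdata files itself. A sketch of the relevant lines, assuming the default paths of a source install (yours may differ); the old perfdata file and command settings used by bulk mode are no longer needed once the module is loaded:

```
# nagios.cfg (paths below are assumptions based on a default source install)
process_performance_data=1
event_broker_options=-1
broker_module=/usr/local/pnp4nagios/lib/npcdmod.o config_file=/usr/local/pnp4nagios/etc/npcd.cfg
```

Nagios has to be restarted afterwards, and npcd must still be running, since the module only hands the data over to it.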