I use Zabbix 2.2.7 to monitor our servers. I mostly use zabbix-agent with active checks, as many of the monitored machines are behind NAT.
One of the checks on all Linux servers is net.tcp.service[smtp] (used as an active agent check), and it works on all servers but one. Other net.tcp.service active checks work on that server just fine.
The monitored server runs exim4 (stock Debian buster), and it can send and receive e-mail without problems.
I enabled debugging for the given zabbix-agent, and this is the most informative line I got from it:
28545:20200204:103404.692 for key [net.tcp.service[smtp]] received value [0]
Which is not informative at all. :(
One anomaly I did notice: if I telnet into exim, the greeting line arrives really slowly (after about 15s).
My questions are:
- How can I debug what the zabbix-agent does during the net.tcp.service[smtp] check?
- How can I change the behaviour of that check?
EDIT: The problem really was the slowness of the server, which was caused by a bad primary DNS (it took that many seconds to start to use the secondary). However, my questions still stand, as it would have been much easier to debug if I could have just got the info that the check timed out in 5 seconds.
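For this kind of debugging, a small stand-alone probe can show what the agent log does not: whether the greeting arrived, and how long it took. The following is only a minimal sketch in Python, assuming that the check boils down to "connect, then wait for a greeting that starts with 220"; it is not the agent's actual implementation. The function name probe_smtp and its defaults are made up for illustration.

```python
import socket
import time


def probe_smtp(host="127.0.0.1", port=25, timeout=5.0):
    """Connect, wait for the SMTP greeting, and report (ok, elapsed_seconds, detail).

    Hypothetical helper: it only approximates what a service check might do.
    The timeout applies to the connect and to the read separately.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            # A slow server (e.g. one stuck on a DNS lookup) stalls here.
            greeting = sock.recv(1024).decode("ascii", errors="replace").strip()
            try:
                sock.sendall(b"QUIT\r\n")  # be polite; ignore failures
            except OSError:
                pass
    except socket.timeout:
        return False, time.monotonic() - start, "timed out"
    except OSError as exc:
        return False, time.monotonic() - start, f"error: {exc}"
    return greeting.startswith("220"), time.monotonic() - start, greeting
```

Running probe_smtp("127.0.0.1", 25, 5.0) on the affected host and printing the result should make a slow greeting visible as "timed out" with an elapsed time close to the timeout, whereas a refused connection shows up as an error almost immediately.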
I had the same issue on Zabbix 4.4 with exim. I changed the SMTP check item key
from
net.tcp.service[smtp]
to
net.tcp.service[smtp,127.0.0.1]
in Template App SMTP Service, so that the check connects to 127.0.0.1
instead of the zabbix-agent's public IP.