I have a zabbix server and an agent, on two different computers. The agent runs in active mode, e.g. I have this in the config file:
StartAgents=0
ServerActive=my.zabbix.server.com
Hostname=my.zabbix.agent.com
The zabbix server can be reached from the machine with the agent, e.g.:
telnet my.zabbix.server.com 10051
Trying 111.111.111.111...
Connected to my.zabbix.server.com.
Escape character is '^]'.
Connection closed by foreign host.
Moreover, the host auto registration was turned on on the server, and the agent has successfuly registered the host when I first started it. So the connection must be alive. This is what I see in the agent's log when I start it:
83074:20171128:082440.324 Starting Zabbix Agent [my.zabbix.agent.com]. Zabbix 3.4.1 (revision 71734).
83074:20171128:082440.324 **** Enabled features ****
83074:20171128:082440.324 IPv6 support: YES
83074:20171128:082440.324 TLS support: YES
83074:20171128:082440.324 **************************
83074:20171128:082440.324 using configuration file: /usr/local/etc/zabbix34/zabbix_agentd.conf
83074:20171128:082440.324 agent #0 started [main process]
83076:20171128:082440.325 agent #1 started [collector]
83077:20171128:082440.326 agent #2 started [active checks #1]
In other words, the agent could connect to the server, it even recognized its version. Nothing else happens in the agent log.
On the server, it still says that the host is unreachable!
What could be the problem?
UPDATE: on the front end, I see this message:
I'm not sure why it wants to connect to 10050? It is used for passive agents. My agent should be active.
UPDATE2: If I delete the host from the zabbix server, and restart the agent, then the following happens:
The host is auto-registered on the server again. The agent log:
14551:20171128:193954.483 Starting Zabbix Agent [my.zabbix.server.com]. Zabbix 3.4.1 (revision 71734).
14551:20171128:193954.484 **** Enabled features ****
14551:20171128:193954.484 IPv6 support: YES
14551:20171128:193954.484 TLS support: YES
14551:20171128:193954.484 **************************
14551:20171128:193954.484 using configuration file: /usr/local/etc/zabbix34/zabbix_agentd.conf
14551:20171128:193954.484 agent #0 started [main process]
14553:20171128:193954.485 agent #1 started [collector]
14554:20171128:193954.485 agent #2 started [active checks #1]
14554:20171128:193954.614 no active checks on server [my.zabbix.server.com:10051]: host [my.zabbix.agent.com] not found
where:
- my.zabbix.server.com is the server's FQDN
- my.zabbix.agent.com is the agent's FQDN, and also the HostName parameter in the agent's config.
So it seems, that the agent registers the host successfully, but for some reason, the server tries to get the information from the agent in passive mode. Despite the fact, that the agent was configured in active mode.
UPDATE 3: Although the agents are sending data, the host list still shows a problem:
Availability/ZBX has a red flag, and a message saying that "Get value from agent failed: cannot connect to [[ip_address_here]:1050]: [4] interrupted system call". I have checked every single item and every discovery for these hosts, and all of them have type="Zabbix Agent Active". So I don't understand why the server is trying to connect to them in passive mode??? This does not cause a real "problem" (e.g. something that generates an action and sends out notifications from the zabbix server), but it is very disturbing to see red flags on the screen.
Until this problem is fully solved, I won't even accept my own answer.
UPDATE 4: after changing all item types, discovery types, and the types of the item prototypes of the low level discovery rules of all templates that are connected to my hosts, and all of the templates linked form there, the ZBX red flags finally disappeared. I believe that I'm an experienced software user, but it was quite difficult to understand what is going on, and change all of the parameters to make it work.
For active checks to work, agent hostname must match the host hostname in the Zabbix server. "Agent hostname" is not the system hostname necessarily - it depends on the configuration parameters "Hostname" and "HostnameItem". Host hostname in Zabbix is not the DNS or the IP address - it's the contents of the "Host" field in the host properties.
When agent starts, it prints out the hostname it is sending to the server. In your example:
Starting Zabbix Agent [my.zabbix.server.com]
- that is, agent identifies itself to the server asmy.zabbix.server.com
. Make this value match the proper hostname (note that it is case sensitive) and active checks will start working. Note that the other host could have incorrect values if two or more agents were sending data, identifying as it.Note that the version printed in the agent log is the agent version, not the server version - agent cannot determine server version.
Short answer: the problem was that all Items had Type="Zabbix Agent" instead of Type="Zabbix Agent Active".
Long answer: a host will either be an active agent or a passive agent. (Well maybe if you try to start two agents on the same machine then you could do both on a single host, but it seems pointless.) Logical, right?
So in reality, being active or passive is a property of the host, not the item. Despite this fact, the mode of data collection (e.g. passive or active) is bound to the item, not the host. I see this as a design flaw in zabbix. This is very counter intuitive. They ONLY way I could overcome this problem is this:
You need to clone almost all templates in the system. You cannot have a single trigger for a single item that is independent of the item type, because there is no such thing. If you want to change something in an environment where there are passive and active agents mixed, then you have to do everything twice.
Finally, when you add a host, you need to assign the active or the passive template version, depending on which mode you want to use for that particular host.
All of this because the active/passive mode cannot be a property of the host. It must be a property of the item. It is worse than that: it is also the property of discovery rules, and the prototypes of discover rules (and prototype items cannot be mass updated, you have to do this one by one,by hand). Seriously, an item like "cpu.load" is absolutely unrelated to how the data was collected. I mean come on, you could change your mind and switch from active mode to passive, or back. This should not force you to delete all the old items, create new ones. But if you decide to do so, you will loose all history, because you are not just changing the items, you are replacing them. This is really really annoying!
I hope that they will fix this in the upcoming 4.0 version.
change the value of /etc/zabbix/zabbix_agentd.conf and put the ip adress of the zabbix