My architecture in AWS is as follows:
There are 2 identical Zabbix agents (based on zabbix/zabbix-agent:centos-4.0.11), each running on a different EC2 instance. The Zabbix server runs on a third instance (also dockerized, via dockbix, version 4.0 as well), and all three instances are in the same VPC.
The idea is to have a Network Load Balancer listening on the port both agents use (10050), with the two aforementioned instances registered in its target group. The DNS name of this NLB is then provided as the interface in the Zabbix host configuration. The goal is to have multiple Zabbix hosts target the same NLB and have their requests routed, according to traffic load, to either agent. Each host has a Zabbix agent item that invokes a UserParameter (a Python script) defined in each of the two agent conf files.
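Roughly, the setup looks like this with the AWS CLI (all names, IDs, ARNs, and the item key/script path below are placeholders, not my real values):

# Internal NLB inside the VPC
aws elbv2 create-load-balancer \
    --name zabbix-agents-nlb --type network --scheme internal \
    --subnets subnet-aaaa1111

# TCP target group on the agent port
aws elbv2 create-target-group \
    --name zabbix-agents-tg --protocol TCP --port 10050 \
    --vpc-id vpc-0123456789

# Register both agent instances in the target group
aws elbv2 register-targets \
    --target-group-arn <tg-arn> \
    --targets Id=i-agent1 Id=i-agent2

# Listener that forwards 10050 to the target group
aws elbv2 create-listener \
    --load-balancer-arn <nlb-arn> --protocol TCP --port 10050 \
    --default-actions Type=forward,TargetGroupArn=<tg-arn>

# On each agent instance, the UserParameter line in the agent conf
# (the key name and script path are made up for illustration)
cat >> /etc/zabbix/zabbix_agentd.d/custom.conf <<'EOF'
UserParameter=custom.stats,/usr/bin/python /opt/scripts/stats.py
EOF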
My problem is as follows: zabbix_get (and the equivalent call made automatically according to the interval set in the host configuration) occasionally times out. One time I get a successful response:
{"response":"success","info":"processed: 4; failed: 0; total: 4; seconds spent: 0.000106"}
(the Python script used is pretty fast; it takes just 1 second), and other times I get a response such as:
zabbix_get [4515]: Timeout while executing operation.
And it happens in strict alternation: one request succeeds, the next times out, then the next succeeds, and so on.
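The alternation is easy to reproduce with a quick loop from the server instance (the hostname below is a placeholder for the NLB DNS name):

# Every other call fails with "Timeout while executing operation."
for i in $(seq 1 10); do
    zabbix_get -s my-nlb-xxxx.elb.eu-west-1.amazonaws.com -p 10050 -k agent.ping || echo "request $i timed out"
done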
I have tried testing the connection with telnet, and it works every time. I have even tried a simple TCP echo container, which also worked fine every time.
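For reference, the echo test was along these lines; the socat image and exact invocation here are illustrative rather than my actual commands:

# On an agent instance: a plain TCP echo server on the agent port
docker run -d --name echo-test -p 10050:10050 alpine/socat \
    TCP-LISTEN:10050,fork,reuseaddr EXEC:cat

# From the server instance, via the NLB:
telnet my-nlb-xxxx.elb.eu-west-1.amazonaws.com 10050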
Any ideas on what might be wrong would be greatly appreciated :)
EDIT: Just wanted to note that this behavior occurs not only with my custom UserParameter script, but also with built-in agent items such as agent.version, agent.ping, net.tcp.port[<serverIp>,10051], and so on.
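For example (the hostname is again a placeholder for the NLB DNS name):

zabbix_get -s my-nlb-xxxx.elb.eu-west-1.amazonaws.com -k agent.version
zabbix_get -s my-nlb-xxxx.elb.eu-west-1.amazonaws.com -k agent.ping
zabbix_get -s my-nlb-xxxx.elb.eu-west-1.amazonaws.com -k "net.tcp.port[<serverIp>,10051]"

These bypass my script entirely, so the problem is not in the UserParameter itself.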
EDIT2: Running tcpdump src <serverIp> inside the agent instances shows similar traffic for both a successful and a timed-out request.
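The capture was roughly this, run on each agent instance (the interface name is a guess; substitute the instance's primary interface):

tcpdump -i eth0 -nn src <serverIp> and tcp port 10050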
So apparently I needed to enable cross-zone load balancing for my internal NLB. All my instances were in a single Availability Zone, while the NLB had a node in each of its enabled AZs; without cross-zone load balancing, requests that landed on the NLB node in the AZ with no registered targets could not be forwarded anywhere, which is why every second request timed out.
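For anyone hitting the same thing: cross-zone load balancing is an attribute on the load balancer itself, e.g. with the AWS CLI (the ARN is a placeholder):

aws elbv2 modify-load-balancer-attributes \
    --load-balancer-arn <nlb-arn> \
    --attributes Key=load_balancing.cross_zone.enabled,Value=true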