For integration testing purposes I'm creating a very simple single-node Kafka deployment:
- 1x Zookeeper
- 1x Kafka
- 1x Kafka client (e.g. AdminClient, creating topics)
(All cleanly deployed in fresh Docker containers.)
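For context, the client does little more than create a topic through the AdminClient. A minimal sketch of what it runs (the bootstrap address and topic name here are placeholders, not my exact code):

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Bootstrap server as resolved inside the Docker network
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Create a single-partition topic with replication factor 1 and
            // block until the broker confirms it (or the client times out)
            admin.createTopics(Collections.singleton(new NewTopic("test-topic", 1, (short) 1)))
                 .all()
                 .get();
        }
    }
}
```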
I'm seeing intermittent failures when the client connects to Kafka, yet whenever I retry the connection it goes through just fine. I've enabled debug logging on the Kafka client, and this is what I'm seeing on the first connection:
Client connects to its configured bootstrap server just fine:
[main] DEBUG o.a.k.c.a.i.AdminMetadataManager - [AdminClient clientId=adminclient-1] Setting bootstrap cluster metadata Cluster(id = null, nodes = [kafka:9092 (id: -1 rack: null)], partitions = [], controller = null).
[...]
[kafka-admin-client-thread | adminclient-1] DEBUG o.apache.kafka.clients.NetworkClient - [AdminClient clientId=adminclient-1] Completed connection to node -1. Fetching API versions.
Hundreds of lines showing that requesting the nodes in the cluster yields no entries:
[kafka-admin-client-thread | adminclient-1] DEBUG o.a.k.c.a.i.AdminMetadataManager - [AdminClient clientId=adminclient-1] Updating cluster metadata to Cluster(id = q7XgghZqQUW_o5W2-Nn5Qw, nodes = [], partitions = [], controller = null)
Note the `nodes = []` part here in particular. This goes on for at least a few seconds, sometimes even 30 seconds! I can't seem to understand why the Kafka server can't list itself as a node.
In an unlucky case, the timeout is reached and I see the infamous exception thrown:
Exception in thread "main" java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
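When that happens, the test run fails outright. As far as I can tell, the deadline for the call is taken from the client's request timeout (or a per-call timeout), so as a stopgap I can give it more headroom; a minimal sketch, assuming request.timeout.ms really is the knob in 2.1.0 and using an arbitrary 60-second value:

```java
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class AdminClientWithMoreHeadroom {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        // Allow more time for a node assignment before the TimeoutException
        // above is thrown; 60000 ms is an arbitrary value for the test setup.
        props.put(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, 60_000);

        try (AdminClient admin = AdminClient.create(props)) {
            // ... same createTopics() call as in the sketch above ...
        }
    }
}
```

That only hides the delay, of course; it doesn't explain it.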
In a lucky case, after a while, it lists itself and it can connect just fine:
[kafka-admin-client-thread | adminclient-1] DEBUG o.a.k.c.a.i.AdminMetadataManager - [AdminClient clientId=adminclient-1] Updating cluster metadata to Cluster(id = q7XgghZqQUW_o5W2-Nn5Qw, nodes = [kafka:9092 (id: 0 rack: null)], partitions = [], controller = kafka:9092 (id: 0 rack: null))
[kafka-admin-client-thread | adminclient-1] DEBUG o.apache.kafka.clients.NetworkClient - [AdminClient clientId=adminclient-1] Initiating connection to node kafka:9092 (id: 0 rack: null) using address kafka/xxxxx
Note the presence of `nodes = [kafka:9092 (id: 0 rack: null)]` here.
My problem is the huge variance in time between the bootstrap connection being established and the metadata update that finally lists a node. I fail to understand why a client is not given a node right away (the broker itself, in this case!). Not only does this slow down my automated testing, it also makes scrolling through the logs painful, as they fill up with timeout failures because the client regularly does not connect within the default timeout window. Any subsequent attempt, after waiting long enough, connects immediately without any of this oddness, though...
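The workaround I'm leaning towards for the automated tests is to poll for a non-empty node list before running the actual test logic, roughly like this (a sketch; the retry interval and attempt limit are made up):

```java
import java.util.Collection;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.Node;

public class WaitForBroker {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Poll the cluster metadata until the broker lists itself as a node,
            // giving up after 30 attempts (an arbitrary limit for this sketch).
            for (int attempt = 0; attempt < 30; attempt++) {
                try {
                    Collection<Node> nodes = admin.describeCluster().nodes().get();
                    if (!nodes.isEmpty()) {
                        System.out.println("Broker is available: " + nodes);
                        return;
                    }
                } catch (ExecutionException e) {
                    // Treat timeouts and transient errors as "not ready yet"
                }
                Thread.sleep(1_000); // wait a second before retrying
            }
            throw new IllegalStateException("Broker never listed itself as a node");
        }
    }
}
```

But that feels like papering over the delay rather than fixing it, which is why I'd like to understand what is actually happening.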
Kafka client and server versions: 2.1.0, running on Kubernetes. The Kafka server logs don't indicate anything special: they are already silent seconds before the client makes its attempt, and stay silent until after the connection is established (if it is established at all).
What am I missing here? Am I looking at random back-offs under pressure, or do I have to tell Zookeeper/Kafka that it's fine to run on its own, without having to wait for other nodes?
I have read this infamous blog article about listener configurations, but that doesn't seem relevant in my case: my Kafka does not advertise any node for a while, despite having advertised listeners configured, and then suddenly it does. After that, it connects just fine, whereas people who struggle with advertised listeners can't connect at all.