UPDATE 2: I've answered this via my new question at the link below. The root cause is telegraf's default behaviour of closing the TCP connection 5 seconds after the last received message. This may be by design; however, I have an issue with their documentation, which made this difficult for me to spot as a potential fix.
Perhaps this question can now be deleted?
UPDATE 1: rather than edit this question extensively, making the current answers make no sense, I have posed a new question based on new information I received as a result of posting this question.
syslog-ng / telegraf : EOF occurred when idle - incompatible?
I'm using syslog-ng Open-Source Edition (OSE) v3.31.2 in a docker-compose stack.
I have syslog messages arriving over the network from various hosts via UDP (which I'm constrained to because my clients use Boost::Log and this does not support syslog over TCP, only UDP), and I have syslog-ng set to forward these to another service downstream. This happens to be telegraf utilising an inputs.syslog module, but I'm not sure that matters yet.
My config looks like this:
@version: 3.29
@include "scl.conf"
options {
    flush-lines(1);
};
source s_network {
    udp(ip(0.0.0.0) port(514));
};
destination d_file {
    file("/var/log/messages");
};
destination d_telegraf {
    syslog("telegraf" port(6514) transport(tcp));
};
log {
    source(s_network);
    destination(d_telegraf);
    destination(d_file);
};
I have explicitly set the global flush-lines value to 1. I think this is the default, but I want to be sure. I want log messages to be forwarded as soon as they are received.
Most of the time this works - individual "lines" of logs arrive into syslog-ng via UDP 514, and are immediately written to the file /var/log/messages, and in almost all cases they are also immediately forwarded to telegraf on TCP port 6514.
The problem I'm seeing is that quite often syslog-ng holds back many lines of incoming logs for around 30-60 seconds, then delivers them to telegraf in one big chunk. There doesn't seem to be much of a pattern to this, but it happens a lot. The odd thing is that the /var/log/messages file has the missing log entries written immediately; it's just the network delivery that is delayed. I had thought that flush-lines(1) would avoid this buffering, but it doesn't seem to.
I've used Wireshark to determine where the delay is, and it's in the output of packets from syslog-ng, between syslog-ng and telegraf TCP port 6514.
I did wonder if this might be down to TCP's Nagle's algorithm - if so, is there a way to turn on the TCP_NODELAY socket option for syslog-ng's syslog destination driver?
Ultimately what I'm looking for is a fast, low-latency syslog service that can aggregate and relay logs as quickly as possible for real-time review downstream.
EDIT: I tried switching over to UDP transport between syslog-ng and telegraf, and this seems to be much more responsive: the long, occasional delays have disappeared. However, this will make it difficult to secure the connection in future.
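A sketch of what the UDP variant of the destination could look like, assuming the host and port stay the same and the telegraf listener is changed to UDP to match (both are assumptions, not details given above):

```
destination d_telegraf {
    # Assumption: same host and port as the TCP version; only the transport changes.
    syslog("telegraf" port(6514) transport(udp));
};
```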
What you experience is not normal. The above configuration should forward logs to d_telegraf and d_file at the same time, as soon as possible.
I believe you are having connection issues; that must be the reason for the 60-second delay, which is the default value of the reconnection timer.
You can lower this value using the time-reopen() global option.
You can also start syslog-ng in the foreground (in debug mode) to investigate the connection issues.
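As a minimal sketch of the time-reopen() suggestion, the global options block from the question could be extended as follows. The value of 5 seconds is purely illustrative and an assumption, not a recommendation:

```
options {
    flush-lines(1);
    # Assumption: 5 is an illustrative value. The default is 60 seconds,
    # which matches the upper end of the delays described in the question.
    time-reopen(5);
};
```

For the foreground debugging run, syslog-ng's -F (foreground), -e (log to stderr), -d (debug) and -v (verbose) flags can be combined, for example syslog-ng -Fedv, which should show each disconnect and reconnection attempt as it happens.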
Try flush-lines(0) by just deleting that line altogether.
How does syslog-ng handles flush_lines(0)?
https://github.com/syslog-ng/syslog-ng/issues/1411