I have a syslog-ng 2.0.9 instance running on Solaris 10. It's old, but this is enterprise IT, and upgrading versions is... fun. I have a strange problem where some clients can no longer stay connected to the server over TCP.
When a client is working, I can start syslog-ng on it and it connects, sends data, and STAYS connected (tcpdump shows TCP port 514 by its /etc/services name, "shell"):
12:20:13.200547 IP (tos 0x0, ttl 64, id 13064, offset 0, flags [DF], proto: TCP (6), length: 60) 10.37.128.185.35765 > 10.37.141.31.shell: S, cksum 0xade4 (correct), 1572869826:1572869826(0) win 5840 <mss 1460,sackOK,timestamp 958735818 0,nop,wscale 7>
12:20:13.202279 IP (tos 0x0, ttl 63, id 27707, offset 0, flags [DF], proto: TCP (6), length: 64) 10.37.141.31.shell > 10.37.128.185.35765: S, cksum 0x434d (correct), 3180100791:3180100791(0) ack 1572869827 win 32942 <nop,nop,timestamp 2210148518 958735818,mss 1460,nop,wscale 2,nop,nop,sackOK>
12:20:13.202327 IP (tos 0x0, ttl 64, id 13065, offset 0, flags [DF], proto: TCP (6), length: 52) 10.37.128.185.35765 > 10.37.141.31.shell: ., cksum 0x0499 (correct), ack 1 win 46 <nop,nop,timestamp 958735820 2210148518>
12:20:13.202823 IP (tos 0x0, ttl 64, id 13066, offset 0, flags [DF], proto: TCP (6), length: 140) 10.37.128.185.35765 > 10.37.141.31.shell: P, cksum 0x179d (correct), 1:89(88) ack 1 win 46 <nop,nop,timestamp 958735820 2210148518>
12:20:13.204061 IP (tos 0x0, ttl 63, id 27708, offset 0, flags [DF], proto: TCP (6), length: 52) 10.37.141.31.shell > 10.37.128.185.35765: ., cksum 0x83d6 (correct), ack 89 win 32920 <nop,nop,timestamp 2210148518 958735820>
12:20:13.205558 IP (tos 0x0, ttl 64, id 13067, offset 0, flags [DF], proto: TCP (6), length: 124) 10.37.128.185.35765 > 10.37.141.31.shell: P, cksum 0xc071 (correct), 89:161(72) ack 1 win 46 <nop,nop,timestamp 958735823 2210148518>
12:20:13.206247 IP (tos 0x0, ttl 63, id 27709, offset 0, flags [DF], proto: TCP (6), length: 52) 10.37.141.31.shell > 10.37.128.185.35765: ., cksum 0x839d (correct), ack 161 win 32902 <nop,nop,timestamp 2210148518 958735823>
When a client fails to stay connected, I see the server disconnect immediately with a FIN:
12:20:02.441949 IP (tos 0x10, ttl 64, id 8231, offset 0, flags [DF], proto: TCP (6), length: 60) 10.37.128.185.46121 > 10.37.141.31.shell: S, cksum 0xeb7e (correct), 1553390564:1553390564(0) win 5840 <mss 1460,sackOK,timestamp 958725059 0,nop,wscale 7>
12:20:02.443817 IP (tos 0x0, ttl 63, id 27678, offset 0, flags [DF], proto: TCP (6), length: 64) 10.37.141.31.shell > 10.37.128.185.46121: S, cksum 0xe379 (correct), 3007391908:3007391908(0) ack 1553390565 win 32942 <nop,nop,timestamp 2210147442 958725059,mss 1460,nop,wscale 2,nop,nop,sackOK>
12:20:02.443840 IP (tos 0x10, ttl 64, id 8232, offset 0, flags [DF], proto: TCP (6), length: 52) 10.37.128.185.46121 > 10.37.141.31.shell: ., cksum 0xa4c5 (correct), ack 1 win 46 <nop,nop,timestamp 958725061 2210147442>
12:20:02.445689 IP (tos 0x0, ttl 63, id 27679, offset 0, flags [DF], proto: TCP (6), length: 52) 10.37.141.31.shell > 10.37.128.185.46121: F, cksum 0x2444 (correct), 1:1(0) ack 1 win 32942 <nop,nop,timestamp 2210147442 958725061>
12:20:02.445737 IP (tos 0x10, ttl 64, id 8233, offset 0, flags [DF], proto: TCP (6), length: 52) 10.37.128.185.46121 > 10.37.141.31.shell: F, cksum 0xa4c1 (correct), 1:1(0) ack 2 win 46 <nop,nop,timestamp 958725063 2210147442>
12:20:02.447244 IP (tos 0x0, ttl 63, id 27680, offset 0, flags [DF], proto: TCP (6), length: 52) 10.37.141.31.shell > 10.37.128.185.46121: ., cksum 0x2441 (correct), ack 2 win 32942 <nop,nop,timestamp 2210147442 958725063>
The issue was originally seen on different clients, but in this case it's the very same box: I generated the successful capture by restarting the client's syslog-ng service, and the unsuccessful one by telnetting to the server port.
I've also started a second syslog-ng server instance on a different port. On the server itself, a telnet to localhost port 514 connects and is immediately disconnected:
$ telnet localhost 514
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Connection to localhost closed by foreign host
but on the other port, the new process leaves the connection open as expected:
$ telnet localhost 1140
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
^]
telnet
quit
Connection to localhost closed.
So something in syslog-ng or in Solaris 10 appears to take a dislike to SOME of these connections once the process has been running for some indeterminate time. syslog-ng is linked against tcpwrappers, with "syslog-ng: ALL" defined in hosts.allow. What I'm seeing looks like what would happen if tcpwrappers were rejecting the connection, but I don't think that is the part of the system at fault, as the behaviour seems too generic.
The localhost behaviour matches what I see with the remote connections, so it doesn't look like a firewall getting in the way or doing anything strange. And I'm lost.
Any guesses or pointers appreciated!
Check the max-connections setting of your tcp() source in syslog-ng.conf: it defaults to 10, which is probably too low for you.
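This would fit the symptoms: when a TCP source already has its maximum number of clients connected, syslog-ng accepts the new connection and then closes it straight away, which looks like the FIN right after the handshake in the failing capture and the telnet to 514. The freshly started instance on port 1140 has no clients yet, so it stays well under the limit. A minimal sketch of raising the limit in the 2.x config syntax follows; the source name s_net, the listen address and the value 200 are assumptions for illustration, so keep whatever your existing tcp() source already uses and just add max-connections().

```
# Hypothetical source name, address and limit: adapt to your existing source.
source s_net {
    tcp(ip("0.0.0.0") port(514) max-connections(200));
};
```

Size the value above the number of clients that send to this source, then restart the server-side syslog-ng so it picks up the change.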