I'm using an Ubuntu 14.04.4 server, running sshd OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.8, OpenSSL 1.0.1f 6 Jan 2014.
The server is also running Intel's DPDK framework, to develop network software; part of doing that involves bringing down interfaces at the Linux level to bind them to DPDK. However, the network interface used to ssh from the outside is never brought up or down, only others are touched.
Most of the time ssh works fine, but once every few days it stops working; ssh sessions are interrupted, and trying to reconnect by running ssh -v halts at the message Local version string SSH-2.0 ... (i.e. the client can establish a connection, it's the SSH part that fails).
Connecting directly to the machine doesn't work either: the command-line interface doesn't show up, just a blank screen.
TCP connections can be established, and the machine still answers pings.
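For anyone who wants to check for the same symptom from a client, here is a minimal sketch that separates "TCP connect works" from "the server sends its SSH identification string". The function name probe_ssh_banner and the timeout values are only illustrative, not part of any SSH tooling; it just opens a socket and waits for the first bytes the server sends.

```python
import socket


def probe_ssh_banner(host, port=22, timeout=5.0):
    """Return (tcp_ok, banner).

    tcp_ok is False if the TCP connection itself fails.
    banner is None if the connection succeeds but the server
    sends nothing before the timeout (the symptom described above).
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return False, None

    with sock:
        sock.settimeout(timeout)
        try:
            # A healthy sshd sends a line such as "SSH-2.0-..." right away.
            data = sock.recv(256)
        except OSError:
            # Includes socket.timeout: connected, but no banner arrived.
            return True, None

    banner = data.decode("ascii", errors="replace").strip()
    return True, banner or None


if __name__ == "__main__":
    # "localhost" is a placeholder; substitute the affected server.
    print(probe_ssh_banner("localhost", 22, timeout=3.0))
```

A result of (True, None) would match what I see when the problem occurs: the connection is accepted, but the SSH banner never arrives.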
This is pretty annoying, since the server then needs to be rebooted.
I enabled debug3 logs on the server, and the log in /var/log/auth.log when a client tries (and fails) to connect looks like this:
sshd[1688]: debug3: fd 5 is not O_NONBLOCK
sshd[1688]: debug1: Forked child 39149.
sshd[1688]: debug3: send_rexec_state: entering fd = 13 config len 724
sshd[1688]: debug3: ssh_msg_send: type 0
sshd[1688]: debug3: send_rexec_state: done
sshd[39149]: debug3: oom_adjust_restore
sshd[39149]: Set /proc/self/oom_score_adj to 0
sshd[39149]: debug1: rexec start in 5 out 5 newsock 5 pipe 12 sock 13
This log doesn't seem any different from the one for successful connections, except that it stops there, whereas successful connections continue (the next line is then debug1: inetd sockets after dupping: ...).
The problem seems to arise right when an interface is bound to or unbound from DPDK.
What could be causing this? Are there workarounds?
I had issues with ssh timeouts, and I found a workaround by using: