I'm getting some bizarre behavior with Ubuntu Server 10.04 64-bit on two of our new servers (both fresh installs). I have Ubuntu Server (same version) deployed on 4-5 other servers without this issue.
Initially I cannot ssh into a fresh server install until I manually set the address that the ssh server is listening on in /etc/ssh/sshd_config. Once I've connected, I seem to be kicked out at random intervals with the following error:
Write failed: Broken pipe
Using "ssh -vv" doesn't show any other information. When I'm kicked out in this manner, I cannot reconnect for another seemingly random period of time. Sometimes a few seconds, others a few minutes. If I run "netstat -nap|grep :22", I can see that my connection still exists after the write failed error. I can't seem to re-connect until that connection drops.
After one of these errors, if I hop onto the server from the console, ssh into another machine, and then attempt to ssh back into the server, everything works fine.
Using "-o TCPKeepAlive=yes" client side doesn't seem to affect anything. I've disabled both iptables and ufw on the server. AppArmor is not showing any enforced profiles and SELinux isn't installed.
My logs aren't reporting any errors and I don't have any custom configs. This is a box-stock install. Note that when I try to get back in after the broken pipe error, this is the error I get:
ssh: connect to host 172.22.50.92 port 22: Connection refused
And nmap no longer shows port 22 as being open, though netstat on the server says it's still listening on port 22.
EDIT - I'm not sure if it means anything, but I've installed KVM on these hosts and I can ssh into the guests (Ubuntu Server 64-bit as well) without any issue.
UPDATE - I've tried purging openssh and re-installing with apt. I've also purged and installed openssh from source with no luck. Traceroutes and pings overnight show no packet loss whatsoever.
YET ANOTHER UPDATE - Dell seems to think that we've got a bad motherboard in the server. Having that replaced to see if it resolves the issue.
Use mtr to check the network. Try a command like
mtr -i 15 remotehost
Leave this running in a window, or use screen so you can detach. It should catch any problems with the network. Packet loss is typically 0% on most of my systems.
EDIT: What does the output of
arp -n
show for your IP address before and after ssh drops? You may want to try this on another server on the same subnet. There should be only one HW address for the IP address, and it should not change. If it does, you have an IP address conflict.
This post resolved the issue: massive packet loss when servers are brought online
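The ARP check described above can be repeated automatically so that a changing HW address is caught when the drop happens. This is only a rough sketch: the helper names macs_for_ip and watch_arp are made up for illustration, the 15-second interval is an arbitrary choice, and it assumes the usual net-tools "arp -n" column layout (address, HW type, HW address).

```shell
#!/bin/sh
# Hypothetical helper: read `arp -n` style output on stdin and print the
# distinct hardware addresses listed for one IP address.
# Assumes column 1 is the IP and column 3 is the HW address; incomplete
# entries have no colon in column 3 and are skipped.
macs_for_ip() {
    awk -v ip="$1" '$1 == ip && $3 ~ /:/ { print $3 }' | sort -u
}

# Hypothetical helper: log the HW address seen for an IP every 15 seconds.
# Run it from another machine on the same subnet; stop it with Ctrl-C.
watch_arp() {
    while true; do
        printf '%s %s\n' "$(date '+%H:%M:%S')" \
            "$(arp -n | macs_for_ip "$1" | tr '\n' ' ')"
        sleep 15
    done
}
```

For example, "watch_arp 172.22.50.92" run in a screen session would leave a timestamped record. If the logged HW address changes between samples, that points to an IP address conflict.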
OK, so from what I can assume from glancing at this, you're basically getting extended dropouts. Possible causes:
1.) You have a bad network connection.
2.) The network the server is on has a bad connection, a bad router, or some other bad component. :P
3.) Your servers have conflicting addresses or problem hardware.
My solution:
Run a ping overnight and see how many packets you've lost in the morning :D (just to see if I'm heading in the right direction).
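If you want the overnight result as a single number, something like the following could pull the loss percentage out of ping's summary line. It is a sketch only: loss_percent is a made-up helper name, and it assumes the Linux iputils summary format ("N% packet loss").

```shell
#!/bin/sh
# Hypothetical helper: read ping output on stdin and print only the
# packet-loss percentage from the summary line.
# Assumes a summary such as:
#   100 packets transmitted, 98 received, 2% packet loss, time 99123ms
loss_percent() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Example (count and interval are arbitrary choices): one probe every
# 15 seconds, 2000 probes, which runs for a bit over eight hours.
#   ping -i 15 -c 2000 172.22.50.92 | loss_percent
```

Anything other than 0 in the morning would suggest the problem is on the network side rather than in sshd.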
Hope this helps.
You can get flaky connections with certain NIC/switch combos when autonegotiation is turned on and the link negotiates to half-duplex.
Use "ethtool eth0" to verify that the speed and duplex settings are correct, and "ethtool -s" to change them if you need to.
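To make the duplex check quick to repeat across several servers, the relevant lines of the ethtool output can be condensed into one line. This is a sketch under assumptions: link_summary is a made-up helper name, eth0 is just the interface from the answer above, and it relies on the "Speed:", "Duplex:" and "Auto-negotiation:" labels that ethtool normally prints.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool <iface>` output on stdin and print
# speed, duplex and autonegotiation state on a single line.
# Lines such as "Advertised auto-negotiation: Yes" are ignored because
# their first field is not exactly "Auto-negotiation:".
link_summary() {
    awk '
        $1 == "Speed:"            { speed = $2 }
        $1 == "Duplex:"           { duplex = $2 }
        $1 == "Auto-negotiation:" { autoneg = $2 }
        END { printf "speed=%s duplex=%s autoneg=%s\n", speed, duplex, autoneg }
    '
}

# Check the current settings (usually needs root):
#   ethtool eth0 | link_summary
# Force a setting (example values only; the switch port must be set to match):
#   ethtool -s eth0 speed 100 duplex full autoneg off
```

A result showing duplex=Half on a link that should be full-duplex is the symptom described above.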