We have quite an old installation of SUSE LINUX 10.1 (i586) in the office.
The problem in short: I can successfully ssh to it from machines in the same LAN (192.168.1.0) but not from machines in the other LAN (10.23.0.0).
The SUSE machine runs the SSH server openssh-4.2p1-18.12. I have ruled out the firewall and the hosts.allow and hosts.deny files.
When my ssh login attempt fails, here is what the logs say:
on the client:
$ ssh -vvv 192.168.1.5
OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug2: ssh_connect: needpriv 0
debug1: Connecting to 192.168.1.5 [192.168.1.5] port 22.
debug1: Connection established.
debug1: identity file /home/nbuild/.ssh/identity type -1
debug1: identity file /home/nbuild/.ssh/identity-cert type -1
debug1: identity file /home/nbuild/.ssh/id_rsa type -1
debug1: identity file /home/nbuild/.ssh/id_rsa-cert type -1
debug1: identity file /home/nbuild/.ssh/id_dsa type -1
debug1: identity file /home/nbuild/.ssh/id_dsa-cert type -1
on the server:
Aug 21 16:34:25 serverhost sshd[20736]: debug3: fd 4 is not O_NONBLOCK
Aug 21 16:34:25 serverhost sshd[20736]: debug1: Forked child 20739.
Aug 21 16:34:25 serverhost sshd[20736]: debug3: send_rexec_state: entering fd = 7 config len 403
Aug 21 16:34:25 serverhost sshd[20736]: debug3: ssh_msg_send: type 0
Aug 21 16:34:25 serverhost sshd[20736]: debug3: send_rexec_state: done
Aug 21 16:34:25 serverhost sshd[20739]: debug1: rexec start in 4 out 4 newsock 4 pipe 6 sock 7
Aug 21 16:34:25 serverhost sshd[20739]: debug1: inetd sockets after dupping: 3, 3
Aug 21 16:34:25 serverhost sshd[20739]: debug3: Normalising mapped IPv4 in IPv6 address
Aug 21 16:34:25 serverhost sshd[20739]: Connection from 10.23.1.11 port 44340
The server log above is what I get with the DEBUG3 log level enabled. With the default log level (INFO), the only thing the server logs is this:
Aug 21 16:38:32 serverhost sshd[20749]: Did not receive identification string from 10.23.1.11
Any hints? I feel I've tried everything already.
Update: The machines that cannot ssh are in another VLAN, if that matters. I have tried CentOS 6.5 and Ubuntu clients.
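One detail in the client log: after "Connection established" there is no "Remote protocol version" line, which an OpenSSH client normally prints once it has read the server's identification string. That suggests the banner is lost in both directions, not only the one the server complains about. A minimal Python sketch to check this in isolation, assuming Python is available on a client in VLAN 2; `read_ssh_banner` is a hypothetical helper name, not part of any SSH tool:

```python
import socket


def read_ssh_banner(host, port=22, timeout=5.0):
    """Return the server's SSH identification line, or None if none arrives.

    Connection errors (refused, unreachable, connect timeout) are left to
    propagate so they can be told apart from a silent server.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            data = sock.recv(256)  # the server speaks first on connect
        except socket.timeout:
            return None  # TCP connected, but no banner came back in time
    line = data.split(b"\n", 1)[0].strip()
    if not line.startswith(b"SSH-"):
        return None
    return line.decode("ascii", "replace")
```

Calling `read_ssh_banner("192.168.1.5")` from a VLAN 1 host and from a VLAN 2 host would show whether the banner is lost only on the cross-VLAN path.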
Sounds like you have not exchanged keys with the server. Have you tried connecting with the username/password?
You can exchange the keys with something like this:
I have resolved the problem. It appears to have been a routing issue: packets travelling in one direction between the two subnets took a different route from packets travelling in the other direction.
192.168.1.1 is our office router (Cisco RV042) that connects us to the Internet. 192.168.1.200 is our office managed, VLAN-aware Cisco switch (SG300) that connects us all to each other and to the router. The switch is running in System Mode L3, which means that it can also act as a router between the VLANs. It has two VLANs configured: VLAN 1 (the default) and VLAN 2. Hosts with 192.168.1.x addresses are in VLAN 1 and hosts with 10.23.x.x addresses are in VLAN 2.
First, this is the setup that DID NOT work:
The way it was, traceroute showed that packets from VLAN 1 went through 3 hops to reach VLAN 2:
CASE 1
whereas packets from VLAN 2 went through 2 hops to reach VLAN 1:
CASE 2
In Case 1
The host 192.168.1.5 has default gateway 192.168.1.1 (our office router). So a packet first goes to the router, and the router forwards it to 192.168.1.200 (our smart switch) because I have explicitly configured a static route on the router. Without that route nothing gets through (I guess because the 10.0.0.0 network is private and not routed by default; I'm no network expert). From there, our smart switch acts as a router (L3, remember?) and forwards the packet to its final destination, 10.23.1.11.
In Case 2
The host 10.23.1.11 has default gateway 10.23.1.1. This is again the switch, but this time via its other interface, in VLAN 2. So a packet first hits the switch, and the switch, just as above, acts as a router and directs the packet via its other interface to the proper host, 192.168.1.5 in VLAN 1. This time we have a shortcut that bypasses the router.
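The two cases can be modelled as per-device next-hop tables. The sketch below is a toy model, not a dump of the real configuration: the /24 and /16 netmasks and the name `switch` for the SG300's two interfaces are assumptions made for illustration. It reproduces the hop counts described above and makes the asymmetry explicit: the forward and return paths between the same two hosts cross different devices.

```python
import ipaddress

# (destination network, next hop); a next hop of None means the destination
# is directly connected. Netmasks are assumed, not taken from the post.
TABLES = {
    "192.168.1.5": [("192.168.1.0/24", None), ("0.0.0.0/0", "192.168.1.1")],
    "192.168.1.1": [("192.168.1.0/24", None), ("10.23.0.0/16", "192.168.1.200")],
    "switch": [("192.168.1.0/24", None), ("10.23.0.0/16", None)],
    "10.23.1.11": [("10.23.0.0/16", None), ("0.0.0.0/0", "10.23.1.1")],
}
# Both switch interfaces belong to the same device.
ALIASES = {"192.168.1.200": "switch", "10.23.1.1": "switch"}


def next_hop(device, dst):
    """Longest-prefix match in the routing table of one device."""
    addr = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(net), gw)
               for net, gw in TABLES[device]
               if addr in ipaddress.ip_network(net)]
    _, gateway = max(matches, key=lambda m: m[0].prefixlen)
    return gateway


def path(src, dst):
    """List the hops a packet visits from src to dst, traceroute-style."""
    hops, device = [], src
    while True:
        gateway = next_hop(device, dst)
        if gateway is None:          # directly connected: final delivery
            hops.append(dst)
            return hops
        hops.append(gateway)
        device = ALIASES.get(gateway, gateway)


case1 = path("192.168.1.5", "10.23.1.11")   # via router, then switch
case2 = path("10.23.1.11", "192.168.1.5")   # via switch only
print(case1)
print(case2)
```

In this model, pointing the default entry of 192.168.1.5 at 192.168.1.200 makes both directions skip the router.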
Now, here is the thing that resolved the SSH problem in my original posting (although I still don't know why):
A colleague of mine suggested that we make CASE 1 and CASE 2 behave the same way and see what happens, i.e. eliminate the superfluous hop in CASE 1 and skip the router altogether. So I changed the default gateway in CASE 1. The entries for 169.254.0.0 and 127.0.0.0 were already there; I don't know why, it's a legacy system:
So now the default gateway is changed from 192.168.1.1 (the router) to 192.168.1.200 (the switch), and packets destined for hosts in VLAN 2 no longer go to the router and then back to the switch; they take the same shortcut as in CASE 2. Now we have:
And most importantly (and astonishingly), the SSH issue resolved itself! Now I can SSH from 10.23.1.11 to 192.168.1.5. I still think SSH shouldn't care which route the packets take, but go figure...
The downside of this routing setup is that when host 192.168.1.5 wants to access the Internet, its packets have to go through the switch first, then the router, and finally out. This adds one unnecessary hop, and I have failed to optimize it away (see notes below).
Note 1:
I've tried adding gateway 192.168.1.200 specifically for destination 10.23.0.0 without deleting 192.168.1.1 like this:
with the intention that only packets destined for VLAN 2 will go through 192.168.1.200 and those for the Internet will go through 192.168.1.1 but that didn't work. Packets destined for LAN 10.23.0.0 still went through 192.168.1.1 and SSH still didn't work.
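For reference, route selection normally follows the longest-prefix-match rule: the most specific matching entry wins over the default. The sketch below is only a model of that rule, with an assumed /16 netmask for the VLAN 2 network; it does not explain why the attempt above failed. It may be worth checking with `route -n` whether the added entry really ended up with the intended destination and netmask.

```python
import ipaddress


def select_route(routes, dst):
    """Return the matching (network, gateway) entry with the longest prefix."""
    addr = ipaddress.ip_address(dst)
    candidates = [r for r in routes if addr in ipaddress.ip_network(r[0])]
    return max(candidates, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)


routes = [
    ("192.168.1.0/24", None),            # directly connected LAN
    ("10.23.0.0/16", "192.168.1.200"),   # VLAN 2 via the switch (/16 assumed)
    ("0.0.0.0/0", "192.168.1.1"),        # everything else via the router
]

vlan2_gw = select_route(routes, "10.23.1.11")[1]
internet_gw = select_route(routes, "8.8.8.8")[1]
print(vlan2_gw)      # 192.168.1.200
print(internet_gw)   # 192.168.1.1
```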
Note 2:
I've tried adding 192.168.1.200 with another command:
still no luck.
Note 3:
Finally, I've tried leaving 2 default gateways at the same time, i.e. adding .200 without deleting .1.
I don't know if this is a healthy thing to do. I don't know how the operating system decides where to send each packet, because the last two entries in the routing table look identical, and the behavior seems random. Networking experts out there, please explain.
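A minimal sketch of why two identical default routes look arbitrary: among entries with the same prefix length, the usual tie-breaker is the metric (lower wins). When the metrics are equal too, nothing in the table distinguishes the entries, and the result depends on the kernel. Here the first listed entry wins, which is an assumption of the sketch and not documented kernel behavior.

```python
import ipaddress


def select_route(routes, dst):
    """Most specific prefix first, then lowest metric.

    min() keeps the first entry among exact ties; a real kernel may break
    such ties differently.
    """
    addr = ipaddress.ip_address(dst)
    candidates = [r for r in routes if addr in ipaddress.ip_network(r["net"])]
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r["net"]).prefixlen, r["metric"]),
    )


# Two defaults with the same metric: only table order separates them.
tied = [
    {"net": "0.0.0.0/0", "gw": "192.168.1.200", "metric": 0},
    {"net": "0.0.0.0/0", "gw": "192.168.1.1", "metric": 0},
]
# Giving the switch route a higher metric makes the choice deterministic.
ranked = [
    {"net": "0.0.0.0/0", "gw": "192.168.1.200", "metric": 10},
    {"net": "0.0.0.0/0", "gw": "192.168.1.1", "metric": 0},
]

tied_gw = select_route(tied, "10.23.1.11")["gw"]
ranked_gw = select_route(ranked, "10.23.1.11")["gw"]
print(tied_gw)
print(ranked_gw)
```

Distinct metrics remove the ambiguity, but they do not by themselves send VLAN 2 traffic and Internet traffic to different gateways; that needs a more specific route for 10.23.0.0.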