I'm a programmer who's been pushed into server admin duty, and I've got a problem that's confusing me good. Lack of knowledge is no doubt the culprit, so educate me if you can. :)
PROBLEM IN BRIEF: Two physical servers hosted by the same dedicated hosting service. A web server (running in a VM) on one server cannot be reached by the other server, but can be reached by anyone else on the Internet who tries.
SETUP:
We have two servers hosted by ServerBeach. Both run Debian, and one runs VMware Server 2 with two VMs, each also running Debian. Each VM runs Apache and serves up a website. Some fake IPs for clarity:
SERVER #1 (eth0): 10.0.1.1
SERVER #2 (eth0): 11.0.0.1
SERVER #2 secondary IP (eth0:1) - for VM #1: 10.0.2.1
SERVER #2 secondary IP (eth0:2) - for VM #2: 10.0.2.2
The VMs on Server #2 are networked to the host via host-only networking:
SERVER #2 (vmnet1): 192.168.0.1
VM #1: 192.168.0.2
VM #2: 192.168.0.3
... while iptables rules on Server #2 take Internet traffic bound for those secondary IPs and change the destination IP to that of the matching VM, then rewrite the source address back again for traffic heading out to the Internet from the VM:
-A PREROUTING -d 10.0.2.1 -i eth0 -p tcp -m tcp --dport 80 -j DNAT --to-destination 192.168.0.2:80
(...)
-A POSTROUTING -s 192.168.0.2 -o eth0 -j SNAT --to-source 10.0.2.1
This works. A computer out there on the Internet can point its browser at http://10.0.2.1 and it gets the web server running on the VM. This sort of setup, where the secondary IPs are aliases on the host machine and don't live on the VMs themselves, is how ServerBeach insists VMware setups like this should be configured. And it does the job.
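For reference, the DNAT and SNAT rules above come as a pair per VM. The sketch below only prints that pair in the same iptables-save syntax; it does not touch the firewall. The function name `nat_rules_for_vm` is made up for illustration, and the VM #2 mapping (10.0.2.2 to 192.168.0.3) is assumed by analogy with the VM #1 rules shown, since the full ruleset is elided.

```shell
#!/bin/sh
# Hypothetical helper: print the DNAT/SNAT pair for one VM in
# iptables-save syntax. Nothing is applied to the running firewall.
# $1 = public alias IP on the host, $2 = VM's host-only IP, $3 = TCP port
nat_rules_for_vm() {
    printf '%s\n' "-A PREROUTING -d $1 -i eth0 -p tcp -m tcp --dport $3 -j DNAT --to-destination $2:$3"
    printf '%s\n' "-A POSTROUTING -s $2 -o eth0 -j SNAT --to-source $1"
}

nat_rules_for_vm 10.0.2.1 192.168.0.2 80   # VM #1, as shown above
nat_rules_for_vm 10.0.2.2 192.168.0.3 80   # VM #2, assumed by analogy
```

The point of printing both lines together is that each inbound DNAT mapping needs its matching outbound SNAT mapping, otherwise replies leave with the wrong source address.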
The only strange thing is that, when Server #1 tries to access the Server #2 VM like any other client out there on the Internet, it times out. (I'm logged on to Server #1 via SSH and using links to try and browse to the site, or even telnet on port 80)
If I run tshark on VM #1, I see the SYN packets make it from Server #1 through Server #2 to the VM:
4.607664 10.0.1.1 -> 192.168.0.2 TCP 44983 > http [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=318986 TSER=0 WS=7
52.596287 10.0.1.1 -> 192.168.0.2 TCP 44983 > http [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=330986 TSER=0 WS=7
(etc...)
The SYN packets keep coming, but the VM never sends back a SYN-ACK.
Now, if I hop on any other computer and go to that URL in a browser, I see the SYN, SYN-ACK, and ACK, and of course the traffic that follows (we'll call this other system 170.0.0.1):
8.456176 170.0.0.1 -> 192.168.0.2 TCP 16945 > http [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=1 TSV=972883011 TSER=0
8.456243 192.168.0.2 -> 170.0.0.1 TCP http > 16945 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSV=718068724 TSER=972883011 WS=4
8.522374 170.0.0.1 -> 192.168.0.2 TCP 16945 > http [ACK] Seq=1 Ack=1 Win=66608 Len=0 TSV=972883012 TSER=718068724
(... let the GETs begin! ...)
Same thing happens on VM #2. Everyone can reach out and communicate with the web server except Server #1.
Server #1 can, of course, reach any other website out on the Internet.
EDIT: If I run nmap -sS 10.0.2.1 from Server #1, port 80 (and any other port that Server #2 is set to pass along to the VM) appears as Filtered. However, if I do the same nmap from any other machine, the ports appear as Open.
I know this question might be hard to follow, and I certainly don't expect anyone who isn't hands-on to just dream up the answer on the spot. But I do wonder if anyone can answer... what might be a reason why VM #1 gets the SYN packets from Server #1, but does NOT attempt to send a SYN-ACK back? I thought the problem might be related to the host machine, but the SYNs clearly do make it to the VM. It just seems to ignore them once they get there, yet it immediately responds to a SYN from any other client.
Just looking for clues here.
EDIT #2: Following kubanskamac's suggestions, I may have found the problem.
On VM #1, netstat -rn offers up:
Destination Gateway Genmask Flags MSS Window irtt Iface
192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
10.0.0.0 0.0.0.0 255.0.0.0 U 0 0 0 eth0
0.0.0.0 192.168.0.1 0.0.0.0 UG 0 0 0 eth0
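So if I'm reading this right, anything the VM has destined for 10.x.x.x matches the 10.0.0.0/8 route, which has no gateway, so it does NOT go to 192.168.0.1 (the VMware host's adapter, and the only path VM #1 has to the outside world). The VM tries to deliver it directly on eth0 instead.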
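To spell that reading out, here is a minimal sketch in plain sh that checks whether an address falls inside a route's network/netmask pair. The helper names `ip_to_int` and `in_subnet` are made up for illustration, and the sketch tests one route at a time rather than reproducing the kernel's full most-specific-match lookup.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of any standard tool.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if address $1 lies inside network $2 with netmask $3.
in_subnet() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( $addr & $mask )) -eq $(( $net & $mask )) ]
}

# Server #1's (fake) address against the 10.0.0.0/8 route from netstat -rn.
if in_subnet 10.0.1.1 10.0.0.0 255.0.0.0; then
    echo "10.0.1.1 matches 10.0.0.0/8: sent on-link via eth0, no gateway"
fi

# The other client matches neither specific route, only the default one.
if ! in_subnet 170.0.0.1 10.0.0.0 255.0.0.0; then
    echo "170.0.0.1 falls through to the default route via 192.168.0.1"
fi
```

On the VM itself, `ip route get 10.0.1.1` asks the kernel directly which route it would pick for that destination.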
So how do I make VM #1 at least route packets destined to 10.0.1.x through the 192.168.0.1 gateway? Looking at Server #2's netstat -rn, it appears to me that it would properly route the packet if it received it.
EDIT #3: SOLVED!
Edit #2's clues were right. I answered my own question using the "route" command:
route add -net 10.0.2.0 netmask 255.255.255.0 gw 192.168.0.1
Last question: how do I make the above command permanent?
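One common approach on Debian, offered as a sketch rather than a confirmed fix for this setup, is to hang the command off the interface stanza in /etc/network/interfaces on the VM so that ifupdown runs it whenever eth0 comes up. The static address, netmask, and gateway lines below are assumptions based on the addresses in the question; the route line is copied from the command above, so substitute whichever network actually needs the gateway.

```
# /etc/network/interfaces on VM #1 (sketch; the stanza contents are assumed)
auto eth0
iface eth0 inet static
    address 192.168.0.2
    netmask 255.255.255.0
    gateway 192.168.0.1
    # Re-add the extra route each time the interface is brought up ...
    up route add -net 10.0.2.0 netmask 255.255.255.0 gw 192.168.0.1
    # ... and remove it again when the interface goes down.
    down route del -net 10.0.2.0 netmask 255.255.255.0 gw 192.168.0.1
```

If the existing stanza looks different, only the `up` and `down` lines need to be added to it.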
Server1 seems to be on the same subnet as Server2's eth0:1 interface, but you have not provided netmasks, so I'm not sure.
Your POSTROUTING rule fires only after Server2 has decided to send the packet out through eth0, eth0:1, or eth0:2. To send the packet, Server2 needs to find the destination MAC address, which it resolves with ARP. If Server1 is on a different subnet, the packet should be sent to the default gateway's MAC. If Server1 is on the same IP subnet (it appears so), there is no need to involve the default gateway, and Server2 tries to resolve the IP to a usable MAC by itself. If that fails, the packet cannot be sent: it has nowhere to go.
Your NAT is getting in the way. Specifically, the return packets have the original source address as the dest and as such aren't passing through the NAT device to be de-NATted.
You're aware that, of the ranges you used, only 10.x.x.x is reserved for private use, and 11.x.x.x is not? The other private ranges are 172.16.x.x through 172.31.x.x and 192.168.x.x. 170.x.x.x is public as well. You mentioned that the IPs given are fake, so this may not help.
Is the IP of Server #1 listed in the /etc/hosts.deny file of either the VM or its host?
I presume you've checked the firewall rules of the VM to make sure the traffic isn't being dropped there.