Problem: I occasionally have network outages on my Ubuntu VPS. During an outage I cannot SSH to the box or ping it by IP address, but I can still reach it through the host's serial console. From the serial console I can't ping anything outside (as far as I can tell), even by IP address. After some time the network comes back, sometimes without my intervention and sometimes while I am fiddling with it, so it is hard to tell what brings it back. (Edit: the outage very consistently lasts 1 hour.)
Questions: How can I proceed with troubleshooting this issue? What can I do to rule out configuration or software problems that are within my control, so that I can raise the issue with my VPS host more confidently?
Things I have tried:
- Brought eth0 down and back up
- Temporarily disabled the firewall
- Checked the VPS host's advisories for network issues; I haven't seen any
- Rebooted the server via the web console
- Note: none of these has restored connectivity
Details:
- Ubuntu 10.04.1 LTS
- Hosted with Xen virtualization
- Have root access (SSH) to perform my own upgrades, installs, etc.
- I have set up the VPS as a VPN server so that I can connect to it "Road Warrior" style and route all my traffic through the VPS first. That explains the 10.8.x.x addresses in the routing tables below.
- All traffic, including DNS lookups, is forwarded through the VPS
- Uses Uncomplicated Firewall (ufw) with some basic rules
- Also hosts other services, including Mumble and a web server
- I set up a cron job on the VPS that pings some common internet hosts by IP address every 5 minutes. If a ping fails, the script logs the failure to a file. The outage consistently lasts an hour, but it does not always happen at the same time of day. In almost every occurrence the network is down for an hour and then "magically" comes back.
- Memory usage on my VPS is typically very high: I am usually maxed out and using some swap. The memory hog is Java, if that detail helps.
- My provider has been very unhelpful. Responses have ranged from "we are sorry, we had an unfortunate issue" to "there is no problem now". This is frustrating because I typically open a ticket while the problem is happening, but it is gone by the time the ticket is addressed. Most recently they suggested reformatting my VPS and starting over, which I am not keen on.
- Outages consistently start on the hour (within 5-10 minutes of it). That is, they do not start around XX:30, XX:45, etc.
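The cron check described above might look something like the following minimal Python sketch. It is a hypothetical reconstruction, not the script actually in use: the target IPs, the log path, and the `FAIL` line format are assumptions. It uses the Linux iputils `ping` options `-c` (count) and `-W` (timeout in seconds), and it only logs when every target fails, so one unreachable host is not mistaken for a VPS-wide outage. It sticks to constructs that also work on the older Python 2.6 found on Ubuntu 10.04.

```python
import os
import subprocess
from datetime import datetime

# Hypothetical values: substitute your own targets and log location.
TARGETS = ["8.8.8.8", "4.2.2.2"]
LOG_PATH = "/var/log/netcheck.log"


def ping_once(ip, timeout_s=2):
    """Return True if a single ICMP echo to `ip` gets a reply (iputils ping)."""
    with open(os.devnull, "w") as devnull:
        rc = subprocess.call(["ping", "-c", "1", "-W", str(timeout_s), ip],
                             stdout=devnull, stderr=devnull)
    return rc == 0


def probe(targets, log_path, ping=ping_once, now=None):
    """Ping every target and append one timestamped line if all of them fail.

    `ping` and `now` are injectable so the logic can be tested offline.
    Returns the list of targets that did not answer.
    """
    failed = [ip for ip in targets if not ping(ip)]
    if targets and len(failed) == len(targets):
        stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
        with open(log_path, "a") as log:
            log.write("%s FAIL %s\n" % (stamp, " ".join(failed)))
    return failed
```

If the sketch were saved as a hypothetical `/root/netcheck.py`, a crontab entry such as `*/5 * * * * cd /root && python -c 'import netcheck; netcheck.probe(netcheck.TARGETS, netcheck.LOG_PATH)'` would run it every five minutes.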
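To check the on-the-hour start and the one-hour duration from a failure log, a small Python sketch can group consecutive failures into outages. It assumes a hypothetical log format of one line per failed check, `YYYY-MM-DD HH:MM:SS FAIL <targets>`, written every 5 minutes, and treats failures no more than 10 minutes apart as the same outage. Because checks run 5 minutes apart, the reported duration (first to last failed check) can understate the real outage by up to about 10 minutes.

```python
from datetime import datetime, timedelta


def parse_failures(lines):
    """Extract the timestamps of FAIL lines, ignoring anything else."""
    stamps = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 3 and parts[2] == "FAIL":
            stamps.append(datetime.strptime(parts[0] + " " + parts[1],
                                            "%Y-%m-%d %H:%M:%S"))
    return sorted(stamps)


def group_outages(stamps, max_gap=timedelta(minutes=10)):
    """Merge sorted failure times at most `max_gap` apart into (first, last) pairs."""
    outages = []
    for ts in stamps:
        if outages and ts - outages[-1][1] <= max_gap:
            outages[-1] = (outages[-1][0], ts)
        else:
            outages.append((ts, ts))
    return outages


def summarize(outages):
    """Return (start, minutes from first to last failed check, minute past the hour)."""
    result = []
    for first, last in outages:
        delta = last - first
        minutes = delta.days * 1440 + delta.seconds // 60
        result.append((first, minutes, first.minute))
    return result
```

For example, `summarize(group_outages(parse_failures(open(LOG))))`, with `LOG` set to your log file, returns one tuple per outage; start minutes clustered near 0 and durations near 55-60 minutes would confirm the pattern described above.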
netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
10.8.0.2        0.0.0.0         255.255.255.255 UH        0 0          0 tun0
XX.57.166.0     0.0.0.0         255.255.255.128 U         0 0          0 eth0
192.168.50.0    10.8.0.2        255.255.255.0   UG        0 0          0 tun0
10.8.0.0        10.8.0.2        255.255.255.0   UG        0 0          0 tun0
0.0.0.0         XX.57.166.1     0.0.0.0         UG        0 0          0 eth0
ip route list
10.8.0.2 dev tun0 proto kernel scope link src 10.8.0.1
XX.57.166.0/25 dev eth0 proto kernel scope link src XX.57.166.59
192.168.50.0/24 via 10.8.0.2 dev tun0
10.8.0.0/24 via 10.8.0.2 dev tun0
default via XX.57.166.1 dev eth0 metric 100
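Since the outages affect even pings by IP address, one thing worth recording during an outage (from the serial console or the cron check) is whether the default route above is still present. A minimal Python sketch that extracts the gateway and device from `ip route list` output follows; the function names are hypothetical. If the `default via ... dev eth0` line disappeared or changed during an outage, that would point at something inside the VPS; if it stays intact, at least that one cause is ruled out.

```python
import subprocess


def default_route(ip_route_text):
    """Return (gateway, device) from the `default ...` line, or None if absent."""
    for line in ip_route_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "default":
            gateway = fields[fields.index("via") + 1] if "via" in fields else None
            device = fields[fields.index("dev") + 1] if "dev" in fields else None
            return gateway, device
    return None


def current_default_route():
    """Run `ip route list` and parse its default route."""
    out = subprocess.Popen(["ip", "route", "list"],
                           stdout=subprocess.PIPE).communicate()[0]
    return default_route(out.decode("ascii", "replace"))
```

On this box, while the network is healthy, `current_default_route()` should return `('XX.57.166.1', 'eth0')` (with the real first octet), matching the last line of the output above.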
cat /etc/network/interfaces
auto eth0
iface eth0 inet static
    address XX.57.166.59
    gateway XX.57.166.1
    netmask 255.255.255.128

auto lo
iface lo inet loopback
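One static-configuration item that is easy to rule out is whether the `gateway` actually lies inside the subnet defined by `address` and `netmask`, and whether that netmask matches the /25 route the kernel installed. A minimal Python sketch of that check follows. Because the post redacts the first octet, the example uses documentation addresses from 203.0.113.0/24 as hypothetical stand-ins, and the helper names are illustrative.

```python
import socket
import struct


def ip_to_int(dotted):
    """Convert a dotted-quad IPv4 address to a 32-bit integer."""
    return struct.unpack("!I", socket.inet_aton(dotted))[0]


def prefix_length(netmask):
    """Count the one-bits in a netmask, e.g. 255.255.255.128 -> 25."""
    return bin(ip_to_int(netmask)).count("1")


def gateway_on_link(address, netmask, gateway):
    """True if `gateway` falls in the same subnet as `address` under `netmask`."""
    mask = ip_to_int(netmask)
    return (ip_to_int(address) & mask) == (ip_to_int(gateway) & mask)
```

With the stand-in values, `gateway_on_link("203.0.113.59", "255.255.255.128", "203.0.113.1")` returns True and `prefix_length("255.255.255.128")` gives 25, consistent with the `/25` route in the `ip route list` output.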