I've got an Ubuntu 12.04 server on Amazon EC2 that runs a web crawling process. We're running into a problem where some of the web servers hosting the sites we need to crawl block all EC2 IP addresses.
My brilliant idea was to tunnel outgoing HTTP requests through a VPN. I was able to get the VPN set up, but it routed ALL traffic through the VPN, which meant that I couldn't SSH into the machine and it wouldn't respond to any incoming HTTP requests. (This server also hosts a web service that we need to be able to access.)
Really I just want to "proxy" all outgoing HTTP requests through the VPN so we can access sites that have all EC2 IPs blocked.
It's very possible I'm going about this the wrong way and I welcome any other suggestions that might be simpler or more robust.
What you need is source-based policy routing, so that responses to incoming connections are routed out through the EC2 gateway instead of the VPN. Assuming that your instance's internal IP is 1.0.0.20 with its default gateway at 1.0.0.1, and the VPN IP is 10.8.0.20:
Create named routing tables (only needs to be done once)
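As a rough sketch of this step: named tables are registered by adding an "ID name" line to `/etc/iproute2/rt_tables`. The helper `add_rt_table`, the table names `ec2` and `vpn`, and the IDs 13 and 14 are all assumptions made for illustration; any unused ID between 1 and 252 should work.

```shell
# add_rt_table ID NAME [FILE]
# Hypothetical helper: registers a named routing table once.
# FILE defaults to /etc/iproute2/rt_tables; writing to it requires root.
add_rt_table() {
    local id="$1" name="$2" file="${3:-/etc/iproute2/rt_tables}"
    # Do nothing if a table with this name is already registered.
    if grep -Eq "^[[:space:]]*[0-9]+[[:space:]]+${name}[[:space:]]*\$" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$id" "$name" >> "$file"
}

# Usage (as root), with the assumed IDs and names:
#   add_rt_table 13 ec2
#   add_rt_table 14 vpn
```

The check before appending keeps the file free of duplicate entries if the setup is run more than once.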
Give each new routing table a default route via its own gateway.
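A minimal sketch of this step, assuming the two tables were named `ec2` and `vpn`. The EC2 gateway 1.0.0.1 comes from the example addresses above; the VPN gateway 10.8.0.1 is only a placeholder, so substitute whatever peer address your VPN client reports.

```shell
# Run as root. The table names and VPN_GW are assumptions, not known values.
EC2_GW=1.0.0.1     # default gateway of the instance (from the example above)
VPN_GW=10.8.0.1    # placeholder: use your VPN's actual gateway/peer address

# Each table gets exactly one default route via its own gateway.
ip route add default via "$EC2_GW" table ec2
ip route add default via "$VPN_GW" table vpn

# Inspect the result.
ip route show table ec2
ip route show table vpn
```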
Add routing rules that select the correct routing table based on the packet's source address.
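This step might look like the following, again assuming tables named `ec2` and `vpn` and the example addresses used above.

```shell
# Run as root. Table names are assumptions; the IPs are the example values above.
# Packets sourced from the instance's internal IP use the EC2 table...
ip rule add from 1.0.0.20 table ec2
# ...and packets sourced from the VPN IP use the VPN table.
ip rule add from 10.8.0.20 table vpn

# List the active rules to confirm both were added.
ip rule show
```

Rules and routes added this way do not survive a reboot, so they would need to be reapplied at startup, for example from the VPN client's "up" script.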
This should allow you to make the VPN the default gateway and still have incoming connections work.
Alternatively, you could configure your application to explicitly bind to the VPN IP (10.8.0.20) when creating outgoing connections. All connections from that application would then go over the VPN, while all other outgoing connections would go out directly. If you can't configure your application to bind to the VPN IP, you could run an HTTP proxy server that does the binding instead.
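To illustrate what binding to the VPN IP means, here is a quick check with curl, whose `--interface` option accepts a source IP address. The URL is only a placeholder, and this assumes a routing rule exists that sends traffic sourced from 10.8.0.20 over the VPN.

```shell
# Outgoing request bound to the VPN address; should leave via the VPN.
# http://example.com/ is a placeholder for a site that blocks EC2 addresses.
curl --interface 10.8.0.20 --silent --head http://example.com/

# Same request without binding; uses the system's default route.
curl --silent --head http://example.com/
```

If the first request succeeds where the second is blocked, the same binding can be applied in the crawler itself or in a proxy placed in front of it.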