I have a set of servers inside Amazon EC2 in a VPC. Inside this VPC I have a private subnet and a public subnet. In the public subnet I have set up a NAT machine on a t2.micro instance that basically runs this NAT script on startup, injecting rules into iptables. Downloading files from the internet from a machine inside the private subnet works fine.
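As a rough sketch (not the exact script linked above), a NAT setup of this kind typically boils down to enabling IP forwarding plus a MASQUERADE rule; eth0 and 10.0.1.0/24 below are assumptions:

    # Assumed public-facing interface and private-subnet CIDR; adjust to your VPC.
    sysctl -w net.ipv4.ip_forward=1
    iptables -t nat -A POSTROUTING -o eth0 -s 10.0.1.0/24 -j MASQUERADE
    # The NAT instance additionally needs source/destination check disabled in EC2.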
However I compared the download speed of a file on an external high-bandwidth FTP server directly from my NAT machine to the download speed from a machine inside my private subnet (via the same NAT machine). There was a really significant difference: around 10MB/s from the NAT machine vs. 1MB/s when downloading from the machine inside the private subnet.
There is no CPU usage on the NAT machine, so this cannot be the bottleneck. When trying the same test with bigger machines (m3.medium with "moderate network performance" and m3.xlarge with "high network performance"), I also could not get download speeds greater than 2.5MB/s.
Is this a general NAT problem that can (and should) be tuned? Where does the performance drop come from?
Update
With some testing, I could narrow this problem down. When I am using Ubuntu 12.04 or Amazon Linux NAT machines from 2013, everything runs smoothly and I get the full download speeds, even on the smallest t2.micro instances. It does not matter whether I use PV or HVM machines. The problem seems to be kernel-related: these old machines have kernel version 3.4.x, whereas the newer Amazon Linux NAT machines and Ubuntu 14.xx have kernel version 3.14.xx. Is there any way to tune the newer machines?
We finally found the solution. You can fix the download speed by running the following on the NAT machine (as root):
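A sketch of the kind of command meant here, assuming eth0 is the NAT machine's interface (you can list the current offload settings first with ethtool -k eth0, lowercase -k):

    # Turn off scatter-gather offload on the (assumed) eth0 interface.
    ethtool -K eth0 sg off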
This disables scatter-gather mode, which (as far as I understand it) stops offloading some network work to the network card itself. Disabling this option leads to higher CPU usage on the client, as the CPU now has to do the work itself. However, on a t2.micro machine we only saw around 5% CPU usage when downloading a DVD image.
Note that this won't survive a restart, so make sure to set it in rc.local or at least before setting up NAT; a sketch of the rc.local approach follows.
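For example, a minimal /etc/rc.local along these lines (the path and interface name are assumptions for the image in use):

    #!/bin/sh
    # Re-apply the offload setting on every boot, before NAT traffic starts flowing.
    ethtool -K eth0 sg off
    exit 0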
I also use NAT boxes in a similar setup in production, so I am very interested in your findings. I haven't seen anything similar before going to production, but maybe it's an issue I haven't paid attention to before.
Let's do some science!
============================================================================
Theory: NAT boxes can download and upload faster than a client that is using the NAT.
Experiment: Match the questioner's experiment: t2.micros with Amazon NAT 2014.09, two subnets, with the NAT's subnet routing to an IGW and the private subnet routing to the NAT. (Shared tenancy, general purpose SSD.)
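For reference, the routing described here boils down to two default routes; with the AWS CLI it would look roughly like this (all IDs are placeholders, not values from the experiment):

    # Public subnet's route table: default route to the internet gateway.
    aws ec2 create-route --route-table-id rtb-aaaa1111 --destination-cidr-block 0.0.0.0/0 --gateway-id igw-bbbb2222
    # Private subnet's route table: default route through the NAT instance.
    aws ec2 create-route --route-table-id rtb-cccc3333 --destination-cidr-block 0.0.0.0/0 --instance-id i-dddd4444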
Procedure:
Data:
Conclusion: OP is not lying.
============================================================================
Theory: Different kernel versions lead to different results.
Experiment: Set up 3 NAT boxes, each with magnetic storage, m3.medium (no bursting), and dedicated tenancy. Run a speed test.
Procedure: See the last experiment. Also, set up a routing table for each NAT box. Used a blackhole routing table to prove that the changes propagated when I swapped routing tables; a sketch of the swap command follows the verification list below.
Verification for each swap:
- curl google.com works from the client.
- Swap in the blackhole routing table and expect curl google.com to fail on the client.
- Swap in the routing table under test and confirm curl google.com works again.

Here are my 3 NAT boxes:

    2014.09    3.14.20-20.44.amzn1.x86_64
    2014.03    3.10.42-52.145.amzn1.x86_64
    2013.09    3.4.62-53.42.amzn1.x86_64
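The swap itself can be done by re-pointing the private subnet's route table association, for example with the AWS CLI (placeholder IDs; the association ID comes from aws ec2 describe-route-tables):

    # Point the private subnet at a different route table (blackhole or one of the NAT boxes).
    aws ec2 replace-route-table-association \
        --association-id rtbassoc-11111111 \
        --route-table-id rtb-22222222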
Data:
All 3 boxes get very similar results when running speedtest-cli --server 935.

From the client:
Conclusion: Is there degradation? No. Is there any difference between the kernel versions? No.
============================================================================
Theory: Dedicated versus shared tenancy makes a difference.
Experiment: 2 NAT boxes. Both using NAT 2014.09. One with shared tenancy, one with dedicated tenancy.
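For context, the only intended difference between the two boxes is the tenancy setting at launch; with the AWS CLI that would look roughly like this (the AMI, subnet ID, and instance type in the example are placeholders):

    # Shared-tenancy NAT box.
    aws ec2 run-instances --image-id ami-00000000 --instance-type m3.medium --subnet-id subnet-11111111
    # Dedicated-tenancy NAT box: identical except for the placement tenancy.
    aws ec2 run-instances --image-id ami-00000000 --instance-type m3.medium --subnet-id subnet-11111111 --placement Tenancy=dedicated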
Data: Both boxes have similar performance:
They also have similar standard deviations:
And when you run the 2x2 combinations:
Conclusion: Really unclear; shared versus dedicated doesn't seem to make a big difference.
Meta conclusions:
The test that's probably worth redoing would be OP's test with m3.mediums. I was able to duplicate the t2.micro results, but my m3.medium seems to conflict with OP's m3.medium results.
I'd be interested in seeing your data on kernel versions as well.
Perhaps the most interesting part is how I was unable to get an m3.medium NAT to go quickly.
My tests showed that disabling sg made my downloads worse.
Setup: m3.large running the speed test, m3.medium dedicated-tenancy NAT server. No other traffic in this environment.

Results (speedtest-cli reports speeds in Mbit/s):

    sg on:  average Download speed: 292.19
    sg off: average Download speed: 259.21
My test was:

    for ((n=0;n<10;n++)); do speedtest-cli --simple; done
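Averages like the ones above can be pulled from that loop's output with a small awk filter, assuming the default speedtest-cli --simple output format:

    # Average the reported download speeds (Mbit/s) over 10 runs.
    for ((n=0;n<10;n++)); do speedtest-cli --simple; done \
        | awk '/^Download/ {sum += $2; count++} END {print "average Download speed:", sum / count}'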