We have been benchmarking our web app, which is written in Python and uses MongoDB heavily, and we have found the following.
We tried first-generation Extra Large EC2 instances with 8 ECUs and 15 GB of memory:
- Python on the EC2 servers is at least 30% slower than on the local machine
- Disk I/O is extremely slow. The mongostat and iostat results show disk writes at around 1 MB/s
- The programs themselves run much slower than on the local machine
We have not been able to figure out why all this is happening. The local machines we are talking about have 8GB RAM and i5 processors.
UPDATE: We tested Python by running a loop that takes 10 seconds to complete on the local machine, with no disk reads or writes. The same loop took at least 30% longer on EC2 in every trial.
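A test of this kind could look roughly like the sketch below. It is not the original benchmark: the function names, the arithmetic inside the loop, and the iteration count are placeholders chosen for illustration. It times a pure-CPU loop several times so that the best and median runs can be compared between machines.

```python
import time


def cpu_bound_work(iterations):
    """Pure-CPU loop: integer arithmetic only, no disk or network access."""
    total = 0
    for i in range(iterations):
        total += i * i % 7
    return total


def time_trials(iterations=5000000, trials=5):
    """Run the loop `trials` times and return the wall-clock time of each run."""
    timings = []
    for _ in range(trials):
        start = time.perf_counter()
        cpu_bound_work(iterations)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    timings = sorted(time_trials())
    print("best: %.3fs  median: %.3fs" % (timings[0], timings[len(timings) // 2]))
```

Running the same script with the same Python version on both machines, and comparing the best times rather than a single run, helps separate a real CPU difference from noise.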
Does this relate to this?
There are a number of reasons EC2 machines can be slow. Disks are not attached directly to the instance. Instead, EBS volumes are large network disks, and whatever you write to them is sent across the network. The latency is usually quite low, but compared with a disk attached directly to your machine it will appear slow.
It is a virtual machine, so no matter what you do it has to compete with other guests on the same host for CPU cycles. If you are on Linux, run top and check the CPU steal percentage (the st field). A persistently non-zero value means the hypervisor is giving some of your CPU time to other guests, and the higher it is, the heavier the contention. In any case, virtual CPUs are generally not as fast as physical CPUs of a comparable model.
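If you would rather log steal over time than watch top, the same counter can be read from /proc/stat on Linux. The following is a minimal sketch under that assumption; the function names are made up for illustration, and it treats the eighth counter on the aggregate cpu line as steal, which is how the Linux proc documentation lays out that line.

```python
import time


def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before, after):
    """Share of CPU time reported as steal between two samples, in percent."""
    deltas = [b - a for a, b in zip(before, after)]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already counted in user and nice, so only the
    # first eight fields are summed for the total.
    if len(deltas) < 8:
        return 0.0
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


def sample_steal(interval=1.0):
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = parse_cpu_line(f.read())
    return steal_percent(before, after)


if __name__ == "__main__":
    print("steal: %.1f%%" % sample_steal())
```

Sampling this while the benchmark runs shows whether the slow trials line up with periods of high steal.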
Another personal observation is that luck plays an important role in EC2 (yes!). At times you get older hardware which is just not as fast. In my experience you sometimes also get AMD Opteron processors, which are usually not as fast as the Intel-based ones. I am not suggesting that AMD processors are bad, but in this case the Intel ones seem to work faster. Maybe they are of a newer generation.
Having maintained MongoDB on EC2, I totally understand your pain. I would suggest keeping as much data in memory as possible. In general, EC2 is not really designed for vertical scaling. It is better to have a lot of smaller instances dividing the work than to have one huge instance doing everything alone.
I got very good results with software-raiding EBS volumes to RAID0 -- saw slightly better than 50% increase in read speed, a little less for writes. We had an application that was completely useless on AWS until we did this and it saved our butts.
Also, your I/O will fluctuate with the time of day and with how heavily other customers are using that cluster/machine. As I recall, the AWS fine print promises no more than 100 IOPS on a standard EBS volume, though you'll get much more most of the time. My phone is probably faster than 100 IOPS. If those fluctuations are not acceptable, create your EBS volumes with provisioned IOPS. That's a little more expensive, but it lets you set the minimum throughput your application can live with, and guarantees that you'll get it no matter what other AWS customers are doing.
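To see what a volume actually delivers, before and after a change such as RAID 0 or at different times of day, a rough sequential-write test is enough. The sketch below is an illustration, not a replacement for iostat or a proper benchmarking tool; the function name, file size, and block size are arbitrary choices. It writes a temporary file into the directory you pass in, forces it to disk with fsync, and reports MB/s.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, total_mb=64, block_kb=1024):
    """Write about total_mb of random data into `directory`, fsync it, return MB/s."""
    block = os.urandom(block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    written_mb = blocks * block_kb / 1024.0
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            # Without fsync the timing mostly measures the page cache, not the disk.
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return written_mb / elapsed


if __name__ == "__main__":
    # Point this at a directory on the volume you want to measure.
    print("%.1f MB/s" % write_throughput_mb_s(tempfile.gettempdir()))
```

Run it a few times against a directory on the EBS volume (or the RAID array) and compare the numbers with what iostat reports while MongoDB is under load.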
As for this:
"The python on ec2 servers is at least 30% slower than the local machine"
This could also be distro-related, or specific to your app. You didn't say exactly what you mean by "30% slower." If your test involved disk access, see above (promise, it helps.) If not, you might want to provide more information on what you were testing.
If you're really concerned about disk I/O, AWS offers Provisioned IOPS EBS volumes and EBS-Optimized instances.
With Provisioned IOPS you can specify the average I/O performance you need from an EBS volume. Standard EBS volumes deliver around 100 IOPS, and you can specify up to 2000. Additionally, you could set up software RAID arrays out of multiple Provisioned IOPS EBS volumes to get higher throughput. http://aws.amazon.com/about-aws/whats-new/2012/07/31/announcing-provisioned-iops-for-amazon-ebs/
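As an illustration, a Provisioned IOPS volume can be requested from Python. The sketch below assumes the third-party boto3 library with credentials and a default region already configured; none of that comes from this thread, and the helper names, availability zone, and size are hypothetical. `io1` is the EBS volume type name for Provisioned IOPS, and the permitted IOPS and size ranges have changed over time, so check the current limits before relying on specific numbers.

```python
def provisioned_iops_volume_params(availability_zone, size_gib, iops):
    """Build the arguments for an EC2 CreateVolume call with provisioned IOPS."""
    return {
        "AvailabilityZone": availability_zone,
        "Size": size_gib,        # volume size in GiB
        "VolumeType": "io1",     # Provisioned IOPS volume type
        "Iops": iops,            # requested IOPS for the volume
    }


def create_provisioned_iops_volume(availability_zone, size_gib, iops):
    """Create the volume and return its ID. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the helper above works without boto3 installed

    ec2 = boto3.client("ec2")
    params = provisioned_iops_volume_params(availability_zone, size_gib, iops)
    response = ec2.create_volume(**params)
    return response["VolumeId"]


if __name__ == "__main__":
    # Hypothetical values; this call creates a billable resource.
    print(create_provisioned_iops_volume("us-east-1a", 200, 2000))
```

The volume still has to be attached to the instance and formatted (or added to a RAID array) before MongoDB can use it.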
Additionally, EBS-Optimized instances offer faster, dedicated connectivity to EBS, at 500 or 1000 Mbps, on m1.large, m1.xlarge and m2.4xlarge. http://aws.amazon.com/ebs/
You can also check out the high-io instance, with attached SSD drives: