I'm specifically looking at the Standard Large On-Demand instance, which has 850 GB of instance storage.
What I really don't understand is why have 850 GB of storage on the instance if that data will disappear as soon as the instance is shut down and/or destroyed.
It seems like EBS is the standard way of having permanent disk space for instances, including root volumes. You can stripe them. You can back them up in different availability zones and/or S3 and/or off of Amazon's system completely.
Why keep anything at all on the instance storage if you would have to back it up frequently off-instance (EBS/S3?) to avoid losing it all? Is it a speed advantage even beyond what striping EBS volumes would give you?
It seems like the best configuration for safety and ease of setup would be to just not use the instance storage at all, and to have instead striped EBS volumes with backup to S3 or off site.
Am I right, or is there a good reason for using that 850 GB of instance storage?
Thank you
One thing to remember is that not all data needs to be permanent. Instance store provides a cost effective solution for dealing with temporary data.
Let me provide a few examples.
The most obvious is swap space. If you want to allocate a few GB of swap space, a file on an instance store device is perfect - no cost to the I/O operations, and the data does not even need to persist between reboots.
More practically though, consider that AWS caters to a wide variety of computing tasks - not just to web infrastructure. So, for instance, instance storage is perfect for certain build processes that generate large quantities of intermediate files but a small final product. The need for this kind of storage is not uncommon in scientific applications and even some map-reduce applications.
Temporary files (e.g. /tmp), some caches, and even certain types of logs may also not need to be permanently stored and are well suited to the instance-store model.
This is especially true of the larger instances, which have multiple instance-store volumes attached: you can set them up in RAID0 to improve performance, get large quantities of storage at no additional cost, and avoid paying for I/O operations.
Consider, for a moment, that an m1.xlarge (if purchased as a 3-year reserved instance) will cost $116.80 + $94.44 = $211.24/mo, and it includes 1690 GB of storage. Provisioning that same amount of EBS storage would cost $169/mo, plus the I/O costs (which can be substantial). Especially if someone has a cluster with many servers, the cost savings can justify using the instance store as each server's primary storage while copying all data to a permanent store.
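To make that comparison concrete, here is a rough sketch of the arithmetic in Python. The storage size and instance price are the figures quoted above, and the $0.10/GB-month rate is simply what $169/mo for 1690 GB implies. The per-million-request I/O price and the sustained request rate are assumptions for illustration only, so substitute current pricing and your own workload before drawing conclusions.

```python
# Rough monthly cost comparison: bundled instance store vs. equivalent EBS.
# Figures marked "from the post" come from the answer above; everything
# else is an assumption chosen for illustration.

INSTANCE_STORE_GB = 1690            # m1.xlarge instance storage (from the post)
INSTANCE_MONTHLY = 116.80 + 94.44   # 3-year reserved m1.xlarge (from the post)
EBS_PRICE_PER_GB_MONTH = 0.10       # implied by $169/mo for 1690 GB
EBS_PRICE_PER_MILLION_IO = 0.10     # ASSUMED rate; check current pricing


def ebs_monthly_cost(gb, io_requests_per_month,
                     price_per_gb=EBS_PRICE_PER_GB_MONTH,
                     price_per_million_io=EBS_PRICE_PER_MILLION_IO):
    """Return the estimated monthly EBS cost: storage plus I/O requests."""
    storage_cost = gb * price_per_gb
    io_cost = io_requests_per_month / 1_000_000 * price_per_million_io
    return storage_cost + io_cost


if __name__ == "__main__":
    # Hypothetical workload: 100 I/O requests per second, sustained for 30 days.
    requests = 100 * 60 * 60 * 24 * 30
    ebs = ebs_monthly_cost(INSTANCE_STORE_GB, requests)
    print(f"Instance (storage included): ${INSTANCE_MONTHLY:.2f}/mo")
    print(f"EBS storage only:            ${ebs_monthly_cost(INSTANCE_STORE_GB, 0):.2f}/mo")
    print(f"EBS storage + assumed I/O:   ${ebs:.2f}/mo")
```

The point is only that the EBS bill for the same capacity is on the same order as the whole instance, before any I/O charges are added.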
The above said, however, in most common cases, EBS is the better way to go - especially with the ease of backups (EBS-snapshots) - which will even work for RAID arrays (and are differential and compressed).
Is there a good reason ? -
P.S. Instance storage may actually be slower than EBS.
You can hardly get a million IOPS from EBS, right? Or multiple gigabytes per second of sequential throughput.
And even if you buy the highest-end EBS with 80k IOPS and 1 GB/sec, you'll pay 10k USD/month for storage that's slower than an $800 desktop SSD stripe.
Instance store is often the only way to operate a fast DB on Amazon; EBS is extremely expensive and at the same time very slow.
It has high latency, very low single-thread performance, and extreme pricing for storage that's faster than 250 MB/sec (a multiple of the price of normal SSD EBS (gp2) storage).
You could set up a mirror of slow gp2 (or st1) storage and an instance store. This way you have a fast disk to work on and a slow backup copy that survives if you need to reboot.