When booting an application from an AMI, we noticed increased response times and an increased rate of request time-out errors, which slowly tapered off until everything returned to normal. I figured this was due to EBS lazy initialization (a well-documented performance characteristic of EBS). The application has a 24 GB EBS data volume.
I tried increasing instance sizes and noticed no difference. So, taking a step back to isolate the performance bottleneck, I ran some benchmarks on different instance sizes to find the one with the best pure EBS initialization performance. My assumption was that this would serve as a good proxy for "performance with lazy initialization during normal use of the application".
And I ran into a major surprise: a t3.medium instance performs the same as a c5.18xlarge!
How can this be?
I'm using the fio command recommended by AWS here, modified for device /dev/nvme0n1:
sudo fio --filename=/dev/nvme0n1 --rw=read --bs=128k --iodepth=32 --ioengine=libaio --direct=1 --name=volume-initialize
The larger instance nominally has 5x the network performance of the smaller one (25 Gbps vs. "Up to 5 Gbps").
Both plod along at about 35 MiB/s.
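A quick unit conversion makes the mismatch concrete. This is a minimal Python sketch using only the figures above: 35 MiB/s observed, and 5 and 25 Gbps nominal, where the t3.medium value is an "up to" burst figure. It converts MiB/s to megabits per second and expresses the observed rate as a share of each instance's nominal network bandwidth.

```python
MIB = 1024 * 1024  # bytes per MiB


def mib_per_s_to_mbps(rate_mib_s: float) -> float:
    """Convert MiB/s to megabits per second (1 Mbit = 10**6 bits)."""
    return rate_mib_s * MIB * 8 / 1e6


observed_mib_s = 35.0  # fio result, the same on both instance types
nominal_gbps = {"t3.medium": 5.0, "c5.18xlarge": 25.0}  # t3 value is "up to"

observed_mbps = mib_per_s_to_mbps(observed_mib_s)
print(f"observed: {observed_mbps:.1f} Mbps")
for instance, gbps in nominal_gbps.items():
    share = observed_mbps / (gbps * 1000)  # convert Gbps to Mbps before dividing
    print(f"{instance}: {share:.1%} of nominal network bandwidth")
```

This prints 293.6 Mbps observed, which is 5.9% of the t3.medium's nominal bandwidth and 1.2% of the c5.18xlarge's. Neither instance's network link is anywhere near saturated.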
Bonus question: which instance type will give me the fastest EBS and S3 performance, including EBS initialization from a snapshot?
UPDATES
- Adding an S3 endpoint to the VPC made no difference.
- When I increase the EBS volume size enough to reach the maximum of 10,000 IOPS (i.e. 3333 GB), the speed goes up to about 45 MiB/s. I'm only testing on the c5.18xlarge at this point.
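The 3333 GB figure follows from the gp2 baseline formula. The sketch below assumes the volumes are gp2, which the post doesn't state. A gp2 volume gets 3 IOPS per GiB with a 100 IOPS floor. The 10,000 IOPS cap was the limit at the time of writing; AWS has since raised it, so treat that constant as period-specific.

```python
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS_AT_TIME = 10_000  # cap when this was written; since raised by AWS


def gp2_baseline_iops(size_gib: int, max_iops: int = GP2_MAX_IOPS_AT_TIME) -> int:
    """Baseline IOPS of a gp2 volume: 3 per GiB, clamped to [100, max_iops]."""
    return max(GP2_MIN_IOPS, min(GP2_IOPS_PER_GIB * size_gib, max_iops))


for size_gib in (24, 3333, 3334):
    print(f"{size_gib} GiB -> {gp2_baseline_iops(size_gib)} IOPS")
```

A 3333 GiB volume gets 9999 IOPS, effectively the 10,000 cap. The 24 GiB data volume from the question sits at the 100 IOPS floor.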
Background
EBS snapshots are stored on S3 (this is documented on the link you provided above). When you restore a snapshot, EBS pulls in blocks from S3 when they're first required (documented here).
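On-demand fetching explains why response times start high and then taper off. The toy model below is deliberately simplified and entirely hypothetical. The `LazyVolume` class, the block count and the 20 ms / 1 ms latencies are illustrative numbers, not measurements of EBS. Each block is slow on its first read, standing in for the fetch from S3, and fast afterwards. As a result, the mean latency of a random workload falls as more of the volume has been touched.

```python
import random


class LazyVolume:
    """Hypothetical toy model of a volume restored from a snapshot: each block
    is fetched (slowly) on its first read and served locally afterwards."""

    def __init__(self, n_blocks: int, fetch_ms: float, local_ms: float) -> None:
        self.initialized = [False] * n_blocks
        self.fetch_ms = fetch_ms  # illustrative cost of a first-touch fetch
        self.local_ms = local_ms  # illustrative cost once the block is local

    def read(self, block: int) -> float:
        """Return the simulated latency in ms of reading one block."""
        if self.initialized[block]:
            return self.local_ms
        self.initialized[block] = True  # first access pulls the block in
        return self.fetch_ms


N_BLOCKS = 1000
rng = random.Random(0)  # fixed seed so the run is repeatable
vol = LazyVolume(N_BLOCKS, fetch_ms=20.0, local_ms=1.0)

means = []
for batch in range(5):
    # Each batch is 1,000 reads of uniformly random blocks.
    latencies = [vol.read(rng.randrange(N_BLOCKS)) for _ in range(1000)]
    means.append(sum(latencies) / len(latencies))
    print(f"batch {batch}: mean latency {means[-1]:.1f} ms")
```

Pre-warming with fio amounts to reading every block once up front, so that the application never pays the first-touch cost.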
Updated Idea Again
As Michael points out below, the bottleneck here is likely between S3 and EBS. I'm surprised it's so low at 35 MB/s (about 280 Mbps), but I guess it's retrieving from a single S3 object. S3 can sustain huge bandwidth, but typically across multiple objects.
Based on what Michael says, I think you just have to put up with this relatively slow restore. You don't have to pre-warm the volume; you can let initialization happen on demand and take the hit over the first minutes, hours, or days that the instance is up.
Answer: The initialization speed for EBS Snapshots is not affected by EC2 instance type.
As of 12/1/2018.
Apparently, 42 MiB/s is the maximum pre-warm/initialization rate achievable from an EBS snapshot to a single 10,000 IOPS volume. The speed is not affected by instance type, but it drops to 35 MiB/s on smaller volumes (100 IOPS). It is also not affected by the presence of an S3 endpoint in the VPC.
For comparison, copying directly from one live EBS volume to another, bypassing the snapshot process, runs at 128 MiB/s on an r5d.large instance, single-threaded, using a tar|pv|tar pipe over an ext4 filesystem.
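To put these rates in perspective for the question's data volume, here is a rough Python estimate of how long one full pass over it would take. It assumes the "24 GB" volume is 24 GiB and that the whole device is read at a steady rate. A tar copy only reads used file data, so the direct-copy figure is a loose comparison at best.

```python
MIB_PER_GIB = 1024
VOLUME_GIB = 24  # assumption: the question's "24 GB" volume is 24 GiB


def full_read_minutes(size_gib: float, rate_mib_s: float) -> float:
    """Minutes to read every block of a volume once at a steady rate."""
    return size_gib * MIB_PER_GIB / rate_mib_s / 60


rates_mib_s = {
    "snapshot initialization, 100 IOPS volume": 35,
    "direct volume-to-volume copy (tar|pv|tar)": 128,
}
for label, rate in rates_mib_s.items():
    print(f"{label}: ~{full_read_minutes(VOLUME_GIB, rate):.1f} min")
```

At 35 MiB/s a full pre-warm of 24 GiB takes about 11.7 minutes, versus about 3.2 minutes at the direct-copy rate.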