I wanted to share an issue (I may be misunderstanding some concepts) that I'm facing with some benchmarks I'm running against XFS setups. We are going to migrate a service to a new instance soon, and we would like to get the maximum amount of IOPS possible.
We have a Gitolite instance that currently runs on a 500GB io1 volume (25K IOPS). We would like to move this service to a new instance, and I was considering the possibility of improving the underlying filesystem. At the moment the instance's filesystem is XFS on top of LVM on that single volume.
I have been doing some benchmarks on moving the service to an instance with:
- 8 volumes of 50GB, 2500 IOPS each
These 8 volumes are included in the same LVM volume group in a striped configuration. The commands I used to create this striped setup are:
## Create the LVM PVs:
$ pvcreate /dev/nvme[12345678]n1
## Create the volume group:
$ vgcreate test_vol /dev/nvme[12345678]n1
## Create the stripe configuration:
$ lvcreate --extents 100%FREE --stripes 8 --stripesize 256 --name test test_vol
## XFS format the new volume:
$ mkfs.xfs -f /dev/mapper/test_vol-test
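The stripe layout can also be sanity-checked, for example with (using the volume group and LV names from above):
## Show stripe count and stripe size per segment:
$ lvs --segments test_vol
## Show how the logical extents map onto the 8 physical volumes:
$ lvdisplay -m /dev/test_vol/test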
And that should be it. Now, benchmarks.
Running this fio test over this virtual volume:
$ fio --name=randwrite --ioengine=libaio --iodepth=2 --rw=randwrite --bs=4k --size=400G --numjobs=8 --runtime=300 --group_reporting --filename=/test/testfile --fallocate=none
Shows the following report:
Jobs: 8 (f=8): [w(8)][100.0%][w=137MiB/s][w=35.1k IOPS][eta 00m:00s]
randwrite: (groupid=0, jobs=8): err= 0: pid=627615: Wed Nov 25 13:15:33 2020
write: IOPS=23.0k, BW=93.7MiB/s (98.2MB/s)(27.4GiB/300035msec); 0 zone resets
slat (usec): min=2, max=132220, avg=141.07, stdev=2149.78
clat (usec): min=3, max=132226, avg=143.46, stdev=2150.25
Which is not bad at all, but executing the very same fio benchmark on another instance with a single volume of 500GB (25K IOPS) shows:
Jobs: 8 (f=8): [w(8)][100.0%][w=217MiB/s][w=55.6k IOPS][eta 00m:00s]
randwrite: (groupid=0, jobs=8): err= 0: pid=11335: Wed Nov 25 12:54:57 2020
write: IOPS=48.2k, BW=188MiB/s (198MB/s)(55.2GiB/300027msec); 0 zone resets
slat (usec): min=2, max=235750, avg=130.69, stdev=1861.69
Which is by far better output than the striped setup.
We are going to use this instance to host an internal Git server, so I was assuming that a striped setup would be much better than an instance with a single volume, but these benchmarks show that the best setup (in terms of IOPS/bandwidth) is the one with the single disk.
Am I assuming anything wrong? Will the striped setup work better for random writes (i.e. not running out of IOPS)?
I'm not sure how AWS abstracts the storage hardware being presented to EC2 instances, but I'm willing to bet they already have some sort of RAID configuration going on, and it's not a 1:1 physical drives to EBS volumes thing. It wouldn't make sense for them.
So what you're doing is striping over several logical volumes that are already striped over several physical drives, which might explain why the "single drive" numbers are better: there isn't an additional layer of striping going on at the virtual machine's OS level.
Also, have you considered giving CodeCommit a try? I find that there are a few features missing on the web console side, but if you use a "normal" git client it works just fine. YMMV
One problem: git is not going to be using libaio, so your numbers are at least a little off. Probably use the Linux default ioengine=psync.
And another: are 100% 4k-sized random writes an accurate workload for a git server? It seems like there would be reads when clients fetch from repos, and sequential I/O when reading and writing pack files. A more accurate simulation would probably include both read and write jobs, in an approximate R/W ratio reflecting how many fetches versus pushes this server handles.
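As a rough sketch of that kind of mix (the 80/20 read/write split here is just a guess and would need tuning to the real fetch/push ratio; the file name and size are kept from the original command):
$ fio --name=gitmix --ioengine=psync --rw=randrw --rwmixread=80 --bs=4k --size=400G --numjobs=8 --runtime=300 --group_reporting --filename=/test/testfile --fallocate=none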
Hitting 115% of the IOPS quota on the striped setup (23.0k against 20k provisioned) is expected; hitting 193% on the single volume (48.2k against 25k) is a little anomalous. Exceeding quota, assuming this isn't some artifact of the test, doesn't necessarily mean striping is inherently worse. It could be you got lucky with physical placement and your neighbors were idle.
With these caveats, assume this 400 GB volume can deliver at least the 20k IOPS provisioned. Do you anticipate the need for more?
Yes, LVM striping can exceed the limits of any one LUN. But a single one of these io1 SSD volumes can in theory max out at 64k IOPS and 16 TiB in size. It will be operationally simpler to use only one.
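If you go that route, a single provisioned-IOPS volume equivalent to the current one could be created with something like the following (sketch only: the availability zone is a placeholder, and the size/IOPS values simply mirror the existing 500GB/25K volume):
## Create a single io1 volume with provisioned IOPS:
$ aws ec2 create-volume --volume-type io1 --size 500 --iops 25000 --availability-zone eu-west-1a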