Ping a Specific Port

Question

Ryan Patterson

Asked: 2024-09-30 22:25:08 +0800 CST2024-09-30 22:25:08 +0800 CST 2024-09-30 22:25:08 +0800 CST

Why is MD RAID1 so much slower than the raw network disk?

772

I'm trying to configure MD RAID1 (using mdadm) with the --write-mostly option so that a network (EBS) volume and a local drive are mirrors of one another (the idea being that the local drive is ephemeral to my instance, but has better performance).

To vet this idea, I get a baseline performance estimate of my drive using the following two scripts.

fio -name=RandWrite -group_reporting -allow_file_create=0 \
  -direct=1 -iodepth=128 -rw=randwrite -ioengine=io_uring -bs=32k \
  -time_based=1 -ramp_time=10 -runtime 10 -numjobs=8 \
  -randrepeat=0 -norandommap=1 -filename=$BENCHMARK_TARGET

# Read performance
fio -name=RandRead -group_reporting -allow_file_create=0 \
  -direct=1 -iodepth=128 -rw=randread -ioengine=io_uring -bs=32k \
  -time_based=1 -ramp_time=10 -runtime 10 -numjobs=8 \
  -randrepeat=0 -norandommap=1 -filename=$BENCHMARK_TARGET

Results:

Network drive: 117 MiB/s write, 117 MiB/s read
Local drive: 862 MiB/s write, 665 MiB/s read

The problem comes when I introduce mdadm. Even when using a trivial no-mirror "RAID1", the write performance is severely worse when using the network drive.

mdadm --build /dev/md0 --verbose --level=1 --force --raid-devices=1 "$TARGET"
# mdadm --detail /dev/md0
/dev/md0:
           Version :
     Creation Time : Mon Sep 30 14:22:41 2024
        Raid Level : raid1
        Array Size : 10485760 (10.00 GiB 10.74 GB)
     Used Dev Size : 10485760 (10.00 GiB 10.74 GB)
      Raid Devices : 1
     Total Devices : 1

             State : clean
    Active Devices : 1
   Working Devices : 1
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : resync

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb

0-mirror RAID1 array backed by network drive: 69.9 MiB/s write, 118 MiB/s read
0-mirror RAID1 array backed by local drive: 868 MiB/s write, 665 MiB/s read

As we can see here, the write performance didn't change much for local drive (MD-raid vs. raw access), but it's severely impaired when using the network drive via MD-raid. Why does this happen?

3 Answers

Voted

Tero Kilkanen · Answer 1 · 2024-10-01T04:58:15+08:00

Tero Kilkanen

2024-10-01T04:58:15+08:002024-10-01T04:58:15+08:00

Without knowing exact mdadm implementation, I'll write my educated guess on this.

I think that in RAID 1 setup, the RAID subsystem waits for both drives to acknowledge write operations before processing next file events. And then there might be additional delays introduced by the mismatch of performance between the drives, which would then contribute to the 69.9 MiB/s vs 117 MiB/s write speed.

I don't think it is feasible to create RAID arrays with devices where access speed is vastly different. RAID wasn't designed for this use case.

You might want to look at a cluster filesystem such as GFS2 or OCFS2, those might be better suited for your use case.

7

Ryan Patterson · Answer 2 · 2024-10-01T11:34:50+08:00

As near as I can tell, this is a failure mode caused by overloading the MD kernel module with IOPS.

When I modify my scripts to use iodepth=64 numjobs=1, I see no loss in performance on the raw drives, and my RAID1 write performance impact disappears.

Here are the final scripts:

fio -name=RandWrite -group_reporting -allow_file_create=0 \
    -iodepth=$IODEPTH -numjobs=$NUMJOBS \
    -direct=1 -rw=randwrite -ioengine=io_uring -bs=16k \
    -time_based=1 -ramp_time=10 -runtime 10 \
    -randrepeat=0 -norandommap=1 -filename=$DEVICE
fio -name=RandRead -group_reporting -allow_file_create=0 \
    -iodepth=$IODEPTH -numjobs=$NUMJOBS \
    -direct=1 -rw=randread -ioengine=io_uring -bs=16k \
    -time_based=1 -ramp_time=10 -runtime 10 -randrepeat=0 \
    -norandommap=1 -filename=$DEVICE

And here are the net results:

Metric	iodepth=128 numjobs=8	iodepth=64 numjobs=1
Local disk, write	629	766
Local disk, read	754	877
Cloud disk, write	117	117
Cloud disk, read	118	117
RAID1, local, write	526	775
RAID1, local, read	751	853
RAID1, cloud, write	59	117
RAID1, cloud, read	115	117
RAID1, both, write	61	117
RAID1, both, read	755	879

I am guessing that too many IOPS combined with the slower drive leads to excessive queue length, which then leads to some sort of lock contention in the kernel module. But I don't know enough of the details to be sure. What I have learned, is that I'll need a more accurate benchmark to properly decide if this approach is viable for my use case.

shodanshok · Answer 3 · 2024-10-01T17:02:03+08:00

shodanshok

2024-10-01T17:02:03+08:002024-10-01T17:02:03+08:00

You are probably bound by MD write bitmap. You can try disabling it (via --bitmap=none during creation or later with --grow), but be sure to understand that an unclean shutdown on a bitmap-less array means a full resync after restart.

-1

Why is MD RAID1 so much slower than the raw network disk?

Can you pass user/pass for HTTP Basic Authentication in URL parameters?

Ping a Specific Port

Check if port is open or closed on a Linux server?

How to automate SSH login with password?

How do I tell Git for Windows where to find my private RSA key?

What's the default superuser username/password for postgres after a new install?

What port does SFTP use?

Command line to list users in a Windows Active Directory group?

What is a Pem file and how does it differ from other OpenSSL Generated Key File Formats?

How to determine if a bash variable is empty?