I have a server with Ubuntu 20.04.1 LTS
with two NVMe SSDs in software RAID1, and the storage is painfully slow. Decompressing a 500MB gzip file into a roughly 3.7GB output takes far longer than it should. This is a development server just for me, and even with MariaDB, loading SQL dumps takes around 30 minutes, while the same dumps load in a few minutes on my home computer. Everything is slow; even upgrading Ubuntu packages takes a long time.
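For reference, this is roughly how I measure it (the file and database names below are just placeholders, not my real ones):
# time gunzip -k dump.sql.gz
# time mysql devdb < dump.sql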
so I gathered some specs:
Linux Kernel: 5.4.0-42-generic
CPU: Intel(R) Xeon(R) D-2141I CPU @ 2.20GHz
Memory: 32GB
two WDC CL SN720 SDAQNTW-512G-2000 NVMe SSDs in software RAID1
and here is the output of some commands:
# cat /proc/mdstat
Personalities : [raid1] [raid0] [raid6] [raid5] [raid4] [linear] [multipath] [raid10]
md2 : active raid1 nvme1n1p2[1] nvme0n1p2[0]
523200 blocks [2/2] [UU]
md3 : active raid1 nvme1n1p3[1] nvme0n1p3[0]
498530240 blocks [2/2] [UU]
bitmap: 4/4 pages [16KB], 65536KB chunk
unused devices: <none>
md3 is used as the root partition and that is what I test on.
# lsblk -io KNAME,TYPE,SIZE,MODEL,MOUNTPOINT
KNAME TYPE SIZE MODEL MOUNTPOINT
loop0 loop 55M /snap/core18/1880
loop1 loop 70.6M /snap/lxd/16894
loop2 loop 29.9M /snap/snapd/8542
loop3 loop 70.6M /snap/lxd/16922
loop4 loop 55.3M /snap/core18/1885
loop5 loop 29.9M /snap/snapd/8790
md2 raid1 511M /boot
md2 raid1 511M /boot
md3 raid1 475.4G /
md3 raid1 475.4G /
nvme0n1 disk 477G WDC CL SN720 SDAQNTW-512G-2000
nvme0n1p1 part 511M /boot/efi
nvme0n1p2 part 511M
nvme0n1p3 part 475.4G
nvme0n1p4 part 511M [SWAP]
nvme1n1 disk 477G WDC CL SN720 SDAQNTW-512G-2000
nvme1n1p1 part 511M
nvme1n1p2 part 511M
nvme1n1p3 part 475.4G
nvme1n1p4 part 511M [SWAP]
and
# mdadm --detail /dev/md3
/dev/md3:
Version : 0.90
Creation Time : Thu Jul 30 13:49:54 2020
Raid Level : raid1
Array Size : 498530240 (475.44 GiB 510.49 GB)
Used Dev Size : 498530240 (475.44 GiB 510.49 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 3
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Tue Sep 8 13:37:54 2020
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Consistency Policy : bitmap
UUID : 9dd3cf94:cfc5c935:a4d2adc2:26fd5302
Events : 0.13
Number Major Minor RaidDevice State
0 259 3 0 active sync /dev/nvme0n1p3
1 259 8 1 active sync /dev/nvme1n1p3
I tried testing the speed of the drive using fio with the command:
fio --name=randwrite --ioengine=libaio --iodepth=64 --rw=randwrite --bs=64k --direct=1 --size=32G --numjobs=8 --runtime=240 --group_reporting
and the results are:
Jobs: 8 (f=8): [w(8)][100.0%][w=776MiB/s][w=12.4k IOPS][eta 00m:00s]
randwrite: (groupid=0, jobs=8): err= 0: pid=1028157: Tue Sep 8 13:09:05 2020
write: IOPS=11.1k, BW=692MiB/s (726MB/s)(162GiB/240041msec); 0 zone resets
slat (usec): min=127, max=567387, avg=385.31, stdev=5093.87
clat (usec): min=2, max=1044.7k, avg=45818.20, stdev=55462.71
lat (usec): min=268, max=1045.0k, avg=46206.45, stdev=55680.36
clat percentiles (msec):
| 1.00th=[ 10], 5.00th=[ 22], 10.00th=[ 23], 20.00th=[ 26],
| 30.00th=[ 29], 40.00th=[ 33], 50.00th=[ 36], 60.00th=[ 41],
| 70.00th=[ 46], 80.00th=[ 53], 90.00th=[ 64], 95.00th=[ 75],
| 99.00th=[ 443], 99.50th=[ 493], 99.90th=[ 550], 99.95th=[ 567],
| 99.99th=[ 600]
bw ( KiB/s): min=48768, max=1394246, per=99.97%, avg=708325.21, stdev=25832.90, samples=3840
iops : min= 762, max=21784, avg=11066.98, stdev=403.63, samples=3840
lat (usec) : 4=0.01%, 10=0.01%, 50=0.01%, 250=0.01%, 500=0.01%
lat (usec) : 750=0.02%, 1000=0.02%
lat (msec) : 2=0.08%, 4=0.18%, 10=0.80%, 20=1.91%, 50=74.09%
lat (msec) : 100=20.72%, 250=0.61%, 500=1.14%, 750=0.41%, 1000=0.01%
lat (msec) : 2000=0.01%
cpu : usr=9.10%, sys=41.41%, ctx=1203665, majf=0, minf=95
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=0,2657370,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=692MiB/s (726MB/s), 692MiB/s-692MiB/s (726MB/s-726MB/s), io=162GiB (174GB), run=240041-240041msec
Disk stats (read/write):
md3: ios=0/3319927, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/2692572, aggrmerge=0/634101, aggrticks=0/79485190, aggrin_queue=74460696, aggrutil=94.36%
nvme0n1: ios=0/2692573, merge=0/634101, ticks=0/83651179, in_queue=78562212, util=94.36%
nvme1n1: ios=0/2692572, merge=0/634102, ticks=0/75319202, in_queue=70359180, util=94.04%
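Since that run only covers random writes, I also plan to do a sequential read run for comparison; the parameters below are just my guess at a reasonable counterpart to the write test:
fio --name=seqread --ioengine=libaio --iodepth=32 --rw=read --bs=1M --direct=1 --size=8G --numjobs=1 --runtime=60 --group_reporting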
I tried googling and found people saying that changing the Intent Bitmap from Internal to none should speed things up, but after changing it and running fio again things were actually a bit slower. Maybe I needed to wait a while? I don't know.
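For the record, the bitmap change was done with mdadm's grow mode; from memory the commands were along these lines (treat them as approximate, not copy-pasted from my history):
# mdadm --grow --bitmap=none /dev/md3
# mdadm --grow --bitmap=internal /dev/md3
The second command is how I put the internal bitmap back afterwards.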
So I'm pretty much lost. I really don't know how to continue investigating from here, so any information regarding this issue would be greatly appreciated. Of course, I also monitored the CPU to rule it out as the bottleneck, but it does not look heavily used at all.
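In case it matters, the CPU monitoring was nothing fancy; I just kept an eye on things like the following while the imports were running (iostat comes from the sysstat package):
# top
# iostat -x 5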
thank you!
update
Someone on IRC asked me if I set writethrough by accident. I tried to google it and found this: https://www.kernel.org/doc/html/latest/driver-api/md/raid5-cache.html
It talks about RAID 4/5/6 and I use RAID1, so maybe it's not relevant. Also, the file sys/block/md3/md/journal_mode mentioned in that document does not exist on my system.
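What I could check instead is the write-cache mode the kernel reports for the underlying NVMe devices; I am assuming these sysfs attributes are the relevant ones here (they should print either "write back" or "write through"):
# cat /sys/block/nvme0n1/queue/write_cache
# cat /sys/block/nvme1n1/queue/write_cache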
update 2
I found a way to test cached reads and buffered disk reads:
# hdparm -tT /dev/md3
/dev/md3:
Timing cached reads: 1006 MB in 1.99 seconds = 504.40 MB/sec
HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
Timing buffered disk reads: 664 MB in 3.01 seconds = 220.88 MB/sec
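For another data point, a simple direct sequential read straight from the array could be done like this (block size and count are arbitrary values I picked, nothing official):
# dd if=/dev/md3 of=/dev/null bs=1M count=2048 iflag=direct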
I hope this information is also useful.