I have three 14TB Toshiba drives combined with mdadm into a RAID 5 array (/dev/md0), which is set up as the backing device for bcache. A fast 256GB SSD sits in front of it as the bcache cache device.
Writeback caching is enabled on bcache.
After a few days, the device (/dev/bcache0) becomes extremely slow, roughly 1/1000th of its normal speed.
My two questions are:
1. For /dev/md0, what tuning should I do for those Toshiba drives? The array uses 4k chunks and 64k blocks.
2. Is there any bcache tuning I could do?
I'm not even really sure what other info I should put here. But if you ask, I'll update this post. Thanks!
Update 1: my iostat output while getting 100 MB/s read but only 3 MB/s write: https://pastebin.com/wKKf4LTq
The computer is an AMD 2990WX with 32GB of RAM. The CPU isn't the issue.
My old 3770K from around 2010 got far better read and write speeds than this. It has to be some sort of setting or tuning issue. Thanks!
Update 2: below is the hdparm output, taken while the system is running normally. hdparm takes too long to run when it's not running normally.
/dev/md0:
Timing cached reads: 11148 MB in 2.00 seconds = 5578.33 MB/sec
Timing buffered disk reads: 1372 MB in 3.00 seconds = 456.84 MB/sec
/dev/bcache0:
Timing cached reads: 12564 MB in 2.00 seconds = 6286.57 MB/sec
Timing buffered disk reads: 1226 MB in 3.00 seconds = 408.66 MB/sec
Thanks!
With Samsung TLC memory, I'd stick to a 512k bucket size. This aligns with the erase block size every 3 buckets (usually you would match it the other way around, but there's no sane way to align 1.5MB with any bucket size of 2^n). Use a block (sector) size of 4k. BTW: this assumes Samsung TLC uses a 1.5MB erase block size, which isn't officially documented anywhere. But 512k is still a safe value for a 2MB erase block size, too, because it would align every 4 buckets.
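A quick way to sanity-check the bucket arithmetic is to compute how many buckets pass before a bucket boundary lines up with a flash block boundary again, i.e. lcm(bucket, block) / bucket. This is a minimal Python sketch; the 1.5 MiB and 2 MiB block sizes are the undocumented assumptions discussed above, not vendor figures, and `buckets_per_alignment` is a hypothetical helper name.

```python
from math import gcd

KIB = 1024
MIB = 1024 * KIB


def buckets_per_alignment(bucket_bytes: int, erase_block_bytes: int) -> int:
    """Number of consecutive buckets after which a bucket boundary and an
    erase-block boundary coincide again: lcm(bucket, block) / bucket,
    which simplifies to block / gcd(bucket, block)."""
    return erase_block_bytes // gcd(bucket_bytes, erase_block_bytes)


if __name__ == "__main__":
    # Assumed block sizes: 1.5 MiB (undocumented guess for Samsung TLC) and 2 MiB.
    for erase_block in (3 * MIB // 2, 2 * MIB):
        for bucket in (128 * KIB, 256 * KIB, 512 * KIB, 1 * MIB):
            n = buckets_per_alignment(bucket, erase_block)
            print(f"block {erase_block // KIB} KiB, bucket {bucket // KIB} KiB: "
                  f"aligned every {n} buckets")
```

With a 512 KiB bucket it reports 3 buckets for a 1.5 MiB block and 4 for a 2 MiB block, matching the reasoning above. Smaller power-of-two buckets also align, just over more buckets.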
Also, please align your data offset with your RAID 5 stripes. The bcache docs give some hints on that, and it's very important to get it right. Personally, I haven't tried such a setup yet, but I guess
[sysfs]/bdev*/partial_stripes_expensive
may also be interesting for RAID 5.
I'm also guessing that the slowdowns start once the cache has filled up. You should disable discard for the cache; it's a synchronous operation on many drives due to firmware bugs. Instead, remove the bcache cache device (cdev), trim the whole partition, then shrink the partition to 80-90% of its original size, aligned to a 2MB boundary, and recreate bcache. Then never touch the freed space: it allows the drive to do background wear-leveling, so discard is no longer needed. You could create a protective partition to reserve this space, which also makes it easy to trim the reserved space.
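Both alignment rules (a stripe-aligned data offset on the RAID 5 backing device, and a shrunken, 2 MiB-aligned cache partition) come down to simple arithmetic. This is a minimal Python sketch, assuming RAID 5 stores (n_disks - 1) data chunks per stripe and that offsets are counted in 512-byte sectors. The helper names, the 64 KiB example chunk size (check yours with `mdadm --detail /dev/md0`), the 16-sector starting offset and the 256 GB partition size are illustrative assumptions. If you ever reformat the backing device, the resulting sector count is the kind of value you would pass to make-bcache as the data offset; check `make-bcache --help` for the exact option on your version.

```python
from __future__ import annotations

KIB = 1024
MIB = 1024 * KIB
SECTOR = 512  # bytes per sector, the unit the data offset is expressed in


def raid5_stripe_bytes(chunk_bytes: int, n_disks: int) -> int:
    """Bytes of data in one full RAID 5 stripe (one chunk per stripe is parity)."""
    return chunk_bytes * (n_disks - 1)


def aligned_data_offset_sectors(min_offset_sectors: int, stripe_bytes: int) -> int:
    """Round a minimum data offset up to a whole number of full stripes."""
    stripe_sectors = stripe_bytes // SECTOR
    # Integer ceiling division, no floats involved.
    return -(-min_offset_sectors // stripe_sectors) * stripe_sectors


def shrink_and_align(start_bytes: int, size_bytes: int, keep_percent: int,
                     align_bytes: int = 2 * MIB) -> tuple[int, int]:
    """Return (new_start, new_size) for a partition shrunk to keep_percent of
    its original size, with both ends on align_bytes boundaries."""
    new_start = -(-start_bytes // align_bytes) * align_bytes  # round start up
    end = start_bytes + size_bytes * keep_percent // 100
    new_end = end // align_bytes * align_bytes  # round end down
    return new_start, new_end - new_start


if __name__ == "__main__":
    stripe = raid5_stripe_bytes(64 * KIB, n_disks=3)  # assumed 64 KiB chunk
    print("full stripe:", stripe // KIB, "KiB")
    print("data offset:", aligned_data_offset_sectors(16, stripe), "sectors")
    start, size = shrink_and_align(1 * MIB, 256 * 10**9, keep_percent=85)
    print(f"cache partition: start {start} B, size {size} B")
```

Rounding the start up and the end down means the partition can only get slightly smaller than the requested percentage, never larger, so the reserved area stays untouched.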
To recreate the cache device, detach the backing device from it via sysfs, wait for completion (dirty data is flushed back first), then unregister the cache, recreate it following the steps above, and attach the backing device to the new cache. This can all be done online without a reboot, but if you're not comfortable with it, make backups first.
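The detach / wait / unregister / attach sequence can be scripted against sysfs. This Python sketch assumes the layout described in the kernel's bcache documentation: a backing device directory such as /sys/block/bcache0/bcache with `detach`, `state` and `attach` files, and a cache set directory /sys/fs/bcache/<cset-uuid> with an `unregister` file. It also assumes `state` reads "no cache" once the detach has finished. The function names are hypothetical; run it as root, and only after making backups. Creating the new cache device itself (make-bcache, registration) is not shown.

```python
import time
from pathlib import Path


def detach_and_unregister(bdev_sysfs: Path, cset_sysfs: Path,
                          poll_s: float = 5.0, timeout_s: float = 24 * 3600) -> None:
    """Detach a backing device from its cache set, wait, then unregister the set.

    bdev_sysfs: e.g. /sys/block/bcache0/bcache
    cset_sysfs: e.g. /sys/fs/bcache/<cset-uuid>
    """
    # Any write to "detach" starts the detach; dirty data is written back first.
    (bdev_sysfs / "detach").write_text("1")
    # Wait until the backing device reports that it has no cache any more.
    deadline = time.monotonic() + timeout_s
    while (bdev_sysfs / "state").read_text().strip() != "no cache":
        if time.monotonic() > deadline:
            raise TimeoutError("backing device still attached; dirty data may remain")
        time.sleep(poll_s)
    # Only now release the old cache set so the SSD can be trimmed and repartitioned.
    (cset_sysfs / "unregister").write_text("1")


def attach(bdev_sysfs: Path, new_cset_uuid: str) -> None:
    """Attach the backing device to the newly created cache set."""
    (bdev_sysfs / "attach").write_text(new_cset_uuid)
```

For example, call `detach_and_unregister(Path("/sys/block/bcache0/bcache"), Path("/sys/fs/bcache/<cset-uuid>"))`, recreate the cache device on the resized partition, then call `attach` with the new cache set UUID. Taking the sysfs directories as parameters keeps the logic testable against a scratch directory instead of live sysfs.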
The array must still have been building, resyncing or something similar. Out of nowhere, it started running very fast.
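So, to anyone else who has this issue: look at your mdadm status (e.g. cat /proc/mdstat). If the array is doing anything (resync, recovery, check), that might be the cause. Also, by default (at least on Debian/Ubuntu), mdadm runs a consistency check of the array on the first Sunday of every month.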
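To check whether md is busy, you can look at /proc/mdstat, which shows a progress line such as `check =  8.9%` while an array is resyncing, recovering, reshaping or being checked (/sys/block/md0/md/sync_action reports the same thing as a single word). This Python sketch parses that text and also tests for the first Sunday of the month. The helper names are hypothetical, the regular expressions assume the usual mdstat progress format, and the first-Sunday rule assumes a Debian/Ubuntu-style monthly check schedule; other distributions may schedule it differently or not at all.

```python
from __future__ import annotations

import re
from datetime import date
from pathlib import Path

# Progress lines in /proc/mdstat look like "  check =  8.9% (...)".
_PROGRESS = re.compile(r"\b(resync|recovery|check|reshape)\s*=\s*([\d.]+)%")
# Array header lines look like "md0 : active raid5 ...".
_ARRAY = re.compile(r"^(md\w+)\s*:")


def busy_md_arrays(mdstat_text: str) -> dict[str, tuple[str, float]]:
    """Map array name -> (operation, percent done) for arrays md is working on."""
    busy: dict[str, tuple[str, float]] = {}
    current = None
    for line in mdstat_text.splitlines():
        header = _ARRAY.match(line)
        if header:
            current = header.group(1)
            continue
        progress = _PROGRESS.search(line)
        if progress and current is not None:
            busy[current] = (progress.group(1), float(progress.group(2)))
    return busy


def is_first_sunday(day: date) -> bool:
    """True on the first Sunday of a month (weekday() == 6 means Sunday)."""
    return day.weekday() == 6 and day.day <= 7


if __name__ == "__main__":
    mdstat = Path("/proc/mdstat")
    if mdstat.exists():
        for name, (op, pct) in busy_md_arrays(mdstat.read_text()).items():
            print(f"{name}: {op} in progress, {pct:.1f}% done")
    if is_first_sunday(date.today()):
        print("today is the first Sunday of the month")
```

An empty result means no resync, recovery, reshape or check is running, so the slowdown has to come from somewhere else.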