In RAID4 or RAID5, a parity bit is stored for each stripe of data bits. For example, if I write 0 to drive A and 1 to drive B, then the parity bit 1 is stored on drive C. Isn't this a huge load on the CPU with Linux software RAID, if a parity bit has to be calculated for every bit of data? For example, if I write a 1 GB file to a RAID5 array, does the CPU have to perform 8,000,000,000 XOR calculations?
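A minimal Python sketch of per-stripe parity, assuming a 3-drive array with two data chunks per stripe. Parity is the XOR of the data chunks, and a lost chunk is recovered by XOR-ing the survivors with the parity. Note that software does not XOR one bit at a time: a 64-bit CPU XORs 64 bits in a single instruction, and SIMD registers widen that further, so the operation count is far below the bit count. Here Python's int XORs a whole chunk at once. The function names are hypothetical, not md internals.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length chunks as whole integers, not bit by bit."""
    if len(a) != len(b):
        raise ValueError("chunks must be the same length")
    x = int.from_bytes(a, "little") ^ int.from_bytes(b, "little")
    return x.to_bytes(len(a), "little")

def stripe_parity(chunks: list[bytes]) -> bytes:
    """Parity chunk for one stripe: the XOR of all data chunks."""
    return reduce(xor_bytes, chunks)

def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover one lost data chunk from the surviving chunks plus parity."""
    return reduce(xor_bytes, surviving, parity)

drive_a = bytes([0x00, 0xFF])   # first byte holds the question's "0" bit
drive_b = bytes([0x01, 0x0F])   # first byte holds the question's "1" bit
drive_c = stripe_parity([drive_a, drive_b])
print(drive_c.hex())            # 01f0
assert rebuild_missing([drive_a], drive_c) == drive_b  # drive B lost, rebuilt
```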
As TomTom has said, it's not as brutal as it used to be; but then disc drives have got bigger while CPUs were getting faster.
That is why it's not a good idea to do RAID-5 in software unless you really don't care about performance. RAID-5 in hardware at least ensures there's a dedicated processor whose sole job is to do those parity calculations; also, the hardware will often have things like NVRAM to prevent array corruption, and the ability to optimise the calculations, e.g. by knowing that a whole stripe is being written and skipping the (hugely expensive) read-modify-write cycle in favour of a simple parity recalculation.
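The two write paths can be sketched in Python, with small integers standing in for the chunks of one stripe (an assumption for brevity; the function names are hypothetical). The XOR work is trivial either way. What makes read-modify-write expensive is that old data and old parity must first be read back from disk.

```python
from functools import reduce
from operator import xor

def full_stripe_parity(chunks: list[int]) -> int:
    """Full-stripe write: parity is the XOR of the new data chunks.
    Nothing has to be read back from disk first."""
    return reduce(xor, chunks, 0)

def rmw_parity(old_parity: int, old_data: int, new_data: int) -> int:
    """Small write (read-modify-write): old data and old parity are read
    from disk, then XOR-ing the old data out and the new data in gives
    the new parity. On disk this costs 2 reads + 2 writes."""
    return old_parity ^ old_data ^ new_data

stripe = [0b1010, 0b0110, 0b0011]                 # three data chunks
parity = full_stripe_parity(stripe)               # 0b1111
patched = rmw_parity(parity, stripe[1], 0b0001)   # overwrite chunk 1 only
stripe[1] = 0b0001
assert patched == full_stripe_parity(stripe)      # both paths agree
print(bin(patched))                               # 0b1000
```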
Even with hardware RAID acceleration, applications that modify very small chunks of data at a time - particularly databases - can perform very badly indeed on RAID-5 (and RAID-6, which is even more expensive in terms of parity calculation). For that kind of application, just put your hand in your pocket, get the extra discs, and do RAID-1+0.
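To put a rough number on "very badly", a common rule of thumb counts the disk I/Os that each small random write costs. RAID-5 reads old data and old parity, then writes both (4 I/Os). RAID-6 has a second parity block (6 I/Os). RAID-1+0 just writes both mirrors (2 I/Os). This sketch assumes a hypothetical 8-disk array at 150 IOPS per disk and ignores controller caches, so treat the figures as illustrative only.

```python
# Rule-of-thumb disk I/Os per small random write (ignores any caching).
WRITE_PENALTY = {"RAID-1+0": 2, "RAID-5": 4, "RAID-6": 6}

def random_write_iops(level: str, disks: int, iops_per_disk: int) -> float:
    """Approximate small random-write IOPS the whole array can sustain."""
    return disks * iops_per_disk / WRITE_PENALTY[level]

for level in WRITE_PENALTY:
    print(level, random_write_iops(level, disks=8, iops_per_disk=150))
# RAID-1+0 600.0
# RAID-5 300.0
# RAID-6 200.0
```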
You mean even for a modern CPU with 6-8 cores, each many times more powerful than the processor on a RAID card?
Not today; this is 2013. CPUs can handle a LOT these days. You will struggle to use up the power of a single core unless you put quite a lot of SSDs in the RAID.
Since kernel 3.12 (see commit 851c30c9badfc6b294c98e887624bff53644ad21) there is a parameter
/sys/block/mdX/md/group_thread_cnt
(where X is the number of your RAID device, e.g. md0) which controls the number of threads the kernel can use to perform parity calculations. Echoing a number greater than one into that file (e.g. 4 if you want 4 CPUs to be used) can be especially helpful when you have extremely fast disks (e.g. NVMe).

NB: this option only exists for RAID5/6 setups, where parity is being calculated.
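A minimal Python sketch of the same tuning step, equivalent to running `echo 4 > /sys/block/md0/md/group_thread_cnt` as root. The device name md0, the function name and the `sysfs_root` parameter (there only so the function can be tried against a fake directory tree) are illustrative assumptions. It reads the value back so you can confirm what the kernel now reports.

```python
from pathlib import Path

def set_group_thread_cnt(md_name: str, threads: int, sysfs_root: str = "/sys") -> int:
    """Write a thread count to <sysfs_root>/block/<md_name>/md/group_thread_cnt
    and return the value read back. Real use needs root and kernel 3.12+."""
    if threads < 0:
        raise ValueError("thread count must be non-negative")
    knob = Path(sysfs_root) / "block" / md_name / "md" / "group_thread_cnt"
    if not knob.exists():
        # Missing on older kernels and on arrays without parity (not RAID5/6).
        raise FileNotFoundError(f"{knob} not found; is {md_name} a RAID5/6 array?")
    knob.write_text(f"{threads}\n")
    return int(knob.read_text().strip())
```

On a real system, `set_group_thread_cnt("md0", 4)` would be the call; if the kernel rejects the value, the write raises an OSError.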