If one happens to have some server-grade hardware at one's disposal, is it ever advisable to run ZFS on top of a hardware-based RAID1 or some such? Should one turn off the hardware-based RAID and run ZFS on a mirror or a raidz zpool instead?
With the hardware RAID functionality turned off, are hardware-RAID-based SATA2 and SAS controllers more or less likely to hide read and write errors than non-hardware-RAID controllers would?
In terms of non-customisable servers, if one has a situation where a hardware RAID controller is effectively cost-neutral (or even lowers the cost of the pre-built server offering, since its presence improves the likelihood of the hosting company providing complimentary IPMI access), should it be avoided at all? Or should it even be sought after?
The idea with ZFS is to let it know as much as possible about how the disks are behaving; the more directly it sees the disks, the better.
ZFS is quite paranoid about hardware, so the less the controller hides, the better it can cope with hardware issues. And, as pointed out by Sammitch, RAID controller configurations layered under ZFS may be very difficult to restore or reconfigure when the controller itself fails (i.e. a hardware failure).
As for standardized hardware that ships with a hardware RAID controller, just make sure the controller has a real pass-through or JBOD mode.
Q. If one happens to have some server-grade hardware at one's disposal, is it ever advisable to run ZFS on top of a hardware-based RAID1 or some such?
A. It is strongly preferable to run ZFS straight to disk and not make use of any form of RAID in between. Whether a system that effectively requires you to make use of the RAID card precludes the use of ZFS has more to do with the OTHER benefits of ZFS than it does with data resiliency. Flat out, if there is an underlying RAID card responsible for providing a single LUN to ZFS, ZFS is not going to improve data resiliency. If your only reason for going with ZFS in the first place was the data resiliency improvement, then you have just lost your reason for using it. However, ZFS also provides ARC/L2ARC, compression, snapshots, clones, and various other improvements that you might also want, and in that case perhaps it is still your filesystem of choice.
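As a rough illustration of those non-resiliency features, here is a minimal sketch (the pool/dataset name tank/data and the snapshot names are hypothetical):

    # Enable transparent lz4 compression on a dataset
    zfs set compression=lz4 tank/data

    # Take a point-in-time snapshot, then clone it as a writable copy
    zfs snapshot tank/data@before-upgrade
    zfs clone tank/data@before-upgrade tank/data-test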
Q. Should one turn off the hardware-based RAID, and run ZFS on a mirror or a raidz zpool instead?
A. Yes, if at all possible. Some RAID cards offer a pass-through (JBOD/HBA) mode; if yours has it, that is the preferable thing to do.
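How you enable that mode is entirely vendor-specific; the following is only a hedged sketch (the controller numbers/slots are assumptions, and not every card or firmware revision supports these options, so check your controller's documentation):

    # LSI/Broadcom MegaRAID, assuming controller 0
    storcli /c0 set jbod=on

    # HP Smart Array, assuming the controller sits in slot 0 and the firmware offers HBA mode
    ssacli ctrl slot=0 modify hbamode=on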
Q. With the hardware RAID functionality turned off, are hardware-RAID-based SATA2 and SAS controllers more or less likely to hide read and write errors than non-hardware-RAID controllers would?
A. This is entirely dependent on the RAID card in question. You'll have to pore over the manual or contact the manufacturer/vendor of the RAID card to find out. Some very much do, yes, especially if 'turning off' the RAID functionality doesn't actually completely turn it off.
Q. In terms of non-customisable servers, if one has a situation where a hardware RAID controller is effectively cost-neutral (or even lowers the cost of the pre-built server offering, since its presence improves the likelihood of the hosting company providing complimentary IPMI access), should it be avoided at all? Or should it even be sought after?
A. This is much the same question as your first one. Again: if your only desire to use ZFS is an improvement in data resiliency, and your chosen hardware platform requires that a RAID card provide a single LUN to ZFS (or multiple LUNs that you have ZFS stripe across), then you are doing nothing to improve data resiliency and your choice of ZFS may not be appropriate. If, however, you find any of the other ZFS features useful, it may still be.
I do want to add an additional concern: the above answers rely on the idea that using a hardware RAID card underneath ZFS does nothing to harm ZFS beyond removing its ability to improve data resiliency. The truth is that it's more of a gray area. There are various tunables and assumptions within ZFS that don't necessarily operate as well when handed multi-disk LUNs instead of raw disks. Most of this can be negated with proper tuning, but out of the box ZFS on top of large RAID LUNs will not be as efficient as it would have been on top of individual spindles.
Further, there's some evidence to suggest that the very different manner in which ZFS talks to LUNs, as opposed to more traditional filesystems, often invokes code paths in the RAID controller and workloads that it is not as used to, which can lead to oddities. Most notably, you'll probably be doing yourself a favor by disabling the ZIL functionality entirely on any pool you place on top of a single LUN if you're not also providing a separate log device, though of course I'd highly recommend you DO provide the pool a separate raw log device (one that isn't a LUN from the RAID card, if at all possible).
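For reference, a minimal sketch of both options (the pool name tank and the device path are hypothetical; on current OpenZFS the usual way to switch off the ZIL is to disable synchronous write semantics, which sacrifices the last few seconds of "synchronous" data after a crash, so treat it as a last resort):

    # Preferred: add a separate log (SLOG) device, ideally a raw SSD that is not a LUN from the RAID card
    zpool add tank log /dev/nvme0n1

    # Last resort: effectively disable the ZIL by skipping synchronous write semantics
    zfs set sync=disabled tank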
I run ZFS on top of HP ProLiant Smart Array RAID configurations fairly often.
Why?
An example, looking at the RAID controller configuration, the block device listing, the zpool configuration and detail, and the zfs filesystem listing; a sketch of the commands that produce each of these follows.
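A minimal sketch of the commands that would produce those listings on an HP ProLiant running Linux (the pool name is whatever you created; ssacli may be called hpssacli or hpacucli on older generations):

    # RAID controller configuration (HP Smart Array)
    ssacli ctrl all show config

    # Block device listing
    lsblk

    # zpool configuration and detail
    zpool list
    zpool status

    # zfs filesystem listing
    zfs list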
Typically you should never run ZFS on top of disks configured in a RAID array. Note that ZFS does not have to run in RAID mode; you can just use individual disks. However, probably 99% of people run ZFS for the RAID portion of it. You could just run your disks in striped mode, but that is a poor use of ZFS. Like other posters have said, ZFS wants to know a lot about the hardware. ZFS should only be connected to a RAID card that can be set to JBOD mode, or preferably connected to an HBA. Jump onto the Freenode IRC channel #openindiana; any of the ZFS experts in the channel will tell you the same thing. Ask your hosting provider for JBOD mode if they will not give you an HBA.
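Once ZFS does see the raw disks, pool creation is straightforward; a minimal sketch (the pool name and the /dev/disk/by-id names are placeholders):

    # Mirror of two whole disks
    zpool create tank mirror /dev/disk/by-id/DISK1 /dev/disk/by-id/DISK2

    # Or a raidz vdev across three whole disks
    zpool create tank raidz /dev/disk/by-id/DISK1 /dev/disk/by-id/DISK2 /dev/disk/by-id/DISK3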
Everybody says that ZFS on top of RAID is a bad idea, without even providing a link. But the developers of ZFS, Sun Microsystems, even recommend running ZFS on top of HW RAID, as well as on ZFS mirrored pools, for Oracle databases.
The main argument against HW RAID is that it can't detect bit rot the way a ZFS mirror can. But that's wrong: there is T10 PI (Protection Information) for that. You can use T10 PI capable controllers (at least all of the LSI controllers I have used are), and the majority of enterprise disks are T10 PI capable. So, if it is appropriate for you, you can build a T10 PI capable array, create a ZFS pool without redundancy on top of it, and just make sure you follow the guidelines for your use case in the article. Though it is written for Solaris, IMHO it is also suitable for other OSes.
The benefit for me is that replacing a disk is really much easier with the HW controller (especially in my case, because I don't use the whole disk for the zpool, for performance reasons). It requires NO intervention at all and can be done by the client's staff.
The downside is that you have to make sure that the disks you buy are actually formatted with T10 PI enabled, because some of them, though capable of T10 PI, are sold formatted as regular disks. You can reformat them yourself, but it is not very straightforward and is potentially dangerous if you interrupt the process; a sketch of how to check and reformat a drive follows.
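A hedged sketch, using sg3_utils, of how one might check and (destructively) reformat a drive for protection information; the device path is a placeholder and the right fmtpinfo value depends on the protection type you want, so verify against the sg_format documentation before running anything:

    # Check whether protection information is enabled (look at the prot_en and p_type fields)
    sg_readcap --long /dev/sdX

    # Destructive low-level format that enables protection information; can take hours, do not interrupt
    sg_format --format --fmtpinfo=2 /dev/sdX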
In short: using RAID below ZFS simply kills the idea of using ZFS. Why? Because ZFS is designed to work on raw disks, not on RAID volumes.
For all of you: ZFS over any RAID is a total PAIN and is done only by MAD people!... as is using ZFS with non-ECC memory.
Some examples will make this clearer:
Where ZFS is good is in detecting bits that changed while the disks were powered off (RAID controllers cannot do that), and in detecting data that changed without being asked to, and so on.
It is the same problem as when a bit in a RAM module spontaneously changes without being asked to: if the memory is ECC, the memory corrects itself; if not, the data has changed and will be sent to the disks modified. Pray that the change is not in the UDEV part, because if the failure is in the VDEV part... the whole zpool loses all its data forever.
That is a weakness of ZFS: a failed VDEV implies all data is lost forever.
Hardware RAID and software RAID cannot detect spontaneous bit changes: they have no checksums. It is worst at the RAID1 levels (mirrors): they do not read all copies and compare them; they suppose all copies will ALWAYS (I say it loudly) hold the same data. RAID supposes the data has not changed in any other way... but disks (like memory) are prone to spontaneous bit changes.
Never ever use ZFS with non-ECC RAM, and never ever use ZFS on RAIDed disks; let ZFS see all the disks and do not add a layer that can ruin your VDEV and pool.
How to simulate such a failure: power off the PC, take out one disk of that RAID1, alter just one bit, reconnect it, and see how the RAID controller cannot tell that anything has changed. ZFS can, because every read is tested against the checksum, and if it does not match, the data is read from another copy. RAID never reads again unless the read fails at the hardware level; if RAID can read the data, it assumes the data is OK (but in such cases it is not). RAID only tries another disk when a disk says "hey, I cannot read from there, hardware failure"; ZFS reads from another disk when the checksum does not match as well as when a disk says "hey, I cannot read from there, hardware failure". A safer, file-backed way to try this is sketched below.
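A safer way to reproduce the same effect without pulling real disks is a throwaway pool on file-backed vdevs (the paths, sizes, and offsets below are arbitrary; only ever do this on a scratch pool):

    # Create a scratch mirror pool on two sparse files (file vdevs are for testing only)
    truncate -s 1G /tmp/vdev1 /tmp/vdev2
    zpool create testpool mirror /tmp/vdev1 /tmp/vdev2

    # Write some data through ZFS and make sure it reaches the vdevs
    dd if=/dev/urandom of=/testpool/data bs=1M count=200
    sync

    # Corrupt one side of the mirror behind ZFS's back (offset chosen to avoid the vdev labels)
    dd if=/dev/urandom of=/tmp/vdev1 bs=1M count=8 seek=100 conv=notrunc

    # Scrub: ZFS reports checksum errors on /tmp/vdev1 and repairs from the intact copy
    zpool scrub testpool
    zpool status -v testpool

    # Clean up the scratch pool
    zpool destroy testpool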
I hope I have made it very clear: ZFS over any level of RAID is a total pain and a total risk to your data, as is ZFS on non-ECC memory.
But what no one says (except me) is:
So what disks to use?
But hey, most people do not know any of this and have never had a problem... I say to them: wow, how lucky you are; buy some lottery tickets before the luck goes away.
The risks are there... such coincidences of failures may occur... so the better answer is:
What do I personally do?
I hope I could shed a little light on ZFS versus RAID; it is really a pain when things go wrong!