4-Drive Software RAID1 vs RAID10
The heading tells you what I am contemplating.
Hardware: 2x 1TB Enterprise-class HDDs + 2x 1TB Consumer-class HDDs.
OS and Software: Linux Debian Jessie (stable) with mdadm.
Intended purpose: extreme-reliability storage. I cannot afford data loss; it would simply be unacceptable. That is why I am considering RAID1 instead of RAID10: a 4-drive RAID1 should tolerate the failure of 3 drives.
I see one downside: usable capacity is limited to 1/4 of the raw total. Crazy.
Apart from the RAID1 vs RAID10 decision, which I have probably already made (RAID1, unless you advise me otherwise), I have a question regarding RAID1:
Supposing I am limited to 4 drives, I would have limited possibilities with RAID10, as opposed to RAID1, where I could define 3 drives as active and the 4th as a spare. Either that, or define all 4 HDDs as active from the start.
Please tell me what you think.
In such a setup (4 disks and RAID1 only), it is better to directly use the 4th disk as an array member rather than as a spare.
Using it as a spare will not buy you anything on the redundancy side, while using the 4th disk as a full array member increases your redundancy from 3 to 4 copies, enabling you to survive the failure of 3 disks.
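To make the comparison concrete, here is a minimal Python sketch of the two RAID1 options on four 1 TB drives. The helper name `raid1_layout` and its dictionary keys are made up for illustration and are not part of mdadm. The only facts it encodes are that every active RAID1 member holds a full copy, and that an idle spare holds no data until a rebuild onto it has finished.

```python
DRIVE_TB = 1.0  # each drive in the question is 1 TB


def raid1_layout(active, spares=0):
    """Describe a RAID1 array (hypothetical helper, not an mdadm API)."""
    return {
        "layout": f"RAID1, {active} active + {spares} spare",
        # RAID1 capacity equals one member, however many mirrors exist.
        "usable_tb": DRIVE_TB,
        # Only active members hold a copy; an idle spare holds none.
        "copies": active,
        # Failures survivable at the same moment, before any rebuild.
        "simultaneous_failures": active - 1,
    }


for row in (raid1_layout(4), raid1_layout(3, spares=1)):
    print(row)
```

Both layouts give the same 1 TB of usable space, so the spare variant saves nothing. It only starts to help once a failed member has been fully rebuilt onto it.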
Anyway, if you are so concerned about data redundancy/availability that you are willing to give up 3/4 of your raw space, you are probably approaching the problem from the wrong side.
Remember: RAID is not a backup!!!
Rather than growing your RAID1 setup beyond a 3-way mirror, make sure you have a strong backup/recovery plan.
If you are looking for a system with high availability and you are worried about crashing drives, RAID1 is surely the best solution. But if you want more space, a RAID6 might be a compromise. You "lose" the space of two disks to parity, but you are safe against up to two failed disks as well.
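A small Python sketch can show how the three layouts differ on four disks by enumerating every possible set of failed drives. The disk labels and function names are hypothetical. The RAID10 pairing (a+b mirrored, c+d mirrored, then striped) is an assumption about how the array would be laid out, so check the real layout of your own array before relying on it.

```python
from itertools import combinations

DISKS = ("a", "b", "c", "d")
# Assumed RAID10 layout: two mirrored pairs striped together.
MIRROR_PAIRS = (("a", "b"), ("c", "d"))


def raid1_survives(failed):
    # A 4-way mirror survives while at least one member is alive.
    return len(failed) < len(DISKS)


def raid10_survives(failed):
    # Every mirror pair must keep at least one working disk.
    return all(any(d not in failed for d in pair) for pair in MIRROR_PAIRS)


def raid6_survives(failed):
    # Dual parity tolerates any two failed members, never three.
    return len(failed) <= 2


def survival_counts(survives):
    """Map failure count -> (surviving combinations, total combinations)."""
    counts = {}
    for n in range(1, len(DISKS) + 1):
        combos = list(combinations(DISKS, n))
        counts[n] = (sum(survives(set(c)) for c in combos), len(combos))
    return counts


for name, fn in (("RAID1", raid1_survives),
                 ("RAID10", raid10_survives),
                 ("RAID6", raid6_survives)):
    print(name, survival_counts(fn))
```

Under these assumptions RAID10 survives only 4 of the 6 possible two-disk failures, RAID6 survives all 6 but no three-disk failure, and the 4-way RAID1 survives everything short of losing all four drives. On four 1 TB drives both RAID10 and RAID6 give 2 TB usable, so RAID6 buys the stronger two-disk guarantee at the same capacity.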
If high availability is really your concern, you should maybe think about a synced second server as well. If data loss is your concern, then you should primarily make sure to have a good backup. A RAID is never a substitute for backups, since it only protects against failed disks, not against accidental deletion of data, malware, an attacker encrypting or deleting your data, and so on.
Agreed. If you have such strict requirements for storage, I would also recommend a multi-node approach. We are currently running a 2-node backup repository with a RAID 10 array on each server. It looks stable and redundant.
If you're so incredibly concerned about data availability and integrity, and you're willing to do something like a four member RAID-1 to get it, then you should probably be looking at redundancy at the node level.
No matter how many disks you put in a controller, there is still a single point of failure, and that is the machine itself. Rather than worrying about packing in redundancy at the RAID level, you could implement something like DRBD, GlusterFS, or Ceph.
DRBD would act more like a network RAID-1, to describe it simplistically. Gluster and Ceph can behave this way as well, but can also scale massively by both replicating to nodes and distributing data across replica sets.
You can still implement RAID at the node level using these types of storage, but with these inter-node replicating systems RAID becomes much less of a concern, and it can reduce scalability in larger deployments. It's also easy to take an entire node out of the cluster, fix it, and then put it back in. In storage clouds, RAID is being used less and less often for these reasons.