A provider (data center) recommended I go with 1 TB SSDs in a software RAID 1 instead of a hardware RAID 10 with mechanical drives.
Their quote:
Typically, SSDs are more reliable than RAID cards, and since you have fewer parts, there are fewer points of failure. There won't be much of a CPU load, since RAID 1 is an extremely simple storage layout.
How true is that, and is software RAID 1 even a good fit for running virtual machines? They say it is.
Some more details: I plan to run Xen/Xen HVM/KVM -- in other words, Linux will be the host, and I want a setup where the guests can run anything from Windows to Linux and can compile their own kernels.
What I want to accomplish: to be able to quickly recognize a drive failure and swap in a replacement with little to no downtime or performance hit.
It depends on the drives, the disk controller, the type of SSD, the RAID implementation, the Operating System(s) involved, the server, monitoring ability, whether you have out-of-band access to the server, etc.
Edit: you'll be on Linux + KVM.
Envision a drive failure of a hardware RAID solution that takes out one disk. You receive an alert and have the drive hot-swapped. Easy.
Now imagine a software RAID SSD failure that goes undetected (because no explicit monitoring is in place) and requires downtime, or at least a more involved process, to remediate.
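For the software RAID case, that monitoring gap is easy to close. A minimal sketch, assuming a Linux host with an mdadm-managed array at /dev/md0 (the device names and the mail address are placeholders):

    # Check the current array state
    cat /proc/mdstat
    mdadm --detail /dev/md0

    # Run mdadm's monitor mode as a daemon and mail an alert on failure or degradation
    # (many distros instead set MAILADDR in mdadm.conf and start this service for you)
    mdadm --monitor --scan --daemonise --mail=root@example.com

    # SMART health summary for the underlying SSDs
    smartctl -H /dev/sda
    smartctl -H /dev/sdb

With an alert like that in place, a software RAID 1 failure looks much more like the hardware case: you get notified, the array keeps running degraded, and you schedule the replacement.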
Nothing precludes you from using SSDs with hardware RAID, correct?
But it all depends...
I would push for SSDs with hardware RAID if you need SSD performance. I wouldn't necessarily want to boot off of software RAID, but that's your choice. For virtualization, you'll probably have a mix of random read/write activity, and hardware RAID's caching can help there. If this is a data center, you may not have to worry about sudden power loss, though.
In RAID10, any one of your drives can fail and the array will survive, the same as with RAID1. While a four-drive RAID10 can survive four of the six possible "two drives failed at once" combinations (of the six ways to lose two drives out of four, only the two cases where both members of the same mirror pair fail kill the array), the main reason to use R10 with four drives instead of R1 with two is performance rather than extra reliability, and the SSDs will give you a greater performance jump anyway.
Early SSDs had reliability issues, but most properly run tests I've seen suggest those days are long gone: they tend to be no more likely to fail than spinning-metal drives, overall reliability has increased, and wear-levelling tricks have become very intelligent.
I'm assuming you are running the RAID array on the host, in which case, unless you have a specific load pattern in your VMs (one that would be a problem on direct physical hardware too), the difference between software RAID and hardware RAID does not depend on the use of VMs. If you are running RAID inside the VMs, you are likely doing something wrong (unless the VMs are for learning or testing RAID management, of course).
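To make that concrete: with RAID on the host, the guests just see an ordinary virtual disk backed by storage that happens to live on the array. A minimal sketch, assuming KVM/QEMU with a filesystem mounted from the host's /dev/md0 (the paths and image name are placeholders):

    # The filesystem on the mdadm array holds the guest images
    mount /dev/md0 /var/lib/libvirt/images

    # Create a virtual disk for a guest; the guest neither knows nor cares
    # that its storage sits on software RAID
    qemu-img create -f qcow2 /var/lib/libvirt/images/guest01.qcow2 40G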
The key advantages of hardware RAID are a battery- or flash-backed write cache, which speeds up writes and protects in-flight data against sudden power loss, and the fact that the array is presented to the OS as a single simple device, so the OS needs no special configuration.
The key advantage of good software RAID (i.e. Linux's mdadm-managed arrays) is that you are not tied to a specific controller: if the machine or controller dies, the drives can be moved to any other Linux box and the array reassembled there, and you have full visibility of the array's state from within the OS.
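As a rough illustration of how simple an mdadm mirror is to create and inspect (a sketch only; the device names are placeholders and the config path varies by distro):

    # Create a two-device RAID 1 array from two partitions
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

    # Put a filesystem on it and record the array so it assembles at boot
    mkfs.ext4 /dev/md0
    mdadm --detail --scan >> /etc/mdadm/mdadm.conf

    # Watch the initial resync and the ongoing state
    cat /proc/mdstat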
SSDs over-provision space for two reasons: it leaves plenty of blocks free to be remapped if a block goes bad (traditional drives do this too), and it avoids the write-performance hole (except under huge write-heavy loads) even where TRIM is not used, because the extra blocks can cycle through the wear-levelling pool along with all the others (and the controller can pre-wipe them, ready for next use, at its leisure).
Consumer-grade drives only really reserve enough for the remapping use and a small amount of performance protection, so it is useful to manually under-allocate (partitioning only 200 GiB of a 240 GB drive, for instance), which has a similar effect. See reports like this one for details (that report is released by a controller manufacturer, but it reads as a general description of the matter rather than a sales pitch; you'll no doubt find manufacturer-neutral reports on the same subject if you look for them). Enterprise-grade drives tend to over-provision by much larger amounts, for both of the above reasons: reliability and performance.
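If you go the manual under-allocation route described above, it is just a matter of leaving part of the drive unpartitioned. A sketch, assuming a 240 GB SSD at /dev/sdb (the device name is a placeholder) and the 200 GiB figure from the answer:

    # Ideally start from a freshly secure-erased drive so the reserved area is truly blank,
    # then partition only part of it and never touch the rest
    parted -s /dev/sdb mklabel gpt mkpart primary 1MiB 200GiB

    # The unpartitioned tail is never written by the OS, so the controller can fold it
    # into its wear-levelling / garbage-collection pool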
Speed vs. reliability, IMO.
Most RAID controllers do NOT fully support SSDs, or they only support a specific brand of SSD (see the Dell PERC 6xx series). Also, friends don't let friends use software RAID... unless it's their home gaming system.
(HW RAID + SSD RAID 1) vs. (HW RAID + physical disks RAID 10)
The speed difference between SSDs (when fully supported by the RAID controller) and hard drives is like comparing formatting a floppy disk to formatting a USB stick: one takes 3 minutes, the other takes 3 seconds. So if you need that kind of speed, go with the SSDs... and make sure you have a good backup. If not, use physical disks, and have a good backup. ;-)
Which solution did you go with? Yes, SSDs are fast, and they give you a real boost in performance if you use them for a specific purpose, e.g. hosting a database server. I support a number of servers running SSDs in Linux software RAID1. They all work OK except one: on that server, the RAID repeatedly reports a disk failure for one of the SSDs (randomly, not always the same disk). So far I have been unable to identify why. Also, consider how the host OS will see these two SSDs, because replacing a disk could be an issue (you might not be able to hot-swap it). Can you hot-swap a disk in software RAID if the disk is also used for the OS?
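For what it's worth, replacing a failed member of an mdadm mirror is usually possible even when the disks also carry the OS, provided the bootloader is installed on both. A sketch, assuming array /dev/md0, failed member /dev/sdb1, surviving disk /dev/sda, and replacement disk /dev/sdc (all placeholders):

    # Mark the failed member and remove it from the array
    mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1

    # After physically swapping the drive, copy the partition layout from the survivor
    # (sfdisk shown here; sgdisk is the usual tool for GPT disks)
    sfdisk -d /dev/sda | sfdisk /dev/sdc

    # Re-add the new partition and let the mirror rebuild
    mdadm /dev/md0 --add /dev/sdc1
    cat /proc/mdstat

    # If the disks also hold /boot, reinstall the bootloader on the newcomer
    grub-install /dev/sdc

Whether the physical swap can happen without powering down still depends on the controller and backplane supporting hot-plug, which is the part a software RAID setup cannot guarantee on its own.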
On the other hand, old-school network storage with an enclosure, a good RAID controller and a large number of disks (in RAID10) gives you peace of mind. Hot-swapping a failed disk is a must for production servers.
Whatever you do, remember to keep regular backups on separate hardware. It has been said many times before: "RAID is not a replacement for backup."
Have you looked at ZFS on Linux?
The cloud provider Joyent runs KVM on a custom OpenSolaris kernel with ZFS underneath. You could run your Linux host with an industrial-strength filesystem (ZFS) and software RAID, and not have to use all SSDs to get the speed.
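A minimal sketch of what that could look like with ZFS on Linux, using mechanical drives for capacity and a couple of SSDs for speed (the pool name and device names are placeholders; /dev/disk/by-id names are preferred in practice):

    # Mirrored pool over two mechanical drives
    zpool create tank mirror /dev/sdb /dev/sdc

    # Add an SSD as a read cache (L2ARC) and another as a separate log device (SLOG),
    # giving much of the SSD speed without an all-SSD array
    zpool add tank cache /dev/sdd
    zpool add tank log /dev/sde

    # Health, scrub and resilver status
    zpool status tank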