I have a server with 3 hard drives installed and a total capacity of 6 bays. We're planning to max it out, but our consultant also suggested getting a second RAID controller "for redundancy" to support the new drives. To me, this doesn't make much sense. Even with a second RAID controller running half of the disks, we're still stuck with only half of our disks/programs/data if one of the controllers dies (which isn't much better than having none). We're putting VMware on the server, and he vaguely mentioned some advanced fault-tolerance/failover features, but if the disks are inaccessible due to a failed controller, how is that supposed to work?
Considering only redundancy, not performance: why would I want a second RAID controller in my server?
In a 'single box high availability' design, then yes, you'd want a second controller, ideally on a second bus too. But this kind of approach has given way to a cheaper design based on clustering, where the failure of one box doesn't stop service. So it depends on whether you plan to use a clustered environment or rely on a single box. Even if your answer is the latter, dual controllers add extra complexity and may well be overkill.
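The trade-off can be made concrete with some back-of-the-envelope availability arithmetic. The failure probabilities below are made-up illustrative numbers, not vendor figures; the point is only that the rest of the box remains a single point of failure however many controllers you add:

```python
# Rough availability sketch. The annual failure probabilities are
# illustrative assumptions, not measured data.
p_controller = 0.02   # chance a RAID controller dies in a year (assumed)
p_rest = 0.05         # chance the rest of the box (board/CPU/PSU) dies (assumed)

# Single box, one controller: either failure takes storage down.
single = (1 - p_controller) * (1 - p_rest)

# Single box, two independent controllers (assuming either can take over
# all disks): both controllers must die for storage to be lost, but the
# rest of the box is still a single point of failure.
dual = (1 - p_controller ** 2) * (1 - p_rest)

print(f"single box, one controller : {single:.4f}")   # 0.9310
print(f"single box, two controllers: {dual:.4f}")     # 0.9496
```

The second controller only buys back the small controller term; the box-level failure probability dominates, which is why clustering two cheap boxes usually wins.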
edit - based on your comment about using ESXi on your other question, I'd have to say that its clustering is fabulous; we have many 32-way clusters that work brilliantly.
A second RAID controller that is actively used is not for redundancy. Only if it is a cold-standby controller, to which you switch all your disks when the first one dies, do you get redundancy (for the controller). But beware of doing so, as posted here.
So RAID gives you redundancy for the disks, but leaves a single point of failure at the controller. Having a second (unused) controller may solve this, as you could switch all the disks over to it. Whether this works depends on other factors...
I'm not a native speaker, but to me "fault tolerance" is something different from "redundancy". Can some English speaker help me out here?
On a single box, you actually need two RAID controllers, connected to two different PCI-E root complexes, to have complete I/O subsystem redundancy. This can be achieved by two different configurations:
A key problem with both approaches is that you do not have full system redundancy: a motherboard/CPU problem can bring down the entire system, regardless of how many controllers/disks you have.
For this reason, this kind of redundancy-in-a-box is seldom used lately (apart from mid/high-end SAN deployments); rather, clustering/network mirroring is gaining wide traction. With clustering (or network mirroring) you have full system redundancy, as a single failed system cannot negate data access. Obviously clustering has its own pitfalls, so it's not a silver bullet, but in some situations its advantages cannot be denied. Moreover, you can also use asynchronous network mirroring to have almost-realtime data redundancy at a geographically different location, so that a single catastrophic event will not wreak havoc on your data.
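To illustrate why clustering gives full-system redundancy where in-box duplication does not: if whole nodes fail independently, the probability that at least one node survives grows quickly with node count. The node availability figure here is an assumed example value:

```python
def cluster_availability(node_availability: float, n: int) -> float:
    """Probability that at least one of n independent nodes is up.

    Assumes independent node failures, which is an idealization
    (shared power, network, or site faults are correlated).
    """
    return 1 - (1 - node_availability) ** n

# Assumed per-node availability of 0.93 (illustrative only).
for n in (1, 2, 3):
    print(f"{n} node(s): {cluster_availability(0.93, n):.6f}")
```

Even two modest boxes beat one box with duplicated internals, because no single component failure can take out both nodes at once.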
You'd need dual-ported SAS drives to provide actual failover across multiple controllers. While these do exist, they are decidedly not cheap - not in the price range of a single server that only has internal storage.
These are technologies often employed in SAN systems, where controller death is a real issue.
For a single server with no other failover capabilities, a second controller will not gain you anything - it will just cost more money and provide the consultant with more profit.