My web host offers two different types of high-availability options for dedicated servers:
Redundant hard disks (RAID1)
Redundant hard disks (RAID1) plus redundant power supply
How common is a power supply failure in comparison to hard disk failure? I know it's not possible to know the exact figures without knowing the exact hardware, but ballpark figures are good enough for me at the moment.
Thanks,
Adrian
One of the biggest factors here is the conditioning of the power before it reaches the power supplies. Server hardware tends to be protected by UPSes, and this generally seems to extend the life of the power supply, because it gets a much cleaner sine wave and is subjected to far fewer hiccups. Most often, the power supply (and most of the fans) in a server will fail when the server is power cycled: power supplies and fans that have been running non-stop for years will suddenly give up the ghost as soon as they are powered off and refuse to power back up. If a power supply fails while it is still running, it can cause the server to freeze seemingly at random, or otherwise act strangely and stop responding.
Hard drives seem to fail randomly, with little or no notice. RAID1 is a decent solution, but RAID6 is better: across a larger array you get more usable capacity out of your drives, and it can withstand two failures at once. The issue with RAID is that you need identical drives to replace the failed ones, and these can be hard to find after the fact, so it is recommended that you buy the replacement drives along with the original drives and keep them on hand. When you rent dedicated servers from web hosts, they will claim to have done this, but at some point they will build their newer servers with different drives and eventually run out of the drives your array uses, which could mean you are out of luck for a hot-swap when your time comes. Also, Google has done extensive research on hard drives and found that drives either die almost right away (within the first month or so) or last for a few years; however, identical drives tend to fail around the same time, and this is where RAID6 shows its advantage. (The disadvantage is that it requires more drives and a more expensive controller.)
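To make the capacity trade-off concrete, here is a minimal Python sketch of the usual usable-capacity rules for RAID1 and RAID6. It assumes equal-sized drives, ignores hot spares and filesystem overhead, and assumes a four-drive minimum for RAID6 (typical, but check your controller). The names `ArrayInfo`, `raid1` and `raid6` are hypothetical helpers, not part of any RAID tool.

```python
from dataclasses import dataclass


@dataclass
class ArrayInfo:
    level: str
    drives: int
    usable_tb: float          # usable capacity in TB
    failures_tolerated: int   # drives that can fail at once without data loss


def raid1(drives: int, drive_tb: float) -> ArrayInfo:
    # RAID1 mirrors the same data onto every drive: usable capacity is one
    # drive's worth, and the array survives as long as one copy remains.
    if drives < 2:
        raise ValueError("RAID1 needs at least 2 drives")
    return ArrayInfo("RAID1", drives, drive_tb, drives - 1)


def raid6(drives: int, drive_tb: float) -> ArrayInfo:
    # RAID6 spends two drives' worth of space on parity, so any two drives
    # can fail at the same time without losing data.
    if drives < 4:
        raise ValueError("RAID6 needs at least 4 drives (assumed minimum)")
    return ArrayInfo("RAID6", drives, (drives - 2) * drive_tb, 2)


if __name__ == "__main__":
    for array in (raid1(2, 2.0), raid6(4, 2.0), raid6(6, 2.0)):
        share = array.usable_tb / (array.drives * 2.0)
        print(f"{array.level} x{array.drives}: {array.usable_tb} TB usable "
              f"({share:.0%} of raw), survives {array.failures_tolerated} failure(s)")
```

With 2 TB drives, a two-drive RAID1 and a four-drive RAID6 both give 50% of raw capacity, but the RAID6 survives two simultaneous failures; a six-drive RAID6 gives 8 TB (about 67% of raw), which is where the capacity advantage appears.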
If you can afford it, get redundant everything. If you can't, ask yourself again whether you can really afford not to.
I generally see hard drives fail a LOT more than power supplies. In any given year I'll likely replace 20 or 30 hard drives and maybe only 3 or 4 power supplies. One thing to note about redundant power supplies: if they're just redundant, it's nice; if they're load balancing and redundant, it's amazing. Other than two direct lightning strikes at different clients, I've never had to worry about redundant, load-balanced power supplies going bad. There's something about the load balancing that really helps keep power supplies healthy. I've often replaced entire servers before replacing power supplies when they're load balanced.
In my 15 years of experience with Dell and HP servers, I can tell you that you measure drive failures in drives per year, whereas you measure power supply failures in years between failures.
Failure can occur at any moment, and without knowing the manufacturer of the power supplies and/or hard disks, it's impossible to know for sure. Even if you knew the figures, they are only ever averages. You need to evaluate the options and decide whether they are worth the cost.
It depends on how much downtime you can afford.
If a single hard drive goes down, in RAID1 you are still up and running.
If a single power supply goes down, your server is down until the supply is replaced.
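The two cases above can be expressed with the standard series/parallel availability rules: parts the server needs all at once multiply their availabilities, while a redundant group is down only when every member is down. This minimal Python sketch assumes independent failures and uses made-up per-component availabilities (`DISK` and `PSU` are hypothetical values, not measured figures).

```python
from math import prod


def series(*availabilities: float) -> float:
    # Every component is required: the server is up only if all of them are up.
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    # Redundant components: the group is down only if every member is down.
    # Assumes failures are independent, which is optimistic for identical parts.
    return 1.0 - prod(1.0 - a for a in availabilities)


# Hypothetical per-component availabilities, for illustration only.
DISK = 0.999
PSU = 0.999

raid1_single_psu = series(parallel(DISK, DISK), PSU)
raid1_dual_psu = series(parallel(DISK, DISK), parallel(PSU, PSU))

if __name__ == "__main__":
    print(f"RAID1 + single PSU: {raid1_single_psu:.6f}")
    print(f"RAID1 + dual PSU:   {raid1_dual_psu:.6f}")
```

With these made-up numbers the lone power supply dominates: the mirrored disks contribute roughly one in a million of unavailability, while the single supply contributes roughly one in a thousand. The independence assumption flatters redundancy, since identical parts bought together tend to fail around the same time.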
Yes, power supplies in decent data centers tend to last a long time, possibly outlasting the server itself. It is still a risk, though. There is also the question of how long the replacement will take: does the data center stock spares, or will a part have to be ordered?
There are availability estimates for these kinds of things. I would expect the data center to simply give you an estimate of the percentage uptime you get for the price, rather than involve you in a technical discussion of how to achieve it.
In general, however, if you can afford unexpected server downtime ranging from 30 minutes (if the data center stocks spare supplies) up to 3-5 business days (or however long it takes to order and deliver a new one), then there is no need for an extra power supply. If you can't, consider whether the price of protecting yourself against that is right. In general, extra power supplies shouldn't be very expensive.
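The trade-off comes down to simple arithmetic: expected yearly downtime from a non-redundant part is its failure rate times the time to repair, which can then be turned into the "% uptime" figure a host might quote. This is a minimal Python sketch; the failure rate of 0.05 per year and the 120-hour wait for a part are hypothetical values chosen only to illustrate the two repair scenarios described above.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def expected_downtime_hours(failures_per_year: float, repair_hours: float) -> float:
    # A non-redundant part keeps the server down for the whole repair window,
    # so expected downtime is simply failure rate x time to repair.
    return failures_per_year * repair_hours


def uptime_percent(downtime_hours: float) -> float:
    # Express a year's downtime as the "% uptime" figure hosts usually quote.
    return 100.0 * (1.0 - downtime_hours / HOURS_PER_YEAR)


if __name__ == "__main__":
    psu_failures_per_year = 0.05  # hypothetical rate, not a measured figure
    for label, repair_hours in [("spare on site", 0.5), ("part ordered", 120.0)]:
        down = expected_downtime_hours(psu_failures_per_year, repair_hours)
        print(f"{label}: {down:.3f} h/year expected downtime, "
              f"{uptime_percent(down):.4f}% uptime")
```

With the hypothetical 0.05 failures per year, a 30-minute swap costs about 0.025 hours (1.5 minutes) of expected downtime per year, while a 120-hour wait for a part costs about 6 hours. Assuming the redundant supply can be replaced without powering the server down, that repair window would normally cost no downtime at all.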