Let's say I'm setting up a single-machine server. Without knowing the specific components in it (and so without being able to look up their MTBFs), what are the typical relative failure rates of the hardware components in the server?
Equivalently, what are the rankings of the most often-replaced components across all the servers in corporate use?
Regarding hard disks: many people misunderstand MTBF and think a drive with an MTBF of 100,000 hours will last, on average, for 11.5 years. What the manufacturer means is that in a collection of a large number of drives, N, all within their service lifetime, one drive will fail, on average, every 100,000/N hours. If you have 100,000 drives that each have an MTBF of 100,000 hours, then you should expect a drive to fail -- on average -- every hour.
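A minimal sketch of that arithmetic, assuming independent failures and the usual constant-failure-rate reading of MTBF; the function name and the 8-drive example are illustrative only, not from any vendor's spec:

```python
# Expected interval between failures across a fleet of drives,
# under the simple constant-failure-rate interpretation of MTBF.
def expected_hours_between_failures(mtbf_hours: float, num_drives: int) -> float:
    """With num_drives drives each rated at mtbf_hours, expect roughly one
    failure every mtbf_hours / num_drives hours while the drives are
    within their service life."""
    return mtbf_hours / num_drives

# The example from the text: 100,000 drives rated at 100,000 hours MTBF.
print(expected_hours_between_failures(100_000, 100_000))  # -> 1.0 hour between failures

# A hypothetical small server with 8 drives rated at 1,000,000 hours MTBF.
print(expected_hours_between_failures(1_000_000, 8))      # -> 125,000 hours (~14 years)
```

The point is that MTBF describes a population, not the service life of any individual drive you own.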
Hard drives fail more often than people expect. Back up, back up, back up.
Anything with moving parts can fail, including tape drives, floppy drives, fans, and so on. I've had the fans on graphics cards die, causing the death of the graphics cards. I've had a power supply fan die, causing most of the parts of the computer to die. (Since then I've never built a system without extra fans.) Tape drives require extra care, or their lifetimes will be significantly shortened, because in addition to the moving parts, the tape head makes physical contact with the tape media -- at least in many kinds of tape drives. Cleaning the drive too often with ordinary tape-cleaning media will wear away the tape heads.
I've had the built-in chipset fans die, but so far without any effect. So far I've never had a CPU fan die, but I tend to upgrade often enough that I probably avoid this via upgrades. (grin)
I replace my disk drives every several years (mostly because available capacity increases so rapidly), so I have experienced relatively few hard drive failures. I've had many power supplies fail -- many more than I would have naively expected for a component with no moving parts other than the fan. I assume power irregularities are the cause of many power supply failures.
So far, in a few decades of computing, I have never had a CPU, RAM, or motherboard fail without an identifiable cause, such as overheating from a dead fan. However, a few brands of motherboards over the years have had much shorter lifetimes than expected due to sub-par parts -- often badly manufactured capacitors in the circuitry where power enters the motherboard.
Anywhere you have a plugged-in connection is a point of failure. I've had computers fail (mostly long ago) due to cheap tin-plated connectors. The tin oxidized, and over time the connection became less and less reliable. Eventually I unplugged everything, took an eraser to the tin connectors to remove the oxidation, plugged everything back in, and was up and running for a while longer. Gold-plated connectors are the connector of choice for a reason.
From what I've seen in a corporate environment, with my home experience mixed in, components seem to fail in this order, from most to least frequent.
Not mentioned above, but you should expect all flash memory sticks/cards to eventually die, depending on how heavily they are used. Flash memory "wears out" with use: each cell tolerates only a limited number of write/erase cycles, and cells will eventually fail. Given the average use of most such cards, though, it will take a long time.
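As a rough illustration of why it takes so long, here is a back-of-the-envelope endurance estimate. All the numbers are illustrative assumptions (and it idealizes wear leveling), not specs for any particular card:

```python
# Rough flash endurance estimate, assuming perfect wear leveling:
# total writable data = capacity * program/erase cycles per cell.
def flash_lifetime_years(capacity_gb: float,
                         pe_cycles: int,
                         writes_gb_per_day: float) -> float:
    total_writable_gb = capacity_gb * pe_cycles
    return total_writable_gb / writes_gb_per_day / 365

# e.g. a 32 GB card assumed to handle ~3,000 P/E cycles, written 5 GB/day
print(flash_lifetime_years(32, 3000, 5))  # -> roughly 53 years at that light usage
```

Heavy write workloads (logging, swap, databases) shrink that figure dramatically, which is why wear matters for servers even though it rarely matters for a camera card.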
Anecdotally, batteries.
I have no hard data, but I have replaced more failed or under-performing batteries in my life than any other component. This includes uninterruptible power supplies, laptops/notebooks, controller batteries, mobile phone batteries, and probably a lot of others.
This has led me to always stock an extra battery pack for a server room's UPS.
Anything that moves, which in a server is basically hard drives and fans, will fail much more often than solid-state components. Power supplies are a distant, but notable, second. Everything else (CPU, memory, etc.) is pretty reliable -- which is not to say immune to failure, but it should only be worried about after you've got your disk/fan/PSU bases covered.
Best to keep spares of everything on-site, though, unless you're OK with whatever downtime your hardware vendor decides to give you.
Just researching this for my company today, I found a summary of one of Microsoft's whitepapers at ExtremeTech.com with this chart covering an 8-month period:
The rated column was a decent reference for my calculations of the value of Dell's hardware warranties (we're just going to invest in extra hardware instead).
The full whitepaper is here: http://research.microsoft.com/apps/pubs/default.aspx?id=144888
You will see more problems with the firmware and drivers for the hardware than you will actually see physical failures (at least early in the device's lifetime), so make sure those are up to date and tested first.
SATA drives will usually be the first to go; SAS tends to be more reliable (although I've heard good things about the latest SATA 2 drives).
Once upon a time, CPU fans also used to be on the list; lately, I can't remember the last time I saw one stop working, but it's a possibility, especially in a dusty environment.
Google has published a paper, "Failure Trends in a Large Disk Drive Population", on failure statistics for a large population of drives. The main takeaway is that disks fail above and beyond what their MTBF would suggest. Disks are easily the most failure-prone component in the server room.