Question

Adam Davis

Asked: 2009-05-02 09:19:52 +0800 CST

Testing RAID

How does one fully evaluate a RAID configuration?

Pulling drives is one thing, but are there tools and techniques for more?

I've considered putting a nail through a running drive (powder actuated nailgun) to see what would happen, or simulating various electrical anomalies (shorts/opens in cable, power overloads and surges, etc).

What should be tested, and how?

-Adam

2 Answers

Tom Ritter · Answer 1 · 2009-05-02T09:44:32+08:00

On systems where hot-swap isn't an option, many RAID controllers and software RAID tools (e.g. mdadm on Linux) have a set-faulty command that simulates a drive failing.
On systems where hot-swap is okay, yank a drive!
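
For the mdadm case, a simulated failure cycle might look roughly like the sketch below. It only builds and prints the commands unless told otherwise. The device names /dev/md0 and /dev/sdb1 are placeholders I made up, and the helper names are mine, not part of mdadm.

```python
import shlex
import subprocess


def mdadm_fail_cycle(array, member):
    """Return the mdadm commands for a simulated failure and recovery of one member."""
    return [
        ["mdadm", "--manage", array, "--fail", member],    # mark the member faulty
        ["mdadm", "--manage", array, "--remove", member],  # detach it from the array
        ["mdadm", "--manage", array, "--add", member],     # add it back, which starts a rebuild
    ]


def run_cycle(array, member, dry_run=True):
    """Print each command; execute it only when dry_run is False (needs root)."""
    for cmd in mdadm_fail_cycle(array, member):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder device names (assumptions, not from the question).
    run_cycle("/dev/md0", "/dev/sdb1", dry_run=True)
```

Between the steps you would normally check /proc/mdstat to confirm the array went degraded and later finished rebuilding. Try it on a scratch array first.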

I think your testing should cover the reasonable cases that you plan for. If you're trying to set up a server in the bush, then electrical fluctuations are reasonable test cases. If you're in a data center, the Service Agreement probably covers power.

If you think a drive wildly exploding inside a rack is reasonable - then test it. Maybe you're setting up a server in a command center in Baghdad. But once again, less likely if you're in Washington State.

As a general rule, your tests should cover all expected cases:

Drive is old and eventually goes bad (find a drive on its last legs, get it running, then pound it till it fails)
Drive fails a SMART test but seems fine, and you want to replace it just in case
General drive replacement because of a size/performance upgrade, or because you just heard the batch was bad

And reasonable extreme cases.

Server suddenly losing power - okay.
Server itself being hit by lightning - not so much.
Rack falling over - okay.
Rack hit by truck - not so much.
Drive being jostled - okay.
Drive being shot-putted - not so much.

And most importantly - RAID doesn't protect against drives silently corrupting data! So make sure you're doing hashes and file verification!
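
One way to do the hash check, as a rough sketch: record a SHA-256 digest for every file while the data is known to be good, then re-check against that record after each failure test. The function names and the manifest layout here are my own choices, not a standard tool.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def verify(root, manifest):
    """Return the relative paths that are missing or no longer match the manifest."""
    bad = []
    for rel, digest in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or sha256_of(full) != digest:
            bad.append(rel)
    return sorted(bad)
```

Build the manifest before you pull or fail a drive, store it somewhere off the array, and run verify after the rebuild completes. An empty list means every recorded file still matches.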

carlito · Answer 2 · 2009-06-02T12:01:36+08:00

It is indeed important to test a drive failing inelegantly if you care about the ultimate reliability of the overall solution. Every failed RAID solution I have seen (meaning the redundancy did not protect against failing drives) was due to a failure to test realistic drive failures. The normal test is to pull a drive, claim that drive failure has been tested, and move on.

The best solution is probably to have a collection of marginal drives, or modified firmware that causes inconsistent responses. Only storage vendors are reasonably likely to have this capability.

I like the idea of putting a nail through a running drive, but the forces on adjacent drives might result in an unrealistically catastrophic failure. Or the complete failure of the drive may result in an unrealistically clean failure.

If I were allowed to do legitimate testing of a RAID, I would destroy a few drives by various means. Hook up wires to random components on the drive's board and fry them or short them. Indeed, put a nail through a drive if the geometry of the enclosure makes this unlikely to destroy adjacent drives (I think the resulting jostling of the remainder of the array is a reasonable test). Intercept a drive's data path and return every possible error, nonsensical results, or correct results delayed by random amounts of time.

Expect drives to return the wrong block sometimes. Expect drives to cause any conceivable electrical problem on their connection.
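
Without special hardware, the same idea can at least be rehearsed in software. The sketch below is a toy in-memory model, not a real block-device interposer: FlakyDisk and mirrored_read are hypothetical names, and the per-block SHA-256 check stands in for whatever integrity check the real stack has (which may be none).

```python
import hashlib
import random
import time


class FlakyDisk:
    """In-memory stand-in for a drive that misbehaves on reads (needs >= 2 blocks)."""

    def __init__(self, blocks, rng, p_error=0.0, p_wrong=0.0, max_delay=0.0):
        self.blocks = list(blocks)
        self.rng = rng
        self.p_error = p_error      # probability of an outright read error
        self.p_wrong = p_wrong      # probability of silently returning another block
        self.max_delay = max_delay  # upper bound on the random delay, in seconds

    def read(self, n):
        if self.max_delay:
            time.sleep(self.rng.uniform(0, self.max_delay))
        roll = self.rng.random()
        if roll < self.p_error:
            raise OSError(f"injected read error on block {n}")
        if roll < self.p_error + self.p_wrong:
            # Pick a block index that is guaranteed to differ from n.
            other = (n + self.rng.randrange(1, len(self.blocks))) % len(self.blocks)
            return self.blocks[other]
        return self.blocks[n]


def mirrored_read(disks, n, digests):
    """Return block n from the first mirror whose data matches the stored digest."""
    for disk in disks:
        try:
            data = disk.read(n)
        except OSError:
            continue  # hard error: try the next mirror
        if hashlib.sha256(data).digest() == digests[n]:
            return data  # silent corruption would have failed this check
    raise OSError(f"no mirror returned a valid copy of block {n}")
```

The point of the model is the contrast: a mirror that only reacts to reported errors would happily return the wrong block, while one that verifies each read can fall back to the other copy.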

My experience is that no one considering a storage purchase wants to do real testing. This could expose real problems. I'd be very interested to hear if there is anyone who actually tests storage reliability - certainly they are not publishing their results.
