Today we hit a worst-case scenario and are open to any good ideas.
Here is our problem:
We are using several dedicated storage servers to host our virtual machines. Before I continue, here are the specs:
- Dedicated Server Machine
- Areca 1280ML RAID controller, Firmware 1.49
- 12x Samsung 1TB HDDs
We configured one RAID6 set with 10 disks containing a single logical volume. We have two hot spares in the system.
Today one HDD failed. That happens from time to time, so we replaced it. During the rebuild a second disk failed. Normally this is no fun. We stopped heavy I/O operations to ensure a stable RAID rebuild.
Sadly, the hot-spare disk failed while rebuilding, and the whole thing stopped.
Now we have the following situation:
- The controller says that the RAID set is rebuilding
- The controller says that the volume failed
It is a RAID6 system and two disks failed, so the data has to be intact, but we cannot bring the volume online again to access it.
While searching we found the following leads. I don't know whether they are good or bad:
Mirroring all the disks to a second set of drives, so we would be able to try different things without losing more than we already have.
Trying to rebuild the array in R-Studio. But we have no real experience with the software.
Pulling all drives, rebooting the system, going into the Areca controller BIOS, and reinserting the HDDs one by one. Some people say they brought the system back online this way, some say it had no effect at all, and some say it blew up the whole thing.
Using undocumented Areca commands like "rescue" or "LeVel2ReScUe".
Contacting a computer forensics service. But whoa... initial phone estimates exceeded €20,000. That's why we would kindly ask for help here. Maybe we are missing the obvious?
And yes, of course we have backups. But some systems would lose one week of data, which is why we'd like to get the system up and running again.
Any help, suggestions and questions are more than welcome.
I think option 1 is your best bet.
Take 12x new HDDs and 1x new RAID controller. Mirror the old disks 1:1 onto the new ones (dd if= of=) using any Linux box (see the dd sketch after these steps). Then build a new server using the 1x new RAID controller plus the 12x new HDDs.
Try to rebuild the array in the new server. Success? Great. Stop.
Rebuild failed? Mirror the old disks onto the new ones again and try option i+1.
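A minimal sketch of the mirroring step, assuming the old disk shows up as /dev/sdb and its new counterpart as /dev/sdc (placeholder names; verify them with lsblk or dmesg before touching anything):

    # Clone the old disk onto the new one 1:1; keep going on read errors
    # and zero-pad unreadable blocks so the on-disk offsets stay aligned.
    dd if=/dev/sdb of=/dev/sdc bs=1M conv=noerror,sync status=progress

Repeat for each of the 12 disks. If some of the old disks already have pending read errors, GNU ddrescue is a better fit than plain dd because it retries bad sectors and keeps a map of what it could not read (ddrescue /dev/sdb /dev/sdc sdb.map).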
Unfortunately this is a very common scenario. There was a good Google study on disk failures years ago, and it turns out that losing data with RAID can happen exactly while the array is rebuilding. This affects different RAID levels with different severity. Here is the RAID6 scenario:
Why is that?
Think about the following: take the first three blocks of a file, so you have the data blocks A1 + A2 + A3 and the two parity blocks Ap + Aq sitting on hdd1...hdd5.
RAID6 can reconstruct any two missing blocks per stripe, so two failed disks are survivable. But if a third disk in that stripe dies, for example during the rebuild, the data is not recoverable: you are left with only two of the five blocks, which is not enough to reconstruct the other three.
Now the same scenario with 10 disks might be laid out differently, but I guess it is handled the same way: the data is split into 8 blocks per stripe, the parity goes to the 2 other drives, and the 2 hot spares sit outside the set. Do you know the details of your RAID controller configuration?
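If the Areca command-line utility (cli64/cli32) is installed on the host, it can usually answer that question. The exact subcommand names below are from memory and may vary by CLI and firmware version, so treat them as a hedged starting point rather than gospel:

    # Show RAID sets, volume sets and physical disks as the controller sees them
    cli64 rsf info
    cli64 vsf info
    cli64 disk info

The same information should also be visible in the controller BIOS utility or the card's web management interface.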
I would start by recovering from the offsite backup (I guess you have some) so the services come back up, and then try to recover as much data as possible from the old disks: on a Unix box, dd the drives to image files and use them as loop devices, for example.
http://wiki.edseek.com/guide:mount_loopback
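A rough sketch of that approach, assuming the old drive shows up as /dev/sdb and there is enough scratch space under /mnt/space (both placeholders):

    # Image the raw disk into a file, carrying on past read errors.
    dd if=/dev/sdb of=/mnt/space/disk01.img bs=1M conv=noerror,sync

    # Attach the image read-only as a block device and peek at its contents.
    losetup --find --show --read-only /mnt/space/disk01.img   # prints e.g. /dev/loop0
    file -s /dev/loop0

Working from read-only images keeps the original disks untouched, so a failed experiment costs nothing but time and disk space.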
You need to know what sort of on-disk metadata the RAID controller uses, and if you are lucky it is supported by some tool like dmraid.
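To check whether the images contain any metadata format a Linux tool recognises, something along these lines can be tried; note that dmraid mainly understands firmware/fake-RAID formats, so there is a fair chance Areca's layout is not among them:

    # Scan for RAID metadata formats that dmraid knows about.
    dmraid -r

    # mdadm reports anything that looks like Linux-md metadata on the image.
    mdadm --examine /dev/loop0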
But even this does not mean you can recover the data at all: files are usually spread across many, many blocks, so without knowing the controller's layout the recovery is likely to fail to bring back any of your data.
More about RAID:
https://raid.wiki.kernel.org/index.php/RAID_setup