Today we hit something close to a worst-case scenario and are open to any good ideas.
Here is our problem:
We are using several dedicated storage servers to host our virtual machines. Before I continue, here are the specs:
- Dedicated Server Machine
- Areca 1280ml RAID controller, Firmware 1.49
- 12x Samsung 1TB HDDs
We configured one RAID6-set with 10 discs that contains one logical volume. We have two hot spares in the system.
Today one HDD failed. This happens from time to time, so we replaced it. During the rebuild, a second disc failed. That is never fun. We stopped heavy I/O operations to ensure a stable RAID rebuild.
Sadly, the hot-spare disc also failed while rebuilding, and the whole thing stopped.
Now we have the following situation:
- The controller says that the RAID set is rebuilding
- The controller says that the volume failed
It is a RAID 6 system, which tolerates two failed discs, and two discs failed, so the data should still be intact. However, we cannot bring the volume online again to access it.
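To spell out the reasoning behind that claim, here is a minimal sketch in Python. It assumes the standard RAID 6 property that any two members of a set may be missing, and that the failed hot spare never became a full member of the set. The function name is purely illustrative, and the sketch says nothing about how the Areca firmware actually decides to mark a volume as failed.

```python
def raid6_data_recoverable(total_members: int, missing_members: int) -> bool:
    """Return True if a RAID 6 set can, in principle, still reconstruct all data.

    RAID 6 keeps two independent parity blocks per stripe, so it tolerates
    up to two missing members. This is a textbook rule, not controller logic.
    """
    if total_members < 4:
        raise ValueError("RAID 6 needs at least 4 members")
    if not 0 <= missing_members <= total_members:
        raise ValueError("missing_members out of range")
    return missing_members <= 2


# Our set: 10 members, 2 of the original discs failed.
print(raid6_data_recoverable(10, 2))
# A third lost original member would exceed what the parity can cover.
print(raid6_data_recoverable(10, 3))
```

By this rule our set is at the limit but not beyond it, which is why we believe the data is still there.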
While searching we found the following leads. I don't know whether they are good or bad:
- Mirroring all the discs to a second set of drives. That way we could try different things without losing more than we already have.
- Trying to rebuild the array in R-Studio. But we have no real experience with the software.
- Pulling all drives, rebooting the system, entering the Areca controller BIOS, and reinserting the HDDs one by one. Some people say that they brought the system online this way. Some say that it had no effect at all. Some say that it destroyed the whole array.
- Using undocumented Areca commands like "rescue" or "LeVel2ReScUe".
- Contacting a computer forensics service. But whoa... initial estimates by phone exceeded €20,000.
That is why we would kindly ask for help. Maybe we are missing the obvious?
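For the mirroring lead, this is roughly what we have in mind: a minimal Python sketch, not a tested recovery tool. It opens a source disc read-only and copies it block by block into an image file, zero-filling blocks that cannot be read and recording their offsets. The function name `image_disc`, the paths in the comment, and the block size are our own placeholders, and it assumes a Unix-like system where `os.pread` is available. A dedicated imaging tool such as GNU ddrescue would probably handle retries and bad sectors far better.

```python
import os


def image_disc(src_path, dst_path, block_size=1024 * 1024):
    """Copy src_path to dst_path block by block without writing to the source.

    Unreadable blocks are replaced with zeros in the image.
    Returns the list of byte offsets that could not be read.
    """
    bad_offsets = []
    src_fd = os.open(src_path, os.O_RDONLY)  # read-only: never touch the source
    try:
        size = os.lseek(src_fd, 0, os.SEEK_END)  # total size in bytes
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < size:
                length = min(block_size, size - offset)
                try:
                    data = os.pread(src_fd, length, offset)
                except OSError:
                    # Read error: remember the offset and keep the image aligned.
                    bad_offsets.append(offset)
                    data = b"\x00" * length
                if not data:
                    break  # unexpected end of source
                dst.write(data)
                offset += len(data)
    finally:
        os.close(src_fd)
    return bad_offsets


# Hypothetical usage (paths are placeholders, one image per member disc):
#   bad = image_disc("/dev/sdX", "/mnt/spare/disc01.img")
#   print(len(bad), "unreadable blocks")
```

The idea would be to run any later experiments (R-Studio, controller tricks) against the copies only, so the original discs stay untouched.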
And yes, of course we have backups. But for some systems we would lose one week of data, which is why we'd like to get the system up and running again.
Any help, suggestions and questions are more than welcome.