We are trying to troubleshoot a DL360 G7 with a P410i controller.
As mentioned in the thread title, the site tech pulled all the disks and forgot what order they were pulled in. We were under the impression that the server was a cold spare, but apparently the vendor who performed the installation mislabelled the host. It was actually one of our application servers.
When we had the tech reseat the disks, we started getting an error along the lines of:
"Physical disks reordered. Previously failed disks 3,4 now operational.
Press F1 to continue with logical drive disabled. Press F2 to accept data loss and continue."
We contacted HP, and they're useless. Their position is that recovery is probably impossible; we can try the different permutations of moving the disks around, but there's no guarantee the OS (RHEL 5) will boot. They're pretty sure the OS is hosed.
What I am trying to figure out...
- Why the heck would moving the disks around mean the OS is destroyed? All array information is written to the first few sectors of the drive, so the controller can look at it and say, "Hey, this disk isn't in the right place. I'm going to freeze now." It seems like the controller writes additional information to the disk marking it as failed, which destroys its array association (see the metadata-comparison sketch after this list).
- Has anyone been through this before? Is HP correct? Is the OS wrecked? Is there any way to recover?
- HP is suggesting that pulling all disks at the same time, even with the server off, causes the controller to stop recognizing the array. This doesn't make any sense, because the array information is written to the disks themselves. That's why you can build an array on one server, move all of that array's disks to a like server, and expect it to boot.
- If we put the drives back in the correct order, will the server just boot without prompting us? We have about 30 permutations to work through (see the enumeration sketch below), and it will be very hard to tell whether we hit the right combo if the controller keeps telling us disks are failed.
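
Since it's easy to lose track of which orderings have already been tried, here is a minimal sketch to enumerate them so each reseat attempt can be ticked off a list. It assumes a four-drive array (4! = 24 orderings, which roughly lines up with the "about 30" above); the drive labels and bay numbering are made up for illustration.

    #!/usr/bin/env python3
    # Enumerate every possible drive-to-bay ordering so each reseat attempt
    # can be checked off a list. Assumes a four-drive array; adjust DRIVES
    # to match the real carrier labels.
    from itertools import permutations

    DRIVES = ["disk-A", "disk-B", "disk-C", "disk-D"]  # hypothetical carrier labels
    BAYS = range(1, len(DRIVES) + 1)

    count = 0
    for count, order in enumerate(permutations(DRIVES), start=1):
        layout = ", ".join(f"bay {bay}: {disk}" for bay, disk in zip(BAYS, order))
        print(f"attempt {count:2d}: {layout}")

    print(f"{count} orderings total")
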
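On the metadata question from the first bullet: if the drives can be attached to a box that exposes them as raw block devices (they won't show up raw behind the P410i itself), the on-disk metadata regions could at least be compared between members. This is only a sketch under assumptions: the device paths are hypothetical, and I'm not certain whether Smart Array keeps its metadata at the start or the end of the disk, so it hashes both ends of each drive.

    #!/usr/bin/env python3
    # Hash the first and last MiB of each drive so the metadata regions can be
    # compared between array members (e.g. to spot the one flagged as failed).
    # Device paths are hypothetical; run as root on a host that sees the drives
    # as plain block devices, not through the Smart Array controller.
    import hashlib
    import os

    DRIVES = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]  # hypothetical paths
    CHUNK = 1024 * 1024  # 1 MiB from each end of the disk

    def end_hashes(path):
        """Return SHA-256 digests of the first and last CHUNK bytes of a device."""
        fd = os.open(path, os.O_RDONLY)
        try:
            size = os.lseek(fd, 0, os.SEEK_END)
            os.lseek(fd, 0, os.SEEK_SET)
            head = os.read(fd, CHUNK)
            os.lseek(fd, max(size - CHUNK, 0), os.SEEK_SET)
            tail = os.read(fd, CHUNK)
        finally:
            os.close(fd)
        return hashlib.sha256(head).hexdigest(), hashlib.sha256(tail).hexdigest()

    for drive in DRIVES:
        head, tail = end_hashes(drive)
        print(f"{drive}  first-MiB={head[:16]}  last-MiB={tail[:16]}")
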
Edit: Following up on this. It appears we will not be able to recover the array with any real chance of success. Our issue was exacerbated by some additional problems:
- The drives were removed from the host and plugged into a different host, out of their original order. This likely corrupted the array metadata in a way that cannot easily be recovered.
- When the drives were put back into the original server, their order wasn't maintained, which likely did further damage to the array metadata.
After some testing in our lab, swapping drive order on a RAID 1+0 array does not seem to impact the array's functionality, even when the order is severely scrambled. This suggests the issue really is the array metadata being damaged when the drives were moved between hosts.