I have a development ERP server in my office that I help support. Originally, the DBA requested single-drive setups for some of the drives, so the hardware RAID controller (an HP embedded controller) looks like:
- c0d0 (2 drive) RAID-1
- c0d1 (2 drive) RAID-1
- c0d2 (1 drive) No RAID <-- Failed
- c0d3 (1 drive) No RAID
- c0d4 (1 drive) No RAID
- c0d5 (1 drive) No RAID
c0d2 has failed. I immediately hot-swapped in a spare drive, but c0d2 continues to show as failed, even after I unmount the partition. I'm loath to reboot the server, since I'm concerned about it coming back up in rescue mode, but I'm afraid that's the only way to get the system to re-read the drive. I assumed there was some sort of auto-detection routine for this, but I haven't been able to figure out the proper procedure.
I have installed the HP ACU CLI utilities, so I can see the hardware RAID setup.
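For reference, this is roughly how I inspect the state with hpacucli. The slot number is an assumption (check yours with the first command), and c0d2 typically maps to logical drive 3 in hpacucli's 1-based numbering:

```shell
# List all controllers, so the correct slot number can be identified
hpacucli ctrl all show

# Show the full configuration: logical drives, physical drives, and their status
hpacucli ctrl all show config

# Detailed status of the failed logical drive (slot and ld number assumed; adjust)
hpacucli ctrl slot=0 ld 3 show detail

# Physical drive states, to confirm the replacement drive was detected
hpacucli ctrl slot=0 pd all show
```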
I'd really like to find out what the proper procedure should have been, where I went wrong, and how to correct it now.
Obviously, I should NOT have listened to the DBA: I should have set the drives up as RAID-1 throughout, as was my first instinct. He wasn't worried about data loss, but it sure would have been easier to replace the failed drive. :)
As there's no fault tolerance set up, the array can't initiate a repair, which might be what's leading to the strange state. Do you have the ProLiant Support Pack installed - can you access the Array Configuration Utility (ACU), or the System Management Homepage (default http://server:2301)? If you can, you'll be able to see the exact state of the array and more than likely remediate the problem.
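Since you already have the ACU CLI installed, one remediation path worth trying before a reboot is hpacucli's `modify reenable` action, which forces a failed logical drive back online. A sketch, assuming slot 0 and logical drive 3 (verify both against your own configuration first), and only appropriate here because the data on the single-drive volume is already gone:

```shell
# Confirm the slot and logical drive numbers before touching anything
hpacucli ctrl all show config

# Force the failed logical drive back online.
# "forced" skips the confirmation prompt; any data on the volume is not trusted.
hpacucli ctrl slot=0 ld 3 modify reenable forced
```

After that, re-run `hpacucli ctrl all show config` to check whether the logical drive reports OK; if the controller still refuses, a reboot into the controller BIOS may indeed be unavoidable.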
If there's no data of value on the drive, you can just reboot and choose the auto-rebuild (default) option from the P400 RAID controller's BIOS screen. The array will come back and the new drive will be recognized.