We replaced a failed drive in an Oracle Exalytics X4-4 machine. The replacement went fine and the rebuild started, but when it reached 70%, the main disk hit a bad sector and the rebuild failed. I tried rebuilding manually with megacli, but it failed again. Oracle says the RAID 1 volume is corrupted and the only remaining option is to rebuild the entire server. The server is still running in degraded mode. Is there any chance of recovering from this situation? Can a full server rebuild be avoided? Any help is appreciated.
LSI RAID controllers should let you rebuild a RAID 1 array even when the source drive has an uncorrectable read error, resulting in a punctured array. This can be implementation dependent, however (i.e. the firmware and utilities on your Oracle box may not support it). Are you sure you cannot rebuild, not even using megacli?

If you really can't rebuild the array, the suggested plan is to back up all your data, destroy the array, recreate it and reload the data. If, and only if, this is not possible, you can try attaching the original disk to a spare machine and, from there, using ddrescue to clone it onto a new identical disk. Then use the newly cloned disk to boot your Oracle box and rebuild the array onto a third disk.

Disclaimer: this will cause downtime, and any error can lead to complete data loss; don't even think of trying it without recent backups and a good understanding of the problem.
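
For illustration only, the ddrescue cloning step on the spare machine might look roughly like the sketch below. The device names /dev/sdX (original disk) and /dev/sdY (new disk), the map file name, and the DRY_RUN switch are all placeholders I made up, not anything specific to the Exalytics box; check the real device names with lsblk first, because swapping source and destination would overwrite the only good copy. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: device names and map file below are assumptions, not real values.
SRC="${SRC:-/dev/sdX}"      # original (surviving) disk, hypothetical name
DST="${DST:-/dev/sdY}"      # new identical disk, hypothetical name
MAP="${MAP:-rescue.map}"    # ddrescue map file, lets an interrupted run resume
DRY_RUN="${DRY_RUN:-1}"     # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

clone_disk() {
    # Pass 1: copy everything that reads cleanly, skip the slow scraping phase (-n).
    # -f is needed because the output is a block device, not a regular file.
    run ddrescue -f -n "$SRC" "$DST" "$MAP"
    # Pass 2: go back to the bad areas with direct disc access (-d), up to 3 retry passes.
    run ddrescue -f -d -r3 "$SRC" "$DST" "$MAP"
}

clone_disk
```

Sectors that stay unreadable after the second pass are recorded in the map file, so you can see which areas of the clone hold no valid data before booting from it.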