I have an IBM System x3650 server with a ServeRAID controller and two RAID5 arrays, each consisting of 3 disks.
Yesterday, one disk failed. (It was in the RAID array that holds the data; the operating system is on the other, healthy array.) I naively trusted the RAID controller to rebuild the array. I shut down the server and replaced the failed disk with a new, similar one. I booted into the controller BIOS, where I could see that it recognized the new disk and was ready to rebuild (I had nothing to do, everything was automatic). I started the server and it rebuilt the array.
This morning everything seemed OK. The rebuild had finished and the array looked healthy. Only a few hours later, the MySQL service crashed with a corrupted database. I managed to dump part of the data and restored the rest from backup. I thought I was OK.
But then I found that some active log files were corrupt: they contained blocks from random other files. If I am assessing the situation correctly, only files modified since the rebuild started are corrupted, but I'm not yet 100% sure of this. Somehow, the rebuild must have corrupted the data.
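To check that hypothesis, I am thinking of comparing the live data against the backup with something like the sketch below. It is only a rough idea, not something I have run on the server: the function names, the two directory arguments and `backup_time` are made up for illustration, and it assumes the backup was taken before the rebuild started and that file modification times can still be trusted. A file that differs from its backup copy although its modification time is older than the backup would contradict my assumption that only recently written files are affected.

```python
import hashlib
import os
from datetime import datetime


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_files(live_root, backup_root, backup_time):
    """Compare each file under live_root with its copy under backup_root.

    Returns two sorted lists of relative paths:
    - modified_since: content differs and mtime is at or after backup_time
      (could be a legitimate change or corruption, this check cannot tell)
    - silently_changed: content differs although mtime is older than
      backup_time (suspicious: nothing should have written to the file)
    Files without a backup copy are skipped.
    """
    cutoff = backup_time.timestamp()
    modified_since, silently_changed = [], []
    for dirpath, _dirnames, filenames in os.walk(live_root):
        for name in filenames:
            live_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(live_path, live_root)
            backup_path = os.path.join(backup_root, rel_path)
            if not os.path.isfile(backup_path):
                continue  # nothing to compare against
            if sha256_of(live_path) == sha256_of(backup_path):
                continue  # identical content, file is fine
            if os.path.getmtime(live_path) >= cutoff:
                modified_since.append(rel_path)
            else:
                silently_changed.append(rel_path)
    return sorted(modified_since), sorted(silently_changed)
```

If `silently_changed` comes back empty, that would at least be consistent with the idea that only data written during or after the rebuild was damaged.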
I am asking this question to learn from the mistake. I hope there will never be a next time...
What could have caused the rebuild to fail? What can I do better next time?
Is it necessary to disconnect the server from the network during a rebuild? I thought the controller should be able to handle the rebuild concurrently with ordinary reads and writes.
Or should this never happen, and is the controller perhaps faulty?