We have a server on which a RAID 1 array is trying to rebuild (resync). The resync is progressing, but the server has become unresponsive: we cannot SSH into the box, and none of its services are responding. From the local LAN you can SSH in, but it is so slow as to be useless.
What could be causing this problem? We rebuilt the machine with new disks when it went down about a month ago; we needed to increase the disk size anyway, so we took the opportunity to do so. Now I am not sure whether there is some kind of hardware failure. They are SATA disks using software RAID.
Usually, if one disk or controller fails, the second disk continues to operate. I am not sure what is happening now.
Any help appreciated.
Your resync speed is too high for your disk IO capabilities. Run
echo 1000 >/proc/sys/dev/raid/speed_limit_max
and you should see a quick return to responsiveness. Once that's under control, tune that speed limit to a suitable level for your hardware.
Maybe your partitions are not correctly aligned. I had a system with two WD1000EARS disks whose partitions were not aligned, and it had the same problem as yours. I repartitioned the disks, creating the partitions with parted and checking their alignment. Rebuild speed jumped to 60-70 MB/s and the system was very responsive. Load was notably lower, and the CPU time spent waiting for I/O was very low compared to the previous situation.
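Both suggestions above can be checked from a shell before repartitioning anything. What follows is only a rough sketch: the function names check_alignment and set_resync_max are made up for illustration and are not part of mdadm or parted, and the 1 MiB default boundary is an assumption (a common partitioning default) rather than something stated in the thread. The first helper does the alignment arithmetic on a partition's start sector; the second writes a new resync ceiling and reports the old and new values.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; names are not from any tool.

# Print "aligned" if the partition's byte offset is a multiple of the
# alignment boundary, otherwise "misaligned".
#   $1 = start sector (for example the value in /sys/block/<disk>/<part>/start)
#   $2 = sector size in bytes (default 512)
#   $3 = alignment boundary in bytes (default 1 MiB, an assumed default)
check_alignment() {
    start=$1
    sector=${2:-512}
    boundary=${3:-1048576}
    if [ $(( start * sector % boundary )) -eq 0 ]; then
        echo aligned
    else
        echo misaligned
    fi
}

# Write a new resync speed ceiling and print "old -> new".
#   $1 = new limit
#   $2 = limit file (default /proc/sys/dev/raid/speed_limit_max)
set_resync_max() {
    file=${2:-/proc/sys/dev/raid/speed_limit_max}
    old=$(cat "$file")
    echo "$1" > "$file"
    echo "$old -> $(cat "$file")"
}
```

For example, check_alignment "$(cat /sys/block/sda/sda1/start)" would test the first partition of a disk assumed to be named sda, and set_resync_max 1000 (run as root) does the same thing as the echo command above while showing the previous value. A partition starting at sector 63, the old fdisk default, comes out as misaligned, while one starting at sector 2048 is aligned. parted can also check this directly with "parted /dev/sda align-check optimal 1". Resync progress and the current speed are visible in /proc/mdstat.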