I have a software RAID 5 partition on LVM in Ubuntu (desktop, actually, but I'm using it as a server). I have been rsyncing a ton of data to it, and the computer was hard freezing, as in I needed to press "Reset".
So I thought it was rsync, but then I tried a dd if=/dev/zero of=/path/to/raid5 and, sure enough, the computer locked up. I ran an identical dd against a JBOD partition on the same machine, and it didn't crash.
Assuming a clean RAID5 partition, a tri-core processor, 2 GB of RAM, and 6 GB of swap, what could be causing this?
Edit: I've ruled out memory; I ran an 8 hour memtest without a crash.
04/26/2011 Edit: I've ruled out Ubuntu alone; the error occurred in Debian 6 stable. It's either hardware or an upstream bug.
Yeah, test your RAM. Try testing plain I/O more heavily. Other than that, try to get a repeatable scenario and open a bug on launchpad.net.
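For the repeatable scenario, something like the following could stand in for the dd run. It is only a rough sketch: the function name `write_zeros`, the chunk size, and the sync interval are my own choices, not anything from the question. It writes zeros to a file in fixed-size chunks, forces the data to disk periodically, and prints progress so that the last line on screen shows roughly how far it got before a freeze.

```python
import os
import time


def write_zeros(path, total_bytes, chunk_bytes=1024 * 1024, sync_every=64):
    """Write total_bytes of zeros to path and return the number of bytes written.

    chunk_bytes and sync_every are arbitrary illustrative defaults.
    """
    chunk = bytes(chunk_bytes)  # a buffer of zero bytes, like /dev/zero
    written = 0
    chunks_done = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
            chunks_done += 1
            if chunks_done % sync_every == 0:
                # Push data through the page cache so the array really sees it.
                f.flush()
                os.fsync(f.fileno())
                elapsed = time.monotonic() - start
                print(f"{written} bytes written after {elapsed:.1f}s", flush=True)
        f.flush()
        os.fsync(f.fileno())
    return written
```

Point `path` at a file on the RAID 5 mount and then at one on the JBOD mount, with the same `total_bytes`, and note where (or whether) each run stops. Running it from a text console or over SSH makes it more likely that the last progress line is still visible after a hang.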
Assuming you're using software RAID5 through LVM (you don't say what's providing the R5), this could be a sign of a kernel bug. R5 requires parity calculation, which consumes CPU resources. If that goes high enough, the kernel might run into some unresolvable contention issues. This is just a guess, though.
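To give an idea of what the parity calculation involves, here is a toy sketch of the arithmetic. The function names `xor_blocks` and `rebuild_missing` are made up for illustration, and the kernel does this in optimized native code rather than anything like this. RAID 5 parity is the byte-wise XOR of the data blocks in a stripe, and XORing the parity with the surviving blocks gives back a missing one.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte blocks."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the one missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# One stripe spread over three data disks (toy two-byte blocks).
stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(stripe)
# Pretend the second disk failed and rebuild its block.
recovered = rebuild_missing([stripe[0], stripe[2]], parity)
```

Every write to the array means recomputing a parity block like this, which is the extra CPU work that a plain JBOD write does not have.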
Is the RAID array everything in the server (including / and so forth) or separate? If separate, can you see anything in the logs just before the hard hang? Also, could you confirm that it was a complete hang - could you ping the machine over the network at all and so forth?

The differences between writing out to a JBOD array and a RAID5 array are that the drives are accessed more evenly in the case of RAID5 and more CPU time will be used (for the parity calcs). If it were a problem with one of the drives I would expect it to fall out of the array rather than the machine hanging though, unless the problem is such that the drive controller fell over and took the machine's I/O controller with it. The first thing I'd do here is a full memory test, and make sure the CPU cooling is working as it should (the parity calcs for RAID5 will not impose any significant load on a modern CPU on their own, but may tip it over the edge if it is running close to trouble already).