There was a power failure recently that took down one of my servers. On reboot, the main storage filesystem (JFS on a 7TB, 9x1TB RAID6 array) needed an fsck before it could be mounted read-write. After I started the fsck, I watched it for a while in top: memory usage rose steadily (but not too rapidly), and CPU usage was pegged at or near 100%.
Now, about 12 hours in, the fsck process has consumed almost 94% of the system's 4GB of memory, and CPU usage has dropped to around 2%. The process is still running and gives no indication of how much longer it will take.
First off: is this indicative of a problem? I'm worried by how dramatically the CPU usage has dropped. It looks as though the process has become memory-bound, and the fsck will take forever to complete because it's spending all its time swapping. (kswapd0 is floating uncomfortably close to the top of the list in top, and actually beats the fsck process for CPU usage more than half the time.) If that isn't the case, and fsck simply uses less CPU near the end of its run, that's fine; I just need to know.
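One way to tell "swap-bound" apart from "just winding down" is to check how much of the process is swapped out and whether the kernel keeps swapping pages in while it runs. The `D` in top's `S` column means uninterruptible sleep, usually waiting on I/O, which is consistent with (but does not prove) swap thrashing. Below is a minimal sketch, assuming a Linux `/proc` filesystem that exposes `VmRSS`/`VmSwap` in `/proc/<pid>/status` and the cumulative `pswpin`/`pswpout` counters in `/proc/vmstat`; the helper names are hypothetical, and the swap counters are system-wide rather than per-process.

```python
from __future__ import annotations

import time
from pathlib import Path


def parse_kb_fields(status_text: str, fields: tuple[str, ...]) -> dict[str, int]:
    """Extract 'Name:   1234 kB' lines from /proc/<pid>/status text (values in kB)."""
    values: dict[str, int] = {}
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name in fields and rest.split():
            values[name] = int(rest.split()[0])
    return values


def parse_vmstat(vmstat_text: str) -> dict[str, int]:
    """Turn /proc/vmstat 'counter value' lines into a dict."""
    counters: dict[str, int] = {}
    for line in vmstat_text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            counters[key] = int(value)
    return counters


def swap_report(pid: int, interval: float = 10.0) -> None:
    """Print how much of `pid` is resident vs. swapped out, plus the
    system-wide swap-in/swap-out page counts over `interval` seconds."""
    status = parse_kb_fields(
        Path(f"/proc/{pid}/status").read_text(), ("VmRSS", "VmSwap")
    )
    before = parse_vmstat(Path("/proc/vmstat").read_text())
    time.sleep(interval)
    after = parse_vmstat(Path("/proc/vmstat").read_text())
    print(f"resident: {status.get('VmRSS', 0) // 1024} MiB, "
          f"swapped out: {status.get('VmSwap', 0) // 1024} MiB")
    # pswpin/pswpout are cumulative since boot, so the difference is the
    # number of pages moved during the sampling interval.
    print(f"pages swapped in:  {after['pswpin'] - before['pswpin']}")
    print(f"pages swapped out: {after['pswpout'] - before['pswpout']}")
```

Run as root with the PID from top, e.g. `swap_report(5201)`. A large and growing swapped-out figure, together with a steady stream of swap-ins between samples, would point to the fsck waiting on swap rather than finishing up.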
If this is a problem, what can I do to improve fsck performance? I'm open to almost anything, up to and including "buy more memory for the system."
The relevant line from top:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 5201 root      20   0 58.1g 3.6g  128 D    2 93.8   1071:27 fsck.jfs
And the result of free -m:
             total       used       free     shared    buffers     cached
Mem:          3959       3932         26          0          0          6
-/+ buffers/cache:       3925         33
Swap:          964        482        482
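Reading the free -m numbers as percentages makes the memory pressure explicit: RAM is essentially exhausted (26 MiB free, only 6 MiB of page cache left) and half of the swap is in use. A small sketch that parses the output quoted above; it assumes the older procps layout with the `-/+ buffers/cache` row (newer versions of free print a different set of columns), and `parse_free` is a hypothetical helper, not part of any library.

```python
from __future__ import annotations

# Output of `free -m` as quoted in the question (older procps layout).
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:          3959       3932         26          0          0          6
-/+ buffers/cache:       3925         33
Swap:          964        482        482
"""


def parse_free(text: str) -> dict[str, dict[str, int]]:
    """Map each row label ('Mem', 'Swap', ...) to {column: MiB}."""
    lines = text.splitlines()
    header = lines[0].split()
    table: dict[str, dict[str, int]] = {}
    for line in lines[1:]:
        if not line.strip():
            continue
        label, _, rest = line.partition(":")
        nums = [int(v) for v in rest.split()]
        # The buffers/cache row only carries 'used' and 'free'.
        cols = ("used", "free") if label.startswith("-/+") else header
        table[label] = dict(zip(cols, nums))
    return table


if __name__ == "__main__":
    stats = parse_free(FREE_OUTPUT)
    mem, swap = stats["Mem"], stats["Swap"]
    print(f"RAM used:  {100 * mem['used'] / mem['total']:.1f}%")    # 99.3%
    print(f"swap used: {100 * swap['used'] / swap['total']:.1f}%")  # 50.0%
```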
Correct me if I'm wrong, but JFS is not a full journaling file system: it only journals metadata, not file data. This means that the fsck command can take a very long time to complete on a volume holding lots of data.
I suggest you investigate the possibility of switching to a fully journaled file system (ext3/ext4 mounted with the data=journal option, since their default mode also journals only metadata): that should largely remove the need to run a full fsck after an abrupt failure.
Based on the virtual memory usage (58.1g VIRT on a machine with 4GB of RAM), I figured it would be impossible to run a full fsck on the volume in any reasonable amount of time (even with extra RAM), so I backed up all the files on the volume and reformatted it with XFS.
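The reasoning behind giving up on the fsck can be made concrete with the numbers already in the thread. This is only arithmetic, assuming top's `g` suffix means GiB (top scales memory in powers of 1024) and free -m reports MiB. Note that VIRT counts all mapped address space, not pages the process has actually touched, so the ratio is an upper bound on how much memory the fsck would need rather than a measurement.

```python
# Figures copied from the top and free -m output in the question.
VIRT_GIB = 58.1   # top: VIRT of fsck.jfs ("g" = GiB)
RAM_MIB = 3959    # free -m: Mem total
SWAP_MIB = 964    # free -m: Swap total

virt_mib = VIRT_GIB * 1024          # ~59494 MiB of address space
backing_mib = RAM_MIB + SWAP_MIB    # 4923 MiB of RAM + swap combined
ratio = virt_mib / backing_mib      # ~12.1

print(f"fsck.jfs address space: {virt_mib:.0f} MiB")
print(f"RAM + swap:             {backing_mib} MiB")
print(f"ratio:                  {ratio:.1f}x")
```

If a large share of that address space really is needed, the machine would have to grow by roughly an order of magnitude in RAM (or lean very heavily on swap) before the check could run without thrashing, which is in line with the decision to back up and reformat instead.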