I've got several systems with an ext3 LV for / that work just fine until fsck'd; then they are unrecoverably corrupted.
What hope do I have of repairing these systems, and, separately, what went wrong?
These are all old systems that began as 2.6 centos-ish boxes with several separate ext3 logical volumes: /, /var, and /unused. They were migrated to a modern Linux running kernel 3.4 by installing on the /unused partition and then booting to that new installation. Once running, the old / and /var were lvremove'd, and the new root was renamed and lvextend'ed to absorb the space. From what I've been able to gather, the new root was resize2fs'd live after the lvextend. (This might be the root of the problem.)
They all run fine until an fsck is forced, at which point the fsck complains mightily and renders the system unbootable (panic). Lots of errors like:
Inode 12345 has INDEX_FL flag set but is not a directory
Inode 67890, i_blocks is 1307617, should be 0.
Inode 34567, i_size is 5616670468207675, should be 0.
... and on and on, followed by lots of multiply claimed inodes, sometimes with ...
Error storing directory block information (inode=76543, block=0, num=98765432): Memory allocation failed
For context, the original partitions were created under CentOS' e2fsprogs-1.39-20, the resize2fs'ing under 1.42.9-4, and the current system is at CentOS' older (don't ask) 1.41.12-12.
To explicitly answer your questions:
Since the file systems work before the fsck, I'd extend the VG with a new physical volume (an actual new, reliable hard drive), define a new LV on the new PV, and copy everything over. After that, retire the old drive(s), or at least run the manufacturer's diagnostics on them, then wipe and reformat.
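
A rough sketch of that migration follows, as a dry run that only prints the commands so they can be reviewed before anything is touched. The volume group name, device, LV name, size, and mount point are all placeholders I made up, not values from the question; substitute your own.

```shell
#!/bin/sh
# Dry-run sketch: prints the migration steps, executes nothing.
# Every value below is a hypothetical placeholder, not taken from the post.
VG=vg0                # existing volume group
NEW_PV=/dev/sdX1      # partition on the new, known-good drive
NEW_LV=root_new       # name for the replacement root LV
SIZE=20G              # size of the new LV
MNT=/mnt/newroot      # temporary mount point for the copy

plan() {
    echo "pvcreate $NEW_PV"                          # initialise the new disk for LVM
    echo "vgextend $VG $NEW_PV"                      # add it to the existing VG
    echo "lvcreate -n $NEW_LV -L $SIZE $VG $NEW_PV"  # place the new LV on the new PV only
    echo "mkfs.ext3 /dev/$VG/$NEW_LV"                # fresh filesystem, no resize history
    echo "mkdir -p $MNT"
    echo "mount /dev/$VG/$NEW_LV $MNT"
    echo "cp -ax /. $MNT/"                           # copy root, staying on one filesystem
}

plan
```

Run the printed commands by hand as root once they look right. This assumes you then point the bootloader and /etc/fstab at the new LV before retiring the old one; those steps depend on the distribution and are left out here.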