Recently I've seen the root filesystem of a machine in a remote datacenter get remounted read-only, as a result of consistency issues.
On reboot, this error was shown:
UNEXPECTED INCONSISTENCY: RUN fsck MANUALLY (i.e., without -a or -p options)
After running fsck as suggested and accepting each correction manually with Y, the errors were fixed and the system is now fine.
Now, I think it would be useful if fsck were configured to run and repair everything automatically, since the only alternative in some cases (like this one) is going in person to the remote datacenter and attaching a console to the affected machine.
My question is: why does fsck ask for manual intervention by default? How and when would a correction performed by such a program be unsafe? In which cases might the sysadmin want to set a suggested correction aside for a while (to perform some other operations first) or abort it altogether?
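For reference, automatic repair can be enabled, though the answers below explain why that is rarely a good idea. This is only a sketch of the common knobs: the systemd kernel parameters and the old Debian-style rcS setting are real, but whether they apply depends on your distribution, and the device name is a placeholder.

    # systemd-based systems: force a check and accept all repairs on the next boot
    # by adding these kernel command line parameters (read by systemd-fsck):
    #   fsck.mode=force fsck.repair=yes

    # older Debian/Ubuntu sysvinit: answer "yes" to every fsck question at boot
    # by setting, in /etc/default/rcS:
    #   FSCKFIX=yes

    # one-off manual equivalent on an unmounted filesystem (placeholder device):
    fsck -y /dev/sdXN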
fsck definitely causes more harm than good if the underlying hardware is somehow damaged: a bad CPU, bad RAM, a dying hard drive, a disk controller gone bad... in those cases more corruption is inevitable.

If in doubt, it's a good idea to take an image of the corrupted disk with dd_rescue or some other tool, and then see whether you can successfully fix that image. That way you still have the original setup available.

You have seen one example where fsck worked, but I've seen more than enough damaged filesystems where it did not work at all. If it ran fully automatically, you would have no chance to take a disk image first, which in many cases is an excellent idea before attempting a repair. It's never, ever a good idea to attempt something like that automatically.
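A minimal sketch of that image-first workflow, assuming the damaged filesystem is ext4 on a hypothetical /dev/sdb1 and that a healthy disk with enough space is mounted at /mnt/backup:

    # copy the raw partition to an image file, continuing past unreadable sectors
    dd_rescue /dev/sdb1 /mnt/backup/sdb1.img
    # (GNU ddrescue works too and keeps a map of the bad areas:)
    #   ddrescue -d /dev/sdb1 /mnt/backup/sdb1.img /mnt/backup/sdb1.map

    # run the repair against the copy, not the original
    e2fsck -f /mnt/backup/sdb1.img

    # if the repair looks sane, mount the image read-only and inspect it
    # (the mountpoint /mnt/recovered must already exist)
    mount -o loop,ro /mnt/backup/sdb1.img /mnt/recovered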
Oh, and modern servers should have remote consoles, or at least independent rescue systems, to recover from something like that without lugging a KVM rack to the server.
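As an illustration, on hardware with a BMC you can often reach the console over the network via IPMI serial-over-LAN; the host, user and password below are placeholders for your own BMC settings:

    # attach to the server's serial console remotely
    ipmitool -I lanplus -H bmc.example.com -U admin -P changeme sol activate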
First of all, you need to understand that with modern (journaling) filesystems, a system crash will not corrupt the filesystem and no fsck will be required at boot time.
ext3, ext4, ZFS, Btrfs, XFS and all modern filesystems are 100% consistent after a crash or system reset.
Non-journaling filesystems like ext2 or vfat are a big no-go for a system rootfs.
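If you are not sure whether a given ext* filesystem actually has a journal, you can check its feature flags; the device name is a placeholder:

    # "has_journal" in the feature list means ext3/ext4 journaling is enabled
    tune2fs -l /dev/sdXN | grep -i features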
Now, if your system requires an fsck at boot time, you should ask yourself: what was the reason for this in the first place?
You should investigate your kernel logs afterwards to find out when and what happened. You should also go back in the logs to find out when the errors started, and check your disks with smartctl, etc. If you need an fsck on a journaling filesystem, it is virtually certain that your hardware is failing, assuming the filesystem was not damaged by an admin (with block-level tools like dd) or by a bug.
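A sketch of that investigation, assuming a disk at the placeholder device /dev/sda and a systemd journal:

    # overall SMART health and error counters
    smartctl -H /dev/sda
    smartctl -a /dev/sda

    # look for I/O and filesystem errors in the kernel log, including the previous boot
    journalctl -k | grep -iE 'i/o error|ext4|remount'
    journalctl -k -b -1 | grep -iE 'i/o error|ext4|remount'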
So it is silly to use fsck to "fix" the problem without investigating and fixing the root cause (by replacing/upgrading the faulty hardware/firmware/software).
Running fsck, completing the boot and being happy is naive to say the least. Saying "I've had fsck work a greater percentage of the time than what you quote" makes me wonder what you mean by "fsck work". fsck may have brought your filesystem back to a consistent state by losing some files and data in the process... Did you compare with a backup? Many people lose files or get file data corruption without noticing...
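One way to actually answer that question, assuming a recent backup is mounted at a hypothetical /mnt/backup and the repaired root at /mnt/root:

    # anything fsck disconnected ends up here, named by inode number
    ls -l /mnt/root/lost+found

    # dry-run, checksum-based comparison against the backup;
    # itemizes files that differ, are missing, or exist only on the repaired side
    rsync -rnci --delete /mnt/backup/ /mnt/root/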