I look after an old Debian Linux box (running etch) with only 512 MB of RAM, but a lot of external storage attached. One ext3 filesystem is 2.7 TB in size, and fsck can't check it, because it runs out of memory, with an error such as this one:
Error allocating directory block array: Memory allocation failed
e2fsck: aborted
I've added a 4 GB swap partition and it still doesn't complete, but this is a 32-bit kernel, so I don't expect adding any more will help.
Apart from booting into a 64-bit kernel, are there any other ways of getting fsck to complete its check?
A 64-bit kernel and large quantities of RAM will allow the fsck to finish nice and fast. Alternatively, there's now an option in e2fsck that tells it to store all of its intermediate results in a directory instead of in RAM, which helps immensely. Create /etc/e2fsck.conf with a [scratch_files] section whose directory setting points at a scratch directory such as /var/cache/e2fsck. (And, obviously, make sure that directory exists, and is on a partition with a good few GB of free space.) e2fsck will run SLLOOOOWWWWWWW, but at least it'll complete.
Of course, this won't work with the root FS, but if you've got swap then you're past mounting the root FS anyway.
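A minimal sketch of the file described above. The [scratch_files] stanza and its directory relation are documented in the e2fsck.conf man page; the path is only an example, and any directory with enough free space should do:

```
# /etc/e2fsck.conf
# Assumption: /var/cache/e2fsck is an example path, not a required location.
[scratch_files]
directory = /var/cache/e2fsck
```

Create the directory first (for example with mkdir -p /var/cache/e2fsck), then run e2fsck as usual.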
I ended up trying what womble suggested; here are some more details that may be useful if, like me, you haven't seen this new functionality in e2fsck before.
The "scratch_files" configuration option for e2fsck became available sometime in the version 1.40.x period. (In our case, we had to upgrade to the latest Debian distribution to get this functionality.)
As well as the "directory = /var/cache/e2fsck" option that was suggested, there are some further configuration options to fine-tune how the scratch file storage is used. I used "dirinfo = false", since the filesystem had a large number of files, but not such a large number of directories. If the situation were reversed, the "icount" option would be appropriate. These options are all documented in the man page for e2fsck.conf.
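As a rough sketch of how those settings fit together (option names as given in the e2fsck.conf man page; the path is an example, and the commented-out line is my assumption about what the reverse case would look like):

```
# /etc/e2fsck.conf
[scratch_files]
directory = /var/cache/e2fsck
# Keep directory information in memory instead of in the scratch directory;
# reasonable when there are comparatively few directories.
dirinfo = false
# Assumption: with many directories but comparatively few files, one would
# leave dirinfo at its default and keep the inode counts in memory instead:
# icount = false
```

Both dirinfo and icount default to true, meaning that the scratch directory is used for that data structure once a scratch directory is configured.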
BTW, Ted Ts'o wrote about these options in this thread.
I found that e2fsck was running extremely slowly, much more slowly than Ted predicted. It was running at 99.9% CPU utilization most of the time (on an extremely slow old processor), which suggests that storing these data structures on disk instead of in memory was not the main cause of the slowdown. It might be that something else about what was stored in the filesystem made e2fsck particularly slow. In the end, I have abandoned the filesystem check for now; the filesystem was due for a check, but didn't have errors (as far as I know), so I'm going to arrange to check it at a more convenient time when we can afford to have a week-long outage.
It is best to try to move to a 64-bit kernel if you have a very large filesystem. In some cases a 32-bit e2fsck will run out of memory even with scratch files (for how to use them, see @womble's answer or man e2fsck). I had such a setup with an 11 TiB filesystem that is used as a backup target, with loads of hard links, and I recently upgraded it to 64-bit solely to be able to run e2fsck without having to resort to booting from external media. The machine in question is a very old box, with only 2 GiB of RAM, but it does have a 64-bit CPU. The OS was 32-bit because it had simply been upgraded for well over a decade, but after enlarging the backup FS to over (I think) 5 TiB I was no longer able to run e2fsck on it, despite having 5 GiB of swap and configuring e2fsck to use scratch files.
However, after the 64-bit upgrade it now seems to be able to complete; at least it is currently nearly 80% done, after having run for about 18 hours; previously it was never able to get anywhere near this point. My best guess for why it will always fail on a 32-bit system is the size of the scratch files themselves; currently one of them is over 3.3 GiB.
Of course, if the CPU has no 64-bit support, the situation is difficult, but nearly all systems that support big enough physical disks to run into this problem will probably have it.
EDIT: an example of the memory usage of the said e2fsck process:
As shown, the process has a whopping 10 GiB address space, even though the actual VmData size remains well within the 4 GiB limit of a PAE-enabled system, so it would have no chance of completing successfully in a 32-bit machine. Of course, it is technically possible that the 32-bit version handles the address space differently, but I wouldn't count on it.
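For anyone wanting to take the same kind of measurement: on Linux the per-process memory figures, including VmSize and VmData, are exposed in /proc/<pid>/status. A minimal sketch, assuming a single running e2fsck process and that pgrep is available; vm_fields is just an illustrative helper name:

```shell
# vm_fields FILE: print the VmSize, VmRSS and VmData lines from a
# /proc/<pid>/status file. (vm_fields is a hypothetical helper name.)
vm_fields() {
    grep -E '^Vm(Size|RSS|Data):' "$1"
}

# Example call, assuming exactly one e2fsck process is running:
#   vm_fields "/proc/$(pgrep -o -x e2fsck)/status"
```

VmSize is the total virtual address space of the process and VmData is the size of its data segment, so comparing the two shows how much of the address space is taken up by something other than ordinary heap data.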