I have a system with / on tmpfs. Most subdirectories of / have aufs mounted, overlaying the read-write root filesystem with a read-only base filesystem (the system boots from a read-only medium). Earlier I used unionfs instead of aufs. It had been working properly until recently, when the tmpfs started to fill up. I am not sure what triggered the change. It could be the unionfs-to-aufs change, a kernel upgrade, or some changes in the system and how it accesses the file systems.
Anyway, it seems it is the tmpfs that is somehow misbehaving.
Although the system should not write a lot to tmpfs, quite a bit of it is used up:
# df -m /
Filesystem     1M-blocks  Used Available Use% Mounted on
tmpfs                200    50       151  25% /
while:
# du -smx /
2 /
This is my test system, doing basically nothing. Things are worse on the production system, where usage quickly reaches over 90% and the system crashes.
I would suspect these are deleted files still open, but:
# lsof | grep deleted
shows nothing.
The other idea was that some files on / are masked by a file system mounted over them, so I tried this:
# mount --bind / /mnt
# du -sm /mnt
2 /mnt
Still, no trace of the 48MB lost.
How can I find out what is using up my tmpfs file system?
System information:
# uname -rm
3.4.6 i686
Update: I have tried kernels 3.4.17 and 3.6.6 – no change.
I have solved the mystery myself, with the help of the aufs maintainer, Junjiro Okajima.
The first step in debugging the problem was to reproduce it in a controlled way. It took me some time (now I wonder why so much) to find out that the problem occurs when files are written and deleted via aufs.
Reproducing the problem
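All commands below use a small tmpfs (1 MB is plenty to see the effect) mounted at /tmp/rw and an aufs mount point at /tmp/mnt; the sizes and block counts shown are illustrative and will differ a bit from system to system, but the pattern is the same.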
create mount points:
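# cd /tmp
# mkdir rw mnt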
mount the tmpfs:
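# mount -t tmpfs -o size=1M none /tmp/rw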
mount the aufs, overlaying /usr with /tmp/rw:
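# mount -t aufs -o br=/tmp/rw=rw:/usr=ro none /tmp/mnt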
now I can see /usr contents under /tmp/mnt:
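# ls /tmp/mnt
bin  include  lib  local  sbin  share  src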
what I am interested in is the used/available space on the tmpfs below:
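# df /tmp/rw
Filesystem     1K-blocks  Used Available Use% Mounted on
none                1024    24      1000   3% /tmp/rw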
No files in /tmp/rw, but 24 blocks allocated. Still not a big problem.
I can write a file to the aufs; it will be stored on the tmpfs in /tmp/rw:
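# dd if=/dev/zero of=/tmp/mnt/test bs=1k count=100
100+0 records in
100+0 records out
# du -s /tmp/rw
100     /tmp/rw
# df /tmp/rw
Filesystem     1K-blocks  Used Available Use% Mounted on
none                1024   128       896  13% /tmp/rw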
Note how the usage stats changed.
du shows 100 kB added, as expected, but the 'Used' value in the df output increased by 104 blocks. When I remove the file:
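# rm /tmp/mnt/test
# df /tmp/rw
Filesystem     1K-blocks  Used Available Use% Mounted on
none                1024    28       996   3% /tmp/rw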
Four blocks are lost.
When I repeat the dd and rm commands a few times I get:
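# dd if=/dev/zero of=/tmp/mnt/test bs=1k count=100 2>/dev/null
# rm /tmp/mnt/test
# dd if=/dev/zero of=/tmp/mnt/test bs=1k count=100 2>/dev/null
# rm /tmp/mnt/test
# dd if=/dev/zero of=/tmp/mnt/test bs=1k count=100 2>/dev/null
# rm /tmp/mnt/test
# df /tmp/rw
Filesystem     1K-blocks  Used Available Use% Mounted on
none                1024    40       984   4% /tmp/rw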
More and more tmpfs blocks were gone and I didn't know where…
When I did the same dd and rm directly on /tmp/rw, nothing was lost this way. And after un-mounting the aufs, the lost space on the tmpfs was recovered. So, at least, I knew it was aufs, not tmpfs, to blame.
What has been happening
Knowing what to blame, I described my problem on the aufs-users mailing list and quickly received the first answers. The one from J. R. Okajima helped me explain what was happening to the missing tmpfs blocks.
It was a deleted file, indeed. It wasn't shown by lsof or anywhere in /proc/<pid>/*, as the file was not opened or mmapped by any user-space process. The file, the 'xino file', is aufs' external inode number translation table and is used internally by the kernel aufs module. The path to the file can be read from sysfs:
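On my mount it points into /tmp/rw (the exact file name may differ between aufs versions):
# cat /sys/fs/aufs/si_*/xi_path
/tmp/rw/.aufs.xino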
But, as the file is deleted, it cannot be seen directly:
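# ls -l /tmp/rw/.aufs.xino
ls: cannot access /tmp/rw/.aufs.xino: No such file or directory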
Still, information about its size and the sizes of the other special aufs files can be read from debugfs:
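Assuming debugfs is mounted (here at /sys/kernel/debug), the entries under its aufs directory show the sizes of the xino and related files:
# mount -t debugfs none /sys/kernel/debug
# cat /sys/kernel/debug/aufs/si_*/*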
The details are described in the aufs manual page.
The solution
The 'xino file' can be manually truncated by:
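If I read the aufs manual correctly, a remount with the itrunc_xino option (it takes the index of the branch whose xino should be truncated) does it, for example:
# mount -o remount,itrunc_xino=0 /tmp/mnt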
Automatic xino file truncation can be requested by using the trunc_xino option when mounting the aufs:
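# mount -t aufs -o trunc_xino,br=/tmp/rw=rw:/usr=ro none /tmp/mnt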
I still don't know how it affects file system performance or whether it will really solve my out-of-tmpfs-space problems in production… but I have learned a lot.
I have seen this happen where files were deleted but processes were still holding on to them, which meant that the space was not freed until the process was restarted. I have seen this with Apache log files: Apache seemed to continue writing to the now-deleted log file, and the space was not reclaimed until it was restarted.
To find out which process might be holding on to deleted files, you might try restarting each process and see if that clears the space. If it does, you have found your culprit.
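Before restarting things blindly, lsof can also list open-but-deleted files directly; +L1 selects open files with a link count below one (i.e. unlinked):
# lsof +L1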
HTH