I'm a novice Linux admin and am now responsible for the OS on a 3-node Tomcat cluster. (Tomcat itself is handled by the devs, luckily.)
Our monitoring solution alerted me that /var on server01 has only 172 MB of free space left, most likely because /var/log filled up.
So I investigated with:
server01:/var# for i in $(ls); do du -sh $i; done
3.5M backups
100M cache
51M lib
0 local
0 lock
598M log
0 mail
0 opt
40K run
32K spool
144K tmp
4.0K www
If I sum that up, I end up with roughly 760 MB used. The numbers don't change if I dig deeper into the directory tree, so they seem to be correct.
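For a cross-check that avoids the $(ls) loop's blind spot (dot-files directly under /var) and keeps du on this one filesystem, something like the following should report the same total; the second command gives a per-directory breakdown that can be sorted numerically:

# single total for the whole mount point, without crossing filesystem boundaries
du -sxh /var

# per-directory breakdown in plain kilobytes, largest last
du -skx /var/* 2>/dev/null | sort -n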
But if I run df -h, I get completely different numbers for /var: df shows that 2.8 GB out of 3.0 GB are used.
server01:/var# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 950M 205M 697M 23% /
tmpfs 2.0G 0 2.0G 0% /lib/init/rw
udev 2.0G 4.0K 2.0G 1% /dev
/dev/sda3 961M 33M 928M 4% /tmp
/dev/dm-0 2.0G 506M 1.5G 26% /usr
/dev/dm-1 3.0G 2.8G 172M 95% /var
/dev/dm-2 20G 17G 3.3G 84% /home
The funny thing is that the other two nodes actually hold even more data in /var, because /var/log on nodes 2 and 3 consumes 200-300 MB more. The partitions and the underlying LVM volumes have the same size on all three nodes.
Yet on server02 and server03, df -h reports that everything is fine and only 1.0 to 1.2 GB of the 3.0 GB are used.
So where is my space being used?
I've heard of those little bastards called inodes and checked them as well. df -i reports:
server01:/var# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda1 123648 6099 117549 5% /
tmpfs 506908 3 506905 1% /lib/init/rw
udev 506487 675 505812 1% /dev
/dev/sda3 987968 7 987961 1% /tmp
/dev/dm-0 2048000 19786 2028214 1% /usr
/dev/dm-1 705808 1807 704001 1% /var
/dev/dm-2 13619632 5906 13613726 1% /home
And on server02 and server03:
server03:/var# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda1 123648 6100 117548 5% /
tmpfs 506908 3 506905 1% /lib/init/rw
/dev 506487 675 505812 1% /dev
/dev/sda3 987968 7 987961 1% /tmp
/dev/dm-0 2048000 19784 2028216 1% /usr
/dev/dm-1 3096576 1758 3094818 1% /var
/dev/dm-2 13113840 5642 13108198 1% /home
So /var on server01 has 705,808 inodes, while server02 and server03 have 3,096,576 inodes on /var. But is this really the cause? Only 1% of them is used on each node.
If so, how do I increase the number of inodes? (All filesystems are XFS, except for /, which is ext2.)
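(From what I've read, XFS allocates inodes dynamically up to a configurable share of the filesystem, so if that ever became the limit it could presumably be inspected and raised on the mounted filesystem with something like the sketch below. But I'm not sure that is really the issue here, since df -i shows only 1% in use.)

# show the filesystem geometry, including the inode ceiling (imaxpct)
xfs_info /var

# raise the share of space that may be used for inodes to 25%
xfs_growfs -m 25 /var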
/etc/fstab is the same on all 3 nodes. The OS is Debian Lenny 64-bit with kernel 2.6.35.4.
Regards
You can run
lsof | grep deleted
and check which programs allocated this space (and which deleted files they still hold open).
If you delete log files that are open for writing by a process, the filenames disappear (so du no longer sees them), but the space stays allocated, and as the process keeps writing, the allocated space can even grow.
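If your lsof supports it, +L1 (list only files with a link count below 1, i.e. unlinked/deleted but still open) narrows the output down to exactly these files, together with the process holding them and the size they still occupy. Roughly:

# deleted-but-open files on the /var filesystem, with owning process and size
lsof -nP +L1 /var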
If the logs were Tomcat logs, you need to tell Tomcat to reopen its log files.
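For example, a logrotate stanza along these lines (the Tomcat log path here is purely illustrative):

/var/log/tomcat6/catalina.out {
    daily
    rotate 7
    compress
    missingok
    notifempty
    # copy the log aside and truncate the original in place,
    # so the writing process never has to reopen its file handle
    copytruncate
}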
Note "copytruncate" in this example. I don't know if this applies to your situation though.
Thanks for the tip with lsof | grep deleted. In fact, I'm getting dozens of deleted files for Apache2 and Tomcat6.
After restarting Apache2, the number of deleted files dropped to 40, and I had 2.4 GB free on /var. I also searched for deleted files on the other two hosts and found that server02 still has deleted files open as well. Luckily, this time I had run a ps auxf beforehand; there I saw an Apache2 thread that had been open since November 8th. After kill -9 $oldapache2threadpid, those deleted files vanished as well. Maybe this was the problem on server01 too.
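For anyone repeating this: the stale processes can also be read straight from the lsof output instead of eyeballing ps auxf. A rough sketch:

# commands and PIDs that still hold deleted files open on /var
lsof +L1 /var | awk 'NR > 1 { print $1, $2 }' | sort -u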
I then restarted the Tomcat service on server01. The remaining deleted files vanished too; free space didn't increase any further, but the free space on /var now matches (to within a few MB) what du -sch tells me.
So, thanks for the help, everyone :-)
I still need to investigate why Apache isn't closing all of its threads.
Regards