Our server recently ran out of file descriptors, and I have some questions about that. ulimit -n is supposed to give me the maximum number of open file descriptors. That number is 1024. I checked the number of open file descriptors by running lsof -u root | wc -l and got 2500 fds. That is a lot more than 1024, so I guessed that would mean the number 1024 is per process, not per user, as I thought. Well, I ran lsof -p$PidOfGlassfish | wc -l and got 1300. This is the part I don't get. If ulimit -n is not the maximum number of open file descriptors per user or per process, then what is it good for? Does it not apply to the root user? And if so, how could I then get the error messages about running out of file descriptors?
EDIT: The only way I can make sense out of ulimit -n is if it applies to the number of open files (as stated in the bash manual) rather than the number of file handles (different processes can open the same file). If this is the case, then simply listing the number of open files (grepping on '/', thus excluding memory mapped files) is not sufficient:
lsof -u root | grep / | sort -k9 | wc -l  # prints '1738'
To actually see the number of open files, I would need to filter on the name column and only print the unique entries. Thus the following is probably more correct:
lsof -u root | grep / | sort -k9 -u | wc -l  # prints '604'
The command above expects output in the following format from lsof:
java 32008 root mem REG 8,2 11942368 72721 /usr/lib64/locale/locale-archive
vmtoolsd 4764 root mem REG 8,2 18624 106432 /usr/lib64/open-vm-tools/plugins/vmsvc/libguestInfo.so
This at least gives me a number less than 1024 (the number reported by ulimit -n), so this seems like a step in the right direction. "Unfortunately" I am not experiencing any problems with running out of file descriptors, so I will have a hard time validating this.
I tested this in Linux version 2.6.18-164.el5 - Red Hat 4.1.2-46. I could see that the ulimit is applied per process.
The parameter is set at user level, but applied for each process.
E.g.: 1024 was the limit. Multiple processes were started and the number of files opened by each one was counted.
There were no errors when the sum of files opened by multiple processes crossed 1024. I also verified the unique file count by combining the results for different processes and counting unique files. The errors started appearing only when the count for a single process crossed 1024 (java.net.SocketException: Too many open files in the process logs).
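A minimal shell sketch of the same behaviour (my own illustration, not the test above; it assumes bash 4.1+ for the {fd} redirection syntax) is to keep opening descriptors inside a single bash process until the per-process limit is hit:
bash -c '
  ulimit -Sn                      # prints the soft limit for this process, e.g. 1024
  count=0
  while exec {fd}</dev/null; do   # each successful exec keeps one more descriptor open
    count=$((count+1))
  done                            # the loop ends once open() fails with "Too many open files"
  echo "opened $count extra descriptors before hitting the limit"
'
Other processes run by the same user are unaffected; only the descriptor table of this one bash process fills up.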
@oligofren
I also carried out some testing to determine how "ulimit -Sn" for "open files" was enforced. Like the poster Chosen mentioned in the link, the ulimit for "open files" is indeed applied per process. To see what the process's current limits are:
cat /proc/__process_id__/limits
To determine how many files a process has open, you need to use the following command:
lsof -P -M -l -n -d '^cwd,^err,^ltx,^mem,^mmap,^pd,^rtd,^txt' -p __process_id__ -a | awk '{if (NR>1) print}' | wc -l
Explanation of the above and my testing method / results
The "-P -M -l -n" arguments to lsof are simply there to make lsof operate as fast as possible. Feel free to take them out.
The "-d '^cwd,^err,^ltx,^mem,^mmap,^pd,^rtd,^txt'" argument instructs lsof to exclude file descriptors of type cwd/err/ltx/mem/mmap/pd/rtd/txt (the lsof man page describes each of these FD types).
I deemed "Lnn,jld,m86,tr,v86" as not applicable to Linux and hence didn't bother to add them to the exclusion list. I'm not sure about "Mxx".
If your application makes use of memory mapped files/devices, then you may want to remove "^mem" and "^mmap" from the exclusion list.
EDIT ---begin snip---
Edit: I found a link which indicates that if your process does use memory mapped files, you will need to filter out *.so files, and also that Sun's JVM will memory map jar files. So things like tomcat/glassfish will also show memory mapped jar files. I've not tested whether these count towards the "ulimit -Sn" limit (a quick way to count them is sketched just after this snip).
EDIT ---end snip---
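As a rough way to count them (my own addition, not part of the original testing), you can select the memory-mapped entries in column 4 of lsof's output and count those whose names contain .so or .jar:
lsof -p __process_id__ | awk '$4 == "mem"' | grep -cE '\.(so|jar)'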
Empirically, I've found that "cwd,rtd,txt" are not counted with regards to the per-process file limit (ulimit -Sn). I'm not sure whether "err,ltx,pd" are counted towards the file limit, as I don't know how to create file handles of these descriptor types.
The "-p __process_id__" argument restricts lsof to only return information for the __process_id__ specified. Remove this if you want to get a count for all processes.
The "-a" argument is used to AND the selections (i.e. the "-p" and "-d" arguments).
The "awk '{if (NR>1) print}'" statement is used to skip the header line that lsof prints in its output.
I tested using the following perl script:
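(The script itself is not reproduced above; the following is only a minimal sketch of what such a test.pl might look like: it keeps opening /tmp/Test<N>.log files and holds on to every filehandle so the descriptors accumulate.)
#!/usr/bin/perl
# Sketch of a descriptor-exhaustion test (not the author's original script).
use strict;
use warnings;

my @handles;
for my $i (1 .. 1100) {
    # Keep every filehandle in @handles so its descriptor stays open.
    open(my $fh, '>', "/tmp/Test$i.log")
        or die "Cannot open /tmp/Test$i.log: $!";   # fails once ulimit -Sn is reached
    push @handles, $fh;
}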
I had to execute the script in the perl debugger to ensure the script doesn't terminate and release the file descriptors.
To execute:
perl -d test.pl
In perl's debugger, you can run the program by entering c and pressing enter, and if your ulimit -Sn had a value of 1024, you'll find that the program stops after creating the Test1017.log file in /tmp.
If you now identify the pid of the perl process and use the above lsof command, you will see that it also outputs 1024.
Remove the "wc -l" and replace it with "less" to see the list of files that counted towards the 1024 limit. Remove the "-d ^....." argument as well to see that the cwd, txt and rtd descriptors didn't count towards the limit.
"ls -l /proc/__process_id__/fd/ | wc -l"
, you will see a value of 1025 returned. This is becausels
added a"total 0"
header to its output which got counted.Note:
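To avoid that off-by-one, drop the -l so ls doesn't print the "total" header (with -1 each entry is listed on its own line, header-free):
ls -1 /proc/__process_id__/fd/ | wc -l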
Note: to check whether the OS is running out of file descriptors, it is better to compare the value of:
cat /proc/sys/fs/file-nr | awk '{print $1}'
with
cat /proc/sys/fs/file-max
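Since the third field of file-nr is the same maximum that file-max reports, the comparison can also be done in one line:
awk '{printf "in use: %d of max %d\n", $1, $3}' /proc/sys/fs/file-nr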
https://www.kernel.org/doc/Documentation/sysctl/fs.txt documents what file-nr and file-max mean.
The ulimit is for file handles. It applies to files, directories, sockets, pipes, epolls, eventfds, timerfds, etc.
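One way to see those non-file descriptors for yourself (sockets, pipes, and anon inodes such as eventfds and timerfds) is to group a process's fd symlink targets by type, e.g.:
ls -l /proc/<pid>/fd | awk '{print $NF}' | grep -E 'socket:|pipe:|anon_inode:' | cut -d: -f1 | sort | uniq -c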
At any point during the process's startup the limits might have been changed. Visit /proc/<pid>/limits and see if the values have been altered.
You want to take a look at the system-wide limits set in /proc/sys/fs/file-max and adjust it there (until the next reboot), or set fs.file-max in sysctl.conf to make it permanent. This might be helpful - http://www.randombugs.com/linux/tuning-file-descriptors-limits-on-linux.html
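For illustration, those two adjustments might look like this (262144 is just an example value):
sysctl -w fs.file-max=262144                      # takes effect immediately, lost at the next reboot
echo "fs.file-max = 262144" >> /etc/sysctl.conf   # persists across reboots
sysctl -p                                         # re-read /etc/sysctl.conf now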
It seems like your reasoning is something like, "I have to lower that limit so I don't run out of precious descriptors". The truth is exactly the reverse -- if your server ran out of file descriptors, you need to raise that limit from 1,024 to something larger. For a realistic glassfish implementation, 32,768 is reasonable.
Personally, I always raise the limit to around 8,192 system-wide -- 1,024 is just ridiculous. But you'll want to raise glassfish higher. Check /etc/security/limits.conf. You can add a special entry for the user glassfish runs as.
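For example, an /etc/security/limits.conf entry for a user named glassfish (using the 32,768 figure from above) could look like:
glassfish   soft   nofile   32768
glassfish   hard   nofile   32768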
It is a common mistake to compare the result of a raw lsof call with the supposed limit.
For the global limit (/proc/sys/fs/file-max) you should have a look at /proc/sys/fs/file-nr: the first value indicates how many are in use and the last value is the limit.
The open-file limit is for each process, but it can be defined for a user: see the command "ulimit -Hn" for user limits and see /etc/security/limits.conf for the definitions. It is generally applied to the "app user", e.g. "tomcat": set the limit to 65000 for the user tomcat and it will apply to the java process it runs.
If you want to check the limit applied to a process, get its PID and then:
cat /proc/${PID}/limits
If you want to check how many files a process has opened, get its PID and then:
ls -1 /proc/${PID}/fd | wc -l
(note that for ls it's 'minus one', not to be confused with 'minus el')
If you want details from lsof, but only for the file handles that count towards the limit, try these:
lsof -p ${PID} | grep -P "^(\w+\s+){3}\d+\D+"
lsof -p ${PID} -d '^cwd,^err,^ltx,^mem,^mmap,^pd,^rtd,^txt' -a
Remark: the 'files' are files / pipes / tcp connections / etc.
Note that sometimes you'll probably need to be root or to use sudo to obtain correct results from these commands; without privileges you sometimes get no error, just fewer results.
And finally, if you want to know which 'files' on your filesystem are accessed by a process, have a look at:
lsof -p ${PID} | grep / | awk '{print $9}' | sort | uniq
Have fun!