I have a situation where I need to create hundreds of thousands of 0-byte lock files for concurrency control.
I've tested creating them by using:
for i in `seq 1 50000`; do touch "/run/lock/${i}.lock"; done
Since the files are 0 bytes, they don't take up any space in the partition. Looking at df -h:
Filesystem Size Used Avail Use% Mounted on
tmpfs 50M 344K 49M 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 246M 0 246M 0% /run/shm
none 100M 0 100M 0% /run/user
The 0% figure doesn't change at all in the /run/lock row.
However, memory usage does increase by an average of approximately 1KB per lock file. I discovered this by comparing free -h before and after creating 70,000 lock files inside /run/lock. This increase was reflected in real memory usage (used memory minus the buffers/cache).
Later I discovered that this 1KB increase is most likely due to the inodes. So I checked inode usage using df -i:
Filesystem Inodes IUsed IFree IUse% Mounted on
tmpfs 62729 322 62407 1% /run
none 62729 50001 12728 80% /run/lock
none 62729 1 62728 1% /run/shm
none 62729 2 62727 1% /run/user
As you can see, the lock files increase inode usage inside the /run/lock partition.
I'm currently on Ubuntu and the /run mounts are not reflected in /etc/fstab. Running mount gives me:
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
none on /run/user type tmpfs (rw,noexec,nosuid,nodev,size=104857600,mode=0755)
I have a few questions regarding this (the first one is the most important):

- How do I increase the inode limit for /run/lock permanently, so that it survives restarts?
- Would I be better off creating my own directory and mounting tmpfs on it for this purpose, instead of using /run/lock?
- Is each partition's size limit completely independent of the others? That is, storing files in /run doesn't seem to affect /run/lock, and vice versa.
- Is the 1KB derived from the inode? I noticed that when creating non-empty files, the basic block is 4KB for each file.
- Why is /run given the filesystem type tmpfs, but /run/lock, /run/shm and /run/user are given a filesystem type of "none", especially since all of them are backed by tmpfs? Why aren't they all shown as tmpfs in the Filesystem column?
- If all of the directories are independently constrained, how does the OOM killer handle a situation where there are multiple full tmpfs partitions, each of them sized to 50% of RAM, and where there are also processes contending for RAM? Obviously one cannot use over 100% of RAM. According to https://www.kernel.org/doc/Documentation/filesystems/tmpfs.txt, the system will deadlock. How does that work?
Responding to some of your questions, in order:
mount -o remount,nr_inodes=NUM /run/lock
Run this in your application's startup script (assuming it runs with uid=0). It should also be safe to add the relevant line to /etc/fstab, but I haven't tested that.

I'm not sure whether your application creates the empty files by opening them (and for how long it keeps them open), but you may also want to consider increasing the open files limit (check ulimit) to avoid depletion.
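As an untested sketch only, such an /etc/fstab line could reuse the options from your existing mount output with nr_inodes appended (the nr_inodes value here is an arbitrary placeholder; pick whatever limit you actually need):

# example /etc/fstab entry - options copied from the current /run/lock mount,
# nr_inodes value is a placeholder
none  /run/lock  tmpfs  rw,noexec,nosuid,nodev,size=5242880,nr_inodes=200000  0  0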
You are going about this the wrong way. You can use filesystem semantics to enforce consistency.
When you want to read a file, just open and read it. You should always use open, never access, for this operation. If you are using a PHP library to do this, check that it just calls open and not access on the file - but fopen should work fine.

When you want to refresh or create a file, you perform the following operations:

- Write the new contents to a temporary file in the same directory.
- rename the temporary file over the target file.
This is operationally safe, because renames are defined to be atomic. A reader opening the file will see either the old file or the new file - but never a non-existent file in the cache.
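As a rough sketch of that write-then-rename pattern (the cache path and variable name here are hypothetical, not taken from your setup):

# Write the new contents to a temp file on the same filesystem, then
# rename it over the target; rename(2) is atomic, so readers never see
# a partially written or missing file.
tmp=$(mktemp /run/cache/entry.XXXXXX)
printf '%s\n' "$new_contents" > "$tmp"
mv -f "$tmp" /run/cache/entry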
In the worst case, with many concurrent checks of each file, a number of writers will briefly overwrite one another. But this is far cheaper than using a file lock against each file.
Alternatively, rather than having a lock file for each file, consider locking each individual cache object directly. I still don't think that would scale, however.
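If you did want to try that, a hedged sketch of locking the cache object itself with flock (again, the path and variable are hypothetical) would be:

# Lock the cache file itself instead of a separate lock file: open it on
# fd 9, take an exclusive lock, rewrite it, then close the descriptor
# (closing the fd releases the lock).
exec 9>>/run/cache/entry
flock -x 9
printf '%s\n' "$new_contents" > /run/cache/entry
exec 9>&-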
Using rename and link semantics in this case guarantees consistency with your cache and is far cheaper to manage than lock files.