Here is my Varnish conf:
/usr/sbin/varnishd \
-a 127.0.0.1:6081 \
-T 127.0.0.1:6082 \
-f /varnish/default.vcl \
-P %t/%N/varnishd.pid \
-s malloc,4G \
-p thread_pools=12 -p thread_pool_min=250 -p default_ttl=1 -p default_grace=0 -p timeout_idle=1800
So 12 * 250 = 3000 threads. With this setup, I end up with more than 400k open files.
Reducing the number of threads to the minimum does indeed reduce the number of open files a lot.
Question is: how is that possible? Is it normal for each Varnish thread to hold that many open files?
EDIT: Here's my VCL file:
vcl 4.1;
backend someBackend {
    .host = "someBackend.net";
    .connect_timeout = 2s;
    .first_byte_timeout = 5s;
    .between_bytes_timeout = 5s;
}

sub vcl_recv {
    # Remove Varnish from X-Forwarded-For (set by Nginx)
    set req.http.X-Forwarded-For = regsub(req.http.X-Forwarded-For, ",[^,]+$", "");
}

sub vcl_backend_fetch {
    # Hide Varnish token
    unset bereq.http.X-Varnish;
}

sub vcl_backend_response {
    unset beresp.http.Vary;
}

sub vcl_deliver {
    unset resp.http.Via;
    unset resp.http.X-Varnish;
}
Some of the parameters you have set could be responsible for this high number of open files.
Threads
Let's start off with thread_pools=12. The default value is 2 and we don't advise you to change that. While thread_pool_min is set to 250 in your use case, the default value for thread_pool_max is 5000.
The question is: "are you seeing 400k open files during peak traffic, or even when you're way below 1000 active threads?"
Let's say, hypothetically, that this is happening during absolute peak traffic where you have 5000 threads per pool, and the 12 thread pools result in 60,000 active threads.
That would mean about 7 file descriptors per thread (400,000 / 60,000), which is not unreasonable.
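If you want to check the values your own setup is actually running with, varnishadm can display both the current and the default value of these parameters (this assumes varnishadm can reach the management interface, e.g. the -T address from your startup command):

varnishadm param.show thread_pools
varnishadm param.show thread_pool_min
varnishadm param.show thread_pool_max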
The impact of timeout_idle
There is of course the timeout_idle parameter, which you increased from 5 to 1800. This means a connection is left idling for up to 1800 seconds before it is closed if keep-alive is set. That's a long time.
The file descriptors for idling connections are moved away from the threads and are managed by a waiter thread. This means the file descriptors are kept around, while the threads can take on new connections and create more file descriptors.
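If you want a rough idea of how many client connections are currently being kept open, something like the following should work. It is a sketch: the port 6081 is an assumption taken from the -a flag in your startup command.

# Count established TCP connections on the Varnish listening port
ss -tn state established '( sport = :6081 )' | tail -n +2 | wc -l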
Some more debugging required
Throughout my answer I have assumed that the 400k open files occur during peak traffic. If that's not the case, there is more debugging required to figure out which files (or file descriptors) are in use.
One way of doing that is by running the following command:
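A minimal sketch, assuming the cache (child) process is the newest varnishd process, so that pgrep -n finds its PID:

# List the open file descriptors of the varnishd cache process
lsof -p "$(pgrep -n varnishd)"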
This command will list the various file descriptors that are in use for that process.
And of course you can run the following with combined parameters to list all threads and connections:
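Again a sketch rather than a literal recipe, combining lsof's -a (logical AND) and -i options to narrow the output down to TCP sockets, plus varnishstat for the thread count:

# TCP connections held open by the varnishd cache process
lsof -a -i TCP -p "$(pgrep -n varnishd)"

# Number of worker threads currently alive
varnishstat -1 -f MAIN.threads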
Update after comments
In the comments section of this question, you'll notice some back-and-forth in an attempt to gather more information and get some context.
@NicoAdrian mentioned that he was seeing a lot of file descriptors pointing to /var/lib/varnish/varnishd/_.vsm_child/_.Stat.*. I couldn't simulate this. Whenever I start varnishd, there are about 45 file descriptors associated with that pattern. When I increase the threading settings to the parameters that were shared here, that number increases to 80, but not more.
However, I can simulate the high number of file descriptors when increasing the threading settings.
I ran the following commands on a running Varnish server to mimic the settings that were mentioned in this question:
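Roughly the following, using varnishadm param.set to change the parameters on a running instance (treat this as a sketch; the values are taken from the question):

varnishadm param.set thread_pools 12
varnishadm param.set thread_pool_min 250
varnishadm param.set timeout_idle 1800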
The output of lsof | grep "varnish" | wc -l was 188918. This means that 188918 file descriptors were in use. Meanwhile the output of varnishstat -1 -f MAIN.threads shows that 3198 worker threads are up at that time. That's roughly 12 x 250.
250 worker threads per thread pool is the minimum with these settings. As traffic starts to increase, threads per pool can go up to 5000, because the standard value of thread_pool_max is 5000.
There are on average 50-60 file descriptors per worker thread. With a high number of active threads, the number of file descriptors increases at the same rate. Most of these should be linked to /proc/*.
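If you want to verify where the descriptors point on your own system, counting the /proc entries in the lsof output is a quick sanity check (a sketch, with the same pgrep -n assumption as above):

# Count file descriptors of the cache process that resolve to /proc/*
lsof -p "$(pgrep -n varnishd)" | grep -c '/proc/'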
As mentioned, the number of thread pools was increased to 12 to decrease lock contention. We hardly ever increase the value of thread_pools, even for systems at scale. In the case of 12 thread pools, it might make sense to decrease the thread_pool_min value to reduce the initial thread creation, and reduce the number of file descriptors in use.
On baseline traffic it makes sense to look at the MAIN.threads counter to see how many threads are actually being created. If that value is higher than thread_pools x thread_pool_min, it is an indication that more threads were created and that the thread_pool_min setting was too low.
This has a direct correlation to the number of file descriptors in use. If performance is suspected to become an issue because of that, it is possible to mount the /var/lib/varnish folder on tmpfs.
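A sketch of what that could look like as an /etc/fstab entry (the size is an assumption and should be tuned to your instance):

# Keep the Varnish working directory (shared memory segments) on tmpfs
tmpfs  /var/lib/varnish  tmpfs  rw,nodev,nosuid,size=512M  0  0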