Here is my Varnish conf:
/usr/sbin/varnishd \
-a 127.0.0.1:6081 \
-T 127.0.0.1:6082 \
-f /varnish/default.vcl \
-P %t/%N/varnishd.pid \
-s malloc,4G \
-p thread_pools=12 -p thread_pool_min=250 -p default_ttl=1 -p default_grace=0 -p timeout_idle=1800
So 12 * 250 = 3000 threads. With this setup, I end up with more than 400k open files.
Reducing the number of threads to the minimum does indeed reduce the number of open files a lot.
Question is: how is that possible? Is it normal for each Varnish thread to hold that many open files?
EDIT: Here's my VCL file:
vcl 4.1;
backend someBackend {
    .host = "someBackend.net";
    .connect_timeout = 2s;
    .first_byte_timeout = 5s;
    .between_bytes_timeout = 5s;
}

sub vcl_recv {
    # Remove Varnish from X-Forwarded-For (set by Nginx)
    set req.http.X-Forwarded-For = regsub(req.http.X-Forwarded-For, ",[^,]+$", "");
}

sub vcl_backend_fetch {
    # Hide Varnish token
    unset bereq.http.X-Varnish;
}

sub vcl_backend_response {
    unset beresp.http.Vary;
}

sub vcl_deliver {
    unset resp.http.Via;
    unset resp.http.X-Varnish;
}
Some of the parameters you have set could be responsible for this high number of open files.
Threads
Let's start off with thread_pools=12. The default value is 2 and we don't advise you to change that. While thread_pool_min is set to 250 in your use case, the default value for thread_pool_max is 5000.
The question is: "are you seeing 400k open files during peak traffic, or even when you're way below 1000 active threads?"
Let's say, hypothetically, that this is happening during absolute peak traffic where you have 5000 threads per pool, and the 12 thread pools result in 60,000 active threads.
That would mean about 7 file descriptors per thread (400,000 / 60,000), which is not unreasonable.
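If you want to check the values your own setup is actually running with, varnishadm can display both the current and the default value of these parameters (this assumes varnishadm can reach the management interface, e.g. the -T address from your startup command):

varnishadm param.show thread_pools
varnishadm param.show thread_pool_min
varnishadm param.show thread_pool_max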
The impact of timeout_idle
There is of course the timeout_idle parameter, which you increased from 5 to 1800. This means a connection is left idling for up to 1800 seconds before it is closed if keep-alive is set. That's a long time.
The file descriptors for idling connections are moved away from the threads and are managed by a waiter thread. This means the file descriptors are kept around, while the threads can take on new connections and create more file descriptors.
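If you want a rough idea of how many client connections are currently being kept open, something like the following should work. It is a sketch: the port 6081 is an assumption taken from the -a flag in your startup command.

# Count established TCP connections on the Varnish listening port
ss -tn state established '( sport = :6081 )' | tail -n +2 | wc -l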
Some more debugging required
Throughout my answer I have assumed that the 400k open files occur during peak traffic. If that's not the case, there is more debugging required to figure out which files (or file descriptors) are in use.
One way of doing that is by running the following command:
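A minimal sketch, assuming the cache (child) process is the newest varnishd process, so that pgrep -n finds its PID:

# List the open file descriptors of the varnishd cache process
lsof -p "$(pgrep -n varnishd)"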
This command will list the various file descriptors that are in use for that process.
And of course you can run the following with combined parameters to list all threads and connections:
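Again a sketch rather than a literal recipe, combining lsof's -a (logical AND) and -i options to narrow the output down to TCP sockets, plus varnishstat for the thread count:

# TCP connections held open by the varnishd cache process
lsof -a -i TCP -p "$(pgrep -n varnishd)"

# Number of worker threads currently alive
varnishstat -1 -f MAIN.threads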
Update after comments
In the comments section of this question, you'll notice some back-and-forth in an attempt to gather more information and get some context.
@NicoAdrian mentioned that he was seeing a lot of file descriptors pointing to /var/lib/varnish/varnishd/_.vsm_child/_.Stat.*. I couldn't simulate this. Whenever I start varnishd, there are about 45 file descriptors associated with that pattern. When I increase the threading settings to the parameters that were shared here, that number increases to 80, but not more.
However, I can simulate the high number of file descriptors when increasing the threading settings.
I ran the following commands on a running Varnish server to mimic the settings that were mentioned in this question:
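Roughly the following, using varnishadm param.set to change the parameters on a running instance (treat this as a sketch; the values are taken from the question):

varnishadm param.set thread_pools 12
varnishadm param.set thread_pool_min 250
varnishadm param.set timeout_idle 1800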
The output of lsof | grep "varnish" | wc -l was 188918. This means that 188918 file descriptors were in use. Meanwhile the output of varnishstat -1 -f MAIN.threads shows that 3198 worker threads are up at that time. That's roughly 12 x 250.
250 worker threads per thread pool is the minimum with these settings. As traffic starts to increase, threads per pool can go up to 5000, because the standard value of thread_pool_max is 5000.
There are on average 50-60 file descriptors per worker thread. With a high number of active threads, the number of file descriptors increases at the same rate. Most of these should be linked to /proc/*.
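If you want to verify where the descriptors point on your own system, counting the /proc entries in the lsof output is a quick sanity check (a sketch, with the same pgrep -n assumption as above):

# Count file descriptors of the cache process that resolve to /proc/*
lsof -p "$(pgrep -n varnishd)" | grep -c '/proc/'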
As mentioned, the number of thread pools was increased to 12 to decrease lock contention. We hardly ever increase the value of thread_pools, even for systems at scale. In the case of 12 thread pools, it might make sense to decrease the thread_pool_min value to reduce the initial thread creation, and reduce the number of file descriptors in use.
On baseline traffic it makes sense to look at the MAIN.threads counter to see how many threads are actually being created. If that value is higher than thread_pools x thread_pool_min, it is an indication that more threads were created and that the thread_pool_min setting was too low.
This has a direct correlation to the number of file descriptors in use. If performance is suspected to become an issue because of that, it is possible to mount the /var/lib/varnish folder on tmpfs.
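A sketch of what that could look like as an /etc/fstab entry (the size is an assumption and should be tuned to your instance):

# Keep the Varnish working directory (shared memory segments) on tmpfs
tmpfs  /var/lib/varnish  tmpfs  rw,nodev,nosuid,size=512M  0  0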