I am developing a Varnish pipeline that serves a mix of public and restricted resources.
Since access to public resources makes up the vast majority (>99.9%) of the traffic, I want to create shortcuts to bypass auth token validation and other such things for non-restricted resources—or better yet, only go through the authN/Z path if the resource is in a sort of blacklist.
This blacklist can contain up to about 1M (within the next few years) UUID4 entries. Such a file in plain text occupies about 3.7 GB on disk, so a machine with good RAM capacity should be able to keep it all in memory.
My question is how to implement this blacklist so that lookups are very fast. I thought about a "native" hash set, Memcached, or similar methods. Memcached would very likely slow things down if it's distributed. Has anybody implemented a similar approach? Which tools does Varnish put at my disposal?
In Varnish you have no direct access to the hash of an object.
However, as you indicated, you could store a list of restricted resources in a key-value store.
KVStore in Varnish Enterprise
For pure speed of execution, Varnish Enterprise has a KVStore module. This KVStore is kept in local memory and can be rebuilt from a file when a restart occurs.
The Redis VMOD
If Varnish Enterprise is not for you, you could also use Carlos Abalde's Redis VMOD. It's free, it's open source, and does the job quite well.
You can even run Lua scripts from VCL (through Redis's EVAL command) to execute more intricate logic inside Redis.
If you're worried that Redis will slow you down, you can limit the number of connections, and even have them shared between Varnish worker threads.
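As a rough illustration of the Redis approach, here is a minimal VCL sketch. It assumes the restricted UUIDs live in a Redis set named `restricted`, that resources are served under `/resources/<uuid>`, and that Redis and the origin run on localhost; all of those names, addresses and limits are hypothetical. The `redis.db` parameters and the `command`/`push`/`execute` calls follow the VMOD's README, but check them against the version you install.

```vcl
vcl 4.0;

import redis;

# Hypothetical origin; replace with your real backend.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_init {
    # Share a bounded pool of connections between worker threads.
    # Address, timeout (ms) and pool size are assumptions.
    new db = redis.db(
        location="127.0.0.1:6379",
        type=master,
        connection_timeout=500,
        shared_connections=true,
        max_connections=32);
}

sub vcl_recv {
    # Never trust a client-supplied marker.
    unset req.http.X-Restricted;

    # Assumed URL layout: /resources/<uuid>
    if (req.url ~ "^/resources/[0-9a-f-]{36}") {
        # SISMEMBER restricted <uuid> returns 1 if listed, 0 otherwise.
        db.command("SISMEMBER");
        db.push("restricted");
        db.push(regsub(req.url, "^/resources/([0-9a-f-]{36}).*$", "\1"));
        db.execute();

        # Fail closed: anything other than a clean 0 reply (including a
        # Redis error or timeout) is treated as restricted.
        if (!db.reply_is_integer() || db.get_integer_reply() != 0) {
            set req.http.X-Restricted = "1";
        }
    }
}
```

Requests that end up with `X-Restricted` set can then be routed through the authN/Z logic, while everything else takes the normal cached path. Failing closed costs some availability when Redis is down; whether that trade-off is right depends on how sensitive the restricted resources are.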