I want to use nginx as a caching reverse proxy. I also have a special requirement, which I thought could be accomplished with nginx.
I am using Amazon s3 as origin server and I use signed urls to protect the content. So every user gets a unique url that expires after a certain time. Even tough every user has a unique URL, to make nginx cache the content regardless, I defined the cache key as only consiting of the request filename (see config below).
That works very nice so far. The problem is, that if the request url would be invalid, because the signature in the query string is too old or invalid, the server delivers the file anyway. Because it is cached.
I confirmed that the initial request must contain a valid signature. If the signture in the user request is invalid, nginx cannot fetch it from the server (of course).
Now what I want to have is a re-lookup of the file on every request. This re lookup should occur with the URL that the user specified. If the request is successful, the cached file should be delivered.
This is exactly the behaviour which should be accomplished using Cache-control: must-revalidate
So I configured this header on my origin server (amazon s3).
I then realised that nginx does not behave accordingly.
So the file is directly delivered from cache without validating with the origin server. And thus the bad signature is not recognised and the user can download.
Question 1: Is there a way to make nginx honor must-revalidation headers in this context?
Here is my config file
proxy_cache_path /home/sbgag/cache keys_zone=MYZONE:10m inactive=365d max_size=10g;
server {
listen 80;
server_name test.mydomain.com;
location / {
proxy_pass http://s3-eu-west-1.amazonaws.com;
proxy_set_header Host s3-eu-west-1.amazonaws.com;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_cache_key "$request_filename";
more_set_headers "X-My-Proxy-Cache-Key $request_filename";
more_set_headers "X-My-Proxy-Cache-realpath_root $realpath_root";
more_set_headers "X-My-Proxy-Cache-uri $uri";
proxy_cache MYZONE;
proxy_cache_valid 200 365d;
proxy_cache_use_stale error timeout invalid_header updating http_500 http_502 http_503 http_504;
more_set_headers "X-AppServer $upstream_addr"; # Backend Server / Port
more_set_headers "X-AppServer-Status $upstream_status"; # Backend HTTP Status
more_set_headers "X-Cache $upstream_cache_status"; # HIT / MISS / BYPASS / EXPIRED
}
}
Also I found this changelog
*) Feature: the "proxy_cache_revalidate", "fastcgi_cache_revalidate", "scgi_cache_revalidate", and "uwsgi_cache_revalidate" directives.
So I thought I play with it. After bringin my nginx up to the newest version I set my cache time to 0s. With 0s the file is never cached, so I set it to 1s.
This produces almost the behaviour I want. It causes the file to be re-validated on the server after 1s. It is then re validated with the signed url the user provided. If it is incorrect, then it fails. Also the file is not deleted, because nginx appears to delete the files not immediately but only when space is getting full. So even if the file timed out and even if another client supplied an invalid url, the next client with a valid url can download from the cache.
In the 1-second timefrime however everyone can download, but hta tis not really a concern to me.
Ok so this is almost what I want, what I do not like is that it is an ugly work around that is based on behaviour that is more accidental than funcitonal.
Question 2: Isnt there a better way?
What I would like best would be to pass requests to a validation script where I can validate the request with my own script in the backgorund. And only if that script returns successful, download is permittetd. And then making use of existing, proven nginx caching algorithms.
Maby even a rewrite rule would do. Rewriting to a script with a rewrite map and only outputting the url when the validation was a success.
Any input is appriciated!
I think what you are looking for is actually adding this:
proxy_cache_bypass $http_cache_control;
In this way sending a request with set Cache-Control:must-revalidate header will bypass the cache.