I have a system running nginx / php-fpm / varnish / WordPress and Amazon S3.
Now I have looked at a lot of configuration files while setting up the system, and in all of them I found something like this:
/* If the request is for pictures, javascript, css, etc */
if (req.url ~ "\.(jpg|jpeg|png|gif|css|js)$") {
    /* Remove the cookie and make the request static */
    unset req.http.cookie;
    return (lookup);
}
I do not understand why this is done. Most of the examples also run nginx as a webserver. Now the question is: why would you use the Varnish cache to cache these static files?
It makes much more sense to me to only cache the dynamic files so that php-fpm / mysql don't get hit that much.
Am I correct or am I missing something here?
UPDATE
I want to add some info to the question based on the answer given.
If you have a dynamic website where the content actually changes a lot, caching does not make sense. But if you use WordPress for a static website, for example, it can be cached for long periods of time.
That said, more important to me is static content. I have found a link with some tests and benchmarks of different cache apps and webserver apps.
http://nbonvin.wordpress.com/2011/03/14/apache-vs-nginx-vs-varnish-vs-gwan/
nginx is actually faster at serving your static content, so it makes more sense to just let it handle those requests. nginx works great with static files.
--
Apart from that, most of the time static content is not even on the webserver itself. Usually this content is stored on a CDN somewhere, maybe AWS S3, something like that. I think the Varnish cache is the last place where you want to have your static content stored.
There are a few advantages to Varnish. The first one you note is reducing load on a backend server, typically by caching content that is generated dynamically but changes rarely (compared to how frequently it is accessed). Taking your WordPress example, most pages presumably do not change very often, and there are plugins that invalidate the Varnish cache when a page changes (e.g. a new post, an edit, a comment, etc.). Therefore, you cache indefinitely and invalidate on change - which results in the minimum load on your backend server.
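As a sketch of what such invalidation boils down to (Varnish 3 syntax; the ACL, addresses, and status messages here are placeholders, not a tested setup), the plugin sends a PURGE request and the VCL handles it roughly like this:

```vcl
/* Hypothetical purge setup - addresses and names are placeholders */
acl purgers {
    "127.0.0.1";   /* only the local WordPress install may purge */
}

sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purgers) {
            error 405 "Not allowed";
        }
        return (lookup);
    }
}

sub vcl_hit {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged";
    }
}

sub vcl_miss {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged";
    }
}
```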
The linked article notwithstanding, most people would suggest that Varnish performs better than nginx if set up properly - although (and I really hate to admit it) my own tests seem to concur that nginx can serve a static file faster than Varnish (luckily, I don't use Varnish for that purpose). I think that the problem is that if you end up using Varnish, you have added an extra layer to your setup. Passing through that extra layer to the backend server will always be slower than serving directly from the backend - and this is why allowing Varnish to cache may be faster: you save a step. The other advantage is on the disk-I/O front. If you set up Varnish to use malloc storage, you don't hit the disk at all, which leaves it available for other processes (and would usually speed things up).
I think that one would need a better benchmark to really gauge the performance. Repeatedly requesting the same, single file, triggers file system caches which begin to shift the focus away from the web-servers themselves. A better benchmark would use siege with a few thousand random static files (possibly even from your server logs) to simulate realistic traffic. Arguably though, as you mentioned, it has become increasingly common to offload static content to a CDN, which means that Varnish probably won't be serving it to begin with (you mention S3).
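A rough version of such a benchmark might look like this (hypothetical hostname and paths; assumes the common combined log format, where the request path is the seventh field):

```shell
# Build a URL list from real traffic (the path is the 7th field
# in the combined log format; the hostname is a placeholder)
awk '{print "http://example.com" $7}' /var/log/nginx/access.log | sort -u > urls.txt

# 50 concurrent users for one minute, picking URLs from the
# file at random (-i = "internet" mode)
siege -c 50 -t 1M -i -f urls.txt
```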
In a real-world scenario, you would likely prioritize your memory usage - dynamic content first, as it is the most expensive to generate; then small static content (e.g. js/css), and lastly images - you probably wouldn't cache other media in memory, unless you have a really good reason to do so. In this case, with Varnish loading files from memory, and nginx loading them from disk, Varnish will likely out-perform nginx (note that nginx's caches are only for proxying and fastCGI, and those, by default are disk based - although, it is possible to use nginx with memcached).
(My quick - very rough, not to be given any credibility - test showed nginx (direct) was the fastest - let's call it 100%, varnish (with malloc) was a bit slower (about 150%), and nginx behind varnish (with pass) was the slowest (around 250%). That speaks for itself - all or nothing - adding the extra time (and processing) to communicate with the backend, simply suggests that if you are using Varnish, and have the RAM to spare, you might as well just cache everything you can and serve it from Varnish instead of passing back to nginx.)
I think you might be missing something.
By definition, dynamic files change. Typically, they change by doing some sort of database query that affects the content of the page being served up to the user. Therefore, you do not want to cache dynamic content. If you do, it simply becomes static content - and most likely static content that is stale or incorrect.
As a simple example, let's say you have a page with the logged-in user's username at the top. Each time that page is loaded, a database query is run to determine which username belongs to the logged-in user requesting the page, which ensures that the proper name is displayed. If you were to cache this page, then the database query would not happen, and all users would see the same username at the top of the page - and it likely would not be theirs. You need that query to happen on every page load to ensure that the proper username is displayed to each user. It is therefore not cacheable.
Extend that logic to something a little more problematic like user permissions and you can see why dynamic content should not be cached. If the database is not hit for dynamic content, the CMS has no way to determine whether the user requesting the page has permissions to see that page.
Static content is, by definition, the same for all users. Therefore no database query needs to take place to customize that page for each user so it makes sense to cache that to eliminate needless database queries. Images are a really great example of static content - you want all users to see the same header image, the same login buttons, etc, so they are excellent candidates for caching.
In your code snippet above you're seeing a very typical Varnish VCL snippet which forces images, css and javascript to be cached. By default, Varnish will not cache any request with a cookie in it. The logic being that if there is a cookie in the request, then there must be some reason the server needs that cookie so it is required on the back end and must be passed through the cache. In reality, many CMSes (Drupal, Wordpress, etc) attach cookies to almost everything whether or not it is needed so it is common to write VCL to strip the cookies out of content that is known to be static which in turn causes Varnish to cache it.
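The same idea is often applied on the backend-response side as well. A sketch (Varnish 3 syntax; the extension list and the 24-hour TTL are assumptions, not a recommendation) that drops Set-Cookie and forces a long TTL for static assets:

```vcl
sub vcl_fetch {
    /* hypothetical: treat these extensions as static and cache for a day */
    if (req.url ~ "\.(jpg|jpeg|png|gif|css|js)$") {
        unset beresp.http.set-cookie;  /* a Set-Cookie would make it uncacheable */
        set beresp.ttl = 24h;
    }
}
```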
Make sense?
Some dynamic content, such as stock quotes, actually changes often (updated each second on an SaaS server from a backend server) but might be queried even more often (by tens of thousands of subscription clients). In this case, caching the per-second update from the backend servers on the SaaS server makes it possible to satisfy the queries of the tens of thousands of subscription users. Without a cache on the SaaS server, this model would simply not work.
Caching static files with Varnish would benefit in terms of offloading Nginx. Of course, if you have lots of static files to cache, it will waste RAM. However, Varnish has a nice feature - it supports multiple storage backends for its cache.
For static files: cache to HDD.
For everything else: cache to RAM.
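As a sketch (the flags, sizes, and paths are placeholders, not a tested setup), this can be done with two named storage backends on the varnishd command line plus a storage hint in VCL:

```shell
# varnishd with a file backend for static files and malloc for the rest
varnishd -f /etc/varnish/default.vcl \
    -s static=file,/var/lib/varnish/static.bin,10G \
    -s malloc,256M
```

```vcl
sub vcl_fetch {
    if (req.url ~ "\.(jpg|jpeg|png|gif|css|js)$") {
        set beresp.storage = "static";  /* send static files to the file backend */
    }
}
```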
This should give you more insight into how to implement this scenario: http://www.getpagespeed.com/server-setup/varnish-static-files-cache