We have Varnish 3.0.2 running on Amazon Linux and it works great. We have a TTL of 48 hours for most content pages and much longer for images, PDFs etc.
This weekend we took the backend down for maintenance, so earlier in the week I raised the TTL to 5 days. I had assumed that anything in the cache would continue to be served for up to 5 days, but much to our disappointment, when we checked varnishstat this morning the cache was almost completely empty and Varnish was serving "page not found" messages.
I know that this is not what Varnish is designed to do, but why would it reset its cache when the backend is down? And how can I prevent it for next time?
Update 2012-06-11: After looking in /var/log/messages I see the following every 3 hours or so:
Jun 9 03:56:31 idea-varnish varnishd[1128]: Manager got SIGINT
Jun 9 03:56:33 idea-varnish varnishd[6708]: Platform: Linux,3.2.18-1.26.6.amzn1.x86_64,x86_64,-smalloc,-smalloc,-hcritbit
Jun 9 03:56:33 idea-varnish varnishd[6708]: child (6709) Started
Jun 9 03:56:33 idea-varnish varnishd[6708]: Child (6709) said Child starts
I guess this is the server crashing and wiping all the objects in memory. I have only just now installed the -debuginfo rpm, but I'm not sure that will actually show anything more.
I suppose we could have switched back to disk-based storage during the scheduled downtime? Or would a crash like this wipe that anyway?
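To put a number on the "every 3 hours or so" estimate, the restart lines can be pulled out of the log and the gaps between them measured. This is only a rough sketch: it assumes the standard syslog prefix seen in the lines above (month, day, time, no year) and an English locale for the month names, and the function names are made up for illustration.

```python
from datetime import datetime

# Text that marks a manager restart in the log lines quoted above.
MARKER = "Manager got SIGINT"


def restart_times(lines, year):
    """Return a datetime for every log line that contains MARKER.

    Syslog timestamps carry no year, so the caller has to supply one.
    """
    times = []
    for line in lines:
        if MARKER not in line:
            continue
        month, day, clock = line.split()[:3]
        stamp = f"{year} {month} {day} {clock}"
        times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def restart_gaps_hours(times):
    """Return the gap in hours between each pair of consecutive restarts."""
    return [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]


# Possible usage (path taken from the update above):
#   with open("/var/log/messages") as log:
#       print(restart_gaps_hours(restart_times(log, 2012)))
```

If the gaps come out very regular, that would point toward something scheduled sending the signal rather than a random crash, though that is a guess and not something the log alone proves.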
Did the Varnish process maybe restart? There's an uptime counter in varnishstat. Under certain circumstances the Varnish child (worker) process can die, but the manager process restarts it immediately. When everything is working fine, this might go unnoticed, but with (planned) backend downtime it can be quite inconvenient.
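A quick way to check the uptime counter is to read it from the one-shot output of `varnishstat -1` and compare it with how long ago you last looked. The sketch below assumes the counter shows up on a line whose first field is `uptime` (or `MAIN.uptime` on newer releases) followed by the value in seconds; the function names are hypothetical.

```python
def parse_uptime(stat_output):
    """Return the uptime in seconds from `varnishstat -1` text, or None.

    Assumes each line looks like "<name> <value> <rate> <description>".
    """
    for line in stat_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in ("uptime", "MAIN.uptime"):
            return int(fields[1])
    return None


def restarted_since(uptime_seconds, seconds_since_last_check):
    """True if the process has been up for less time than has passed
    since the last check, which means it restarted in between."""
    return uptime_seconds < seconds_since_last_check


# Possible usage on the Varnish host:
#   import subprocess
#   out = subprocess.run(["varnishstat", "-1"],
#                        capture_output=True, text=True).stdout
#   print(parse_uptime(out))
```

An uptime of a couple of hours on a server that was supposedly left alone for days is a strong hint that the cache was emptied by a restart, not by objects expiring.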