I don't understand Varnish's behavior in this instance:
Say I have one page on a site that begins to generate 500 errors (and the site has a good custom 500 screen, but that's not completely relevant).
I have code in the vcl_fetch subroutine:
    sub vcl_fetch {
        // Keep stale response for six hours in case backend fails.
        set beresp.grace = 6h;
        if (beresp.status == 500) {
            // Skip this backend for this URL for 30 seconds, then retry the request.
            set beresp.saintmode = 30s;
            return(restart);
        }
    }
I had hoped that when the 500 errors began, Varnish would deliver a cached version of the page under the extended grace period. That's not what happened.
Instead, Varnish serves its own 500 page and then starts serving 503s.
If I change the return to (deliver), it delivers the custom 500 page from the backend, but then serves its own 503 guru meditation pages for the duration of saint mode.
What I want to happen, obviously, is for Varnish to deliver a cached version from before the 500s began or, failing that, to send our custom 500 page, and never to send a 503.
In this instance, the backends are still reporting healthy, and other pages from the site are still being served.
You need to look at implementing Grace mode and/or Saint mode.
https://www.varnish-cache.org/docs/trunk/users-guide/vcl-saint-and-grace.html
Grace mode allows you to serve stale content when your backends are down or slow, and saint mode lets you retry another backend if the backend you used responds with an error.
So you'd need 2 or more backends to use saint mode.
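As a rough sketch of how the two modes fit together, assuming Varnish 3.x syntax (the version the question's vcl_fetch code appears to use): grace has a request-side setting as well as an object-side one, so req.grace is set in vcl_recv alongside beresp.grace in vcl_fetch. The durations and the restart limit below are illustrative choices, not recommendations.

```vcl
sub vcl_recv {
    // Request side of grace: how far past its TTL a cached object
    // may be and still be acceptable for this request.
    // 6h is an illustrative value matching the question.
    set req.grace = 6h;
}

sub vcl_fetch {
    // Object side of grace: how long to keep the object after its TTL expires.
    set beresp.grace = 6h;

    // Illustrative guard: only restart on the first attempt, so a page
    // that keeps failing does not restart repeatedly.
    if (beresp.status == 500 && req.restarts < 1) {
        // Saint mode: avoid this backend for this URL for 30 seconds.
        set beresp.saintmode = 30s;
        return (restart);
    }
}
```

Whether the restarted request ends up with a stale object still depends on one being in the cache and within both grace windows.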
To use grace mode, you'd need some way to return a custom error page even when the backend is down: either a static HTML file or HTML built into the VCL. Both of these remain available even when the backend is down.
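For the "HTML built into the VCL" option, a minimal sketch might look like the following, again assuming Varnish 3.x, where Varnish-generated errors pass through vcl_error. The markup is a placeholder, not the site's real error page.

```vcl
sub vcl_error {
    // Replace the default guru meditation page for Varnish-generated 503s.
    if (obj.status == 503) {
        set obj.http.Content-Type = "text/html; charset=utf-8";
        // Placeholder markup; substitute the site's own error page.
        synthetic {"
<html>
  <head><title>Temporarily unavailable</title></head>
  <body><h1>Sorry, something went wrong.</h1></body>
</html>
"};
        return (deliver);
    }
}
```

This only changes the body of the response. The client still receives a 503 status unless the VCL changes that as well.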