We have a load-balanced web app that serves some images from an NFS-mounted volume.
When the NFS server goes down, it takes all of the web instances down with it.
Currently the volume is mounted with:
ip:/path/to/images /docroot/images nfs soft,intr,rw,rsize=32768,wsize=32768 0 0
I ran a siege test against a selection of images that live on this volume. When the NFS server went down, requests hung until they hit the Apache Timeout value (which was set to 600 seconds for this test).
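To take Apache and siege out of the measurement, the failure time can also be checked directly against the mount. The following is only a minimal sketch: the helper name `time_stat` is made up for illustration, and because `actimeo=60` caches attributes, a stat on a recently touched file may return from cache instead of going to the server, so a path that has not been accessed recently gives a more honest number.

```python
import os
import time


def time_stat(path):
    """Return (elapsed_seconds, outcome) for a single stat() call on path.

    Hypothetical helper for illustration only. On a soft NFS mount an
    unreachable server should eventually surface as an OSError; on a
    hard mount this call may block indefinitely.
    """
    start = time.monotonic()
    try:
        os.stat(path)
        outcome = "ok"
    except OSError as exc:
        # strerror is e.g. "Input/output error" or "No such file or directory"
        outcome = "error: {}".format(exc.strerror)
    return time.monotonic() - start, outcome


if __name__ == "__main__":
    # /docroot/images is the mount point from the fstab line above.
    elapsed, outcome = time_stat("/docroot/images")
    print("{:.1f}s {}".format(elapsed, outcome))
```

Running this in a loop while the NFS server is stopped shows how long a single operation takes to give up, independent of the web server's own timeout.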
I changed the mount options to:
bg,soft,intr,rw,rsize=32768,wsize=32768,timeo=5,retrans=2,actimeo=60,retry=15
This was better, but requests still took too long to fail: the first set of requests timed out in about 30 seconds, but the next set took anywhere from 180 to 300 seconds.
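For comparison, here is a rough estimate of what timeo=5,retrans=2 should mean on paper. It is a simplified model based on the nfs(5) description, not on the actual kernel code: timeo is in tenths of a second, the wait grows linearly by timeo on each retry over TCP (capped at 600 seconds) and doubles over UDP (capped at 60 seconds), and the request is assumed to fail after the initial attempt plus retrans retries. The function name and the retry-counting rule are my assumptions, and the mount line above does not say which transport is in use.

```python
def nominal_timeout_seconds(timeo_ds, retrans, transport="tcp"):
    """Estimate seconds until a soft mount gives up on one RPC request.

    Simplified model (assumption, per the nfs(5) description):
      - timeo_ds is the timeo= value, in deciseconds
      - one initial attempt plus `retrans` retries are made
      - TCP: each wait grows by timeo, capped at 600 s
      - UDP: each wait doubles, capped at 60 s
    """
    base = timeo_ds / 10.0
    cap = 600.0 if transport == "tcp" else 60.0
    total = 0.0
    wait = base
    for _ in range(retrans + 1):
        total += min(wait, cap)
        if transport == "tcp":
            wait += base  # linear backoff
        else:
            wait *= 2     # exponential backoff
    return total


if __name__ == "__main__":
    print(nominal_timeout_seconds(5, 2, "tcp"))  # 0.5 + 1.0 + 1.5
    print(nominal_timeout_seconds(5, 2, "udp"))  # 0.5 + 1.0 + 2.0
```

Under this model a single request should fail within a few seconds on either transport, which is far below the 30 to 300 seconds I am actually seeing. A single HTTP request may involve several NFS operations, so the per-request total could be some multiple of this figure, but that alone does not obviously explain the gap.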
I know the long-term solution is to move these images to S3, but in the meantime is it possible to get the failure time down to 5-10 seconds or less without hurting performance during normal operation?