I am rather confused by the whole access versus modification distinction.
I want the files to be cached until they are modified. If the data is the same, show the cached version. If the data has changed, download the updated version and cache it.
I don't see why you would want anything different? All this access plus 6 months is not making sense to me. Request a new file if the last modified date is newer than the cache date. Can this not be done in such a simple manner?
I want to be able to create index.html on 1/1/2012 and have everyone cache it. I don't want it to be downloaded again until I edit it. Say I edit it at 1/5/2012 and someone who hasn't seen index.html since 1/1/2012 comes along, they should see they have 1/1/2012 cache but now the file has a last mod time of 1/5/2012 so they download 1/5/2012 version and cache it.
I never want to edit something and have a user not see it for any period of time. I want all my edits to be viewed upon next request.
How can I do this for all files?
There are two concepts here: caching and bandwidth-saving. For your use case, I'd forget about Expires:, but I'll explain it afterwards.

If you serve a physical file from a folder, the webserver will add one or two headers:

ETag: xyzzas4324@asdad/33, which works as an ID for that version of the file and changes if the file's contents change; and/or
Last-Modified: <date>, which is the same modification date as seen in the file's properties.

So the next time around, the browser will ask for http://$URL/file with a little twist:

If-None-Match: xyzzas4324@asdad/33; and/or
If-Modified-Since: <same date>

If the file hasn't been changed, the server sends back a 304 Not Modified, which signals to the browser that it should display the cached version.

Notice: this avoided only the repeated transfer of that particular file; the browser still had to wait for an answer from the webserver. So there's still some latency.
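To make the exchange above concrete, here is a minimal Python sketch of the decision a server makes for a conditional request. The function name respond and its arguments are made up for illustration; this is not how Apache is implemented, only the same logic in miniature, assuming exact ETag matching and whole-second timestamps.

```python
from email.utils import formatdate, parsedate_to_datetime


def respond(request_headers, etag, mtime):
    """Return (status, validator_headers) for a static file.

    request_headers: dict of headers sent by the browser
    etag:            current ETag of the file (hypothetical value)
    mtime:           file modification time as a POSIX timestamp
    """
    validators = {
        "ETag": etag,
        "Last-Modified": formatdate(mtime, usegmt=True),
    }
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")

    if if_none_match is not None:
        # The ETag check takes precedence when the browser sends one.
        unchanged = if_none_match == etag
    elif if_modified_since is not None:
        # Unchanged if the file is no newer than the browser's copy.
        since = parsedate_to_datetime(if_modified_since).timestamp()
        unchanged = int(mtime) <= since
    else:
        # First visit: nothing to validate against.
        unchanged = False

    return (304 if unchanged else 200), validators
```

A first request with no validators gets a 200 plus both headers; repeating the request with those values echoed back gets a 304 until the file's ETag or modification time changes.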
Which is where Expires comes in. Imagine that the requested file is an RSS feed. If there's an Expires: <2 hours from now> header, the request will not be repeated by that browser during that interval, period. No browser latency waiting for the server, no increased load.

There's a book that goes into more detail about these tricks: Building Scalable Websites. I'll give you the quick rundown:
This is how you should change from the old logo to a new one, assuming you've set Expires: <6 years> on the whole of assets/*

index.html v1:

index.html v2:

Notice that index.html doesn't have Expires: <6 years>. It gets the 304 Not Modified treatment. When you change one of the assets, you increase its version number and change the .html files where it's used.

And you get the best of both worlds.
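The version bump described above can be scripted. The following is a rough Python sketch under an assumed naming scheme of assets/<name>.v<N>.<ext>; that scheme, the file name logo.png, and the helper bump_asset are all illustrative and not taken from the answer or the book.

```python
import re


def bump_asset(html, name, new_version):
    """Point every reference to a versioned asset at a new version.

    Assumes references look like assets/logo.v1.png (hypothetical scheme).
    """
    stem, ext = name.rsplit(".", 1)
    pattern = re.compile(
        rf"assets/{re.escape(stem)}\.v\d+\.{re.escape(ext)}"
    )
    replacement = f"assets/{stem}.v{new_version}.{ext}"
    # A function is used as the replacement so it is inserted literally.
    return pattern.sub(lambda match: replacement, html)
```

Because the new file lives at a URL the browser has never seen, the long Expires on the old URL no longer matters; the .html page itself is still revalidated on each visit, so it picks up the new reference right away.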
Don't do anything. Apache already has the behavior you expect out of the box.
When a browser makes a request for a URL that it has already seen and cached, and the user is not explicitly forcing a reload, then it includes an If-Modified-Since header carrying the Last-Modified timestamp it received with the cached copy. If the server determines that the asset has not changed since the If-Modified-Since timestamp, then it just responds with 304 Not Modified. (In addition to timestamps, there is another header called ETag used for checking cache validity that works by comparing an identifier for the content, often a hash, but that doesn't add anything to this discussion.)

The only time you want to use any of the mod_expires features is when you want to tell the browser not to bother making an HTTP request to check whether a cached asset is up to date for some period of time. For example, you could have a scheme where your CSS file is located at /style.1.css, and you name subsequent versions of it /style.2.css, /style.3.css, etc. In that case, you could save the overhead of an HTTP request altogether by setting an Expires header.

What you describe as desired is the way any modern webserver works by default.
If you do not send explicit cache/expires headers (or if you send ETag headers), then every time a user requests a page or asset (asset = JS, CSS, images, etc.) the browser will send a request to your server. From your description this is your desired behavior: you want the browser to always request the page/asset and receive either the updated content or an empty 304 Not Modified response indicating that what the user has previously cached is the latest version.

Therefore what you want to do is remove any headers you have added yourself, so that Apache bases its response only on the Last-Modified header.
As to why you would want a different behavior:
The above is appropriate for HTML pages, but it is inefficient for content types which change rarely or where you can simply change the URL when the content changes. Sending a request to your server and receiving a 304 Not Modified response is not free: it takes time (with many assets on a page, this can add up to a lot of time) and generates load on the server (yes, handling lots of requests for static assets can significantly affect a server's performance), so it's bad for the user and it's bad for you. This is why you would want to add long expire headers and use cache-busting where possible, to reduce HTTP requests to a minimum yet force users to re-download assets when they do change.

There is an excellent .htaccess guide available from the html5boilerplate project. It goes into great detail on how to optimise (cache) headers to improve performance and user experience. You don't need to get stuck in the details though: just drop the default .htaccess in your document root and check that it does what you're expecting. The default configuration should do exactly what you want, with the added benefit of best-practice cache headers for static assets. If your question was not in fact about HTML content but about another content type, read the references before simply disabling all the good work the cache logic in the .htaccess file does for you ;)
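As a rough illustration of that split, here is a small Python sketch that picks headers by URL: versioned static assets get a far-future Expires, everything else gets none and is left to revalidation. The one-year lifetime, the list of extensions, and the helper name cache_headers are assumptions for the example and are not taken from the html5boilerplate configuration.

```python
import re
from email.utils import formatdate

ONE_YEAR = 365 * 24 * 60 * 60  # assumed lifetime, in seconds

# Matches names such as /style.2.css or /assets/logo.v2.png
VERSIONED = re.compile(r"\.v?\d+\.(css|js|png|jpg|gif)$")


def cache_headers(path, now):
    """Return extra response headers for a URL path.

    now: current time as a POSIX timestamp.
    """
    if VERSIONED.search(path):
        # The URL changes whenever the content does, so the browser
        # may safely skip the request for a long time.
        return {"Expires": formatdate(now + ONE_YEAR, usegmt=True)}
    # HTML and unversioned files: no Expires, so the browser keeps
    # checking with the server and gets 304 Not Modified if unchanged.
    return {}
```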