Background
I'm hosting a static site on S3, with CloudFront over the top. The issue I have is with my HTML files.
According to CloudFront's FAQ:
Amazon CloudFront uses these cache control headers to determine how frequently it needs to check the origin for an updated version of that file
What I've done so far
With this in mind, I've configured the HTML files in my S3 bucket to return the following headers:
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Expires: Fri, 01 Jan 1990 00:00:00 GMT
On my first request for samplefile.htm, I see the following response headers (I've excluded obvious headers, e.g. Content-Type, to keep to the point):
Cache-Control:no-cache, no-store, max-age=0, must-revalidate
Date:Sat, 10 Dec 2011 14:16:51 GMT
ETag:"a5890ace30a3e84d9118196c161aeec2"
Expires:Fri, 01 Jan 1990 00:00:00 GMT
Last-Modified:Sat, 10 Dec 2011 14:16:43 GMT
Server:AmazonS3
X-Cache:Miss from cloudfront
As you can see, my Cache-Control header is in there. The problem is that if I update this file and refresh, I get the cached content rather than the latest file, and I can see from the response headers that CloudFront is serving its cached version:
X-Cache:Hit from cloudfront
Summary/question
With the above in mind, how can I achieve automatic retrieval of the latest HTML when using CloudFront?
As per its FAQ, I should be able to do this with Cache-Control headers, but I can't seem to get it working.
Following the answers below
In the end, I decided to change my www CNAME to point directly to my S3 bucket, and then added a new CNAME called "static", which points to CloudFront.
This means that HTML is served directly from S3, and all of its CSS/JS/IMG references point to static.mydomain.com.
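If the pages come from a template, a small helper can keep that split consistent. This is a minimal Python sketch, not something from the thread: the hostnames are the ones above, while the helper name asset_url and the extension list are hypothetical. It returns protocol-relative URLs so it doesn't assume HTTP vs HTTPS.

```python
from pathlib import PurePosixPath

# Hostnames from the post; the extension list below is a hypothetical example.
HTML_HOST = "www.mydomain.com"       # CNAME -> S3 bucket directly (HTML, always fresh)
STATIC_HOST = "static.mydomain.com"  # CNAME -> CloudFront distribution (cached assets)
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico"}


def asset_url(path: str) -> str:
    """Protocol-relative URL for `path`: static assets go to the CDN host, everything else to S3."""
    suffix = PurePosixPath(path).suffix.lower()
    host = STATIC_HOST if suffix in STATIC_EXTENSIONS else HTML_HOST
    return f"//{host}/{path.lstrip('/')}"
```

For example, asset_url("/css/site.css") points at static.mydomain.com, while asset_url("samplefile.htm") stays on www.mydomain.com.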
I believe the answers so far, while correct at the time, are now out of date: CloudFront now supports a minimum TTL of 0, so the OP's original attempt to use max-age=0 should now work.
You would want to look into whether you need those other Cache-Control directives, and whether they produce the result you are looking for; you may only need max-age. What you probably want is for CloudFront to check S3 to see whether the HTML file has changed. If it has, CloudFront fetches and returns the new file. If not, it serves the client from its existing cache, conserving S3 bandwidth and serving the client faster and more locally.
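To make the "check whether it has changed" idea concrete, here is a toy simulation in Python. It is not CloudFront's actual implementation, just a sketch of standard HTTP revalidation: once max-age has elapsed, the edge sends a conditional request with If-None-Match, and on a 304 it reuses its copy. The class names are hypothetical, the "Hit"/"RefreshHit"/"Miss" labels are meant to mirror the X-Cache values CloudFront reports, and the MD5-based ETag assumes a simple single-part S3 upload.

```python
import hashlib
from typing import Optional, Tuple


def etag_for(body: bytes) -> str:
    # Assumption: like S3 for simple (non-multipart) uploads, the ETag is the quoted MD5 hex digest.
    return '"' + hashlib.md5(body).hexdigest() + '"'


class Origin:
    """Toy stand-in for the S3 origin (hypothetical)."""

    def __init__(self, body: bytes):
        self.body = body

    def get(self, if_none_match: Optional[str] = None) -> Tuple[int, str, bytes]:
        etag = etag_for(self.body)
        if if_none_match == etag:
            return 304, etag, b""  # Not Modified: no body is transferred
        return 200, etag, self.body


class EdgeCache:
    """Toy edge cache: serve from cache while fresh, otherwise revalidate with If-None-Match."""

    def __init__(self, origin: Origin, max_age: int):
        self.origin = origin
        self.max_age = max_age
        self.body: Optional[bytes] = None
        self.etag: Optional[str] = None
        self.fetched_at = 0.0
        self.bytes_from_origin = 0

    def get(self, now: float) -> Tuple[str, bytes]:
        # Still fresh: answer without contacting the origin at all.
        if self.body is not None and now - self.fetched_at < self.max_age:
            return "Hit", self.body
        # Stale (or never fetched): ask the origin, sending our ETag if we have one.
        status, etag, body = self.origin.get(if_none_match=self.etag)
        self.fetched_at = now
        self.bytes_from_origin += len(body)
        if status == 304:
            return "RefreshHit", self.body  # unchanged: reuse the cached copy
        self.body, self.etag = body, etag
        return "Miss", body
```

With max_age=0, every request goes back to the origin, but an unchanged file costs only a bodiless 304; after the origin's content changes, the next request is a Miss that returns the new file.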
The point of CloudFront is to serve cached content, yes, but that now includes content that sometimes changes yet can still be cached for as long as it hasn't changed.
P.S. Query strings also work with CloudFront now (if you configure a 'behaviour' for the relevant origin, another new feature); however, some proxies may still fail to cache any files with query strings.
Amazon Developer Guide: Expiration
Firstly, the point of CloudFront is to serve cached content: if you try to serve uncached content from CloudFront, it is slower than serving it directly from S3 in almost all cases (streaming content would be the exception). Consider for a moment what needs to happen to serve content from CloudFront: it has to be retrieved from the origin server to a location that is geographically close to the user. That means that for a request where CloudFront has to retrieve content from the origin server, you add extra latency to the request, and the user receives the content more slowly. Only once the content is available at the edge location are subsequent requests faster.
The best approach to this problem is to change your filenames when you update a page; this will force CloudFront to retrieve the new content. Again, keep in mind that CloudFront is typically used for media files (including images) and stylesheets/JavaScript, and not so much for HTML. Essentially, you would keep your HTML on S3 and your images on CloudFront, and whenever you make a change, you change the name of the file on CloudFront (e.g. file-v1.jpg, file-v2.jpg, etc.). Another common approach is to include a query string with version information.
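A minimal Python sketch of both renaming styles. The helper names are hypothetical. content_version is a swapped-in variant of the manual v1/v2 counter: it derives the version from a SHA-256 hash of the file's bytes, so the name changes exactly when the content does.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path: str, version: str) -> str:
    """img/file.jpg + "v2" -> img/file-v2.jpg"""
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}-{version}{p.suffix}"))


def content_version(content: bytes, length: int = 8) -> str:
    """A version string derived from the file's bytes (first `length` hex chars of SHA-256)."""
    return hashlib.sha256(content).hexdigest()[:length]


def with_version_query(url: str, version: str) -> str:
    """The query-string alternative: /css/site.css -> /css/site.css?v=2"""
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}v={version}"
```

Because each new version gets a new name (or URL), the old cached object is simply never requested again, so no invalidation is needed.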
Also, keep in mind that CloudFront does not serve gzipped content, which may result in a slower response than from a regular server (although, in your case, S3 doesn't detect gzip-capable browsers either).
Finally, if you want to, you can use invalidation to force CloudFront to discard its existing copy and fetch a new one from the origin server. Note, however, that CloudFront gives you only 1,000 free invalidations per month, after which the cost is $0.005 per invalidation.
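The pricing above is easy to check with a few lines of Python. This sketch just encodes the figures quoted in this answer (they may have changed since); the function name is hypothetical, and Decimal avoids floating-point rounding on money.

```python
from decimal import Decimal

# Figures as quoted in the answer; check current AWS pricing before relying on them.
FREE_INVALIDATIONS_PER_MONTH = 1000
PRICE_PER_INVALIDATION = Decimal("0.005")  # USD


def monthly_invalidation_cost(invalidations: int) -> Decimal:
    """USD cost for one month: only invalidations beyond the free allowance are billed."""
    billable = max(0, invalidations - FREE_INVALIDATIONS_PER_MONTH)
    return billable * PRICE_PER_INVALIDATION
```

At these rates, invalidating samplefile.htm once per deploy stays free unless you deploy more than 1,000 times a month, and 1,500 invalidations in a month would come to $2.50.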
The minimum time CloudFront will keep content is 1 hour, and the default is 24 hours. I'd therefore set max-age to at least 3600. Consider also an s-maxage header (for shared, i.e. proxied, content). Amazon recommends this caching tutorial.
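A rough Python model of that rule shows why the OP's max-age=0 still produced cache hits: the header value is clamped up to the minimum TTL. This is a simplification under stated assumptions, not CloudFront's documented algorithm. It ignores no-cache/no-store/private and any maximum TTL, and parse_cache_control/edge_ttl are hypothetical helpers. The defaults (min_ttl=3600, default_ttl=86400) are the figures from this answer; passing min_ttl=0 models the minimum TTL of 0 mentioned earlier in the thread.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """'no-cache, max-age=0' -> {'no-cache': None, 'max-age': '0'}"""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def edge_ttl(cache_control: Optional[str], min_ttl: int = 3600, default_ttl: int = 86400) -> int:
    """Seconds an object stays at the edge under the simplified rule described above."""
    if not cache_control:
        return default_ttl  # no header: fall back to the default (24 hours)
    directives = parse_cache_control(cache_control)
    for key in ("s-maxage", "max-age"):  # s-maxage, aimed at shared caches, takes priority
        value = directives.get(key)
        if value is not None and value.isdigit():
            return max(int(value), min_ttl)  # never less than the minimum TTL
    return default_ttl
```

Under this model, the OP's "no-cache, no-store, max-age=0, must-revalidate" yields 3600 seconds with a 1-hour minimum, but 0 once the minimum TTL is 0.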
There was a recent problem with this, rectified a few days ago.
Not sure how CloudFront treats a header like the one you have, but if you don't specify any headers, the default time before objects are refreshed is 24 hours.
One of the things you can do to refresh the objects is to invalidate the content. Check out the link below for more info: http://blog.cloudberrylab.com/2010/08/how-to-manage-cloudfront-object.html