When building our web pages from different content sources, it may be necessary to fetch some images from external servers (e.g. when incorporating an RSS feed), which may not be as fast or as well connected as our own data center. I would like to have a means to copy or proxy the files to a server address running at our site, to keep the load off the external servers, possibly changing the filename to hide the fact that the images were dynamically generated.
For example, turn the following URL
http://domain.de/content/query?file=foo/nr_1.gif
into something like this:
mydomain.net/static/domain.de/query_3ffile_3dfoo_2fnr_5f1.gif
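The renaming in this example can be sketched in a few lines of Python. This is only an illustration of the escaping scheme the example seems to imply (every character outside letters, digits, "." and "-" becomes "_" plus two hex digits); the function names, the "/static" prefix argument and the choice to keep only the last path segment are assumptions, not an existing tool. Note that dropping the leading directories means two URLs that differ only in their directory would collide.

```python
import string
from urllib.parse import urlsplit

# Characters kept as-is. "_" is escaped too, so the mapping stays reversible.
SAFE = set(string.ascii_letters + string.digits + ".-")

def escape(text):
    """Replace every unsafe byte with "_" followed by two lowercase hex digits."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        out.append(ch if ch in SAFE else "_%02x" % byte)
    return "".join(out)

def static_path(url, prefix="/static"):
    """Map an external URL to a static-looking local path (hypothetical helper)."""
    parts = urlsplit(url)
    # Assumption: keep only the last path segment plus the query string.
    last_segment = parts.path.rsplit("/", 1)[-1]
    name = last_segment + ("?" + parts.query if parts.query else "")
    return "%s/%s/%s" % (prefix, parts.hostname, escape(name))

print(static_path("http://domain.de/content/query?file=foo/nr_1.gif"))
```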
This should honor ETags and If-Modified-Since, and change the Expires headers to make the files static and cacheable, regardless of what the originating server says.
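To pin down what honoring these conditional requests would mean, here is a minimal Python sketch of the decision a server makes before answering with 304 Not Modified. The function name and the plain-dict header representation (with canonical header names) are assumptions for illustration; a real proxy handles this internally, and the ETag list parsing here is deliberately simplified.

```python
from email.utils import parsedate_to_datetime

def _strip_weak(tag):
    """Drop the weak-validator prefix so W/"x" and "x" compare equal."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag

def not_modified(request_headers, etag, last_modified):
    """Return True if the client's cached copy is still valid (send 304)."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        candidates = [_strip_weak(t) for t in if_none_match.split(",")]
        return "*" in candidates or _strip_weak(etag) in candidates
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        try:
            return parsedate_to_datetime(last_modified) <= parsedate_to_datetime(since)
        except (TypeError, ValueError):
            return False  # unparseable date: fall back to a full response
    return False
```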
I think I could build something like this using Varnish and another web server, but maybe there is a solution already available.
This could be part of a CDN; however, I do not anticipate needing a real CDN, since we do not have many visitors from other countries.
I would strongly recommend using a proxy such as Varnish or Squid rather than downloading the files and keeping them yourself, so that the proxy takes care of all of the cache expiry and other entertainments that make caching so much fun.
If you're trying to cache content that is dynamically generated and doesn't have proper expiry information in the response headers, you either need to get whoever's generating those pages to include appropriate headers (based on the rate of change of the data that makes up the page), or, if that really isn't possible, override the expiry times in your Varnish VCL file.
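To make concrete what overriding the expiry times amounts to, here is a rough Python sketch of the header rewrite. In Varnish this logic would live in your VCL rather than in Python; the function name, the one-day default TTL and the list of dropped headers are assumptions chosen for illustration, so pick values that match how often the origin data really changes.

```python
import time
from email.utils import formatdate

# Origin headers that would defeat or contradict the forced caching policy.
DROPPED = {"cache-control", "expires", "pragma", "set-cookie"}

def force_cacheable(origin_headers, ttl_seconds=86400, now=None):
    """Return a copy of the origin's headers with caching forced on."""
    now = time.time() if now is None else now
    headers = {k: v for k, v in origin_headers.items() if k.lower() not in DROPPED}
    headers["Cache-Control"] = "public, max-age=%d" % ttl_seconds
    # HTTP dates are always expressed in GMT.
    headers["Expires"] = formatdate(now + ttl_seconds, usegmt=True)
    return headers
```

Validators such as ETag and Last-Modified pass through untouched, so clients can still revalidate cheaply once the forced lifetime runs out.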
"Caching" by retrieving everything to files and then serving them locally means you're probably going to be requesting a whole pile of content that is never actually going to be served to users (meaning that the load -- network traffic, CPU, disk, whatever -- on the origin server is potentially higher than it is now) and you're going to end up reimplementing a large part of what makes a caching proxy useful anyway (expiry, storage management, etc). It's just not worth it. There isn't anything like this out there (that I know of, anyway) because anyone smart enough to make something like this that isn't complete arse is smart enough to realise what a bad idea it is, and how they're just reimplementing the hard half of Squid anyway.