I am accessing images stored in object storage such as AWS S3 or GCS.
I am using a Node library called Got to fetch an image, and I stream the response into an image processor.
I am using Got with a tunnel agent so that it goes through my Squid forward proxy. I am going via Squid because I want to cache the images, so that I don't always need to go all the way to S3 to fetch an image.
My images, however, are not being cached, and every call to fetch an image via Squid results in the following log message:
1612789358.530 3100 172.22.0.3 TCP_TUNNEL/200 2296639 CONNECT storage.googleapis.com:80 - HIER_DIRECT/172.217.170.80 -
I have added a refresh_pattern directive to force caching:
refresh_pattern . 0 100% 525600 override-expire override-lastmod ignore-reload ignore-no-cache ignore-no-store reload-into-ims ignore-must-revalidate
This should cache everything for a year.
The relevant JavaScript is this (in case it matters):
let stream = await got.stream(sourceUrl, {
  agent: {
    http: tunnel.httpOverHttp({
      proxy: {
        host: "localhost",
        port: "3128",
      },
    }),
  },
});
I have no experience with Squid. I am using version 4.x.
Could it be that, because I am streaming the response from S3, Squid cannot cache it once the whole image has streamed through? Or is there a specific cache directive that my squid.conf file is missing?
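You're using the HTTP CONNECT method through the Squid proxy (note the CONNECT and TCP_TUNNEL in your log line), which completely bypasses its cache and pretty much everything else Squid does. Squid just relays the bytes of the tunnel, so your refresh_pattern never comes into play.
You should instead connect to it as a normal HTTP proxy, so that Squid receives an ordinary GET request it can cache. A quick look at got's page at npmjs.com indicates this should be done with the hpagent package. Whichever agent you end up with, check the access log afterwards: the request should show up as a GET rather than a CONNECT.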
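To illustrate what "a normal HTTP proxy" request looks like on the wire, here is a minimal sketch that uses only Node's built-in http module rather than got or any agent package. It sends a GET to the proxy with the full target URL as the request path, which is the form Squid can cache. The helper names proxyRequestOptions and getViaProxy are made up for this example, and the proxy address localhost:3128 is taken from your snippet.

```javascript
const http = require("http");

// Hypothetical helper: builds request options for a plain forward-proxy GET.
// The connection goes to the proxy, and the absolute target URL is sent as
// the request path, so the proxy sees "GET http://host/path" and not CONNECT.
function proxyRequestOptions(targetUrl, proxyHost, proxyPort) {
  const target = new URL(targetUrl);
  return {
    host: proxyHost,
    port: proxyPort,
    method: "GET",
    path: target.href,
    headers: { Host: target.host },
  };
}

// Hypothetical helper: resolves with the response stream (an
// http.IncomingMessage), which can be piped into an image processor.
function getViaProxy(targetUrl, proxyHost = "localhost", proxyPort = 3128) {
  return new Promise((resolve, reject) => {
    const req = http.request(
      proxyRequestOptions(targetUrl, proxyHost, proxyPort),
      resolve
    );
    req.on("error", reject);
    req.end();
  });
}
```

Calling getViaProxy(sourceUrl) with a plain http:// URL should then produce access log entries with a GET and a result code such as TCP_MISS or TCP_HIT instead of TCP_TUNNEL. This only applies to http:// URLs; an https:// URL has to be tunnelled with CONNECT unless Squid is set up to intercept TLS.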