I manage a number of web proxies running Squid 4.10 on Ubuntu 20.04 LTS in several locations distributed worldwide. One of them has developed a nasty habit of occasionally failing to access a web page. The user instead receives an error page saying:
Hmmm... can't reach this page
It looks like the webpage at <URL> might be having issues,
or it may have moved permanently to a new web address.
ERR_TUNNEL_CONNECTION_FAILED
After adding %err_code/%err_detail to the end of the relevant logformat, as recommended on this mailing list post, Squid access.log entries for the failing accesses look like this:
1635169354.239 171 10.72.1.103 NONE/503 0 CONNECT ad.360yield.com:443 - HIER_NONE/- - ERR_DNS_FAIL/-
The Squid status is always NONE/503, and the error code and detail are always ERR_DNS_FAIL/-.
The timestamp, client IP address and requested URL vary, of course.
Each occurrence of the problem affects a single FQDN or a very small number of FQDNs, often all from the same organisation (e.g. lm.licenses.adobe.com and cc-api-data.adobe.io, both from Adobe). All other accesses continue to work normally. An occurrence typically lasts between five and ten minutes. During that time, all clients trying to access that FQDN are affected. Before and after that, the same FQDN works without a problem. There is no discernible regularity in the affected FQDNs.
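To pin down the pattern described above (which FQDNs, when, and for how long), the failing entries can be pulled out of access.log and grouped into episodes per host. The following is only a minimal sketch: it assumes the field layout of the sample line shown earlier (the default "squid" format with the error code/detail appended as the last field), and the names `LINE_RE`, `host_of`, `dns_failures`, `episodes` and the 120-second `max_gap` threshold are illustrative choices, not anything Squid provides.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Assumed layout: default "squid" logformat plus a trailing
# err_code/err_detail field, as in the sample access.log line.
LINE_RE = re.compile(
    r"^(?P<ts>\d+\.\d+)\s+(?P<elapsed>\d+)\s+(?P<client>\S+)\s+"
    r"(?P<status>\S+)\s+(?P<size>\d+)\s+(?P<method>\S+)\s+(?P<url>\S+)\s+"
    r"(?P<user>\S+)\s+(?P<hier>\S+)\s+(?P<mime>\S+)\s+(?P<err>\S+)$"
)


def host_of(method, url):
    """Return the host part of the logged URL field."""
    if method == "CONNECT":
        # CONNECT requests are logged as host:port.
        return url.rsplit(":", 1)[0]
    return urlsplit(url).hostname or url


def dns_failures(lines):
    """Yield (timestamp, host) for each entry whose error code is ERR_DNS_FAIL."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("err").startswith("ERR_DNS_FAIL/"):
            yield float(m.group("ts")), host_of(m.group("method"), m.group("url"))


def episodes(failures, max_gap=120.0):
    """Group failures per host into (first_ts, last_ts, count) episodes.

    A gap of more than max_gap seconds between two failures for the same
    host starts a new episode. The threshold is an arbitrary assumption.
    """
    by_host = defaultdict(list)
    for ts, host in sorted(failures):
        runs = by_host[host]
        if runs and ts - runs[-1][1] <= max_gap:
            runs[-1][1] = ts      # extend the current episode
            runs[-1][2] += 1
        else:
            runs.append([ts, ts, 1])  # start a new episode
    return {h: [tuple(r) for r in runs] for h, runs in by_host.items()}
```

Feeding the open access.log file to `dns_failures` and the result to `episodes` gives, per FQDN, the start, end and size of each failure window, which makes it easier to compare windows across hosts and across proxies.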
Some of the occurrences are accompanied by a message like:
2021/10/25 15:42:34 kid1| ipcacheParse No Address records in response to 'ad.360yield.com'
in /var/log/squid/cache.log, but in the majority of cases nothing is logged there.
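To see which failures come with a cache.log message and which do not, the two logs can be matched by host and time. This is a rough sketch under several assumptions: cache.log timestamps are in the server's local time while access.log uses epoch seconds, so the UTC offset has to be supplied by hand (the two sample lines quoted in this post line up if the offset is +2 hours, which is an inference, not something confirmed). The regular expression covers only the one message shown above, and `parse_cache_line`, `split_by_cache_log` and the 5-second `tolerance` are made-up names and values for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches only the ipcacheParse message quoted above; the optional colon
# allows for variants of the message that include one.
CACHE_RE = re.compile(
    r"^(?P<when>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+\S+\|\s+"
    r"ipcacheParse:?\s+No Address records in response to '(?P<host>[^']+)'"
)


def parse_cache_line(line, utc_offset_hours):
    """Return (epoch_seconds, host) for a matching cache.log line, else None.

    utc_offset_hours is the server's local offset from UTC (an assumption
    the caller must supply; it is not recorded in cache.log).
    """
    m = CACHE_RE.match(line.strip())
    if not m:
        return None
    tz = timezone(timedelta(hours=utc_offset_hours))
    when = datetime.strptime(m.group("when"), "%Y/%m/%d %H:%M:%S")
    return when.replace(tzinfo=tz).timestamp(), m.group("host")


def split_by_cache_log(failures, cache_events, tolerance=5.0):
    """Split (ts, host) failures into those with and without a cache.log
    message for the same host within `tolerance` seconds."""
    matched, unmatched = [], []
    for ts, host in failures:
        hit = any(h == host and abs(ts - t) <= tolerance
                  for t, h in cache_events)
        (matched if hit else unmatched).append((ts, host))
    return matched, unmatched
```

Here `failures` is a list of (epoch timestamp, host) pairs taken from the ERR_DNS_FAIL entries in access.log. The `unmatched` list is the interesting one, since those are the failures for which Squid logged nothing about the DNS response.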
How can I find out what is going wrong here?