[squid-users] HEAD requests: pass through?

Wed Sep 4 13:54:40 UTC 2019

On 4/09/19 9:54 pm, fansari wrote:
> If my understanding is correct when the client already has the content it
> sends a HEAD request to the squid and it will be checked whether the content
> on the squid is newer than the local cache of the client.

Maybe.

HTTP/1.0-only clients are likely to do so, since all they can really
depend on is the small set of 1.0 protocol features.

HTTP/1.1 clients (the majority these days) have access to conditional
requests and may send a GET request with conditions about what they want.
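
As a rough illustration of what such a conditional GET carries, here is
a minimal Python sketch. The function name and the validator values are
made up for this example; only the If-None-Match and If-Modified-Since
header names come from the HTTP specification.

```python
def conditional_headers(etag=None, last_modified=None):
    """Build the extra request headers for a conditional GET.

    etag and last_modified are the validators the client saved from an
    earlier response (the values used below are hypothetical).
    """
    headers = {}
    if etag is not None:
        # The server can reply 304 Not Modified if its ETag still matches.
        headers["If-None-Match"] = etag
    if last_modified is not None:
        # Date-based validator, taken from an earlier Last-Modified header.
        headers["If-Modified-Since"] = last_modified
    return headers


print(conditional_headers(etag='"abc123"',
                          last_modified="Wed, 04 Sep 2019 13:00:00 GMT"))
```

A client would add these headers to an ordinary GET. If nothing changed,
the reply is a short 304 with no body; otherwise it is a normal 200 with
the new content.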

> 
> Is it possible to configure the squid in a way that such requests are not
> answered by the squid itself but passed through to the internet?

It is BUT ...

> Because it
> may happen that the content on the internet has changed - in this case the
> client would compare against the older content from the squid cache (this
> should be avoided).
> 

For Squid to provide an answer from its cached object, that object needs
to have information telling Squid what to do when constructing such a
response. There is no need to set up hard-coded behaviour that depends
on a particular request method.

If the resource is likely to change, then the origin server is supposed
to be sending the Cache-Control header with an option that tells the
proxy to always check for updates before answering any client, or that
tells the proxy an interval at which to perform re-checks.
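
To make that concrete, here is a simplified Python sketch of how a cache
might read those two kinds of instruction ("no-cache" for always check,
"max-age=N" for an interval). This is an illustration under stated
assumptions, not Squid's actual logic: the function names are invented,
and it ignores s-maxage, Expires and heuristic freshness.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (e.g. no-cache) map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def must_revalidate_now(cache_control, age_seconds):
    """Return True if the cache should check with the origin before replying."""
    directives = parse_cache_control(cache_control)
    if "no-cache" in directives:
        # Origin asked for a check before every reuse.
        return True
    max_age = directives.get("max-age")
    if max_age is not None and max_age.isdigit():
        # Fresh only while the object's age is below the allowed interval.
        return age_seconds >= int(max_age)
    # No explicit lifetime: real caches use heuristics; this sketch just checks.
    return True


print(must_revalidate_now("max-age=60", 10))   # still inside the interval
print(must_revalidate_now("no-cache", 0))      # always check first
```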

> The scenario should be: 
> 
> 1. If the client does not have the content ask the squid and
>   1a)  If it has the content take this. 
>   1b) If it does not have the content get it from the internet. 
> 

You just described HTTP caching in a very simplistic way. For more
details you can read the specification itself:
 <https://tools.ietf.org/html/rfc7234>

> 2. If the client already has the content and just sends a HEAD request pass
> this to the internet in order to check against the newest version.
> 

The HEAD method _can_ be used for this, but its intended main purpose is
finding out details about an object the client *does not* already have,
without incurring the time and bandwidth costs of fetching the entire thing.

Relaying every request like this is a terribly inefficient thing to be doing.
One of the core functions of a proxy cache is to *reduce*
traffic/bandwidth to upstream servers. The HTTP conditional requests and
caching controls are far more efficient at doing these updates on an
as-needed basis.

For more details on conditional requests see:
 <https://tools.ietf.org/html/rfc7232>

Also, HEAD responses are not cacheable. So by relaying the HEAD requests
you are guaranteeing that any update to that resource does not get
cached by the proxy until much later when the client is already waiting
for it to arrive. If you let the proxy decide what to send the server it
can choose to send a conditional request itself to update the cached
content ready for when these clients do their followup GET.
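
As a toy model of that last point, the Python sketch below shows what a
cache does with the origin's reply to its own conditional request. The
entry layout and the function name are hypothetical and this is not how
Squid stores objects; it only illustrates the 304 versus 200 outcomes.

```python
def apply_revalidation(entry, status, headers, body=None):
    """Update a cached entry from the origin's reply to a conditional GET.

    entry is a dict like {"headers": {...}, "body": b"..."} (made-up layout).
    """
    if status == 304:
        # Not modified: keep the stored body, refresh the stored headers.
        merged = dict(entry["headers"])
        merged.update(headers)
        return {"headers": merged, "body": entry["body"]}
    if status == 200:
        # Changed: the full new representation replaces the old one.
        return {"headers": dict(headers), "body": body}
    # Any other status: leave the entry untouched in this sketch.
    return entry


old = {"headers": {"ETag": '"v1"'}, "body": b"old content"}
new = apply_revalidation(old, 200, {"ETag": '"v2"'}, b"new content")
print(new["body"])
```

Either way the cache ends up holding a current copy, which is what lets
it answer the follow-up GET without another trip upstream.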

If you still insist on doing this weird thing with HEAD requests, you
can configure:
 acl HEAD method HEAD
 send_hit deny HEAD

HTH
Amos