[squid-users] Http write cache

Sun Sep 10 17:25:18 UTC 2017

On 10/09/17 21:14, Olivier MARCHETTA wrote:
> Hello,
> 
>> Origin servers can sometimes respond to requests with payload ("uploads") before the request has fully arrived, but any subsequent network issues are guaranteed to result in data loss - so the practice is discouraged.
> 
> If I understand, when it's a download (GET), Squid will replace the payload with the object in cache, if fresh.

Nod. This is possible because two identical requests

> But the HTTP control messages are still coming from the Origin server.

Not necessarily. There are no "control messages" as such in HTTP. The 
cache controls are delivered along with the cached payload to indicate 
what can be done with it. Synchronous server contact (aka revalidation) 
to deliver responses is only required if those controls say so.
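
To make the role of those controls concrete, here is a minimal sketch in Python of the decision a cache makes before reusing a stored response. It is not Squid's code. The names `parse_cache_control` and `needs_revalidation` are hypothetical, and only the `no-cache` and `max-age` directives are considered; a real cache also weighs Expires, heuristic freshness, must-revalidate and more.

```python
def parse_cache_control(header: str) -> dict:
    """Split a Cache-Control header value into a {directive: value} dict."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"')
    return directives


def needs_revalidation(cache_control: str, age_seconds: int) -> bool:
    """Return True if the origin server must be contacted before reuse.

    Simplified assumption: only no-cache and max-age are understood, and
    a response without a usable max-age is treated as needing revalidation.
    """
    directives = parse_cache_control(cache_control)
    if "no-cache" in directives:
        return True
    max_age = directives.get("max-age")
    if max_age is None or not max_age.isdigit():
        return True
    # Stale once the object's age reaches its freshness lifetime.
    return age_seconds >= int(max_age)
```

With these assumptions, a stored response carrying `max-age=3600` and an age of 120 seconds can be served without any server contact, while the same response under `no-cache` cannot.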

> In case of an upload (PUT), it won't accelerate to use the Squid cache,
> because the client has to wait for the Origin server's response of the payload transfer (or request).

Yes. Squid has never seen the request before, so has no idea what 
response will appear as a result.

> 
> The only option to make uploads faster is if the Origin server is aware that the client is using a reverse proxy cache and respond to the upload request before the full payload transfer.
> 

Close, but not quite. The server does not need to know about the proxy, 
it just has to know the upload payload is a "pointless waste of bandwidth" 
(where data loss doesn't matter) and deliver its response early.

For example, this is usually seen with NTLM authentication, where 
uploads without credentials are denied early. The upload has to be 
repeated in full with the right credentials anyway, so all the bytes from 
the first attempt can be dropped in-transit by the proxy.

> Tell me if I'm wrong, but I think that I understand now.
> Meaning that if I want to "bufferize" the writes it has to happen with another protocol before the WebDAV connection to Sharepoint Online.
> 

The "other protocol" is WebDAV as far as I know. HTTP is just about 
delivery of some request and its corresponding response. How WebDAV 
transfers use HTTP messaging, and which parts of HTTP and WebDAV the 
client and server implement may or may not support the behaviour you want.

You are then colliding with the definition differences between "cache" 
and "buffer". Caches store *past* data for the purpose of reducing 
current/future server work, buffers store *current* data awaiting delivery.
  An upload is normally not something seen previously, so not cacheable.
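
The cache/buffer distinction can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the class names `ResponseCache` and `UploadBuffer` are hypothetical, the cache is keyed by URL alone, and freshness is ignored.

```python
from collections import deque


class ResponseCache:
    """Keeps *past* responses so later identical requests can reuse them."""

    def __init__(self):
        self._store = {}

    def store(self, url, body):
        self._store[url] = body

    def lookup(self, url):
        # The entry stays in place after a hit; it can be served again.
        return self._store.get(url)


class UploadBuffer:
    """Holds *current* data only until it has been passed along."""

    def __init__(self):
        self._chunks = deque()

    def push(self, chunk):
        self._chunks.append(chunk)

    def drain(self):
        # Each chunk is handed over exactly once, in arrival order.
        while self._chunks:
            yield self._chunks.popleft()
```

A cache lookup can succeed any number of times for the same URL, whereas a drained buffer is empty: nothing in it helps with the next, different upload.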

Proxies and the network itself *do* buffer data along the way. But that 
in no way adds any asynchronous properties to HTTP. The client still has 
to wait for the HTTP response to be delivered back to it before it can 
consider the HTTP part of that transaction over - the "transaction" in 
this context may or may not be the full WebDAV upload+processing on the 
server.

HTTP has some mechanisms that can help improve upload behaviour and 
avoid pointless bandwidth delivery, notably the Expect: 100-continue and 
Range features and the 201/202 status codes. WebDAV extensions to HTTP add 
various other things I'm not very familiar with.
  Between them they can signal to the client that a) the server is 
contactable before data gets delivered, b) data can be delivered in small 
chunks to minimize loss, and c) any given part has completed arrival and 
is awaiting some state (ie full object arrival) and/or some async 
processing.

BUT, as should be obvious, these are all application-logic level things 
(ie WebDAV) and require explicit support by both the endpoint 
applications on server and client for that logic to take place. The 
async properties arise from how things are done *between* HTTP 
transactions. The interactions are separate synchronous request+response 
message pairs as far as Squid and any HTTP infrastructure is concerned.
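
As an illustration of the Expect: 100-continue mechanism and the 201/202 status codes mentioned above, here is a minimal client-side sketch in Python. It does no network I/O. The function names, the path and the host `example.invalid` are hypothetical, and the logic is deliberately simplified: a real client may also send the body after a timeout if no interim response arrives.

```python
def build_put_head(path: str, host: str, content_length: int) -> bytes:
    """Build the request line and headers of a PUT, without the body."""
    lines = [
        f"PUT {path} HTTP/1.1",
        f"Host: {host}",
        f"Content-Length: {content_length}",
        "Expect: 100-continue",  # ask the server to answer before the body
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def should_send_body(status_code: int) -> bool:
    """Decide whether to transmit the body after the first server reply.

    Simplified assumption: 100 means go ahead; any other status is taken
    as a final answer (for example a 401 denial), so the body is skipped.
    """
    return status_code == 100


def describe_upload_status(status_code: int) -> str:
    """Map the final status of an upload to what the client may conclude."""
    if status_code == 201:
        return "created"
    if status_code == 202:
        return "accepted, processing not yet complete"
    return "other"
```

Even here each step is still one synchronous request and response. Any asynchronous behaviour comes from what the application does after a 202, such as checking back later.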

Amos