[squid-users] Http write cache

Mon Sep 11 12:50:08 UTC 2017

Thank you Amos for this enlightenment.
I really do appreciate your help.
I will stay with the reverse proxy configuration for our POC.
At the moment we need to cache the library data reads more than the writes.
And the next version of the OneDrive client should help with asynchronous writes.
Still, it will download from the Cloud so Squid is necessary in all cases.

Thank you.
Regards,
Olivier MARCHETTA

-----Original Message-----
From: squid-users [mailto:squid-users-bounces at lists.squid-cache.org] On Behalf Of Amos Jeffries
Sent: Sunday, September 10, 2017 6:25 PM
To: squid-users at lists.squid-cache.org
Subject: Re: [squid-users] Http write cache

On 10/09/17 21:14, Olivier MARCHETTA wrote:
> Hello,
> 
>> Origin servers can sometimes respond to requests with payload ("uploads") before the request has fully arrived, but any subsequent network issues are guaranteed to result in data loss - so the practice is discouraged.
> 
> If I understand, when it's a download (GET), Squid will replace the payload with the object in cache, if fresh.

Nod. This is possible because two identical requests

> But the HTTP control messages are still coming from the Origin server.

Not necessarily. There are no "control messages" as such in HTTP. The cache controls are delivered along with the cached payload to indicate what can be done with it. Synchronous server contact (aka revalidation) to deliver responses is only required if those controls say so.
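
As a rough illustration of "only required if those controls say so", here is a minimal Python sketch of the decision a shared cache makes from the Cache-Control header stored with the object. The function name is made up for this example and the logic is heavily simplified (no Expires, no heuristic freshness, no Vary); it is not how Squid implements it.

```python
def needs_revalidation(cache_control: str, age_seconds: int) -> bool:
    """Return True if a stored response must be checked with the origin first.

    Hypothetical helper: only looks at no-cache, s-maxage and max-age.
    """
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = value.strip().strip('"')

    # no-cache: the object may be stored, but never reused without asking.
    if "no-cache" in directives:
        return True

    # A shared cache prefers s-maxage over max-age when both are present.
    lifetime = directives.get("s-maxage", directives.get("max-age"))
    if lifetime is None or not lifetime.isdigit():
        # Simplification: with no explicit lifetime, be conservative.
        return True

    # Fresh only while the object's age is below its lifetime.
    return age_seconds >= int(lifetime)
```

With "public, max-age=3600" and an object that is two minutes old, the cached copy can be delivered without any server contact at all.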

> In case of an upload (PUT), it won't accelerate to use the Squid 
> cache, because the client has to wait for the Origin server's response of the payload transfer (or request).

Yes. Squid has never seen the request before, so has no idea what response will appear as a result.

> 
> The only option to make uploads faster is if the Origin server is aware that the client is using a reverse proxy cache and respond to the upload request before the full payload transfer.
> 

Close, but not quite. The server does not need to know about the proxy; it just has to know the upload payload is a "pointless waste of bandwidth" (where data loss doesn't matter) and deliver its response early.

For example, this is usually seen with NTLM authentication, where uploads without credentials are denied early. The upload has to be repeated in full with the right credentials anyway, so all the bytes from the first attempt can be dropped in transit by the proxy.

> Tell me if I'm wrong, but I think that I understand now.
> Meaning that if I want to "bufferize" the writes it has to happen with another protocol before the WebDAV connection to Sharepoint Online.
> 

The "other protocol" is WebDAV as far as I know. HTTP is just about delivery of some request and its corresponding response. Depending on how WebDAV transfers use HTTP messaging, and on which parts of HTTP and WebDAV the client and server implement, the behaviour you want may or may not be supported.

You are then colliding with the difference in definition between "cache" and "buffer". Caches store *past* data for the purpose of reducing current/future server work; buffers store *current* data awaiting delivery. An upload is normally not something seen previously, so it is not cacheable.
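
To make the distinction concrete, a small Python sketch follows. Both class names are hypothetical and have nothing to do with Squid's internals; the point is only that a cache is looked up by a request seen before, while a buffer just holds bytes until they are passed on.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class ResponseCache:
    """Stores *past* responses so a repeated request can skip the server."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], bytes] = {}

    def lookup(self, method: str, url: str) -> Optional[bytes]:
        # Returns None for anything never seen before (e.g. a new upload).
        return self._store.get((method, url))

    def remember(self, method: str, url: str, body: bytes) -> None:
        self._store[(method, url)] = body


class UploadBuffer:
    """Holds *current* data only until it has been delivered onward."""

    def __init__(self) -> None:
        self._chunks: Deque[bytes] = deque()

    def push(self, chunk: bytes) -> None:
        self._chunks.append(chunk)

    def drain(self) -> bytes:
        # Hand everything over and forget it; nothing is kept for reuse.
        data = b"".join(self._chunks)
        self._chunks.clear()
        return data
```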

Proxies and the network itself *do* buffer data along the way. But that in no way adds any asynchronous properties to HTTP. The client still has to wait for the HTTP response to be delivered back to it before it can consider the HTTP part of that transaction over - the "transaction" in this context may or may not be the full WebDAV upload+processing on the server.

HTTP has some mechanisms that can help improve upload behaviour and avoid pointless bandwidth delivery. Notably the Expect: 100-continue and Range features and the 201/202 status codes. WebDAV extensions to HTTP add various other things I'm not very familiar with.
Between them they can a) signal to the client that the server is contactable before any data gets delivered, b) let the data be delivered in small chunks to minimize loss, and c) signal that any given part has completed arrival and is awaiting some state (i.e. full object arrival) and/or some async processing.
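
A minimal Python sketch of the Expect: 100-continue handshake from the client side, as an illustration only. The two function names are invented for this example, the path and host are placeholders, and a real client would also need a timeout after which it sends the body anyway.

```python
def build_put_head(path: str, host: str, length: int) -> bytes:
    """Request line and headers only; the body is held back for now."""
    return (
        f"PUT {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Content-Length: {length}\r\n"
        "Expect: 100-continue\r\n"
        "\r\n"
    ).encode("ascii")


def should_send_body(status_line: bytes) -> bool:
    """Decide from the server's first status line whether to upload.

    "100 Continue" means go ahead. Any final status (401, 417, ...)
    arrived early, so the payload would be wasted bandwidth.
    """
    parts = status_line.decode("ascii", "replace").split(None, 2)
    return len(parts) >= 2 and parts[1] == "100"
```

The client writes the output of build_put_head() to the connection, reads one status line, and only then decides whether the payload is worth sending. An early 401 here is the NTLM case: no upload bytes are wasted on the attempt that was going to be denied.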

BUT, as should be obvious, these are all application-logic level things (i.e. WebDAV) and require explicit support by the endpoint applications on both server and client for that logic to take place. The async properties arise from how things are done *between* HTTP transactions. The interactions are separate synchronous request+response message pairs as far as Squid and any HTTP infrastructure is concerned.
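
A sketch in Python of what that looks like from the client application's point of view, assuming a server that answers the upload with 202 and a status URL to poll. The URLs, the fake server and the function names are all invented for the example; this is not a description of what SharePoint or OneDrive actually do.

```python
from typing import Callable, Tuple

Response = Tuple[int, str]  # (status code, status URL or empty string)


def upload_then_poll(send: Callable[[str, str], Response], max_polls: int = 5) -> int:
    """Each send() call is one complete, blocking request+response pair."""
    status, status_url = send("PUT", "/docs/report.docx")
    if status != 202:
        return status  # finished (or failed) within the single transaction
    for _attempt in range(max_polls):
        # The "asynchronous" part is just more synchronous transactions.
        status, _unused = send("GET", status_url)
        if status != 202:
            break
    return status


def make_fake_server(polls_until_done: int) -> Callable[[str, str], Response]:
    """Stand-in for a real origin: reports done after N status polls."""
    state = {"polls": 0}

    def send(method: str, url: str) -> Response:
        if method == "PUT":
            return 202, "/status/1"
        state["polls"] += 1
        if state["polls"] >= polls_until_done:
            return 200, ""
        return 202, "/status/1"

    return send
```

A proxy in the middle sees nothing but a PUT followed by some unrelated-looking GETs, each answered in turn. Only the two endpoints know they belong together.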

Amos
_______________________________________________
squid-users mailing list
squid-users at lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users