[squid-users] Vary object loop returns

Wed Jun 8 17:36:17 UTC 2016

On 7/06/2016 10:48 p.m., Yuri Voinov wrote:
> 
> 
> 
> 07.06.2016 16:36, Amos Jeffries wrote:
>> On 7/06/2016 8:48 p.m., Yuri Voinov wrote:
>>>
>>> 07.06.2016 4:57, Amos Jeffries wrote:
>>>> On 7/06/2016 5:55 a.m., Yuri Voinov wrote:
>>>>>
>>>>> So.
>>>>>
>>>>> Squid DOES NOT and WILL NOT support gzip. The only way to do it is to
>>>>> use eCAP plus the no longer supported eCAP gzip adapter. Let's accept
>>>>> this. We can support gzip. With restrictions. Ok.
>>>>>
>>>>> any other compression - false. No. No way. Get out. and so on.
>>>>>
>>>>>  identity - this is uncompressed type.
>>>>>
>>>>> That's all, folks.
>>>>>
>>>>> Finally. As Joe does, we can keep only gzip and identity in
>>>>> Accept-Encoding and strip everything else.
>>>
>>>> Locking the entire Internet to using your personal choice of gzip
>>>> compression or none.
>>>
>>>> gzip is the slowest and most resource-hungry type of compression there
>>>> is. deflate is actually faster for clients and just as widely supported.
>>> Unfortunately, Amos, no one has written a support module for any other
>>> compression algorithm. We have to eat what we are given.
>>>
> 
>> Like I said deflate is widely available. Heiler's recent info shows that
>> lzma is becoming more visible on the public web, which should help fix
>> the one issue deflate has.
> 
>> And no one appears to be fixing the remaining issues in the Squid gzip
>> eCAP module.
> 
>> There also seems to be a big push back from browser and some server
>> vendors about compression in general. We had a fairly major fight in
>> IETF to get HTTP/2 to contain data compression at all. It is still only
>> in there as an optional extension that some are openly refusing to
>> implement.
> 
> 
>>>
>>>>>
>>>>> Without any problem. Moreover, this type of change can be pushed to
>>>>> all branches of Squid without any problem, because it dramatically
>>>>> increases byte HIT.
>>>
>>>> Responding with a single object to all requests makes your HIT ratio
>>>> 100% guaranteed. The clients won't like you though if all they ever see
>>>> is the same cat picture.
>>>
>>>> It sounds ridiculous when put that way, but that is what these patches
>>>> are doing for an unknown number of those "gained" HITs. See my previous
>>>> post about how none of these patches are changing the request the server
>>>> gets.
>>> But no one asked the question - why Squid in production installations
>>> has such a low hit ratio
> 
>> Yes that has been asked, even investigated. The reason(s) are many
>> complex details and small issues adding together to a big loss.
> 
>> They range from protocol things like Vary not being fine-grained enough
>> (Key header being developed fixes that), through to client behaviour
>> (Chrome sdch doubles the variant count - almost halving useful cache
>> space), to server behaviour (Apache changing Vary header).
> 
>> What your testing of Joe's patches is showing is that the sdch effect
>> Chrome has is probably way bigger than one would expect to be reasonable.
> 
> 
>>> that raises the question of the expediency of deploying a caching
>>> proxy. We do believe that this is a caching proxy?
>>>
>>>
>>>> You are once again sweeping aside the critical requirement of content
>>>> integrity to achieve high HIT ratio. Which is not something that I can
>>>> accept into Squid as a default action.
>>> I continue to believe that 20% is an unacceptably low cache hit ratio,
>>> given the very aggressive settings and the active use of Store ID. Which
>>> brings us back to the question of whether using Squid is feasible at
>>> all.
>>>
> 
>> That kind of "unacceptable" statement simply cannot be made about cache
>> HIT ratio. It is what it is. One cannot change the speed of light
>> because it takes unacceptably long to travel through space.
> Yes and no.
> 
> We're not just talking about the abstract ratio of cache hits but,
> above all, about the measured byte hit ratio. That is what gives the
> maximum traffic savings. Even an increase in cache latency can be
> neglected in many cases. Traffic is money. Often a very large sum.

You are missing my point. Where the ratio is measured matters as much as
the traffic content in determining what min and max limits it will
appear to have.

> 
> 
>> Two properly working caches in serial will have extremely different
>> caching ratios. The one with most direct client connections trends
>> towards 50-100% and the upstream one towards the servers will trend
>> towards zero. The total cacheable ratio is unchanged, but each cache
>> sees a different proportion of it and so shows a different HIT ratio
>> relative to its clients' portion.
> Sure, but not all of us can afford two caches. In most installations,

A cache in the client browser. And one in your network. Bingo two caches.

A cache in your network and a client visiting CDN hosted site. Bingo two
caches.

Average request these days goes through something like 6 different HTTP
software installations - each of which might be caching in the
end-to-end message pathway.
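
To put rough numbers on the serial-cache effect, here is a minimal
sketch. It is a toy model, not Squid code: the LRUCache class, the
capacities and the request stream (10 objects fetched 10 times each) are
all made up for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache that counts its own requests and hits (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.requests = 0
        self.hits = 0

    def fetch(self, url, upstream):
        """Serve url from cache, or pass the miss on to the upstream callable."""
        self.requests += 1
        if url in self.store:
            self.store.move_to_end(url)  # mark as most recently used
            self.hits += 1
            return
        upstream(url)  # a miss travels one hop further
        self.store[url] = True
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used

    def hit_ratio(self):
        return self.hits / self.requests if self.requests else 0.0


def origin(url):
    """Stand-in for the origin server; it does nothing."""


def run_chain():
    near = LRUCache(capacity=100)  # cache closest to the clients
    far = LRUCache(capacity=100)   # upstream cache, closer to the servers
    # Hypothetical traffic: 10 distinct objects, each requested 10 times.
    for i in range(100):
        near.fetch(f"/obj/{i % 10}", lambda u: far.fetch(u, origin))
    return near, far


near, far = run_chain()
print(f"near: {near.requests} requests, HIT ratio {near.hit_ratio():.0%}")
print(f"far:  {far.requests} requests, HIT ratio {far.hit_ratio():.0%}")
```

The traffic is identical in both measurements, yet the cache near the
clients reports a high ratio while the upstream one only ever sees the
first-time misses. Real traffic with several downstream caches feeding
one upstream will be less extreme than this toy case.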

> only one box. And we, of course, want the maximum possible
> efficiency. In addition, I am currently working on the optimization of a
> single installation with two storage arrays in order to obtain the best
> possible hit rate from only one Squid server.
> 

Cool. :-)

> 
> 
>> Also, don't forget that the disk space available to browser caches is
>> increasingly large as well. So their caches are growing in size and
>> taking up a larger share of the total achievable HIT ratios in recent
>> years.
> The browser's cache does not quite count in this case. It is still not shared.

Well. That situation has been getting very fuzzy in the past few years.
'apps' can offload their activity to a browser with caching to do their
things. So while it's still one person / user, the embedded advertising
re-used by each app means the browser becomes a shared cache for at least
the advertising part of the traffic. Probably also graphics tiles and the
like.

> 
> Which brings us to the fact that customers download the same content
> multiple times, and for a shared cache that content is unique.
> Duplication is a problem. A serious problem. Caches have become bigger
> and content has increased in quantity. We have a huge amount of duplicate
> content, which, in many cases, is apparently identical. StoreID only
> partially solves the problem, because it requires a huge amount of
> continuous manual maintenance work. For example, Instagram first served
> content by its own means, then moved behind Akamai. The URL structure
> changed and the rewrite rule had to be changed radically.

Nod. And Vary is part of the problem with duplication. That is a known
thing. But Vary is set into the bedrock of the HTTP ecosystem, there is
no changing how it works without breaking a lot of things. The choice is
only whether or not one stores its variant cloud on a case-by-case basis.
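
As a rough illustration of how Vary multiplies stored variants, here is
a sketch. It is not how Squid builds its store keys: variant_key and
normalize_accept_encoding are hypothetical helpers, and the
Accept-Encoding strings are only assumed examples of what different
clients might send.

```python
def variant_key(url, request_headers, vary):
    """Cache key: the URL plus the request headers named in the Vary response header."""
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    return (url, tuple((n, request_headers.get(n, "")) for n in names))


def normalize_accept_encoding(value, keep=("gzip", "identity")):
    """Drop every content coding except those in keep (the idea under debate)."""
    codings = [c.split(";")[0].strip().lower() for c in value.split(",")]
    return ", ".join(c for c in keep if c in codings)


# Assumed Accept-Encoding values from four different clients.
client_values = [
    "gzip, deflate, sdch",
    "gzip, deflate",
    "gzip, deflate, br",
    "gzip",
]

url = "http://example.com/style.css"
vary = "Accept-Encoding"

raw_keys = {
    variant_key(url, {"accept-encoding": v}, vary) for v in client_values
}
normalized_keys = {
    variant_key(url, {"accept-encoding": normalize_accept_encoding(v)}, vary)
    for v in client_values
}

print(len(raw_keys), "variants stored without normalization")
print(len(normalized_keys), "variant stored with normalization")
```

Collapsing the key this way is where the extra HITs come from. It is
only safe if the request sent to the server is collapsed the same way,
otherwise a client can be handed a variant it never asked for.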

> 
> If we could reduce, at the level of application protocols, the amount of
> work needed to maintain a shared cache, it would increase the
> efficiency of our work and solve a lot of problems with future changes in
> the Web.
> 

Amos