[squid-dev] 206 Partial Content Caching

Thu Sep 13 16:14:58 UTC 2018

On 09/13/2018 05:23 AM, Benson Kwok wrote:

> I have successfully implemented caching of 206 Partial Content as a
> project at my job and want to know if you guys are interested in pulling
> it into the main branch.

Yes, of course! If you are willing to make the changes necessary for the
official inclusion, please submit an official pull request:
https://wiki.squid-cache.org/MergeProcedure

I did not review your changes, but here are a few high-level problems
that jumped out at me:

1. The changes add support for single-range caching without merging, not
general 206 caching. The added feature alone _is_ useful, but its
scope/limits need to be clearly specified in the PR description.

2. The changes add new data fields to the StoreEntry class. If possible,
those fields should be moved to MemObject. The former class exists for
all UFS-cached entries (and more). There can be billions of them! The
latter class is primarily for the entries currently in use. We should
not waste memory on the former if we can use the memory of the latter.

3a. With the proposed range-in-key approach, cached range objects cannot
be removed when such removal is required by URL-based HTTP rules (e.g.,
an HTTP DELETE request). Squid will simply not know what ranges to use
to find the objects for a given URL. IIRC, Squid has a similar problem
for request methods, but there Squid can enumerate all currently
cachable methods because that list is hard-coded (GET and HEAD).

3b. The proposed range-in-key approach probably clashes with the
ultimate goal of supporting caching (and merging/fetching) of arbitrary
range sets.

3c. With the proposed range-in-key approach, ten requests fetching ten
different ranges will create ten cache objects. IIRC, it is common for
applications such as PDF readers to request several ranges for a single
document. The current approach could result in lots of objects being
cached instead of one. Is that a good idea, especially as the default
behavior?
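
To illustrate items 3a and 3c, here is a minimal sketch. It is not Squid
code: RangeKey, RangeKeyHash, ToyStore, and both helper functions are
hypothetical names, and storing the whole object under offset 0 and
length -1 is an assumption made only for this example. The toy store
supports exact-key lookups only, much like a hash-based index would.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical cache key in the spirit of the range-in-key design:
// the requested range is part of the cached object's identity.
struct RangeKey {
    std::string url;
    std::int64_t offset; // first byte of the range
    std::int64_t length; // number of bytes in the range

    bool operator==(const RangeKey &o) const {
        return url == o.url && offset == o.offset && length == o.length;
    }
};

// Hypothetical hash that mixes the URL with the range offset and length.
struct RangeKeyHash {
    std::size_t operator()(const RangeKey &k) const {
        std::size_t h = std::hash<std::string>{}(k.url);
        h = h * 31 + std::hash<std::int64_t>{}(k.offset);
        h = h * 31 + std::hash<std::int64_t>{}(k.length);
        return h;
    }
};

// Toy stand-in for a cache index: one entry per exact key.
using ToyStore = std::unordered_map<RangeKey, std::string, RangeKeyHash>;

// Item 3c: ten different ranges of one URL become ten separate entries.
std::size_t cacheTenRanges(ToyStore &store, const std::string &url) {
    for (std::int64_t i = 0; i < 10; ++i)
        store[RangeKey{url, i * 1000, 1000}] = "partial body";
    return store.size();
}

// Item 3a: an invalidation that knows only the URL can build only the keys
// it can guess. Assumption: the whole object uses offset 0 and length -1.
// Returns the number of entries actually removed.
std::size_t purgeKnownKeys(ToyStore &store, const std::string &url) {
    return store.erase(RangeKey{url, 0, -1});
}
```

In this sketch, calling cacheTenRanges() on an empty store and then
purgeKnownKeys() for the same URL removes nothing: all ten range entries
stay cached because the purge code has no way to reconstruct their keys.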

Looking at items 3a-3c combined, we need to discuss whether that
range-in-key design is the right approach. Right now, I do not think it
is. It certainly makes things simpler short-term, but it immediately
leads to dangerous HTTP violations and probably contradicts long-term
goals. This discussion should probably happen here on the mailing list.

Thanks a lot,

Alex.

> - adding range_offset and range_length to StoreEntry
> - caching single ranged requests by adding the range offset and length
> to the store key hash function so they can be looked up by another request with
> the same range offset and length
> - offset and length are also added to StoreMeta so after a restart, the
> offset and length can be restored
> - enhancing HTCP so the range header from a peer is parsed and the
> offset and length are used during HIT/MISS lookup
> - skip ICP for range request since ICP cannot include range header
> - adjust store_client.cc copyInfo.offset by range_offset
> - adjust store_swapout.cc mem->swapout.queue_offset by range_offset
> - adjust trimSwappable() new_mem_lo by range_offset

> https://github.com/squid-cache/squid/compare/master...bkwzwz:206_partial_content_caching