[squid-dev] [PATCH] Case-insensitive URI schemes
Amos Jeffries
squid3 at treenet.co.nz
Fri Jan 6 12:27:58 UTC 2017
On 2017-01-06 10:24, Eduard Bagdasaryan wrote:
> Hello,
>
> This patch fixes URI schemes to be case-insensitive (r14802
> regression).
>
> The bug was introduced by r14802 (better support of unknown
> URL schemes). AnyP::UriScheme constructor had a problem: the
> explicitly-specified img parameter (always provided by
> URL::setScheme())
> was used for all schemes, not only for unknown ones (as it obviously
> should be).
The situation is not so obvious. Alex and I already had a rather long
audit discussion about the finer points of that before the 14802 commit.
The problem is rooted in the unfortunate fact that transport protocol
labels are conflated into the scheme types, and are both upper-case +
case-sensitive.
For context please read:
<http://lists.squid-cache.org/pipermail/squid-dev/2016-June/005995.html>
> As a result, the code responsible for lower-case
> transformation was not executed.
That is intentional behaviour for several reasons;
1) it improves transparency and reduces risks from proxy fingerprinting
by systems probing the URI scheme handling by the transport agents (ie,
fingerprinting Squid).
2) unknown URI schemes are not necessarily handled properly as
case-insensitive by the experimental agents sending and receiving the
messages.
also, (and more importantly);
3) the transport protocol label and URI scheme label are still
conflated. The scheme down-casing procedure is _only_ applicable when
translating from ProtocolType_str labels (upper case) to scheme label
(lower case).
- as we improve the code to remove the conflation problem the uses of
that logic path should trend towards zero, then we can drop it entirely.
- with unknown protocols the UriScheme object will have to be used as
the transport representation (ew, but that what we have today). So
retaining the upper case version there on unknown things is currently
important while being not-harmful for the scheme comparisons which
should be ignoring case anyhow.
- we do not yet properly handle scheme comparisons for unknown schemes,
so down-casing for that reason is not part of the existing code. The #1
security consideration will need to be balanced into the logic when that
is eventually fixed.
4) storing the down-cased string for registered protocols of each URI
avoids many explicit down-casing operations on use/display of the URI
scheme. Note that is specific to the known protocols.
- There are many more points of code displaying the scheme than setting
it. So this is a significant performance gain despite the overhead of
allocating and own-casing a new SBuf per UriScheme object your patch
notes with an XXX.
5) the most common cases for this method being called are internal uses
where img is passed with a static lower-cased value, and input from
urlParse() parsing an absolute-URI from clients sending properly
lower-cased URI schemes.
- So checking for (img && scheme == (NONE or UNKNOWN)) is unnecessary
extra logic to begin with and by causing the lower-casing logic to be
applied for registered schemes is also re-adding a major performance
regression in the common traffic case.
>
> There are a couple of XXX and one of them (about "broken caller" for
> PROTO_NONE) needs clarification. This caller uses PROTO_NONE
> (i.e., absent scheme) together with "http" scheme image inside
> clientReplyContext::createStoreEntry() which looks inconsistent and
> probably is a bug. Am I missing anything there?
Thats an odd case. I'm not exactly sure what we should be doing there.
Thus the unusual parameter combo. It was done that way to minimize
breakage with old code - so the request can be detected as missing
scheme (avoid confusing with a real HTTP URL) if anything tries to
compare it, but still generates a reasonably correct URL if its dumped
to store files.
Amos
More information about the squid-dev
mailing list