[squid-dev] [PATCH] Case-insensitive URI schemes

Amos Jeffries squid3 at treenet.co.nz
Fri Jan 6 12:27:58 UTC 2017


On 2017-01-06 10:24, Eduard Bagdasaryan wrote:
> Hello,
> 
> This patch fixes URI schemes to be case-insensitive (r14802 
> regression).
> 
> The bug was introduced by r14802 (better support of unknown
> URL schemes). AnyP::UriScheme constructor had a problem: the
> explicitly-specified img parameter (always provided by 
> URL::setScheme())
> was used for all schemes, not only for unknown ones (as it obviously
> should be).

The situation is not so obvious. Alex and I already had a rather long 
audit discussion about the finer points of that before the 14802 commit.
  The problem is rooted in the unfortunate fact that transport protocol 
labels are conflated into the scheme types, and are both upper-case + 
case-sensitive.

For context please read:
  
<http://lists.squid-cache.org/pipermail/squid-dev/2016-June/005995.html>


> As a result, the code responsible for lower-case
> transformation was not executed.

That is intentional behaviour for several reasons;

1) it improves transparency and reduces risks from proxy fingerprinting 
by systems probing the URI scheme handling by the transport agents (ie, 
fingerprinting Squid).

2) unknown URI schemes are not necessarily handled properly as 
case-insensitive by the experimental agents sending and receiving the 
messages.

also, (and more importantly);

3) the transport protocol label and URI scheme label are still 
conflated. The scheme down-casing procedure is _only_ applicable when 
translating from ProtocolType_str labels (upper case) to scheme label 
(lower case).
  - as we improve the code to remove the conflation problem the uses of 
that logic path should trend towards zero, then we can drop it entirely.

  - with unknown protocols the UriScheme object will have to be used as 
the transport representation (ew, but that what we have today). So 
retaining the upper case version there on unknown things is currently 
important while being not-harmful for the scheme comparisons which 
should be ignoring case anyhow.

  - we do not yet properly handle scheme comparisons for unknown schemes, 
so down-casing for that reason is not part of the existing code. The #1 
security consideration will need to be balanced into the logic when that 
is eventually fixed.


4) storing the down-cased string for registered protocols of each URI 
avoids many explicit down-casing operations on use/display of the URI 
scheme. Note that is specific to the known protocols.

  - There are many more points of code displaying the scheme than setting 
it. So this is a significant performance gain despite the overhead of 
allocating and own-casing a new SBuf per UriScheme object your patch 
notes with an XXX.


5) the most common cases for this method being called are internal uses 
where img is passed with a static lower-cased value, and input from 
urlParse() parsing an absolute-URI from clients sending properly 
lower-cased URI schemes.

  - So checking for (img && scheme == (NONE or UNKNOWN)) is unnecessary 
extra logic to begin with and by causing the lower-casing logic to be 
applied for registered schemes is also re-adding a major performance 
regression in the common traffic case.


> 
> There are a couple of XXX and one of them (about "broken caller" for
> PROTO_NONE) needs clarification. This caller uses PROTO_NONE
> (i.e., absent scheme) together with "http" scheme image inside
> clientReplyContext::createStoreEntry() which looks inconsistent and
> probably is a bug. Am I missing anything there?

Thats an odd case. I'm not exactly sure what we should be doing there. 
Thus the unusual parameter combo. It was done that way to minimize 
breakage with old code - so the request can be detected as missing 
scheme (avoid confusing with a real HTTP URL) if anything tries to 
compare it, but still generates a reasonably correct URL if its dumped 
to store files.

Amos



More information about the squid-dev mailing list