[squid-dev] [PATCH] Case-insensitive URI schemes

Amos Jeffries squid3 at treenet.co.nz
Sun Jan 29 14:10:29 UTC 2017


On 7/01/2017 5:41 a.m., Eduard Bagdasaryan wrote:
> 
> On 06.01.2017 15:27, Amos Jeffries wrote:
>>> As a result, the code responsible for lower-case
>>> transformation was not executed.
>>
>> That is intentional behaviour for several reasons;
>>
>> 1) it improves transparency and reduces risks from proxy
>> fingerprinting by systems probing the URI scheme handling by the
>> transport agents (ie, fingerprinting Squid).
>>
>> 2) unknown URI schemes are not necessarily handled properly as
>> case-insensitive by the experimental agents sending and receiving the
>> messages.
>>
>> also, (and more importantly);
> 
> The patch does not change this, i.e., "unknown" images are still stored
> without down-casing.
> 
>>
>> 3) the transport protocol label and URI scheme label are still
>> conflated. The scheme down-casing procedure is _only_ applicable when
>> translating from ProtocolType_str labels (upper case) to scheme label
>> (lower case).
> 
> To avoid misunderstanding, I would point out that the unpatched Squid did
> not down-case at all (i.e. for known ProtocolType_str schemes too). In other
> words, when receiving HTTP://example.com "HTTP" was not down-cased. This
> alone violates HTTP caching rules: two different cache entries were created
> for HTTP://example.com and http://example.com requests.
> 
>>
>>
>> 4) storing the down-cased string for registered protocols of each URI
>> avoids many explicit down-casing operations on use/display of the URI
>> scheme. Note that is specific to the known protocols.
>>
>>  - There are many more points of code displaying the scheme than
>> setting it. So this is a significant performance gain despite the
>> overhead of allocating and down-casing a new SBuf per UriScheme object
>> your patch notes with an XXX.
> 
> I am not against allocating and storing down-cased SBuf "image_" (for
> performance sake). The related XXX is about allocating SBuf which we
> probably can avoid in future optimization. For example, we could do this by
> converting ProtocolType_str to a const array of SBufs, thus avoiding image_
> member allocation when dealing with known protocols.
> 

That will not help avoid the re-allocation, since ProtocolType_str should
be upper case and the COW property of SBuf will force a reallocation when
lower-casing anyway.


I'm thinking the quick-and-dirty way is to just lowercase the 'proto'
variable in url.cc urlParse() function. Doing that in the for-loop where
it is copied from 'src' would be easiest.
 - it breaks the case preservation on unknown schemes a little bit. But
since they are supposed to be case-insensitive anyway the harm is minimal.
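
Roughly what I mean, as a standalone sketch rather than the actual urlParse()
code. The helper name copySchemeLowercased and its maxLen parameter are made
up for illustration, and std::string stands in for whatever buffer 'proto'
really is in url.cc:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, NOT the real Squid code: copies the scheme prefix of
// 'src' (everything before the first ':') and lower-cases each character
// while copying, so no second pass over the buffer is needed.
static std::string
copySchemeLowercased(const char *src, std::size_t maxLen)
{
    std::string proto;
    for (std::size_t i = 0; i < maxLen && src[i] != '\0' && src[i] != ':'; ++i) {
        // cast through unsigned char: std::tolower() has undefined behaviour
        // for negative values other than EOF
        proto += static_cast<char>(std::tolower(static_cast<unsigned char>(src[i])));
    }
    return proto;
}
```

With something like this both HTTP://example.com and http://example.com would
produce the same 'proto' value, so they should no longer end up as two
different cache entries. An unknown scheme such as FooBar: would also be
lower-cased, which is the small loss of case preservation mentioned above.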

Amos
