[squid-dev] [PATCH] implement RFC3986
Amos Jeffries
squid3 at treenet.co.nz
Tue Jan 5 12:11:32 UTC 2016
I've been playing around with these changes today and the more I thnk
about it the more it seems it should be in src/anyp/ using the Uri::
namespace - which will encompass both RFCs 1738 and 3896 CharacterSet
definitions along with other src/URL.h src/url.cc code implementing
those RFCs logics.
On 29/12/2015 10:53 a.m., Alex Rousskov wrote:
> On 12/28/2015 12:01 PM, Kinkie wrote:
>> On Mon, Dec 28, 2015 at 5:46 PM, Alex Rousskov wrote:
>>> On 12/28/2015 07:01 AM, Kinkie wrote:
>>>> +static int
>>>> +fromhex(char ch)
>>>> +{
>>>> + if (ch >= '0' && ch <= '9')
>>>> + return ch - '0';
>>>> + if (ch >= 'a' && ch <= 'f')
>>>> + return ch - 'a' + 10;
>>>> + if (ch >= 'A' && ch <= 'F')
>>>> + return ch - 'A' + 10;
>>>> + return -1;
>>>> +}
>>>
>>> If you are after performance, an unconditional lookup in a 256-member
>>> int vector would be faster.
>>
>> Performance is good, but it's not the primary objective. For now I'd
>> leave as-is.
>
> Consider adding a TODO comment then. Someday, somebody might actually
> look through TODO comments and implement simple and useful optimizations
> like this one [ instead of complaining that there are only hard problems
> that they cannot solve well or, worse, attempting to solve those problems ].
>
Today I re-implemented both tohex() and fromhex() as array lookups. It
is a little time consuming to manually generate the fromhex array. But a
one-off action and seems to work fine.
>
>>>> + for (auto in = s.begin(); in != e; ++in) {
>>>> + if (*in != '%') { // normal case, copy and continue
>>>> + rv.push_back(*in);
>>>> + continue;
>>>> + }
>>> ...
>
>
>>> I cannot validate that low-level C code. Can you use a Tokenizer
>>> instead? Is it critical to provide copy-less performance for helpers
>>> using std::string instead of SBuf?
>
>
>> Performance for helpers is not really critical IMO.
>> SBuf is a lot of code and it really depends on infrastructure provided
>> by squid (e.g. mempools).
>> Outside squid it really makes not much sense to bring in that much
>> scaffolding - which would also have include stubs etc.
>> It's much better to be API-compatible and use std::string there.
>
> I doubt that writing low-level code that is difficult to review and that
> we often get wrong is worth std::string API compatibility. I have
> already speculated about that in the previous squid-dev message so I
> will not repeat those thoughts here.
>
>
>
>
>>>> +const CharacterSet
>>>> + RFC3986::Unsafe("rfc1738:unsafe", "<>\"# %{}|\\^~[]`'"),
RFC 1738 does not specify the ' character in that unsafe set.
>>>> + RFC3986::Ctrls("rfc1738:ctrls", {{0x00, 0x1f}, {0x7f,0xff}}),
>>>> + RFC3986::Reserved1738("rfc1738:reserved", ";/?:@=&"),
>>> ...
>>>> + RFC3986::Unescaped = (UnsafeAndCtrls - CharacterSet(nullptr,"%") ).rename("rfc1738:unescaped"),
>>>
>>>
>>> If these are RFC 1738 sequences, why place them in the RFC 3986 namespace?
>>
>> Providing callers backwards compatibility is the main reason for me.
>
> This answer does not compute for me: AFAIK, you are adding a _new_ class
> or namespace called RFC3986. Thus, backwards compatibility is not an
> issue here.
>
> Please note that I am not saying that "rfc1738:ctrls" and
> "rfc1738:unsafe" sets should not be provided. I am saying that if they
> are provided, they should be provided inside an Rfc1738 namespace rather
> than inside RFC3986.
The set called Ctrls is actually CharacterSet::CTL, which we already
have defined in CharacterSet.h
Amos
More information about the squid-dev
mailing list