[squid-dev] [PATCH] implement RFC3986

Alex Rousskov rousskov at measurement-factory.com
Mon Dec 28 21:53:44 UTC 2015


On 12/28/2015 12:01 PM, Kinkie wrote:
> On Mon, Dec 28, 2015 at 5:46 PM, Alex Rousskov wrote:
>> On 12/28/2015 07:01 AM, Kinkie wrote:
>>> +static int
>>> +fromhex(char ch)
>>> +{
>>> +    if (ch >= '0' && ch <= '9')
>>> +        return ch - '0';
>>> +    if (ch >= 'a' && ch <= 'f')
>>> +        return ch - 'a' + 10;
>>> +    if (ch >= 'A' && ch <= 'F')
>>> +        return ch - 'A' + 10;
>>> +    return -1;
>>> +}
>>
>> If you are after performance, an unconditional lookup in a 256-member
>> int vector would be faster.
> 
> Performance is good, but it's not the primary objective. For now I'd
> leave as-is.

Consider adding a TODO comment then. Someday, somebody might actually
look through TODO comments and implement simple and useful optimizations
like this one [ instead of complaining that there are only hard problems
that they cannot solve well or, worse, attempting to solve those problems ].


>>> +// return a static 2-char buffer with a hex representation of argument
>>> +static char*
>>> +tohex(const unsigned char c)
>>> +{
>>> +    static char rv[3];
>>> +    (void) snprintf ( rv, 3, "%02X", c);
>>> +    return rv;
>>> +}
>>
>> If you are after performance, an unconditional lookup in a 256-member
>> char* vector would be much faster.
> 
> Same as above.

Consider adding a TODO comment then.


>> If you can return a const pointer, please return a const pointer.
> 
> It's not possible with the current implementation. If we decide to
> move to an array-based one, sure.

The last part does not compute for me: If an array-based implementation
can return a pointer to a const buffer then the current code can as
well. The implementation is not important for this *API* concern. The
important part is whether the callers modify the result. The way tohex()
is declared now, I have to assume that the callers modify the buffer
that tohex() result points to.


>>> +    for (auto in = s.begin(); in != e; ++in) {
>>> +        if (*in != '%') { // normal case, copy and continue
>>> +            rv.push_back(*in);
>>> +            continue;
>>> +        }
>> ...


>> I cannot validate that low-level C code. Can you use a Tokenizer
>> instead? Is it critical to provide copy-less performance for helpers
>> using std::string instead of SBuf?


> Performance for helpers is not really critical IMO.
> SBuf is a lot of code and it really depends on infrastructure provided
> by squid (e.g. mempools).
> Outside squid it really makes not much sense to bring in that much
> scaffolding - which would also have include stubs etc.
> It's much better to be API-compatible and use std::string there.

I doubt that writing low-level code that is difficult to review and that
we often get wrong is worth std::string API compatibility. I have
already speculated about that in the previous squid-dev message so I
will not repeat those thoughts here.




>>> +const CharacterSet
>>> +    RFC3986::Unsafe("rfc1738:unsafe", "<>\"# %{}|\\^~[]`'"),
>>> +    RFC3986::Ctrls("rfc1738:ctrls", {{0x00, 0x1f}, {0x7f,0xff}}),
>>> +    RFC3986::Reserved1738("rfc1738:reserved", ";/?:@=&"),
>> ...
>>> +    RFC3986::Unescaped = (UnsafeAndCtrls - CharacterSet(nullptr,"%") ).rename("rfc1738:unescaped"),
>>
>>
>> If these are RFC 1738 sequences, why place them in the RFC 3986 namespace?
> 
> Providing callers backwards compatibility is the main reason for me.

This answer does not compute for me: AFAIK, you are adding a _new_ class
or namespace called RFC3986. Thus, backwards compatibility is not an
issue here.

Please note that I am not saying that "rfc1738:ctrls" and
"rfc1738:unsafe" sets should not be provided. I am saying that if they
are provided, they should be provided inside an Rfc1738 namespace rather
than inside RFC3986.



>>> === added file 'include/rfc3986.h'
>> ...
>>> +#include "base/CharacterSet.h"
>>
>> This is not my area of expertise, but including src/base/CharacterSet.h
>> into include/ files looks very wrong: Code that needs
>> src/base/CharacterSet, should live in src/ IMHO.
> 
> Yes, I had the same doubt. Alternatively, we could to move
> CharacterSet into include/. I really have no preference.

Both sound like bad ideas to me. Again, there appears to be a hidden
"let's build a library for helpers that is both Squid-specific and not
Squid-specific at the same time" meta-problem here that forces you to
pick between two bad options instead of doing something good.


HTH,

Alex.



More information about the squid-dev mailing list