[squid-dev] [PATCH] implement RFC3986

Kinkie gkinkie at gmail.com
Mon Dec 28 19:01:36 UTC 2015


Hi!
  thanks for the super-fast review.

On Mon, Dec 28, 2015 at 5:46 PM, Alex Rousskov
<rousskov at measurement-factory.com> wrote:
> On 12/28/2015 07:01 AM, Kinkie wrote:

>> +static int
>> +fromhex(char ch)
>
> ...
>
>> +// return a static 2-char buffer with a hex representation of argument
>> +static char*
>> +tohex(const unsigned char c)
>
> Please do not add global static functions to header files unless
> actually needed. Use inline instead to avoid duplicating these functions
> in every file that includes this header.

Ok.

>> +// duplicate from rfc1738.c
>> +static int
>> +fromhex(char ch)
>
> Missing description. "Duplicate of some private function in some
> scheduled-for-removal(?) C file" is not a description.

Ok

>> +static int
>> +fromhex(char ch)
>> +{
>> +    if (ch >= '0' && ch <= '9')
>> +        return ch - '0';
>> +    if (ch >= 'a' && ch <= 'f')
>> +        return ch - 'a' + 10;
>> +    if (ch >= 'A' && ch <= 'F')
>> +        return ch - 'A' + 10;
>> +    return -1;
>> +}
>
> If you are after performance, an unconditional lookup in a 256-member
> int vector would be faster.

Performance is good, but it's not the primary objective. For now I'd
leave as-is.

>> +// return a static 2-char buffer with a hex representation of argument
>> +static char*
>> +tohex(const unsigned char c)
>> +{
>> +    static char rv[3];
>> +    (void) snprintf ( rv, 3, "%02X", c);
>> +    return rv;
>> +}
>
> If you are after performance, an unconditional lookup in a 256-member
> char* vector would be much faster.

Same as above.

> If you can return a const pointer, please return a const pointer.

It's not possible with the current implementation. If we decide to
move to an array-based one, sure.

>> +    for (auto in = s.begin(); in != e; ++in) {
>> +        if (*in != '%') { // normal case, copy and continue
>> +            rv.push_back(*in);
>> +            continue;
>> +        }
> ...
>
> I cannot validate that low-level C code. Can you use a Tokenizer
> instead? Is it critical to provide copy-less performance for helpers
> using std::string instead of SBuf?

Performance for helpers is not really critical IMO.
SBuf is a lot of code and it really depends on infrastructure provided
by squid (e.g. mempools).
Outside squid it really makes not much sense to bring in that much
scaffolding - which would also have include stubs etc.
It's much better to be API-compatible and use std::string there.

>> +public:
>> +    const static CharacterSet
>> +        Unsafe,  // RFC 1738 unsafe set
>> +        Ctrls,   // control characters (\0x00 to \0x1f)
>> +        UnsafeAndCtrls, // RFC 1738 Unsafe and Ctrls
>> +        Reserved1738, // RFC 1738 Reserved set
>> +        GenDelims,// RFC 3986 gen-delims set
>> +        SubDelims,// RFC 3986 sub-delims set
>> +        Reserved, // RFC 3986 reserved characters set
>> +        Unreserved, // RFC 3986 unreserved characters set
>> +        Unescaped,//ctrls and unsafe except for percent symbol
>> +        All;
>> +
>> +};
>
> This is not a class but a namespace.
>
> Please s/RFC3986/Rfc3986/ for consistency with similar "let's group RFC
> concepts in one namespace" SslBump code posted by Christos earlier (IIRC).
>
>
>
>> +template <class String>
>> +String rfc3986_escape(const String &s, const CharacterSet &escapeChars = RFC3986::UnsafeAndCtrls)
>
> Describe and place inside the Rfc3986 namespace as "Escape()". There may
> be other names that you may want to add to the Rfc3986 namespace (I have
> not checked).

Escape, Unescape an the masks are about all I can think of.

>
>
>> +const CharacterSet
>> +    RFC3986::Unsafe("rfc1738:unsafe", "<>\"# %{}|\\^~[]`'"),
>> +    RFC3986::Ctrls("rfc1738:ctrls", {{0x00, 0x1f}, {0x7f,0xff}}),
>> +    RFC3986::Reserved1738("rfc1738:reserved", ";/?:@=&"),
> ...
>> +    RFC3986::Unescaped = (UnsafeAndCtrls - CharacterSet(nullptr,"%") ).rename("rfc1738:unescaped"),
>
>
> If these are RFC 1738 sequences, why place them in the RFC 3986 namespace?

Providing callers backwards compatibility is the main reason for me.

>> === added file 'include/rfc3986.h'
> ...
>> +#include "base/CharacterSet.h"
>
> This is not my area of expertise, but including src/base/CharacterSet.h
> into include/ files looks very wrong: Code that needs
> src/base/CharacterSet, should live in src/ IMHO.

Yes, I had the same doubt. Alternatively, we could to move
CharacterSet into include/. I really have no preference.

>> +/* Being a C library code it is best bodily included and tested with C++ type-safe techniques. */
>> +#include "lib/rfc3986.cc"
>
> lib/rfc3986.cc is not a C library code. This looks like a dirty hack to
> work around some linking problems due to misplaced rfc3986.cc file.

I copied that from rfc1738, but you're right, it's not really needed

>> +    // bugs.
>> +    void PercentZeroNullDecoding();
>> +    void testPerformance();
>
> Not sure what this misplaced description means. Are we adding bugs?

Hopefully not.

The changes are in the branch; I'm not posting a diff as there are
some open points still. As soon as they're closed I'll send a new
patch for review.

Thanks.

-- 
    Francesco


More information about the squid-dev mailing list