[squid-dev] [PATCH] implement RFC3986
Alex Rousskov
rousskov at measurement-factory.com
Mon Dec 28 16:46:40 UTC 2015
On 12/28/2015 07:01 AM, Kinkie wrote:
> the attached patch is meant to implement rfc3986, and eventually
> supersede rfc1738, which is not API-compatible with.
I am only commenting on the code itself, not the RFC 3986 support idea.
> +static int
> +fromhex(char ch)
...
> +// return a static 2-char buffer with a hex representation of argument
> +static char*
> +tohex(const unsigned char c)
Please do not add global static functions to header files unless
actually needed. Use inline instead to avoid duplicating these functions
in every file that includes this header.
> +// duplicate from rfc1738.c
> +static int
> +fromhex(char ch)
Missing description. "Duplicate of some private function in some
scheduled-for-removal(?) C file" is not a description.
> +static int
> +fromhex(char ch)
> +{
> + if (ch >= '0' && ch <= '9')
> + return ch - '0';
> + if (ch >= 'a' && ch <= 'f')
> + return ch - 'a' + 10;
> + if (ch >= 'A' && ch <= 'F')
> + return ch - 'A' + 10;
> + return -1;
> +}
If you are after performance, an unconditional lookup in a 256-member
int vector would be faster.
> +// return a static 2-char buffer with a hex representation of argument
> +static char*
> +tohex(const unsigned char c)
> +{
> + static char rv[3];
> + (void) snprintf ( rv, 3, "%02X", c);
> + return rv;
> +}
If you are after performance, an unconditional lookup in a 256-member
char* vector would be much faster.
If you can return a const pointer, please return a const pointer.
> + for (auto in = s.begin(); in != e; ++in) {
> + if (*in != '%') { // normal case, copy and continue
> + rv.push_back(*in);
> + continue;
> + }
...
I cannot validate that low-level C code. Can you use a Tokenizer
instead? Is it critical to provide copy-less performance for helpers
using std::string instead of SBuf?
> +class RFC3986 {
> +public:
> + const static CharacterSet
> + Unsafe, // RFC 1738 unsafe set
> + Ctrls, // control characters (\0x00 to \0x1f)
> + UnsafeAndCtrls, // RFC 1738 Unsafe and Ctrls
> + Reserved1738, // RFC 1738 Reserved set
> + GenDelims,// RFC 3986 gen-delims set
> + SubDelims,// RFC 3986 sub-delims set
> + Reserved, // RFC 3986 reserved characters set
> + Unreserved, // RFC 3986 unreserved characters set
> + Unescaped,//ctrls and unsafe except for percent symbol
> + All;
> +
> +};
This is not a class but a namespace.
Please s/RFC3986/Rfc3986/ for consistency with similar "let's group RFC
concepts in one namespace" SslBump code posted by Christos earlier (IIRC).
> +template <class String>
> +String rfc3986_escape(const String &s, const CharacterSet &escapeChars = RFC3986::UnsafeAndCtrls)
Describe and place inside the Rfc3986 namespace as "Escape()". There may
be other names that you may want to add to the Rfc3986 namespace (I have
not checked).
> +const CharacterSet
> + RFC3986::Unsafe("rfc1738:unsafe", "<>\"# %{}|\\^~[]`'"),
> + RFC3986::Ctrls("rfc1738:ctrls", {{0x00, 0x1f}, {0x7f,0xff}}),
> + RFC3986::Reserved1738("rfc1738:reserved", ";/?:@=&"),
...
> + RFC3986::Unescaped = (UnsafeAndCtrls - CharacterSet(nullptr,"%") ).rename("rfc1738:unescaped"),
If these are RFC 1738 sequences, why place them in the RFC 3986 namespace?
> === added file 'include/rfc3986.h'
...
> +#include "base/CharacterSet.h"
This is not my area of expertise, but including src/base/CharacterSet.h
into include/ files looks very wrong: Code that needs
src/base/CharacterSet, should live in src/ IMHO.
> +/* Being a C library code it is best bodily included and tested with C++ type-safe techniques. */
> +#include "lib/rfc3986.cc"
lib/rfc3986.cc is not a C library code. This looks like a dirty hack to
work around some linking problems due to misplaced rfc3986.cc file.
> + // bugs.
> + void PercentZeroNullDecoding();
> + void testPerformance();
Not sure what this misplaced description means. Are we adding bugs?
HTH,
Alex.
More information about the squid-dev
mailing list