[squid-dev] Implementing draft-ietf-httpbis-header-structure

Alex Rousskov rousskov at measurement-factory.com
Wed Apr 26 04:14:45 UTC 2017


On 04/25/2017 02:54 PM, Amos Jeffries wrote:
> The class HttpHeader currently provides a mixed bunch of methods to
> retrieve headers as a blob of ASCII in a String. This is causing a lot
> of code almost-duplication, String copying, validation difficulties,
> etc.

Agreed.


> Our Parser-NG project seeks to reduce this and make it a lot more
> efficient by scoping the parse action inside Parser that can be audited
> for accuracy isolated from the complications of header semantic handling.

FYI: The "scoping the parse action inside Parser" approach is too vague
for me to support or critique. I assume that agreeing on Parser-NG
project goals is not really necessary for this specific email thread,
but please correct me if I am wrong.


> https://tools.ietf.org/html/draft-ietf-httpbis-header-structure

> The implementation of this would be refactoring HttpHeader so most
> headers can be parsed and syntax validated in one place (inside
> HttpHeader) and callers receive a structured tree conforming as much as
> possible to the common header structure instead of a String blob. The
> callers then (at worst) only need to know what key or parameter it is
> looking for and how to scan the tree for that instead of parsing the
> whole String.

I assume the above plan means adding some kind of a Tree class and that
a typical field getter would then look like this (in pseudo code):

    /// \returns a generic Common Header Structure tree
    Tree fieldFoo() const {
        if (const auto field = findEntry(HdrType::FOO))
            return FieldParser(field->syntax).parse(field->value);
        return Tree();
    }
    ...
    const auto foo = header->fieldFoo();
    if (foo.element[3].asAscii().contains("bar"))...

If my interpretation of the above plan is anywhere close to what you
have in mind, then please do not implement that plan! A tree-focused
generic field parser approach is likely to make things worse for Squid.
Instead of using tree objects and generic parsers, we should use
field-specific [Common Header Structure] syntax directly in _code_:

    /// \returns a field-specific type/object, like int or HttpHdrCc
    Foo fieldFoo() const {
        if (const auto field = findEntry(HdrType::FOO)) {
            FieldParser parser(field->value);
            // parse Foo field according to its Common Syntax
            foo.stamp = parser.timestamp(); // h1-timestamp
            foo.bar = parser.ascii().contains("bar"); // h1-ascii-string
            ...
            return foo;
        }
        return Foo();
    }
    ...
    const auto foo = header->fieldFoo();
    if (foo.bar)...


Why should we reject "parsing in one place" and "callers that receive
trees"? Consider the combination of the following facts:

1. Header field users do not care about the field syntax and do not want
to scan generic parser-generated trees. They need access to
field-specific values expressed as integers, timestamps, enums, and
similar "typed" values, not generic trees. Giving an integer-expecting
caller a String is clearly wrong, but giving it a "tree" could be worse!

2. Configuring a generic parser to use [field-]specific syntax is
difficult, especially when real-world fields may require adjusting that
syntax depending on the being-parsed values. AFIK, all existing
solutions to these problems are ugly, inflexible, complex, and/or slow.

3. Squid should not police traffic by default. Many compliant fields do
not conform to the Common Header Structure format. Many real fields are
not fully compliant, even when they should be. In most cases, Squid
should still forward fields as is, even if we cannot parse them. This
means that you cannot store most header values using [just] a tree.

4. Each field that Squid manipulates has a fairly simple syntax. A
single field type does not normally contain a wide variety of valid
inputs (unlike, say, a C++ file that may have numerous valid constructs
for a C++ parser to deal with, requiring building a general "tree").

Overall, we most likely need many simple parsers instead of one generic
but configurable Common Header Structure parser. Each Foo field getter
is essentially a parser. All those simple field parsers will share lots
of code, of course (e.g., they will call the same set of FieldParser
methods). My sketch above illustrates this approach.


HTH,

Alex.
P.S. I applaud the ongoing Common Header Structure efforts. In Squid
context, that work may help define parsing vocabulary in the short term
and reduce the number of odd headers we deal with in the long run.



More information about the squid-dev mailing list