[squid-dev] [PATCH] mime unfolding

Alex Rousskov rousskov at measurement-factory.com
Wed May 11 17:34:18 UTC 2016


On 05/11/2016 05:32 AM, Amos Jeffries wrote:

> +    if (mimeHeaderBlock_.length() < 2) {
> +        if (mimeHeaderBlock_[0] == '\n')
> +            mimeHeaderBlock_ = Http1::CrLf();
> +        else
> +            mimeHeaderBlock_.append(Http1::CrLf());
> +    }
> +    // now mimeHeaderBlock_ has 0+ fields followed by the CRLF terminator


mimeHeaderBlock_[0] does not exist when mimeHeaderBlock_ is empty (the
append statement implies that the block may be empty). The comment lies
about CRLF terminator because CR may be still missing in longer blocks.
Moreover, I suspect you do not need this code at all if the loop below
is fixed!


> +    do {
> +        SBuf line;
> +        if (tok.prefix(line, nonCRLF))
> +            mimeHeaderBlock_.append(line);
> +
> +        // the skipAll() might erase bare-CR not part of a CR*LF
> +        SBuf savePoint(tok.remaining());
> +        auto crLen = tok.skipAll(CharacterSet::CR);
> +
> +        if (tok.skipOne(CharacterSet::LF)) {
> +            if (tok.skipAll(CharacterSet::WSP))
> +                mimeHeaderBlock_.append(' ');
> +            else
> +                mimeHeaderBlock_.append(Http1::CrLf());
> +
> +        } else if (crLen) {
> +            // TODO: it might be better to replace the CR with one SP, but for now preserve
> +            // preserve the bare-CR
> +            tok.reset(savePoint);
> +            if (tok.prefix(line, CharacterSet::CR))
> +                mimeHeaderBlock_.append(line);
> +        }
> +
> +    } while (!tok.atEnd());


The above loop requires a genius to fully comprehend. I certainly cannot
validate it. I think the whole method ought to be much simpler/clearer.
Something along these lines:

    Tokenizer tok(mimeHeaderBlock_);
    mimeHeaderBlock_.clear();
    SBuf prefix;
    while (tokenBeforeObsfolds(tk, prefix)) {
        mimeHeaderBlock_.append(prefix);
        mimeHeaderBlock_.append(' ');
    }
    mimeHeaderBlock_.append(tk.remaining());

We already have a Tokenizer::token(prefix, delimiters) method. The
tokenBeforeObsfolds() call above is exactly the same except the
delimiters pattern is more complex. See attachment for a simple
implementation and review/adjust as needed, of course. I have not tested
my sketch.


> +const SBuf &CrLf();

I suspect this should be defined inline for performance reasons.
Inlining may result in duplication of a few CRLF SBufs, but that is
probably better than the expense of calling a function every time we
need CRLF. I expect the number of use cases to grow as we polish the code.


> But the savings is not enough to be bothered with AFAICS. 

There are no performance savings or losses here to worry about. The code
in question is "fast enough" for the non-performance-sensitive path it
is on because containsObsFold ought to be false in the vast majority of
cases. This code should be optimized for quality, not speed.

Alex.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: unfoldMime.cc
Type: text/x-c++src
Size: 1673 bytes
Desc: not available
URL: <http://lists.squid-cache.org/pipermail/squid-dev/attachments/20160511/c00da138/attachment.cc>


More information about the squid-dev mailing list