[squid-dev] [PATCH] Coverity(-inspired) fixes part four, HttpHeader refactor

Fri Aug 21 17:31:48 UTC 2015

On Fri, Aug 21, 2015 at 6:39 PM, Alex Rousskov <
rousskov at measurement-factory.com> wrote:

> On 08/21/2015 04:37 AM, Kinkie wrote:
>
> >   the attached patch does:
>
> ... a long list of mostly unrelated changes ...
>
> Sigh. Some of the patch changes are not necessary at all. Some are
> useful. Some should be redone to minimize conflicts. Some are
> performance optimizations that should be considered very carefully. It
> would take too much time to sift through all of this and separate the
> good from the bad, so we would have to accept pretty much all of them,
> create a lot of conflicts, and probably miss bugs.
>

Sorry about that.
To defend the code, I'll comment that this has been extensively build-,
polygraph- coadvisor- and run-tested. It's hard - if possible at all - to
cover all code paths, I've done my best to be very cautious.

Sometimes, changing a lot of code is necessary. There is no way around
> that. However, combining many unrelated changes, some of which change a
> lot of code, and some of which are performance-sensitive or bug-prone,
> is virtually never necessary.
>
>
> > -        Tolower(name);
> > -
> > -        if (strcmp(name, "*") == 0) {
> > -            /* Can not handle "Vary: *" withtout ETag support */
> > -            safe_free(name);
> > +        SBuf name(item, ilen);
> > +        if (name.cmp("*",1) == 0) {
> >              vstr.clean();
> >              break;
> >          }
> > -
> > -        strListAdd(&vstr, name, ',');
> > +        name.toLower();
>
> Do you know why we need to convert header names to lowercase here? The
> header names are not a part of some Store hash, right? This conversion
> is expensive because most header names will be changed here, breaking
> SBuf reuse. If you know the reason, let's add a comment since you are
> changing this code anyway.
>

(pseudo) code removing memory management was:
while (e = get_list_element()) {
   e = lowerCase(e)
   if (e == "*") {
     abort_processing()
     break
   }
   vstr.append(e)
   //other stuff
}

new code is:
while (e = get_list_element()) {
  if (e == "*") {
    abort_processing()
    break
  }
  e = lowerCase(e)
  vstr.append(e)
  //some other stuff as above
}

Since lowerCase("*") == "*", I'd hope that the behavior is unchanged, if
anything it's a small optimization in a hopefully uncommon case (no change
in the common case)

>
> >     in src/http.cc:
> >
> >     * please mark use of name.c_str() with "XXX: performance regression,
> >     c_str() reallocates"
> >
> >
> > Actually, no in this case it doesn't unless ilen is exactly equal to one
> > of the MemBlob MemPool sizes (possible but unlikely, I estimate in the
> > 0.1% range).
>
> *If* we remove toLower(), then there will be other, very common cases
> where c_str() will reallocate in this context, either right now or when
> the surrounding code is optimized to be more SBuf-friendly. Without
> toLower(), the "name" SBuf should be pointing in the middle of another
> SBuf in most cases where name was already in lower case.
>

in the general case, I agree. I also hope that by the time the general case
comes, we won't need this very specific c_str().
In this case:
SBuf is allocated, lower-cased (guaranteed in place due to single
reference) and then - if there's room which should be a statistically
common case - a ghost \0 is appended to ensure termination.

I would honor Amos request for an XXX. In that XXX, I would not use the

> word "regression" because this change may not make performance worse
> than it was, but it is still bad code from performance point of view and
> should eventually be changed/optimized. I would write something like:
>
>   // XXX: will reallocate if toLower() above is removed
>

.. and strListGetItem will start emitting SBufs in place of (char*,
length).

>
> > -        if (strcmp(name, "*") == 0) {
> > -            /* Can not handle "Vary: *" withtout ETag support */
> > -            safe_free(name);
> > +        SBuf name(item, ilen);
> > +        if (name.cmp("*",1) == 0) {
>
> AFAICT, the new code looks for names starting with "*". The old code was
> looking for "*" names.
>

doh! you're right.

> > -    while ((e = getEntry(&pos))) {
> > -        if (e->id == id)
> > -            delAt(pos, count);
> > -    }
> > +    int sz_before = entries.size();
> > +    entries.erase(std::remove_if(entries.begin(), entries.end(),
> > +    [=](HttpHeaderEntry *e) { return e && e->id == id; }),
> > +    entries.end());
>
> This change is a performance regression AFAICT: Unlike the old delAt(),
> std::remove_if is very expensive for std::vector. Consider using
> std::replace_if without .erasing instead.
>

yes, it is a small regresson (from N to 2N
). Reimplemented using replace_if - readability goes a bit down the drain,
unfortunatley.

> > +enum HdrKind {
>
> Use C++11 class enums for global enums, especially when such use does
> not increase the amount of old code changes. In this specific case, it
> does not AFAICT.
>

Unfortunately it's not possible in ordeer to use it as a bit-field (I have
tried and the compiler refused)

>
>
> > -    for(HttpHeaderEntry *e : entries) {
> > +    for (HttpHeaderEntry *e : entries) {
>
> > -
> > -    return NULL;        /* not reached */
> > +    return nullptr;        /* not reached */
>
> >      /* check mask first */
> > -
> >      if (!CBIT_TEST(mask, id))
> >          return NULL;
>
> ...
>
> Avoid out of scope changes, especially whitespace changes and NULL
> renaming.
>

I can't check it before replying to this mail, but will fix before merge.

>
>
> > +    assert(0);
>
> s/0/false/
>
>
> > +    int sz_before = entries.size();
> > +    entries.erase(std::remove_if(entries.begin(), entries.end(),
> > +    [=](HttpHeaderEntry *e) { return e && e->id == id; }),
> > +    entries.end());
> > +    int count = entries.size() - sz_before;
>
> If you can make sz_before, *e, and/or count const, make them const.
>

Have to use a side-effect in the predicate if I want to use replace_if.

>
>
> > +        if (e != nullptr && e->id == id)
>
> You can save a lot of ink by writing "e" instead of "e != nullptr".
>
>
> As always, please post patches with at least 20 context lines to
> increase chances of reviewers spotting bugs.
>

 New patch attached according to spec.
"bzr send" unfortunately doesn't allow to specify context to be added :(

Thank you!

-- 
    Francesco
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squid-cache.org/pipermail/squid-dev/attachments/20150821/d0d7e13e/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: coverity-fixes-4-v3.patch
Type: application/octet-stream
Size: 209809 bytes
Desc: not available
URL: <http://lists.squid-cache.org/pipermail/squid-dev/attachments/20150821/d0d7e13e/attachment-0001.obj>