<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Aug 21, 2015 at 6:39 PM, Alex Rousskov <span dir="ltr"><<a href="mailto:rousskov@measurement-factory.com" target="_blank">rousskov@measurement-factory.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 08/21/2015 04:37 AM, Kinkie wrote:<br>

<br>

>   the attached patch does:<br>

<br>

... a long list of mostly unrelated changes ...<br>

<br>

Sigh. Some of the patch changes are not necessary at all. Some are<br>

useful. Some should be redone to minimize conflicts. Some are<br>

performance optimizations that should be considered very carefully. It<br>

would take too much time to sift through all of this and separate the<br>

good from the bad, so we would have to accept pretty much all of them,<br>

create a lot of conflicts, and probably miss bugs.<br></blockquote><div><br></div><div>Sorry about that.<br></div><div>To defend the code, I'll comment that this has been extensively build-, polygraph- coadvisor- and run-tested. It's hard - if possible at all - to cover all code paths, I've done my best to be very cautious.<br><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Sometimes, changing a lot of code is necessary. There is no way around<br>

that. However, combining many unrelated changes, some of which change a<br>

lot of code, and some of which are performance-sensitive or bug-prone,<br>

is virtually never necessary.<br>

<br>

<br>

> -        Tolower(name);<br>

> -<br>

> -        if (strcmp(name, "*") == 0) {<br>

> -            /* Can not handle "Vary: *" withtout ETag support */<br>

> -            safe_free(name);<br>

> +        SBuf name(item, ilen);<br>

> +        if (name.cmp("*",1) == 0) {<br>

>              vstr.clean();<br>

>              break;<br>

>          }<br>

> -<br>

> -        strListAdd(&vstr, name, ',');<br>

> +        name.toLower();<br>

<br>

Do you know why we need to convert header names to lowercase here? The<br>

header names are not a part of some Store hash, right? This conversion<br>

is expensive because most header names will be changed here, breaking<br>

SBuf reuse. If you know the reason, let's add a comment since you are<br>

changing this code anyway.<br></blockquote><div><br></div><div>(pseudo) code removing memory management was:<br></div><div>while (e = get_list_element()) {<br></div><div>   e = lowerCase(e)<br></div><div>   if (e == "*") {<br></div><div>     abort_processing()<br></div><div>     break<br>   }<br></div><div>   vstr.append(e)<br></div><div>   //other stuff<br>}<br><br></div><div>new code is:<br></div><div>while (e = get_list_element()) {<br></div><div>  if (e == "*") {<br>    abort_processing()</div><div>    break<br>  }<br></div><div>  e = lowerCase(e)<br></div><div>  vstr.append(e)<br></div><div>  //some other stuff as above<br></div><div>}<br><br></div><div>Since lowerCase("*") == "*", I'd hope that the behavior is unchanged, if anything it's a small optimization in a hopefully uncommon case (no change in the common case)<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<span class=""><br>

<br>

>     in src/http.cc:<br>

><br>

>     * please mark use of name.c_str() with "XXX: performance regression,<br>

>     c_str() reallocates"<br>

><br>

><br>

> Actually, no in this case it doesn't unless ilen is exactly equal to one<br>

> of the MemBlob MemPool sizes (possible but unlikely, I estimate in the<br>

> 0.1% range).<br>

<br>

</span>*If* we remove toLower(), then there will be other, very common cases<br>

where c_str() will reallocate in this context, either right now or when<br>

the surrounding code is optimized to be more SBuf-friendly. Without<br>

toLower(), the "name" SBuf should be pointing in the middle of another<br>

SBuf in most cases where name was already in lower case.<br></blockquote><div><br></div><div>in the general case, I agree. I also hope that by the time the general case comes, we won't need this very specific c_str().<br></div><div>In this case:<br></div><div>SBuf is allocated, lower-cased (guaranteed in place due to single reference) and then - if there's room which should be a statistically common case - a ghost \0 is appended to ensure termination.<br><br>I would honor Amos request for an XXX. In that XXX, I would not use the<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

word "regression" because this change may not make performance worse<br>

than it was, but it is still bad code from performance point of view and<br>

should eventually be changed/optimized. I would write something like:<br>

<br>

  // XXX: will reallocate if toLower() above is removed<br></blockquote><div><br></div><div>.. and strListGetItem will start emitting SBufs in place of (char*, length). <br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

> -        if (strcmp(name, "*") == 0) {<br>

> -            /* Can not handle "Vary: *" withtout ETag support */<br>

> -            safe_free(name);<br>

> +        SBuf name(item, ilen);<br>

> +        if (name.cmp("*",1) == 0) {<br>

<br>

AFAICT, the new code looks for names starting with "*". The old code was<br>

looking for "*" names.<br></blockquote><div><br></div><div>doh! you're right.<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

> -    while ((e = getEntry(&pos))) {<br>

> -        if (e->id == id)<br>

> -            delAt(pos, count);<br>

> -    }<br>

> +    int sz_before = entries.size();<br>

> +    entries.erase(std::remove_if(entries.begin(), entries.end(),<br>

> +    [=](HttpHeaderEntry *e) { return e && e->id == id; }),<br>

> +    entries.end());<br>

<br>

This change is a performance regression AFAICT: Unlike the old delAt(),<br>

std::remove_if is very expensive for std::vector. Consider using<br>

std::replace_if without .erasing instead.<br></blockquote><div><br></div>yes, it is a small regresson (from N to 2N<br>). Reimplemented using replace_if - readability goes a bit down the drain, unfortunatley.<br></div><div class="gmail_quote"><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

> +enum HdrKind {<br>

<br>

Use C++11 class enums for global enums, especially when such use does<br>

not increase the amount of old code changes. In this specific case, it<br>

does not AFAICT.<br></blockquote><div><br></div><div>Unfortunately it's not possible in ordeer to use it as a bit-field (I have tried and the compiler refused)<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

<br>

> -    for(HttpHeaderEntry *e : entries) {<br>

> +    for (HttpHeaderEntry *e : entries) {<br>

<br>

> -<br>

> -    return NULL;        /* not reached */<br>

> +    return nullptr;        /* not reached */<br>

<br>

>      /* check mask first */<br>

> -<br>

>      if (!CBIT_TEST(mask, id))<br>

>          return NULL;<br>

<br>

...<br>

<br>

Avoid out of scope changes, especially whitespace changes and NULL renaming.<br></blockquote><div><br></div><div>I can't check it before replying to this mail, but will fix before merge.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

<br>

> +    assert(0);<br>

<br>

s/0/false/<br>

<br>

<br>

> +    int sz_before = entries.size();<br>

> +    entries.erase(std::remove_if(entries.begin(), entries.end(),<br>

> +    [=](HttpHeaderEntry *e) { return e && e->id == id; }),<br>

> +    entries.end());<br>

> +    int count = entries.size() - sz_before;<br>

<br>

If you can make sz_before, *e, and/or count const, make them const.<br></blockquote><div><br></div><div>Have to use a side-effect in the predicate if I want to use replace_if.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

<br>

> +        if (e != nullptr && e->id == id)<br>

<br>

You can save a lot of ink by writing "e" instead of "e != nullptr".<br>

<br>

<br>

As always, please post patches with at least 20 context lines to<br>

increase chances of reviewers spotting bugs.<span class="HOEnZb"><font color="#888888"><br></font></span></blockquote><div><br></div><div> New patch attached according to spec.<br></div><div>"bzr send" unfortunately doesn't allow to specify context to be added :(<br><br></div><div>Thank you!<br></div></div><br>-- <br><div class="gmail_signature">    Francesco</div>

</div></div>