[squid-dev] [WTF] HttpHeader strangenesses

Amos Jeffries squid3 at treenet.co.nz
Tue Aug 11 02:47:40 UTC 2015


On 11/08/2015 11:10 a.m., Alex Rousskov wrote:
> On 08/10/2015 02:13 PM, Kinkie wrote:
>> I'm going over HttpHeader to see if there's any possible improvements to
>> be obtained by dragging it (kicking and screaming, from the look of it)
>> into the 10's.
>>
>> Here's a list of weirdnesses I'm finding; hopefully someone has some
>> idea of WHY they are like this, or whether they can safely be removed:
>>
>> - HttpHeader::insertTime makes no sense. It's O(N) and it's always used
>> in two patterns: delById(HDR); insertTime(HDR) and
>> if (!has(HDR)) insertTime(HDR).
> 
> 
> Henrik may know why he went the extra mile to insert header Date at the
> beginning of the message as opposed to just appending it at the end
> (trunk r7037).
> 

IIRC it's an HTTP/1.0 performance optimization.

Being able to locate Host quickly avoids work in the inefficient
getHost() pre-parse function, which is now
Http::One::Parser::getHeaderField().

Locating the Date header at the top avoids a similar scan when
determining cacheability and validity of received replies. AFAIK that is
not as useful now that HTTP/1.1 involves headers other than Date.



> If I have to guess, I would say that it might be related to the RFC 2616
> recommendation to order headers a certain way:
> 
>> it is "good practice" to send
>> general-header fields first, followed by request-header or response-
>> header fields, and ending with the entity-header fields.
> 
> 
> RFC 7230 updates that with:
> 
>> it is good practice to send
>> header fields that contain control data first, such as Host on
>> requests and Date on responses
> 

Yes. And HTTP/2 will add a third group of "pseudo" headers representing
URI segments, with a MUST condition that they appear at the top of the
header list.

So if we want to optimize, it is probably better to optimize for this
type of insertion than to remove it completely. At the least, sort into
blocks by { pseudo, control, entity, custom/unknown } header type. Exact
placement within each grouping only matters for list headers.
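A rough sketch of how that could look with a single list kept sorted by
group block (the type and function names here are illustrative, not
existing Squid identifiers):

  // sketch only: illustrative names, not current Squid code
  #include <algorithm>
  #include <vector>

  enum class HdrGroup { Pseudo = 0, Control, Entity, Custom };

  struct HeaderEntry {
      HdrGroup group;
      // name/value omitted for brevity
  };

  // keep entries ordered by group block; order inside a block only
  // matters for list headers, so a stable "append within group" is enough
  void insertGrouped(std::vector<HeaderEntry> &headers, const HeaderEntry &e)
  {
      const auto pos = std::upper_bound(headers.begin(), headers.end(), e,
          [](const HeaderEntry &a, const HeaderEntry &b) {
              return a.group < b.group;
          });
      headers.insert(pos, e);
  }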

We probably want to split the headers list into multiple lists and just
add/remove/append to the appropriate one. Squid mostly does not have to
deal with the entity or custom headers at all, yet it currently wastes
time searching through them for control headers that were already found
during parsing.
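Alternatively, the split could be explicit inside HttpHeader itself.
Something along these lines, perhaps (member names made up for
illustration):

  // sketch only: made-up member names, not current Squid code
  #include <vector>

  class HttpHeaderEntry;

  class HttpHeader {
  public:
      // each group gets its own list, so lookups for control data never
      // walk the usually much longer entity/custom lists
      std::vector<HttpHeaderEntry *> controlHeaders;
      std::vector<HttpHeaderEntry *> entityHeaders;
      std::vector<HttpHeaderEntry *> customHeaders;

      void addControl(HttpHeaderEntry *e) { controlHeaders.push_back(e); }
      void addEntity(HttpHeaderEntry *e) { entityHeaders.push_back(e); }
      void addCustom(HttpHeaderEntry *e) { customHeaders.push_back(e); }
  };

Packing a message would then walk the control list first, then entity,
then custom, preserving the recommended block ordering for free.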


We may also want to think about making the control headers first-class
unique members of HttpHeader instead of using the list accesses. For
example a HdrDate, HdrCc, HdrSc, ... member that stores the value along
with a "not present" flag, leaving the headers list containing only the
entity and unknown headers.
 Assembling the message buffer would then be a hard-coded sequence of {
member Host, member Date, member Cc, member ..., walk generic headers list }
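Roughly, in sketch form (invented member names; HttpHdrCc/HttpHdrSc are
the existing Cache-Control and Surrogate-Control representations):

  // sketch only: invented member names, not existing Squid code
  #include <ctime>
  #include <vector>

  class HttpHeaderEntry;
  class HttpHdrCc;   // Cache-Control representation
  class HttpHdrSc;   // Surrogate-Control representation

  class HttpHeader {
  public:
      // control headers as first-class members;
      // -1 / nullptr doubles as the "not present" flag
      time_t date = -1;
      HttpHdrCc *cc = nullptr;
      HttpHdrSc *sc = nullptr;

      // only entity and unknown/custom headers stay in the generic list
      std::vector<HttpHeaderEntry *> entries;
  };

  // packing becomes a fixed sequence followed by one generic walk:
  //   pack(Host); pack(Date); pack(Cache-Control); ...
  //   for each e in entries: pack(e)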


> 
>> I've replaced it with putTime everywhere
> 
> Sounds good to me. IIRC, at some point, we have tried to make "put"
> delete all old headers, but I doubt that idea survived. To make the code
> consistent, you may want to check that no remaining put methods delete
> old headers now (and adjust the code as needed if some still do).
> 

Seconded.

> 
>> - why doesn't HttpHeaderEntry use a copy-constructor, relying instead on
>> a clone() method?
> 
> The conversion from a C struct to a C++ class was incomplete, most
> likely because we did not know as much about C++ as we know now.
> Renaming the old httpHeaderEntryClone() into HttpHeaderEntry::clone()
> was probably a natural thing to do ten years ago (trunk r7589).
> 
> 
>> - why the strange loop in HttpHeader's copy-constructor?
> 
> What do you find strange about it? It just reuses a method to clone all
> the header fields from the other header. I could (should?) have
> optimized this by refactoring the cloning/second loop from update() into
> a separate cloneEntries() method and using just that.
> 
> 
>> Replaced with a "= default", it should be just fine..
> 
> But is it? AFAICT, the "default" (i.e., generated) copy constructor will
> not clone individual header entries. Perhaps you have added some magic
> to make that cloning happen now, but the old code did require an
> explicit cloning loop (the second loop in update(), see above).
> 

Nod. I find it's best to think of the default copy-constructor as a
memcpy() (usually it is) and to take the appropriate actions whenever
that would not be suitable.
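In sketch form (illustrative members only, not the real classes):

  // sketch only: illustrative members, not the real Squid classes
  #include <vector>

  class HttpHeaderEntry {
  public:
      HttpHeaderEntry *clone() const { return new HttpHeaderEntry(*this); }
  };

  class HttpHeader {
  public:
      std::vector<HttpHeaderEntry *> entries;

      HttpHeader() = default;

      // "= default" here would only copy the pointers (the memcpy() view),
      // leaving two headers sharing the same entries; they must be cloned
      HttpHeader(const HttpHeader &other) {
          entries.reserve(other.entries.size());
          for (const auto *e : other.entries)
              entries.push_back(e->clone());
      }

      ~HttpHeader() {
          for (auto *e : entries)
              delete e;
      }
  };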


> 
>> - statistics accounting: we count A LOT (see HttpHeader::clean). Is it
>> really worth it in performance-critical code like this? Maybe we could
>> shuffle things around to HttpHeader and HttpHeaderEntry's ctor and dtor ..
> 
> The stats code is ugly and, IIRC, collects inaccurate stats in many
> cases. I am sure it can be significantly improved.
> 
> However, I would recommend going into the _opposite_ direction. Instead
> of trying to find the right low-level code to collect header stats (and
> then dealing with problems such as cloned headers screwing up all
> stats), I would add dedicated statistics-collection calls in a few
> specific places in the high-level code, where new headers are "accepted
> into Squid" and should be counted.
> 
> Neither approach will cover 100% of cases, but I think the high-level
> approach will have a lot fewer problems overall. As an added bonus, it
> would be easier to disable all or some of the header statistics
> collection that way.


Agreed. The parser is the only thing that should be doing stats
accounting for "seen" (aka received) headers, and it can do that at the
same point it adds the freshly parsed value to the HttpHeader object
state.
 As a bonus it even knows the type of message the header was seen in,
which keeps the request/reply and HTTP/ICAP/HTCP stats correct.
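In other words, something along these lines (counter layout and names
invented for illustration):

  // sketch only: invented counter layout and hook name
  #include <cstdint>

  enum class MsgKind { HttpRequest, HttpReply, IcapMessage, HtcpMessage };

  struct HeaderStats {
      // [message kind][header id]; sizes purely illustrative
      uint64_t seen[4][128] = {};
  };

  static HeaderStats TheSeenHeaderStats;

  // called by the parser at the same point it appends the freshly
  // parsed entry to the HttpHeader; nothing else touches these counters
  void countSeenHeader(MsgKind kind, int headerId)
  {
      if (headerId >= 0 && headerId < 128)
          ++TheSeenHeaderStats.seen[static_cast<int>(kind)][headerId];
  }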

Amos

