[squid-dev] [PATCH] HttpHeader migration (coverity-fixes branch)

Kinkie gkinkie at gmail.com
Thu Aug 6 07:31:15 UTC 2015


Merged as revision 14210 with some minor added polish (ostream<< for
HttpHdrCcType and Http::HdrType).

On Wed, Aug 5, 2015 at 3:58 PM, Kinkie <gkinkie at gmail.com> wrote:

>
>
> On Wed, Aug 5, 2015 at 4:08 AM, Amos Jeffries <squid3 at treenet.co.nz>
> wrote:
>
>> On 5/08/2015 9:08 a.m., Kinkie wrote:
>> > On Tue, Aug 4, 2015 at 2:58 PM, Amos Jeffries wrote:
>> >
>> >> On 4/08/2015 11:22 p.m., Kinkie wrote:
>> >>> Hi all,
>> >>>   the attached patch is a build- and run-tested merge proposal for
>> next
>> >>> round of the coverity-fixes branch, currently focusing on more
>> effective
>> >>> header name -> id lookups.
>> >>>
>> >>> Here's the status with the current todo checklist:
>> >>>
>> >>>  * DONE 1. shift HDR_BAD_HDR to end of enum
>> >>>  * DONE 2. shift headers data array to http/RegistredHeaders.cc
>> >>>  * DONE 3. creatign LookupTable object from teh enum and array
>> >>>  * (with HDR_BAD_HDR as invalid value)
>> >>>  * DONE 4. replacing httpHeaderIdByName() uses with the lookup table
>> >>>  * NOT POSSIBLE 5. merge HDR_BAD_HDR and HDR_ENUM_END into one thing -
>> >>> HDR_ENUM_END is overloaded meaning "All" headers in Manglers.
>> >>>  * DONE 6. replacing httpHeaderNameById with direct array lookups
>> >>>  * DONE 7. being looking at the other arrays removal
>> >>>
>> >>> In working on this I found out several instances of enum abuses -
>> >> tracking
>> >>> those down has been the hardest part of the effort.
>> >>> HttpHeader::parse is being used to parse error page templates - thus
>> the
>> >>> relaxed any_registered_header() checks in some methods, e.g.
>> >>> HttpHdeader::addEntry().
>> >>>
>> >>> Next steps, if there is consensus:
>> >>> - moving LookupTable away from std::map to std::hash with custom
>> >>> gperf-derived Hashers for extra boost
>> >>> - investigating whether strongly-typed enums can be used instead of
>> >> C-style
>> >>> enums in more places.
>> >>> - moving away from homegrown bitfields (CBIT_TEST etc.) towards
>> >>> std::vector<bool> or std::vector<unsigned char>, possibly via a class
>> >>> bitfield or somesuch.
>> >>>
>> >>
>> >> audit results:
>> >>
>> >> * httpHdrCcCleanModule() and httpHdrScCleanModule() can both be fully
>> >> deleted now.
>> >>
>> >
>> > Yes. I chose to let them in: they are NOPs, not in the critical path,
>> and
>> > may be useful in the future. Let me know if you still want them removed.
>> >
>>
>> :-( more 'dead' code. This kind of thing is useful for
>> ctpr/dtor/functor/virtual definition. But C functions that are not even
>> used as functors is a waste of LOC and also compiler time dealing with
>> the symbols.
>>  Its minor but still technical debt. The long-term plan is also to use
>> RegisteredRunners for this type of thing not C functions.
>>
>
> Ok. Removed all empty module cleanup functions - which are only invoked if
> LEAKCHECK is enabled anyway.
>
> >> * LookupTableRecord should be a class.
>> >>  - custom storage types can then inherit from it, with it as the first
>> >> parent
>> >>
>> >
>> > Well, templatization can prevent that need, but sure. Should I implement
>> > that?
>> > Isn't nowadays a struct == all-public class though? It looks odd, but
>> > functionally is the same
>> >
>>
>> Avoid "looks odd" whenever possible :-) as you say, the compiler dont
>> care. But this is about us humans quick and easy reading it in 10 years
>> time.
>>
>> I think its okay for useful code optimizations. Bt then the design
>> choice needs documenting so nobody breaks or undoes it casually.
>>
>
> Ok.
>
>
>> >
>> >> in src/HttpHeader.cc:
>> >>
>> >> * can you move the headerLookupTable to src/http/RegisteredHeaders.cc
>> >> and the Http:: namespace as well please ?
>> >>
>> >
>> > Moved to RegisteredHeaders and renamed HDR_FOO to Http::HdrType::FOO .
>> > Moving to a strongly-typed enum is unfortunately not feasible as eCAP
>> > requires integer-enum equivalence; it may be that the whole change has
>> to
>> > be reverted.
>> >
>>
>> I thought "enum class" with first parameter value assigned 0
>> ("HDR_ACCEPT = 0,") should do that. ie. strongly typed to unsigned
>> integer values.
>>
>
> No.
> strongly-typed == doesn't automatically cast to int, if you use
> strongly-typed you have to static_cast, while for old-style enum casting is
> automatic.
>
> in c++11 the full enum syntax (square brackets mean optional) is
> enum [ class ] enumName [ : storage_specification ] {
>   enum_elem_1 [ = elem-1-representation ],
>   ...
> };
>
> where storage_specification is the underlying integral data type.
> Using the optional "class" specification disables auto-cast-to-int.
> Elements can then be referenced by enumName::enum_elem_name , where the
> enumName specification is mandatory for "enum class", optional for
> old-style enum.
>
> BTW: I've tried changing the underlying storage to unsigned, and I get
> asserts in masks calculation.
>
>
>> >> * any_registered_header() is wrong.
>> >>  - it matches for HDR_OTHER which by definition is a non-registered
>> header
>> >>  - assert_eid() is equivalent to any_valid_header()
>> >>
>> >
>> > Then we need to change name. We need two semantics:
>> > 1) any header which is valid and defined (including OTHER)
>> > 2) any header ID which will not go out-of-bounds (same as (1) +
>> HDR_BAD_HDR)
>> >
>> > any suggestion?
>> >
>>
>> (1) is already any_valid_header().
>>
>> (2) any_HdrType_enum_value()
>>
>> But where/why do we need (2) ?
>>  Any value input or received that is non-(1) needs to be represented as
>> BAD_HDR when parsing/validating the input to an enum value.
>>
>
> It's sprinkled all over the place, mostly meant to protect against
> int-related abuses which may end up causing out of bounds in arrays,
> vectors and masks. Moving to enum class would mean implicit safety in this
> regard, but would require accessor functions to ensure consistent checks in
> typecasts. Hello again HdrIdFromName and HdrDescFromId. I'm cool with it,
> if that's a wise design decision in your opinion. It'd save the need of a
> dozen asserts or so.
>
>
>> Are you looking at things HttpHeader.cc line ~524:
>>
>> AA)
>>
>> +  if (e->id >= Http::HdrType::ENUM_END) {
>>     debugs(55, DBG_CRITICAL, "BUG: invalid entry (" << e->id << ")...
>>
>> That should probably be if(!any_valid_header(e->id)) since BAD is also
>> invalid entry value for a header.
>>
>
> aha, got it and turned in all places where ENUM_END is used to this end.
>
>
>> If the logic there explicitly requires BAD handling, it should be:
>>   // some reason for why BAD is accepted as 'valid'.
>>   if(!any_valid_header(e->id) && e->id != BAD)
>>
>> Although note that when parsing is fixed BAD will represent *any*
>> invalid value.
>>
>>
>> or BB) the header manglers "All" ?
>>
>> For that I think we need a new enum value like "Other,"  ("All," ?)
>> which is outside registered headers, but valid only as an enum entry.
>> Currently END, but that needs a rename and never to be used by
>> non-mangler code.
>>
>
>
>
>
>>
>>
>> >>
>> >>  - please consider removing the assert(assert(any_valid_header(id)))
>> >>    it could probably be replaced by:
>> >>       if (!assert(any_valid_header(id)))
>> >>           id = HDR_OTHER
>> >>
>> >
>> > Huh? I can't find any nested assert
>> >
>>
>> copy-paste error on my part. It is the assert you are adding three lines
>> below what is now "got hdr-id=" in the chunk @1512.
>>
>> I can't see the function name, but it looks like parsing some input.
>> Avoid assert in all parsers. Output BAD or OTHER as appropriate instead.
>>
>
> It's HttpHeader::parse.
> That assert is actually useless: it's set by a HeaderLookupTable.lookup.
> It's guaranteed to be valid (if BAD, and the BAD case is handled by
> reassigning to OTHER.
> Removing that assert full stop.
>
>
>> I been thinking a bit more (sorry dangerous).
>>
>> It may be worth adjusting the src/mk-string-arrays.awk script to also be
>> capable of output LookupTable<X>::Record arrays. That script is designed
>> to guarantee invariance. Though it does leave us with another .cc.
>> Anyhow thats a followup to think about, not in this scope.
>>
>
> I agree on it not being in scope. Also because if we wish to use gperf to
> generate the perfect hashes, then gperf could (and thus should) do that for
> us (see
> http://www.gnu.org/software/gperf/manual/html_node/User_002dsupplied-Struct.html#User_002dsupplied-Struct
> )
>
>
>> second round audit:
>>
>> in src/HttpHdrCc.cc:
>>
>> * valid_id in httpHdrCcStatDumper() can probably be a const bool like
>> the other.
>>
>
> Yes. Done.
>
> in src/HttpHdrSc.cc:
>>
>> * useless include of dlink.h since its done in the .h
>>
>
> Ok.
>
>
>>
>> * a duplicate include of HttpHdrSc.h since its pulled in by
>> HttpHdrScTarget.h.
>>  - but in this case I think replece it with:
>>   //#include "HttpHdrSc.h" // pulled in by HttpHdrScTarget.h
>>
>
> Done. Good idea, it'd look odd otherwise.
>
>
>>
>> in src/HttpHdrSc.h:
>>
>> * please use mem/forward.h instead of mem/AllocatorProxy.h
>>
>
> Ok
>
>
>>
>> * please store the class pre-defines alphabetically.
>>  - Packable above StatHist.
>>
>
> Done.
>
>>
>>
>> in src/HttpHdrScTarget.cc:
>>
>> * useless include of HttpHdrSc.h
>>  - thanks for fixing the missing HttpHdrScTarget.h
>>
>
> Done. NP :)
>
>
>> in src/HttpHdrScTarget.h:
>>
>> * now has useless include of mem/forward.h (after above fix),
>> SquidString.h, dlink.h, typedefs.h
>>
>> Ok.
>
>
>> in src/HttpHeader.cc:
>>
>> * please take the opportunity to remove useless includes overlapping
>> with HttpHdrCc.h and HttpHdrSc.h
>>
>
> Done. I've commented them and mentioned the indirect inclusion in a
> comment.
>
>
>> * please replace include fof HttpHdrSc.h with HttpHdrScTarget.h in the
>> new include nesting.
>>  - also checking for duplicate includes between HttpHdrScTarget.h and
>> HttpHeader.cc
>>
>
> Ok.
>
>
>> * chunk @562, s/NULL/nullptr/
>>  - maybe others in the touched lines
>>
>
> Done.
>
>
>>
>> * HttpHeaderEntry dtor use of assert(any_valid_header(id)); at the top
>> is needless.
>>  - protect the stats accounting against BAD though
>>
>
> Done.
>
>
>>
>> * HttpHeaderEntry ctor accepts OTHER but also "anId != BAD" implies that
>> it accepts BAD as well despite the previous
>> assert(any_registered_header(anId)).
>>  - wrap the stats accounting in if (id != BAD).
>>
>
> done
>
>
>>
>>  - fixing those ctor/dtor asserts will avoid any potential for assertion
>> from temporary default- or partially- constructed objects, or bad
>> emplacement destructions.
>>
>
> ..which are not there, or we'd assert all over the place. But sure.
>
>
>>
>>
>> in src/acl/RequestHeaderStrategy.h:
>>
>>  * please take the opportunity to remove that empty line between
>> 'template' and 'class' at lines at ~14
>>
>
> Done
>
>
>> in src/http/RegisteredHeaders.cc:
>>
>> * please add empty line between namespace '{' and start of comment.
>>
>
> Ok
>
>
>>
>>
>> in src/http/RegisteredHeaders.h:
>>
>> * please document on HeaderTable that lookup() will return BAD if the
>> header is unknown / unregistered. callers need to check and handle that
>> case properly.
>>
>
> Ok.
>
>
>> I think that is all. And all polish :-)
>>
>> +1 on this so far. Though you/we still need to sort the valid registered
>> ID tests before commit.
>>
>
> I think I have fixed those, although it's hard to be sure - layering
> violations all over the place.
>
>
>> NOTE: these are about old bugs you are uncovering. I'd like to take the
>> opportunity to fix them. But this is already huge, so for now just XXX
>> mark it and followup patch.
>>
>
>
> Like many other areas, there is (plenty of) room for improvement :)
> If not performance-wise, certainly code-wise.
> I'd leave them as a follow-up effort.
>
>
>>
>> * I'm still not clear on why insertEntry() checks for valid header enum
>> values but addEntry() needs to accept BAD.
>>  - if that need is true its worth a comment to document. otherwise
>> update top of addEntry() to match insertEntry().
>>  - NP: we should *never* be adding a BAD/invalid header entry to a
>> message object. Custom ones yes, but they are OTHER.
>>
>>
>> * like has() all the has/put/get*(ID, ...) methods need to exclude BAD
>> and OTHER as a valid inputs.
>>  - OTHER means we are accessing and using the Entry by string name, not
>> by ID value.
>>  - BAD means we should never have put it into object in the first place.
>>  - the put*(ID, ...) accepting OTHER may be where addEntry(ID,...) goes
>> wrong.
>>  - sorry these seem to have got caught in cleaning up one of my earlier
>> comments which was about another changed use of assert_eid().
>> assert_eid() test was wrong for these get/put/has*(ID,...) to begin with.
>>
>>
>> [[ httpHeaderFieldStatDumper() earning itself a place in the hall of
>> horrors :-(
>>   ..., double val, ...
>>   const int id = static_cast<int>(val);
>>   const bool valid_id = id < Http::HdrType::ENUM_END;
>>  // back away, quietly ...
>> ]]
>>
>>
>
> --
>     Francesco
>



-- 
    Francesco
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squid-cache.org/pipermail/squid-dev/attachments/20150806/30d9ef23/attachment-0001.html>


More information about the squid-dev mailing list