[squid-dev] [PATCH] HttpHeader migration (coverity-fixes branch)

Kinkie gkinkie at gmail.com
Wed Aug 5 13:58:05 UTC 2015


On Wed, Aug 5, 2015 at 4:08 AM, Amos Jeffries <squid3 at treenet.co.nz> wrote:

> On 5/08/2015 9:08 a.m., Kinkie wrote:
> > On Tue, Aug 4, 2015 at 2:58 PM, Amos Jeffries wrote:
> >
> >> On 4/08/2015 11:22 p.m., Kinkie wrote:
> >>> Hi all,
> >>>   the attached patch is a build- and run-tested merge proposal for next
> >>> round of the coverity-fixes branch, currently focusing on more
> effective
> >>> header name -> id lookups.
> >>>
> >>> Here's the status with the current todo checklist:
> >>>
> >>>  * DONE 1. shift HDR_BAD_HDR to end of enum
> >>>  * DONE 2. shift headers data array to http/RegistredHeaders.cc
> >>>  * DONE 3. creatign LookupTable object from teh enum and array
> >>>  * (with HDR_BAD_HDR as invalid value)
> >>>  * DONE 4. replacing httpHeaderIdByName() uses with the lookup table
> >>>  * NOT POSSIBLE 5. merge HDR_BAD_HDR and HDR_ENUM_END into one thing -
> >>> HDR_ENUM_END is overloaded meaning "All" headers in Manglers.
> >>>  * DONE 6. replacing httpHeaderNameById with direct array lookups
> >>>  * DONE 7. being looking at the other arrays removal
> >>>
> >>> In working on this I found out several instances of enum abuses -
> >> tracking
> >>> those down has been the hardest part of the effort.
> >>> HttpHeader::parse is being used to parse error page templates - thus
> the
> >>> relaxed any_registered_header() checks in some methods, e.g.
> >>> HttpHdeader::addEntry().
> >>>
> >>> Next steps, if there is consensus:
> >>> - moving LookupTable away from std::map to std::hash with custom
> >>> gperf-derived Hashers for extra boost
> >>> - investigating whether strongly-typed enums can be used instead of
> >> C-style
> >>> enums in more places.
> >>> - moving away from homegrown bitfields (CBIT_TEST etc.) towards
> >>> std::vector<bool> or std::vector<unsigned char>, possibly via a class
> >>> bitfield or somesuch.
> >>>
> >>
> >> audit results:
> >>
> >> * httpHdrCcCleanModule() and httpHdrScCleanModule() can both be fully
> >> deleted now.
> >>
> >
> > Yes. I chose to let them in: they are NOPs, not in the critical path, and
> > may be useful in the future. Let me know if you still want them removed.
> >
>
> :-( more 'dead' code. This kind of thing is useful for
> ctpr/dtor/functor/virtual definition. But C functions that are not even
> used as functors is a waste of LOC and also compiler time dealing with
> the symbols.
>  Its minor but still technical debt. The long-term plan is also to use
> RegisteredRunners for this type of thing not C functions.
>

Ok. Removed all empty module cleanup functions - which are only invoked if
LEAKCHECK is enabled anyway.

>> * LookupTableRecord should be a class.
> >>  - custom storage types can then inherit from it, with it as the first
> >> parent
> >>
> >
> > Well, templatization can prevent that need, but sure. Should I implement
> > that?
> > Isn't nowadays a struct == all-public class though? It looks odd, but
> > functionally is the same
> >
>
> Avoid "looks odd" whenever possible :-) as you say, the compiler dont
> care. But this is about us humans quick and easy reading it in 10 years
> time.
>
> I think its okay for useful code optimizations. Bt then the design
> choice needs documenting so nobody breaks or undoes it casually.
>

Ok.


> >
> >> in src/HttpHeader.cc:
> >>
> >> * can you move the headerLookupTable to src/http/RegisteredHeaders.cc
> >> and the Http:: namespace as well please ?
> >>
> >
> > Moved to RegisteredHeaders and renamed HDR_FOO to Http::HdrType::FOO .
> > Moving to a strongly-typed enum is unfortunately not feasible as eCAP
> > requires integer-enum equivalence; it may be that the whole change has to
> > be reverted.
> >
>
> I thought "enum class" with first parameter value assigned 0
> ("HDR_ACCEPT = 0,") should do that. ie. strongly typed to unsigned
> integer values.
>

No.
strongly-typed == doesn't automatically cast to int, if you use
strongly-typed you have to static_cast, while for old-style enum casting is
automatic.

in c++11 the full enum syntax (square brackets mean optional) is
enum [ class ] enumName [ : storage_specification ] {
  enum_elem_1 [ = elem-1-representation ],
  ...
};

where storage_specification is the underlying integral data type.
Using the optional "class" specification disables auto-cast-to-int.
Elements can then be referenced by enumName::enum_elem_name , where the
enumName specification is mandatory for "enum class", optional for
old-style enum.

BTW: I've tried changing the underlying storage to unsigned, and I get
asserts in masks calculation.


> >> * any_registered_header() is wrong.
> >>  - it matches for HDR_OTHER which by definition is a non-registered
> header
> >>  - assert_eid() is equivalent to any_valid_header()
> >>
> >
> > Then we need to change name. We need two semantics:
> > 1) any header which is valid and defined (including OTHER)
> > 2) any header ID which will not go out-of-bounds (same as (1) +
> HDR_BAD_HDR)
> >
> > any suggestion?
> >
>
> (1) is already any_valid_header().
>
> (2) any_HdrType_enum_value()
>
> But where/why do we need (2) ?
>  Any value input or received that is non-(1) needs to be represented as
> BAD_HDR when parsing/validating the input to an enum value.
>

It's sprinkled all over the place, mostly meant to protect against
int-related abuses which may end up causing out of bounds in arrays,
vectors and masks. Moving to enum class would mean implicit safety in this
regard, but would require accessor functions to ensure consistent checks in
typecasts. Hello again HdrIdFromName and HdrDescFromId. I'm cool with it,
if that's a wise design decision in your opinion. It'd save the need of a
dozen asserts or so.


> Are you looking at things HttpHeader.cc line ~524:
>
> AA)
>
> +  if (e->id >= Http::HdrType::ENUM_END) {
>     debugs(55, DBG_CRITICAL, "BUG: invalid entry (" << e->id << ")...
>
> That should probably be if(!any_valid_header(e->id)) since BAD is also
> invalid entry value for a header.
>

aha, got it and turned in all places where ENUM_END is used to this end.


> If the logic there explicitly requires BAD handling, it should be:
>   // some reason for why BAD is accepted as 'valid'.
>   if(!any_valid_header(e->id) && e->id != BAD)
>
> Although note that when parsing is fixed BAD will represent *any*
> invalid value.
>
>
> or BB) the header manglers "All" ?
>
> For that I think we need a new enum value like "Other,"  ("All," ?)
> which is outside registered headers, but valid only as an enum entry.
> Currently END, but that needs a rename and never to be used by
> non-mangler code.
>




>
>
> >>
> >>  - please consider removing the assert(assert(any_valid_header(id)))
> >>    it could probably be replaced by:
> >>       if (!assert(any_valid_header(id)))
> >>           id = HDR_OTHER
> >>
> >
> > Huh? I can't find any nested assert
> >
>
> copy-paste error on my part. It is the assert you are adding three lines
> below what is now "got hdr-id=" in the chunk @1512.
>
> I can't see the function name, but it looks like parsing some input.
> Avoid assert in all parsers. Output BAD or OTHER as appropriate instead.
>

It's HttpHeader::parse.
That assert is actually useless: it's set by a HeaderLookupTable.lookup.
It's guaranteed to be valid (if BAD, and the BAD case is handled by
reassigning to OTHER.
Removing that assert full stop.


> I been thinking a bit more (sorry dangerous).
>
> It may be worth adjusting the src/mk-string-arrays.awk script to also be
> capable of output LookupTable<X>::Record arrays. That script is designed
> to guarantee invariance. Though it does leave us with another .cc.
> Anyhow thats a followup to think about, not in this scope.
>

I agree on it not being in scope. Also because if we wish to use gperf to
generate the perfect hashes, then gperf could (and thus should) do that for
us (see
http://www.gnu.org/software/gperf/manual/html_node/User_002dsupplied-Struct.html#User_002dsupplied-Struct
)


> second round audit:
>
> in src/HttpHdrCc.cc:
>
> * valid_id in httpHdrCcStatDumper() can probably be a const bool like
> the other.
>

Yes. Done.

in src/HttpHdrSc.cc:
>
> * useless include of dlink.h since its done in the .h
>

Ok.


>
> * a duplicate include of HttpHdrSc.h since its pulled in by
> HttpHdrScTarget.h.
>  - but in this case I think replece it with:
>   //#include "HttpHdrSc.h" // pulled in by HttpHdrScTarget.h
>

Done. Good idea, it'd look odd otherwise.


>
> in src/HttpHdrSc.h:
>
> * please use mem/forward.h instead of mem/AllocatorProxy.h
>

Ok


>
> * please store the class pre-defines alphabetically.
>  - Packable above StatHist.
>

Done.

>
>
> in src/HttpHdrScTarget.cc:
>
> * useless include of HttpHdrSc.h
>  - thanks for fixing the missing HttpHdrScTarget.h
>

Done. NP :)


> in src/HttpHdrScTarget.h:
>
> * now has useless include of mem/forward.h (after above fix),
> SquidString.h, dlink.h, typedefs.h
>
> Ok.


> in src/HttpHeader.cc:
>
> * please take the opportunity to remove useless includes overlapping
> with HttpHdrCc.h and HttpHdrSc.h
>

Done. I've commented them and mentioned the indirect inclusion in a comment.


> * please replace include fof HttpHdrSc.h with HttpHdrScTarget.h in the
> new include nesting.
>  - also checking for duplicate includes between HttpHdrScTarget.h and
> HttpHeader.cc
>

Ok.


> * chunk @562, s/NULL/nullptr/
>  - maybe others in the touched lines
>

Done.


>
> * HttpHeaderEntry dtor use of assert(any_valid_header(id)); at the top
> is needless.
>  - protect the stats accounting against BAD though
>

Done.


>
> * HttpHeaderEntry ctor accepts OTHER but also "anId != BAD" implies that
> it accepts BAD as well despite the previous
> assert(any_registered_header(anId)).
>  - wrap the stats accounting in if (id != BAD).
>

done


>
>  - fixing those ctor/dtor asserts will avoid any potential for assertion
> from temporary default- or partially- constructed objects, or bad
> emplacement destructions.
>

..which are not there, or we'd assert all over the place. But sure.


>
>
> in src/acl/RequestHeaderStrategy.h:
>
>  * please take the opportunity to remove that empty line between
> 'template' and 'class' at lines at ~14
>

Done


> in src/http/RegisteredHeaders.cc:
>
> * please add empty line between namespace '{' and start of comment.
>

Ok


>
>
> in src/http/RegisteredHeaders.h:
>
> * please document on HeaderTable that lookup() will return BAD if the
> header is unknown / unregistered. callers need to check and handle that
> case properly.
>

Ok.


> I think that is all. And all polish :-)
>
> +1 on this so far. Though you/we still need to sort the valid registered
> ID tests before commit.
>

I think I have fixed those, although it's hard to be sure - layering
violations all over the place.


> NOTE: these are about old bugs you are uncovering. I'd like to take the
> opportunity to fix them. But this is already huge, so for now just XXX
> mark it and followup patch.
>


Like many other areas, there is (plenty of) room for improvement :)
If not performance-wise, certainly code-wise.
I'd leave them as a follow-up effort.


>
> * I'm still not clear on why insertEntry() checks for valid header enum
> values but addEntry() needs to accept BAD.
>  - if that need is true its worth a comment to document. otherwise
> update top of addEntry() to match insertEntry().
>  - NP: we should *never* be adding a BAD/invalid header entry to a
> message object. Custom ones yes, but they are OTHER.
>
>
> * like has() all the has/put/get*(ID, ...) methods need to exclude BAD
> and OTHER as a valid inputs.
>  - OTHER means we are accessing and using the Entry by string name, not
> by ID value.
>  - BAD means we should never have put it into object in the first place.
>  - the put*(ID, ...) accepting OTHER may be where addEntry(ID,...) goes
> wrong.
>  - sorry these seem to have got caught in cleaning up one of my earlier
> comments which was about another changed use of assert_eid().
> assert_eid() test was wrong for these get/put/has*(ID,...) to begin with.
>
>
> [[ httpHeaderFieldStatDumper() earning itself a place in the hall of
> horrors :-(
>   ..., double val, ...
>   const int id = static_cast<int>(val);
>   const bool valid_id = id < Http::HdrType::ENUM_END;
>  // back away, quietly ...
> ]]
>
>

-- 
    Francesco
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squid-cache.org/pipermail/squid-dev/attachments/20150805/66370516/attachment-0001.html>


More information about the squid-dev mailing list