[squid-dev] Care and feeding of ConnStateData

Amos Jeffries squid3 at treenet.co.nz
Thu Jul 7 04:52:42 UTC 2016


On 7/07/2016 10:24 a.m., Alex Rousskov wrote:
> Hello,
> 
>     Several committed, pending, and upcoming trunk changes revolve
> around ConnStateData-related classes. Audit disagreements, unaudited
> commits, and blocked changes in that area make progress painfully slow.
> This email proposes answers to the following blocking questions:
> 
>   Q1. What is ConnStateData?
>   Q2. Where does the pending Downloader class belong?
>   Q3. Where does the future Http::Two::Server class belong?
>   Q4. What to do with the existing src/servers/Server.h?
> 
> The answers are summarized at the end of this email if you do not want
> to read the whole thing, but you may want to at least scan the BB1-3
> definitions immediately below because they are used in the answers.
> 
> 
> I will start with several "obvious" Building Blocks or Squid parts with
> well-understood overall functionality that should not need much
> discussion and should not cause any controversy:
> 
> BB1: Servers or protocol request receivers [and response senders]:
>   * HTTP/1 Server (existing Http::One::Server and misplaced code)
>   * HTTP/2 Server (future Http::Two::Server)
>   * FTP server (existing Ftp::Server and misplaced code)
> 
> BB2: Protocol-agnostic request routing code:
>   * doCallouts() for access control, StoreID, ICAP, etc.
>   * HIT/MISS detection in client_side_*.cc
>   * FwdState
> 
> BB3: Clients or protocol request senders [and response receivers]:
>   * HTTP/1 Client (existing HttpStateData == future Http::One::Client)
>   * HTTP/2 Client (future Http::Two::Client)
>   * FTP Client (existing Ftp::Client)
> 
> The above list is not exhaustive (there are more major building blocks,
> of course) or 100% precise (e.g., some routing code does use originating
> protocol information so it is not purely protocol-agnostic), but this
> list is sufficient to answer questions Q1-4.
> 

So far we are in agreement.

> 
> Q1. What is ConnStateData (and related client_side.* code)?
> 
> Today, the ConnStateData and related client_side.* code plays many,
> often rather different roles related to a single user connection
> handling, including:
> 
>   * some new connection acceptance logic
>   * high-level user connection processing logic:
>     - sequential request handling (pause/resume)
>     - request pipelining (control pipelining depth)
>   * some HTTP/1 parsing (belongs to Http::One::Server)
>   * HTTP/2 rejecting
>   * PROXY protocol handling
>   * some user connection reading/writing
>   * some SSL Bumping
>   * error delaying
>   * transaction logging (access.log)
> 
> This heavy mixture and fuzzy boundaries (e.g., all those "some" words)
> complicate analysis and lead to design mistakes. It will take years to
> perfect this code, but I believe that there is actually a simple way to
> define ConnStateData in such a way that will immediately yield answers
> to many questions.
> 
> Using the building blocks described above, I propose the following two
> rules for defining ConnStateData boundaries and responsibilities:
> 
>   C1. ConnStateData is the code shared among all Servers (BB1).
>   C2. ConnStateData ends where request routing code (BB2) begins.

Last time we went over this issue AFAIK we agreed that the above was to
be the defintion of ::Server.

Pieces of ConnStateData which are meeting that description should be
suffled up into Server. Which was made parent of ConnStateData for the
exact purpose of minimizing chages when those shuffles are done.

That shuffle can be done now as the pieces are identified, the blocker
that keeps coming up there is disagreement on what those pieces are.


> 
> [ N.B. To avoid misunderstanding, when we talk about class XYZ, we
> generally talk about "class XYZ" itself combined with all its parent
> classes (e.g., "class X" and "class YZ") (if any) unless noted
> otherwise. This rule applies here as well -- when I say "ConnStateData
> is", I do _not_ wish to distinguish that class definition from all its
> parent classes, if any; I am talking about what the _combination_ of
> parents and ConnStateData is. ]

Nod.

> 
> The current code more-or-less obeys both rules C1 and C2. Yes, there are
> some violations. For example, ConnStateData contains some non-shared
> HTTP/1- and FTP-specific code that should be moved to protocol-specific
> Servers. However, those violations usually do not have far-reaching side
> effects outside ConnStateData and can usually be ignored for the purpose
> of this discussion.
> 

FYI: when we looked at this last there was a possibility of 1xx needing
to be used by both HTTP/1 and HTTP/2 servers. That has changed and only
HTTP/1 is needing the 1xx status sending code now.

HTTP/2 does do similar control messages, but in a different way which
the Squid 1xx sending code will need re-writing to perform anyhow. So
the existing 1xx code can probably become Http1::Server actions.

> 
> Q2. Where does the pending Downloader class belong?
> 
> The Downloader class[1] fetches SBuf-storable things for other Squid
> components/transactions using internal requests. For example, it is used
> to fetch missing intermediate certificates when validating origin server
> certificate chains.
> 
> The preview patch[1] implemented Downloader as a ConnStateData kid which
> triggered a request[2] to implement it as a Client kid. At the time, I
> could not fully reconcile those diametrically opposed ideas[3]. Now, I
> am ready to conclude that both designs were wrong:
> 
> * Since Downloader does not receive http/ftp_port requests and does not
> send responses, it is not a Server. It was wrong to build it on top of
> ConnStateData as [1] did. Yes, Downloader creates an HTTP[S] request to
> be routed through Squid, but that does not qualify it to be a
> ConnStateData kid. Other things can (and already do) initiate requests
> that are routed through Squid -- ConnStateData does not hold a monopoly
> on that.
> 
> * Since Downloader does not care about origin server communication and
> does not even require one (in case of cache hits), but can benefit from
> the usual REQMOD adaptations/hooks, it would also be wrong to make it a
> Client kid as [2] suggested. However, [2] was absolutely correct that
> there is no need to drag Server into Downloader!
> 
> [1] Factory "Fetch missing certificates" preview on squid-dev at
> http://lists.squid-cache.org/pipermail/squid-dev/2015-December/004297.html
> 
> [2] Amos' review of [1] at
> http://lists.squid-cache.org/pipermail/squid-dev/2016-February/005100.html
> 
> [3] Alex's response to [2] at
> http://lists.squid-cache.org/pipermail/squid-dev/2016-February/005106.html
> 
> So where does the Downloader class belong then? Downloader is a
> stand-alone class. Just like ConnStateData, it is a "routing client"
> class or a thing that creates internal requests to be routed through
> Squid using the BB2 building block. The three such classes in existence
> that I know of are:
> 
>   - ConnStateData
>   - ESIInclude
>   - Downloader
> 
> but I suspect that more old code should be on that list if things were
> implemented correctly (e.g., DigestFetchState). In the future, all these
> classes may gain a common RoutingClient parent and become formal siblings.
> 

Yes, some non-class logics for CacheDigest, NetDB exchange, etc. should
be on that list.

In overview I think if we have a good Downloader design those other
things and ESIInclude should probably be converted into Downloader or
children of it.
Making the major actor component hierarchies:
  Server, Client, and Downloader.


> 
> Q3. Where does the future Http::Two::Server class belong?
> 
> Http::Two::Server is a natural Http::One::Server sibling. That is, the
> two classes should have the same parent class (possibly among other
> parent classes). Both Http::One::Server and Http::Two::Server classes
> are servers and both are HTTP servers, so they will want to share a lot.
> Some of that sharing will most likely come through their common parent.
> 
> Ftp::Server is an existing Http::One::Server sibling which proves that
> it is possible to implement Http::Two::Server as such a sibling as well.
> Moreover, if Ftp::Server and Http::One::Server are siblings but
> Http::One::Server and Http::Two::Server are not, then something clearly
> went wrong because FTP is no more "closer" to HTTP/1 than HTTP/2 is!
> 
> Today, that common parent for the two siblings is ConnStateData.
> Tomorrow, we may split and/or rename ConnStateData, of course, but
> Http::One::Server and Http::Two::Server classes are likely to remain
> siblings with a common base that contains the code they share.

My draft branch uses ::Server as parent to Http2::Server. I've not yet
run across anything in ConnStateData that needed it to be a child of
that. Though there are things that need moving up to the Server parent
class.


> 
> Q4. What to do with the existing src/servers/Server.h?
> 
> Today, that class is a nuisance:
> 
> * It is not a common base for all servers (its stated purpose).

Which servers is it not a base for exactly?

Note that as parent of ConnStateData for now anything which implements
ConnStateData implements Server.

> 
> * It does not fully encapsulate connection-managing code of its kids
> (its stated usage).

... those changes to make it so got blocked in audit.

> 
> * It is essentially unused (it is not used as an API by others and has
> only one kid).
> 

... the other child class go blocked in audit initially and still
waiting the major redesign to be completed before it can be re-suvmitted.

The two grandchildren in trunk we planned to make direct children of it
(yes we, all three of us at the time agreed). The intermediate kid being
the ConnStateData mess still *temporarily* in the middle of the hierarchy.


> * It owns the pipeline object but only uses it to abort pending requests
> after I/O errors, and even that is done inconsistently.
> 

... the changes to make it so got blocked in audit.


> * It forces new code to add pure virtual methods that ConnStateData then
> overrides and implements, reminding me of the old Store class that
> slowly accumulated dozens of such methods because we needed to add
> something that one of the Store kids implemented.

This is backwards. ConnStateData is the old class that has accumulated
all those extra bits. We are trying to remove that ConnStateData scope
creep.

virtuals in Server are ideally implemented by the end-child
HTTP/FTP/whatever Server class. The ones that are implemented by
ConnStateData itself are in a temporary stage, until they can either be
pushed down to the childs, or the bits they depend on using from
ConnStateData and also moved up to Server and the virtual-ness dropped.


> 
> Overall, it shows many symptoms of premature encapsulation -- when the
> final purpose/boundaries are still fuzzy, and the new code can be added
> or excluded without any significant penalty; everything "works", both
> correct solutions and incorrect ones.
> 
> Most likely this class will not be able to preserve its current overall
> purpose/API as HTTP/2 support adds more specific requirements. Keeping
> this class until it is needed only increases development overheads and
> developer confusion.
> 
> I recommend merging it back into ConnStateData until Squid actually
> needs it (or needs something like it). Merging it into ConnStateData
> would preserve all code improvements we have done after it was split. If
> somebody finds that useful, we can even keep its method definitions
> inside Server.cc, at least for a while.

Doing that undermines 3 years work towards getting rid of ConnStateData.
I recommend adding more effort in the already agreed direction.

> 
> Executive Summary:
> 
>   Q1. What is ConnStateData?

A lot of code that is not in scope for a Server class (your BB2 logic),
mixed with some Server class code. All tied up in a bundle with publicly
visible data members and pieces that are used directly by Client
classes, store classes and miscellaneous other parts of Squid so that it
has become very difficult to retroactively add separation of duties.
 That is what ConnStateData is. A nasty tangle that needs to go.


>       Code shared among all Server (BB1) classes.

That is ::Server.

>       ConnStateData does not do routing (that is what BB2 code does).
> 

Yes. Well, ideally anyway. Right now its not true.

>   Q2. Where does the pending Downloader class belong?
>       It is a stand-alone class generating internal requests for BB2,
>       just like ConnStateData and ESI do.
> 

Yes.

>   Q3. Where does the future Http::Two::Server class belong?
>       It will be an Http::One::Server sibling.

Yes. Children of ::Server.

> 
>   Q4. What to do with the existing src/servers/Server.h?
>       Merge back into ConnStateData until it is actually needed.
> 

Not quite. I still favour the reverse direction, merging the
ConnStateData code that is actually Server duties _up_ into Server and
removing ConnStateData from the direct chain of that hierarchy.

If we have to, an intermediate stage where Http1::Server inherits from
both Server and ConnStateData to get the non-Server logics the
ConnStateData retains for a while. If we can avoid doing that though it
would be even better.


Thanks for resuming this discussinon.

Amos



More information about the squid-dev mailing list