[squid-dev] [PATCH] Http::StreamContext refactoring

Sat Jan 23 13:17:21 UTC 2016

On 23/01/2016 7:59 a.m., Alex Rousskov wrote:
> On 01/14/2016 05:53 AM, Amos Jeffries wrote:
> 
>> The ClientSocketContext is renamed and shuffled to Http::StreamContext.
> 
> This class does not belong to the Http namespace. It represents the user
> side of a master transaction.

No, that is HttpRequest.

StreamContext represents an HTTP "stream". The formal definition is at
<http://tools.ietf.org/html/rfc7540#section-5> and HTTP/1 equivalence is
hinted at in the text of <http://tools.ietf.org/html/rfc7230#section-5.7>.

It points to both the HttpRequest ('user side') and HttpReply, and
AdaptedX, and clientStreams processing trail, and any deferred I/O
buffers waiting write(2), and (for now) the Job which is processing all
that data.

> While Squid historically uses classes
> named after HTTP to represent various master transaction aspects (e.g.,
> HttpRequest and HttpReply), we should not fall into this trap when
> naming new classes or renaming the old ones. As you know, the user side
> of a master transaction may correspond to any of these (at least):
> 
>   * An HTTP stream (user coming from an http_port or alike).
>   * An FTP stream (user coming from an ftp_port).
>   * An internal stream (various Squid-originated downloads).

... and HTTP/2 server-initiated streams (PUSH_PROMISE, Bi-Directional
HTTP, etc) will also need to be handled by this class due to it being
the stream object stored in Pipeline.
 The alternative would be a more complex design involving yet another
class to go in Pipeline that references some stream sub-class depending
on what type of protocol and direction of stream etc. IMO that is going
too far right now. These StreamContext are already just a data store for
the crossover between ::Server/ConnStateData and ClientHttpRequest Job
logics.
 One day we might merge it into MasterXaction as just transaction state
data, but you already objected to storing MasterXaction directly in
Pipeline in 2011 when my initial Pipeline design attempted it.

... the FTP relation to "stream" is that one actual FTP transaction is
mapped to *multiple* HTTP streams internally.

IMO FTP transactions should either be storing state data in another
object type entirely, or the FTP code is mapping to these objects
(plural) in their role as HTTP stream. That can be decided later since
it does not prevent these particular objects semantically being HTTP
streams. (it *will* affect ClientHttpRequest decisions when we get to
them, but that is later).

The internal requests generated by Squid are HTTP, so the stream
represents them directly as streams.

> The "Context" suffix adds nothing to the meaning of the new name. It is
> essentially noise, like most "State", "Data", and "Object" suffixes. I
> suggest dropping it.

Okay.

> 
> Since there are many Squid things that can be considered "streams", we
> do need some distinguishing qualifier for the new name. As you know, we
> are avoiding old "client/server side" terminology so ClientStream would
> not work. If this thing is going to be specific to user traffic, we
> could call it UserStream. If you think it will be eventually used for
> peer/origin transactions as well, then a different qualifier is needed.

That is part of why they were specifically under the Http:: prefix.
Http::Stream being what the object is supposed to be (turn into).

clientStream is already taken by the client-streams payload processing
feature. So that is doubly bad.

As mentioned above squid-initiated and server-initiated streams are also
represented by these. So 'user' is not part of the definition.

> 
> The renaming/moving part of the patch scope changes lots of code. It is
> very unfortunate that you have decided to combine the above
> functionality-preserving polishing with the functionality changes below.
> The former makes it very difficult to see the latter. If I had enough
> guts, I would -1 this patch for this reason alone. Renaming/moving
> changes should have been done separately!

The Pipeline related ID changes are small and only needed for HTTP/2
(the combo was due to being sponsored as an line-item). I can separate
them out again.

With that Pipeline update gone there is no need for an audit of the
shuffling according to our policy. Once we settle on the naming (above)
are you happy for the non-logic shuffling bit to go in? and a new patch
to audit only the Pipeline ID part?

> 
> 
>> Pipeline class is updated to use the ID number to manage its contents
>> rather than Pointer value matching. It is also updated to drop the
>> HTTP/1 specific assumptions within the Pipeline implementation. As a
>> behavioural requirement the sequential flow is now left for the Server
>> and ClientHttpRequest Jobs to ensure correctness.
> 
> 
> The new Pipeline::popById() method is a bad idea IMO: Linear search is
> wrong for both HTTP/1 and HTTP/2.
> 

Do you have a better algorithm? it needs to pop using only an ID (HTTP/2
limit due to framing) and works on linear/list storage (optimal for
HTTP/1 sequentialism).

> 
>> The old "client-side" classes are now split along the following scopes:
> 
> 
>> * ::Server / ConnStateData/ *::Server
>>  - the AsynJob managing I/O on a client connection.
>>
>> * ClientHttpRequest
>>  - the AsyncJob operating transaction internal behaviours and Calls.
>> Will make use of ConnStateData and Http::StreamContext to perform its
>> duties (has not yet been refactored to do so cleanly yet).
>>
>> * Http::StreamContext
>>  - the state data for a transaction. Refcounted and used by the Server
>> and ClientHttpRequest Jobs for their mutual data storage.
> 
> 
> As I said earlier, I will have to come back to this part later, but
> perhaps you can already clarify whether this patch has actually changed
> any of the above (besides renaming)?
> 

Besides renaming and re-documenting: no.

The "split" is already existing, this is mostly documentation update to
formalize it so we can see better what logics are out of scope in their
current position. ::Server + children and ClientHttpRequest are Jobs.
Stream is just transaction data storage.

There is one or two static functions that had to be made non-static and
some methods de-duplicated by making a virtual override. But the code in
them and the order that code runs is unchanged.

Amos