[squid-dev] [PATCH] PayloadFormatter (was PackableStream)

Wed Aug 19 15:47:32 UTC 2015

On 19/08/2015 4:50 p.m., Alex Rousskov wrote:
> On 08/15/2015 12:20 AM, Amos Jeffries wrote:
>> I dont like payload particularly either in this case. But page is wrong.
>>
>> Page is what the remote end display tool will be generating. *IF* a
>> "page" exists at all.
> 
> There are many kinds of payloads. There are many kinds of pages. There
> are many kinds of reports. No name is perfect. I am not going to spend
> more time trying to explain why your wrong name is wronger than my wrong
> name. If you think your naming scheme is better, who am I to argue?
> [rhetorical]
> 
> 
>>>> +/**
>>>> + * Produces cache manager report payloads in a human-readable markup syntax
>>>> + * which is parsed by the cachemgr.cgi tool.
>>> ...
>>>> +class PayloadOldCgi: public PayloadFormatter
>>>
>>> I think it is wrong to strongly associate the old format with
>>> cachemgr.cgi.
> 
>> That was intentional. This format is nasty even for a free-form format.
>> You noticed the syntax is ambiguous and commented on it.
>>
>> IMO deprecating this while leaving it available is a good choice. If it
>> were not for backward compatibiliy and transitional needs I would
>> propose removing it entirely (did in fact).
>>  Instead I'm starting with this one, and moving on to YAML etc. to
>> provide the non-ambiguous outputs.
> 
> Your response does not seem to be related to my objection, but again, I
> am not going to spend more time arguing about your naming scheme.
> 
> 
>> For now this class is specifically and intentionally dumping out the
>> (old) format for cachemgr.cgi. Other third-party tools are considered
>> only so far as the cachemgr ambiguous syntax was published in various
>> parts in various places. As long as the output still matches that they
>> should be fine (or better off).
> 
> IMO, made-up after-the-fact syntax rules are irrelevant (because
> "nobody" knows about them, and they are imprecise). Cachemgr.cgi itself
> is irrelevant (because we control it). Admin scripts parsing old output
> are relevant. Thus, we should either keep the old output pretty much "as
> is" or replace it with something sufficiently better to justify the
> change pains for the admins. Adjusting output (while claiming
> compatibility with some syntax rules we made up) is a bad approach
> because it makes admins unhappy while not making us happy.
> 

"The Definitive Guide" 14.2. All the way through its talking about
cachemgr.cgi display "columns".

To find the detail of what a input "column" actually is refer to
cachemgr.cgi code itself. Specificaly the munge_other_line() function
for converting report text/plain to text/html. This code has not changed
in any significant way since the 1990's.

* Any line excluding a \t is a free-form text line. Lets call that ...
comment, kv-pair, LF.

* table is a series of \t delimited cells. Lets call that ... 'table-row'.

* If the first cell of the first line of a table includes a ':' its a
header row. Lets call that ... 'table' for the whole set of rows, and we
will need a optional 'label' for the pre-colon bit in the first table-row.

The Guide is also talking about how several reports including
'redirectors' are *identical* in format to 'idns' ... not today, they
use tab-delimiting. idns got screwed up years later and now uses space
delimiting and looks like crap in CGI display as a result.

The kv-pairs is not part of the Guide + CGI documentation. Those parts I
took from your own emails and IRC chats educating me an kinkie about
what the report format was supposed to be.

It turns out that no, those were just PRE (free-form) lines "more easily
machine interpretable than human readable" as the Guide says. But it
does make sense to have them, and many reports look like they were
created with that in mind so I added the kv-pair descriptor to the
grammar too. You object to that?

The lack of a written copy of the grammar to date does not preclude its
existence prior to now. Human language is a good example there, it
"obviously" has a grammar even though an entire field of research was
created to find out what it is, a long time after use began.

> 
>>>> +class PayloadFormatter
>>> ...
>>>> +    Packable &outBuf;
>>>> +};
>>>
>>> Since the majority of PayloadFormatter methods and callers are going to
>>> assemble and format pieces of text and numbers, I think this should
>>> become an ostream.
> 
>> You want to go back to PackableStream now?
> 
> If possible, the new page/report/payload building interface should be
> ostream-enabled, not PackableStream-enabled. IIRC, that's what I
> sketched a few emails back.

So how does the data get from std::ostreambuf to the client if its not
going near a Packable (ie StoreEntry).

You ask to data-copy into the stream buffer, then data-copy a second
time to the StoreEntry buffer.

PackableStream is an ostream child providing StoreEntry as the buffer.
So PackableStream data-copies to the StoreEntry buffer.

Which would you expect provides better performance:
 one or two data-copies ?

> 
> We do not have to support Packable to provide ostream interface, but if
> Packable backing helps, we should certainly use it.

The alternative as I already mentioned is StoreEntryStream.
 Which pulls in Store.h dependency.
 Which pulls in HttpRequest dependency.
 ... and so on

Appending strings (as ostream does) to a StoreEntry uses the Packable
virtual API. Exclusively. So why would we pull in all that StoreEntry
dependency just to use StoreEntryStream?
(hint: the caller Action already did locking and buffering so those bits
are no-op).

> 
> Said that, ostream is the wrong primary interface for assembling
> payload/pages/reports. IMO, you should reintroduce ostream capabilities,
> but we should not be [going back to] assembling primary
> payload/pages/reports using ostream. More on that below.
> 

The code proposal you submitted has Formatter inherting from ostream.

The only use of Formatter is by Action. Ergo you intend Action to be
using a ostream interface to assemble top-level report data.

So WTF?

Either ostream is a private internal to Formatter for low-level use
only, or it is available (and eventually used) at high-level Action.

Formatter with its clean API distinct from StoreEntry and Action does
not pull in the dependencoes I was worried about with libmem. So it
(alone) can be passed there potentially if ostream& is not wanted later.

> 
>> Because StoreEntryStream would place a Store.h dependency almost as
>> widely as squid.h dependency.
> 
> The _implementation_ of the ostream-based interface may use something
> like PackableStream or even StoreEntryStream without introducing any
> excessive dependencies AFAICT. In the sketch I posted, only one line of
> code new about std::cout backing while the rest was using ostream.

You argued strongly for *not* doing it exactly that way only the other week.

Your response has been to propose an object ... which is an ostream
child. And would thus be used like PackableStream in my first proposal.

So, run "sed s/PackableStream/PageFormatter/g" on my first patch and
re-submit for audit? I doubt.

> 
>>> Almost all the code that uses outBuf and PayloadFormatter methods in the
>>> patch already looks labored/awkward (often worse than the trivial code
>>> it replaces!), and would look much better if replaced by the usual
>>> ostream "<<" expressions.
> 
>> Now you see the benefit of ostream.
> 
> My attitude towards ostream has not changed. Forming payloads/pages
> /reports using ostream as the top-level interface was wrong and still is.

Your objection was something along the lines of "do not use an ostream
in Action".

So you proposed the Formatter class, an ostream child to be used by Action.

WTF?

> 
> Making ostream available for opaque and low-level parts of payload is a
> good approach. My earlier sketch was probably too small to illustrate
> this point, but it does provide high-level report structuring methods
> while keeping ostream available for low-level stuff. More needs to be
> done to polish that split based on actual typical use cases.

The low-level stuff that makes sense using ostream is all internal to
Formatter. We seemed to be agreeing on that much.

What I'm not getting is why you insist this ostream API be exposed and
used by Action now but that PackableStream was inappropriate when it
could simply have had new members added and been identical to your
proposed Formatter.

The problem in the memory libraries is the *StoreEntry*. In particular
its class declaration head and things it pulls in.

Action depends deeply on StoreEntry. So there is no Action dumper in
libmem. Just currently some pool methods taking (ostream&) and free-form
dumping out "tables".

StoreEntry was made a Packable child for the purpose of using the
Packable API, possibly via PackableStream. To store stuff into a
StoreEntry (or just  a Membuf/SBuf) in more complex Formatter-like ways
without having libmem be aware it was anything other than a Packable.

If we are going to use Formatter in the libmem dumper code. Then it has
to be detatched from StoreEntry completely. That also means detatching
it from Action which depends deeply on StoreEntry. And that it can only
use the Packable API, possibly via PackableStream (aka ostream&).

> 
>>>> +    // a comment
>>>> +    virtual void notice(const SBuf &) = 0;
>>>
>>> Let's try to define the purpose of this method more precisely so that we
>>> know when it is being used [in]correctly. For example:
>>>
>>>   /// a free-form comment or informational text that
>>>   /// is not meant to be further parsed by automation tools
> 
> 
>> No.
>>
>> notice = "A free-form informational text block that is meant for display
>> without further processing."
>>
>> comment = "A free-form informational text that is not meant for
>> automated tool processing or display. May be discarded by automation."
> 
> You have documented notice() as "a comment". Now you give two different
> definitions for those two words. Please make up your mind and adjust the
> notice() description accordingly. After that, it should be clear whether
> we agree what notice() is.

Of course. I put the change in and you object again. And we go round in
circles.

Pause, "Let's try to define the purpose of this method more precisely".

*Do you agree to those definitions* before I go changing any more code
around them?

The below seems to say "no, not good enough". So it was a reasonable
choice not to change the code on last patch. I've have removed the "a
comment" bit for now until the final text is agreed.

> 
> Please also adjust the description to emphasize whether a notice() is an
> indication of a possible problem/error or can be safely ignored [by
> automation tools] without any harm. The distinction would be important
> for automation. If there are two kinds of notices (e.g., Warnings and
> FYI messages), then we may need to add a notice(kind) parameter or split
> notice() into two methods.
> 
> For example, "foo is not yet implemented" is probably something that
> tools may safely ignore, but "you do not have permission to access foo"
> is an error that a tool should treat as such. Should both use the same
> notice() method and, hence, be indistinguishable to the automation tool?
> Probably not.

Okay. Good point. Though we don't have any warning or error outputs to
display in any of the reports to date AFAIK. Just the notices about
features and sub-segements being "not implemented", or some reports
being large.

We will probably need error()/warning() when POST and such are coded.
But thats a problem for later.

For now notice() is just informational with a requirement that it
reaches the report reader. There is no other implications on teh
processing tool.

I'm not sure how to emphasise *informational* any better than using it
as the active noune in the descriptive. ALL-CAPS seems over the top.
 Any suggestions?

> 
>>> Capitalization in "foo is Not Implemented" looks weird to me. It is
>>> better to use lower case letters throughput IMO.
>>>
>>
>> Sorry, old habbit. 501. Fixed.
> 
> Now you may want to fix the method description as well. Naming the
> formal parameter and using the name in the fixed description text would
> help clarify the description IMO. For example:
> 
>     /// a 'something is not implemented' notice()
>     void notImplemented(const SBuf &something);
>

Fixed. Thanks.

> 
> 
>>>> + * Syntax ABNF:
>>>> + *  payload     = *( table / kv-list / notice / LF )
>>>> + *  table       = [ label ':' [ table-row ] ] 1*( table-row )
>>>> + *  table-row   = 1*( [ string ] '\t' ) string LF
>>>> + *  kv-list     = label ( '=' / ':' ' ' ) string LF
>>>> + *  label       = 1*( VCHAR / SP ) ; any printable excluding ':' and '='
>>>> + *  notice      = string LF
>>>> + *  string      = 1*OCTET    ; any characters excluding LF (\n) and TAB (\t)
> 
> 
>>> This syntax is very ambiguous AFAICT.
> 
> 
>> Indeed. You see one reason why I have been so impatient about it.
> 
> 
>>> If it actually matches reality
>>> (and cachemgr.cgi just guesses what the next construct is most likely to
>>> be), then you may want to add a comment about that fact. Same if this
>>> syntax only approximates what Squid actually dumps.
> 
> 
>> Yes and no. This combines both what cachemgr.cgi parses (table, notice),
>> and what Squid is elsewhere documented as outputting (kv-list). With
>> extra detail gleaned from what Squid actually outputs byte-wise in
>> reports that use those table/list constructs (kv-list delimiters, tables
>> without header row, and empty-lines).
>>
>> There are a few reports using syntax like SP-padded columns with no tabs
>> which cachemgr.cgi borks display of badly. I count them as wrong reports
>> (using notice lines instead of table or kv-list) to be fixed.
>>
>> And I've not managed to add block indentation in there yet in a way that
>> leaves the syntax readable. So reality is worse than it looks.
> 
> Your description matches my expectations so my suggestion still stands:
> Document that the ABNF is ambiguous, only approximates the reality, will
> not be adhered to (or even guide) future formats, and is provided as an
> illustration only.

Ah. Another misunderstanding. That I grok and agree with.

But I thought it was clear I intend the outputs using OldCgi formatter
*will* adhere to it. I have been over all of Squid reports so many times
I am doubtful that anything was missed. However if something does appear
the syntax will be tweaked to include that too ensuring it corectly
documents what the old report format documented in the FAQ and
Definitive Guide was in ABNF terms.

Added text to that effect with the disclaimer you requested.

> 
>>> Also, the proposed notice() method says nothing about excluding LF (\n)
>>> and TAB (\t) from comment values. The method should document what we
>>> want to exclude from all notices (if anything).
> 
>> I don't want to exclude anything. YAML and other syntaxes wont need to
>> in the end result.
> 
> 
> Your desires and plans only strengthens my suggestion to disclaim the
> actual impact of this formal- and precise-looking ABNF.
> 
> 
> 
>>>> -    virtual void dump(StoreEntry *entry);
>>>> +    virtual void displayWith(Mgr::PayloadFormatter &) override final;
>>>
>>> Why is this method and many others marked as "final"? Is there something
>>> wrong with adding a kid class that would dump a longer menu, for example?
>>
>> Yes. 
> 
> 
> For the record, I think your reasons do not justify "final" use. They
> are "political" reasons, not "code" reasons. However, I am not going to
> insist on a complete "final" removal right now.
> 
> 
>> I absolutely do not want anything deriving from OldCgi and keeping
>> it around longer than it has to be for transition to the better ones.
> 
> If you motivation is to prevent derivatives, then you are misplacing
> "final". It should go into the class declaration line instead (as
> illustrated below). However, since nobody will know why you are abusing
> "final" like this, please also add a comment:
> 
> 
>   /// ...
>   /// Nothing should derive from this class because we are afraid
>   /// that any derivative will keep this class around longer.
>   class PayloadOldCgi final: public PayloadFormatter
> 

Hmm, yes that is simpler. Thanks. Added something similar.

> 
>> As to menu. The length of menu, and any report segments appended is
>> Action/ActionChild scope. Whose duty it is to fetch the menu data and
>> call formatter once per segment type, or simply provide more table rows
>> before ending the menu table.
> 
> I do not know why the above is relevant, but please remove "final" from
> non-OldCgi methods. Overriding them may not be needed right now, but
> there is no reason, even a political one, to prohibit such overriding.
> 
> 
> Going forward, please do not use "final" unless really necessary. Treat
> it like we treat "throw()" declarations.

You mean spotted liberally around the new code in the form of a wrapper
keyword like Must() ?

I dont think so.

> 
> 
> In summary, the most important top-level problems [I remember] are:
> 
> * unavailability of an ostream interface for low-level formatting
> * poorly defined notice() scope
> 

Which we are still hacking out circles in discussion.

New patch to follow tomorrow.

Amos