[squid-dev] [PATCH] PayloadFormatter (was PackableStream)

Thu Aug 20 13:13:11 UTC 2015

On 20/08/2015 9:18 a.m., Alex Rousskov wrote:
> On 08/19/2015 09:47 AM, Amos Jeffries wrote:
>> On 19/08/2015 4:50 p.m., Alex Rousskov wrote:
>>> On 08/15/2015 12:20 AM, Amos Jeffries wrote:
>>>> For now this class is specifically and intentionally dumping out the
>>>> (old) format for cachemgr.cgi. Other third-party tools are considered
>>>> only so far as the cachemgr ambiguous syntax was published in various
>>>> parts in various places. As long as the output still matches that they
>>>> should be fine (or better off).
> 
>>> IMO, made-up after-the-fact syntax rules are irrelevant (because
>>> "nobody" knows about them, and they are imprecise). Cachemgr.cgi itself
>>> is irrelevant (because we control it). Admin scripts parsing old output
>>> are relevant. Thus, we should either keep the old output pretty much "as
>>> is" or replace it with something sufficiently better to justify the
>>> change pains for the admins. Adjusting output (while claiming
>>> compatibility with some syntax rules we made up) is a bad approach
>>> because it makes admins unhappy while not making us happy.
> 
> 
>> "The Definitive Guide" 14.2. All the way through its talking about
>> cachemgr.cgi display "columns".
>>
>> To find the detail of what a input "column" actually is refer to
>> cachemgr.cgi code itself. Specificaly the munge_other_line() function
>> for converting report text/plain to text/html. This code has not changed
>> in any significant way since the 1990's.
>>
>> * Any line excluding a \t is a free-form text line. Lets call that ...
>> comment, kv-pair, LF.
>>
>> * table is a series of \t delimited cells. Lets call that ... 'table-row'.
>>
>> * If the first cell of the first line of a table includes a ':' its a
>> header row. Lets call that ... 'table' for the whole set of rows, and we
>> will need a optional 'label' for the pre-colon bit in the first table-row.
> 
>> The Guide is also talking about how several reports including
>> 'redirectors' are *identical* in format to 'idns' ... not today, they
>> use tab-delimiting. idns got screwed up years later and now uses space
>> delimiting and looks like crap in CGI display as a result.
>>
>>
>> The kv-pairs is not part of the Guide + CGI documentation. Those parts I
>> took from your own emails and IRC chats educating me an kinkie about
>> what the report format was supposed to be.
> 
> 
> This part of the email thread was discussing whether the existence of
> admin scripts (rather than various imprecise syntax rules or
> cachemgr.cgi code) should be the primary factor in our decision making.
> How is the above information relevant to that discussion?
> 

If such admin scripts are based on anything beyond blind guesses it will
be based on that public documentation and dicussion.

> 
>>>>>> +class PayloadFormatter
>>>>> ...
>>>>>> +    Packable &outBuf;
>>>>>> +};
>>>>>
>>>>> Since the majority of PayloadFormatter methods and callers are going to
>>>>> assemble and format pieces of text and numbers, I think this should
>>>>> become an ostream.
>>>
>>>> You want to go back to PackableStream now?
>>>
>>> If possible, the new page/report/payload building interface should be
>>> ostream-enabled, not PackableStream-enabled. IIRC, that's what I
>>> sketched a few emails back.
>>
>> So how does the data get from std::ostreambuf to the client if its not
>> going near a Packable (ie StoreEntry).
> 
> I do not think ostream should use std::ostreambuf if that is what you
> meant. It should use our own buffer, backed by Packable (or StoreEntry).
> However, cache manager code would not know that and, hence, there should
> be no linking problems associated with that knowledge.
> 

Okay. I think I undertand you there.

But the statement about cachemgr does not quite click. Formatter and all
thus *is* cache manager (Mgr::) code. I dont see how Mgr:: code can not
know about how its own internals work.

I assume by "cache manager code" you mean Action and the HTTP
request/reply handling part of Mgr::.

> 
>> You ask to data-copy into the stream buffer, then data-copy a second
>> time to the StoreEntry buffer.
> 
> No, I do not. The stream buffer should use StoreEntry as storage [when
> the output goes to Store].
> 
> 
>> PackableStream is an ostream child providing StoreEntry as the buffer.
>> So PackableStream data-copies to the StoreEntry buffer.
> 
> Yes.
> 
> 
>> Which would you expect provides better performance:
>>  one or two data-copies ?
> 
> One copy.
> 
> (The design choices here are actually not driven by performance, but by
> the requirement to avoid buffering of huge cache manager responses in
> RAM. However, the same custom ostream buffer design happens to eliminate
> the extra copy as a positive performance side effect).

Then I am going back to the PackableStream patch and making a new
iteration that just renames StoreEntryStream and fixes the syntax
things. Patch for that in the other thread as soon as it works.

This patch will then have Formatter creating one of those streams
if/when necessary to drop values into.

Agreed?

> 
>>> Said that, ostream is the wrong primary interface for assembling
>>> payload/pages/reports. IMO, you should reintroduce ostream capabilities,
>>> but we should not be [going back to] assembling primary
>>> payload/pages/reports using ostream. More on that below.
> 
> 
>> The code proposal you submitted has Formatter inherting from ostream.
> 
> Yes, Page is a child of ostream in my sketch.
> 
> 
>> The only use of Formatter is by Action.
> 
> Yes, at least in the scope of this discussion.
> 
> 
>> Ergo you intend Action to be
>> using a ostream interface to assemble top-level report data.
> 
> No. Page provides a set of methods for high-level assembly, including:
> 
>     /// starts recording a list of values
>     virtual Page &beginList() = 0;
>     /// finishes recording a list of values
>     virtual Page &endList() = 0;
> 
>     /// starts recording a list of values
>     virtual Page &beginListItem() = 0;
>     /// finishes recording a list of values
>     virtual Page &endListItem() = 0;
> 
>     /// starts recording a key:value pair with a given key
>     virtual Page &beginKv(const SBuf &key) = 0;
>     /// finishes recording a key:value pair
>     virtual Page &endKv() = 0;
> 

That is not usable by code taking ostream&. Therefore not useful to the
code which most needs to use this objects API.

> 
>> Either ostream is a private internal to Formatter for low-level use
>> only, or it is available (and eventually used) at high-level Action.
> 
> It is a public interface for assembling some opaque/atomic values for
> formal output such as YAML.

This *will not* correctly format YAML or HTML syntax:

  Page << "Hello World";

Making Page/Formatter an ostream child is just not useful for its one
and only user (Action).

> 
> Whether inheriting from ostream is the best approach to accomplish that,
> I am not sure, but it is probably the simplest one. The alternative to
> consider is to provide a method that returns an ostream reference. There
> are some examples below.

I'm certain its not. So why are we still circling this ?

>>>> Because StoreEntryStream would place a Store.h dependency almost as
>>>> widely as squid.h dependency.
> 
>>> The _implementation_ of the ostream-based interface may use something
>>> like PackableStream or even StoreEntryStream without introducing any
>>> excessive dependencies AFAICT. In the sketch I posted, only one line of
>>> code new about std::cout backing while the rest was using ostream.
> 
>> You argued strongly for *not* doing it exactly that way only the other week.
> 
> 
> I did not. I argued against using ostream to format strict syntax (e.g.,
> YAML) output. Later, I provided a sketch how to avoid doing that while
> still supporting ostream-based assembly of YAML atomic values. I now
> included specific examples further below.
> 
> 
>> Your response has been to propose an object ... which is an ostream
>> child. And would thus be used like PackableStream in my first proposal.
> 
> The difference between the Page class and PackableStream is that Page
> provides methods for high-level formal formatting (such as YAML).
> PackableStream does not and should not have such methods. You should
> look at the whole Page class declaration, not just at its first line
> (where ostream inheritance is declared).

I did. The first line indicates that its providing ostream& operators
API ... apparently for use.

The other code was passing it as ostream&. Which elides the new
formatting methods and leaves Action internal methods with only stream
operator<< to work with. That does not fit with the concept of avoiding
Action using streams.

> 
> 
>>>>> Almost all the code that uses outBuf and PayloadFormatter methods in the
>>>>> patch already looks labored/awkward (often worse than the trivial code
>>>>> it replaces!), and would look much better if replaced by the usual
>>>>> ostream "<<" expressions.
>>>
>>>> Now you see the benefit of ostream.
>>>
>>> My attitude towards ostream has not changed. Forming payloads/pages
>>> /reports using ostream as the top-level interface was wrong and still is.
>>
>> Your objection was something along the lines of "do not use an ostream
>> in Action".
>>
>> So you proposed the Formatter class, an ostream child to be used by Action.
>>
>> WTF?
> 
> When you only pay attention the first few words of my paragraph or my
> class declaration, it is easy to arrive at strange conclusions. Here is
> my initial objection, prefaced with its context sample:
> 
> 
>> Mgr::MenuAction::dump(std::ostream &p)
>> {
>>     p.fill(' '); // space pad the menu fields for readability
>>     p.setf(std::ios::left);
>>     for (auto &a : CacheManager::GetInstance()->menu()) {
>>         p << '\t' // menu is a table
>>           << std::setw(22) << a->name << '\t'
>>           << std::setw(32) << a->desc << '\t'
>>           << std::setw(8) << CacheManager::GetInstance()->ActionProtection(*a)
>>           << '\n';
>>     }
>>     p.unsetf(std::ios::left);
>> }
> 
>>> It feels wrong that you want Actions to output using strict syntax but
>>> give them an "free form" or "anything goes" std::ostream as a tool to
>>> accomplish that. Should not Action dumping methods receive a specialized
>>> class that makes it easy to produce the output you want and hard (if not
>>> impossible) to make syntax mistakes?
> 
> The Page class in my sketch is an example of that "specialized" class I
> was talking about in the above paragraph. It makes formal output "easy"
> by providing methods that ensure correct syntax. Here is the sketch code
> implementing something similar to the MenuAction::dump() above:
> 
> 
>>     page.beginList();
>>     for (auto name: {"5min", "60min", "disk_io", "events", "shutdown"}) {
>>         page.beginListItem().
>>             beginHash().
>>                 kv(name, "menu item description to be extracted from Action::desc").
>>                 kv("protection", "public"). // ActionProtection(*a)
>>             endHash().
>>         endListItem();
>>     }
>>     page.endList();
> 
> 
> We can discuss how "hard" it should be for Action to get access to
> Page's ostream interface. In the sketch, the developer would have to use
> an ostream method or operator to get access to ostream features. For
> example:
> 
>   page.beginKv("statistics_foo") << something.mean() << endKv;
>   page.beginkv("worker_id") << "disker" << DiskerId << endKv;
> 
> 
> If that usage of a low-level interface is not visible enough, we can
> hide ostream access behind a dedicated method returning an ostream
> reference:
> 
> class Page {
> public:
> ...
>     /// use this to assemble opaque/atomic values
>     ostream &raw();
> ...
> }
> 
> It might be used along these lines:
> 
>   page.beginKv("statistics_foo").raw() << something.mean() << endKv;
>   page.beginkv("worker_id").raw() << "disker" << DiskerId << endKv;
> 

Aha. Back to Action code using streams. And why you want chaining.

But I think I see what you are imagining in the background now.

This relies on chaining and comes back to the same reasons I dont want that.

* It is reasonable to expect that worker_id has different syntax for
string and numeric representations in one of the syntaxes added later.
 That would call for multiple kvPair(name, value) methods based on the
value received here by the << operator.

* With indentation in the syntax (both current and YAML) the
"something.mean()" output needs a specific prefix applied at the start
of each of its lines.

In both those circumstances it is better to pass 'page' to mean() and
ensure that sub-pieces of something.mean(page) are individually prefixed
and syntaxed by page with correct bits according to its internal state.

Also, the bits between << are now free-flow outputs again. Thats a sure
recipe for some report to make an assumption like you do in the example
code itself (with "disker" assuming a string ID). That a particular kv
thing is going to take a string rather than an integer value.
 This model would kill any hope of an ASN1 or such Formatter/Page.

> 
> Needless to say, there is some danger in providing such raw access. If a
> developer is not careful, they might introduce a syntax error inside
> what they saw were perfectly fine atomic values. We can protect against
> that and offer automatic escaping of bad characters (among other
> things), but that would require more work. I suspect such protections
> would not necessarily require Page interface changes; if I am right, we
> can delay their implementation until it is clear they are actually needed.
> 

I don't think that is necessary though. So I am leaving it out of these
early patches. Okay?

> 
>>> Making ostream available for opaque and low-level parts of payload is a
>>> good approach. My earlier sketch was probably too small to illustrate
>>> this point, but it does provide high-level report structuring methods
>>> while keeping ostream available for low-level stuff. More needs to be
>>> done to polish that split based on actual typical use cases.
> 
>> The low-level stuff that makes sense using ostream is all internal to
>> Formatter. We seemed to be agreeing on that much.
> 
> We do not agree on the "all internal part". IMO, ostream should be
> exposed so that Actions can assemble opaque values using it. I hope the
> specific examples above illustrate what I mean by that.

see above.

The current free-form outputs already place indentation requirements the
mixed methods + ostream Page API cannot meet. Its either ostream with
Action::dump() doing the indentation explicitly, or just the
Page/Formatter methods.

YAML adds extra construct syntax requirements that make it even more
nasty. Then any potential binary outputs we want to generate later will
die horribly.

> 
>> What I'm not getting is why you insist this ostream API be exposed and
>> used by Action now but that PackableStream was inappropriate when it
>> could simply have had new members added and been identical to your
>> proposed Formatter.
> 
> First of all, the first patch I was reviewing did not pass
> PackableStream to Actions. It passed ostream. We cannot add methods to
> ostream.

It is not passed to Actions. Its created by Action and passed to
Action::dump() and action sub-code.

> 
> Second, adding formal-syntax formatting methods to PackableStream (and
> passing PackableStream to Actions) would be wrong. The PackableStream
> class should be dedicated to providing a Packable-backed ostream. It
> should not know anything about Cache Manager reports and their
> formatting. We may use PackableStream outside Cache Manager.

Its only uses were as an output serializer (formatter) for Action dump()
code. The slightly higher-order syntax of what the fields were was being
left in Action::dump() and related functions where it is in current trunk.

We agreed to use Page/Formatter to allow markup injection between fields
now, so re-coding was not necessary later.

In either model Action::dump is still responsible for knowledge of what
field types are being output. Either in the Page/Formatter methods its
calling, or the particular stream << sequencing.

As I mentioned above exposing operator<< to Action just means important
parts of the markup injection is not possible. At least not without
moving format/syntax scope things from Formatter back to Action.

> 
>> The problem in the memory libraries is the *StoreEntry*. In particular
>> its class declaration head and things it pulls in.
>>
>> Action depends deeply on StoreEntry. So there is no Action dumper in
>> libmem. Just currently some pool methods taking (ostream&) and free-form
>> dumping out "tables".
>>
>> StoreEntry was made a Packable child for the purpose of using the
>> Packable API, possibly via PackableStream. To store stuff into a
>> StoreEntry (or just  a Membuf/SBuf) in more complex Formatter-like ways
>> without having libmem be aware it was anything other than a Packable.
>>
>> If we are going to use Formatter in the libmem dumper code. Then it has
>> to be detatched from StoreEntry completely. That also means detatching
>> it from Action which depends deeply on StoreEntry. 
> 
> Yes, I suspect we are [still] in agreement here.
> 
> 
>> And that it can only
>> use the Packable API, possibly via PackableStream (aka ostream&).
> 
> I hope that Page/Formatter does not need to know about Packable, just
> ostream, but there may be some corner cases I am not aware of.

If Formatter is allocating the PackableStream it would. But
PackableStream is an wholly inline thing for exactly that type of reason.

> 
>>>>>> +    // a comment
>>>>>> +    virtual void notice(const SBuf &) = 0;
>>>>>
>>>>> Let's try to define the purpose of this method more precisely so that we
>>>>> know when it is being used [in]correctly. For example:
>>>>>
>>>>>   /// a free-form comment or informational text that
>>>>>   /// is not meant to be further parsed by automation tools
>>>
>>>
>>>> No.
>>>>
>>>> notice = "A free-form informational text block that is meant for display
>>>> without further processing."
>>>>
>>>> comment = "A free-form informational text that is not meant for
>>>> automated tool processing or display. May be discarded by automation."
>>>
>>> You have documented notice() as "a comment". Now you give two different
>>> definitions for those two words. Please make up your mind and adjust the
>>> notice() description accordingly. After that, it should be clear whether
>>> we agree what notice() is.
> 
> 
>> *Do you agree to those definitions* before I go changing any more code
>> around them?
> 
> Do I agree that any notice and any comment fit your definitions? No.
> Fortunately, we are not writing word definitions for a dictionary but
> documenting a specific class method, so we only need to agree on what
> our notice() should do.
> 
> I have proposed how to describe the notice() method. You said "No" and
> provided two different descriptions for the two words you previously
> used as _synonyms_. You did not say which of those two descriptions
> applies to the notice() method. There was no explanatory text
> accompanying the two descriptions either (unless you count "No" as
> such). I hope you can see why I asked you to pick one of the two
> descriptions so that we can focus the discussion on it.

Sorry. I missed off the (). I intended those to be the as-is
documentation for the two methods with those names.

> 
> I believe your "notice" description is closer to what we need than your
> "comment" description. The key here, IMO, is that notice() text is not
> meant to be parsed further by automation tools. If we do not want to
> prohibit comparisons, then a tool may even compare the received notice
> to some notices it knows about, but only as an opaque blob of text, with
> no internal structure.

Okay.

> 
> Do I like "display without further processing"? No, I do not like that
> phrasing because it is not clear whether
> 
>   * further processing for display purposes is allowed;
>   * other use (e.g., comparison with other notices) is prohibited;
>   * automated reaction without display is prohibited.
> 
> In my imperfect description example, I tried to avoid those pitfalls by
> focusing on prohibiting just one key activity: parsing of the notice
> value [to extract some usable parts]. If we prohibit that, then we are
> free to use any format/arrangement for the notice text, but must resist
> the temptation to include some information that automation scripts will
> want to parse.

Okay.

> 
> For example, the following are "bad" notices (Squid should not generate
> them):
> 
>   uptime: 1 hour
>   built with libecap v1.1
>   reconfiguration finished in 5 seconds
>   error: I/O error while responding to a cache manager request
> 
> and the following are "good" notices:
> 
>   reconfiguration finished
>   unsupported action
>   unknown cache manager output format
> 

Agreed.

> 
>> For now notice() is just informational with a requirement that it
>> reaches the report reader. There is no other implications on teh
>> processing tool.
> 
> If we require that a notice reaches a human report reader, does it mean
> that a script receiving a notice cannot act on it autonomously? If yes,
> I do not think we want that. If no, then I recommend avoiding such
> language as "report reader".
> 
> 
>> I'm not sure how to emphasise *informational* any better than using it
>> as the active noune in the descriptive. ALL-CAPS seems over the top.
>>  Any suggestions?
> 
> I do not think we should try to regulate that aspect of notice usage.
> What is "informational" (as in "FYI, X happened, but feel free to
> disregard"?) for one, could be very important (as in "do Y if you
> receive notice X") for another recipient.
> 

I think we agree on intention. But neither have a good text for it yet.

It's that passing-on property we need to clearly state. I've used
"display" there since its variable by context and loose enough to cover
any type of that action.

How about just a terse:

notice() - "An informational text announcement. To be passed on instead
of ignored. Do not automatically parse the text."

comment() - "An informational text comment. May be ignored or dropped.
Do not automatically parse the text."

> 
>>> Going forward, please do not use "final" unless really necessary. Treat
>>> it like we treat "throw()" declarations.
> 
> 
>> You mean spotted liberally around the new code in the form of a wrapper
>> keyword like Must() ?
> 
> No, I mean avoided like the now-deprecated "throw()" exception
> specification:
> 
>   http://en.cppreference.com/w/cpp/language/except_spec
>   http://www.gotw.ca/publications/mill22.htm
> 
> Proposed rule of thumb: If correct inheritance or overriding is
> possible, do not use "final", even if you do not see any good reasons to
> inherit or override.
> 

Aha. Though AFAIK correct inhertence is *always* possible in reasonable
code. So that is just a ban on "final".

As you saw my uses of it are kind of "political" in nature. To enforce
better code use and graceful deprecations of widely used things.

Amos