[squid-dev] [PATCH] PayloadFormatter (was PackableStream)

Alex Rousskov rousskov at measurement-factory.com
Wed Aug 19 21:18:20 UTC 2015


On 08/19/2015 09:47 AM, Amos Jeffries wrote:
> On 19/08/2015 4:50 p.m., Alex Rousskov wrote:
>> On 08/15/2015 12:20 AM, Amos Jeffries wrote:
>>> For now this class is specifically and intentionally dumping out the
>>> (old) format for cachemgr.cgi. Other third-party tools are considered
>>> only so far as the cachemgr ambiguous syntax was published in various
>>> parts in various places. As long as the output still matches that they
>>> should be fine (or better off).

>> IMO, made-up after-the-fact syntax rules are irrelevant (because
>> "nobody" knows about them, and they are imprecise). Cachemgr.cgi itself
>> is irrelevant (because we control it). Admin scripts parsing old output
>> are relevant. Thus, we should either keep the old output pretty much "as
>> is" or replace it with something sufficiently better to justify the
>> change pains for the admins. Adjusting output (while claiming
>> compatibility with some syntax rules we made up) is a bad approach
>> because it makes admins unhappy while not making us happy.


> "The Definitive Guide" 14.2. All the way through its talking about
> cachemgr.cgi display "columns".
> 
> To find the detail of what a input "column" actually is refer to
> cachemgr.cgi code itself. Specificaly the munge_other_line() function
> for converting report text/plain to text/html. This code has not changed
> in any significant way since the 1990's.
> 
> * Any line excluding a \t is a free-form text line. Lets call that ...
> comment, kv-pair, LF.
> 
> * table is a series of \t delimited cells. Lets call that ... 'table-row'.
> 
> * If the first cell of the first line of a table includes a ':' its a
> header row. Lets call that ... 'table' for the whole set of rows, and we
> will need a optional 'label' for the pre-colon bit in the first table-row.

> The Guide is also talking about how several reports including
> 'redirectors' are *identical* in format to 'idns' ... not today, they
> use tab-delimiting. idns got screwed up years later and now uses space
> delimiting and looks like crap in CGI display as a result.
> 
> 
> The kv-pairs is not part of the Guide + CGI documentation. Those parts I
> took from your own emails and IRC chats educating me an kinkie about
> what the report format was supposed to be.


This part of the email thread was discussing whether the existence of
admin scripts (rather than various imprecise syntax rules or
cachemgr.cgi code) should be the primary factor in our decision making.
How is the above information relevant to that discussion?


> It turns out that no, those were just PRE (free-form) lines "more easily
> machine interpretable than human readable" as the Guide says. But it
> does make sense to have them, and many reports look like they were
> created with that in mind so I added the kv-pair descriptor to the
> grammar too. You object to that?

Do I object to adding kv-pair to the grammar? No. I did not even review
those low-level details! I trust you with roughly approximating the
actual output. And a rough approximation is sufficient [and the only
thing possible] here IMO.


> The lack of a written copy of the grammar to date does not preclude its
> existence prior to now. 

In practical terms, it pretty much does preclude it actually: Whenever
dozens of poorly coordinated developers spend years producing informal
output, one can virtually guarantee that no useful formal grammar can
describe what they did.


> Human language is a good example there, it
> "obviously" has a grammar even though an entire field of research was
> created to find out what it is, a long time after use began.

Human language is a good example of where no useful formal grammar
exists. There are various useful approximations, of course, but I do not
think anybody is even trying to confine human language into a useful
formal grammar for the last 30 years or so.



>>>>> +class PayloadFormatter
>>>> ...
>>>>> +    Packable &outBuf;
>>>>> +};
>>>>
>>>> Since the majority of PayloadFormatter methods and callers are going to
>>>> assemble and format pieces of text and numbers, I think this should
>>>> become an ostream.
>>
>>> You want to go back to PackableStream now?
>>
>> If possible, the new page/report/payload building interface should be
>> ostream-enabled, not PackableStream-enabled. IIRC, that's what I
>> sketched a few emails back.
> 
> So how does the data get from std::ostreambuf to the client if its not
> going near a Packable (ie StoreEntry).

I do not think ostream should use std::ostreambuf if that is what you
meant. It should use our own buffer, backed by Packable (or StoreEntry).
However, cache manager code would not know that and, hence, there should
be no linking problems associated with that knowledge.


> You ask to data-copy into the stream buffer, then data-copy a second
> time to the StoreEntry buffer.

No, I do not. The stream buffer should use StoreEntry as storage [when
the output goes to Store].


> PackableStream is an ostream child providing StoreEntry as the buffer.
> So PackableStream data-copies to the StoreEntry buffer.

Yes.


> Which would you expect provides better performance:
>  one or two data-copies ?

One copy.

(The design choices here are actually not driven by performance, but by
the requirement to avoid buffering of huge cache manager responses in
RAM. However, the same custom ostream buffer design happens to eliminate
the extra copy as a positive performance side effect).



>> Said that, ostream is the wrong primary interface for assembling
>> payload/pages/reports. IMO, you should reintroduce ostream capabilities,
>> but we should not be [going back to] assembling primary
>> payload/pages/reports using ostream. More on that below.


> The code proposal you submitted has Formatter inherting from ostream.

Yes, Page is a child of ostream in my sketch.


> The only use of Formatter is by Action.

Yes, at least in the scope of this discussion.


> Ergo you intend Action to be
> using a ostream interface to assemble top-level report data.

No. Page provides a set of methods for high-level assembly, including:

    /// starts recording a list of values
    virtual Page &beginList() = 0;
    /// finishes recording a list of values
    virtual Page &endList() = 0;

    /// starts recording a list of values
    virtual Page &beginListItem() = 0;
    /// finishes recording a list of values
    virtual Page &endListItem() = 0;

    /// starts recording a key:value pair with a given key
    virtual Page &beginKv(const SBuf &key) = 0;
    /// finishes recording a key:value pair
    virtual Page &endKv() = 0;



> Either ostream is a private internal to Formatter for low-level use
> only, or it is available (and eventually used) at high-level Action.

It is a public interface for assembling some opaque/atomic values for
formal output such as YAML.

Whether inheriting from ostream is the best approach to accomplish that,
I am not sure, but it is probably the simplest one. The alternative to
consider is to provide a method that returns an ostream reference. There
are some examples below.


> Formatter with its clean API distinct from StoreEntry and Action does
> not pull in the dependencoes I was worried about with libmem.

Yes, of course.



>>> Because StoreEntryStream would place a Store.h dependency almost as
>>> widely as squid.h dependency.

>> The _implementation_ of the ostream-based interface may use something
>> like PackableStream or even StoreEntryStream without introducing any
>> excessive dependencies AFAICT. In the sketch I posted, only one line of
>> code new about std::cout backing while the rest was using ostream.

> You argued strongly for *not* doing it exactly that way only the other week.


I did not. I argued against using ostream to format strict syntax (e.g.,
YAML) output. Later, I provided a sketch how to avoid doing that while
still supporting ostream-based assembly of YAML atomic values. I now
included specific examples further below.


> Your response has been to propose an object ... which is an ostream
> child. And would thus be used like PackableStream in my first proposal.

The difference between the Page class and PackableStream is that Page
provides methods for high-level formal formatting (such as YAML).
PackableStream does not and should not have such methods. You should
look at the whole Page class declaration, not just at its first line
(where ostream inheritance is declared).


>>>> Almost all the code that uses outBuf and PayloadFormatter methods in the
>>>> patch already looks labored/awkward (often worse than the trivial code
>>>> it replaces!), and would look much better if replaced by the usual
>>>> ostream "<<" expressions.
>>
>>> Now you see the benefit of ostream.
>>
>> My attitude towards ostream has not changed. Forming payloads/pages
>> /reports using ostream as the top-level interface was wrong and still is.
> 
> Your objection was something along the lines of "do not use an ostream
> in Action".
> 
> So you proposed the Formatter class, an ostream child to be used by Action.
> 
> WTF?

When you only pay attention the first few words of my paragraph or my
class declaration, it is easy to arrive at strange conclusions. Here is
my initial objection, prefaced with its context sample:


> Mgr::MenuAction::dump(std::ostream &p)
> {
>     p.fill(' '); // space pad the menu fields for readability
>     p.setf(std::ios::left);
>     for (auto &a : CacheManager::GetInstance()->menu()) {
>         p << '\t' // menu is a table
>           << std::setw(22) << a->name << '\t'
>           << std::setw(32) << a->desc << '\t'
>           << std::setw(8) << CacheManager::GetInstance()->ActionProtection(*a)
>           << '\n';
>     }
>     p.unsetf(std::ios::left);
> }

>> It feels wrong that you want Actions to output using strict syntax but
>> give them an "free form" or "anything goes" std::ostream as a tool to
>> accomplish that. Should not Action dumping methods receive a specialized
>> class that makes it easy to produce the output you want and hard (if not
>> impossible) to make syntax mistakes?

The Page class in my sketch is an example of that "specialized" class I
was talking about in the above paragraph. It makes formal output "easy"
by providing methods that ensure correct syntax. Here is the sketch code
implementing something similar to the MenuAction::dump() above:


>     page.beginList();
>     for (auto name: {"5min", "60min", "disk_io", "events", "shutdown"}) {
>         page.beginListItem().
>             beginHash().
>                 kv(name, "menu item description to be extracted from Action::desc").
>                 kv("protection", "public"). // ActionProtection(*a)
>             endHash().
>         endListItem();
>     }
>     page.endList();


We can discuss how "hard" it should be for Action to get access to
Page's ostream interface. In the sketch, the developer would have to use
an ostream method or operator to get access to ostream features. For
example:

  page.beginKv("statistics_foo") << something.mean() << endKv;
  page.beginkv("worker_id") << "disker" << DiskerId << endKv;


If that usage of a low-level interface is not visible enough, we can
hide ostream access behind a dedicated method returning an ostream
reference:

class Page {
public:
...
    /// use this to assemble opaque/atomic values
    ostream &raw();
...
}

It might be used along these lines:

  page.beginKv("statistics_foo").raw() << something.mean() << endKv;
  page.beginkv("worker_id").raw() << "disker" << DiskerId << endKv;


Needless to say, there is some danger in providing such raw access. If a
developer is not careful, they might introduce a syntax error inside
what they saw were perfectly fine atomic values. We can protect against
that and offer automatic escaping of bad characters (among other
things), but that would require more work. I suspect such protections
would not necessarily require Page interface changes; if I am right, we
can delay their implementation until it is clear they are actually needed.


>> Making ostream available for opaque and low-level parts of payload is a
>> good approach. My earlier sketch was probably too small to illustrate
>> this point, but it does provide high-level report structuring methods
>> while keeping ostream available for low-level stuff. More needs to be
>> done to polish that split based on actual typical use cases.

> The low-level stuff that makes sense using ostream is all internal to
> Formatter. We seemed to be agreeing on that much.

We do not agree on the "all internal part". IMO, ostream should be
exposed so that Actions can assemble opaque values using it. I hope the
specific examples above illustrate what I mean by that.


> What I'm not getting is why you insist this ostream API be exposed and
> used by Action now but that PackableStream was inappropriate when it
> could simply have had new members added and been identical to your
> proposed Formatter.

First of all, the first patch I was reviewing did not pass
PackableStream to Actions. It passed ostream. We cannot add methods to
ostream.

Second, adding formal-syntax formatting methods to PackableStream (and
passing PackableStream to Actions) would be wrong. The PackableStream
class should be dedicated to providing a Packable-backed ostream. It
should not know anything about Cache Manager reports and their
formatting. We may use PackableStream outside Cache Manager.


> The problem in the memory libraries is the *StoreEntry*. In particular
> its class declaration head and things it pulls in.
> 
> Action depends deeply on StoreEntry. So there is no Action dumper in
> libmem. Just currently some pool methods taking (ostream&) and free-form
> dumping out "tables".
> 
> StoreEntry was made a Packable child for the purpose of using the
> Packable API, possibly via PackableStream. To store stuff into a
> StoreEntry (or just  a Membuf/SBuf) in more complex Formatter-like ways
> without having libmem be aware it was anything other than a Packable.
> 
> If we are going to use Formatter in the libmem dumper code. Then it has
> to be detatched from StoreEntry completely. That also means detatching
> it from Action which depends deeply on StoreEntry. 

Yes, I suspect we are [still] in agreement here.


> And that it can only
> use the Packable API, possibly via PackableStream (aka ostream&).

I hope that Page/Formatter does not need to know about Packable, just
ostream, but there may be some corner cases I am not aware of.


>>>>> +    // a comment
>>>>> +    virtual void notice(const SBuf &) = 0;
>>>>
>>>> Let's try to define the purpose of this method more precisely so that we
>>>> know when it is being used [in]correctly. For example:
>>>>
>>>>   /// a free-form comment or informational text that
>>>>   /// is not meant to be further parsed by automation tools
>>
>>
>>> No.
>>>
>>> notice = "A free-form informational text block that is meant for display
>>> without further processing."
>>>
>>> comment = "A free-form informational text that is not meant for
>>> automated tool processing or display. May be discarded by automation."
>>
>> You have documented notice() as "a comment". Now you give two different
>> definitions for those two words. Please make up your mind and adjust the
>> notice() description accordingly. After that, it should be clear whether
>> we agree what notice() is.


> *Do you agree to those definitions* before I go changing any more code
> around them?

Do I agree that any notice and any comment fit your definitions? No.
Fortunately, we are not writing word definitions for a dictionary but
documenting a specific class method, so we only need to agree on what
our notice() should do.

I have proposed how to describe the notice() method. You said "No" and
provided two different descriptions for the two words you previously
used as _synonyms_. You did not say which of those two descriptions
applies to the notice() method. There was no explanatory text
accompanying the two descriptions either (unless you count "No" as
such). I hope you can see why I asked you to pick one of the two
descriptions so that we can focus the discussion on it.


I believe your "notice" description is closer to what we need than your
"comment" description. The key here, IMO, is that notice() text is not
meant to be parsed further by automation tools. If we do not want to
prohibit comparisons, then a tool may even compare the received notice
to some notices it knows about, but only as an opaque blob of text, with
no internal structure.

Do I like "display without further processing"? No, I do not like that
phrasing because it is not clear whether

  * further processing for display purposes is allowed;
  * other use (e.g., comparison with other notices) is prohibited;
  * automated reaction without display is prohibited.

In my imperfect description example, I tried to avoid those pitfalls by
focusing on prohibiting just one key activity: parsing of the notice
value [to extract some usable parts]. If we prohibit that, then we are
free to use any format/arrangement for the notice text, but must resist
the temptation to include some information that automation scripts will
want to parse.

For example, the following are "bad" notices (Squid should not generate
them):

  uptime: 1 hour
  built with libecap v1.1
  reconfiguration finished in 5 seconds
  error: I/O error while responding to a cache manager request

and the following are "good" notices:

  reconfiguration finished
  unsupported action
  unknown cache manager output format


> For now notice() is just informational with a requirement that it
> reaches the report reader. There is no other implications on teh
> processing tool.

If we require that a notice reaches a human report reader, does it mean
that a script receiving a notice cannot act on it autonomously? If yes,
I do not think we want that. If no, then I recommend avoiding such
language as "report reader".


> I'm not sure how to emphasise *informational* any better than using it
> as the active noune in the descriptive. ALL-CAPS seems over the top.
>  Any suggestions?

I do not think we should try to regulate that aspect of notice usage.
What is "informational" (as in "FYI, X happened, but feel free to
disregard"?) for one, could be very important (as in "do Y if you
receive notice X") for another recipient.



>> Going forward, please do not use "final" unless really necessary. Treat
>> it like we treat "throw()" declarations.


> You mean spotted liberally around the new code in the form of a wrapper
> keyword like Must() ?

No, I mean avoided like the now-deprecated "throw()" exception
specification:

  http://en.cppreference.com/w/cpp/language/except_spec
  http://www.gotw.ca/publications/mill22.htm

Proposed rule of thumb: If correct inheritance or overriding is
possible, do not use "final", even if you do not see any good reasons to
inherit or override.


HTH,

Alex.



More information about the squid-dev mailing list