[squid-dev] [PATCH] PayloadFormatter (was PackableStream)

Alex Rousskov rousskov at measurement-factory.com
Fri Aug 21 00:54:12 UTC 2015


On 08/20/2015 07:13 AM, Amos Jeffries wrote:
>> This part of the email thread was discussing whether the existence of
>> admin scripts (rather than various imprecise syntax rules or
>> cachemgr.cgi code) should be the primary factor in our decision making.
>> How is the above information relevant to that discussion?

> If such admin scripts are based on anything beyond blind guesses, they will
> be based on that public documentation and discussion.

Thank you for explaining why you talked about the Guide and cachemgr.cgi code.

I suspect most admin scripts are based on the output received from
Squid. You may call that "blind guesses" if you wish; the label is not
important in this case. Since there was no public documentation when
most of those scripts were hacked together, I do not think we should
expect much else, *even* if we fantasize about admins who read guide
books and study obscure source code just to parse simple (but
non-standard) output.


>>>>>>> +class PayloadFormatter
>>>>>> ...
>>>>>>> +    Packable &outBuf;
>>>>>>> +};
>>>>>>
>>>>>> Since the majority of PayloadFormatter methods and callers are going to
>>>>>> assemble and format pieces of text and numbers, I think this should
>>>>>> become an ostream.
>>>>
>>>>> You want to go back to PackableStream now?
>>>>
>>>> If possible, the new page/report/payload building interface should be
>>>> ostream-enabled, not PackableStream-enabled. IIRC, that's what I
>>>> sketched a few emails back.
>>>
>>> So how does the data get from std::ostreambuf to the client if it's not
>>> going near a Packable (i.e., StoreEntry)?
>>
>> I do not think ostream should use std::ostreambuf if that is what you
>> meant. It should use our own buffer, backed by Packable (or StoreEntry).
>> However, cache manager code would not know that and, hence, there should
>> be no linking problems associated with that knowledge.
>>
> 
> Okay. I think I understand you there.
> 
> But the statement about cachemgr does not quite click. Formatter and all
> thus *is* cache manager (Mgr::) code. I don't see how Mgr:: code cannot
> know about how its own internals work.
> 
> I assume by "cache manager code" you mean Action and the HTTP
> request/reply handling part of Mgr::.

... and Page/Formatter itself. The code that needs to know about
StoreEntry is the code that creates a specific ostream object to
configure the Page/Formatter object with. Whether that code is inside
the Mgr:: namespace is irrelevant to the linking problems you were
worried about.
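
For concreteness, here is roughly the kind of "specific ostream object" I
have in mind. This is a sketch only: the class names are mine, and I am
assuming Packable offers an append(const char *buf, int size) primitive;
the real patch may look different.

  #include <ostream>
  #include <streambuf>

  // writes every buffered character straight into a Packable
  // (e.g., a StoreEntry); sketch only, not the actual patch code
  class PackableStreamBuf : public std::streambuf
  {
  public:
      explicit PackableStreamBuf(Packable &p) : packable(p) {}

  protected:
      // single-character writes
      virtual int_type overflow(int_type ch) {
          if (!traits_type::eq_int_type(ch, traits_type::eof())) {
              const char c = traits_type::to_char_type(ch);
              packable.append(&c, 1); // assumed Packable primitive
          }
          return traits_type::not_eof(ch);
      }

      // bulk writes; avoids the per-character path above
      virtual std::streamsize xsputn(const char *s, std::streamsize n) {
          packable.append(s, static_cast<int>(n));
          return n;
      }

  private:
      Packable &packable;
  };

  // a std::ostream whose storage is the given Packable; one data copy only
  class PackableStream : public std::ostream
  {
  public:
      explicit PackableStream(Packable &p) : std::ostream(&buf), buf(p) {}
  private:
      PackableStreamBuf buf;
  };

Code that is merely configured with such a stream sees only the
std::ostream part and does not need to know about StoreEntry at all.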


>>> You ask to data-copy into the stream buffer, then data-copy a second
>>> time to the StoreEntry buffer.
>>
>> No, I do not. The stream buffer should use StoreEntry as storage [when
>> the output goes to Store].
>>
>>
>>> PackableStream is an ostream child providing StoreEntry as the buffer.
>>> So PackableStream data-copies to the StoreEntry buffer.
>>
>> Yes.
>>
>>
>>> Which would you expect provides better performance:
>>>  one or two data-copies ?
>>
>> One copy.
>>
>> (The design choices here are actually not driven by performance, but by
>> the requirement to avoid buffering of huge cache manager responses in
>> RAM. However, the same custom ostream buffer design happens to eliminate
>> the extra copy as a positive performance side effect).
> 
> 
> Then I am going back to the PackableStream patch and making a new
> iteration that just renames StoreEntryStream and fixes the syntax
> things. Patch for that in the other thread as soon as it works.

Re-introducing PackableStream sounds good to me. Please note that
Actions should not get PackableStream, StoreEntryStream, or ostream as
the new dump() argument though! They should get an object with
payload/page/report formatting methods. See my Page sketch for an example.


> This patch will then have Formatter creating one of those streams
> if/when necessary to drop values into.

That does not sound quite right:

* an ostream would be needed for virtually any Formatter/Page method
and, hence, should be created once;

* an ostream should be backed by StoreEntry or another complex
destination that Formatter/Page should not know about and, hence, it
should be created outside Formatter/Page (see the sketch below).
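
Putting those two points together, I imagine wiring along these lines.
The dumpActionReport, YamlPage, and dump2 names below are placeholders
for whatever the final patch uses:

  // the code below, not Formatter/Page, knows about StoreEntry
  static void
  dumpActionReport(Action &action, StoreEntry &entry)
  {
      PackableStream os(entry); // created once; StoreEntry-backed storage
      YamlPage page(os);        // a Page implementation; sees only an ostream
      action.dump2(page);       // the Action sees only the Page interface
  }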


> Agreed?

To avoid misunderstanding: I hesitate to review a future patch now,
based on a verbal description of changes, especially since "fixes the
syntax things" may mean very different things to me and you. Even the
specific red flags I noted above may not match your true intent and/or
the next patch iteration!



>>>> Said that, ostream is the wrong primary interface for assembling
>>>> payload/pages/reports. IMO, you should reintroduce ostream capabilities,
>>>> but we should not be [going back to] assembling primary
>>>> payload/pages/reports using ostream. More on that below.


>> Page provides a set of methods for high-level assembly, including:
>>
>>     /// starts recording a list of values
>>     virtual Page &beginList() = 0;
>>     /// finishes recording a list of values
>>     virtual Page &endList() = 0;
>>
>>     /// starts recording a single list item
>>     virtual Page &beginListItem() = 0;
>>     /// finishes recording a list item
>>     virtual Page &endListItem() = 0;
>>
>>     /// starts recording a key:value pair with a given key
>>     virtual Page &beginKv(const SBuf &key) = 0;
>>     /// finishes recording a key:value pair
>>     virtual Page &endKv() = 0;


> That is not usable by code taking ostream&. Therefore not useful to the
> code which most needs to use this object's API.

Sorry, I do not understand this comment. There should be no
Action::dump2 "code taking ostream" if I interpret you correctly. The
new Action::dump2 methods should accept Page reference as an argument,
not an ostream reference.


>> class Page {
>> public:
>> ...
>>     /// use this to assemble opaque/atomic values
>>     ostream &raw();
>> ...
>> }
>>
>> It might be used along these lines:
>>
>>   page.beginKv("statistics_foo").raw() << something.mean() << endKv;
>>   page.beginKv("worker_id").raw() << "disker" << DiskerId << endKv;


> This relies on chaining

The above illustrates ostream usage for atomic value assembly. Chaining
is a separate issue: the example can be trivially rewritten if chaining is
not supported. We can also remove convenience manipulators if we decide
they are bad for some reason:

  page.beginKv("statistics_foo");
  page.raw() << something.mean();
  page.endKv();
  ...



> * It is reasonable to expect that worker_id has different syntax for
> string and numeric representations in one of the syntaxes added later.

I am happy to assume that for the sake of the argument.


> That would call for multiple kvPair(name, value) methods based on the
> value received here by the << operator.

I think that is more likely to call for multiple value-wrapping
methods. It is not related to the "key" part of "kv".

This is not an argument against chaining or for banning ostream access
AFAICT. To address multiple value types, we will add
Page::beginValueFoo() and valueBar(...) methods and the like.
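
If/when such value types appear, I would expect Page additions roughly
along these lines (the method names below are invented for illustration
only):

    /// starts recording a value that needs string-specific wrapping
    virtual Page &beginStringValue() = 0;
    /// finishes recording a value started by beginStringValue()
    virtual Page &endStringValue() = 0;
    /// records an atomic integer value, with any type-specific wrapping
    virtual Page &value(const int64_t v) = 0;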


> * With indentation in the syntax (both current and YAML) the
> "something.mean()" output needs a specific prefix applied at the start
> of each of its lines.

The mean() method returns a double (an average value over some sample).
Sorry if that was not clear from the example.

beginKv(key) should provide any necessary value prefixes, of course (at
least until we need value-type-specific wrappers).
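
For example, a YAML-oriented Page implementation might supply that prefix
itself. A sketch, assuming the (hypothetical) YamlPage keeps a reference
to the configured ostream (out) and tracks its own nesting depth (depth):

  Page &
  YamlPage::beginKv(const SBuf &key)
  {
      // the formatter, not the Action, supplies the current indentation
      for (int i = 0; i < depth; ++i)
          out << "  ";
      out << key << ": ";
      return *this;
  }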


> In both those circumstances it is better to pass 'page' to mean() and
> ensure that sub-pieces of something.mean(page) are individually prefixed
> and syntaxed by page with correct bits according to its internal state.

Sorry, but passing Page to a statistics function returning a mean value
of some sample does not make sense to me. There is no need to expose
statistics code to these formatting issues.

However, let's assume a more complicated example where we are not
dumping a simple double value but something more complex that really
needs to be exposed to Page. How does that more complex code dump a
simple double value keyed to some name? Using the same begin/endKv()
pair with a "<<" operator as illustrated by the above example.
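
For example, a hypothetical histogram-like object handed a Page might do
something like this (all names below are made up):

  void
  SomeHistogram::report(Page &page)
  {
      page.beginKv(SBuf("mean"));
      page.raw() << mean();    // an atomic double value
      page.endKv();

      page.beginKv(SBuf("samples"));
      page.raw() << count();   // another atomic value
      page.endKv();
  }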


> Also, the bits between << are now free-flow outputs again. That's a sure
> recipe for some report to make an assumption like you do in the example
> code itself (with "disker" assuming a string ID): that a particular kv
> thing is going to take a string rather than an integer value.

The ostream part of the interface is reserved for assembly of atomic
values. You argue that it is _possible_ to misuse it for non-atomic
values. That statement is true for any interface!

Regardless of the interface you provide, it is possible to use it
incorrectly. For example, if your interface only accepts string values,
it is possible to incorrectly pass it a non-atomic "foo: bar" value. In
fact, with a string-only API, it may arguably be harder for reviewers to
spot that mistake because no "<<" operator or .raw() call will attract
their attention.
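
For example, nothing stops a developer from writing something like the
following with a hypothetical string-only kvString() method:

  // the "value" is not atomic, but there is no raw() or "<<" to warn reviewers
  page.kvString(SBuf("uptime"), SBuf("3 days\nload: high"));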


>> Needless to say, there is some danger in providing such raw access. If a
>> developer is not careful, they might introduce a syntax error inside
>> what they saw were perfectly fine atomic values. We can protect against
>> that and offer automatic escaping of bad characters (among other
>> things), but that would require more work. I suspect such protections
>> would not necessarily require Page interface changes; if I am right, we
>> can delay their implementation until it is clear they are actually needed.


> I don't think that is necessary though. So I am leaving it out of these
> early patches. Okay?

The pronouns "that" and "it" after a 7-line paragraph make it difficult for
me to guess what exactly you are leaving out. Not adding an extra layer of
value protection/wrapping (for now) is the right approach. Using
string-based interfaces for string values is fine. Avoiding ostream
interfaces where they help (or introducing barriers to their future
addition) is not a good idea.


> The current free-form outputs already place indentation requirements that
> the mixed methods + ostream Page API cannot meet. It's either ostream with
> Action::dump() doing the indentation explicitly, or just the
> Page/Formatter methods.
> 
> YAML adds extra construct syntax requirements that make it even more
> nasty. Then any potential binary outputs we want to generate later will
> die horribly.

Any format has atomic values. Ostream is the right interface for
assembling those atomic values, especially where everything has to
eventually go into a StoreEntry-backed stream or the like. There is no
reason to avoid ostream where it is useful.

Please note that I am not saying that everything has to be formatted
using ostream. It is just an optional part of the interface, used where
it is helpful, ignored where it is not.


>>> What I'm not getting is why you insist this ostream API be exposed and
>>> used by Action now but that PackableStream was inappropriate when it
>>> could simply have had new members added and been identical to your
>>> proposed Formatter.

>> First of all, the first patch I was reviewing did not pass
>> PackableStream to Actions. It passed ostream. We cannot add methods to
>> ostream.

> It is not passed to Actions. It's created by Action and passed to
> Action::dump() and action sub-code.

Yes, "passed to Action::dump()" is what I meant by "passed to Actions".

You asserted that PackableStream "could simply have had new members
added". That assertion did not apply to your patch because
PackableStream was not passed to Action::dump() where those methods were
needed. You passed ostream instead. We cannot add methods to ostream.


>> Second, adding formal-syntax formatting methods to PackableStream (and
>> passing PackableStream to Actions) would be wrong. The PackableStream
>> class should be dedicated to providing a Packable-backed ostream. It
>> should not know anything about Cache Manager reports and their
>> formatting. We may use PackableStream outside Cache Manager.


> Its only uses were as an output serializer (formatter) for Action dump()
> code. The slightly higher-order syntax of what the fields were was being
> left in Action::dump() and related functions where it is in current trunk.
> 
> We agreed to use Page/Formatter to allow markup injection between fields
> now, so re-coding was not necessary later.

I answered your "why" question with two specific reasons/explanations.
It feels like you are now arguing about something else. I do not know
what you are arguing about.


> In either model Action::dump is still responsible for knowledge of what
> field types are being output. Either in the Page/Formatter methods its
> calling, or the particular stream << sequencing.

I do not know which two models you are talking about, but yes, the new
Action::dump2 would know about payload/page/report structure in
virtually any reasonable implementation.
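
For example, in either model I would expect a dump2() implementation to
look roughly like this (everything except the previously sketched Page
methods is an invented name):

  void
  ClientListAction::dump2(Page &page)
  {
      page.beginList();
      for (const auto &client : clients) { // the Action knows its own structure
          page.beginListItem();
          page.beginKv(SBuf("address")).raw() << client.address;
          page.endKv();
          page.endListItem();
      }
      page.endList();
  }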


> As I mentioned above, exposing operator<< to Action just means important
> parts of the markup injection are not possible.

That assertion does not compute for me: how can giving Action an
_additional_ and _optional_ tool make something impossible?!



>> I hope that Page/Formatter does not need to know about Packable, just
>> ostream, but there may be some corner cases I am not aware of.


> If Formatter is allocating the PackableStream it would. But
> PackableStream is a wholly inline thing for exactly that type of reason.


I do not know why Page/Formatter would need to allocate PackableStream.
Ideally, Page/Formatter should not know exactly where its output is
going. If possible, Page/Formatter should just accept ostream (or, if
really needed, PackableStream) from its caller/creator/configurator.



>>>>>>> +    // a comment
>>>>>>> +    virtual void notice(const SBuf &) = 0;
>>>>>>
>>>>>> Let's try to define the purpose of this method more precisely so that we
>>>>>> know when it is being used [in]correctly.


>> For example, the following are "bad" notices (Squid should not generate
>> them):
>>
>>   uptime: 1 hour
>>   built with libecap v1.1
>>   reconfiguration finished in 5 seconds
>>   error: I/O error while responding to a cache manager request
>>
>> and the following are "good" notices:
>>
>>   reconfiguration finished
>>   unsupported action
>>   unknown cache manager output format


> I think we agree on intention. But neither of us has a good text for it yet.
> 
> It's that passing-on property we need to clearly state. I've used
> "display" there since it's variable by context and loose enough to cover
> any type of that action.
> 
> 
> How about just a terse:
> 
> notice() - "An informational text announcement. To be passed on instead
> of ignored. Do not automatically parse the text."

We should not document what should not be ignored. "Not ignoring" is the
assumed default, of course. "Passed on" to what or to whom? If I am an
admin looking at the notice on my terminal, I do not need to pass it on
further. How about this:

notice(): Text containing potentially useful information for admins.
Expected to be treated as a single opaque string by automation tools.


BTW, if we want to make admins' and support folks' lives easier, I would
actually add a notice ID as a notice parameter and require that all
notices with the same ID mean the same thing, regardless of the text
(which we may change). Without an explicit ID, the text itself
essentially becomes an ID, which makes the text difficult to adjust.
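
A hypothetical signature, just to show the shape of that idea:

    /// announces potentially useful information to admins; the numeric ID
    /// stays the same across releases even when the text wording changes
    virtual void notice(const uint64_t id, const SBuf &text) = 0;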


> comment() - "An informational text comment. May be ignored or dropped.
> Do not automatically parse the text."

This one is much easier, IMO (unless we disagree what this method should
be used for):

comment(): A comment as defined by the format syntax. Usually contains
information meant for developers. Expected to be discarded by automation
tools.



>>>> Going forward, please do not use "final" unless really necessary. Treat
>>>> it like we treat "throw()" declarations.
>>
>>
>>> You mean spotted liberally around the new code in the form of a wrapper
>>> keyword like Must() ?
>>
>> No, I mean avoided like the now-deprecated "throw()" exception
>> specification:
>>
>>   http://en.cppreference.com/w/cpp/language/except_spec
>>   http://www.gotw.ca/publications/mill22.htm
>>
>> Proposed rule of thumb: If correct inheritance or overriding is
>> possible, do not use "final", even if you do not see any good reasons to
>> inherit or override.
>>
> 
> Aha. Though AFAIK correct inheritance is *always* possible in reasonable
> code. So that is just a ban on "final".

It depends on how you define "correct". Some say[1] that there is no
_correct_ way to subclass std::vector, for example. Since this is just a
rule of thumb, I am not sure we need to make it more precise.

[1]
http://stackoverflow.com/questions/6806173/subclass-inherit-standard-containers

In practice, I think it is used for low-level stuff that plays dirty
tricks with memory or devices. FWIW, the STL implementation in GCC v4.8 has
about a dozen "final" classes and structures. All very low-level memory
allocation and shared pointers stuff AFAICT -- I do not claim to
understand exactly why those complex classes are final!

Here is one easy-to-understand example of where final would be useful:
https://akrzemi1.wordpress.com/2012/09/30/why-make-your-classes-final/
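
The gist, as I understand it, fits in a few lines (names invented):

  class PagePrinter final // nobody can derive from this class ...
  {
  public:
      ~PagePrinter() {}   // ... so a non-virtual destructor is safe here
  };

  // class FancyPagePrinter: public PagePrinter {}; // would not compile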


> As you saw, my uses of it are kind of "political" in nature. To enforce
> better code use and graceful deprecations of widely used things.

I do not think final should be used for graceful deprecation. Final
should not imply "dying" or "soon to go away". And even if something is
planned to be removed, it may still make sense to subclass or override
it (all other alternatives can be worse than extending deprecated code).


Cheers,

Alex.


