[squid-dev] squid.conf future

Tue Feb 25 17:41:12 UTC 2020

On 2/25/20 1:31 AM, Amos Jeffries wrote:
> On 25/02/20 6:11 am, Alex Rousskov wrote:
>> On 2/24/20 3:11 AM, Amos Jeffries wrote:
>>> For the future I am considering a switch of cf.data.pre to a format like
>>> SGML or XML which we can better generate the website contents from.

>> If you meant something specific by "SGML", please clarify.

> We have the Linuxdoc toolchain already used for release notes
> etc. so long as we have a simple set of rules about the markup used for
> bits that cf_gen needs to pull out for code generation we can use any of
> the more powerful markup in the documentation comment parts.

With all due respect to LDP, LinuxDoc feels like a dying project
these days -- not a lot of activity and a lot of stale sites. The
current(?) toolchain maintainer said[1] that modern (at the time)
developers prefer DocBook: "DocBook DTD [...] is now a more popular DTD
than LinuxDoc in writing technical software documentation".

That was ... 11 years ago.

[1] https://gitlab.com/agmartin/linuxdoc-tools

>> Automated rendering of squid.conf sources, including web site content
>> generation, should be straightforward with any good source format,
>> including writer-friendly formats. Thus, web site generation is not an
>> important deciding criterion here AFAICT.

> It is an existing use-case for documentation output we need to maintain.

Agreed. My point is _not_ that we do not need to support web site
generation. My point is that any decent tool, including custom scripts,
can generate web sites these days (and, in most cases, do a better job
than what we have today). Thus, we should decide based on other, more
selective factors first.

>> IMO, an ideal markup language for cf.data.pre (or its replacements)
>> would satisfy these draft high-level criteria:
>>
>> 1. Writer-friendly. Proper whitespace, indentation, and other
>> presentation features of the _rendered_ output are the responsibility of
>> renderers, not content writers. Decent _sources_ formatting should be
>> automatically handled by popular modern text editors that developers
>> already use. No torturing humans with counting tags or brackets.

> This nullifies the argument that XML is torturous. Good editing tools
> can handle XML easily.

Good editors can close the current tag for you, but closing tags and
dealing with all the other machine noise is still tedious for most
humans. XML is just not designed to be friendly to human writers (and
readers!). It is like JSON: Yes, one can edit JSON by hand, especially
with a good editor, but that does not make it human-friendly. Both
formats are meant for exchanging information between programs.

>> 2. Expressive enough to define all the squid.conf concepts that we want
>> to keep/support, so that they can be rendered beautifully without hacks.
>> For example, if we agree that those sections are a good idea, then this
>> item includes support for introduction sections that define no
>> configuration options themselves.

> What are you calling squid.conf concepts here?

Everything that may need to be referenced or rendered specially. For
example, directive names, directive parameter lists, individual
parameter documentation, parameter defaults, default parameter
documentation, configuration examples, C++ macro guards, AND prose
elements such as sections, paragraphs, lists, emphasized phrases,
verbatim text, hyperlinks, etc.

>> 3. Supports documentation duplication avoidance so that we do not have
>> to duplicate a lot of text or refer the reader to directive X for
>> details of directive Y functionality.

> The XML idea supports that. I am not sure about SGML.

With XML (and many SGML DTDs), the question is not so much whether it is
_possible_ to support Foo or Bar, but how difficult that support is
going to be (for documentation writers, readers, and tool
developers/admins). I suspect that reusing XML snippets is going to
require custom tooling unless those snippets are isolated into
entities/macros. We can live with that isolation, but a more flexible
"foo.faz documentation is the same as bar.baz documentation (after
replacing baz with faz)" may work a lot better.
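
A minimal Python sketch of that more flexible kind of reuse, under stated
assumptions: the DOCS layout, the "same_as" and "replace" keys, and the
resolve() helper are all hypothetical and exist only for illustration.
None of them is an existing cf.data.pre or cf_gen feature.

```python
import re

# Hypothetical documentation store; every name here is illustrative only.
DOCS = {
    "bar.baz": {
        "text": "The baz parameter limits how long bar waits. "
                "Set baz to 0 to disable."
    },
    # foo.faz reuses the bar.baz text after renaming the relevant words.
    "foo.faz": {"same_as": "bar.baz", "replace": {"baz": "faz", "bar": "foo"}},
}


def resolve(name, docs=DOCS):
    """Return the documentation text for a directive, following reuse links."""
    entry = docs[name]
    if "text" in entry:
        return entry["text"]
    text = resolve(entry["same_as"], docs)
    mapping = entry.get("replace", {})
    if not mapping:
        return text
    # Replace whole words in a single pass so substitutions cannot cascade.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)


print(resolve("foo.faz"))
```

Whatever format we pick would need to express the equivalent of the
"same_as" and "replace" pair natively, or we would have to maintain a
preprocessing step like resolve() ourselves.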

N.B. If by "SGML" you mean Linuxdoc DTD, then I am not sure whether it
supports quoting. SGML itself, being a meta-language (compared to XML),
can "support" anything XML can support and a lot more.

> All the other text syntaxes I'm aware of do not have nice writer-friendly
> referencing. The YAML-like one we currently have is a case in point.

I am not aware of _nice_ referencing in XML either. FWIW, Markdown
referencing is OK. Certainly not nice, just OK. We have no referencing
today in cf.data.pre AFAIK.

Please note that referencing and quoting/reusing content are different
beasts: Item 3 is about the latter, and good support for it is harder to
find than support for the more common referencing.

>> 6. Git-friendly: Adding two new unrelated directives does not lead to
>> conflicting pull requests.
> 
> This is unrealistic so long as the source code remains in one file. Only
> edits to independent files are guaranteed not to conflict.

Yes, I know. Some formats support individual files better than others.
And, if we decide to go this way, each individual file should be
validatable and renderable on its own (ideally). Splitting itself is
easy, but nice split support is difficult. Again, we may never find a
format that satisfies all the criteria, but it is a valid criterion to
consider.
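
To make "validatable on its own" concrete, here is a rough Python sketch
that checks a single per-directive file in isolation. The "KEY: value"
layout, the REQUIRED field list, and both helper functions are assumptions
made up for this example; they are not the current cf.data.pre grammar or
any proposed replacement.

```python
# Hypothetical required fields for one standalone directive file.
REQUIRED = ("NAME", "TYPE", "DOC")


def parse_entry(text):
    """Parse 'KEY: value' lines; indented lines continue the previous value."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line[0].isspace():
            if key is None:
                raise ValueError("continuation line before any field")
            fields[key] += "\n" + line.strip()
        else:
            key, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"expected 'KEY: value', got {line!r}")
            key = key.strip()
            fields[key] = value.strip()
    return fields


def validate_entry(text):
    """Return the required fields that are missing or empty in one file."""
    fields = parse_entry(text)
    return [name for name in REQUIRED if not fields.get(name)]


print(validate_entry("NAME: foo\nTYPE: time\nDOC: Waits.\n  More text."))
print(validate_entry("NAME: foo"))
```

The point is only that such a check needs nothing beyond the one file it
is given, so two pull requests adding two unrelated directives would touch
two different files.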

> What I am considering is a change to the internal syntax within
> cf.data.pre. 

Yes, I understand and support evaluating other grammars. Splitting
cf.data.pre in the process is a fair consideration IMO. I am not saying
the split is required, of course. I am only saying it should be
considered because it reduces merge conflicts (at least).

> At most a filename/extension change to match. It remains a
> source code file like any other.

Not sure what you mean by "source code file" in this context. The
fact that we currently extract pieces of cf.data.pre to build the Squid
executable is unimportant for this discussion AFAICT. Do you assign some
other special properties/implications to "source code" that I should be
aware of in this context?

>> 7. Either already well-known or easy to learn by example (as far as
>> major used concepts are concerned).

> AFAIK, that effectively means SGML or XML.

If "SGML" is "Linuxdoc", then I would agree with your conclusion ... ~20
years ago. Today, most kids use markdown and such for most text-centric
tasks, probably because the latter are much more human-friendly (and
were designed to be that way). Again, markdown can be expressed in an
SGML DTD.

>> 8. Can be easily parsed using programming languages that our renderers
>> are (going to be) written in (e.g., using existing parser libraries). We
>> should probably discuss whether these renderers should be (re)written in
>> some specific languages.
> 
> This is where XML has the advantage over wider SGML. Both are
> parseable, but XML end-tags and libxml make it a bit simpler for the
> cf_gen implementation.

To simplify, I suggest excluding XML from consideration. The more I
think about it, the more I am convinced that it would be a big mistake
to go to a machine-centric format that "nobody" uses these days.

As for SGML DTDs, I think we should rule out LinuxDoc for similar
reasons. It is possible that I can be convinced to change my mind on
LinuxDoc, but I am pretty sure there are much better options out there.

>> 9. Translation-friendly. (I do not know what that entails, but I am sure
>> that others can detail this requirement.)
> 
> Human language translation requires one of the formats which
> translate-toolkit can use as input, OR being easily converted into one
> of those with machine translation.

I suspect that with tools like https://pandoc.org/, we can convert
virtually any standard format to any modern format if needed, but it may
be good to know what translate-toolkit can use as input.

> Machine translation only requires a toolchain that knows both formats.
> Translation of cf.data.pre into HTML for the web and man pages for
> distro documentation of using SGML.

I do not think we should worry about machine translation -- readers can
do that on their own.

> Any suggestions of formats I should look at then?

I will need to research that. I will get back to you.

Thank you,

Alex.