[squid-dev] squid.conf future

Tue Feb 25 06:31:14 UTC 2020

On 25/02/20 6:11 am, Alex Rousskov wrote:
> On 2/24/20 3:11 AM, Amos Jeffries wrote:
> 
>> While doing some polish to cf_gen tool (PR #558) I am faced with some
>> large code edits to get that tool any more compliant with our current
>> guidelines. With that comes the question of whether that more detailed
>> work is worth doing at all ...
> 
> Probably not. Even PR #558 changes might be going a bit too far (or not
> far enough). Ideally, we should agree on key code cleanup principles
> before doing such cleanup, to minimize tensions in every such PR.
> Cleanup for the sake of cleanup should be done under a general
> agreement/consent rather than ad-hoc. I am working on the corresponding
> suggestions but need another week or so to post a specific proposal.
> 
> 
>> For the future I am considering a switch of cf.data.pre to a format like
>> SGML or XML which we can better generate the website contents from.
> 
> I do support fixing cf.data.pre-related issues -- they are a well-known
> constant (albeit moderate) pain for developers and users alike. However,
> using writer-unfriendly formats such as XML is not the best solution
> IMO. SGML may be a good fit, but that concept covers such a wide variety
> of languages that it is difficult to say anything specific about it in
> this context (e.g., both raw XML and wiki-like markups can be valid
> SGML!). If you meant something specific by "SGML", please clarify.

Exactly. We already use the Linuxdoc toolchain for release notes etc.
So long as we have a simple set of rules about the markup used for the
bits that cf_gen needs to pull out for code generation, we can use any of
the more powerful markup in the documentation comment parts.

> 
> Automated rendering of squid.conf sources, including web site content
> generation, should be straightforward with any good source format,
> including writer-friendly formats. Thus, web site generation is not an
> important deciding criterion here AFAICT.

It is an existing use-case for documentation output we need to maintain.
We can still decide to forgo adding nice-to-have outputs that do not exist.

> 
> IMO, an ideal markup language for cf.data.pre (or its replacements)
> would satisfy these draft high-level criteria:
> 
> 1. Writer-friendly. Proper whitespace, indentation, and other
> presentation features of the _rendered_ output are the responsibility of
> renderers, not content writers. Decent _sources_ formatting should be
> automatically handled by popular modern text editors that developers
> already use. No torturing humans with counting tags or brackets.

This nullifies the argument that XML is torturous. Good editing tools
can handle XML easily.

For writers dealing with the tags directly, a simple SGML markup is
better, but not by a huge amount.

> 
> 2. Expressive enough to define all the squid.conf concepts that we want
> to keep/support, so that they can be rendered beautifully without hacks.
> For example, if we agree that those sections are a good idea, then this
> item includes support for introduction sections that define no
> configuration options themselves.

What are you calling squid.conf concepts here?

> 
> 3. Supports documentation duplication avoidance so that we do not have
> to duplicate a lot of text or refer the reader to directive X for
> details of directive Y functionality.
> 

The XML idea supports that. I am not sure about SGML.

None of the other text syntaxes I'm aware of have nice writer-friendly
referencing. The YAML-like one we currently have is a case in point.

> 4. Allows for automated validation of internal cross-references (and
> possibly other internal concepts that can be validated). Specification
> of these cross-references is covered by item 2.
> 
> 5. Allows for automated spellchecking without dangerous exceptions.
> 

Any syntax we choose with good tooling should support that. If not, the
requirement to translate between formats will at least involve moving
the text parts into a format that can be spell-checked (HTML).
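
To illustrate the cross-reference validation in item 4, here is a minimal
sketch in Python. It assumes a hypothetical XML layout in which each
directive is a <directive name="..."> element and references are written as
<ref name="..."/>. Neither element name comes from an existing Squid format,
and the function name dangling_refs is made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: <directive name="..."> entries with <ref name="..."/>
# cross-references inside their documentation text.
SAMPLE = """\
<directives>
  <directive name="alpha"><doc>See <ref name="beta"/> for details.</doc></directive>
  <directive name="beta"><doc>Related to <ref name="gamma"/>.</doc></directive>
</directives>
"""


def dangling_refs(xml_text):
    """Return (directive, target) pairs whose target is not a known directive."""
    root = ET.fromstring(xml_text)
    known = {d.get("name") for d in root.iter("directive")}
    missing = []
    for directive in root.iter("directive"):
        for ref in directive.iter("ref"):
            if ref.get("name") not in known:
                missing.append((directive.get("name"), ref.get("name")))
    return missing


if __name__ == "__main__":
    for source, target in dangling_refs(SAMPLE):
        print(f"{source}: unknown reference to {target}")
```

A check like this could run at build time so that a renamed or removed
directive fails the build instead of leaving a stale reference in the docs.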

> 6. Git-friendly: Adding two new unrelated directives does not lead to
> conflicting pull requests.

This is unrealistic so long as the source code remains in one file. Only
edits to independent files are guaranteed not to conflict.

What I am considering is a change to the internal syntax within
cf.data.pre. At most a filename/extension change to match. It remains a
source code file like any other.

> 
> 7. Either already well-known or easy to learn by example (as far as
> major used concepts are concerned).
> 

AFAIK, that effectively means SGML or XML.

> 8. Can be easily parsed using programming languages that our renderers
> are (going to be) written in (e.g., using existing parser libraries). We
> should probably discuss whether these renderers should be (re)written in
> some specific languages.

This is where XML has the advantage over wider SGML. Both are
parseable, but XML end-tags and libxml make it a bit simpler for the
cf_gen implementation.

> 
> 9. Translation-friendly. (I do not know what that entails, but I am sure
> that others can detail this requirement.)

Human language translation requires one of the formats which
translate-toolkit can use as input, OR being easily converted into one
of those with machine translation.

Machine translation only requires a toolchain that knows both formats.
Translating cf.data.pre into HTML for the web and into man pages for
distro documentation are examples of that when using SGML.

> 
> It is unlikely that we can find a language that fully satisfies all the
> criteria, but I hope that we can come close. It is not a new/unusual
> problem. Let's not rush into rewrites until we agree on this.
> 

XML stands out in front IMO, including on the writer-friendly criterion:
it can be as simple as what we have now, or as complicated as one
wants to make it.

A few lines of awk and we could auto-convert the existing cf.data.pre
into an XML file for input to cf_gen, with no change to what writers deal
with beyond suddenly gaining the ability to use XML references in the
DOC_* sections.
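
A rough sketch of that conversion, written in Python rather than awk for
readability. It assumes a simplified view of cf.data.pre (KEY: value lines
starting with NAME:, followed by a DOC_START/DOC_END block). The XML element
names and the convert() helper are invented for illustration, and real
entries have more cases (comment blocks, aliases, conditionals) than this
handles.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up entry in the style of cf.data.pre.
SAMPLE = """\
NAME: example_directive
TYPE: onoff
DEFAULT: off
LOC: Config.example
DOC_START
	Enables the example feature.
DOC_END
"""


def convert(text):
    """Turn NAME:/KEY: lines and DOC_START..DOC_END blocks into an XML tree."""
    root = ET.Element("directives")   # hypothetical root element
    entry = None                      # current <directive>, if any
    doc_lines = None                  # a list while inside a DOC block
    for line in text.splitlines():
        stripped = line.strip()
        if doc_lines is not None:
            if stripped == "DOC_END":
                if entry is not None:
                    ET.SubElement(entry, "doc").text = "\n".join(doc_lines)
                doc_lines = None
                entry = None
            else:
                doc_lines.append(stripped)
        elif stripped == "DOC_START":
            doc_lines = []
        elif ":" in line:
            key, value = line.split(":", 1)
            key, value = key.strip().lower(), value.strip()
            if key == "name":
                entry = ET.SubElement(root, "directive", name=value)
            elif entry is not None:
                ET.SubElement(entry, key).text = value
    return root


if __name__ == "__main__":
    print(ET.tostring(convert(SAMPLE), encoding="unicode"))
```

Writers would keep editing the same kind of content. Only the container
around the DOC text changes, which is what would let XML references appear
inside it.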

> 
>> The main point in favour of these is that we already have infrastructure
>> in place for ESI and release notes. It would be less work to re-use one
>> of those than integrate a new library or tooling for some other format.
> 
> Reusing existing infrastructure is a nice bonus, of course, but I think
> that any major format rework should be focusing on optimizing for the
> long-term. Any infrastructure changes required to render static content
> on a web site seem relatively small to me. (And does not ESI support
> injection of any content, not just XML-based?)
> 

Any suggestions of formats I should look at then?

Amos