[squid-users] FW: Encrypted browser-Squid connection errors
squid3 at treenet.co.nz
Sun Oct 30 12:59:00 UTC 2022
On 2022-10-23 06:10, Grant Taylor wrote:
> On 10/21/22 11:30 PM, Amos Jeffries wrote:
>> Not just convention. AFAICT was formally registered with W3C, before
>> everyone went to using IETF for registrations.
>
> Please elaborate on what was formally registered. I've only seen 3128
> / 3129 be the default for Squid (and a few things emulating squid).
> Other proxies of the time, namely Netscape's and Microsoft's
> counterparts, tended to use 8080.
>
> I'd genuinely like to learn more about and understand the history /
> etymology / genesis of the 3128 / 3129.
Duane W. would be the best one to ask about the details.
What I know is that some 10-12 years ago I discovered a message by
Duane mentioning that W3C had (given or accepted) port 3128 for Squid
use. I have checked the squid-cache archives and the email history and
cannot find that message now. It also looks like the W3C changed their
systems and now only track the standards documents, so I cannot
reference their (outdated?) protocol registry :-{ . Sorry.
>
>> FYI, discussion started ~30 years ago.
>
> ACK
>
>> The problem:
>>
>> For bandwidth savings HTTP/1.0 defined different URL syntax for origin
>> and relay/proxy requests. The form sent to an origin server lacks any
>> information about the authority. That was expected to be known
>> out-of-band by the origin itself.
>>
>> HTTP/1.1 has attempted several different mechanisms to fix this over
>> the years. None of them has been universally accepted, so the problem
>> remains. The best we have is mandatory Host header which most (but
>> sadly not all) clients and servers use.
>>
>> HTTP/2 cements that design with mandatory ":authority" pseudo-header
>> field. So the problem is "fixed" for native HTTP/2+ traffic. But until
>> HTTP/1.0 and broken HTTP/1.1 clients are all gone the issue will still
>> crop up.
>
> I'm not entirely sure what you mean by "the authority". I'm taking it
> to mean the identity of the service that you are wanting content from.
> The Host: header comment with HTTP/1.1 is what makes me think this.
>
I mean "authority" as used by the HTTP specification, which refers to
https://www.rfc-editor.org/rfc/rfc3986#section-3.2
> My understanding is that neither HTTP/0.9 nor HTTP/1.0 had a Host:
> header and that it was assumed that the IP address you were connecting
> to conveyed the server that you were wanting to connect to.
Yes exactly. That is the source of the problem, perpetuated by the need
to retain on-wire byte/octet backward compatibility until HTTP/2 changed
to binary format.
Consider what the proxy has to do when (not if) the IP:port being
connected to is the proxy's own (eg localhost:80) and the URL is only a
path ("/") on an origin server somewhere else. Does the "GET / HTTP/1.0"
mean "http://example.com/" or "http://example.net/" ?
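To make that ambiguity concrete, here is a minimal Python sketch of the decision a recipient faces. It is not Squid's code; the function names `request_target_form` and `effective_authority` are hypothetical, and the form names are the ones RFC 9112 section 3.2 uses for the request-target.

```python
from typing import Optional
from urllib.parse import urlsplit


def request_target_form(method: str, target: str) -> str:
    """Name the RFC 9112 request-target form of a request line (simplified)."""
    if method.upper() == "CONNECT":
        return "authority-form"   # e.g. "CONNECT example.com:443 HTTP/1.1"
    if target == "*":
        return "asterisk-form"    # e.g. "OPTIONS * HTTP/1.1"
    if target.startswith("/"):
        return "origin-form"      # path only, no authority in the URL
    return "absolute-form"        # full URL, as sent to a forward proxy


def effective_authority(method: str, target: str,
                        host_header: Optional[str] = None) -> Optional[str]:
    """Return the authority the message names, or None if it cannot be known."""
    form = request_target_form(method, target)
    if form == "absolute-form":
        return urlsplit(target).netloc or None
    if form == "authority-form":
        return target
    # origin-form / asterisk-form: the URL carries no authority, so the
    # Host header is the only in-band hint. Without it the recipient has
    # to rely on out-of-band knowledge (or guess).
    return host_header or None


# "GET / HTTP/1.0" with no Host header: nothing says which site is meant.
print(effective_authority("GET", "/"))
# The same path with a Host header, and the proxy-style absolute URL.
print(effective_authority("GET", "/", "example.com"))
print(effective_authority("GET", "http://example.net/index.html"))
```

The first call returns None, which is the case described above: the listener has to decide for itself whether "/" belongs to example.com, example.net, or the proxy.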
>
>> More importantly the proxy hostname:port the client is opening TCP
>> connections to may be different from the authority-info specified in
>> the HTTP request message (or lack thereof).
>
> My working understanding of what the authority is seems to still work
> with this.
>
The key point is that the proxy host:port and the origin host:port are
two different authorities, and only the origin's may be passed along in
the URL (or URL+Host header). When the client uses port 80 or 443
thinking they are origin services it is *required* (per
https://www.rfc-editor.org/rfc/rfc9112.html#name-origin-form) to omit
the real origin's info. Enter problems.
>> This crosses security boundaries and involves out-of-band information
>> sources at all three endpoints involved in the transaction for the
>> message semantics and protocol negotiations to work properly.
>
> I feel like the nature of web traffic tends to frequently, but not
> always, cross security / administrative boundaries. As such, I don't
> think that existence of proxies in the communications path alters
> things much.
>
> Please elaborate on what out-of-band information you are describing.
> The most predominant thing that comes to mind, particularly with
> HTTP/1.1 and HTTP/2 is name resolution -- ostensibly DNS -- to identify
> the IP address to connect to.
>
I refer to all the many ways a client may be explicitly or implicitly
configured to be aware that it is talking to a proxy - such that it
explicitly avoids sending the problematic origin-form URLs.
>> What that text does not say is that when they are omitted by the
>> **user** they are taken from configuration settings in the OS:
>>
>> * the environment variable name provides:
>> - the protocol name ("http" or "HTTPS", aka plain-text or
>> encrypted)
>> - the expected protocol syntax/semantics ("proxy" aka
>> forward-proxy)
>>
>> * the machine /etc/services configuration provides the default port
>> for the named protocol.
>
> Ergo the use of /default/ values when values are not specified.
The defaults though are tuned for direct contact with an origin server
(or reverse-proxy).
No browser I know of supports
"http-alt://proxy.example.com?http://origin.example.net/index.html"
URLs.
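As a rough illustration of how those defaults get filled in, the Python sketch below resolves a proxy setting into a host and port. It is an assumption-laden toy: `proxy_endpoint` is a hypothetical helper, the `DEFAULT_PORTS` table stands in for what /etc/services would normally supply, and real clients differ in how they pick a default.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumption: stands in for the per-protocol defaults from /etc/services.
DEFAULT_PORTS = {"http": 80, "https": 443}


def proxy_endpoint(proxy_url: str) -> Tuple[Optional[str], int]:
    """Return (host, port) for a proxy URL, falling back to the scheme default."""
    parts = urlsplit(proxy_url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port


# Port omitted: the fallback is the port registered for origin traffic.
print(proxy_endpoint("http://proxy.example.com"))
# Port given explicitly: no fallback is needed.
print(proxy_endpoint("http://proxy.example.com:3128"))
```

With the port omitted the client ends up on 80 or 443, the ports registered for origin / reverse-proxy syntax, while still sending forward-proxy syntax there.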
>
> I feel like this in a round about way supports my stance that the
> default ports are perfectly fine to use.
>
... "at your own risk" they technically might be. So long as you only
receive one of the three types of syntax there - port 80/443 being
officially registered for origin / reverse-proxy syntax.
>> Attempting to use a reverse-proxy or origin server such a
>> configuration may work for some messages, but **will** fail due to
>> syntax or semantic errors on others.
>
> I question the veracity of that statement.
It is based on experience. Squid used to be a lot more lenient and tried
for decades to do the syntax auto-detection. The path from that to
separate ports is littered with CVEs. Most notably the curse that keeps
on giving: CVE-2009-0801, which is just the trigger issue for a whole
nest of bad side effects.
Amos