[squid-users] FW: Encrypted browser-Squid connection errors

Sun Oct 30 12:59:00 UTC 2022

On 2022-10-23 06:10, Grant Taylor wrote:
> On 10/21/22 11:30 PM, Amos Jeffries wrote:
>> Not just convention. AFAICT it was formally registered with the W3C, 
>> before everyone switched to using the IETF for registrations.
> 
> Please elaborate on what was formally registered.  I've only seen 3128 
> / 3129 be the default for Squid (and a few things emulating squid).  
> Other proxies of the time, namely Netscape's and Microsoft's 
> counterparts, tended to use 8080.
> 
> I'd genuinely like to learn more about and understand the history / 
> etymology / genesis of the 3128 / 3129.

Duane W. would be the best one to ask about the details.

What I know is that some 10-12 years ago I came across a message by 
Duane mentioning that the W3C had (given or accepted) port 3128 for Squid 
use.

Right now it looks like the W3C has changed their systems and only tracks 
the standards documents, so I cannot reference their (outdated?) protocol 
registry :-{ . I have also checked the squid-cache archives and cannot 
find the message in the email history. Sorry.

> 
>> FYI, discussion started ~30 years ago.
> 
> ACK
> 
>> The problem:
>> 
>> For bandwidth savings HTTP/1.0 defined different URL syntax for origin 
>> and relay/proxy requests. The form sent to an origin server lacks any 
>> information about the authority. That was expected to be known 
>> out-of-band by the origin itself.
>> 
>> HTTP/1.1 has attempted several different mechanisms to fix this over 
>> the years. None of them has been universally accepted, so the problem 
>> remains. The best we have is the mandatory Host header, which most (but 
>> sadly not all) clients and servers use.
>> 
>> HTTP/2 cements that design with a mandatory ":authority" pseudo-header 
>> field. So the problem is "fixed" for native HTTP/2+ traffic. But until 
>> HTTP/1.0 and broken HTTP/1.1 clients are all gone the issue will still 
>> crop up.
> 
> I'm not entirely sure what you mean by "the authority".  I'm taking it 
> to mean the identity of the service that you are wanting content from. 
> The Host: header comment with HTTP/1.1 is what makes me think this.
> 

I mean "authority" as used by the HTTP specification, which refers to 
https://www.rfc-editor.org/rfc/rfc3986#section-3.2

> My understanding is that neither HTTP/0.9 nor HTTP/1.0 had a Host: 
> header and that it was assumed that the IP address you were connecting 
> to conveyed the server that you were wanting to connect to.

Yes, exactly. That is the source of the problem, perpetuated by the need 
to retain on-wire byte/octet backward compatibility until HTTP/2 changed 
to a binary format.

Consider what the proxy has to do when (not if) the IP:port being 
connected to is that proxy's own (e.g. localhost:80) and the URL is only 
a path ("/") on an origin server somewhere else. Does "GET / HTTP/1.0" 
mean "http://example.com/" or "http://example.net/" ?
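To make the ambiguity concrete, here is a minimal Python sketch (not Squid code) that classifies an HTTP/1.x request-target into the four forms named in RFC 9112 section 3.2 and extracts whatever authority the request line itself carries. The function names are hypothetical and the form detection is deliberately simplified. For "GET / HTTP/1.0" there is simply no authority to extract, so a proxy listening on localhost:80 has nothing but an optional Host header to go on.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def classify_target(method: str, target: str) -> str:
    """Return the RFC 9112 request-target form name (simplified check)."""
    if target.startswith("/"):
        return "origin-form"       # path (+query) only, no authority
    if method == "CONNECT":
        return "authority-form"    # host:port, only valid with CONNECT
    if method == "OPTIONS" and target == "*":
        return "asterisk-form"
    if "://" in target:
        return "absolute-form"     # full URL, as sent to a forward proxy
    raise ValueError(f"unrecognised request-target: {target!r}")


def request_target_authority(request_line: str) -> Tuple[str, Optional[str]]:
    """Return (form, authority) using only the request line itself."""
    method, target, _version = request_line.split(" ", 2)
    form = classify_target(method, target)
    if form == "absolute-form":
        return form, urlsplit(target).netloc
    if form == "authority-form":
        return form, target
    # origin-form and asterisk-form carry no authority on the request line
    return form, None


for line in ("GET / HTTP/1.0",
             "GET http://example.com/ HTTP/1.1",
             "CONNECT example.net:443 HTTP/1.1"):
    print(line, "->", request_target_authority(line))
```

Only the absolute-form and CONNECT lines name a host; the origin-form line comes back with an authority of None, which is exactly the case the proxy cannot resolve on its own.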

> 
>> More importantly the proxy hostname:port the client is opening TCP 
>> connections to may be different from the authority-info specified in 
>> the HTTP request message (or lack thereof).
> 
> My working understanding of what the authority is seems to still work 
> with this.
> 

The key point is that the proxy host:port and the origin host:port are 
two different authorities, and only the origin may be passed along in the 
URL (or URL+Host header). When the client connects to port 80 or 443 
believing it is talking to an origin service, it is *required* (per 
https://www.rfc-editor.org/rfc/rfc9112.html#name-origin-form) to omit 
the real origin's info from the request-target. Enter problems.
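The client side of that rule can be sketched as follows. This is a hypothetical helper, not taken from any browser: it builds the HTTP/1.1 request head a client emits for a URL, in absolute-form when the client knows it is talking to a forward proxy, and in origin-form when it believes the listener is the origin itself. In the origin-form case the only trace of the real origin left in the message is the Host header, which HTTP/1.0 clients may not send at all.

```python
from urllib.parse import urlsplit


def build_request(url: str, via_forward_proxy: bool) -> str:
    """Build a minimal HTTP/1.1 GET request head for url (hypothetical helper)."""
    parts = urlsplit(url)
    if via_forward_proxy:
        # absolute-form: the full URL, origin authority included
        target = url
    else:
        # origin-form: path and query only; the authority is omitted
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    return f"GET {target} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"


print(repr(build_request("http://origin.example.net/index.html", True)))
print(repr(build_request("http://origin.example.net/index.html", False)))
```

The two outputs differ only in the request line: "GET http://origin.example.net/index.html HTTP/1.1" versus "GET /index.html HTTP/1.1".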

>> This crosses security boundaries and involves out-of-band information 
>> sources at all three endpoints involved in the transaction for the 
>> message semantics and protocol negotiations to work properly.
> 
> I feel like the nature of web traffic tends to frequently, but not 
> always, cross security / administrative boundaries.  As such, I don't 
> think that existence of proxies in the communications path alters 
> things much.
> 
> Please elaborate on what out-of-band information you are describing. 
> The most predominant thing that comes to mind, particularly with 
> HTTP/1.1 and HTTP/2 is name resolution -- ostensibly DNS -- to identify 
> the IP address to connect to.
> 

I refer to all the many ways clients may be explicitly or implicitly 
configured to be aware that they are talking to a proxy, such that they 
explicitly avoid sending the problematic origin-form URLs.

>> What that text does not say is that when they are omitted by the 
>> **user** they are taken from configuration settings in the OS:
>> 
>>   * the environment variable name provides:
>>      - the protocol name ("http" or "HTTPS", aka plain-text or 
>> encrypted)
>>      - the expected protocol syntax/semantics ("proxy" aka 
>> forward-proxy)
>> 
>>   * the machine /etc/services configuration provides the default port 
>> for the named protocol.
> 
> Ergo the use of /default/ values when values are not specified.

The defaults, though, are tuned for direct contact with an origin server 
(or reverse-proxy). No browser I know of supports 
"http-alt://proxy.example.com?http://origin.example.net/index.html" 
URLs.
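A short sketch of how that default resolution plays out, assuming a proxy variable set to something like "http://proxy.example.com" with no port. Python's urllib.parse.urlsplit leaves the port as None, and the default then comes from the scheme name, which gives the origin-server port 80 rather than Squid's 3128. The DEFAULT_PORTS table is a hard-coded stand-in for illustration; a real client on a Unix host would consult /etc/services (for example via socket.getservbyname).

```python
from urllib.parse import urlsplit

# Illustrative stand-in for the /etc/services entries for these schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_proxy_port(proxy_url: str) -> int:
    """Return the port a client would connect to for proxy_url."""
    parts = urlsplit(proxy_url)
    if parts.port is not None:
        return parts.port
    # No explicit port: fall back to the scheme's registered (origin) port.
    return DEFAULT_PORTS[parts.scheme.lower()]


print(effective_proxy_port("http://proxy.example.com"))       # 80
print(effective_proxy_port("http://proxy.example.com:3128"))  # 3128
```

Nothing in the "http" scheme name says "forward proxy", so the port that falls out is the one registered for origin traffic.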

> 
> I feel like this in a roundabout way supports my stance that the 
> default ports are perfectly fine to use.
> 

Technically they might be, "at your own risk", so long as you only 
receive one of the three types of syntax on that port; ports 80/443 are 
officially registered for origin / reverse-proxy syntax.

>> Attempting to use a reverse-proxy or origin server port in such a 
>> configuration may work for some messages, but **will** fail due to 
>> syntax or semantic errors on others.
> 
> I question the veracity of that statement.

It is based on experience. Squid used to be a lot more lenient and tried 
for decades to do the syntax auto-detection. The path from that to 
separate ports is littered with CVEs. Most notably the curse that keeps 
on giving: CVE-2009-0801, which is just the trigger issue for a whole 
nest of bad side effects.
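The separate-ports approach amounts to deciding the expected syntax from the listening port instead of guessing it from the message. A minimal Python sketch of that idea follows; the port numbers, mode names, and the accepts function are assumptions for illustration only, not Squid's implementation (in squid.conf the per-port mode is chosen with options on http_port lines).

```python
# Hypothetical mapping: listening port -> the request syntax expected there.
PORT_MODES = {
    3128: "forward",   # explicit forward proxy: absolute-form, or CONNECT
    80: "reverse",     # origin / reverse-proxy: origin-form plus Host header
}


def accepts(port: int, method: str, target: str) -> bool:
    """Decide by listening port, never by sniffing the message syntax."""
    mode = PORT_MODES.get(port)
    if mode == "forward":
        # origin-form here carries no origin authority, so refuse it
        return "://" in target or method == "CONNECT"
    if mode == "reverse":
        # RFC 9112 section 3.2.2: servers must also accept absolute-form
        return method != "CONNECT" and (target.startswith("/") or "://" in target)
    return False  # unknown port: refuse rather than guess
```

An origin-form "GET /" arriving on the forward-proxy port is rejected outright rather than being auto-detected as something else.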

Amos