[squid-users] https log message formatting help

Mon Apr 10 03:35:56 UTC 2017

On 10/04/2017 1:36 p.m., daveh wrote:
> Thanks for the reply.
> 
> I'm parsing squid logs to send to a SIEM to identify IOCs. The SIEM agent
> requires a URL to be formatted with http|https://<URI>
> 
> It knows then that it can break the string out into various components such
> as request URL authority, host etc

So it can understand *URL* format. But that is not what is being logged.
Squid technically logs a URI, and this log processing is one of the
cases where the difference between URI and URL matters.

> 
> Your comment on logging https connections is not what I have found. I would

I think you misread what I wrote. There are only two ways to get Squid
to know what the https:// URL was - neither of them is normal proxy usage.

> expect that typing https://something.net will return that exact string in
> the log. Every https connection is logged as a CONNECT with the FQDN
> and :443 appended.

You expect wrong.

The URL you entered into some client software starts with the scheme
"https://" ... which requires that the fetching of that URL is done
securely. The last thing you should expect is that URL being sent over
plain-text / "in the clear" to some external software.

To do HTTPS the client software has to set up multiple layers of
protocols and security.

1) First it has to open a TCP connection to the proxy.

2) It does then have to tell the proxy where it is going to. But no more
than that. Thus the CONNECT request. As per
<https://tools.ietf.org/html/rfc7230#section-5.3.3> all that any
plain-text connection to a proxy contains is:

 CONNECT www.example.com:443 HTTP/1.1

3) Then it has to set up TLS/SSL encryption over those two TCP
connections (client to proxy, and proxy to server). So the crypto
happens directly between the client and the server (as if the proxy
were not there).

4) Then, and only then, after all that has been successful does it start
to send the first of potentially many (hundreds, thousands...) HTTP
requests over the connection:

  GET /index.html HTTP/1.1
  Host: example.com
   ...

If you look closely at that #4 layer request there is no "https://"
there. Nor any way to reconstruct it.
 It might even be another CONNECT (thought TOR invented onion routing?
HTTPS beat it by decades).

That meme from The Matrix "there is no spoon" has never been more apt.
There is no "https://" - at least, not once the client interprets its
input URL. It vanishes right there and then.
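
To make that concrete, here is a minimal Python sketch of what a proxy can
pull out of the plain-text CONNECT line from step 2. The function name
parse_connect_line is made up for this illustration and this is not how
Squid itself parses requests; it only shows that the authority (host and
port) is all there is to extract.

```python
def parse_connect_line(request_line):
    """Split a plain-text CONNECT request line into the parts a proxy can see.

    Hypothetical helper for illustration only.
    """
    method, target, version = request_line.split()
    if method != "CONNECT":
        raise ValueError("not a CONNECT request")
    # The target is authority-form (host:port). rpartition splits on the
    # last colon, so a bracketed IPv6 literal such as [::1]:443 also works.
    host, _, port = target.rpartition(":")
    return {"method": method, "host": host, "port": int(port), "version": version}


seen = parse_connect_line("CONNECT www.example.com:443 HTTP/1.1")
print(seen)
```

Note that the result has no scheme and no path. Neither was ever sent to
the proxy in the clear.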

> Is there something in the config to force this to happen?

There is no simple config option. In fact we go out of our way to ensure
data accuracy. So the log contains reality and log interpreters can make
whatever assumptions you want them to about what they read there.

PS. I find it particularly odd that you would be trying to feed false
information into a SIEM system - security event detection depends on
accuracy of inputs. But it's your neck.

> Doesn't seem to be a way of doing it with log formatting
> 

There is that logformat directive and the codes I gave in my earlier
mail. <http://www.squid-cache.org/Doc/config/logformat/> and
"%>rs://%>rd:%>rP%>rp"

If the %>rs is not producing a scheme for CONNECT transactions you could
hard-code "https". Either way it's a good idea to log these faked-up
records to a different log all of their own.

Use the access_log directive to set up multiple outputs:
 <http://www.squid-cache.org/Doc/config/access_log/>
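
If the URL has to be rebuilt outside Squid instead, a log interpreter can
apply the same assumption itself. The sketch below is Python and assumes
the default native "squid" access.log layout (method in the sixth
whitespace-separated field, URI in the seventh). The siem_url helper and
the sample lines are made up for illustration. It treats CONNECT to port
443 as "https", which is a guess about what is inside the tunnel, so it
flags those records as synthesized.

```python
def siem_url(log_line):
    """Return (url, synthesized) for one native-format access.log line.

    Hypothetical helper; assumes the default "squid" logformat field order.
    """
    fields = log_line.split()
    method, uri = fields[5], fields[6]
    if method != "CONNECT":
        # Ordinary requests already carry a full URL in the log.
        return uri, False
    host, _, port = uri.rpartition(":")
    if port != "443":
        # Assumption: only tunnels to port 443 are guessed to be HTTPS.
        return None, False
    # Faked-up value: the real path and scheme were never visible to the proxy.
    return "https://%s:%s/" % (host, port), True


# Made-up sample lines in the native format.
sample_connect = ("1491795356.123    250 192.0.2.10 TCP_TUNNEL/200 5120 "
                  "CONNECT www.example.com:443 - HIER_DIRECT/198.51.100.7 -")
sample_get = ("1491795357.456     12 192.0.2.10 TCP_MISS/200 1024 "
              "GET http://www.example.com/index.html - HIER_DIRECT/198.51.100.7 text/html")

print(siem_url(sample_connect))
print(siem_url(sample_get))
```

Sending the records flagged as synthesized to a separate output keeps
the faked-up URLs apart from what was actually logged.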

Amos