[squid-users] Content Adaptation with HTTPs

Amos Jeffries squid3 at treenet.co.nz
Sun Aug 20 08:19:11 UTC 2017


On 20/08/17 16:05, Christopher Ahrens wrote:
> 
> The current solution doesn't work for me since it only supports a very 
> limited number of clients.  I am working with a charity that provides 
> internet services to those with impaired vision. The intention of my 
> project was to set up a semi-public proxy for recipients of the charity 
> (e.g., we would install DD-WRT-like routers within their homes that would 
> create a tunnel into our network so that they could browse the internet 
> using off-the-shelf systems).  We recently received a large number of 
> tablets from a corporate donor. The tablets themselves will work for our 
> recipients, but unfortunately the internet at large does not.

FYI: If you can get the adaptation part small enough, a non-caching 
Squid should be able to run on those WRT-like devices in under 32 MB 
of RAM. So the tunnel may not be necessary, just a way to update 
the software and its config.

> 
> We've looked into commercial systems in the past, but we cannot afford 
> the cost of commercial systems, especially since we are unsure about the 
> exact licensing that would be needed for our endeavor.  We have also 
> been burnt in the past with commercial software where the project either 
> goes dead, begins to require insanely expensive appliances, or the 
> license price is sent sky-high.
> 
> Would it be possible to use a setup of Squid <-> Privoxy <-> Squid to 
> execute this?  I figure we'd build an internal instance that will handle 
> the client<->proxy part, Privoxy handles the content modification, then 
> a second Squid instance to handle the web server<->proxy part.

Squid will only send SSL-Bump'ed HTTPS traffic over encrypted 
connections, so that is only possible if privoxy accepts TLS connections 
from Squid. In that case you probably do not need the second Squid, as 
privoxy would be doing the HTTPS to-server part easily enough itself.


> 
> So it looks like the solution would be to find a developer to write an 
> eCAP adapter to cycle through regexes to replace/remove HTML/CSS content.  So 
> time to dig out my old C++ books and get to work...

If the existing ICAP/eCAP options are not suitable, then yes, a custom 
one would be needed.

It is not as easy as a few regex replacements though. Adaptors are 
streamed the full on-wire HTTP message format with only minor 
sanitization by Squid's parser. To alter the content you will have to 
deal with data encodings, object ranges, and partially received objects. 
And it is best to assume everything is of infinite length unless 
explicitly told otherwise, so no buffer-then-adapt code.
eCAP is simpler than ICAP, but still has to deal with these HTTP features.
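
To illustrate the streaming constraint above, here is a minimal Python
sketch (not eCAP code, and not anything Squid ships). The names
`can_adapt` and `stream_replace` are hypothetical, the header mapping is
assumed to use lower-case keys, and a fixed byte string stands in for a
regex so that the amount of data held back between chunks stays bounded.
It skips encoded or partial responses and rewrites the rest chunk by
chunk.

```python
from typing import Iterable, Iterator, Mapping


def can_adapt(status: int, headers: Mapping[str, str]) -> bool:
    """Decide whether a response body is safe to rewrite as plain text.

    Assumption: `headers` has lower-case keys. Partial (range) responses
    and encoded bodies are passed through untouched.
    """
    if status == 206 or "content-range" in headers:
        return False
    encoding = headers.get("content-encoding", "identity").strip().lower()
    if encoding not in ("", "identity"):
        return False
    ctype = headers.get("content-type", "").split(";")[0].strip().lower()
    return ctype in ("text/html", "text/css")


def stream_replace(chunks: Iterable[bytes], old: bytes, new: bytes) -> Iterator[bytes]:
    """Replace `old` with `new` in a chunked byte stream of unknown length.

    Only the last len(old) - 1 bytes are held back between chunks, so the
    body is never buffered as a whole.
    """
    if not old:
        raise ValueError("pattern must not be empty")
    keep = len(old) - 1  # longest partial match that can straddle a boundary
    pending = b""
    for chunk in chunks:
        data = pending + chunk
        out = bytearray()
        pos = 0
        while True:
            hit = data.find(old, pos)
            if hit < 0:
                break
            out += data[pos:hit]
            out += new
            pos = hit + len(old)
        # Hold back a tail that might be the start of a match split
        # across this chunk and the next one.
        cut = max(pos, len(data) - keep)
        out += data[pos:cut]
        pending = data[cut:]
        if out:
            yield bytes(out)
    if pending:
        yield pending
```

Joining the output of `stream_replace` gives the same bytes as running
`bytes.replace` on the whole body, but at most `len(old) - 1` bytes are
ever held back. A real adapter would also have to cope with character
sets and with patterns of unbounded length, which this sketch ignores.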

Those are a big part of why available software is so sparse. The other 
part being that HTTP traffic payloads are copyrighted content, so there 
are legal issues with selling software for the purpose of altering 
copyrighted content sans the author's permission.

Amos

