[squid-users] Content Adaptation with HTTPs

Christopher Ahrens christopher at leviacomm.net
Sun Aug 20 20:06:27 UTC 2017


Amos Jeffries wrote:
> On 20/08/17 16:05, Christopher Ahrens wrote:
>>
>> The current solution doesn't work for me since it only supports a very
>> limited number of clients.  I am working with a charity that provides
>> internet services to those with impaired vision. The intention of my
>> project was to set up a semi-public proxy for recipients of the charity
>> (e.g., we would install DD-WRT-like routers within their homes that
>> would create a tunnel into our network so that they could browse the
>> internet using off-the-shelf systems).  We recently received a large
>> number of tablets from a corporate donor. The tablets themselves will
>> work for our recipients, but unfortunately the internet at large does
>> not.
>
> FYI: If you can get the adaptation part to be small enough a non-caching
> Squid should be able to run on those WRT-like devices with under 32 MB
> of RAM needed. So the tunnel may not be necessary, just a way to update
> the software and its config.

Part of it is to pre-shrink the pages so that they don't saturate the
tunnel.  A lot of our recipients have low-cost internet connections
(usually 1-5 Mbps).  In my personal experience, the transformations
cut roughly 75%-80% of the excess garbage from a typical website.

We're also looking at possibly building tiny x86 or ARM-based boxes that 
can be deployed to their homes to do caching to further reduce the load 
on their internet connections.  The biggest complaint we get is about
why it takes so long to load pictures and words, especially since a lot
of the pictures are the same from page to page (I have a very hard time
arguing with them...)

We can get a lot of hardware from local companies, but not so much in
the way of software or services.

>
>>
>> We've looked into commercial systems in the past, but we cannot afford
>> the cost of commercial systems, especially since we are unsure about
>> the exact licensing that would be needed for our endeavor.  We have
>> also been burnt in the past with commercial software where the project
>> either goes dead, begins to require insanely expensive appliances, or
>> the license price is sent sky-high.
>>
>> Would it be possible to use a setup of Squid <-> Privoxy <-> Squid to
>> execute this?  I figure we'd build an internal instance that will
>> handle the client<->proxy part, Privoxy handles the content
>> modification, then a second Squid instance to handle the web
>> server<->proxy part.
>
> Squid will only send SSL-Bump'ed HTTPS traffic over encrypted
> connections. So that is only possible if privoxy accepts TLS connections
> from Squid. In which case you probably do not need the second Squid, as
> privoxy would also be doing the HTTPS to-server part easily enough itself.
>

Unfortunately, Privoxy doesn't do HTTPS.  We looked into using it, but it
can only do domain blocking for HTTPS, not content manipulation.


>
>>
>> So it looks like the solution would be to find a developer to write an
>> eCAP adapter that cycles through regexes to replace/remove HTML/CSS
>> content.  So time to dig out my old C++ books and get to work...
>
> If the existing ICAP/eCAP options are not suitable, then yes a custom
> one would be needed.
>
> It is not as easy as a few regex replacements though. Adaptors are
> streamed the full on-wire HTTP message format with only minor
> sanitization by Squid's parser. To alter the content you will have to
> deal with data encodings, object ranges, and partially received objects.
> And it is best to assume everything is of infinite length unless
> explicitly told otherwise - so no buffer-then-adapt code.
> eCAP is simpler than ICAP, but still has to deal with these HTTP features.
>
> Those are a big part of why available software is so sparse. The other
> part being that HTTP traffic payloads are copyrighted content, so there
> are legal issues with selling software for the purpose of altering
> copyrighted content without the author's permission.
>

Yeah, I was a bit afraid that would be the case.  I was planning on
seeing how GreaseMonkey and ABP handle data streams, since they seem to
be able to handle streaming media.  Or I could dig into Privoxy to see
how things are done in there.  It might turn out to be easier to adapt
Privoxy into an ICAP/eCAP service by changing its input/output functions
to use an ICAP/eCAP interface rather than TCP.
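
To get a feel for the "no buffer-then-adapt" constraint, here is a rough
sketch of a replacement filter that works chunk by chunk.  It is plain
Python for illustration only, not eCAP or Privoxy code; the class name
StreamingReplacer and its feed()/finish() methods are made up for this
example.  It assumes a literal byte pattern (not a regex) and an
already-decoded body, so it ignores content encodings and ranges
entirely.  The only trick is holding back the last len(pattern)-1 bytes
of each chunk in case a match straddles a chunk boundary.

```python
class StreamingReplacer:
    """Replace a literal byte pattern in a body that arrives in chunks."""

    def __init__(self, pattern: bytes, replacement: bytes) -> None:
        if not pattern:
            raise ValueError("pattern must not be empty")
        self.pattern = pattern
        self.replacement = replacement
        self._tail = b""  # bytes held back from the previous chunk

    def feed(self, chunk: bytes) -> bytes:
        """Consume one chunk and return the bytes that are safe to emit."""
        data = self._tail + chunk
        out = bytearray()
        pos = 0
        while True:
            idx = data.find(self.pattern, pos)
            if idx == -1:
                break
            out += data[pos:idx]
            out += self.replacement
            pos = idx + len(self.pattern)
        rest = data[pos:]
        # Hold back up to len(pattern)-1 trailing bytes: they could be the
        # start of a match that only completes in the next chunk.
        keep = len(self.pattern) - 1
        if keep == 0:
            out += rest
            self._tail = b""
        elif len(rest) > keep:
            out += rest[:-keep]
            self._tail = rest[-keep:]
        else:
            self._tail = rest
        return bytes(out)

    def finish(self) -> bytes:
        """Flush the held-back bytes once the body has ended."""
        tail, self._tail = self._tail, b""
        return tail


# The tag to replace is split across the two chunks.
chunks = [b"<p>buy <bl", b"ink>now</blink></p>"]
replacer = StreamingReplacer(b"<blink>", b"<b>")
result = b"".join(replacer.feed(c) for c in chunks) + replacer.finish()
```

Memory use stays bounded by the chunk size plus the pattern length, no
matter how long the body is.  Real regexes are harder, since a match
can be arbitrarily long, which is probably a good part of why this is
more work than it first looks.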

For now, I'm thinking that I'll just let HTTPS pass through without
modification and let Privoxy handle HTTP.  That seems to be the easiest
way to do things.

> Amos
> _______________________________________________
> squid-users mailing list
> squid-users at lists.squid-cache.org
> http://lists.squid-cache.org/listinfo/squid-users


