[squid-users] SSL Bump Failures with Google and Wikipedia [SOLVED]

Jeffrey Merkey jeffmerkey at gmail.com
Sun Oct 1 03:12:08 UTC 2017


On 9/30/17, Rafael Akchurin <rafael.akchurin at diladele.com> wrote:
> Hello Jeff,
>
> Do not forget Google and YouTube are now using brotli encoding extensively,
> not only gzip.
>
> Best regards,
> Rafael Akchurin
>
>> On 30 Sep. 2017, at 23:49, Jeffrey Merkey <jeffmerkey at gmail.com>
>> wrote:
>>
>>> On 9/30/17, Eliezer Croitoru <eliezer at ngtech.co.il> wrote:
>>> Hey Jeffrey,
>>>
>>> What happens when you disable the second icap service this way:
>>> icap_service service_avi_resp respmod_precache icap://127.0.0.1:1344/cherokee bypass=0
>>> adaptation_access service_avi_resp deny all
>>>
>>> Is it still the same?
>>> What I suspect is that the requests advertise that they accept
>>> gzip-compressed objects and the icap service does not gunzip them,
>>> which results in what you see.
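>>>
>>> If you end up decompressing in the service, a minimal sketch of the
>>> gunzip step itself with plain zlib would look like this (one-shot
>>> rather than streaming, simplified error handling, and gunzip_body is
>>> just an illustrative name, not a c-icap call):
>>>
>>> #include <string.h>
>>> #include <zlib.h>
>>>
>>> /* Inflate a gzip-compressed buffer into out; *outlen holds the
>>>  * output capacity on entry and the decompressed size on return. */
>>> static int gunzip_body(const unsigned char *in, size_t inlen,
>>>                        unsigned char *out, size_t *outlen)
>>> {
>>>     z_stream strm;
>>>     int rc;
>>>
>>>     memset(&strm, 0, sizeof(strm));
>>>     /* 16 + MAX_WBITS tells zlib to expect a gzip header/trailer */
>>>     if (inflateInit2(&strm, 16 + MAX_WBITS) != Z_OK)
>>>         return -1;
>>>     strm.next_in = (unsigned char *)in;
>>>     strm.avail_in = (uInt)inlen;
>>>     strm.next_out = out;
>>>     strm.avail_out = (uInt)*outlen;
>>>     rc = inflate(&strm, Z_FINISH);
>>>     *outlen = strm.total_out;
>>>     inflateEnd(&strm);
>>>     return (rc == Z_STREAM_END) ? 0 : -1;
>>> }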
>>>
>>> To make sure that squid is not at fault here, try disabling both
>>> icap services and then adding them back one at a time to see which
>>> part of this triangle is giving you trouble.
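>>>
>>> For example, to disable both at once (a sketch using your service
>>> names; denying all access disables a service without removing its
>>> definition):
>>>
>>> adaptation_access service_avi_req deny all
>>> adaptation_access service_avi_resp deny all
>>>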
>>> I enhanced an ICAP library written in Go at:
>>> https://github.com/elico/icap
>>>
>>> And I have a couple of examples of how to work with HTTP requests
>>> and responses at:
>>> https://github.com/andybalholm/redwood/
>>> https://github.com/andybalholm/redwood/search?utf8=%E2%9C%93&q=gzip&type=
>>>
>>> Let me know if you need help finding out the issue.
>>>
>>> All The Bests,
>>> Eliezer
>>>
>>> ----
>>> Eliezer Croitoru
>>> Linux System Administrator
>>> Mobile: +972-5-28704261
>>> Email: eliezer at ngtech.co.il
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: squid-users [mailto:squid-users-bounces at lists.squid-cache.org] On
>>> Behalf Of Jeffrey Merkey
>>> Sent: Saturday, September 30, 2017 23:28
>>> To: squid-users <squid-users at lists.squid-cache.org>
>>> Subject: [squid-users] SSL Bump Failures with Google and Wikipedia
>>>
>>> Hello All,
>>>
>>> I have been working with the squid server and icap, and I have been
>>> running into problems with content cached from google and wikipedia.
>>> Some https sites, such as Centos.org, work perfectly with ssl
>>> bumping and I get the decrypted content as readable html.  Other
>>> sites, such as google and wikipedia, return what looks like
>>> encrypted traffic, or perhaps mime-encoded data; I am not sure which.
>>>
>>> Are there cases where squid will default to direct mode and not
>>> decrypt the traffic?  I am using the latest squid release, 3.5.27.
>>> I really would like to get this working with google and wikipedia.
>>> I reviewed the page source in the browser's viewer and it looks
>>> nothing like the data I am getting via the icap server.
>>>
>>> Any assistance would be greatly appreciated.
>>>
>>> The config I am using is:
>>>
>>> #
>>> # Recommended minimum configuration:
>>> #
>>>
>>> # Example rule allowing access from your local networks.
>>> # Adapt to list your (internal) IP networks from where browsing
>>> # should be allowed
>>>
>>> acl localnet src 127.0.0.1
>>> acl localnet src 10.0.0.0/8     # RFC1918 possible internal network
>>> acl localnet src 172.16.0.0/12  # RFC1918 possible internal network
>>> acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
>>> acl localnet src fc00::/7       # RFC 4193 local private network range
>>> acl localnet src fe80::/10      # RFC 4291 link-local (directly plugged) machines
>>>
>>> acl SSL_ports port 443
>>> acl Safe_ports port 80          # http
>>> acl Safe_ports port 21          # ftp
>>> acl Safe_ports port 443         # https
>>> acl Safe_ports port 70          # gopher
>>> acl Safe_ports port 210         # wais
>>> acl Safe_ports port 1025-65535  # unregistered ports
>>> acl Safe_ports port 280         # http-mgmt
>>> acl Safe_ports port 488         # gss-http
>>> acl Safe_ports port 591         # filemaker
>>> acl Safe_ports port 777         # multiling http
>>> acl CONNECT method CONNECT
>>>
>>> #
>>> # Recommended minimum Access Permission configuration:
>>> #
>>> # Deny requests to certain unsafe ports
>>> http_access deny !Safe_ports
>>>
>>> # Deny CONNECT to other than secure SSL ports
>>> http_access deny CONNECT !SSL_ports
>>>
>>> # Only allow cachemgr access from localhost
>>> http_access allow localhost manager
>>> http_access deny manager
>>>
>>> # We strongly recommend the following be uncommented to protect innocent
>>> # web applications running on the proxy server who think the only
>>> # one who can access services on "localhost" is a local user
>>> #http_access deny to_localhost
>>>
>>> #
>>> # INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
>>> #
>>>
>>> # Example rule allowing access from your local networks.
>>> # Adapt localnet in the ACL section to list your (internal) IP networks
>>> # from where browsing should be allowed
>>> http_access allow localnet
>>> http_access allow localhost
>>>
>>> # And finally deny all other access to this proxy
>>> http_access deny all
>>>
>>> # Squid normally listens to port 3128
>>> #http_port 3128
>>>
>>> # Uncomment and adjust the following to add a disk cache directory.
>>> #cache_dir ufs /usr/local/squid/var/cache/squid 100 16 256
>>>
>>> # Leave coredumps in the first cache dir
>>> coredump_dir /usr/local/squid/var/cache/squid
>>>
>>> #
>>> # Add any of your own refresh_pattern entries above these.
>>> #
>>> refresh_pattern ^ftp:           1440    20%     10080
>>> refresh_pattern ^gopher:        1440    0%      1440
>>> refresh_pattern -i (/cgi-bin/|\?) 0     0%      0
>>> refresh_pattern .               0       20%     4320
>>>
>>> http_port 3128 ssl-bump generate-host-certificates=on dynamic_cert_mem_cache_size=4MB cert=/etc/squid/ssl_cert/myCA.pem
>>> http_port 3129
>>>
>>> # SSL Bump Config
>>> always_direct allow all
>>> ssl_bump server-first all
>>> sslproxy_cert_error deny all
>>> sslproxy_flags DONT_VERIFY_PEER
>>> sslcrtd_program /usr/local/squid/libexec/ssl_crtd -s /var/lib/ssl_db -M 4MB
>>> sslcrtd_children 8 startup=1 idle=1
>>>
>>> # For squid 3.5.x
>>> #sslcrtd_program /usr/local/squid/libexec/ssl_crtd -s /var/lib/ssl_db -M 4MB
>>>
>>> # For squid 4.x
>>> # sslcrtd_program /usr/local/squid/libexec/security_file_certgen -s /var/lib/ssl_db -M 4MB
>>>
>>> icap_enable on
>>> icap_send_client_ip on
>>> icap_send_client_username on
>>> icap_client_username_header X-Authenticated-User
>>> icap_preview_enable on
>>> icap_preview_size 1024
>>> icap_service service_avi_req reqmod_precache icap://127.0.0.1:1344/request bypass=1
>>> adaptation_access service_avi_req allow all
>>>
>>> icap_service service_avi_resp respmod_precache icap://127.0.0.1:1344/cherokee bypass=0
>>> adaptation_access service_avi_resp allow all
>>>
>>> Jeff
>>> _______________________________________________
>>> squid-users mailing list
>>> squid-users at lists.squid-cache.org
>>> http://lists.squid-cache.org/listinfo/squid-users
>>>
>>>
>>
>> Eliezer,
>>
>> Well, you certainly hit the nail on the head.  I added the following
>> code to check the content being sent to the icap server from squid,
>> and here is what I found when I checked the headers sent by the
>> remote web server:
>>
>> Code added to c-icap to check the content type and encoding received
>> by the icap server:
>>
>>    ci_headers_list_t *hdrs;
>>    const char *content_type, *content_encoding;
>>
>>    hdrs = ci_http_response_headers(req);
>>    content_type = ci_headers_value(hdrs, "Content-Type");
>>    if (content_type)
>>       ci_debug_printf(1,"srv_cherokee:  content-type: %s\n",
>>                       content_type);
>>
>>    content_encoding = ci_headers_value(hdrs, "Content-Encoding");
>>    if (content_encoding)
>>       ci_debug_printf(1,"srv_cherokee:  content-encoding: %s\n",
>>                       content_encoding);
>>
>> And the output from scanned pages sent over from squid:
>>
>> srv_cherokee:  init request 0x7f3dbc008eb0
>> pool hits:1 allocations: 1
>> Allocating from objects pool object 5
>> pool hits:1 allocations: 1
>> Geting buffer from pool 4096:1
>> Requested service: cherokee
>> Read preview data if there are and process request
>> srv_cherokee:  content-type: text/html; charset=utf-8
>> srv_cherokee:  content-encoding: gzip   <-- As you stated, I am getting gzipped data
>> srv_cherokee:  we expect to read :-1 body data
>> Allow 204...
>> Preview handler return allow 204 response
>> srv_cherokee:  release request 0x7f3dbc008eb0
>> Store buffer to long pool 4096:1
>> Storing to objects pool object 5
>> Log request to access log file /var/log/i-cap_access.log
>>
>>
>> Wikipedia, at https://en.wikipedia.org/wiki/HTTP_compression,
>> describes the process as:
>>
>> " ...
>>   Compression scheme negotiation
>>   In most cases, excluding the SDCH, the negotiation is done in two
>>   steps, described in RFC 2616:
>>
>>   1. The web client advertises which compression schemes it supports
>>   by including a list of tokens in the HTTP request. For
>>   Content-Encoding, the list is in a field called Accept-Encoding;
>>   for Transfer-Encoding, the field is called TE.
>>
>>   GET /encrypted-area HTTP/1.1
>>   Host: www.example.com
>>   Accept-Encoding: gzip, deflate
>>
>>   2. If the server supports one or more compression schemes, the
>>   outgoing data may be compressed by one or more methods supported by
>>   both parties. If this is the case, the server will add a
>>   Content-Encoding or Transfer-Encoding field in the HTTP response
>>   with the used schemes, separated by commas.
>>
>>   HTTP/1.1 200 OK
>>   Date: mon, 26 June 2016 22:38:34 GMT
>>   Server: Apache/1.3.3.7 (Unix)  (Red-Hat/Linux)
>>   Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
>>   Accept-Ranges: bytes
>>   Content-Length: 438
>>   Connection: close
>>   Content-Type: text/html; charset=UTF-8
>>   Content-Encoding: gzip
>>
>>   The web server is by no means obligated to use any compression
>>   method – this depends on the internal settings of the web server
>>   and also may depend on the internal architecture of the website in
>>   question.
>>
>>   In case of SDCH a dictionary negotiation is also required, which
>>   may involve additional steps, like downloading a proper dictionary
>>   from .
>> .."
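>>
>> Since the quoted text notes that the used schemes are separated by
>> commas, a checker has to match tokens rather than compare the whole
>> header value.  A minimal sketch (encoding_listed is my own
>> illustrative helper, not a c-icap call):
>>
>> #include <ctype.h>
>> #include <string.h>
>> #include <strings.h>
>>
>> /* Return 1 if token appears in a comma-separated header value such
>>  * as "gzip, deflate"; case-insensitive, tolerates whitespace. */
>> static int encoding_listed(const char *value, const char *token)
>> {
>>     size_t tlen = strlen(token);
>>     while (*value) {
>>         while (*value == ',' || isspace((unsigned char)*value))
>>             value++;
>>         if (strncasecmp(value, token, tlen) == 0 &&
>>             (value[tlen] == '\0' || value[tlen] == ',' ||
>>              isspace((unsigned char)value[tlen])))
>>             return 1;
>>         while (*value && *value != ',')
>>             value++;
>>     }
>>     return 0;
>> }
>>
>> /* e.g. encoding_listed("gzip, deflate", "gzip") returns 1 */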
>>
>>
>> So, it looks like this is a feature of the browser.  Is it possible
>> to have squid gunzip the data, or to strip the "Accept-Encoding:
>> gzip, deflate" header from the request so the remote server is never
>> told to gzip the data?
>>
>> Thanks
>>
>> Jeff
>> _______________________________________________
>> squid-users mailing list
>> squid-users at lists.squid-cache.org
>> http://lists.squid-cache.org/listinfo/squid-users
>

Well,

After reviewing this problem and all of the great technical
information folks provided, I have it working.  The best way to deal
with this transparently is to have squid spoof the server side with
modified request headers.

Compile squid with the configure option:

--enable-http-violations

then add the following to the squid.conf file:

# disable remote html data compression by replacing HTTP request headers
# requires squid build option --enable-http-violations
request_header_access Accept-Encoding deny all
request_header_replace Accept-Encoding *;q=0

These directives tell squid to strip all Accept-Encoding request
headers and substitute the string "*;q=0", which tells the server not
to send any compressed data.  I tested this with chrome, which was
configured to always send Accept-Encoding: gzip, deflate, and all of
the C-ICAP data I am seeing is plain text html, which is what I wanted.
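
For illustration, with both directives in place a request that leaves
the browser as (hypothetical values):

  GET / HTTP/1.1
  Host: www.example.com
  Accept-Encoding: gzip, deflate, br

goes out from squid to the origin server as:

  GET / HTTP/1.1
  Host: www.example.com
  Accept-Encoding: *;q=0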

So adding those two header directives transparently spoofs the remote
server into always sending uncompressed data.  I did note that google
has some nonsense going on with chrome (mozilla does not do this):
even after chrome's request headers have been rewritten by squid to
specify no gzip or deflate, the google servers will still send a
content-encoding: gzip response header on responses which contain no
data (???), which is clearly a bug of some sort in their HTTP
responses.

The browser treats the data as plain text and works correctly, even
though it gets a content-encoding: gzip header.  Google only seems to
do this on header-only request/responses which have no body text.
Responses which actually contain body data omit the content-encoding
header, which is what I wanted to see happen.
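
Given that quirk, a defensive check in the c-icap module may be
worthwhile so an encoded body that does slip through gets logged
rather than scanned as plain text.  A sketch reusing the same c-icap
calls as the snippet earlier in the thread (the log text is mine):

   ci_headers_list_t *hdrs = ci_http_response_headers(req);
   const char *content_encoding = ci_headers_value(hdrs, "Content-Encoding");

   if (content_encoding) {
      /* Accept-Encoding was stripped from the request, so any
       * remaining Content-Encoding is unexpected; log it instead of
       * treating the body as plain text. */
      ci_debug_printf(1, "srv_cherokee:  unexpected content-encoding: %s\n",
                      content_encoding);
   }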

So to summarize, the above changes enable squid to filter the
Accept-Encoding header on HTTP requests and tell the server to always
send uncompressed data.  From my testing it works transparently with
Chrome and Mozilla, and C-ICAP gets the uncompressed, unencrypted
data it needs.

Jeff

