[squid-users] Inconsistent accessing of the cache, craigslist.org images, wacky stuff.
Jester Purtteman
jester at optimera.us
Wed Oct 28 01:05:05 UTC 2015
So, here is the problem: I want to cache the images on craigslist. The
headers all look thoroughly cacheable; some browsers (I'm glaring at you,
Chrome) send a request header asking that the response not be cached, but
craigslist replies anyway and says "sure thing! Cache that sucker!" and
Firefox doesn't even do that. An example URL:
http://images.craigslist.org/00o0o_3fcu92TR5jB_600x450.jpg
The request headers look like:
Host: images.craigslist.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:41.0) Gecko/20100101
Firefox/41.0
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://seattle.craigslist.org/oly/hvo/5288435732.html
Cookie: cl_tocmode=sss%3Agrid; cl_b=hlJExhZ55RGzNupTXAYJOAIcZ80;
cl_def_lang=en; cl_def_hp=seattle
Connection: keep-alive
The response headers are:
Cache-Control: public, max-age=2592000 <-- doesn't that say "keep that a
very long time"? (2592000 seconds = 30 days)
Content-Length: 49811
Content-Type: image/jpeg
Date: Tue, 27 Oct 2015 23:04:14 GMT
Last-Modified: Tue, 27 Oct 2015 23:04:14 GMT
Server: craigslist/0
Access log says:
1445989120.714 265 192.168.2.56 TCP_MISS/200 50162 GET
http://images.craigslist.org/00Y0Y_kMkjOhL1Lim_600x450.jpg -
ORIGINAL_DST/208.82.236.227 image/jpeg
And Store Log says:
1445989120.714 RELEASE -1 FFFFFFFF 27C2B2CEC9ACCA05A31E80479E5F0E9C ?
? ? ? ?/? ?/? ? ?
I started out with a configuration from here:
http://wiki.sebeka.k12.mn.us/web_services:squid_update_cache but have made a
lot of tweaks to it. In fact, I've dropped all the update-caching rules,
the rewriting, the Store-ID stuff, and a lot more. I've set "cache allow
all" (which I suspect I could simply leave out, but I don't know) and cut
the file down quite a bit. The one I am testing right now, which admittedly
has been hacked mercilessly while trying things, looks like this:
<BEGIN SQUID.CONF >
acl localnet src 10.0.0.0/8 # RFC1918 possible internal network
acl localnet src 172.16.0.0/12 # RFC1918 possible internal network
acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
acl localnet src fc00::/7 # RFC 4193 local private network range
acl localnet src fe80::/10 # RFC 4291 link-local (directly plugged) machines
acl SSL_ports port 443
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 # https
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535 # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl CONNECT method CONNECT
http_access allow localnet
http_access allow localhost
# And finally deny all other access to this proxy
http_access deny all
http_port 3128
http_port 3129 tproxy
cache_dir aufs /var/spool/squid/ 40000 32 256
cache_swap_low 90
cache_swap_high 95
dns_nameservers 8.8.8.8 8.8.4.4
cache allow all
maximum_object_size 8000 MB
range_offset_limit 8000 MB
quick_abort_min 512 KB
cache_store_log /var/log/squid/store.log
access_log daemon:/var/log/squid/access.log squid
cache_log /var/log/squid/cache.log
coredump_dir /var/spool/squid
max_open_disk_fds 8000
vary_ignore_expire on
request_entities on
refresh_pattern -i .*\.(gif|png|jpg|jpeg|ico|webp)$ 10080 100% 43200 ignore-no-store ignore-private ignore-reload store-stale
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i .*\.index.(html|htm)$ 2880 40% 10080
refresh_pattern -i .*\.(html|htm|css|js)$ 120 40% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern . 0 40% 40320
cache_mgr <my address>
cache_effective_user proxy
cache_effective_group proxy
<END SQUID.CONF>
There is a good deal of hacking that has gone into this configuration, and I
accept that it will eventually be gutted and replaced with something less
broken. Where I am pulling my hair out is trying to figure out why things
are cached and then not cached. That top refresh line (the one looking for
jpg, gifs etc) has taken many forms, and I am getting inconsistent results.
The above image will cache just fine, a couple times, but if I go back,
clear the cache on the browser, close out, restart and reload, it releases
the link and never again shall it cache. What is worse, it appears to be
getting worse over time, until it isn't really picking up much of anything.
What starts out as a few missed entries piles up into a huge list of cache
misses over time.
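For reference, my (possibly wrong) understanding of the refresh_pattern
fields I keep fiddling with is roughly:

```
# refresh_pattern [-i] regex min percent max [options]
#   min     - minutes a response without an explicit expiry is treated fresh
#   percent - fraction of the object's age (Date minus Last-Modified) that
#             it is treated as fresh
#   max     - upper bound, in minutes, on that heuristic freshness
# So my image rule is meant to say: fresh for at least 7 days (10080 min)
# and at most 30 days (43200 min), with the override options telling squid
# to disregard no-store, private, and client reloads, and to store
# responses even if they arrive already stale:
refresh_pattern -i \.(gif|png|jpg|jpeg|ico|webp)$ 10080 100% 43200 ignore-no-store ignore-private ignore-reload store-stale
```

Though since these responses carry an explicit max-age, I'd have expected
the min/percent/max heuristics not to even come into play.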
Right now, I am running somewhere around a 0.1% hit rate, and I can only
assume I have broken something in all the compiles, re-compiles, and
reconfigurations. What started out as "gee, I wonder if I can cache
updates" has turned into quite the rabbit hole!
So, big question: what debug level do I use to see this thing making its
caching decisions? Any other tips anyone has about this would also be
appreciated. Thank you!
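For what it's worth, my unverified guess at the squid.conf debug settings,
going by the documented debug section numbers (22 = refresh calculations,
20 = storage manager), is something like:

```
# Bump only the sections that look relevant to cacheability decisions;
# everything else stays at the default verbosity of 1.
debug_options ALL,1 22,3 20,3
```

but I have no idea whether those are the right sections or levels.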