[squid-users] Inconsistent accessing of the cache, craigslist.org images, wacky stuff.

Jester Purtteman jester at optimera.us
Wed Oct 28 01:05:05 UTC 2015

So, here is the problem:  I want to cache the images on craigslist.  The
headers all look thoroughly cacheable.  Some browsers (I'm glaring at you,
Chrome) send a request header asking that the response not be cached, but
craigslist replies anyway and says "sure thing! Cache that sucker!", and
Firefox doesn't even send that.  An example URL:


The request headers look like:

Host: images.craigslist.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:41.0) Gecko/20100101
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://seattle.craigslist.org/oly/hvo/5288435732.html
Cookie: cl_tocmode=sss%3Agrid; cl_b=hlJExhZ55RGzNupTXAYJOAIcZ80; cl_def_lang=en; cl_def_hp=seattle
Connection: keep-alive


The response headers are:

Cache-Control: public, max-age=2592000  <-- doesn't that say "keep that a very long time"?
Content-Length: 49811
Content-Type: image/jpeg
Date: Tue, 27 Oct 2015 23:04:14 GMT
Last-Modified: Tue, 27 Oct 2015 23:04:14 GMT
Server: craigslist/0
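
As a quick sanity check on that max-age value, here is a small Python sketch
that converts it to days.  The helper name max_age_seconds is made up for
illustration and the parsing is deliberately naive; it is not how Squid
itself reads the header.

```python
import re
from typing import Optional

def max_age_seconds(cache_control: str) -> Optional[int]:
    """Return the max-age value in seconds, or None if the directive is absent."""
    # Require a separator (or start of string) before "max-age".
    match = re.search(r"(?:^|[,\s])max-age=(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None

header = "public, max-age=2592000"  # value copied from the response above
seconds = max_age_seconds(header)
days = seconds / 86400              # 86400 seconds per day
print(f"max-age={seconds} s is {days:g} days")
```

So the origin is asking for roughly a month of freshness, which is why I read
it as "keep that a very long time".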


Access log says:
1445989120.714    265 TCP_MISS/200 50162 GET
http://images.craigslist.org/00Y0Y_kMkjOhL1Lim_600x450.jpg -
ORIGINAL_DST/ image/jpeg


And Store Log says:
1445989120.714 RELEASE -1 FFFFFFFF 27C2B2CEC9ACCA05A31E80479E5F0E9C   ?
?         ?         ? ?/? ?/? ? ?
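
To make that store.log line easier to read, here is a rough Python sketch
that splits it into named fields.  The field names are my reading of the
store.log layout and should be treated as an assumption, as should the
interpretation that dir -1 with file number FFFFFFFF means the object never
reached the disk cache.  parse_store_line is a hypothetical helper.

```python
from typing import Dict

# Assumed field order for a store.log entry (not taken from the post itself).
FIELDS = ["time", "action", "dir", "file", "hash", "status", "date",
          "lastmod", "expires", "type", "sizes", "method", "url"]

def parse_store_line(line: str) -> Dict[str, str]:
    """Split one store.log line on whitespace and label the pieces."""
    return dict(zip(FIELDS, line.split()))

line = ("1445989120.714 RELEASE -1 FFFFFFFF 27C2B2CEC9ACCA05A31E80479E5F0E9C   ? "
        "?         ?         ? ?/? ?/? ? ?")
entry = parse_store_line(line)

# Assumption: dir -1 and file FFFFFFFF mean the object was never swapped out.
never_on_disk = entry["dir"] == "-1" and entry["file"] == "FFFFFFFF"
print(entry["action"], "never written to disk:", never_on_disk)
```

Under those assumptions the entry above reads as "released without ever
being stored on disk", with every metadata field logged as "?".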


I started out with a configuration from here:
http://wiki.sebeka.k12.mn.us/web_services:squid_update_cache but have made a
lot of tweaks to it.  In fact, I've dropped all the updates, all the
rewrite, store id, and a lot of other stuff.  I've set cache allow all
(which I suspect I can simply leave out, but I don't know).  I've cut it
down quite a bit.


My squid.conf (which has been hacked mercilessly trying stuff, admittedly)
currently looks like this:



acl localnet src     # RFC1918 possible internal network
acl localnet src  # RFC1918 possible internal network
acl localnet src # RFC1918 possible internal network
acl localnet src fc00::/7       # RFC 4193 local private network range
acl localnet src fe80::/10      # RFC 4291 link-local (directly plugged)

acl SSL_ports port 443
acl Safe_ports port 80          # http
acl Safe_ports port 21          # ftp
acl Safe_ports port 443         # https
acl Safe_ports port 70          # gopher
acl Safe_ports port 210         # wais
acl Safe_ports port 1025-65535  # unregistered ports
acl Safe_ports port 280         # http-mgmt
acl Safe_ports port 488         # gss-http
acl Safe_ports port 591         # filemaker
acl Safe_ports port 777         # multiling http

http_access allow localnet
http_access allow localhost

# And finally deny all other access to this proxy
http_access deny all

http_port 3128
http_port 3129 tproxy

cache_dir aufs /var/spool/squid/ 40000 32 256

cache_swap_low 90
cache_swap_high 95

cache allow all

maximum_object_size 8000 MB
range_offset_limit 8000 MB
quick_abort_min 512 KB

cache_store_log /var/log/squid/store.log
access_log daemon:/var/log/squid/access.log squid
cache_log /var/log/squid/cache.log
coredump_dir /var/spool/squid

max_open_disk_fds 8000

vary_ignore_expire on
request_entities on

refresh_pattern -i .*\.(gif|png|jpg|jpeg|ico|webp)$ 10080 100% 43200 ignore-no-store ignore-private ignore-reload store-stale

refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i .*\.index.(html|htm)$ 2880 40% 10080
refresh_pattern -i .*\.(html|htm|css|js)$ 120 40% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern . 0 40% 40320

cache_mgr <my address>
cache_effective_user proxy
cache_effective_group proxy
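
To double-check which refresh_pattern line the craigslist image should hit,
here is a rough Python sketch that walks the patterns in file order and stops
at the first match, which is how refresh_pattern rules are selected.  It uses
Python's re module as a stand-in for Squid's regex engine, so treat it as an
approximation; first_match and RULES are names invented for this sketch, and
the rule options (ignore-no-store and friends) are left out.

```python
import re

# (regex, case_insensitive, min_minutes, percent, max_minutes), in squid.conf order.
RULES = [
    (r".*\.(gif|png|jpg|jpeg|ico|webp)$", True, 10080, 100, 43200),
    (r"^ftp:", False, 1440, 20, 10080),
    (r"^gopher:", False, 1440, 0, 1440),
    (r".*\.index.(html|htm)$", True, 2880, 40, 10080),
    (r".*\.(html|htm|css|js)$", True, 120, 40, 1440),
    (r"(/cgi-bin/|\?)", True, 0, 0, 0),
    (r".", False, 0, 40, 40320),
]

def first_match(url):
    """Return (pattern, min, percent, max) for the first rule matching url."""
    for pattern, icase, min_m, pct, max_m in RULES:
        if re.search(pattern, url, re.IGNORECASE if icase else 0):
            return pattern, min_m, pct, max_m
    return None

url = "http://images.craigslist.org/00Y0Y_kMkjOhL1Lim_600x450.jpg"
pattern, min_m, pct, max_m = first_match(url)
print(pattern)
# min and max are given in minutes; 1440 minutes per day.
print(f"min {min_m / 1440:g} days, max {max_m / 1440:g} days")
```

By this check the image URL lands on the top image rule, with a minimum of 7
days and a maximum of 30 days, so the pattern ordering itself does not look
like the culprit.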




There is a good deal of hacking that has gone into this configuration, and I
accept that this will eventually be gutted and replaced with something less
broken.  Where I am pulling my hair out is trying to figure out why things
are cached and then not cached.  That top refresh_pattern line (the one
looking for jpg, gif, etc.) has taken many forms, and I am getting
inconsistent results.  The above image will cache just fine a couple of
times, but if I go back, clear the cache on the browser, close out, restart
and reload, Squid releases the object and never again shall it cache.  What
is worse, it appears to be getting worse over time until it isn't really
picking up much of anything.  What starts out as a few missed entries piles
up into a huge list of cache misses over time.


Right now, I am running at somewhere around a 0.1% hit rate, and I can only
assume I have buckled something in all the compiles, re-compiles, and
reconfigurations.  What started out as "gee, I wonder if I can cache
updates" has turned into quite the rabbit hole!
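
For reference, a hit rate like that can be estimated from access.log with a
short Python sketch along these lines.  Counting every result code that
contains "HIT" as a hit is a simplifying assumption of mine, and hit_rate is
a hypothetical helper, not a Squid tool.

```python
import re
from collections import Counter

# Find the "TCP_SOMETHING/status" token wherever it sits on the line.
RESULT = re.compile(r"\b(TCP_[A-Z_]+)/\d{3}\b")

def hit_rate(lines):
    """Return (fraction of requests that were hits, per-code counts)."""
    codes = Counter()
    for line in lines:
        match = RESULT.search(line)
        if match:
            codes[match.group(1)] += 1
    total = sum(codes.values())
    # Assumption: any result code containing "HIT" counts as a cache hit.
    hits = sum(n for code, n in codes.items() if "HIT" in code)
    return (hits / total if total else 0.0), codes

sample = [
    "1445989120.714    265 TCP_MISS/200 50162 GET "
    "http://images.craigslist.org/00Y0Y_kMkjOhL1Lim_600x450.jpg - "
    "ORIGINAL_DST/ image/jpeg",
]
rate, codes = hit_rate(sample)
print(f"hit rate: {rate:.1%}", dict(codes))
```

Passing an open file handle for /var/log/squid/access.log instead of the
sample list gives the figure for the whole log.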


So, big question:  what debug level do I use to see this thing making
decisions on whether to cache?  Any tips anyone has about this would be
appreciated.  Thank you!

