[squid-users] squid 3.5.27 does not respect cache_dir-size but uses 100% of partition and fails
Amos Jeffries
squid3 at treenet.co.nz
Wed Jul 11 13:20:17 UTC 2018
On 11/07/18 22:39, pete dawgg wrote:
> Hello list,
>
> i run squid 3.5.27 with some special settings for windows updates as suggested here: https://wiki.squid-cache.org/ConfigExamples/Caching/WindowsUpdates It's been running almost trouble-free for some time, but for ~2 months the cache-partition has been filling up to 100% (space; inodes were OK) and squid then failed.
>
That implies one of three things: your cache_dir size accounting is
VERY badly broken, something else is filling the disk (e.g. failing to
rotate the swap.state journals), or disk purging is not able to keep up
with the traffic flow.
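The swap.state journals are normally rewritten as part of log rotation,
so a cheap first test is to trigger rotation manually and watch whether
the journal files shrink:

  # rotate logs; this also rewrites a clean swap.state per cache_dir
  squid -k rotate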
> the cache-dir is on a 100GB ext2-partition and configured like this:
>
Hmm, a partition. What else is using the same physical disk?
Squid puts such a random I/O pattern on cache disks that it's best not
to use the physical drive for anything else in parallel - other uses
can slow Squid down, and conversely Squid can cause problems for them
by flooding the disk controller queues.
> cache_dir aufs /mnt/cache/squid 75000 16 256
These numbers matter more for ext2 than for other FS types. You need
them to be large enough that no single directory accumulates too many
entries. I would use "64 256" here, or even "128 256" for a bigger
safety margin.
(I *think* modern ext2 implementations have resolved the core issue, but
that may be wrong and ext2 is old enough to be wary.)
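That is, something like this (a sketch only; same size and path, just
more first-level directories):

  cache_dir aufs /mnt/cache/squid 75000 64 256

Note that changing the L1/L2 values on an existing cache generally
means emptying it and re-creating the directory structure with
"squid -z" first.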
> cache_swap_low 60
> cache_swap_high 75
> minimum_object_size 0 KB
> maximum_object_size 6000 MB
If you bumped this for the Win8 sizes mentioned in our wiki, be aware
that the Win10 major updates have pushed sizes up again, past 10GB. So
you may need to increase this.
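For example (a hypothetical value; pick a limit above the largest
update object you expect to cache):

  maximum_object_size 12000 MB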
>
> some special settings for the windows updates:
> range_offset_limit 6000 MB
Add the ACLs necessary to restrict this to WU traffic (see the sketch
below). It's really hard on cache space**, so it should not be allowed
for just any traffic.

** What I mean by that is it may result in N parallel fetches of the
entire object unless the collapsed forwarding feature is used.

In regards to your situation: consider a 10GB WU object being fetched
10 times -> 10*10 GB of disk space required just to fetch it. That
alone over-fills your 45GB of usable cache space (cache_swap_low/100 *
cache_dir size = 0.60 * 75000 MB), and 11 such fetches will overflow
your whole disk.
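Something along these lines (a sketch only; the full domain list is on
the wiki page you linked, "windowsupdate" is just the ACL name used
there):

  acl windowsupdate dstdomain .windowsupdate.com
  acl windowsupdate dstdomain .update.microsoft.com
  range_offset_limit 6000 MB windowsupdate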
> maximum_object_size 6000 MB
> quick_abort_min -1
> quick_abort_max -1
> quick_abort_pct -1
>
> when i restart squid with its initscript it sometimes expunges some stuff from the cache but then fails again after a short while:
> before restart:
> /dev/sdb2 99G 93G 863M 100% /mnt/cache
> after restart:
> /dev/sdb2 99G 87G 7,4G 93% /mnt/cache
>
How much of that /mnt/cache usage is inside /mnt/cache/squid?
Is it one physical HDD spindle (versus a RAID drive)?
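For example:

  du -sh /mnt/cache/squid /mnt/cache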
>
> there are two types of errors in cache.log:
> FATAL: Ipc::Mem::Segment::open failed to
> shm_open(/squid-cf__metadata.shm): (2) No such file or directory
The cf__metadata.shm error is quite bad - it means your collapsed
forwarding is not working. Which implies it is not preventing the
disk overflow on parallel huge WU fetches.
Are you able to try the new Squid-4? There are some collapsed
forwarding and cache management changes there that may fix these
errors, or at least allow better diagnosis of them, and maybe of your
disk usage problem.
> FATAL: Failed to rename log file /mnt/cache/squid/swap.state.new to
> /mnt/cache/squid/swap.state
This is suspicious; how large are those swap.state files?
Does your proxy have correct access permissions on them and on the
directories in their path? Both the Unix filesystem permissions and
SELinux / AppArmor / whatever your system uses for advanced access
control matter here.
The same checks apply to the /dev/shm device and the *.shm file access
error above - though /dev/shm itself should be a root thing rather than
Squid user access.
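For example (adjust paths and expected owner to match your install):

  ls -lh /mnt/cache/squid/swap.state*
  ls -ld /mnt/cache /mnt/cache/squid
  ls -ld /dev/shm && ls -l /dev/shm/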
>
> What should i do to make squid work with windows updates reliably again?
Some other things you can check:

You can try making cache_swap_low/high closer together and much larger
(e.g. the default 90 and 95 values). Current 3.5 releases have fixed
the bug which made smaller values necessary on some earlier installs.
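That is:

  cache_swap_low 90
  cache_swap_high 95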
If you can afford the delays it introduces to a restart, you could run
a full scan of the cached data (stop Squid, delete the swap.state*
files, then restart Squid and wait).
- you could do that with a copy of Squid not handling user traffic if
necessary, but the running one cannot use the cache while it's
happening.
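A sketch of that procedure (your init script name may differ):

  /etc/init.d/squid stop
  rm /mnt/cache/squid/swap.state*
  /etc/init.d/squid start
  # then watch cache.log while Squid scans the cache_dir to rebuild
  # its index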
Otherwise, have you tried purging the entire cache and starting Squid
with a clean slate?
That would be a lot faster for recovery than the above scan, but it
does cost a bit more bandwidth short-term while the cache re-fills.
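For example (a sketch; double-check the path before the rm):

  /etc/init.d/squid stop
  rm -rf /mnt/cache/squid/*
  squid -z    # re-create the cache_dir structure
  /etc/init.d/squid start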
Amos