[squid-users] Performance issue /cache_dir / cache_mem / SMP workers

Sat Nov 24 07:10:37 UTC 2018

On 24/11/18 3:21 am, pacolo wrote:
> Hello again,
> 
> We have found an issue in backend.conf, as the Rock cache_dir is SMP aware.
> 
> Change this...
> #cache_dir rock /cache${process_number} 2097152
> to this...
> cache_dir rock /cache1 2097152
> 
> 
> ... then the new errors are:
> Nov 23 14:55:28 px06 squid[12559]: ERROR: /cache1/rock communication channel
> establishment timeout
> Nov 23 14:55:28 px06 squid[12559]: FATAL: Rock cache_dir at /cache1/rock
> failed to open db file: (0) No error.
> 
> We have searched the forum
> (http://squid-web-proxy-cache.1019090.n4.nabble.com/RockStore-quot-Fatal-Error-quot-td4666691.html),
> and tried what other people suggested without success.
> 

If you have a mix of "/cache${process_number}" and "/cache1" in your
config files you may still be mixing SMP-aware and SMP-disabled access
to the "/cache1" path.
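
One way to check for that kind of mix is to expand ${process_number} for each kid and see whether two different spellings land on the same directory. The following is only a rough sketch, not a Squid tool: the function names and the config file names are made up for illustration, and it ignores "include" lines and "if" conditionals, so treat its output as a hint rather than a verdict.

```python
import re
from collections import defaultdict

# Matches uncommented lines such as: cache_dir rock /cache1 2097152
CACHE_DIR_RE = re.compile(r"^\s*cache_dir\s+(\S+)\s+(\S+)")


def cache_dir_paths(conf_text):
    """Return (store type, raw path) for every uncommented cache_dir line."""
    found = []
    for line in conf_text.splitlines():
        m = CACHE_DIR_RE.match(line)
        if m:
            found.append((m.group(1), m.group(2)))
    return found


def mixed_paths(confs, process_numbers):
    """confs maps a (hypothetical) file name to its text.

    Returns {expanded path: sorted raw spellings} for every path that is
    reached through more than one spelling once ${process_number} is
    substituted for each number in process_numbers.
    """
    spellings = defaultdict(set)
    for text in confs.values():
        for _store_type, raw in cache_dir_paths(text):
            for n in process_numbers:
                expanded = raw.replace("${process_number}", str(n))
                spellings[expanded].add(raw)
    return {p: sorted(s) for p, s in spellings.items() if len(s) > 1}


if __name__ == "__main__":
    confs = {
        "backend.conf": "cache_dir rock /cache1 2097152\n",
        "old-backend.conf": "cache_dir rock /cache${process_number} 2097152\n",
    }
    for path, raws in mixed_paths(confs, range(1, 4)).items():
        print(path, "is reached via", raws)
```

Any path it reports is worth a manual look in the actual config files.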

By your mention of "backend.conf" I assume you are trying to use
something based on our example SMP CARP cluster configuration.
If that is correct, please compare what you have to the current example
config <https://wiki.squid-cache.org/ConfigExamples/SmpCarpCluster>.

It has had a few changes since it was initially written, and people
copy-pasting it into tutorials without linking back to our info have
introduced various bugs in their texts, sometimes because they copied
old versions that no longer work, sometimes because they made arbitrary
changes without properly understanding the consequences.

> 
> /cache1
> drwxr-xr-x 2 squid squid 16384 nov 21 13:05 lost+found
> drwxr-xr-x 2 squid squid  4096 nov 23 12:38 rock
> 
> The permissions in /dev/shm are correct, too. Squid is writing some files.
> ls_-l_dev_shm.txt
> <http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/ls_-l_dev_shm.txt>  
> 
> In addition, it appears that Squid can write in the localstatedir...
> 
> --localstatedir=/var'
> squid_-v.txt
> <http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/squid_-v.txt>  
> 
> /var/run/squid
> srwxr-x--- 1 squid squid 0 nov 23 14:55 squid-coordinator.ipc
> srwxr-x--- 1 squid squid 0 nov 23 14:55 squid-kid-1.ipc
> srwxr-x--- 1 squid squid 0 nov 23 14:55 squid-kid-2.ipc
> srwxr-x--- 1 squid squid 0 nov 23 14:55 squid-kid-3.ipc
> srwxr-x--- 1 squid squid 0 nov 23 13:00 squid-kid-4.ipc
> srwxr-x--- 1 squid squid 0 nov 23 13:00 squid-kid-5.ipc
> 
> 
> SELinux status:                 disabled
> 
> var_log_messages.txt
> <http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/var_log_messages.txt>  
> 

The log you provided has a mixture of output from multiple processes, but
appears to be lacking the "kidN" information Squid attaches to every log
line to indicate which ${process_number} is writing to the log.

That makes it very hard to determine the source of SMP issues from a log
like this. Luckily you did provide the whole log, and Squid-4 logs this
detail at startup:

 (squid-coord-4) process 12557 started
 (squid-disk-3) process 12558 started
 (squid-2) process 12559 started
 (squid-1) process 12560 started
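
If you need to do that mapping for a longer log, something along these lines can tag each syslog line with the process name taken from the "process N started" lines. It is a minimal sketch that assumes the two line formats seen in this thread; the function names are illustrative only.

```python
import re

# "(squid-disk-3) process 12558 started" -> name and PID
STARTED_RE = re.compile(r"\((squid[\w-]*)\)\s+process\s+(\d+)\s+started")
# "squid[12558]:" as written by syslog
SYSLOG_PID_RE = re.compile(r"squid\[(\d+)\]:")


def pid_to_kid(log_text):
    """Map each PID to the process name announced at startup."""
    return {int(pid): name for name, pid in STARTED_RE.findall(log_text)}


def annotate(log_text):
    """Prefix every syslog line whose PID is known with its process name."""
    kids = pid_to_kid(log_text)
    out = []
    for line in log_text.splitlines():
        m = SYSLOG_PID_RE.search(line)
        if m and int(m.group(1)) in kids:
            line = f"[{kids[int(m.group(1))]}] {line}"
        out.append(line)
    return out
```

Note that PIDs change every time a kid is restarted, so the mapping is only valid for lines logged after the matching "started" line.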

The Disker cannot open the configured rock cache_dir:

 Nov 23 14:55:21 px06 squid[12558]: ERROR: cannot open /cache1/rock:
(21) Is a directory

The SMP worker did not receive any registration response from the
Disker, so its cache_dir access fails. The worker aborts and enters a
loop of constantly dying due to the unresponsive Disker:

 Nov 23 14:55:28 px06 squid[12559]: ERROR: /cache1/rock communication
channel establishment timeout
 Nov 23 14:55:28 px06 squid[12559]: Not currently OK to rewrite swap log.
 Nov 23 14:55:28 px06 squid[12559]: storeDirWriteCleanLogs: Operation
aborted.
 Nov 23 14:55:28 px06 squid[12559]: FATAL: Rock cache_dir at
/cache1/rock failed to open db file: (0) No error.

So the fix is to:

1) stop Squid.

2) make sure it is fully shut down with no residual instances or
processes running.

3) make sure the SMP files Squid created in /dev/shm are fully gone.
Delete manually if necessary.

4) make sure the PID file is fully gone. Delete manually if necessary.

5) erase everything in the /cache1 directory.

5a) optionally: erase any other caches you may have.
  This will speed up the -z process, but only the cache showing errors
actually needs to be fully clean to fix this error message.

6) run "squid -z" manually and wait until it completes.

7) start Squid.
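
Before step 6 it can help to verify that steps 3 to 5 really left nothing behind. The sketch below only looks at files and does not check for running processes (step 2). The "squid-*" file name pattern in /dev/shm, the PID file location, and skipping "lost+found" are all assumptions on my part, so adjust them to your build and filesystem.

```python
from pathlib import Path


def leftovers(shm_dir, pid_file, cache_dir,
              shm_pattern="squid-*", ignore=("lost+found",)):
    """List what still has to be cleaned up before running "squid -z".

    shm_pattern and ignore are assumptions, not values taken from Squid.
    """
    problems = []

    # Step 3: files Squid left behind in the shared memory directory.
    for seg in sorted(Path(shm_dir).glob(shm_pattern)):
        problems.append(f"leftover file in shm dir: {seg}")

    # Step 4: a PID file from the previous run.
    if Path(pid_file).exists():
        problems.append(f"PID file still present: {pid_file}")

    # Step 5: anything still inside the cache_dir, such as the "rock"
    # directory that triggered "(21) Is a directory" in this thread.
    cache = Path(cache_dir)
    if cache.is_dir():
        for entry in sorted(cache.iterdir()):
            if entry.name in ignore:
                continue
            kind = "directory" if entry.is_dir() else "file"
            problems.append(f"cache_dir not empty: {kind} {entry}")

    return problems
```

For the setup in this thread the call might look like leftovers("/dev/shm", "/var/run/squid.pid", "/cache1"), where the PID file path is a guess. An empty list means nothing was found and "squid -z" can be run.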

Amos