[squid-users] Stack overflow with large IP lists
Alex Rousskov
rousskov at measurement-factory.com
Fri Jul 28 16:24:58 UTC 2023
On 7/28/23 10:51, magri at web.de wrote:
> I did a complete rewrite of my patch to remove all recursive
> calls in SplayNode. See GitHub Pull Request #1431.
Thank you! Already reviewed :-).
> For our large number of IP addresses and clients (requests/s), a different
> storage type should indeed be better suited, avoiding the overhead of
> permanent tree rebalancing.
There seems to be strong consensus regarding replacing Squid splay
trees. I hope somebody will do it in the foreseeable future, but I do
not know of anybody working on that replacement right now.
FWIW, my _primary_ replacement motivation is not persistent rebalancing
costs but terrible worst-case performance combined with a real
possibility of triggering that worst-case performance, completely
outside admin knowledge or control. Your use case with an idle Squid is
a good illustration of that problem, but things can get much worse.
Cheers,
Alex.
> On 7/26/23 18:22, Alex Rousskov wrote:
>> On 7/26/23 11:25, magri at web.de wrote:
>>
>>> I've attached a patch that modifies the destroy() method of SplayNode
>>> to avoid recursive calls of destroy() for single-child nodes by moving
>>> the subtrees of the child to the parent node before destroying the
>>> child non-recursively.
>>
>> Hi Martin,
>>
>> Thanks a lot for sharing your well-documented fix! Would you be
>> willing to post it as a GitHub Pull Request? If you do, then the
>> following questions can be discussed on GitHub instead:
>>
>> * The patch reduces recursion during tree destruction. The Squid splay
>> tree implementation contains other recursive operations (e.g., visit()).
>> Would those other operations not suffer from the same problem if left
>> untouched? (And, if they are immune, then why not mimic their wonderful
>> code to destroy the tree as well?!)
>>
>> * The proposed patch still destroys one child recursively (when both
>> children are present). That feels like an incomplete solution that will
>> still hit stack limits on some trees. I do understand that you are
>> addressing a specific case of an idle Squid with a freshly configured
>> degenerate tree, but there ought to be other cases that lead to similar
>> results as well (accidentally or with a malicious actor's help). Have you
>> considered removing recursion completely instead (by using more heap
>> memory as needed)?
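Removing the recursion entirely is straightforward once the pending subtrees are kept on the heap instead of the call stack. A minimal sketch of that technique follows; the `Node` struct and function name are illustrative, not Squid's actual templated SplayNode:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal node; Squid's real SplayNode (include/splay.h)
// is templated and carries a value. Only the destruction technique matters.
struct Node {
    Node *left = nullptr;
    Node *right = nullptr;
};

// Destroy a tree of any shape using O(1) call-stack frames: nodes still
// to be visited wait on a heap-allocated vector, so tree depth can never
// overflow the stack. Returns the number of nodes freed.
std::size_t destroyIteratively(Node *root) {
    std::size_t destroyed = 0;
    std::vector<Node *> pending;
    if (root)
        pending.push_back(root);
    while (!pending.empty()) {
        Node *n = pending.back();
        pending.pop_back();
        if (n->left)
            pending.push_back(n->left);
        if (n->right)
            pending.push_back(n->right);
        delete n;
        ++destroyed;
    }
    return destroyed;
}
```

The vector holds at most one deferred node per level of the current path, and for the degenerate one-child-per-node tree from this thread it never holds more than one pointer.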
>>
>> * I am curious whether your specific use case (going beyond splay tree
>> destruction) would be better addressed by a different storage type than
>> splay trees. For example, have you considered whether an IP
>> address-friendly hash would be faster for, say, one million IP addresses?
>>
>>
>> Cheers,
>>
>> Alex.
>>
>>
>> On 7/26/23 11:25, magri at web.de wrote:
>>> Hi squid community,
>>>
>>> we experience segfaults with squid 6.1 (and also older versions) during
>>> "squid -k reconfigure" on several Linux systems, caused by a stack
>>> overflow.
>>>
>>> The circumstances are rather special:
>>>
>>> - we have a huge dst IP blacklist (> 300,000 entries)
>>> - this original list is sorted (by raw IP value)
>>> - the proxy is a hot-standby system, so there are no client requests
>>> between the start of squid and the reconfigure
>>>
>>> We could trace the cause of the segfault to the destroy() method of
>>> SplayNode (in include/splay.h):
>>> - destroy() of the top node is run on every reconfigure
>>> - destroy() is recursive, so the function calls stack on the stack :-)
>>> up to the depth of the splay tree
>>> - the insertion of an ordered IP list into the splay tree creates a
>>> degenerate splay tree (each node has only a left child) with a depth
>>> equal to the number of IP entries in the list
>>> - due to the inactivity of the proxy, there is no rebalancing of the
>>> tree through find() calls
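The degenerate shape is easy to reproduce with a toy binary search tree. The sketch below is illustrative code, not Squid's: a plain BST insert with no restructuring behaves like a splay tree that is never searched, and sorted input (descending keys here) yields the same left-child-only chain:

```cpp
#include <cassert>
#include <cstddef>

// Illustrative toy BST, not Squid code. Like a splay tree that is never
// searched, it does nothing to rebalance itself after insertion.
struct BstNode {
    int key;
    BstNode *left = nullptr;
    BstNode *right = nullptr;
};

// Standard BST insert, written iteratively.
BstNode *insert(BstNode *root, int key) {
    BstNode **slot = &root;
    while (*slot)
        slot = (key < (*slot)->key) ? &(*slot)->left : &(*slot)->right;
    *slot = new BstNode{key};
    return root;
}

// A sorted-input tree is a chain, so its depth can be measured without
// recursion by following the single child at each level.
std::size_t chainDepth(const BstNode *n) {
    std::size_t depth = 0;
    while (n) {
        ++depth;
        n = n->left ? n->left : n->right;
    }
    return depth;
}
```

Every insert lands one level below the previous one, so recursively destroying this tree needs one stack frame per key.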
>>>
>>> On amd64 every stack frame needs 32 bytes, so the regular Linux stack of
>>> 8 MB can hold ~262,000 stack frames. On other architectures
>>> the limit is reached even faster (s390x: 160 bytes per stack frame ->
>>> ~52,000 frames).
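The frame-count arithmetic can be checked directly (assuming the common Linux default soft stack limit of 8 MiB and the per-frame sizes measured in the report):

```cpp
#include <cassert>

// Rough stack budget: maximum recursion depth = stack size / bytes per frame.
// Frame sizes are those measured in the report (amd64: 32 B, s390x: 160 B).
constexpr long stackBytes = 8L * 1024 * 1024;     // default 8 MiB soft limit
constexpr long maxFramesAmd64 = stackBytes / 32;  // 262,144 frames
constexpr long maxFramesS390x = stackBytes / 160; // 52,428 frames
```

A blacklist of more than 300,000 sorted entries therefore exceeds the ~262,000-frame budget even on amd64, matching the observed crash.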
>>>
>>> We tried to increase the stack size via ulimits and systemd, but although
>>> the squid master process got the increased limits, the kid processes are
>>> always spawned with a soft limit of 8 MB and crash on reaching it.
>>> Only forcing a new soft limit via prlimit on an already running kid
>>> could avoid the crash.
>>>
>>> I've attached a patch that modifies the destroy() method of SplayNode to
>>> avoid recursive calls of destroy() for single-child nodes by moving the
>>> subtrees of the child to the parent node before destroying the child
>>> non-recursively.
>>> With this patch we could no longer reproduce the segfaults, even with a
>>> list of 10,000,000 ordered IPs.
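A sketch of the shape of such a fix (illustrative code, not the actual patch): single-child nodes are consumed in a loop, so the fully degenerate tree from this report needs no recursion at all, while a node with two children still recurses into one subtree, which is the residual limitation Alex raises above:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node type; the real SplayNode in include/splay.h is templated.
struct TNode {
    TNode *left = nullptr;
    TNode *right = nullptr;
};

// Chain-friendly destruction: single-child nodes are handled by the loop,
// so a degenerate one-child-per-node tree uses a single stack frame.
// Only a node with two children still recurses (into its left subtree).
std::size_t destroyChainFriendly(TNode *n) {
    std::size_t destroyed = 0;
    while (n) {
        TNode *next;
        if (n->left && n->right) {
            destroyed += destroyChainFriendly(n->left); // residual recursion
            next = n->right;
        } else {
            next = n->left ? n->left : n->right; // null at a leaf
        }
        delete n;
        ++destroyed;
        n = next;
    }
    return destroyed;
}
```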
>>>
>>> Any ideas, comments, better solutions?
>>>
>>> Thanks in advance for any advice.
>>>
>>> best regards
>>> Martin
>>>
>>> _______________________________________________
>>> squid-users mailing list
>>> squid-users at lists.squid-cache.org
>>> http://lists.squid-cache.org/listinfo/squid-users
>>