[squid-users] Stack overflow with large IP lists

magri at web.de magri at web.de
Fri Jul 28 14:51:02 UTC 2023


Hi Alex,

visit and walk should share the same problem, but they don't seem to be
used for acls, except for dumping.

Regardless I did a complete rewrite of my patch to remove all recursive
calls in SplayNode. See GitHub Pull Request #1431.

For our large number of IP addresses and clients (request/s) a different
storage type should indeed be better suited to avoid the overhead of
permanent tree rebalancing.


Cheers,
Martin

Am 26.07.23 um 18:22 schrieb Alex Rousskov:
> On 7/26/23 11:25, magri at web.de wrote:
>
>> I've attached a patch that modifies the destroy method of SplayNode
>> to avoid recursive calls of destroy in single child nodes by moving
>> the subtrees of the child to the parent node before destroying the
>> child non recursively.
>
> Hi Martin,
>
>      Thanks a lot for sharing your well-documented fix! Would you be
> willing to post it as a GitHub Pull Request? If you do, then the
> following questions can be discussed on GitHub instead:
>
> * The patch reduces recursion during tree destruction. Squid splay tree
> implementation contains other recursive operations (e.g., visit()).
> Would not those other operations suffer from the same problem if left
> untouched? (And, if they are immune, then why not mimic their wonderful
> code to destroy the tree as well?!)
>
> * The proposed patch still destroys one child recursively (when both
> children are present). That feels like an incomplete solution that will
> still hit stack limits on some trees. I do understand that you are
> addressing a specific case of an idle Squid with a freshly configured
> degenerate tree, but there ought to be other cases that lead to similar
> results as well (accidentally or with a malicious actor help). Have you
> considered removing recursion completely instead (by using more heap
> memory as needed)?
>
> * I am curious whether your specific use case (going beyond splay tree
> destruction) be better addressed by a different storage type than splay
> trees. For example, have you considered whether using a IP
> address-friendly hash would be faster for, say, one million IP addresses?
>
>
> Cheers,
>
> Alex.
>
>
> On 7/26/23 11:25, magri at web.de wrote:
>> Hi squid community,
>>
>> we experience segfaults with squid 6.1 (and also older versions) during
>> "squid -k reconfigure" on several linux systems, caused by a stack
>> overflow.
>>
>> The circumstances are rather special:
>>
>> - we have a huge dst ip blacklist (> 300.000 enties)
>> - this original list is sorted (by raw ip value)
>> - the proxy is a hot standby system so there are no client requests
>>    between start of squid and reconfigure
>>
>> We could trace the cause of the segfault to the destroy-method of
>> SplayNode (in include/splay.h):
>> - destroy of the top node is run on every reconfigure
>> - destroy is recursive, so the function calls stack on the stack :-)
>>    up to the depth of the splay tree
>> - the insertion of an ordered ip list into the splay tree creates a
>>    degenerate splay tree (each node has only a left child) with a depth
>>    of the number of ip entries in the list
>> - due to the inactivity of the proxy there is no rebalancing of the
>>    tree through find calls
>>
>> On amd64 every stack frame needs 32 byte. So the regular linux stack of
>> 8 MB can hold about ~262.000 stack frames. On other architectures
>> the limit is reached even faster (s390x: 160 bytes per stack frame ->
>> 52.000 frames).
>>
>> We tried to increase the stack size via ulimits and systemd, but though
>> the squid master process got the increased limits, the kid-processes are
>> always spawned with a soft limit of 8 MB and crash on reaching it.
>> Only forcing a new soft limit via prlimit on an already running kid
>> could avoid the crash.
>>
>> I've attached a patch that modifies the destroy method of SplayNode to
>> avoid recursive calls of destroy in single child nodes by moving the
>> subtrees of the child to the parent node before destroying the child non
>> recursively.
>> With this patch we could no longer reproduce the segfaults, even with a
>> list of 10.000.000 ordered ips.
>>
>> Any ideas, comments, better solutions?
>>
>> Thanks in advance for any advice.
>>
>> best regards
>>   Martin
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> squid-users mailing list
>> squid-users at lists.squid-cache.org
>> http://lists.squid-cache.org/listinfo/squid-users
>
> _______________________________________________
> squid-users mailing list
> squid-users at lists.squid-cache.org
> http://lists.squid-cache.org/listinfo/squid-users


More information about the squid-users mailing list