[squid-users] Stack overflow with large IP lists
Alex Rousskov
rousskov at measurement-factory.com
Wed Jul 26 16:22:21 UTC 2023
On 7/26/23 11:25, magri at web.de wrote:
> I've attached a patch that modifies the destroy method of SplayNode
> to avoid recursive calls of destroy for single-child nodes by moving
> the subtrees of the child to the parent node before destroying the
> child non-recursively.
Hi Martin,
Thanks a lot for sharing your well-documented fix! Would you be
willing to post it as a GitHub Pull Request? If you do, then the
following questions can be discussed on GitHub instead:
* The patch reduces recursion during tree destruction. Squid's splay tree
implementation contains other recursive operations (e.g., visit()).
Would not those other operations suffer from the same problem if left
untouched? (And, if they are immune, then why not mimic their wonderful
code to destroy the tree as well?!)
* The proposed patch still destroys one child recursively (when both
children are present). That feels like an incomplete solution that will
still hit stack limits on some trees. I do understand that you are
addressing a specific case of an idle Squid with a freshly configured
degenerate tree, but there ought to be other cases that lead to similar
results as well (accidentally or with a malicious actor's help). Have you
considered removing recursion completely instead (by using more heap
memory as needed)?
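As an illustration of that last point, a fully non-recursive destroy can be sketched with an explicit heap-allocated stack. This is not the actual Squid code; the Node type and function names below are hypothetical stand-ins for SplayNode in include/splay.h:

```cpp
#include <stack>

// Illustrative stand-in for Squid's SplayNode; the real class
// in include/splay.h differs in names and details.
template <class V>
struct Node {
    V value;
    Node *left = nullptr;
    Node *right = nullptr;
};

// Iterative destruction: the recursion depth is traded for heap
// memory in an explicit stack, so even a degenerate million-node
// chain cannot overflow the 8 MB call stack.
template <class V>
void destroyIteratively(Node<V> *root) {
    std::stack<Node<V> *> pending;
    if (root)
        pending.push(root);
    while (!pending.empty()) {
        Node<V> *n = pending.top();
        pending.pop();
        if (n->left)
            pending.push(n->left);
        if (n->right)
            pending.push(n->right);
        delete n;
    }
}
```

The same explicit-stack trick would also de-recurse visit() and friends, at the cost of O(depth) heap allocations during traversal.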
* I am curious whether your specific use case (going beyond splay tree
destruction) would be better addressed by a different storage type than
splay trees. For example, have you considered whether an IP
address-friendly hash would be faster for, say, one million IP addresses?
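To sketch what I mean (hypothetical names, not Squid code): a hash set keyed on the packed address gives O(1) average lookup, and its destruction is a flat loop that can never recurse:

```cpp
#include <cstdint>
#include <unordered_set>

// Illustrative only: store exact IPv4 destinations as packed
// 32-bit keys in a hash set instead of a splay tree.
using BlocklistV4 = std::unordered_set<uint32_t>;

// Pack four dotted-quad octets into one 32-bit key.
inline uint32_t packV4(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return (uint32_t(a) << 24) | (uint32_t(b) << 16) |
           (uint32_t(c) << 8) | uint32_t(d);
}

inline bool isBlocked(const BlocklistV4 &list, uint32_t addr) {
    return list.count(addr) != 0;
}
```

The obvious caveat is that a plain hash only handles exact addresses; CIDR ranges (which the splay tree supports) would need a different structure or per-prefix sets.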
Cheers,
Alex.
On 7/26/23 11:25, magri at web.de wrote:
> Hi squid community,
>
> we experience segfaults with squid 6.1 (and also older versions) during
> "squid -k reconfigure" on several Linux systems, caused by a stack
> overflow.
>
> The circumstances are rather special:
>
> - we have a huge dst IP blacklist (> 300,000 entries)
> - this original list is sorted (by raw IP value)
> - the proxy is a hot standby system so there are no client requests
> between start of squid and reconfigure
>
> We could trace the cause of the segfault to the destroy-method of
> SplayNode (in include/splay.h):
> - destroy of the top node is run on every reconfigure
> - destroy is recursive, so the function calls stack on the stack :-)
> up to the depth of the splay tree
> - the insertion of an ordered IP list into the splay tree creates a
> degenerate splay tree (each node has only a left child) with a depth
> equal to the number of IP entries in the list
> - due to the inactivity of the proxy there is no rebalancing of the
> tree through find calls
>
> On amd64 every stack frame needs 32 bytes, so the regular Linux stack of
> 8 MB can hold about ~262,000 stack frames. On other architectures
> the limit is reached even faster (s390x: 160 bytes per stack frame ->
> ~52,000 frames).
>
> We tried to increase the stack size via ulimits and systemd, but though
> the squid master process got the increased limits, the kid processes are
> always spawned with a soft limit of 8 MB and crash on reaching it.
> Only forcing a new soft limit via prlimit on an already-running kid
> could avoid the crash.
>
> I've attached a patch that modifies the destroy method of SplayNode to
> avoid recursive calls of destroy for single-child nodes by moving the
> subtrees of the child to the parent node before destroying the child
> non-recursively.
> With this patch we could no longer reproduce the segfaults, even with a
> list of 10,000,000 ordered IPs.
>
> Any ideas, comments, better solutions?
>
> Thanks in advance for any advice.
>
> best regards
> Martin
>
> _______________________________________________
> squid-users mailing list
> squid-users at lists.squid-cache.org
> http://lists.squid-cache.org/listinfo/squid-users