[squid-dev] [PATCH] Rework acl/RegexData optimization to use SBufList
Kinkie
gkinkie at gmail.com
Mon Nov 14 16:34:03 UTC 2016
Hi,
the attached patch fixes the issue with missing ACL entries
(verified) and streamlines the code a bit.
Performance-wise it improves a bit, parsing the same 1M-entry ACL in
19.4 seconds (17.8 seconds in userland).
On Mon, Nov 14, 2016 at 11:04 AM, Kinkie <gkinkie at gmail.com> wrote:
> On Mon, Nov 14, 2016 at 5:59 AM, Amos Jeffries <squid3 at treenet.co.nz> wrote:
>> On 14/11/2016 6:30 p.m., Alex Rousskov wrote:
>>> On 11/13/2016 05:11 PM, Kinkie wrote:
>>>
>>>> the attached patch moves away from hand-rolling a c-string onto
>>>> joining a SBufList for optimizing regexes in RegexData.cc.
>>>
>>>> You can find attached as a test case the output of squidclient
>>>> mgr:config taken on trunk and on the submitted code. It is slightly
>>>> different in that the new code allows for a longer optimized regex.
>>>
>>> I see the following before/after difference:
>>>
>>> -acl bigacl dstdom_regex ... (pattern84) (pattern86) ...
>>> +acl bigacl dstdom_regex ... (pattern84)|(pattern85) ...
>>>
>>> Does changing space into "|" result in one internal regex instead of
>>> two?
>>
>> It does / should.
>>
>>> And that is the optimization you have implemented?
>>
>> No, that was implemented long ago by Marcus. We got a ~20% speed
>> increase from that. More complexity in the individual regex test seemed
>> to counteract the fewer tests done.
>>
>>
>>> How is that
>>> related to SBufs (i.e., why could not old c-strings accommodate longer
>>> regexes)?
>>
>> FYI (to kinkie): The 100 patterns per aggregation was an arbitrary
>> limit, but the total string length is limited by the ConfigParser buffers.
>>
>> The plain-text length of the aggregated regex string needs to fit within
>> one squid.conf line length. With space for the directive name, label,
>> type and flags being included in the limit.
>
> that's BUFSIZ, or 4k. We currently have 1K + separator, name and flags.
> If anything, we could be more aggressive in having longer regexes,
> modulo OOM in regcomp.
>
>>> And why is there no pattern85 in the "before" test?!
>>>
>>>
>>>> The code is expected to cause a small performance regression on
>>>> parse and reconfigure due to extra data copies. The regression is
>>>> negligible: the attached test cases perform "squid -k parse" in 60
>>>> msec in trunk and 62 on the branch on a warm cache on my i5 macbook
>>>> air.
>>>
>>> A 3% performance regression is not necessarily negligible. Have you
>>> tested with more ACL lines (not larger individual ACL lines) that take a
>>> few seconds to load (as opposed to 60ms) ? If not, please do unless that
>>> test would be irrelevant somehow.
>>>
>>> And what is the expected performance improvement from having fewer
>>> longer regexes?
>>
>> I assume you mean performance of normal request processing through the
>> regex ACL. Agreed, that need to be checked/compared to current speeds.
>
> Time needed for squid -k parse on my Mac 1.3GHz i5 macbook air, for a
> 1M-entry random ACL, 13Mb-big:
> new: 20.8s
> trunk: 18.3s
>
> Hopefully in time we'll move the config parser to SBuf..
>
> --
> Francesco
--
Francesco
-------------- next part --------------
A non-text attachment was scrubbed...
Name: wordlist-sbuflist-aclregexdata-v2.patch
Type: application/octet-stream
Size: 8718 bytes
Desc: not available
URL: <http://lists.squid-cache.org/pipermail/squid-dev/attachments/20161114/d23e1805/attachment.obj>
More information about the squid-dev
mailing list