[squid-dev] [PATCH] Rework acl/RegexData optimization to use SBufList

Kinkie gkinkie at gmail.com
Mon Nov 14 11:04:21 UTC 2016


On Mon, Nov 14, 2016 at 5:59 AM, Amos Jeffries <squid3 at treenet.co.nz> wrote:
> On 14/11/2016 6:30 p.m., Alex Rousskov wrote:
>> On 11/13/2016 05:11 PM, Kinkie wrote:
>>
>>> the attached patch moves away from hand-rolling a c-string onto
>>> joining a SBufList for optimizing regexes in RegexData.cc.
>>
>>> You can find attached as a test case the output of squidclient
>>> mgr:config taken on trunk and on the submitted code. It is slightly
>>> different in that the new code allows for a longer optimized regex.
>>
>> I see the following before/after difference:
>>
>>   -acl bigacl dstdom_regex ... (pattern84) (pattern86) ...
>>   +acl bigacl dstdom_regex ... (pattern84)|(pattern85) ...
>>
>> Does changing space into "|" result in one internal regex instead of
>> two?
>
> It does / should.
>
>> And that is the optimization you have implemented?
>
> No, that was implemented long ago by Marcus. We got a ~20% speed
> increase from that. More complexity in the individual regex test seemed
> to counteract the fewer tests done.
>
>
>> How is that
>> related to SBufs (i.e., why could not old c-strings accommodate longer
>> regexes)?
>
> FYI (to kinkie): The 100 patterns per aggregation was an arbitrary
> limit, but the total string length is limited by the ConfigParser buffers.
>
> The plain-text length of the aggregated regex string needs to fit within
> one squid.conf line length. With space for the directive name, label,
> type and flags being included in the limit.

that's BUFSIZ, or 4k. We currently have 1K + separator, name and flags.
If anything, we could be more aggressive in having longer regexes,
modulo OOM in regcomp.

>> And why is there no pattern85 in the "before" test?!
>>
>>
>>> The code is expected to cause a small performance regression on
>>> parse and reconfigure due to extra data copies. The regression is
>>> negligible: the attached test cases perform "squid -k parse" in 60
>>> msec in trunk and 62 on the branch on a warm cache on my i5 macbook
>>> air.
>>
>> A 3% performance regression is not necessarily negligible. Have you
>> tested with more ACL lines (not larger individual ACL lines) that take a
>> few seconds to load (as opposed to 60ms) ? If not, please do unless that
>> test would be irrelevant somehow.
>>
>> And what is the expected performance improvement from having fewer
>> longer regexes?
>
> I assume you mean performance of normal request processing through the
> regex ACL. Agreed, that need to be checked/compared to current speeds.

Time needed for squid -k parse on my Mac 1.3GHz i5 macbook air, for a
1M-entry random ACL, 13Mb-big:
new: 20.8s
trunk: 18.3s

Hopefully in time we'll move the config parser to SBuf..

-- 
    Francesco


More information about the squid-dev mailing list