[squid-dev] Sad performance trend

Alex Rousskov rousskov at measurement-factory.com
Mon Sep 12 17:36:31 UTC 2016


On 09/12/2016 09:38 AM, Amos Jeffries wrote:
> On 7/09/2016 5:43 a.m., Alex Rousskov wrote:
>> On 09/06/2016 08:27 AM, Amos Jeffries wrote:
>>> On 27/08/2016 12:32 p.m., Alex Rousskov wrote:
>>>>         W1  W2  W3  W4  W5  W6
>>>>   v3.1  32% 38% 16% 48% 16+ 9%
>>>>   v3.3  23% 31% 14% 42% 15% 8%
>>>>   v3.5  11% 16% 12% 36%  7% 6%
>>>>   v4.0  11% 15%  9% 30% 14% 5%

> Since the test was a bit unreliable I ran it freshly against each branch
> that would build when I wanted to check progress.
> The last test run can be found in parserng-polygraph if you want to dig
> into the logs for other measures.

Sorry, I do not know how to get to "parserng-polygraph". All Jenkins
results published at http://build.squid-cache.org/ appear to be too old,
but perhaps I am looking in the wrong place. It is probably not
important for this specific discussion though.


> branch    : Mean RPS
> --------------------
> squid-3.2 : 1933.98
> squid-3.3 : 1932.81
> squid-3.4 : 1931.12
> squid-3.5 : 1926.13

Looks like performance may be gradually getting worse in this macro test
as well, but the decrease is not as visible/pronounced as in micro tests
(which is understandable/expected, of course).


> The fluctuation / error bars seemed to be about 1 RPS for that polygraph
> workload.

Please correct me if I am misinterpreting what you are saying, but to me
it sounds like "The results are not getting gradually worse because each
result has a ~1 RPS precision". That conclusion does not compute for me
for two independent reasons:

1. There is a significant difference between fluctuation in individual
results and a trend. Fluctuation in individual test results may co-exist
with a real/meaningful downward trend. Whether that happens here, I do
not know, but your comment appears to imply that you are dismissing the
visible trend for the wrong/inapplicable reason.

2. Even if a 1 unit difference is insignificant, the results show that
performance got worse by 7+ units (~1934 vs ~1926).
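To illustrate point 1 with the four branch means quoted above: a minimal
sketch (my own illustration, not part of the Polygraph test itself) that
fits a least-squares slope across the branches and compares the total
drop against the ~1 RPS fluctuation. Small per-result noise and a clear
downward trend co-exist here.

```python
# Branch means quoted earlier in this thread (RPS-like units).
means = {
    "squid-3.2": 1933.98,
    "squid-3.3": 1932.81,
    "squid-3.4": 1931.12,
    "squid-3.5": 1926.13,
}
fluctuation = 1.0  # ~1 RPS error bars reported for the workload

values = list(means.values())
total_drop = values[0] - values[-1]  # ~7.85, i.e. the "7+ units" above

# Least-squares slope over branch index 0..3, stdlib only.
n = len(values)
xs = range(n)
mean_x = sum(xs) / n
mean_y = sum(values) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values)) \
        / sum((x - mean_x) ** 2 for x in xs)

print(f"total drop: {total_drop:.2f} units")
print(f"slope: {slope:.2f} units per branch")  # negative => downward trend
print(f"drop vs fluctuation: {total_drop / fluctuation:.1f}x")
```

The point is only that the cumulative drop is several times larger than
the per-result error bar, so the fluctuation by itself cannot dismiss
the trend.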

BTW, I would caution against thinking of these numbers as RPS. That test
is not designed to predict sustained request rates. A 1 unit difference
in these results may correspond to ~0% or, say, ~10% difference in
actual sustained request rates.

Alex.
