[squid-dev] Fwd: [PATCH] for loops modernization

Thu Mar 16 05:34:33 UTC 2017

On 16/03/2017 3:43 a.m., Adam Majer wrote:
> On 03/15/2017 03:17 PM, Amos Jeffries wrote:
> Theoretically range-for loops should allow multi-threaded CPUs to run
>> those loops a bit faster. If that can be demonstrated using a tool like
>> polygraph you have a good argument for a patch containing that change to
>> go in as a pure performance change.
> 
> No. That would break many, many things. There are special directives that
> allow this to happen with things like OpenMP compilers, but that's not
> what we are talking about here.
> 
> And theoretically, if you blindly allow compilers to optimize loops like
> that, you are just as likely to introduce hardware stalls that will
> result in slower execution of the overall loop. The only way to look at
> these,
> 
>     for (TYPE _i : _c)
> 
> is as syntactic sugar.
> 
> 
> Best regards,
> - Adam
> 
> PS. And if you are talking about vectorization of these loops, that
> already happens with regular loops. See,
> 
>     https://gcc.gnu.org/projects/tree-ssa/vectorization.html

I mean tricks like a compiler with CPU-specific knowledge being able to
emit assembly that helps pre-fetch the address pointers for all objects
in the container, and/or, if it can prove the objects are read-only,
having hyper-threads pre-load the container contents into L1/L2 cache in
time for the main thread to run the business logic faster, without much
loading delay.

AFAIK range-for does make certain code-flow guarantees (like
full-length container iteration) known without any analysis, so it is
not completely syntactic sugar. Yes, the compiler could do the same with
traditional loops, but only after extra analysis, which might be turned off.

I'm not sure if this would have any visible effect at all. We might be
unlucky in that these things are not supportable, or the data sizes
Squid handles blow the benefits away. Thus the request for proof.

Amos