[squid-dev] [RFC] CRUSH peer selection method
Loic Dachary
loic at dachary.org
Wed Apr 19 12:51:40 UTC 2017
On 04/18/2017 01:00 AM, Loic Dachary wrote:
>
>
> On 04/17/2017 09:10 PM, Loic Dachary wrote:
>>
>>
>> On 04/17/2017 08:34 PM, Alex Rousskov wrote:
>>> On 04/17/2017 10:53 AM, Loic Dachary wrote:
>>>> On 04/17/2017 06:28 PM, Alex Rousskov wrote:
>>>>> On 04/17/2017 09:08 AM, Loic Dachary wrote:
>>>>>
>>>>>> peer 1 targeted by a set of requests (X1)
>>>>>> peer 2 targeted by a set of requests (X2)
>>>>>
>>>>>> and we add a new peer, CRUSH makes it so 1/3 of the requests from peer 1 and 2 move to peer 3
>>>>>
>>>>>> peer 1 targeted by a set of requests (2/3 of X1)
>>>>>> peer 2 targeted by a set of requests (2/3 of X2)
>>>>>> peer 3 targeted by a set of requests (1/3 of X1 + 1/3 of X2)
>>>>>
>>>>> How would the last three lines look for CARP and sourcehash?
>>>
>>>
>>>> I'm not sure :-) But I could write a simulation to figure it out.
>>>> Unless I missed a part of src/carp.cc that mitigates the problem, the
>>>> situation should not be very different from what is described at
>>>> https://docs.openstack.org/developer/swift/ring_background.html#part-2
>>>
>>> It is totally up to you whether to do anything else. FWIW, AFAICT:
>>>
>>> * only math (not simulation) is needed to show that any "peer =
>>> request_hash % n" mapping algorithm (where "n" is the number of peers)
>>> may result in a lot of "moved" request:peer assignments;
>>>
>>> * many folks do not know whether CRUSH, CARP and sourcehash use a "peer
>>> = request_hash % n" mapping algorithm (you should say that explicitly);
>>>
>>> * it would be both interesting and useful to know how the amount of
>>> "movement" changes with "n" in all compared algorithms (e.g., should I
>>> bother with CRUSH if I have 10 peers?);
>>>
>>> * do not assume that we know that a 1/3 movement shown by CRUSH (for
>>> n=2) is a lot less than some X shown by CARP (for the same "n");
>>>
>>> * disclose the price of the reduction of movement (e.g., "a few RAM
>>> bytes per peer to store the more complex mapping and a few CPU cycles to
>>> find the right peer for each request").
>>>
>>> Perhaps your article already addresses all these concerns, but if you
>>> want others to be enthusiastic about CRUSH, it may help to distill its
>>> advantages and costs to something one can grasp by reading a
>>> two-paragraph email with an easy-to-grok table.
>>
>> That's what I should have done to begin with. Explaining it with maths would be complicated, so I'll write some code and run a simulation; that will be easier to understand.
>
> It turns out I misunderstood how CARP works ( https://tools.ietf.org/html/draft-vinod-carp-v1-03 ) and it performs as well as CRUSH.
>
> After implementing something reasonably similar at http://libcrush.org/dachary/python-crush/commit/fced5135562da09eaf597ac7de74712022d48136 and running a simulation adding two new peers, here is what I get.
>
> The rows below show the number of requests moved from the peer named in the row to each peer named in the columns. The requests% at the end of each row shows the percentage of the total number of requests that is moved away from that particular peer. The last row shows the percentage of the total number of requests that is moved to the peer named in the column.
>
> With CRUSH:
>
> crush compare --rule data --replication-count 1 --origin tests/peers-crushmap-before.json --destination tests/peers-crushmap-after.json
>
>             peer9   peera  requests%
> peer0         970    1090      2.06%
> peer1        1012     997      2.01%
> peer2        1025    1012      2.04%
> peer3        1004    1071      2.08%
> peer4         983    1025      2.01%
> peer5        1008    1033      2.04%
> peer6         951     965      1.92%
> peer7        1006    1052      2.06%
> peer8        1001     994      1.99%
> request%    8.96%   9.24%     18.20%
>
> With CARP:
>
> crush compare --carp --origin tests/peers-crushmap-before.json --destination tests/peers-crushmap-after.json
>
>             peer9    peera  requests%
> peer4        6311        0      6.31%
> peer8           0    12340     12.34%
> request%    6.31%   12.34%     18.65%
>
After taking a closer look, I found one use case where CRUSH behaves significantly better than CARP: you have 10 peers (peer0 to peer9), each with weight 1, and you increase the weight of one peer (say peer4) to 2. With CARP ~69% of the requests move from one peer to another (because changing the weight of one peer changes the load_multiplier of all peers). With CRUSH only ~8% of the requests move.
When increasing the weight of peer4 from 1 to 2:
With CRUSH[1]
moved 8.11% of the requests
            peer4  objects%
peer0         896     0.90%
peer1         858     0.86%
peer2         978     0.98%
peer3         913     0.91%
peer5         911     0.91%
peer6         897     0.90%
peer7         910     0.91%
peer8         898     0.90%
peer9         846     0.85%
objects%    8.11%     8.11%
With CARP[2]
moved 69% of the requests
        peer0  peer1  peer2  peer3  peer4  peer5  peer6  peer7  peer8  peer9
peer0       0      0      0      0   1424    562    643    594    629    571
peer1       0      0      0      0   1504    592    548    605    574    584
peer2       0      0      0      0   1449    558    590    663    637    563
peer3       0      0      0      0   1418    607    604    610    591    606
peer4     634    616    566    596      0    773   1178   1202   1179   1182
peer5     619    609    556    586   1956      0    992   1213   1063   1193
peer6     672    637    641    592   2031   1241      0    538   1224   1299
peer7     542    589    603    586   1938   1095   1111      0    938   1182
peer8     595    634    611    592   1999   1222   1173   1210      0    780
peer9     510    496    526    528   2515   1103   1001   1086   1032      0
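
To illustrate why changing a single weight impacts every peer, here is a rough Python sketch of the load factor recurrence described in draft-vinod-carp-v1-03 (illustrative only: carp_load_multipliers is a name I made up, and this is neither the code of src/carp.cc nor the code of the simulation above):

    def carp_load_multipliers(weights):
        # weights: dict peer name -> raw weight
        # returns: dict peer name -> load multiplier, following my reading of
        # the recurrence in draft-vinod-carp-v1-03, applied to the
        # normalized weights
        total = float(sum(weights.values()))
        peers = sorted(weights, key=lambda peer: weights[peer])  # ascending weight
        K = len(peers)
        multipliers = {}
        p_last = 0.0     # P_{k-1}
        x_last = 0.0     # X_{k-1}
        x_product = 1.0  # X_1 * X_2 * ... * X_{k-1}
        for k, peer in enumerate(peers, 1):
            p_k = weights[peer] / total
            kk1 = K - k + 1
            x_k = (kk1 * (p_k - p_last)) / x_product
            x_k += x_last ** kk1
            x_k **= 1.0 / kk1
            multipliers[peer] = x_k
            x_product *= x_k
            x_last = x_k
            p_last = p_k
        return multipliers

    equal = dict(('peer%d' % i, 1) for i in range(10))
    heavier = dict(equal, peer4=2)
    before = carp_load_multipliers(equal)
    after = carp_load_multipliers(heavier)
    for peer in sorted(before):
        print('%s %.4f -> %.4f' % (peer, before[peer], after[peer]))

With ten peers of weight 1 every multiplier is 1.0; raise peer4 to weight 2 and every multiplier changes, because the recurrence works on the normalized weights of all peers. Since the multiplier scales the score of every (request, peer) pair, many request:peer assignments can change winner.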
I think CRUSH and CARP have similar CPU / RAM requirements. CRUSH needs a copy of the peers in a map, stored as an array of int, plus temporary space of roughly the same size: for N peers that is in the range of 2KB + N * 4 * 2 bytes. It also needs a ~4KB static table to calculate an approximation of log2 without using floating point. For each request hashed, CRUSH calculates a score based on the weight of each peer, which requires more CPU than the single multiplication CARP does; on the other hand CRUSH does no floating point calculation at all, which saves CPU.
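
For reference, the score CRUSH computes per (request, peer) is conceptually a straw2 draw. Here is a minimal floating point sketch of that idea (illustrative only: uniform and select_peer are names I made up, and libcrush uses its own hash plus the fixed point log2 table mentioned above instead of math.log):

    import hashlib
    import math

    def uniform(request, peer):
        # hash the (request, peer) pair into a float in (0, 1]
        digest = hashlib.md5(('%s:%s' % (request, peer)).encode()).hexdigest()
        return (int(digest[:8], 16) + 1) / float(2 ** 32)

    def select_peer(request, weights):
        # straw2 idea: the peer with the highest ln(u) / weight wins,
        # which gives each peer a share proportional to its weight
        return max(weights,
                   key=lambda peer: math.log(uniform(request, peer)) / weights[peer])

    before = dict(('peer%d' % i, 1.0) for i in range(10))
    after = dict(before, peer4=2.0)
    moved = sum(1 for r in range(100000)
                if select_peer(r, before) != select_peer(r, after))
    print('moved %.2f%% of the requests' % (100.0 * moved / 100000))

Because the other weights do not change, requests can only move towards peer4, and the moved fraction converges to 2/11 - 1/10, about 8.2%, which is consistent with the 8.11% measured above.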
Cheers
[1] CRUSH simulation http://libcrush.org/dachary/python-crush/commits/wip-squid
[2] CARP simulation http://libcrush.org/dachary/python-crush/blob/e7579b28f4288525750e94fe9bdc4bf985cb5d1f/carp.cc#L371
--
Loïc Dachary, Artisan Logiciel Libre