[squid-dev] [RFC] CRUSH peer selection method
Loic Dachary
loic at dachary.org
Wed Apr 19 12:51:40 UTC 2017
On 04/18/2017 01:00 AM, Loic Dachary wrote:
>
>
> On 04/17/2017 09:10 PM, Loic Dachary wrote:
>>
>>
>> On 04/17/2017 08:34 PM, Alex Rousskov wrote:
>>> On 04/17/2017 10:53 AM, Loic Dachary wrote:
>>>> On 04/17/2017 06:28 PM, Alex Rousskov wrote:
>>>>> On 04/17/2017 09:08 AM, Loic Dachary wrote:
>>>>>
>>>>>> peer 1 targeted by a set of requests (X1)
>>>>>> peer 2 targeted by a set of requests (X2)
>>>>>
>>>>>> and we add a new peer, CRUSH makes it so 1/3 of the requests from peer 1 and 2 move to peer 3
>>>>>
>>>>>> peer 1 targeted by a set of requests (2/3 of X1)
>>>>>> peer 2 targeted by a set of requests (2/3 of X2)
>>>>>> peer 3 targeted by a set of requests (1/3 of X1 + 1/3 of X2)
>>>>>
>>>>> How would the last three lines look for CARP and sourcehash?
>>>
>>>
>>>> I'm not sure :-) But I could write a simulation to figure it out.
>>>> Unless I missed a part of src/carp.cc that mitigates the problem, the
>>>> situation should not be very different from what is described at
>>>> https://docs.openstack.org/developer/swift/ring_background.html#part-2
>>>
>>> It is totally up to you whether to do anything else. FWIW, AFAICT:
>>>
>>> * only math (not simulation) is needed to show that any "peer =
>>> request_hash % n" mapping algorithm (where "n" is the number of peers)
>>> may result in a lot of "moved" request:peer assignments;
>>>
>>> * many folks do not know whether CRUSH, CARP and sourcehash use a "peer
>>> = request_hash % n" mapping algorithm (you should say that explicitly);
>>>
>>> * it would be both interesting and useful to know how the amount of
>>> "movement" changes with "n" in all compared algorithms (e.g., should I
>>> bother with CRUSH if I have 10 peers?);
>>>
>>> * do not assume that we know that a 1/3 movement shown by CRUSH (for
>>> n=2) is a lot less than some X shown by CARP (for the same "n");
>>>
>>> * disclose the price of the reduction of movement (e.g., "a few RAM
>>> bytes per peer to store the more complex mapping and a few CPU cycles to
>>> find the right peer for each request").
>>>
>>> Perhaps your article already addresses all these concerns, but if you
>>> want others to be enthusiastic about CRUSH, it may help to distill its
>>> advantages and costs to something one can grasp by reading a
>>> two-paragraph email with an easy-to-grok table.
>>
>> That's what I should have done to begin with. Explaining it with maths would be complicated, so I'll write some code and run a simulation; that will be easier to understand.
>
> It turns out I misunderstood how CARP works ( https://tools.ietf.org/html/draft-vinod-carp-v1-03 ) and it performs as well as CRUSH.
>
> After implementing something reasonably similar at http://libcrush.org/dachary/python-crush/commit/fced5135562da09eaf597ac7de74712022d48136 and running a simulation adding two new peers, here is what I get.
>
> The rows below show the number of requests moved from the peer named in the row to each peer named in the columns. The requests% at the end of each row shows the percentage of the total number of requests that is moved away from that particular peer. The last row shows the percentage of the total number of requests that is moved to the peer named in the column.
>
> With CRUSH:
>
> crush compare --rule data --replication-count 1 --origin tests/peers-crushmap-before.json --destination tests/peers-crushmap-after.json
>
>             peer9   peera  requests%
> peer0         970    1090      2.06%
> peer1        1012     997      2.01%
> peer2        1025    1012      2.04%
> peer3        1004    1071      2.08%
> peer4         983    1025      2.01%
> peer5        1008    1033      2.04%
> peer6         951     965      1.92%
> peer7        1006    1052      2.06%
> peer8        1001     994      1.99%
> request%    8.96%   9.24%     18.20%
>
> With CARP:
>
> crush compare --carp --origin tests/peers-crushmap-before.json --destination tests/peers-crushmap-after.json
>
>             peer9    peera  requests%
> peer4        6311        0      6.31%
> peer8           0    12340     12.34%
> request%    6.31%   12.34%     18.65%
>
After taking a closer look, I found one use case where CRUSH behaves significantly better than CARP: you have 10 peers (peer0 to peer9), each with weight 1, and you increase the weight of one peer (say peer4) to 2. With CARP ~69% of the requests move from one peer to another (because changing the weight of one peer changes the load_multiplier of all peers). With CRUSH only ~8% of the requests move.
When increasing the weight of peer4 from 1 to 2:
With CRUSH[1]
moved 8.11% of the requests
            peer4  objects%
peer0         896     0.90%
peer1         858     0.86%
peer2         978     0.98%
peer3         913     0.91%
peer5         911     0.91%
peer6         897     0.90%
peer7         910     0.91%
peer8         898     0.90%
peer9         846     0.85%
objects%    8.11%     8.11%
With CARP[2]
moved 69% of the requests
        peer0  peer1  peer2  peer3  peer4  peer5  peer6  peer7  peer8  peer9
peer0       0      0      0      0   1424    562    643    594    629    571
peer1       0      0      0      0   1504    592    548    605    574    584
peer2       0      0      0      0   1449    558    590    663    637    563
peer3       0      0      0      0   1418    607    604    610    591    606
peer4     634    616    566    596      0    773   1178   1202   1179   1182
peer5     619    609    556    586   1956      0    992   1213   1063   1193
peer6     672    637    641    592   2031   1241      0    538   1224   1299
peer7     542    589    603    586   1938   1095   1111      0    938   1182
peer8     595    634    611    592   1999   1222   1173   1210      0    780
peer9     510    496    526    528   2515   1103   1001   1086   1032      0
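
To illustrate why changing a single weight impacts every peer, here is a rough Python sketch of the load factor recurrence described in draft-vinod-carp-v1-03 (illustrative only: carp_load_multipliers is a name I made up, and this is neither the code of src/carp.cc nor the code of the simulation above):

    def carp_load_multipliers(weights):
        # weights: dict peer name -> raw weight
        # returns: dict peer name -> load multiplier, following my reading of
        # the recurrence in draft-vinod-carp-v1-03, applied to the
        # normalized weights
        total = float(sum(weights.values()))
        peers = sorted(weights, key=lambda peer: weights[peer])  # ascending weight
        K = len(peers)
        multipliers = {}
        p_last = 0.0     # P_{k-1}
        x_last = 0.0     # X_{k-1}
        x_product = 1.0  # X_1 * X_2 * ... * X_{k-1}
        for k, peer in enumerate(peers, 1):
            p_k = weights[peer] / total
            kk1 = K - k + 1
            x_k = (kk1 * (p_k - p_last)) / x_product
            x_k += x_last ** kk1
            x_k **= 1.0 / kk1
            multipliers[peer] = x_k
            x_product *= x_k
            x_last = x_k
            p_last = p_k
        return multipliers

    equal = dict(('peer%d' % i, 1) for i in range(10))
    heavier = dict(equal, peer4=2)
    before = carp_load_multipliers(equal)
    after = carp_load_multipliers(heavier)
    for peer in sorted(before):
        print('%s %.4f -> %.4f' % (peer, before[peer], after[peer]))

With ten peers of weight 1 every multiplier is 1.0; raise peer4 to weight 2 and every multiplier changes, because the recurrence works on the normalized weights of all peers. Since the multiplier scales the score of every (request, peer) pair, many request:peer assignments can change winner.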
I think CRUSH and CARP have similar CPU / RAM requirements. CRUSH needs a copy of the peers in a map, stored as an array of int, plus temporary space of roughly the same size: for N peers that is in the range of 2KB + N * 4 * 2 bytes. It also needs a ~4KB static table to calculate an approximation of log2 without using floating point. For each request hashed, CRUSH calculates a score based on the weight of each peer, which requires more CPU than the single multiplication CARP does; on the other hand CRUSH does no floating point calculation at all, which saves CPU.
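
For reference, the score CRUSH computes per (request, peer) is conceptually a straw2 draw. Here is a minimal floating point sketch of that idea (illustrative only: uniform and select_peer are names I made up, and libcrush uses its own hash plus the fixed point log2 table mentioned above instead of math.log):

    import hashlib
    import math

    def uniform(request, peer):
        # hash the (request, peer) pair into a float in (0, 1]
        digest = hashlib.md5(('%s:%s' % (request, peer)).encode()).hexdigest()
        return (int(digest[:8], 16) + 1) / float(2 ** 32)

    def select_peer(request, weights):
        # straw2 idea: the peer with the highest ln(u) / weight wins,
        # which gives each peer a share proportional to its weight
        return max(weights,
                   key=lambda peer: math.log(uniform(request, peer)) / weights[peer])

    before = dict(('peer%d' % i, 1.0) for i in range(10))
    after = dict(before, peer4=2.0)
    moved = sum(1 for r in range(100000)
                if select_peer(r, before) != select_peer(r, after))
    print('moved %.2f%% of the requests' % (100.0 * moved / 100000))

Because the other weights do not change, requests can only move towards peer4, and the moved fraction converges to 2/11 - 1/10, about 8.2%, which is consistent with the 8.11% measured above.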
Cheers
[1] CRUSH simulation http://libcrush.org/dachary/python-crush/commits/wip-squid
[2] CARP simulation http://libcrush.org/dachary/python-crush/blob/e7579b28f4288525750e94fe9bdc4bf985cb5d1f/carp.cc#L371
--
Loïc Dachary, Artisan Logiciel Libre