[squid-dev] [PATCH] improve CharacterSet

Kinkie gkinkie at gmail.com
Sun Dec 27 09:37:04 UTC 2015


On Sun, Dec 27, 2015 at 2:50 AM, Alex Rousskov
<rousskov at measurement-factory.com> wrote:
> On 12/26/2015 09:03 AM, Kinkie wrote:
>> Hi all,
>>   the attached patch improves CharacterSet by adding a subtraction
>> operator, equality test and associated unit tests.
>

Hi Alex, thanks for taking the time.

> Perhaps this is different in C++11, but IIRC we are supposed to:
>
> 1. declare non-modifying operators such as "-" outside the class

That's typical but not mandatory. Changing all to reflect

> 2. declare an inequality operator "!=" when declaring an equality operator

If unequality is not tested.. Still, adding it is trivial, so I'm doing that.

> For #1, you can claim code consistency as an excuse because the "+"
> operator was already misplaced, but since no existing or proposed code
> uses the new operators (AFAIK), it seems like it would be a good idea to
> fix the wrong location before (or while) adding the new [unused] ones.

Done.

>
>
>> +    if (this == &c) // identity test
>> +        return true;
>
> I would remove this check as unneeded unless you are sure that the
> comparison with self is common enough to optimize for. If you keep this,
> please s/identity test/optimization: fast comparison with self/ because
> I do not think the term "identity test" is standard [enough] to be
> recognized by many readers and googling it does not help.
>
>
>> +    auto e=chars_.cend();
>> +    for (auto i = chars_.cbegin(), j = c.chars_.cbegin(); i != e; ++i, ++j) {
>> +        if (*i != *j)
>> +            return false;
>> +    }
>
> Is the existing Storage::operator "==" so bad that we need to hand-roll
> a loop [with a critical hidden assumption] like that? If hand-rolling is
> needed, please at least add a comment that the code assumes that all
> chars_ containers have the same length.

That's ignorance on my side; I now checked that
std::vector::operator== compares elements; replacing whole method with
call to that.
The assumption is not hidden, it's an invariant enforced by all constructors.

>> +    /// remove all characters from the given CharacterSet to this one
>> +    CharacterSet &operator -=(const CharacterSet &src);
>
> Please remove the "to this one" copy-paste typo.
>
> I recommend rephrasing the description to document what happens when the
> character cannot be removed because it is not in the "this" set. For
> example:
>
> /// set subtraction: remove all characters that are also in src
>
>
> "src" is a weird name for the parameter subtraction operator argument.
> You can cite code consistency as an excuse but consider using "other" or
> perhaps something like "banned" instead.

rhs? that's fairly common for operators

>>      /// return a new CharacterSet containing the union of two sets, labeled as the first argument
>>      CharacterSet operator +(const CharacterSet &src) const;
> ...
>> +    /// return a new CharacterSet containing the set of characters in the first but not in the second
>> +    CharacterSet operator -(const CharacterSet &src) const;
>
>
> It is funny that you preserved many bad things but decided to drop
> documentation about the label. Please add it.
>
> Please note that neither the old operator "+" nor the new operator "-"
> have two [explicit] arguments [until they are moved outside the class as
> discussed in #1 above] so the comment using words "first" and "second"
> is rather confusing!

Moved outside class; renamed parameters to lhs and rhs, reworked
doxygen comment.

>> +static
>> +std::ostream& operator<< (std::ostream &s, const CharacterSet &c)
>> +{
>> +    s << "CharacterSet(" << c.name << ')';
>> +    return s;
>> +}
>> +
>
> Perhaps this should be provided in the CharacterSet header, for all to
> use? You would have to split the definition from the declaration to
> avoid dragging iostream into the header, but this operator feels useful
> to me, even in its current "primitive" form.

I've thought about it, and decided not to, as it'd pull in iostream to
all users of CharacterSet.
This is mostly meant as a convenience to allow using
CPPUNIT_ASSERT_EQUAL which is more telling that CPPUNIT_ASSERT but
requires being able to print arguments.


>
>
>> -        for (int j = 0; j < 255; ++j)
>> -            if (j != '0')
>> +        for (int j = 0; j < 255; ++j) {
>> +            if (j != '0') {
>>                  CPPUNIT_ASSERT_EQUAL(false,t[j]);
>> +            } else {
>> +                CPPUNIT_ASSERT_EQUAL(true,t[j]);
>> +            }
>> +        }
>
> If polishing this code is in scope, consider fixing the loop boundary as
> well: There is nothing so special about character with codepoint 255 to
> exclude it AFAICT.

Right.

>> +void
>> +testCharacterSet::CharacterSetSubtract()
>> +{
>> +    CharacterSet sample(NULL, "0123456789aAbBcCdDeEfFz");
>> +    sample -= CharacterSet(NULL, "z");
>> +    CPPUNIT_ASSERT_EQUAL(CharacterSet::HEXDIG, sample);
>> +}
>
>
> It would be good to have an A-B test where B has [some] characters not
> present in A.
>
> Since operator "-" operator is implemented using operator "-=" it may be
> better to test the former (if we have to pick between testing one or the
> other).

Ok.

Updated patch attached; thanks again!


-- 
    Francesco
-------------- next part --------------
A non-text attachment was scrubbed...
Name: characterset-improvements-v2.patch
Type: text/x-patch
Size: 16539 bytes
Desc: not available
URL: <http://lists.squid-cache.org/pipermail/squid-dev/attachments/20151227/439d5326/attachment-0001.bin>


More information about the squid-dev mailing list