Friday, July 24, 2015

Efficient counting with lodash

Counting with lodash is easy. A lot of the time, an array is returned, so I just need to read the length property. If I'm working with a chain, I can simply use the size() function. What I like about size() is that it works with anything — strings, arrays, and objects. However, using length or size() isn't always the most efficient answer.

Consider the following pattern:

_(hugeCollection)
    .filter({ enabled: true })
    .size();

Here we have a rather large collection, and we're using the filter() function to retrieve only the items we need. This builds a new collection, and we call size() on it. Apart from working with many types of values, size() is kind of limiting here. Why should I have to build a new collection I never use? It's the count we're interested in, and nothing more.

Let's see if we can fix this situation with a mixin:

_.mixin({
    count: function(collection, predicate) {
        if (!predicate) {
            return collection.length;   
        }
        
        var callback = _.callback(predicate);

        return _.reduce(collection, function(result, item) {
            return callback(item) ? result + 1 : result;
        }, 0); 
    }
}, { chain: false });

This is a simple count() mixin, which does essentially the same thing as size(), it's how it does it that's different. The idea is to reduce the collection to a count. It does by calling reduce(), and incrementing the count each item the predicate passes. It's actually following the filter pattern, only when a match is found, the count is incremented instead of adding the item to the new collection.

Let's see how this thing works, shall we?

var collection = [ 1, 2, 3 ];
_.count(collection);
// → 3

We're just getting the length of the collection here, since there's no filtering happening. We could just read collection.length instead, and that's actually what our mixin is doing. When there's no predicate provided, it just returns the length of the collection, and avoids iterating over it.

Let's try adding some basic filtering capabilities.

_.count(collection, function(item) {
    return item > 1;
});
// → 2

_.count(collection, function(item) {
    return item < 3;
});
// → 2

Now we're getting somewhere. Despite this being a trivially-small collection, we can see that we didn't have to create a new collection of length 2, only to use the length property. This optimization matters when we're working with large collections, because it avoids allocating large amounts of memory and the subsequent CPU cycles to garbage collect it.

Let's try a different kind of filter.

_.count(collection, _.partial(_.isEqual, 2));

Here we're aiming for some more specific criteria — the item must be exactly 2. But do we really need to make our own callback function for that, or can we use a short-hand?

var collection = [
    { enabled: true },
    { enabled: false },
    { enabled: true },
    { enabled: false }
];

_.count(collection, 'enabled');
// → 2

_.count(collection, { enabled: false });
// → 2

There's two shorthands. The first looks for items with a thruthy enabled property. The second shorthand looks for items where the enabled property is false. The reason these shorthands are possible is because our mixin is using the callback() function to turn the passed-in predicate into a callable function, even when strings or objects are passed.

No comments :

Post a Comment