Friday, August 10, 2012

Python Coroutines and Counters

The collections module has included a Counter type since Python 2.7. I wondered, when I first saw this class, how useful a counter could possibly be. It turns out, very useful. Counters are flexible in what they can count. For instance, you can pass a counter a list of strings, a dictionary whose values represent counts, or a string, in which case each unique character is counted.
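
As a minimal sketch of those three kinds of input (the sample values here are made up for illustration):

```python
from collections import Counter

# A list of strings: each distinct string is counted.
from_list = Counter(['red', 'blue', 'red'])

# A dictionary whose values are already counts.
from_dict = Counter({'red': 2, 'blue': 1})

# A plain string: each unique character is counted.
from_string = Counter('hello')

print(from_list['red'])            # 2
print(from_list == from_dict)      # True
print(from_string.most_common(1))  # [('l', 2)]
```

All three end up as the same kind of object, so anything written against a counter works regardless of how it was filled.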

The documentation on the Counter class has some great examples showing off its capabilities. I thought I would share my experience of using a counter in conjunction with coroutines. Dave Beazley has an excellent introduction to coroutines in Python, including the coroutine decorator. You can do some interesting things with counters and coroutines on their own, but I figured I would try to combine the two.

What I came up with was a coroutine for feeding words into a counter. Another coroutine gets the instance of the counter every time it is updated with new words, and displays the most common word. Quite simple, but it accomplishes a lot given the amount of code.

"""
Simple word counter coroutines using the Counter 
class.
"""

import sys, re, time
from random import choice
from collections import Counter

def coroutine(func):

    """
    David Beazley's decorator to make a function a 
    coroutine.
    """

    def start(*args,**kwargs):

        cr = func(*args,**kwargs)
        # Prime the generator.  The next() builtin works on
        # Python 2.6+ and 3; cr.next() is Python 2 only.
        next(cr)
        return cr

    return start

@coroutine
def word_counter(target):

    """
    The word_counter coroutine takes a target coroutine 
    to send the counter to when it gets updated.  When 
    this coroutine receives data, the counter gets updated 
    with a new list of words.  We then send the target 
    coroutine the counter instance.
    """
    
    counter = Counter()

    while True:

        data = (yield)

        # Raw string so the \w escape reaches the regex engine intact.
        counter.update(re.findall(r'\w+', data))

        target.send(counter)

@coroutine
def most_common():

    """
    The most_common coroutine receives counter instances 
    when they're updated, and prints the most common item, 
    along with its count value.
    """

    while True:

        counter = (yield)

        word, count = counter.most_common(1)[0]

        # The trailing \r returns to the start of the line, so 
        # each update overwrites the previous one.
        sys.stdout.write(
            'Most Common: "%s" (%d)      \r' %
            (word, count)
        )
        sys.stdout.flush()

# Main Demo

if __name__ == '__main__':

    # Static words used to generate text and feed the 
    # word_counter coroutine.
    words = (
        'Ada',
        'Bash',
        'C',
        'Delphi',
        'Erlang',
        'Fortran',
        'Groovy',
        'Haskell',
        'Java',
        'Lisp',
        'Python',
        'Ruby',
        'Smalltalk',
        'Tcl',
        'VisualBasic'
    )

    # Create our word counter, passing in the most_common 
    # coroutine as the target.
    counter = word_counter(
        most_common()
    )

    # Feed the coroutine for a while.  Generate a line of 
    # text, and send it to the counter.  We're sleeping so 
    # we can actually see the output.
    for i in range(20):

        line = ' '.join([choice(words) for _ in range(100)])
        counter.send(line)
        time.sleep(0.5)

The word_counter coroutine creates the Counter instance and updates it as new strings are sent. That is the first part of its responsibility, and on its own it wouldn't serve much purpose. The second part is to notify a target coroutine when the counter changes state. That is, when new text arrives and is fed into the counter, we want to pass the counter down the coroutine pipeline for further processing.
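
To see both responsibilities in isolation, here is a small sketch that swaps the display for a target that just records what it is sent. The record_totals coroutine and the sample sentences are hypothetical, added only for illustration; coroutine and word_counter are restated so the snippet runs by itself.

```python
import re
from collections import Counter

def coroutine(func):
    """Prime a generator so it is ready to receive values."""
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)
        return cr
    return start

@coroutine
def word_counter(target):
    """Update the counter with new words, then notify the target."""
    counter = Counter()
    while True:
        data = (yield)
        counter.update(re.findall(r'\w+', data))
        target.send(counter)

@coroutine
def record_totals(totals):
    """Hypothetical target: append the running word total to a list."""
    while True:
        counter = (yield)
        totals.append(sum(counter.values()))

totals = []
pipeline = word_counter(record_totals(totals))
pipeline.send('Python Ruby Python')
pipeline.send('Lisp Python')
print(totals)  # [3, 5]
```

The target is notified once per send, and the totals grow because the same counter instance accumulates across sends.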

And that is the essence of this example. When we created the word_counter coroutine, we passed it the most_common coroutine as the target. Remember, word_counter will send its target the counter instance. All most_common does is take the counter and print some data about it. In this case, we can use these two coroutines to keep a real-time display of the most common word passed to the counter.

In summary, the Counter class keeps a tally of every word, so the most common one is always available. We can use a coroutine to process this data as new input becomes available. Notice that the word_counter coroutine intentionally passes the counter instance to a generic target. This means that we can easily swap out the most_common coroutine in favour of something else. Alternatively, the most_common coroutine could forward the counter on to another target coroutine for further processing. The choice really depends on the application, but either way the pipeline stays flexible.
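
As a rough sketch of the forwarding idea, a middle stage can reduce the counter to something smaller and pass that along. The top_n and collect coroutines below are hypothetical names invented for this illustration, not part of the original demo; coroutine and word_counter are repeated so the snippet is self-contained.

```python
import re
from collections import Counter

def coroutine(func):
    """Prime a generator so it is ready to receive values."""
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)
        return cr
    return start

@coroutine
def word_counter(target):
    """Update the counter with new words, then notify the target."""
    counter = Counter()
    while True:
        data = (yield)
        counter.update(re.findall(r'\w+', data))
        target.send(counter)

@coroutine
def top_n(n, target):
    """Hypothetical middle stage: forward only the n most common pairs."""
    while True:
        counter = (yield)
        target.send(counter.most_common(n))

@coroutine
def collect(results):
    """Hypothetical final stage: store whatever arrives."""
    while True:
        results.append((yield))

results = []
pipeline = word_counter(top_n(2, collect(results)))
pipeline.send('a b a c a b')
pipeline.send('c c c c')
print(results[0])  # [('a', 3), ('b', 2)]
print(results[1])  # [('c', 5), ('a', 3)]
```

Because word_counter only knows that its target accepts a counter, neither it nor the final stage has to change when a stage is inserted between them.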