Monday, October 26, 2009

Python Named Tuples

Tuples in Python are similar to lists. The main difference, of course, is that tuples are immutable data structures. This means that once a tuple is instantiated, elements cannot be added to or removed from it as they can with list instances. The benefit of using tuples in Python applications is that the interpreter can handle them more efficiently because they are of fixed length.
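To make the immutability point concrete, here is a minimal sketch comparing a list with a tuple. The variable names are made up purely for illustration.

```python
# A list can grow after it has been created.
numbers_list = [1, 2, 3]
numbers_list.append(4)

# A tuple cannot be changed in place; item assignment raises TypeError.
numbers_tuple = (1, 2, 3)
try:
    numbers_tuple[0] = 99
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Tuples also have no append() method at all.
has_append = hasattr(numbers_tuple, "append")

print(numbers_list)
print(assignment_failed)
print(has_append)
```

The tuple still holds its original three values after the failed assignment.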

The collections module provides efficient container structures that expand upon the built-in Python container types such as tuples, lists, and dictionaries. One of the container types offered by the collections module is the named tuple. This functionality is available in Python 2.6 or later. A named tuple is essentially a tuple whose elements can be referenced by field name rather than by integer index, although the index may still be used as well. Below is an example of how to create and use a named tuple.
#Example; Named tuple benchmark.

#Do imports.
from collections import namedtuple
from timeit import Timer

#Create a named tuple type along with fields.
MyTuple=namedtuple("MyTuple", "one two three")

#Instantiate a test named tuple and dictionary.
my_tuple=MyTuple(one=1, two=2, three=3)
my_dict={"one":1, "two":2, "three":3}

#Test function. Read tuple values.
def run_tuple():
    one=my_tuple.one
    two=my_tuple.two
    three=my_tuple.three

#Test function. Read dictionary values.
def run_dict():
    one=my_dict["one"]
    two=my_dict["two"]
    three=my_dict["three"]

#Main.
if __name__=="__main__":

    #Setup timers.
    tuple_timer=Timer("run_tuple()",
                      "from __main__ import run_tuple")
    dict_timer=Timer("run_dict()",
                     "from __main__ import run_dict")

    #Display results. This form of print works on Python 2.6 and Python 3.
    print("TUPLE: %s" % tuple_timer.timeit(10000000))
    print("DICT:  %s" % dict_timer.timeit(10000000))

Here, we create a new named tuple data type, MyTuple, by invoking namedtuple(). namedtuple() is a factory function provided by the collections module: it takes a type name and a set of field names as parameters and assembles a new named tuple class. Next, we create a MyTuple instance by supplying it with the tuple data as keyword arguments.
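As a small aside, the following sketch shows what the factory function hands back. Point is a hypothetical type I made up for illustration; it is not part of the benchmark.

```python
from collections import namedtuple

# Point is a hypothetical named tuple type, used only for illustration.
Point = namedtuple("Point", "x y")
p = Point(x=3, y=4)

# The factory returned a real class, and that class subclasses tuple.
is_tuple_subclass = issubclass(Point, tuple)

# The same element can be read by field name or by integer index.
by_name = p.x
by_index = p[0]

# The field names are kept on the class, in declaration order.
fields = Point._fields

print(is_tuple_subclass)
print(by_name == by_index)
print(fields)
```

Because the result is still a tuple underneath, anything that accepts a plain tuple should accept a Point as well.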

Now we have two instances; my_tuple is a named tuple while my_dict is an ordinary dictionary instance. Next, we have two functions that will read values from our two data structure instances, run_tuple() and run_dict().

When I run this example, run_tuple() takes significantly longer to execute than run_dict() does. So what does this mean? Well, what it means to me is that if you are already using dictionaries to read data in your program, keep using them, especially if the elements are referenced by key.

The power of named tuples comes into play when developers have no choice but to deal with tuples. These tuples may be returned from another developer's code, or from code that can't be changed for whatever reason. Rather than having to stare at a meaningless integer index, developers can use named tuples to add meaning to the code, which can in turn have a huge impact on readability.
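Here is a rough sketch of that situation. The fetch_user() function and the User type are hypothetical stand-ins for code you can't change; the only library feature assumed is the _make() class method that named tuple classes provide for building an instance from an existing sequence.

```python
from collections import namedtuple

# Hypothetical stand-in for a function in somebody else's code
# that returns a plain tuple.
def fetch_user():
    return ("alice", "alice@example.com", 30)

# Hypothetical named tuple type describing what each position means.
User = namedtuple("User", "name email age")

raw = fetch_user()

# Wrap the existing tuple without touching the code that produced it.
user = User._make(raw)

# The same value, read by index and then by field name.
by_index = raw[1]
by_name = user.email

# The named tuple still unpacks like an ordinary tuple.
name, email, age = user

print(by_index == by_name)
print(user == raw)
```

Reading user.email says what the value is, whereas raw[1] leaves the reader to look it up.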