Friday, April 3, 2009

Magic methods of the Django QuerySet

The QuerySet class defined in the Django Python web application framework is used to manage query results returned from the database. From the developer perspective, QuerySet instances provide a high-level interface for dealing with query results. Part of the reason for this is that instances of this class behave very similarly to other Python primitive types such as lists. This is accomplished by defining "magic" methods, or, operator overloading. If an operator for a specific instance is overloaded, it simply means that the default behavior for that operator has been replaced when the operand involves this instance. This means that we can do things such as use two QuerySet instances in an and expression that will invoke custom behavior rather than the default behavior. This useful because it is generally easier and more intuitive to use operators in certain contexts than invoking methods. In the case of using two instances in an and expression, it makes more sense to use the built-in and Python keyword than it does to invoke some and() method on the instance. The Django provides several "magic" methods that do just this and we will discuss some of them below.

Python instances that need to provide custom pickling behavior need to implement the __getstate__() method. The QuerySet class provides an implementation of this method. The Django QuerySet implementation removes any references to self by copying the __dict__ attribute. Here is an example of how this method might get invoked.
import pickle
pickle.dumps(query_set_obj)

The representation of Python instances is provided by the __repr__() method if it is defined. The Django QuerySet implementation will call the builtin list() function on itself. The size of the resulting list is based on the REPR_OUTPUT_SIZE variable which defaults to 20. The result of calling the builtin repr() function on this new list is then returned. Here is an example of how this method might get invoked.
print query_set_obj

The length of the QuerySet instance can be obtained by calling the builtin len() function while using the instance as the parameter. In order for this to work, a __len__() method must be defined by the instance. In the case of Django QuerySet instances, the first thing checked is the length of the result cache. The result cache is simply a Python list that exists in Python memory to try and save on database query costs. However, if the result cache is empty, it is filled by the QuerySet iterator. If the result cache is not empty, it is extended by the QuerySet iterator. This means that the result cache is updated with any results that should be in the result cache but are not at the time of the __len__() invocation. The builtin len() function is then called on the result cache and returned. Here is an example of how this method might get invoked.
print len(query_set_obj)

Python allows user-defined instance to participate in iterations. In order to do so, the class must define an __iter__() method. This method defines the behavior for how individual elements in a set are returned in an iteration. The QuerySet implementation of this method will first check if the result cache exists. If the QuerySet result cache is empty, the iterator defined for the QuerySet instance is returned. The iterator does the actual SQL querying and so in this scenario, the iteration looses out on the performance gained by having cached results. However, if a result cache does exist in the QuerySet instance, the builtin iter() function is called on the cache and this new iterator is returned. Here is an example of how the __iter__() method might be invoked.
for row in query_set_obj:
print row

Python instances can also take part in if statements and invoke custom behavior defined by the __nonzero__() method. If an instance defines this method, it will be invoked if the instance is an operand in a truth test. The Django QuerySet implementation of this method first checks for a result cache, as does most of the other "magic" methods. If the result cache does exist, it will return the result of calling the builtin bool() function on the result cache. If there is no result cache yet, the method will attempt to retrieve the first item by performing an iteration on the QuerySet instance. If the first item cannot be found, false is returned. Here is an example of how the __nonzero__() might be invoked.
if query_set_obj:
print "NOT EMPTY"

Finally, Python instance may be part or and an or Python expressions. The Django QuerySet instance defines both __and__() and __or__() methods. When these methods are invoked, they will change the underlying SQL query used by returning a new QuerySet instance. Here is an example of how both these methods may be used.
print query_set_obj1 and query_set_obj2
print query_set_obj1 or query_set_obj2