Wednesday, March 11, 2009

Interesting bug found in the boduch Python library

In the latest release of the boduch Python library, Set instances can now be iterated over. This is done by defining a custom iterator class, SetIterator, that is returned by the Set.__iter__() method. I thought I would further test out this new functionality in a hope that I would discover some new unit tests that I can include with the library. But before I could even get to the Set iteration testing, I discovered an entirely new bug with the Set class.

Firstly, here is the code I used to find the bug.
#Example; boduch Set bug.

from boduch.data import Set
from boduch.handle import Handle
from boduch.event import subscribe, threaded, EventSetPush

class MyHandle(Handle):
def __init__(self, *args, **kw):
Handle.__init__(self, *args, **kw)

def run(self):
pass

if __name__=="__main__":
threaded(True)
subscribe(EventSetPush, MyHandle)

set_obj1=Set()
set_obj2=Set()

set_obj1.push("data1")
set_obj2.push("data2")

print "SET1",set_obj1.data
print "SET2",set_obj2.data
Here, we defined a custom event handle called MyHandle. I the run method doesn't actually do anything because I discovered the bug before I wrote any handling code. In the main program, we set the event manager to threaded mode. Next, we subscribe our custom event handle to the EventSetPush event. This means that every time Set.push() is invoked, so is MyHandle.run() (in a new thread since we are running in threaded mode here). We then create two set instances and push some data onto each set. Finally, we print the underlying Python lists associated with each set instance.

Here is my initial output.
SET1 ['data1', 'data2']
SET2 ['data1', 'data2']
Slightly different from what was expected. Each set instance should have had one element each. Instead, the lists look identical. Naturally, I assumed that they were the same list. This lead me to start examining the thread manager, thinking that since I was testing in threaded mode, there must be some sort of cross-thread data contamination. Thankfully, the problem got much simpler since I was able to eliminate this as a potential cause. Next in line, the event manager. I tried everything to try and prove that the Set instances were in fact the same instance. Not so. The instances had different memory addresses.

I then realized that Set inherits from Type but the constructor for type is not invoked. Odd. I tried to think of a reason why I would want to inherit something with no static functionality and not initialize it. I think I may have done this because the underlying list instances of Set objects are stored in an attribute called data. Type instances also define a data attribute. I must have thought, during the original implementation, that defining a data attribute for the Set class would have some adverse effect on the Type functionality. Not so. So now, the Type constructor is invoked but with no parameters. This means that the initial value of the Set.data attribute is actually an empty dictionary since this is what the Type constructor will initialize it as. The Set constructor will then initialize the data attribute to a list accordingly.

This however, wasn't the problem either. I was still getting the strange output that pointed so convincingly at the fact that the Set.data attribute was pointing to the same list instance. So, I took a look at the way in which the data attribute is initialized for Set instances. The Set constructor will accept a data keyword parameter. The default value of this parameter is an empty list. This parameter then becomes the Set.data attribute. Just for fun, I decided to take away this parameter and have the data attribute be initialized as an empty list inside the constructor.

Sure enough, that did it. I got the correct output for my two set instances. The data attribute must have been pointing to the same keyword parameter variable. I have a felling that this may be caused somewhere in the event manager. Or maybe not. I haven't tested this scenario outside the library yet.

I'm going to get this release out as soon as possible. The data keyword parameter for the Set constructor will be removed for now. As a side note, this will also affect the iteration functionality for Hash instances in the next release since the Hash.__iter__() method will return a SetIterator instance containing the hash keys. Each key will simply need to be pushed onto the set instead.

1 comment :

  1. If I understood correctly:
    In [66]: def some_func(data =[]):
    data.append(1)
    ....: return data
    In [68]: some_func()
    Out[68]: [1]
    In [69]: some_func()
    Out[69]: [1, 1]
    In [70]: some_func()
    Out[70]: [1, 1, 1]

    mutable types should not be used as default arguments!

    ReplyDelete