Thursday, October 29, 2009

Gaphor Idle Threads

The Gaphor UML modeling tool, written in Python, uses GTK as its user interface library. GTK is a good choice as it is portable across many platforms and has a nice feature set. What is also nice about the GTK library, from a development perspective, is that it is fairly straightforward to handle events that are triggered from the user interface. Such events may be user-generated, such as mouse clicks on widgets. Other events may be generated by the widgets themselves. Either way, adding a handler for any such event is trivial. Using gobject, developers can also add handlers that are executed in between events, such as when the event loop is idle.

In any GTK application, there is a main event loop that must be instantiated within the application code. This is going to be one of the very first actions executed because without it, there will be no responses to GTK events. Every time an event is triggered inside a GTK main event loop, the event instance is placed in a pending event queue. Once the event has been processed, or handled, the event is no longer considered pending and is removed from the queue. This means that the GTK main loop can be viewed, at a high level, as having two distinct states: pending and free. These states are illustrated below.



Here, the initial state represents the instantiation of the GTK main event loop while the final state represents the termination of the main loop. The termination often means that the user has exited the application successfully but could also mean that the application has exited erroneously. Regardless, there are no longer any GTK events that will be processed once the main loop has exited, even if the containing application has not exited.

As illustrated, the GTK main loop has two states while running and two transitions between these states. The GTK main event loop will transition to the pending state when there are one or more pending events. It will transition to the free state when there are zero pending events.

Gaphor defines an idle thread class that makes good use of all this GTK event machinery. The GIdleThread class uses gobject.idle_add() to add a callback to the GTK main event loop. This callback is only executed when there are zero pending events. Actually, it will still execute if there are pending events with a lower priority but that doesn't necessarily concern the concept here. The key concept is that the callbacks created by GIdleThread are only executed when the GTK main loop is idle. The GIdleThread class is illustrated below.



So the nagging question from developers is, why add this abstraction layer on top of the gobject.idle_add() function? Simply put, the GIdleThread class is used to assemble queues when the GTK main loop isn't busy. The obvious benefit is that queues of arbitrary size can be assembled without sacrificing responsiveness to the end user.

An example use of this class is to read and parse data files. The generator function that yields data is passed to the GIdleThread constructor, along with the queue that will eventually contain all the parsed data. This abstraction also provides a thread-like feeling for developers that use it. Although not a real thread, it looks and behaves like one and is ideal for constructing queues.
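The idea of stepping a generator only while the loop is idle can be sketched without GTK at all. The following is a toy model, not Gaphor's GIdleThread: the names IdleQueueBuilder, parse_lines, and run_toy_loop are made up for illustration, and the loop is a plain Python stand-in for the GTK main loop. Like a gobject idle callback, on_idle() returns True to ask to be called again and False when it is finished.

```python
from collections import deque


class IdleQueueBuilder:
    """Steps a generator one item at a time, appending results to a queue."""

    def __init__(self, generator, queue):
        self._generator = generator
        self.queue = queue
        self.done = False

    def on_idle(self):
        """Idle callback: do one unit of work. Return True to be called again."""
        try:
            self.queue.append(next(self._generator))
            return True
        except StopIteration:
            self.done = True
            return False


def parse_lines(lines):
    """Hypothetical parser: yields one parsed record per input line."""
    for line in lines:
        key, value = line.split("=")
        yield key.strip(), int(value)


def run_toy_loop(pending_events, idle_callback):
    """Toy stand-in for a main loop: handle events first, idle work otherwise."""
    log = []
    keep_idle = True
    while pending_events or keep_idle:
        if pending_events:
            # Pending state: events always win over idle work.
            log.append(("event", pending_events.popleft()))
        else:
            # Free state: let the idle callback do one small step.
            keep_idle = idle_callback()
            log.append(("idle", keep_idle))
    return log


parsed = []
builder = IdleQueueBuilder(parse_lines(["a = 1", "b = 2", "c = 3"]), parsed)
events = deque(["click", "keypress"])
log = run_toy_loop(events, builder.on_idle)
print(parsed)
print(log)
```

Because each idle call does only one small step, a real event that arrives between steps would be handled before the next record is parsed, which is where the responsiveness comes from.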

Tuesday, October 27, 2009

Simple Blog Use Cases

How do system designers illustrate the intended use of the system in question? One of the common standards is to draw UML use case diagrams. The core constituents of use case diagrams are actors and use cases. The actors can be anything that is external to the system, but are often human users of the system. The use cases themselves are high-level goals that need to be realized by the system in order to keep the actors satisfied.

The key thing to remember with use cases is that they are purposefully a high-level concept in system modeling. They are there to explain, visually, what the system is to do. Since use cases generally shouldn't delve into much detail about what the system is supposed to do, these diagrams are often useful for showing to non-system designers.

Within UML use case diagrams, use case elements can have relationships with other use case elements. These relationships are of the extend or include variety. Illustrated below is an intentionally simplistic, incomplete blog system use case diagram.



In this example, we have two actors: Blogger and Reader. A lot can be inferred from the actor names alone. One probably doesn't even need to look at the diagram to realize that a Blogger publishes content and that a Reader reads content. Consequently, each actor has a single base use case. The chief use case for a Blogger using the system is to Publish while the chief use case of the Reader is to Read data from the system. The chosen names fit nicely with what is expected from model readers.

We do, however, show a few other use cases that are related to the above mentioned cases. The Edit use case is included by the Publish use case. The reason this relationship is an include relationship is because publishing can't happen without editing content. Here, we have explicitly taken the editing out of the Publish use case. The Publish use case is extended by the Upload use case. The reason this relationship is an extend relationship is because the system doesn't need to support uploading capabilities for all publishing functionality. However, it is shown explicitly here that it is required functionality. Finally, the Read use case is extended by the Subscribe use case because subscribing is reading from the system but isn't a requirement.

This example probably doesn't show even half of the potential use cases of interest for even a simple blog system. But I think that this is a good thing. At least I think less is more when it comes to individual use case diagrams. All use cases may be captured in the model, but I would suggest breaking the use cases into several diagrams, especially when extend and include relationships are involved. I find that when too many of these relationships are placed in a single diagram, it defeats the purpose of illustrating high-level system functionality with simple notation.

Monday, October 26, 2009

Python Named Tuples

Tuples in Python are similar to lists. The main difference, of course, is that tuples are immutable data structures. This means that once a tuple is instantiated, elements cannot be added to or removed from the tuple as they can with list instances. The benefit of using tuples in Python applications is that they can be handled more efficiently by the interpreter simply because they are of fixed length.

The collections module provides efficient container structures that expand upon the primitive Python container types such as tuples, lists, and dictionaries. One of the container types offered by the collections module is the named tuple. This functionality is available in Python 2.6 or later. A named tuple is essentially a tuple which enables elements to be referenced by a field name rather than an integer index, although the index may still be used as well. Below is an example of how to create and use a named tuple.
#Example; Named tuple benchmark.

#Do imports.
from collections import namedtuple
from timeit import Timer

#Create a named tuple type along with fields.
MyTuple=namedtuple("MyTuple", "one two three")

#Instantiate a test named tuple and dictionary.
my_tuple=MyTuple(one=1, two=2, three=3)
my_dict={"one":1, "two":2, "three":3}

#Test function. Read tuple values.
def run_tuple():
    one=my_tuple.one
    two=my_tuple.two
    three=my_tuple.three

#Test function. Read dictionary values.
def run_dict():
    one=my_dict["one"]
    two=my_dict["two"]
    three=my_dict["three"]

#Main.
if __name__=="__main__":

    #Setup timers.
    tuple_timer=Timer("run_tuple()",\
                      "from __main__ import run_tuple")
    dict_timer=Timer("run_dict()",\
                     "from __main__ import run_dict")

    #Display results. The parenthesized form works in Python 2 and 3.
    print("TUPLE: %s"%(tuple_timer.timeit(10000000)))
    print("DICT:  %s"%(dict_timer.timeit(10000000)))
Here, we create a new named tuple data type, MyTuple, by invoking namedtuple(). The namedtuple() is a factory function provided by the collections module. It is a factory function because it takes a set of fields as a parameter and assembles a new named tuple class. Next, we create a MyTuple instance by supplying it the tuple data.

Now we have two instances: my_tuple is a named tuple while my_dict is an ordinary dictionary instance. Next, we have two functions that will read values from our two data structure instances, run_tuple() and run_dict().

When I run this example, run_tuple() takes significantly longer to execute than run_dict() does. So what does this mean? Well, what it means to me is that if you are already using dictionaries to read data in your program, keep using them, especially if the elements are referenced by key.

The power of named tuples comes into play when developers have no choice but to deal with tuples. These tuples may be returned from some other developer's code, or from code that can't be changed for whatever reason. Rather than having to stare at a meaningless integer index, developers can use named tuples to add meaning to code, which can in turn have a huge impact on readability.
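As a small sketch of that situation, the snippet below wraps a bare tuple coming from code we pretend we cannot change. The function legacy_lookup() and the Endpoint type are hypothetical names invented for this example; _make() is the named tuple class method that builds an instance from an existing sequence.

```python
from collections import namedtuple


def legacy_lookup():
    """Hypothetical third-party function that returns a bare tuple."""
    return ("example.org", 8080, 30.0)


# Give the positions of the returned tuple meaningful names.
Endpoint = namedtuple("Endpoint", "host port timeout")

raw = legacy_lookup()
endpoint = Endpoint._make(raw)

# Index access and field access read the same element.
print(raw[1])
print(endpoint.port)

# The named tuple is still a tuple, so unpacking keeps working.
host, port, timeout = endpoint
```

Since the named tuple still behaves like the original tuple, existing code that indexes or unpacks it does not need to change.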

GUI Controller Design

The introduction of graphical user interfaces, or GUIs, has made a huge impact on the way humans interact with computer software. The command line, or terminal, interface is intimidating to many people. You can't exactly do anything intuitively with the command line unless you have several years of experience using it. With the GUI, widgets, the components that make up the screen that is displayed to the user, are designed in such a way that users can infer how to interact with them. For instance, with a button widget in a GUI, it is more often than not obvious that the button should be clicked. In addition to the actions that the user must take in order to interact with the interface, the GUI allows for descriptive text to be easily placed. This helps the user determine why this button should be clicked instead of that one.

On the development side of things, there is no shortage today of GUI libraries available for use. Most of these libraries are available free of charge as open source software. Also very popular these days is the web browser as an application GUI platform. This is simply because most machines have a web browser capable of rendering HTML. It makes sense to take this approach to reach the widest audience possible.

The GUI library of choice, be it Qt or the web browser, is just one layer in the GUI design structure. In fact, it is the lowest level the application developer works with. Anything below the GUI library layer is detail that the application developer doesn't want to deal with. What about the opposite direction in the logical layout of the GUI design structure? The next layer up could potentially be the application controlling layer itself. In many applications, this is in fact how components are layered. But this may not always be ideal. It can be beneficial for design purposes to implement a facade-type abstraction in between the application logic and the various GUI widgets that make up the application GUI. Illustrated below are potential layers that might be used to tie the GUI to the application itself.



Here, the outermost layer is the App Controllers. This is the heart of the application logic; the brain of the program lives here. Next, we have the GUI Controllers. This is another abstraction created by developers for interacting with the GUI library. Finally, at the lowest layer sits the GUI Lib. With this layout, the application logic never interacts directly with the GUI library, which is an ideal design trait. GUI controllers created by the developers of the application offer more flexibility in almost every way imaginable.

Firstly, the application logic doesn't need to concern itself with assembling the GUI. Chances are that a given GUI library isn't going to provide the screens that you want to display to your users. GUI libraries do, however, provide all the widgets required to make for a consistent look and feel in the GUI. It is the responsibility of the GUI controlling layer to assemble these GUI widgets in a coherent manner. Again, the application logic only needs to know that it needs to display something to the user. It asks the GUI controlling layer to carry out this task faithfully. There is also the potential for technology independence. If the application controlling layer is interacting directly with the GUI library, modifying the application to support another GUI library is going to be nearly impossible. If, however, this is the responsibility of the GUI controlling layer, this suddenly becomes feasible. Not only does this help with technological independence, but also with platform portability. Chances are that subtle differences in how the widgets are created and displayed will be necessary across platforms. This should be handled by the GUI controlling layer and not the application layer, which should function as-is on any platform.

Illustrated below is an application controller and a GUI controller interacting. The idea here is to show that the application controllers do not interact directly with the GUI library. In addition, the application controller serves as a communication channel to other lower layers. For instance, here, the page widget data is retrieved from the database by the application controller. The application controller then sends a message to the GUI controller to construct a GUI component. It sends the data retrieved from the database as part of the message.
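The interaction described above might be sketched roughly as follows. Every name here (TextGuiLib, GuiController, AppController, the dictionary standing in for a database) is hypothetical and chosen only to mirror the three layers; a real application would put an actual GUI library behind the GUI controller.

```python
class TextGuiLib:
    """Hypothetical stand-in for a real GUI library."""

    def make_label(self, text):
        return "[label: %s]" % text

    def make_button(self, text):
        return "[button: %s]" % text


class GuiController:
    """Assembles widgets into screens; the only layer that touches the GUI library."""

    def __init__(self, gui_lib):
        self._gui_lib = gui_lib

    def build_page(self, page_data):
        widgets = [self._gui_lib.make_label(page_data["title"])]
        for action in page_data["actions"]:
            widgets.append(self._gui_lib.make_button(action))
        return widgets


class AppController:
    """Application logic: fetches data and asks the GUI controller to show it."""

    def __init__(self, database, gui_controller):
        self._database = database
        self._gui = gui_controller

    def show_page(self, page_id):
        # Retrieve the page data, then pass it along as part of the message.
        page_data = self._database[page_id]
        return self._gui.build_page(page_data)


# A plain dictionary plays the role of the database in this sketch.
database = {"home": {"title": "Welcome", "actions": ["Login", "Register"]}}
app = AppController(database, GuiController(TextGuiLib()))
screen = app.show_page("home")
print(screen)
```

Swapping TextGuiLib for another library would only require changes inside GuiController, which is the technology independence argued for above.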

Friday, October 23, 2009

Python Dictionary Generators

The dictionary type in Python is what is referred to as an associative array in other programming languages. What this means is that rather than looking up values by position, values may be retrieved by a key value. With either type of array, the values are indexed, meaning that each value may be referenced individually within the collection of values. Otherwise, we would have nothing but a collection of values that cannot be referenced in any meaningful way.

The dictionary type in Python offers higher-level functionality than most other associative array types found in other languages. This is done by providing an API on top of the primitive operators that exist for traditional style associative arrays. For instance, developers can retrieve all the keys in a given dictionary instance by invoking the keys() method. The value returned by this method is an array, or list in Python terminology, that may be iterated. The Python dictionary API also offers other iterative functionality.

With the introduction of Python 3, the dictionary API has seen some changes. Namely, the methods that return lists in Python 2 now return view objects, which are meant to be iterated over much as a generator would be. This is quite different from invoking these methods and expecting a list. For one thing, the return value is not directly indexable, because views do not support indexing. The following example shows how the dictionary API behaves in Python 3.
#Example; Python 3 dictionary keys.

#Initialize dictionary object.
my_dict={"one":1, "two":2, "three":3}

#Main.
if __name__=="__main__":

    #Invoke the dictionary API to obtain view objects.
    my_keys=my_dict.keys()
    my_items=my_dict.items()
    my_values=my_dict.values()

    #Display the view objects.
    print("KEYS: %s"%(my_keys))
    print("ITEMS: %s"%(my_items))
    print("VALUES: %s"%(my_values))

    #This would work in Python 2.
    try:
        print(my_keys[0])
    except TypeError:
        print("my_keys does not support indexing...")

    #This would work in Python 2.
    try:
        print(my_items[0])
    except TypeError:
        print("my_items does not support indexing...")

    #This would work in Python 2.
    try:
        print(my_values[0])
    except TypeError:
        print("my_values does not support indexing...")

    #Display the output of iterating each view.
    print("\nIterating keys...")
    for i in my_keys:
        print(i)

    print("\nIterating items...")
    for i in my_items:
        print(i)

    print("\nIterating values...")
    for i in my_values:
        print(i)
In this simple example, we start by creating a simple dictionary instance, my_dict. The idea is that this dictionary be a simple one as we aren't interested in the content. Next, we create three new variables, each of which stores some aspect of the my_dict dictionary. The my_keys variable stores all keys that reference the values in the dictionary. The my_items variable stores the key-value pairs that make up the dictionary, each item being a tuple. The my_values variable stores the actual values stored in my_dict with no regard for which key references them. The important thing to keep in mind here is that these variables derived from the my_dict dictionary were created using the dictionary API.

Up to this point, we have my_dict, the main dictionary, and three variables, my_keys, my_items, and my_values, all created using the dictionary API. Next, we purposefully invoke behavior that isn't supported in Python 3. We do this by acting as if the values returned by the dictionary API are lists when they are in fact view objects. This produces a TypeError each time we try to do it because the views stored in my_keys, my_items, and my_values do not support indexing.

Finally, we simply iterate over each view containing data derived from my_dict. This works just as expected and is in fact the main use of the data returned by the dictionary methods shown here. Sure, direct indexing doesn't work on the returned views, but is that really a common use of this data? I would certainly think not. The key aspect of this API change is that the API now returns a structure that is good at iterative functionality, and that happens to be the intended use. And, if indexing the values that are returned from the dictionary API is absolutely necessary, they can easily be turned into lists. It is just an extra step involved for the rare use of the data.
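That extra step can be sketched as below, assuming Python 3.7 or later so that the dictionary keeps insertion order. The variable names are illustrative only. The snippet also shows one way the object returned by keys() differs from a list: it keeps reflecting later changes to the dictionary, while the list is a snapshot.

```python
my_dict = {"one": 1, "two": 2, "three": 3}

# The object returned by keys() cannot be indexed directly,
# so copy it into a list first.
keys_view = my_dict.keys()
keys_list = list(keys_view)
first_key = keys_list[0]

# Adding a key afterwards changes the view but not the list copy.
my_dict["four"] = 4
print(len(keys_view))
print(len(keys_list))
print(first_key)
```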

Python Transaction Objects

The transactional programming model allows for changes to be made to data while preserving previous changes made to that same data. This allows the data in question to be altered without concerning ourselves with losing critical changes. This can only go so far though, because eventually, these previous transactions must be discarded. Otherwise, disk space suddenly becomes very precious. Some sort of data confirmation method needs to be applied to the data after some number of transactions. If confirmation fails at this point, the data can move backward in time. If the confirmation passes, the previous transaction data is destroyed.

Most database systems are largely transactional. The reason being that the main feature of any given database is to store and manipulate data. Providing transactional support is a huge requirement for systems that use databases. If the database doesn't provide the necessary transaction support, the applications that use the database would need to implement it. Transactional support isn't trivial to implement. Especially the kind of transactional support provided by production-grade databases.

Moving down to the individual transaction level, what data, exactly, does each transaction need to store? Do transactions need to make full copies of the data being operated on in order to restore previous states? This is one really inefficient way to do it. It is inefficient because the transaction data, once accumulated, would grow uncontrollably large. The better way to store transaction data is to store only what is absolutely necessary to revert the current data to a previous state. Once in a previous state, the same principle can be applied to the data to move further back in time still.

Does the transactional model have a place inside application code? Well, maybe on a fractional scale in comparison to database system transactional support. Having simplistic transactional support that fits inside an object-oriented design could potentially be well suited for small edits that need to be made to objects during runtime. In this case, the number of transactions at any given time would be very small and probably wouldn't exist for any significant amount of time. The real benefit here is simplicity. Even if the application you are building does use a database with transactional support, better to leave the heavy transaction lifting to it rather than bother it with smaller edits.

An in-memory transaction class could be of use for this purpose. Sub classes could then inherit from this class in order to become transactional. Below is a simple example of such a class as implemented in Python.
#Example; Python transaction objects.

#Do imports.
from difflib import ndiff, restore

#String type tuple. (types.StringTypes no longer exists in Python 3.)
STRING_TYPES=(str,)

#A transactional class that should be sub-classed.
class Transactional(object):

    #Constructor. Initialize the transaction list.
    def __init__(self):
        self.transactions=[]

    #Start recording a transaction.
    def start(self):
        _attribute={}
        for i in dir(self):
            current_attribute=getattr(self, i)
            if isinstance(current_attribute, STRING_TYPES):
                _attribute[i]=current_attribute
        self.transactions.append(_attribute)

    #Stop recording a transaction. Store a per-character diff.
    def stop(self):
        _tran_index=len(self.transactions)-1
        _tran_current=self.transactions[_tran_index]
        for i in dir(self):
            current_attribute=getattr(self, i)
            if isinstance(current_attribute, STRING_TYPES):
                #Attributes created after start() are diffed against "".
                _tran_current[i]="\n".join(ndiff(_tran_current.get(i, ""), current_attribute))

    #Rollback the last stored transaction.
    def rollback(self):
        _tran_index=len(self.transactions)-1
        _tran_current=self.transactions[_tran_index]
        for i in _tran_current.keys():
            setattr(self, i, "".join(restore(_tran_current[i].splitlines(), 2)))
        self.transactions.pop(_tran_index)

    #Commit all changes.
    def commit(self):
        self.transactions=[]

#Simple class capable of storing transactions.
class Person(Transactional):

    #Constructor. Initialize the Transactional class.
    def __init__(self):
        super(Person, self).__init__()
        self.first_name=""
        self.last_name=""

    def set_first_name(self, first_name):
        self.first_name=first_name

    def set_last_name(self, last_name):
        self.last_name=last_name

#Main.
if __name__=="__main__":

    #Instantiate a person.
    person_obj=Person()

    #Start recording a transaction.
    person_obj.start()

    #Manipulate the object.
    person_obj.set_first_name("John")
    person_obj.set_last_name("Smith")

    #Stop recording the transaction.
    person_obj.stop()

    #Manipulate the object.
    person_obj.set_first_name("jOhN")
    person_obj.set_last_name("sMiTh")

    #Display object data.
    print("FIRST NAME: %s"%(person_obj.first_name))
    print("LAST NAME:  %s"%(person_obj.last_name))

    #Rollback to latest stored transaction.
    person_obj.rollback()

    #Display object data.
    print("FIRST NAME: %s"%(person_obj.first_name))
    print("LAST NAME:  %s"%(person_obj.last_name))
In this example, the Transactional class is responsible for providing the transaction support for subclasses. The basic idea behind this class is that it will provide very basic transaction support for any string attributes of the class. This means that subclasses can define any number of string attributes and each one will be transactional.

There are four basic methods to Transactional: start(), stop(), rollback(), and commit(). The Transactional.start() method will start recording a transaction. This means that any changes made after the method is invoked will be part of the transaction data. The Transactional.stop() method completes the current transaction that is being recorded. It does this by using Python's difflib support to store the changes that have been made to the data. The Transactional.rollback() method restores the string attributes to the most recently stored transaction state. Again, this is done using difflib. Finally, Transactional.commit() simply purges all transaction data.
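The diff support in question lives in the standard difflib module. As a minimal sketch of the round trip on its own, with sample strings chosen only for illustration: ndiff() produces a delta between two sequences, and restore() rebuilds either the first or the second sequence from that delta.

```python
from difflib import ndiff, restore

before = "John"
after = "Jon"

# Passing strings makes ndiff compare them one character at a time.
delta = list(ndiff(before, after))
print(delta)

# which=1 rebuilds the first sequence, which=2 rebuilds the second.
recovered_before = "".join(restore(delta, 1))
recovered_after = "".join(restore(delta, 2))
print(recovered_before, recovered_after)
```

Note that an ndiff delta also carries the unchanged elements, so for short strings like these it is not actually smaller than keeping a full copy.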

Thursday, October 22, 2009

Publishing CSV

I recently came across this Python recipe for serving CSV data from a CherryPy web controller. The recipe itself is fairly straightforward. It is a simple decorator that can be applied to a method which is also a CherryPy controller. The decorator will take the returned list data and transform it into a CSV format before returning the response.

You sometimes have to wonder about the CSV format, since it is quite non-descriptive for humans to read. Fortunately, readability isn't the reason it was created. The CSV format is easy for software to understand without requiring a lot of heavy lifting. This is why it is still widely used. Virtually any tabular data can be represented with it. But what happens when something goes wrong? Typically, if an application complains about some kind of data it is trying to read, a developer of some sort needs to take a look at it. Good luck with trying to diagnose malformed CSV data, especially a large chunk of it.

The fact of the matter is, the CSV format is still a requirement in many contexts. Especially in terms of providing the data as opposed to consuming it. It is most likely going to be easier to provide a CSV format to a client than it is to say the client needs to be altered to support SOAP messages. If that were the case, there would certainly be many upset people using your service, or, nobody using your service at all.

As the recipe shows, transforming primitive data structures into CSV format isn't that difficult. Especially with high level languages that have a nice CSV library like Python does. The HTTP protocol is more than likely going to be the protocol of choice. In this case, the important HTTP headers to set are Content-Type and Content-Length.
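A rough sketch of that transformation, using only the standard csv module, is shown below. This is not the recipe itself and it does not use CherryPy: the decorator name as_csv and the list_posts() function are hypothetical, and the headers are simply returned as a dictionary rather than being set on a real response object.

```python
import csv
import io
from functools import wraps


def as_csv(func):
    """Hypothetical decorator: turn returned rows into a CSV body plus headers."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        rows = func(*args, **kwargs)
        # newline="" leaves the csv module's own line endings untouched.
        buffer = io.StringIO(newline="")
        csv.writer(buffer).writerows(rows)
        body = buffer.getvalue().encode("utf-8")
        headers = {
            "Content-Type": "text/csv; charset=utf-8",
            # Content-Length counts bytes, so measure the encoded body.
            "Content-Length": str(len(body)),
        }
        return headers, body
    return wrapper


@as_csv
def list_posts():
    """Hypothetical controller method returning plain list data."""
    return [["id", "title"], [1, "Hello, world"], [2, "CSV"]]


headers, body = list_posts()
print(headers)
print(body.decode("utf-8"))
```

The writer quotes the field containing a comma on its own, which is the kind of detail that makes a CSV library preferable to joining strings by hand.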

Odds are, the CSV format isn't the format of choice for most web application designs. That is, the developers building APIs aren't going to use CSV. They are probably going to use something more verbose like JSON or some XML variant. This makes the clients that need to interact with this data much easier to write. Also, these formats are much more common. Chances are that CSV support will be an afterthought. This isn't an uncommon change request for any web application project though. No one building a web application can expect the initially chosen format to suffice throughout the lifetime of the application. What this means is that exposing the data to the web is the easy part. It is coming up with a sustainable design that is the challenge.

Many web application frameworks provide support for multiple response formats. And, if CSV isn't one of them, chances are that there is a plugin that will do it.

So, even if the framework doesn't support CSV data transformation functionality, as is the case with CherryPy, the Python CSV functionality will do just fine on its own. That is, the controller can be extended with CSV capabilities as is the case in the recipe. Below is an illustration of a web controller with CSV capabilities.



Here the Controller class inherits CSV capabilities. The DomainObject class, for our purposes here, could be anything that is part of the application domain and needs to be exposed to the web. With this design, as is the case with CSV functionality offered by the web framework of choice, the responsibility for the CSV transformation falls outside of the domain entirely.

Below is an alternate design that gives CSV data transformation capabilities directly to the DomainObject class.



So which design is more realistic? Probably the former simply because the use case for CSV data transformation outside of the web controller context isn't all that common. But does the latter design even make sense? Are we giving too much responsibility to a business logic class? Well, I would argue that it depends on how portable the domain design classes need to be. Sure, with the CSV capabilities being given to a domain class, we are violating a separation of concerns principle, albeit, only slightly. It isn't as though the class itself is being altered to support a specific data format. If this is a valid trade-off in a given context, I would strongly consider keeping data format transformation out of the web controllers.

Wednesday, October 21, 2009

Applying Django Middleware

The Django Python web application framework supports the notion of middleware. What exactly is middleware? In the Django context, middleware are Python packages that do not explicitly belong to any particular Django application. Nor do these middleware packages belong to the Django package although Django does ship with some common middleware. The middleware Python packages that are independently installed sit in the middle.

So what's the point of separately installing Django components if they aren't part of either Django or the applications that use them? Well, since middleware components can be enabled in a Django application's settings, these middleware components may be shared between different applications that use the same Django installation. This keeps the Django core small.

The actual middleware components themselves are composed of classes. Each class must provide at least one of the Django middleware methods: process_request(), process_view(), process_response(), and process_exception().
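A middleware class in this style is just a plain class with some of those methods defined. The sketch below is illustrative only: SeenHeaderMiddleware and toy_handler are made-up names, plain dictionaries stand in for Django's request and response objects, and toy_handler is a simplified stand-in rather than Django's actual handler code. It follows the convention that process_request() returns None to continue and that process_response() must return a response. Newer Django releases structure middleware differently, so treat this as a picture of the hook style described here.

```python
class SeenHeaderMiddleware(object):
    """Hypothetical middleware using two of the hook methods named above."""

    def process_request(self, request):
        # Returning None tells the handler to carry on to the view.
        request["seen_by_middleware"] = True
        return None

    def process_response(self, request, response):
        # A response hook must hand back a response object.
        response["X-Seen"] = str(request.get("seen_by_middleware", False))
        return response


def toy_handler(middleware, request, view):
    """Toy stand-in for a request handler; not Django's BaseHandler."""
    response = None
    for instance in middleware:
        response = instance.process_request(request)
        if response is not None:
            # A middleware answered the request itself; skip the view.
            break
    if response is None:
        response = view(request)
    # Response hooks run in reverse order of the request hooks.
    for instance in reversed(middleware):
        response = instance.process_response(request, response)
    return response


def hello_view(request):
    """Hypothetical view returning a dictionary in place of a response."""
    return {"body": "hello"}


result = toy_handler([SeenHeaderMiddleware()], {}, hello_view)
print(result)
```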

The middleware components that an application wants to use are invoked automatically by the Django framework. All the application needs to do is enable the desired middleware components. The Django framework uses the BaseHandler class and the WSGIHandler class to invoke the middleware behavior. These classes are illustrated below.



Below is an illustration showing how the two classes, BaseHandler and WSGIHandler, collaborate to process all enabled middleware methods.

ArgoUML Proof Of Concept

It has been a while in the making but the ArgoUML modeling tool has made available a development version of the 0.30.0 release. What is interesting here is that this development release, 0.29.1, will provide limited UML 2 support. Up until now there has only been UML 1.4 support in ArgoUML.

The ArgoUML software package has been around for quite some time and is relatively stable. The major roadblock for large scale ArgoUML adoption has been the lack of UML 2 support. The problem is that UML 2 has been the standard for many years now. The newer modeling constructs are gaining in popularity and if a tool doesn't support them, it is difficult to justify using it even if it has a great user interface with many other features.

The development version of ArgoUML that provides limited UML 2 support can be downloaded and used. It should not, however, be used in a production environment as is the case with any development release of any software. The model subsystem to use is specified by command-line arguments. No word on whether the UML 1.4 model will still be supported when ArgoUML 0.30.0 is released.

Tuesday, October 20, 2009

Evaluating Development Processes

I read an interesting entry here about the software development process and it gave me a reminder of how simple, in fact, it can be. I was reminded of some of the common aspects of the software development life cycle and that they are all important for success, to varying degrees. The key thing is that variability.

Analysis, design, implementation, and testing seem to be the common factors in any development process. Realistically, one can't create software without crossing each of these phases at least once. They all must occur, ideally for longer than just briefly. But whatever development process is chosen by a given team, each of these phases is going to need evaluation in terms of time investment.

This is the part that comes even before a given project comes into existence for a software development team. This is where it is important to set a consistent time to be spent on each phase. But this isn't going to happen for the first project that is hammered out by a team. Trying to set an appropriate amount of time for each development phase on a first project is largely pointless, because the first project is going to be trial and error. What is important is that once the team is able to find some time allotment that works, it stays consistent. Consistency makes all stakeholders involved happy when it comes to timing. This includes the customers.

Twisted Application Reactors

At the heart of any Twisted Python application lies a reactor. Reactors, as the name suggests, react to events. Whether the event is local or remote, the reactor is the core concept in Twisted concerned with invoking behavior in response to these events.

The twisted.application.reactors module defines a Reactor class. This class is extended by actual reactor implementations that may be used within Twisted applications. Reactors that are available for installation may be enumerated using the getReactorTypes() function. The use of this function is illustrated below.
#Example; Enumerating reactor types.

#Import the required function.
from twisted.application.reactors import getReactorTypes

#Main.
if __name__=="__main__":

    #Iterate over the reactor types.
    for i in getReactorTypes():

        #Display reactor information.
        print(i.shortName)
        print(i.description + "\n")
The design of the Reactor class is really straightforward. The Reactor is an extensibility mechanism of the Twisted framework. That is, it can be installed and used as necessary. The Reactor class provides the IPlugin and the IReactorInstaller interfaces. The Reactor class and the interfaces it provides are illustrated below.

As you can see, the Reactor class has three simple attributes: shortName, moduleName, and description. The moduleName is really important for installing the reactor. It tells Twisted where the actual code for the reactor can be found.

The getReactorTypes() function is useful for providing choices to the user. This would actually make more sense as a configuration option for advanced users such as administrators.
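
To make the role of moduleName more concrete, here is a minimal stand-in for such a descriptor that uses only the standard library. The ReactorDescriptor class, its load() method, and the single "select" entry are hypothetical names invented for illustration; this is not Twisted's implementation. It only sketches the idea of resolving a dotted module name when a choice is made, for example from a configuration option.

```python
import importlib


class ReactorDescriptor(object):
    """Hypothetical stand-in for a reactor plugin descriptor (not Twisted code)."""

    def __init__(self, shortName, moduleName, description):
        self.shortName = shortName
        self.moduleName = moduleName
        self.description = description

    def load(self):
        # Resolve the dotted module name only when the entry is requested.
        return importlib.import_module(self.moduleName)


# Assumption: the standard library "select" module stands in for a reactor module.
descriptors = [
    ReactorDescriptor("select", "select", "Stand-in entry backed by the select module."),
]

# Index the choices by short name, as a configuration option might.
choices = dict((d.shortName, d) for d in descriptors)
module = choices["select"].load()
print(module.__name__)
```

The short name is what an administrator would type; the module name stays an implementation detail of the descriptor.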

Monday, October 19, 2009

Popular Python Frameworks

The Python programming language is great for building web application frameworks. The two main reasons for this are that it is a very simple language to use and understand and it has fantastic networking libraries. Given these two facts, it is no wonder that there exist dozens of relatively solid web application frameworks written in Python.

The more popular web frameworks are the stable ones that have been around for some years. These frameworks have stood the test of time and have a large feature set.

I found another interesting way to look at which frameworks are popular: the Python package index. I browsed the available packages by framework to see which ones have the most packages. The Zope world is still dominating the Python web application framework market. Django is slowly catching up. At the time of this writing, here are the top frameworks listed by number of available packages.

This list also demonstrates which frameworks are extensible because the easier it is to extend a software package, the more developers are willing to extend it with other packages and release them. What is surprising is the small number of Twisted and Trac packages. Both frameworks are well written and easily extensible. Having said that, the number of packages listed isn't entirely accurate because not every framework package lives in the Python package index. Also, there are most likely some categorization errors to take into account.

jQuery Array Deletion

The jQuery JavaScript toolkit provides several useful utility functions for working with arrays. One wouldn't think that this functionality would be necessary, but in the world of JavaScript it often is because of differing browser implementations. The array utility functionality offered by jQuery includes basic array manipulation and searching.

One such searching function is the jQuery.inArray() function. As the name suggests, the function determines whether a specified element exists in a specified array. This utility is indispensable for JavaScript developers simply because of the amount of code it eliminates.

Searching for elements in a JavaScript array often involves some kind of looping construct. In each iteration, we check if the current element is the desired element. In the case of array element deletion, we do something like this. The example below illustrates how the jQuery.inArray() function complements the primitive splicing functionality of JavaScript arrays.
//Example; jQuery array deletion.

//Make the array.
var my_array=jQuery.makeArray(["A", "B", "C"]);
console.log(my_array);

//Find an element to delete.
var my_pos=jQuery.inArray("B", my_array);
console.log(my_pos);

//Delete the element only if it exists.
my_pos < 0 || my_array.splice(my_pos, 1);
console.log(my_array);
In this example, we use the jQuery.makeArray() function because it returns a true array. It isn't needed but is a good practice regardless. Next, we find the position of the element we want to delete. Finally, if the my_pos value is less than 0, the element wasn't found and nothing happens. Otherwise, we splice the element out of the array. With the help of jQuery, we are able to seek and destroy array elements with two lines of code.

Friday, October 16, 2009

When To Generate Code

Code generation can be a blessing, a complete nightmare, or a combination of both for developers. It is a blessing when the right tools and the right know-how are employed. It is a nightmare when the wrong tool is used and much time has been invested in it. It is both a blessing and a nightmare when all appears to be going well until the maintenance of the code becomes unmanageable.

The whole point of using code generation in the first place is to eliminate the need for developers having to write mundane, boiler-plate code. Another use, although still considered boiler-plate in most circumstances, is generating GUI code. Many GUI builder tools allow for this in many programming languages.

Whether the boiler-plate code was generated by a UML modeling tool or by a GUI builder tool, the generated code should be imported by some other application module. This is necessary in order to promote isolation between the generated code and the human-designed code. The hand-crafted stuff created by a human developer usually doesn't interact well with the generated stuff. It is always going to make more sense to let the developer find a way to make the generated code work with the other application code. The reverse isn't true; the generated code isn't smart enough to work with the developer code.
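
As a rough sketch of that isolation, the example below keeps a data-only class that a tool might emit separate from the hand-written class that extends it. Both class names, and the idea that the first class lives in its own generated module, are assumptions made for illustration; no particular modeling tool or GUI builder is implied.

```python
# --- Pretend this class lives in a generated module, e.g. generated_model.py ---
class GeneratedCustomer(object):
    """Data-only class, as a modeling tool might emit it (hypothetical)."""

    def __init__(self, name="", email=""):
        self.name = name
        self.email = email


# --- Hand-written module: imports the generated class and extends it ---
class Customer(GeneratedCustomer):
    """Human-maintained behavior lives here, never in the generated file."""

    def display_name(self):
        return "%s <%s>" % (self.name, self.email)


customer = Customer(name="Ada", email="ada@example.com")
print(customer.display_name())
```

Regenerating the first class overwrites nothing that a developer wrote by hand, because the behavior only ever lives in the importing module.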

So the use case for code generation is quite obvious: to save development time. User interface code in particular is nearly impossible to maintain by hand due to its level of verbosity. This verbosity is necessary and there really isn't any way around it other than to maintain the user interface graphically with a design tool that generates the code. So always generate GUI code, but always import it.

The classes associated with the problem domain generally store data and don't have much behavior, if any. These classes are good candidates for code generation. The reason is that the lack of behavior is a good thing when it comes to code generation. This is because the code that is being generated is a static artifact and the code it contains should be mostly conceptually static. Attempting to implement behavior inside a model that is then transformed into running code is a bad idea because method signatures aren't trivial to maintain and because behavior generally grows more complex.

Python Super Classes

The Python programming language is considered to be an object-oriented language. This means that not only must it support classes, but it must also support inheritance in one form or another. Inheritance is the principle of object-oriented software development that allows developers to say class A "is a kind of" class B.

Not all object-oriented languages support it, but multiple inheritance is another form of inheritance that allows developers to say class A "is a kind of" class B "and is also a kind of" class C. The Python programming language does support multiple inheritance and can support designs that employ the principle when needed.

Opponents of multiple inheritance say that it is an unnecessary feature of object-oriented languages and, in most cases, they are correct. Not necessarily correct that multiple inheritance shouldn't be a language feature, but correct about the design itself. Like anything else, multiple inheritance can be abused and actually hurt the software. Most of the time, it isn't needed and a simpler design is ideal.

Consider the following class hierarchy. Here we have a Person class that acts as the root of the hierarchy. Next, we have an Adult and a Remote class, both of which inherit directly from Person. Finally, the Student class inherits from both Adult and Remote. The Student class uses multiple inheritance.



This is an example of where multiple inheritance may come in handy. The Remote class represents something that isn't local. This could be a Student or something else in the system. Since it is required that Student inherit from Adult, it makes sense that it also be able to inherit from Remote. A student can be both things.

Below is an example of this class hierarchy defined in Python. The super() function really helps us here because the Student class would otherwise need to invoke the individual constructors of each of its super classes. Not only is this less code, it is also a more generic design. The base class constructors of every Student instance will continue to be invoked correctly, even as these base classes change.
#Example; Using the super() function.

#Root class.
class Person(object):
    def __init__(self):
        super(Person, self).__init__()
        print("Person")

#Adult class. Inherits from Person.
class Adult(Person):
    def __init__(self):
        super(Adult, self).__init__()
        print("Adult")

#Remote class. Inherits from Person.
class Remote(Person):
    def __init__(self):
        super(Remote, self).__init__()
        print("Remote")

#Student class. Inherits from both Adult and Remote.
class Student(Adult, Remote):
    def __init__(self):
        super(Student, self).__init__()
        print("Student")

#Main.
if __name__=="__main__":
    #Create a student.
    student_obj=Student()
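
The order in which super() walks these constructors follows the method resolution order of the class. The sketch below rebuilds the same hierarchy, recording names in a list instead of printing them (the calls list is purely an illustration device), and then inspects Student.__mro__.

```python
calls = []


class Person(object):
    def __init__(self):
        super(Person, self).__init__()
        calls.append("Person")


class Adult(Person):
    def __init__(self):
        super(Adult, self).__init__()
        calls.append("Adult")


class Remote(Person):
    def __init__(self):
        super(Remote, self).__init__()
        calls.append("Remote")


class Student(Adult, Remote):
    def __init__(self):
        super(Student, self).__init__()
        calls.append("Student")


Student()

# The resolution order that each super() call follows.
mro_names = [cls.__name__ for cls in Student.__mro__]
print(mro_names)  # ['Student', 'Adult', 'Remote', 'Person', 'object']

# Each constructor records itself after delegating, so the list is reversed.
print(calls)      # ['Person', 'Remote', 'Adult', 'Student']
```

Note that Person is initialized only once, even though both Adult and Remote inherit from it. Calling each base constructor by hand would have run it twice.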

Thursday, October 15, 2009

Documenting With Sphinx

Whether a developer is writing an application in Python or any other language, API documentation is an absolute must. This should go without saying these days. If a developer asks whether it is a requirement, then this fact hasn't been driven home hard enough. Other developers shouldn't need to sift through a mountain of code just to find a function signature, especially given the nice output that the available generation tools produce.

Sphinx is just such a tool, geared toward generating API documentation output for Python applications. One of the nice things about Sphinx is its own API documentation. There isn't much on the how-to end but every supported language construct is there and it is made clear how to use it.

Another quality of Sphinx is that it is general purpose enough to use with languages other than Python. As long as the concepts are somewhat similar to those found in Python, it can work. And this is helpful for developers if a single documentation generation tool can be used for all output.

Developers do, however, have a big decision to make when considering Sphinx as the documenter of choice. Sphinx requires that the RST documents containing the actual API documentation be maintained. That is, Sphinx does not generate API documentation straight from the source code on its own; pulling in doc-strings requires enabling its autodoc extension and still means maintaining RST files that reference the code. Many other documentation generation tools work directly from the source, but the overall quality of the Sphinx output is far above anything else.

When To Use Ajax

Take a look around the web today. You'll be hard-pressed to find many interfaces that don't incorporate at least some aspect of ajax. The term ajax has grown to mean more than just asynchronous JavaScript API requests, although that is a huge part of it.

Ajax can add a new level of interactivity to a web application user interface. This can be done by making changes to an existing interface or by implementing the principles of ajax straight from the get go. Typically, however, you aren't going to want to introduce asynchronous API calls into an existing JavaScript application that doesn't already use them, especially not if it is stable. There are other aspects of the ajax style that can still be applied in these scenarios.

Many JavaScript toolkits are in existence today and most provide widget support in one form or another. Widgets are really no different from those found in APIs for desktop environments. The only obvious difference is that these widgets are drawn in the browser using JavaScript and CSS. Developers get a few things for free when implementing the widgets found in JavaScript toolkits. First, you get the design for free, which isn't easy to do, especially if themes are a concern. Second, many of these widgets are configurable and can offer the end user subtle behavioral interaction improvements. The overall user interface is made "smoother".

One of the great things about the web is addressability. Opponents of ajax say that ajax applications take this attribute away. And, in most cases, they are right because applications that use asynchronous JavaScript API calls exclusively have no URIs that users can see and manipulate. They can't copy and paste links. Addressability is one of the key reasons why the web is the size that it is today.

There is, however, a middle ground. Just because a web application implements ideas found in ajax doesn't mean that it can't have useful URIs. Even if the set of useful URIs is a small one, it is still useful to have for the user's sake. Obviously, these URIs become less important if the web application is an internal application that doesn't live on the web.

Wednesday, October 14, 2009

Working With Elements

There is virtually no escaping XML markup in one form or another when building modern applications. Whether HTML, SOAP, or some other dialect, XML has become an important standard to support in applications, even if the application isn't a web application.

Thankfully, most programming languages have built-in library support for reading and manipulating XML. Some are better than others. For instance, the ElementTree Python package is probably the easiest library for developers to work with. It doesn't add unnecessary complexity on top of a simplistic standard.

The ElementTree package is now part of the growing set of standard Python modules included in the distribution. Since the module can be used to both read and write XML data, it cuts down on dependencies. Most XML libraries support both reading and writing of XML data but, like any other type of library, a given one may do one well but not the other.
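
As a quick illustration of reading and writing with the same module, here is a small sketch that builds a tree, serializes it, and parses it back. It assumes Python 3, where passing encoding="unicode" to tostring() returns a text string; the element names are arbitrary.

```python
from xml.etree.ElementTree import Element, SubElement, fromstring, tostring

# Write: build a small tree and serialize it to a text string.
root = Element("html")
head = SubElement(root, "head")
title = SubElement(head, "title")
title.text = "My Document"
markup = tostring(root, encoding="unicode")
print(markup)

# Read: parse the markup back and look up the title text by path.
parsed = fromstring(markup)
print(parsed.find("head/title").text)
```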

The following is an example of how easy it is to not only use the ElementTree package to build an XML document, but also to add abstractions around the elements that are created.
#Example; Abstracting elements.

#Do element tree imports.
from xml.etree.ElementTree import Element, tostring

#The base DOM element.
class Dom(object):
    def __init__(self, name, **kw):

        #Create the element and set the attributes.
        self._element=Element(name)
        for i in kw.keys():
            self._element.attrib[i]=kw[i]

    #Set an element attribute.
    def __setitem__(self, name, value):
        self._element.attrib[name]=value

    #Get an attribute.
    def __getitem__(self, name):
        return self._element.attrib[name]

    #Append a sub-element.
    def append(self, value):
        self._element.append(value._element)

#A specialized Dom class for accepting raw text content.
class DomContent(Dom):

    #Constructor.
    def __init__(self, name, content=None, **kw):
        Dom.__init__(self, name, **kw)
        self._element.text=content

#Common HTML elements.
class Head(Dom):
    def __init__(self):
        Dom.__init__(self, "head")

class Title(DomContent):
    def __init__(self, content, **kw):
        DomContent.__init__(self, "title", content, **kw)

class Body(Dom):
    def __init__(self):
        Dom.__init__(self, "body")

class Div(DomContent):
    def __init__(self, content, **kw):
        DomContent.__init__(self, "div", content, **kw)

#The root document.
class Document(Dom):
    def __init__(self, title):
        Dom.__init__(self, "html")

        #Initialize the head, title, and body elements.
        self.head=Head()
        self.title=Title(title)
        self.body=Body()

        #Add the title element to the head element.
        self.head.append(self.title)

        #Add the head and body elements to the document.
        self.append(self.head)
        self.append(self.body)

    #Actual output, as a text string.
    def __str__(self):
        return tostring(self._element, encoding="unicode")

#Main.
if __name__=="__main__":

    #Initialize the document with a title.
    my_doc=Document("My Document")

    #Create a div with content and an attribute.
    my_div=Div("My Div", style="float: left;")

    #Add the div to the body.
    my_doc.body.append(my_div)

    #Display.
    print(my_doc)
In this example, we construct a simple HTML page. The Dom class is the topmost level abstraction that we create around ElementTree. The Dom class is meant to represent any HTML tags that are placed in the HTML page. The Dom._element attribute represents the actual ElementTree element. The Dom constructor will give the element attribute values based on what keyword parameters were passed to the constructor. The Dom.__getitem__() and Dom.__setitem__() methods allow element attributes to be read and set respectively. The Dom.append() method allows other Dom instances to be attached to the current instance as a sub-element.

The DomContent class is a simple specialization of the Dom class. The DomContent class accepts an additional content parameter; otherwise, the class isn't really any different from its base class.

The Head, Title, Body, and Div classes are all standard HTML specializations of the Dom class. The main difference is that Title and Div inherit from DomContent instead of Dom because they support raw text content.

The Document class is a helper type of abstraction. It assembles Dom elements common in all HTML pages we might want to build. It is the Document class that makes the main program trivial to read and understand.

Cloud Clients

The term cloud has many definitions in a computing context these days. Some refer to the various social networks as a cloud, which I think is overly broad. Perhaps the best definition is the simplest: "a group of interconnected nodes". A "cloud" of nodes is a design construct, not a deployment one. The number of nodes and their respective locations should have no impact on the terminology used.

So what are the roles of these nodes that make up clouds? Typically, a node plays the role of a server. Nodes act when they are requested to act. Users of these clouds are actually outside the cloud. The servers within the cloud then act on behalf of these client requests.

So is it possible to have these outside clients join the cloud in order to share some of these computational resources? It certainly is and that is what peer-to-peer computing is all about. In this distributed computing model, the client is the most prevalent role in the entire system. Forget about the managers that allow these clients to discover one another. The managers are necessary, but the clients taking on more than just a simple dummy role is what is interesting. It allows scale within the cloud to spread like disease.

Tuesday, October 13, 2009

Shrinking Python Frameworks

An older entry by Ian Bicking talks about the shrinking world of Python web application frameworks. It is still a valid statement today, I think. There is no shortage of Python web application frameworks to choose from. Quite the contrary, it seems that a new one springs into existence every month. This often happens because a set of developers has a very niche application to develop and the existing web application frameworks don't cut it. Either they are missing a feature or two, or they have too many moving parts, and so the developers make some modifications. Whatever the difference, some developers will release their frameworks as an open source project.

The shrinking aspect refers to the number of frameworks which are a realistic choice for implementing a production grade application. Most of the newer Python web application frameworks, still in their infancy, are simply not stable enough.

Take Pylons and TurboGears for instance. Both are still OK web frameworks, although you can't have TurboGears without Pylons now. However, they are somewhat problematic to implement applications with. Even if they are stable enough, there are complexities that shouldn't exist in a framework. Besides, I have yet to see a stable TurboGears release.

Taking complexity to a new level is Zope. This framework has been around for a long time and is extremely stable. But unless you have been using it for several years, it isn't really worth it because the potential for misuse is so high.

The choice of which Python web application framework to use really comes down to how much programming freedom you want. If you want everything included, Django does everything and is very stable. However, if there are still many unknowns in the application in question, there are many stable frameworks that will simply expose WSGI controllers and allow the developers to do as they will.
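
For a sense of how little such a framework has to provide, here is a minimal sketch of a bare WSGI callable, exercised without a server by using the standard wsgiref helpers. The application function, its greeting, and the /status path are made up for illustration; a real framework would add routing and request objects on top of this.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A bare WSGI callable: the kind of controller a minimal framework exposes."""
    body = ("Hello from %s" % environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Exercise the callable without a server, using a fake request environment.
environ = {}
setup_testing_defaults(environ)
environ["PATH_INFO"] = "/status"
captured = {}


def start_response(status, headers):
    # Record what the application reports instead of writing to a socket.
    captured["status"] = status
    captured["headers"] = headers


response = b"".join(application(environ, start_response))
print(captured["status"], response)
```

Everything beyond this callable is left to the developers, which is exactly the freedom, and the responsibility, that the lighter frameworks hand over.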