Wednesday, September 30, 2009

Python Cheetah Benchmark

With any web application, no matter what language it is written in, templates need to be rendered. These are often HTML templates. Without templates, we would have only static HTML files, a static site as opposed to a web application. The dynamic aspects of the page are filled in by rendering a template. This is the most common approach for several reasons. Chiefly, the user interface is cleanly separated from the application logic. This, in turn, makes it easier for user interface developers and designers to work without disrupting logic they aren't concerned with, and vice versa.

The Cheetah template rendering system, written for Python web applications, is a mature one. Cheetah provides a clean, elegant template syntax when compared to other Python template rendering systems. For instance, the syntax constructs are not represented as tags. This provides both flexibility and separation of concerns. Since tags aren't required, Cheetah can be used for templates other than HTML. Also, it is more obvious with Cheetah what is HTML markup and what is template markup.

Below is a small, incomplete sample of how to use Cheetah programmatically. It shows how easily Cheetah plugs into an existing Python system.

#Example; Timing Cheetah rendering.

#Do imports.
import timeit
from Cheetah.Template import Template

#Rendering test.
def render_cheetah():
  
   #Cheetah template string.
   template_str="""<html>
                        <head>
                            <title>$title</title>
                        </head>
                        <body>$body</body>
                    </html>"""
  
   #The context supplied to the template; variable substitution.
   context={"title":"Cheetah Test", "body":"Cheetah Test"}
  
   #Initialize the Cheetah template object.
   template_obj=Template(template_str, searchList=[context])
  
   #Render the template.
   str(template_obj)
  
#Main.
if __name__=="__main__":
  
   #Run the test and print the results.
   cheetah_timer=timeit.Timer("render_cheetah()",\
                              "from __main__ import render_cheetah")
   print "Cheetah:",cheetah_timer.timeit(number=1000)

Open Source Quality

In an interesting entry over at PC World, they mention a study that shows an overall decrease in defective open source software. Over the past three years, defects in open source software are down. This is great news even if the numbers aren't entirely accurate, because I doubt the study is so flawed that there are actually more defects in open source software today. Every day, new open source projects poof into existence. How can all the complexities of the open source ecosystem be reliably measured? The truth is that they cannot. But some larger open source projects are much more popular than others and have been deployed in production environments. These are the interesting projects to monitor for defects because chances are, when a defect is fixed in one large project, it is also fixed for the countless other projects that depend on it.

What I find interesting is the willingness to study and publish the results of code quality. To most users, even some developers, code quality isn't that high on the requirement list for using the software. They don't see the code. Even most developers only see the API, and, arguably, that is all they should have to see. Code quality does, however, affect more than just maintainability.

This brings up another question. Does improved code quality improve the perceived user experience? Not likely, in most cases. But in some, yes, even if that isn't obvious to the developers fixing a bug that has no apparent effect on usability. Looking at these subtle usability enhancements in the aggregate might be interesting.

Tuesday, September 29, 2009

Instance Factory

A factory in object-oriented programming is a design pattern that creates instances of classes. There are variations on this pattern but for my purposes, I simply refer to it as an instance factory because that is essentially what it is used for. The factory takes the responsibility of directly instantiating a class away from the context that uses the factory. That context may be some function or, more often, some method of another class. The factory itself is generally a class with several static or class methods. It is these methods that construct and return instances.

So if the developer can take the responsibility of directly instantiating some class away from the method they are currently working on, what do they gain? In this case, it isn't what they gain but what they lose. They lose the direct coupling to the class in question. In most cases, a given class is going to need to create more than one type of instance throughout its lifetime. This means that there is a dependency between the class in question, serving as the context, and the other classes that it depends on. If the class in question requires only a factory, it becomes loosely coupled. This is an important design factor.

The instance factory is essentially a proxy for the act of creation. It isn't a proxy for data but for behavior. This is the sole responsibility of the factory. If a developer can see a factory invocation in code, chances are that their guess as to what it does will be correct. Since the instance factory is so specialized, it will in turn help with the distribution of responsibilities wherever it is used.
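
As a rough sketch of the idea, the listing below uses hypothetical CsvReport, HtmlReport, ReportFactory, and Exporter classes (none of these names come from a real library). The Exporter context only knows about the factory, so the concrete report classes can change without touching it.

```python
#Example; A minimal instance factory sketch.

#Hypothetical classes that the factory knows how to create.
class CsvReport(object):
    def render(self):
        return "a,b,c"

class HtmlReport(object):
    def render(self):
        return "<table></table>"

#The factory is the only place that names the concrete classes.
class ReportFactory(object):

    _report_classes = {"csv": CsvReport, "html": HtmlReport}

    #Look up the requested class and return a new instance of it.
    @classmethod
    def create(cls, kind):
        return cls._report_classes[kind]()

#The context.  It is coupled to the factory, not to the reports.
class Exporter(object):

    def export(self, kind):
        report = ReportFactory.create(kind)
        return report.render()

#Main.
if __name__ == "__main__":
    print(Exporter().export("csv"))
```

Adding a new report type under these assumptions means registering one more class with the factory; the Exporter class stays as it is.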

Python Yield Interleaving

Distributed, concurrent applications seem to be the hot topic in computing these days. And they should be. With an ever increasing amount of data to process, both locally and on the web, the need to speed things up is crucial. The best way to achieve this speed is by means of concurrency, doing multiple things at the same time. But true concurrency is hard to achieve, and sometimes impossible, as is the case with a single processor. This, however, does not mean that the responsiveness of applications cannot be improved.

Applications that are logically concurrent can support both true hardware concurrency and interleaving concurrency. Interleaving is a time sharing technique that is used when true concurrency is not possible due to a single processor. If interleaving were not used on single processor machines, trivial tasks would render the system useless due to the response time. If nothing else, the single processor architecture has shown how important interleaving is to responsiveness and the effects of that responsiveness.

Applications written in Python can also be designed to be logically concurrent. This can be achieved both by means of threads and by yielding data. Threads in Python are an interleaving construct due to the global interpreter lock, even on multiple processor systems. Yielding data from generator functions is also a form of interleaving: each time a generator yields an element, control passes to the consumer, and once the consumer has finished with that element, the generator resumes and yields the next one. The example below contrasts this with building and returning a complete list.

#Example; Python yield interleaving.

#Take an iterable, turn it into a list, and return it.
#Even if it is already a list.
def _return(_iterable):
   result=[]
   for i in _iterable:
       print "RETURNING"
       result.append(i)
   return result
      
#Take an individual iterable and yield individual elements.
def _yield(_iterable):
   for i in _iterable:
       print "YIELDING"
       yield i
  
#Main.
if __name__=="__main__":
  
   #Create an iterable.
   _string_list="A B C".split()
  
   #Display the serial results of returning.
   for i in _return(_string_list):
       print i   
  
   #Display the interleaving results of yielding.
   for i in _yield(_string_list):
       print i

Monday, September 28, 2009

User Authentication Design

Almost any system that a user interacts with has some kind of user abstraction. Whether that abstraction is the incoming request for application data or an instantiated class that lives on the server, it nonetheless exists. More often than not, the application needs to know who this user is. It can then make decisions about what, if any, data this user can see or modify. Establishing who the user is, of course, is authentication.

There are many different approaches taken to implement user authentication. Most web application frameworks have a built-in authentication system. If that authentication system is flexible enough, it will allow for an external authentication system to be used. This is often the route taken by commercial applications simply because systems that were designed to authenticate often do it well. There is no need to reinvent the wheel. Another reason for doing this might be performance.

However, most simple applications, often web applications, need only simple authentication. By simple, I mean they don't need a production-ready authentication system that can simultaneously handle millions of users. This isn't necessary and would be a waste of time. In these scenarios, simple HTTP authentication will be enough to allow the application to behave as required.

Even simple authentication needs to be designed. There are many approaches that can be taken to implement the underlying authentication, one of which is a self-authenticating user abstraction. The user abstraction is necessary no matter what and should always be present in any design. The self-authenticating approach means that the authentication activity is performed by the user abstraction itself, with no need to invoke a separate party. The structure of such an abstraction is illustrated below.

Once an application receives a request for authentication to happen, the user abstraction is instantiated. Once instantiated, this abstraction is then passed the necessary data in order to authenticate itself. The result of the authentication is then passed to the controller responsible for instantiating the user. This sequence is illustrated below.

There are obvious benefits and drawbacks to using such an approach. The drawback is that the user abstraction is instantiated regardless of what the authentication outcome is. This is because the authentication can't happen without a user instance. Should a user instance, even if only momentarily, exist if not authenticated? The benefit here is the self containment. There is no need for an external authentication system since the user is able to state to the system whether or not it is who it says it is. Of course, this may not even be a good thing. An authentication system may be a desired design element.
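
The sequence described above might look something like the following sketch. The User class, the login() controller function, and the stored credentials dictionary are all hypothetical names chosen for illustration, and the PBKDF2 iteration count is an arbitrary assumption rather than a recommendation.

```python
#Example; A self-authenticating user sketch.

import hashlib
import hmac
import os

#Hypothetical user abstraction that authenticates itself.
class User(object):

    def __init__(self, name, salt, password_hash):
        self.name = name
        self._salt = salt
        self._password_hash = password_hash
        self.authenticated = False

    #Derive a hash from a password.  The iteration count is
    #illustrative only.
    @staticmethod
    def hash_password(password, salt):
        return hashlib.pbkdf2_hmac(
            "sha256", password.encode("utf-8"), salt, 100000)

    #The user checks the supplied password against its own hash.
    def authenticate(self, password):
        candidate = self.hash_password(password, self._salt)
        self.authenticated = hmac.compare_digest(
            candidate, self._password_hash)
        return self.authenticated

#Hypothetical controller.  The user is instantiated first and
#then asked to authenticate itself.
def login(name, password, stored):
    if name not in stored:
        return None
    salt, password_hash = stored[name]
    user = User(name, salt, password_hash)
    return user if user.authenticate(password) else None

#Main.
if __name__ == "__main__":
    salt = os.urandom(16)
    stored = {"alice": (salt, User.hash_password("s3cret", salt))}
    print(login("alice", "s3cret", stored) is not None)
    print(login("alice", "wrong", stored) is not None)
```

Note that the sketch shows the drawback mentioned above: a User instance exists, briefly, even when the password turns out to be wrong.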

Friday, September 25, 2009

Granular Django Cache

Like many other web application frameworks, Django has a built-in caching system. Unlike other web application frameworks, the Django cache system is relatively straightforward to configure and use. Configuring the cache system can be as simple as specifying where the cached items are stored. With the Django cache system, developers have plenty of options. There is even a dummy cache storage that can be used for development purposes. Whichever back-end cache system you decide to use, it can be specified in the CACHE_BACKEND configuration value.

Once the cache storage location has been set up, caching can be implemented at any number of levels from per-site to low-level. The most effective way to implement Django cache, I find, is to implement it on a per-view basis. Using this method to implement cache means that cached items are created for each URL that is requested if the view mapped to the URL is cached. Using the lower level Django cache constructs is nearly impossible to manage for larger, more complex applications. They do exist, however, for niche situations.

The cache_page() function is responsible for creating a page cache. The function takes a view to be cached and a timeout as parameters. Once the timeout has expired, any cached items are no longer valid. Although the cache_page() function can be used as a decorator on the view declaration, it makes more sense to pass the view as a parameter to cache_page() within the URL configuration. This is the more portable way of doing things and is better aligned logically since the URL serves as the cache key, not the view name.

Developer Progress

Eric Spiegel has posted an entry over at IT management on why developers are let go. I figured I'd chime in and say that most of what he claims isn't entirely true.

You do not need to promote your code by bragging about it or by any other means. Does the code do what it is supposed to do? Great. Move on. The theme of course being to simply get stuff done. That is what developer progress is. Just continuously make progress whether that is fixing bugs or implementing a new feature. Time wasted bragging about stuff does no one any good.

Documenting everything you do is also a very good idea. But do it in a format that is a useful project artifact as opposed to promotional garbage. Just because you document what you have done in a way that is meaningful to others doesn't mean you can't use that same documentation to defend yourself should the need arise. It probably won't if you continue to make progress.

Finally, if you're not making progress by simply getting things done, you are not motivated. If you are motivated, you get things done and make progress. If a developer isn't motivated and is still able to get stuff done, who cares?

Thursday, September 24, 2009

Loose Coupling Decorators

Many programming languages offer the ability to decorate certain constructs such as functions or methods. It is referred to as a decorator because the function or method declaration looks as though it is being decorated; the name is syntactically descriptive. So what exactly is a decorator? In Python, a decorator is essentially a function that takes an existing function or method and returns a transformed version of it. The @ symbol denotes a decorator in Python and is placed above the function or method definition.

So how does this help the developer? Why would they want to take a perfectly normal function definition and put some strange syntax around it? The main purpose is that a decorator can serve as a factory that injects objects into the function from other namespaces. This is useful because it allows some developers to define decorators while others define the functions and methods that use them. This supports loose coupling because the same decorators can be used for many functions. Additionally, a function may be decorated by many decorators.
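
Here is a small sketch of that idea. The SETTINGS dictionary and the inject_settings, shout, and greet names are hypothetical and exist only for this illustration. One decorator injects an object from another namespace, and a second, unrelated decorator is stacked on top of it.

```python
#Example; Injecting objects with decorators.

import functools

#Hypothetical shared object living in some other namespace.
SETTINGS = {"greeting": "Hello"}

#Decorator that injects SETTINGS as the first argument.
def inject_settings(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(SETTINGS, *args, **kwargs)
    return wrapper

#A second, independent decorator that upper-cases the result.
def shout(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper

#The function never looks up SETTINGS itself.  It only
#declares that it wants the settings injected.
@shout
@inject_settings
def greet(settings, name):
    return "%s, %s!" % (settings["greeting"], name)

#Main.
if __name__ == "__main__":
    print(greet("World"))
```

Decorators are applied from the bottom up, so inject_settings wraps greet first and shout wraps the result.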

The effect here is similar to that of inheritance. However, this can sometimes be difficult to achieve with inheritance if a strict hierarchy isn't thought up from the onset, because the polymorphic operations all need to be consistent with one another using that approach.

The inheritance approach to loose-coupling is probably a superior design to using decorators throughout an application. It is cleaner and provides more consistency. However, if this isn't the initial approach taken and there simply isn't enough time to design a resilient class hierarchy, the decorator approach is a good candidate to achieve loose-coupling.

Breaking Release Cycles

There exist today endless software methodologies, each of which allows for differing policies on producing releases. Some methodologies allow for variable length between releases while some dictate a strict cycle length. The methodologies that follow a strict release time-line are most likely using an agile approach to software development. When using this approach, the release dates for the software are among the most important rules to stick to, if not the most important.

When using an agile approach to software development, there should be a fixed amount of time between releases. This may at first seem counterintuitive. How can a development methodology be agile if the time-line is of fixed length? Agile should support change and that is indeed one of the key benefits to using an agile approach. However, not everything can change. There have to be at least some formal rules that allow for some sort of organization. If there is nothing strict in place, the practice is flawed because nothing would ever get released. And it is the continuous releasing of software that provides the invaluable feedback that plays a huge role in the agile methodology. Take a look at some open source projects that have a six month release cycle. Sure, there is feedback from the user community, but users don't see for months how their feedback is reflected, and it is that visible result that, in turn, generates more feedback and so on.

With a shorter release cycle, this feedback is reflected sooner. But what about when the agile approach is used for developing a solution for a paying customer? Does their request for some change or some feature mean that the release cycle length should be changed? In almost every case, I would say no, it doesn't. They are paying you to develop software because you know how to do it and constantly pushing back release dates does not help anyone. The only exception to this rule is when it is a deal breaker. If holding up the release will prevent the loss of money, you simply do not have a choice. Otherwise, there is always a good reason one can give for not including a requested feature. This is especially easy to explain to customers when you always ship predictably and on time.

Wednesday, September 23, 2009

Skulpt

I just came across the Skulpt project, a Python interpreter written entirely in JavaScript. How cool! Although the project is still in its infancy, it does look promising. I, like others I'm sure, have wanted a more Python-like syntax for building highly-dynamic user interfaces. That isn't to say that it isn't possible today; it would simply increase productivity, just as developing an application in Python would probably be ten times as fast as writing it in C.

Even if it proves infeasible to run another language on top of JavaScript, the browser's own language, this project could prove some interesting things about using Python within the browser. Either way, this is all very interesting. There is a Python interpreter now written in both Java and JavaScript.

Using Python Properties

The Python programming language is a dynamically-typed language. What this essentially means is that any variable defined in a program can be assigned a value of any type. This also applies to attributes of classes. There is no need to specify the allowable types that a given attribute may hold. This offers the developer much flexibility when implementing an application. Even more flexibility may be added to a given class implementation by use of properties.

Python has a built-in property type that can be used to build complex attributes. By complex, I mean that the attributes can hold both data and behavior. This functionality can be made useful by imposing constraints on the attributes of a given instance because the behavior associated with the attribute is invoked when a value is set. This doesn't necessarily mean that the behavior is checking for the type of the value. That would defeat the purpose of a dynamically-typed language. It can, however, perform more complex testing such as checking the state of the value or making sure it falls within some range. The invoked behavior can also store and retrieve values from a non-standard location such as a database.

So why go to all this trouble? Why not just implement standard attributes and methods? That depends. The only reason to implement dynamic properties for Python instances is to provide a cleaner API for the client using the instances.

There are actually two ways to implement dynamic Python properties, both of which are illustrated below. The first simply overrides the __getattr__() and __setattr__() methods. __getattr__() is invoked only if the requested attribute does not exist in the standard location as a regular attribute, while __setattr__() is invoked for every attribute assignment. The benefit of this approach is that these are the only methods that need to be implemented for attribute management. This means that any attribute can be set or retrieved on the instances. This can, however, lead to more work for these two methods because they are responsible for everything that might go wrong.

The second method, the property method, provides a better distribution of responsibilities. There is more work involved with the class implementation but it is cleaner work that leads to a better client API.

#Example; Python properties.

#Simple person class.
class Person(object):

  #The data for the instance attributes.  This serves as
  #an example that these attributes can be stored elsewhere.
  data={"first_name":"", "last_name":""}

  #Set an attribute.
  def __setattr__(self, name, value):
      self.data[name]=value
    
  #Get an attribute.
  def __getattr__(self, name):
      return self.data[name]

#Simple person class using properties.
class PropertyPerson(object):

  #The data for the instance attributes.  This serves as
  #an example that these attributes can be stored elsewhere.
  data={"first_name":"", "last_name":""}

  #Set the first name.
  def set_first_name(self, value):
      self.data["first_name"]=value
    
  #Set the last name.
  def set_last_name(self, value):
      self.data["last_name"]=value
    
  #Get the first name.
  def get_first_name(self):
      return self.data["first_name"]

  #Get the last name.
  def get_last_name(self):
      return self.data["last_name"]

  #Create the properties using the previously
  #defined instance methods.
  first_name=property(fset=set_first_name, fget=get_first_name)
  last_name=property(fset=set_last_name, fget=get_last_name)

#Main.
if __name__=="__main__":
  #Create the test person instances.
  person_obj=Person()
  pperson_obj=PropertyPerson()

  #Set some values without using properties.
  person_obj.first_name="FirstName"
  person_obj.last_name="LastName"

  #Set some values using properties.
  pperson_obj.first_name="FirstName"
  pperson_obj.last_name="LastName"

  #Display the attribute values of both instances.
  print "Non-Property: %s %s"%(person_obj.first_name, person_obj.last_name)
  print "\nProperty: %s %s"%(pperson_obj.first_name, pperson_obj.last_name)
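
The range checking mentioned earlier can be sketched in the same way. The Thermostat class and its limits are hypothetical and exist only for illustration. This version uses the decorator form of property, which is equivalent to calling property() with fget and fset as in the listing above.

```python
#Example; A property that validates a range.

#Hypothetical class whose property checks values on assignment.
class Thermostat(object):

    def __init__(self, target=20):
        self._target = None
        #This assignment goes through the property setter.
        self.target = target

    #Get the target temperature.
    @property
    def target(self):
        return self._target

    #Set the target temperature.  The range of the value is
    #checked, not its type.
    @target.setter
    def target(self, value):
        if not 5 <= value <= 30:
            raise ValueError("target must be between 5 and 30")
        self._target = value

#Main.
if __name__ == "__main__":
    thermostat_obj = Thermostat()
    thermostat_obj.target = 25
    print(thermostat_obj.target)
```

To the client, target still looks like a plain attribute; the constraint only shows up when an out-of-range value is assigned.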

Tuesday, September 22, 2009

Software Preservation

Grady Booch, over at the handbook of software architecture, makes a good point about why preserving classic software systems for future generations is important. There is a storyline behind every system. Within this story lies an endless supply of rationale behind tricky technological problems. Of course, the rationale behind doing such and such with some software component would probably be worth something to some developer in the future. The how probably isn't as important, although it might be. Everything should be preserved as best as possible.

It might be difficult to say what might happen if the software of today isn't preserved for future generations. But what harm could be done if every meaningful software system were preserved for the future? There is probably a mountain of historical data that exists today that is of no particular use besides self-interest. At least it has no use yet. The only thing that is certain is that we'll never know if it proved to be worthwhile if we don't do it today.

This got me thinking about software that isn't that old. Maybe a few years old. What if I as a developer worked on it but it was no longer of particular use to anyone else, would it be worth preserving? I think so. Countless times I have remembered something I did to solve a similar problem in an older project. Trac really helps here. I just launched Trac and sure enough, I was able to find what I needed.

Simply HTML

Generating HTML markup in Python applications does not always mean resorting to rendering templates. Using templates to implement the views of an application on the web is a standard practice because it helps to separate the presentation layer from the application logic. That being said, there is nothing to stop a standard Python module from generating markup while only being concerned with the presentation, not the application logic.

The html Python package provides a very simple way to generate HTML programmatically. There is nothing wrong with using this approach as long as the separation of concerns principle is obeyed. This Python package allows developers to generate markup with a single HTML class. This class isn't meant to act as a singleton, but, rather, as a logically-separated string. There is nothing preventing developers from combining HTML instances to provide the final rendering.

These HTML instances have two main features. First, they use simple element methods to construct elements. This adds a nice self-containment to the document while it is under construction. Second, they support the with statement, which provides a context for building child elements as is illustrated below.

#Example; Using the html package.

#Get the HTML class.
from html import HTML

#Main.
if __name__=="__main__":
   #Instantiate an HTML document.
   html_obj=HTML()
  
   #Construct an HTML element and provide a context.
   with html_obj.html():
      
       #Construct a body element with a class, 
       #and provide a context.
       with html_obj.body(klass="my-body"):
          
           #Construct some body elements.
           html_obj.h1("Testing 123", klass="my-header")
           html_obj.p("Hello World!", klass="my-text")
  
   #Display the markup.
   print html_obj

Monday, September 21, 2009

Productive Developers

In an interesting entry over at agile focus, they mention the developer productivity question. They also discuss why defining developer productivity is such a hard thing to do. The question of whether or not productivity can or cannot be accurately measured is still unanswered as far as I'm concerned.

Developers have a goal to reach on any given project that they happen to be working on at any given time. Or, at least they should have a goal. If they don't have an overall idea of what exactly it is that they are building, then there are bigger problems elsewhere and the company need not concern itself with measuring developer productivity. As stated in the entry, if the developer does have a goal to reach, any forward motion made toward that goal can be considered progress. Again, nearly impossible to measure.

This of course raises the question, should developer productivity even be measured at all? Or should management simply be able to answer yes or no to the question of whether or not a developer is productive? That is, making progress vs. not making progress. I think this is a valid approach to measuring developer productivity if measurements are taken relative to the expected skill of the developer.

Tomboy Wiki

Taking notes is an important activity for any professional that wants to get anything done. Whether you are jotting down notes for a blog entry or highlighting some aspect of a software component under development, using the right tool usually helps. The tomboy notes application is a nice little note-taking utility for Gnome. The power of this application lies in its simplicity. It doesn't strive to be the next killer word processor disguised as something it isn't. It is designed for taking notes with a few additional features related to note-taking.

The notes that one may create with tomboy notes aren't limited to plain-text. But, keeping with the philosophy of simplicity, this feature set is minimal. You can bold, italicize, highlight, and create lists. Those are the most basic text-formatting features available and are realistically all you could ever need for taking notes. The focus of note-taking is on content, not form, and tomboy notes clearly understands this.

The wiki aspect of tomboy notes comes from its ability to create links to other notes within the same system. I don't really consider these text links to other notes as part of the text-formatting feature set. It is really part of the wiki feature set, and that set has a length of one. The reason these links have a wiki feel to them is that users have the option to automatically highlight words that are in camel case. Clicking these links will create a new note while retaining the link, just like a wiki.

Sunday, September 20, 2009

HTTP Response Schema

The HTTP protocol is by far the most prominent protocol on the web today. Most people browsing the web aren't even aware of its existence. They just see a strange "http://" string in front of where they want to go. The great part is that most users of the web don't need to know what HTTP is or how it works. The browser and the web server have a firm understanding of how to use the protocol.

Web clients, including the browser, need to understand how to handle HTTP responses. Contained within each HTTP response is a numeric code indicating the status of the request. Although HTTP is a stateless protocol, the client needs to know what happened with each request made to the server. One may consider the response code to be a meta state. The response code can still support statelessness on the server.

Each HTTP response code has a distinct explanation for why the server responded to the request the way it did. The client can also interpret a more general category, or nature, of the response because of the range in which each specific code falls. Here is a brief idea of what the ranges mean:

• 200 - Success.
• 300 - Client action required.
• 400 - Client error.
• 500 - Server error.

In an object-oriented design, these ranges of HTTP response codes can be broadly generalized. This helps to conceptualize how the client can handle several specific response codes in the same way as is illustrated below.
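
One way to sketch this generalization in code is shown here. The class names, the handle() method, and the response_for() helper are hypothetical and are not taken from any HTTP library. Each range from the list above becomes a subclass, so the client handles every code in a range the same way.

```python
#Example; Generalizing HTTP response code ranges.

#Hypothetical base class for every response the client receives.
class HTTPResponse(object):

    def __init__(self, code):
        self.code = code

    def handle(self):
        raise NotImplementedError

#The 200 range.
class Success(HTTPResponse):
    def handle(self):
        return "use the response body"

#The 300 range.
class ClientActionRequired(HTTPResponse):
    def handle(self):
        return "take further action, such as following a redirect"

#The 400 range.
class ClientError(HTTPResponse):
    def handle(self):
        return "fix the request before retrying"

#The 500 range.
class ServerError(HTTPResponse):
    def handle(self):
        return "retry later or report the failure"

#Maps the leading digit of a code to the class for its range.
RANGES = {2: Success, 3: ClientActionRequired,
          4: ClientError, 5: ServerError}

#Return an instance of the class that generalizes the code.
def response_for(code):
    try:
        return RANGES[code // 100](code)
    except KeyError:
        raise ValueError("unsupported response code: %r" % (code,))

#Main.
if __name__ == "__main__":
    print(response_for(404).handle())
```

Under this sketch a 403 and a 404 are both ClientError instances, so the client needs only one code path for the whole 400 range while the specific code remains available as an attribute.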

Saturday, September 19, 2009

Open Source Accounting

An entry over at Linux Magazine discusses the financial side of the Gnome and KDE open source projects and how both organizations might be spending their money. I find this interesting to think about because the financial side of big open source projects is not often considered. I say big open source projects because the smaller projects generally don't have a financial side to them. In fact, many large open source projects do not have financial backing. Does this mean they are of lower quality than the projects that put money where their mouth is? I think not. They are just different. Besides, for most end users of most open source projects, the financial situation of the project is of little to no importance. The software is free.

Another notable aspect of Gnome and KDE specifying where their money comes from is that most people don't care who the big corporate sponsors of the project are. And even for the ones who do care, that insight is unlikely to influence them positively toward the project. Big open source projects should advertise the fact that the smaller contributions of users are what matter. These are more influential in getting other users to make a similar small donation. I think both Gnome and KDE do a good job of this.

Injecting Gaphor Services

The Gaphor UML modeling tool is more than just a simple diagram editor. It is also a powerful application architecture, capable of being used outside of the UML context. What makes it so powerful is that the elements that make up Gaphor are clearly separated into services, components, and adapters. The Gaphor architecture is important in how it glues all these pieces together.

Gaphor is built on top of several services that support the application as a whole. There are many required services that are distributed along with Gaphor. For example, there is an element factory service which is responsible for creating and managing the various UML elements. These services are then loaded during the Gaphor startup process by iterating through entry points and instantiating the service classes. This, in fact, is how other Python packages would define Gaphor services.

These services also work the other way around; existing services can be used by other Python applications instead of defining the service for Gaphor to use. The services can be used by other applications by invoking the Gaphor injection mechanism. There are a few precautions to take when using this mechanism and they are illustrated in the example below.

#Example; Gaphor service injection.

#Do the Gaphor imports.
from gaphor.core import Application, inject
from gaphor.UML import Class

#A class is required in order to store the injected service
#as a class attribute.
class Gaphor(object):
    
    #Inject the element factory service.
    element_factory = inject('element_factory')
    
    #Initialize the application which will,
    #in turn, initialize the services.
    def __init__(self):
        Application.init()

#Main
if __name__=="__main__":
    
    #This will not work.  It serves as an illustration as to
    #why a class attribute is necessary.
    try:
        print(inject('element_factory').create(Class))
    except Exception as error:
        print("Injection outside of a class failed: %s" % error)
    
    #This will work.  First, create and initialize the
    #Gaphor application singleton.  Next, interact with
    #the element factory service.
    app=Gaphor()
    print(app.element_factory.create(Class))
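
Why does the injected service need to live on a class? A minimal sketch, assuming the injection helper behaves like a Python descriptor. The names inject_sketch, _services, FakeElementFactory, App, and Element are hypothetical stand-ins for illustration and are not Gaphor's actual code.

```python
# Hypothetical service registry; a stand-in, not Gaphor's real code.
_services = {}

class FakeElementFactory(object):
    """Toy service that 'creates' elements by instantiating them."""
    def create(self, element_type):
        return element_type()

class inject_sketch(object):
    """Descriptor that looks a service up by name on attribute access."""
    def __init__(self, name):
        self.name = name

    def __get__(self, instance, owner):
        # Only invoked when the descriptor is reached through a class
        # or through one of its instances.
        return _services[self.name]

class App(object):
    # Stored on the class, so attribute access triggers __get__.
    element_factory = inject_sketch("element_factory")

class Element(object):
    pass

# Register the toy service, as application start-up would.
_services["element_factory"] = FakeElementFactory()

# Through the class attribute, the service is resolved.
element = App().element_factory.create(Element)

# A bare descriptor object is never resolved; it has no create() method.
bare_has_create = hasattr(inject_sketch("element_factory"), "create")
```

Under this assumption, the failing call in the example above fails simply because nothing ever asks the descriptor to resolve itself.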

Friday, September 18, 2009

Sketching UML

The UML is a largely graphical modeling language used to communicate ideas in software design. The communication channel may only be between the designer and himself. It is often beneficial to build diagrams for yourself. Even in doing so, you are still communicating the ideas. Today, there exist countless UML diagramming tools in which each diagram is created on the computer screen using a mouse. If enough effort is put into using these software modeling tools, the finished product that is the diagram often looks very visually appealing. Perhaps too much so. This can especially be the case if the diagram created is meant to serve as an aid to an initial idea that may or may not be implemented as illustrated in the diagram.

Since the UML is simply a modeling notation in addition to the underlying semantics, UML can also be sketched using a pencil and paper. Using this medium for UML diagram creation can help to increase the creativity of the design. It does so in a controlled way, since a common notation, the UML, is used. Just because a common modeling notation is used in a sketch of some software system does not mean that it is a finished product. Far from it. All it means is that an idea is being externalized and that there is still room for interpretation in the model. This is exactly what is desired in the early stages of design, even if the implementation has already started.

The main benefit to doing UML sketches for diagrams is that several layers between the brain and the canvas are removed. There is a certain mechanical appeal to putting pencil to paper and I think this helps with the design rather than hindering it. When drawing with UML modeling applications, the act of sketching is done largely by the software. Sometimes imperfect lines and arcs add to the aesthetics of a design.

These sketches are obviously not ideal as a future reference once the system has moved further along in its life cycle. For those types of diagrams, UML diagramming software is ideally suited. The good news is that transitioning a sketch into a digital version isn't too difficult and is even easier when a standard such as the UML is used.

Django Ajax Response

The Django Python web application framework is capable of many types of data serialization. Be it XML or JSON, the built-in Django serialize() function can handle it. That function works on collections of model instances; plain Python dictionaries and lists can be converted with the standard json module instead. The end result is always a string. The string is of course desired because that is what will be passed along inside the HTTP response. It is nice to have this functionality, but what is it used for? Why not just use standard templates with template variables and let the view render it for the client? The main reason is that more and more non-browser clients are being used with web applications. Even if the client is a web browser, there is a good chance that the request is coming from an Ajax application, and these don't always like HTML responses. Most prefer the JSON format.

There is, however, a good chance that more than one format is going to be necessary for the same data set. For instance, I might have a standard Python dictionary that I want to return to the client. Depending on who the client is, that same data is going to be rendered differently. This is so that the client can understand the response. Say, for instance, that the client was a JavaScript application. This client could, potentially, have the ability to selectively handle different response formats that the server returns to it. This responsibility shouldn't really be left to the client application. It would be nice if the Django application could automatically determine if a JavaScript application is requesting the data instead of a standard web request. Thankfully, Django can do this without much developer intervention.

Django HTTP request instances can determine if the request came from an Ajax application. Django does this under the hood by examining the HTTP headers. As is illustrated by the following example, simple scenarios like this one can be implemented with a single view.

#Example; Django Ajax response.

#Import Django components.  The standard json module is used
#because serialize() expects model instances, not dictionaries.
import json
from django.http import HttpResponse
from django.template import Context, Template

#Initialize the test data object.
data_obj={"first_name":"First Name", "last_name":"Last Name"}

#The test view.
def index(request):
    #Check if this is an ajax request.
    if request.is_ajax():

        #Set the result to serialized JSON data.
        result=json.dumps(data_obj)

    else:
        #Create a context object from the test data.
        context_obj=Context(data_obj)
        
        #Create a test template string.
        template_str="""<b>first_name</b>: {{first_name}}<br/>
                        <b>last_name</b>:  {{last_name}}"""
                        
        #Set the result to the rendered HTML.
        result=Template(template_str).render(context_obj)

    #Return the response.
    return HttpResponse(result)
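
The header examination can be sketched without Django at all. This is a minimal illustration, assuming the client sends the conventional X-Requested-With header that many JavaScript libraries add to their requests; the is_ajax and render helpers and the meta dictionary are hypothetical names, not Django's API.

```python
import json

def is_ajax(meta):
    """Return True if the request metadata carries the Ajax marker header."""
    return meta.get("HTTP_X_REQUESTED_WITH") == "XMLHttpRequest"

def render(meta, data):
    """Pick a response format based on who is asking (hypothetical helper)."""
    if is_ajax(meta):
        # Script clients get JSON.
        return json.dumps(data, sort_keys=True)
    # Everyone else gets a small HTML fragment.
    return "<br/>".join("<b>%s</b>: %s" % (key, data[key])
                        for key in sorted(data))

data = {"first_name": "First Name", "last_name": "Last Name"}
ajax_body = render({"HTTP_X_REQUESTED_WITH": "XMLHttpRequest"}, data)
html_body = render({}, data)
```

The same data set ends up in two formats, and the decision is made entirely on the server side.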

Syntax Highlighting

A lot of code these days, in a variety of languages, is posted on the web. I myself like to post small snippets of code every now and then. The pre HTML element was intended for just such a purpose. The content of these elements looks like code. Or, something that is code-like. However, the pre element has a simple drawback; the code is still difficult to read in most cases. That's not to say that placing programming language code in a pre element, as opposed to somewhere mangled within the plain text, is a bad thing. It is simply a matter of interpreting the code faster. Trying to read code that has not been highlighted to reflect the language in which the code was written is a challenge to say the least. With every programming language, there are several key tokens that carry more meaning than others when read by a human. For instance, in most object-oriented languages, the class keyword is going to stand out. Except, with code that is displayed on the web, it doesn't. At least not at first if the code isn't syntax-highlighted.

Spoken languages such as English don't really need the assistance of a syntax highlighter to emphasize various token types because humans either understand it, or they don't. There is quite a difference between what a computer is able to interpret and what a human is able to interpret. Hence the rigidity of programming languages in comparison to spoken languages.

With programming language code that is displayed on the web, there is a simple need for that code to be displayed as if it were being displayed in an editor that supports syntax-highlighting. Even though the code isn't editable, that isn't the reason syntax-highlighting is built into code editors. It exists for reading purposes. It would probably be more productive to copy the code from the web and paste it into an editor that will highlight the syntax than it would be to try to read the code that is not highlighted.

This is where the Pygments Python package comes into play. Pygments is a very feature-rich syntax-highlighting application and API. Trac, which depends heavily on the need to display code on the web, is a perfect use case for Pygments and its API. The application part of Pygments is a simple command line utility which is perfect for highlighting smaller snippets of code.

Thursday, September 17, 2009

Python Components

There are probably an endless number of definitions of what constitutes a Python component. The question I have is what is the correct definition or is there a correct definition for a Python component? It seems to me that some things lean more toward being the preferred form of a Python component while others build on this concept and others still are radically different than the vanilla component.

Of course, figuring out what a component is exactly might be a good start. Using the most general idea of what a component is and what a component is not would help us to translate these properties over to the Python world. I think in the most general sense, a component is any replaceable piece of any software system. That is, a component can be pulled out of some system and replaced with another component that honors the original interfaces. If a new component cannot do this transparently without causing the system to fail, it isn't a component. It may be considered a component once it has this described property, but until then, it isn't.
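
To make the replaceability idea concrete, here is a minimal sketch. The MemoryStore and ListStore classes and the run_system function are hypothetical names invented for illustration; the point is only that the system relies on an interface, not on a particular implementation.

```python
class MemoryStore(object):
    """One component: keeps values in a dictionary."""
    def __init__(self):
        self._data = {}

    def save(self, key, value):
        self._data[key] = value

    def load(self, key):
        return self._data[key]

class ListStore(object):
    """A replacement component with the same save/load interface."""
    def __init__(self):
        self._items = []

    def save(self, key, value):
        self._items.append((key, value))

    def load(self, key):
        # The most recently saved value for the key wins.
        for stored_key, value in reversed(self._items):
            if stored_key == key:
                return value
        raise KeyError(key)

def run_system(store):
    """The 'system' only relies on the save/load interface."""
    store.save("title", "Python Components")
    return store.load("title")

# Swapping one component for the other does not change the outcome.
results = [run_system(MemoryStore()), run_system(ListStore())]
```

Either class can fill the slot because both provide what the system requires, which is the property described above.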

Having described what a component is at the most basic, generic level, how do we decompose Python systems in the same way? We want to take a piece of a given system written in Python, and replace it with another piece. Obviously it needs to conform to the required and provided interfaces of the slot it wishes to fill. But aside from that, what can physically be considered a Python component? At the most fundamental level, most developers would probably consider the module a valid candidate for a Python component. A module, in Python, is basically how source code is organized. Well, it is in fact a source code file that supports the modularity concept, hence the name.

The egg is another candidate for a standard Python component. Eggs are the standard method in which to distribute Python packages. In fact, eggs are Python packages. They typically contain multiple Python modules. So are eggs just another type of Python component but at a higher level than modules are? That is tough to say because eggs can be treated as if they were Python modules once they have been deployed on a given system.

The most compelling feature of using eggs, besides the ease of installation, is the entry points feature. Entry points of Python eggs offer services to other eggs installed on the system. Eggs can advertise these services for free. There is no intervention necessary on the developer's behalf. The entry points provided by eggs are also a good candidate for what can be considered a Python component simply because of the enhanced feature set that they offer.

Python Benchmarks

With new-style Python classes, one can help cut down on the memory cost associated with creating new instances. This is done by using the __slots__ class attribute. When the __slots__ attribute is used in the declaration of new-style classes, the __dict__ attribute is not created with instances of the class. The creation of the __dict__ attribute can be expensive in terms of memory when creating large numbers of instances. So, the question that presents itself is why not use the __slots__ attribute for all classes throughout an application? The simple answer is the lack of flexibility: instances can only hold the attributes for which a slot has been pre-allocated. The other problem with using the __slots__ attribute for every class is the burden involved with maintaining all the slots, in all the classes. These could be in the several hundred range or more. So, this simply isn't feasible.

What is feasible, however, is to define __slots__ attributes for smaller classes with few attributes. Another factor to consider is instantiation density of these classes. That is, the __slots__ attribute is more beneficial with large numbers of instances because of the net memory savings involved. Consider the following example.

#Example; Using __slots__

import timeit

#A simple person class that defines slots.
class SlottedPerson(object):
  __slots__=("first_name", "last_name")

  def __init__(self, first_name="", last_name=""):
      self.first_name=first_name
      self.last_name=last_name
    
#A simple person class without slots.
class Person(object):
  def __init__(self, first_name="", last_name=""):
      self.first_name=first_name
      self.last_name=last_name

#Simple test for the slotted instances.
def time_slotted():
  person_obj=SlottedPerson(first_name="First Name", last_name="Last Name")
  first_name=person_obj.first_name
  last_name=person_obj.last_name

#Simple test for the non-slotted instances.
def time_non_slotted():
  person_obj=Person(first_name="First Name", last_name="Last Name")
  first_name=person_obj.first_name
  last_name=person_obj.last_name

#Main
if __name__=="__main__":
  #Initialize the timers.
  slotted_timer=timeit.Timer("time_slotted()",\
                             "from __main__ import time_slotted")
  non_slotted_timer=timeit.Timer("time_non_slotted()",\
                                 "from __main__ import time_non_slotted")

  #Display the results.
  print("SLOTTED     %s" % slotted_timer.timeit())
  print("NON-SLOTTED %s" % non_slotted_timer.timeit())

In this example, we have two very simple classes. The SlottedPerson and the Person classes are identical except that SlottedPerson declares __slots__. SlottedPerson is expected to come out ahead in this benchmark because the interpreter doesn't need to allocate a __dict__ for each new instance, although the exact difference will vary between Python versions and machines.
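
The timing above says nothing about the flexibility trade-off, so here is a small complementary sketch. The SlottedPoint and Point classes are hypothetical names used only for illustration. It shows that slotted instances carry no __dict__ and refuse attributes that were not declared.

```python
class SlottedPoint(object):
    """Declares its attributes up front; no per-instance __dict__."""
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

class Point(object):
    """Ordinary class; every instance gets its own __dict__."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

slotted = SlottedPoint(1, 2)
plain = Point(1, 2)

# Only the ordinary instance has a __dict__ to store attributes in.
slotted_has_dict = hasattr(slotted, "__dict__")
plain_has_dict = hasattr(plain, "__dict__")

# Adding an undeclared attribute works on the ordinary instance...
plain.z = 3

# ...but is rejected by the slotted one.
try:
    slotted.z = 3
    slotted_accepts_new = True
except AttributeError:
    slotted_accepts_new = False
```

This is the lack of flexibility mentioned earlier: every attribute a slotted class will ever need has to be listed in __slots__.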

Wednesday, September 16, 2009

The Undesigned

In an interesting entry over at Agile Focus, we are given an idea of what the myth of the undesigned is all about. Well, in this context, it is all about undesigned software, of course. What this entry stresses is the fact that the act of software development is nothing but design and I would have to agree here. The main argument is that the philosophy of adding design to already-built software is fundamentally flawed. I would also agree here. This does however raise several questions as to what counts as already designed software (if you don't subscribe to the notion that all software is designed). For instance, in the context of implementation design, it is very hard to add design to the actual code itself.

This, I think, is what the author is stressing. An example of adding design to code might be cleaning up code that was previously sloppy. But is this really design that is being added to the code or simple rearrangement? I suppose some constraints must be imposed on this sort of cleaning up. This would simply be to ensure that code design doesn't take place while "cleaning up".

No design is indeed bad design, but designing nothing but code is also bad design. Implementation is one thing but it is always best to keep the important, platform-independent design out of the code and in a model of some form or another.

Pyro URIs

Pyro is a distributed object framework written entirely in Python. Pyro massively simplifies the task of distributing Python instances across a network. The notion of an object proxy is employed by the Pyro system. An object proxy in Pyro is indistinguishable from a standard Python instance, hence the term proxy. But nodes within the network must be able to create these proxy instances somehow. Additionally, in order to create these proxy instances, the node needs to know about the state of the original, canonical instance. Identifying the original Python instance within a deployed Pyro system is straightforward with URIs. The URI concept is the sole responsibility of the name server in the deployed Pyro system.

The good news is that the object URIs used in the highly distributed Pyro system have a close parallel to the URIs used in a traditional HTTP resource-oriented architecture. The translation of the HTTP URI to a Pyro URI wouldn't be overly difficult to achieve. Both systems share properties of addressability. Since the name server does the work of locating the object within a set of distributed nodes, this would be a good candidate setup for a small scale distributed API. Who knows, maybe something like this could be built in the large scale. I wouldn't count on it until it is proven as functional.
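
As a rough sketch of what such a translation might look like, the function below rewrites an HTTP URI into a name-server style URI. The PYRONAME:// layout used here and the rule of joining path segments with dots are assumptions made for illustration, and http_to_pyro_name is a hypothetical helper, not part of Pyro.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2

def http_to_pyro_name(http_uri):
    """Map an HTTP resource URI to a name-server style URI (assumed format)."""
    parts = urlparse(http_uri)

    # Treat each non-empty path segment as one level of the object name.
    segments = [segment for segment in parts.path.split("/") if segment]
    object_name = ".".join(segments)

    return "PYRONAME://%s/%s" % (parts.hostname, object_name)

pyro_uri = http_to_pyro_name("http://example.com/inventory/widget")
```

Both URIs address the same thing: a host that knows how to find the object, followed by a name for that object.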

Tuesday, September 15, 2009

Themable UML

The Unified Modeling Language, UML, is a modeling notation used for visualizing the design of software systems. Since it is used to visualize the system in question, the UML can be considered to be largely graphical by nature. But the UML specification only provides a base for the notation of each modeling element in addition to the underlying semantics of the language. What the specification doesn't say is anything about the overall look and feel of a finished diagram such as a class or a sequence diagram.

Most UML tools allow users to alter the color of certain aspects of certain model elements, like the fill color or the border color. This color value, for instance, can be set as the default for all new class elements that are placed in the diagram. Features such as this become useful for emphasizing certain modeling elements in a particular diagram. Or to group certain elements. One may argue that the UML provides grouping elements already, such as package elements. The package element, however, is only a single dimension in the organization of a model.

A very useful feature of a UML modeling tool would be a theme selector. This would, of course, offer themable UML. But in the context of the UML, what exactly constitutes a theme? Would it just simply be the feature mentioned above that gives the modeler the ability to change the color of certain elements for emphasizing purposes? I would think not. A themable UML diagram would probably be more along the lines of a color scheme of the various UML modeling elements. In addition to the color scheme, subtle element shape variations could be offered by the theme. The idea behind the theme is that there is no need for the modeler to choose appropriate colors that work. The theme just makes the diagram look good.

This would be a good use case for implementing a UML profile. Since the profile can add visual distinctions to the elements to which stereotypes of the profile are applied, this fits the requirements.

With this feature enabled, some more advanced UML diagram output would be required. For instance, HTML output could be used while the various theme distinctions are defined in CSS. This way, a CSS theme framework, similar to that found in jQuery UI, could be used.

Moving Away From Open Source

This entry over at IT world discusses some of the forces behind users that fully intend on using an open source application and make the jump over to the proprietary software world instead. It seems counter-intuitive that any user would give up something that is free just to pay for it. But as the entry states, there are many reasons users do this. Open source applications tend to have a lack of support, lack of features, and lack of documentation. Oddly enough, there are some open source applications that share the same qualities as their proprietary counterparts. They both have a great feature set, documentation, etc. But like all software, the subtle differences can simply make one application better than the other.

The biggest stumbling block is installation and initial configuration of open source applications as far as I'm concerned. Proprietary installation and configuration procedures are generally more enjoyable. This is the first step to using any application and is important to get right because it gives the user an impression of what the rest of the application experience is going to be like once it actually gets installed, if ever.

Sunday, September 13, 2009

Navigating Trac Tickets

The Trac project management system allows users of the system to perform queries for tickets. In Trac, tickets are entities that help keep track of outstanding tasks, bug fixes, or basically anything that needs to be accomplished for a given software project. Trac provides a default set of useful ticket queries that users can easily execute. The default set of ticket queries allows interested parties to view common groups of tickets, such as those belonging to a particular developer.

When viewing a ticket, Trac displays a set of links in the top right of the screen that allow the user to view the previous ticket or the next ticket. There is also a link to return to the query results. The next ticket and previous ticket links are useful because tickets can be navigated without the need to return to the query. However, the ticket order is based on the ticket number. This presents a problem if the user is only interested in navigating back and forth between tickets in the current query result set. The next ticket may be completely irrelevant to the tickets the user is interested in.

The really useful aspect of the link back to the query results is how the presentation of the query results can change. Tickets that have been edited while you were viewing a specific ticket change appearance in the query results once you return to them. You will also notice that the current user is part of the URL back to the query results.

This feature is incredibly useful in a fast-paced development environment with many developers. It also helps if a small group of developers is collaborating on a small group of related tickets because changes to tickets by other people become immediately apparent.

Saturday, September 12, 2009

Corporate Coding

In an interesting entry over at Coding Horror, we hear about the idea of "happy talk" and the general corporate stench that accompanies it. Happy talk is the type of corporate language employed that masks anything realistic that may actually be going on. So why is this necessary? Obviously not everything that goes on at a company needs to be portrayed literally and it never is. The stakeholders need to be happy. Hence the term. If the company isn't placed in an overall "happy" light, all isn't well and people become upset. The news is, all is not well and never will be. Especially in software development, things just simply do not go as planned.

So what kind of effect does this "happy" corporate culture have on the software development effort of a given organization? Does this bogus lingo spill over to the software development culture? Not always, at least not on the same scale. But sometimes it does.

So take some developer working in a large company for instance. This developer is in charge of developing and maintaining some component in a complex system. Any developer is going to experience the stresses involved with deadlines. The stress arises because they need to make compromises that lead to lower-quality code that is potentially buggy. But if he knows this component looks good on the outside and works "well enough", no harm done, right? He'll just continue collaborating with the other developers as if all is well. This in itself would not be very damaging but what if scenarios like this are continually repeated?

It would be nice if all environments were like some smaller shops and developers could fearlessly admit to flawed code. The company, in turn, would tell the concerned parties something close to what is actually happening. Doing otherwise sets the stage for upset customers down the road when things really go wrong.

Friday, September 11, 2009

Django Content Files

The Django web application framework written in Python defines a file abstraction used for working with files within the file system. Web applications often have to deal with many very different file formats. These aren't just static files that get served to the client using the system, they are also used to parse and retrieve useful file metadata. An example of this useful metadata would be the image dimensions of an image file. There is other useful file metadata that the clients of the system may not necessarily be concerned with although the system may be.

The File class is the base Django file abstraction. Instances of this class extend the concept of file-like objects in regular Python applications. This abstraction comes in handy when iterating through file contents. Django has a certain iteration style used throughout the framework and this abstraction helps maintain it. The File class is also helpful when dealing with images; it is the base class of the ImageFile class.

Code in Django applications can remain consistent with the file abstraction even when using raw content, similar to how StringIO works. The ContentFile class inherits from the File class, and ContentFile instances behave just like File instances do. The main difference, of course, is that the ContentFile only uses raw data instead of data that lives on the file system. This gives Django a huge interface consistency boost when dealing with string data.
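
The StringIO comparison can be made concrete with the standard library alone. This is not Django code; it is a minimal sketch in which count_lines is a hypothetical function that accepts any file-like object and is handed raw string content wrapped in io.StringIO.

```python
import io

def count_lines(file_obj):
    """Works with anything that can be iterated line by line like a file."""
    return sum(1 for _ in file_obj)

# Raw content that never touches the file system.
raw_content = u"first line\nsecond line\nthird line\n"

# Wrapping the string gives it the same interface as an open file.
line_count = count_lines(io.StringIO(raw_content))
```

The calling code does not need to know whether the data came from disk or from memory, which is the same consistency the ContentFile class is described as providing.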

Open Source Economy

An entry over at linux insider talks about the current boom that the open source market is experiencing despite the global recession. Even in difficult financial times, the open source market continues to display positive numbers. Is open source absolutely bullet-proof no matter what the global economic state is? Absolutely not. Large companies are not going to use a product or service simply because it is free. It needs to solve a real world problem and it needs to do it well.

As stated in the entry, the fact that open source software is free isn't the only determinant driving the open source market. Could this recession act as a gateway for those of us who are still timid in the face of open source? Absolutely yes. And it will only perpetuate further as more and more companies become aware of others thriving on open source.

One question that is difficult to answer is will it last? One would like to hope so although such a question would be nearly impossible to answer at this point. If one thing is clear, it is that this recession could be an important point in the open source movement's history.

Thursday, September 10, 2009

Python Libvirt Example

The libvirt virtualization library is a programming API used to manage virtual machines with a variety of hypervisors. There are several language bindings available for the libvirt library including Python. Within a given Python application that uses the libvirt library, the application can potentially control every virtual machine running on the host if used correctly. Libvirt also has the ability to assume control of remote hypervisors.

Virtual machines, or guest domains, have primary disks and potentially secondary disks attached to them. These block devices can even be added to a running virtual machine. But just like a physical host, it helps to know exactly how the virtual block devices for a given virtual machine are being utilized. This way, potential problems may be addressed before they occur. Libvirt provides the ability to retrieve such statistics for these devices. Here is a Python example of how to do this.

#Example; Libvirt block stats.

#We need libvirt and ElementTree.
import libvirt
from xml.etree import ElementTree

#Function to return a list of block devices used.
def get_target_devices(dom):
   #Create a XML tree from the domain XML description.
   tree=ElementTree.fromstring(dom.XMLDesc(0))
  
   #The list of block device names.
   devices=[]
  
   #Iterate through all disk target elements of the domain.
   for target in tree.findall("devices/disk/target"):
       #Get the device name.
       dev=target.get("dev")
      
       #Check if we have already found the device name for this domain.
       if dev not in devices:
           devices.append(dev)
          
   #Completed device name list.
   return devices

if __name__=="__main__":
   #Connect to some hypervisor.
   conn=libvirt.open("qemu:///system")
  
   #Iterate through all available domains.
   for id in conn.listDomainsID():
       #Initialize the domain object.
       dom=conn.lookupByID(id)
      
       #Initialize our block stat counters.
       rreq=0
       rbytes=0
       wreq=0
       wbytes=0
      
       #Iterate through each device name used by this domain.
       for dev in get_target_devices(dom):
           #Retrieve the block stats for this device used by this domain.
           stats=dom.blockStats(dev)
          
           #Update the block stat counters
           rreq+=stats[0]
           rbytes+=stats[1]
           wreq+=stats[2]
           wbytes+=stats[3]
          
       #Display the results for this domain.
       print("\n%s Block Stats"%(dom.UUIDString()))
       print("Read Requests:  %s"%(rreq))
       print("Read Bytes:     %s"%(rbytes))
       print("Write Requests: %s"%(wreq))
       print("Written Bytes:  %s"%(wbytes))
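
The device lookup can be tried without a hypervisor. Below is a minimal sketch that applies the same ElementTree query to a hand-written XML description. SAMPLE_XML is made-up data in the general shape of a domain description, and target_devices_from_xml is a hypothetical variant of the function above that takes the XML string directly.

```python
from xml.etree import ElementTree

# Made-up domain description; a real one comes from dom.XMLDesc(0).
SAMPLE_XML = """
<domain type="kvm">
  <name>demo</name>
  <devices>
    <disk type="file" device="disk">
      <source file="/var/lib/images/demo.img"/>
      <target dev="vda" bus="virtio"/>
    </disk>
    <disk type="file" device="cdrom">
      <target dev="hdc" bus="ide"/>
    </disk>
  </devices>
</domain>
"""

def target_devices_from_xml(xml_desc):
    """Return the unique disk target device names found in the XML."""
    tree = ElementTree.fromstring(xml_desc.strip())
    devices = []
    for target in tree.findall("devices/disk/target"):
        dev = target.get("dev")
        if dev not in devices:
            devices.append(dev)
    return devices

device_names = target_devices_from_xml(SAMPLE_XML)
```

Each name returned this way is what would be passed to blockStats() for the corresponding domain.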
