Tuesday, June 30, 2009

Release Early, Release Often, Break The Interface

Release early, release often. The credo of open source software development in most cases. This didn't become the case simply because open source developers enjoy the task of doing project releases. In fact, this task is often burdensome, especially for projects with many components. This open source development philosophy became so common because it works. The idea behind releasing early and releasing often is to generate early feedback from the user community. This feedback is invaluable to developers and user interface designers alike. But why does it matter if the feedback is early? Does the feedback lose something if it comes much later after the software has been released? I think it certainly does. Although feedback regarding any open source project is valuable at any time during the development life cycle, the earlier this feedback is received, the lesser the likelihood of the concerns being pushed back. That is, if a given open source software project is released one week and users of this software express their concerns the following week, it is likely that these concerns will be addressed in the following release.

This entry raises an interesting usability problem with open source software projects. That is, users of sloppy or disorganized user interfaces get used to them. They simply accept that this is the way the user interface is and that it is unlikely to change. The entry is slightly dated although, unfortunately, the problem persists in many modern-day open source software projects. So the glaringly obvious question is, how is something like this fixed? Aren't there countless usability experts out there who can design beautiful user interfaces? Surely some of these experts are willing to work on some of the open source projects that are lacking in the usability arena? The answer would be yes, there exists extraordinary usability talent in the open source community, but as the entry suggests, open source projects face different pressures than their proprietary counterparts.

So what do these projects do with the user interfaces that lack in usability? They leave it up to the end user to figure out how they want the user interface to look by adding an endless stream of user preferences. These preferences allow the users to turn off undesired behavior. I say this is an endless stream because once one user interface preference is added, it is hard not to add more preferences of other user interface components. Additionally, this leads to a recursive user interface design problem because of the need to maintain these preferences. Again, as the entry suggests, this can become quite the mess to maintain because users of the software can quite easily grow accustomed to these "patch" type settings.

So what is the solution to this problem? Unfortunately, I do not have a solid answer as I'm no usability expert. However, reducing the number of moving parts on any user interface is always a good idea. I find the "less is more" principle invaluable in regard to user interface design. When user interface preferences have been put in place for the end user to customize the "look and feel", they are most likely used to hide components that clutter the user interface. Developers shouldn't take the decision to add a new preference or configuration value lightly. Once added, new configuration options should be there to stay. When adding a new preference that allows end users to hide a given user interface component, think long and hard about the necessity of the component in question.

Monday, June 29, 2009

Python CPUs

When building Python applications with any level of concurrency, it is often useful to know how many CPUs are available for instruction processing within the system. It is useful for applications to have this information because decisions of how to best utilize the system for concurrency can be made at runtime using this CPU information.

In Python, a common way to retrieve the number of CPUs on a given system is to read it from system configuration values, as shown in this entry. Here is the function from the entry, lightly updated so that it also runs on current Python versions:

import os

def detectCPUs():
    """
    Detects the number of CPUs on a system. Cribbed from pp.
    """
    # Linux, Unix and MacOS:
    if hasattr(os, "sysconf"):
        if "SC_NPROCESSORS_ONLN" in os.sysconf_names:
            # Linux & Unix:
            ncpus = os.sysconf("SC_NPROCESSORS_ONLN")
            if isinstance(ncpus, int) and ncpus > 0:
                return ncpus
        else:  # OSX:
            return int(os.popen("sysctl -n hw.ncpu").read())
    # Windows:
    if "NUMBER_OF_PROCESSORS" in os.environ:
        ncpus = int(os.environ["NUMBER_OF_PROCESSORS"])
        if ncpus > 0:
            return ncpus
    return 1  # Default

Using this detectCPUs() function, developers can retrieve the number of CPUs available on any major platform. This is done by checking what is available in the system configuration and using that to determine if and where the number of CPUs is stored.

There is one very basic problem with this approach: it introduces new code at the application level that is already implemented at the Python language level in the multiprocessing module. There is a much simpler and more elegant method of retrieving the number of CPUs available on the system.

import multiprocessing

cpus = multiprocessing.cpu_count()

This method of retrieving the CPU count is superior as far as the separation of concerns principle is concerned. The multiprocessing module is concerned with CPU matters such as how many of them exist. Your application is also concerned with this information, obviously, otherwise it wouldn't be using multiprocessing to begin with. However, it is only the return value your application is concerned with, not the implementation of how the number of CPUs is retrieved. Since multiprocessing was only introduced in Python 2.6, a backport exists on PyPI for older versions of Python. This simply means that your application setup needs to depend on this package in order to support older Python versions.

A use case for the number of CPUs available on a system within the context of a Python application would be maximizing concurrency efficiency. The multiprocessing and threading modules share the same interfaces for most abstractions. This means that the application could make a decision at runtime on which module is best suited for the job based on the number of available CPUs. If there is a single CPU on the system in question, the threading module might be better suited. If there are multiple processors available, then multiprocessing might be better suited.
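The runtime decision described above could be sketched roughly as follows. This is only a minimal illustration; the names pick_worker_class and run_workers are made up for this example, and the "more than one CPU" threshold is an assumption rather than a rule.

```python
import multiprocessing
import threading


def pick_worker_class(cpu_count=None):
    """Return threading.Thread on one CPU, multiprocessing.Process otherwise."""
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()
    return multiprocessing.Process if cpu_count > 1 else threading.Thread


def run_workers(target, count, worker_class):
    """Start `count` workers of the chosen class and wait for all of them."""
    # Thread and Process both accept target= and expose start()/join().
    workers = [worker_class(target=target) for _ in range(count)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return len(workers)
```

Because both classes share the same constructor and the same start() and join() methods, run_workers() does not need to know which one it was handed.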

Monday, June 22, 2009

Turning Off Desktop Innovation

An interesting entry brings up the always controversial discussion of innovation in the open source desktop domain. I'm not entirely convinced that this topic should be nearly as controversial as it seems to be. And who knows, maybe it isn't. Putting the desktop operating system environment aside for a moment, innovation in software as a whole is hard. It is also a requirement of doing software development. Do nothing, and nothing will happen. If there were no innovation in desktop computing environments, in open source Linux distributions specifically, the end users would be stuck in the same situation. However, as the entry asks, perhaps the users are stuck where they are for a reason. Maybe they have zero need for innovation because what they have already serves their particular purpose. They use what they are using because it helps them reach their ultimate goal. Sometimes with innovative software, users are presented with features they didn't know they needed until they became available. This, not always, but often enough, translates to they don't really need it at all. However, users aren't going to be able to use the same piece of software indefinitely in the majority of cases. So, it seems that the logical thing to strive for here is a balance between stability and new features (innovation).

When attempting to strike a balance between stability and new features, developers are faced with an additional challenge. Toward the tail end of this entry, the option of turning these new innovative features off entirely is mentioned. I think this is an important characteristic to consider when designing new innovative features. Think about it. You ship your existing stable features along with the brand new innovative stuff. If something blows up in the new feature set, the user simply turns it off. "Simply", of course, is not quite accurate. This ability to turn features on and off is no easy feat. Consider the notion of extension modules. The whole idea behind them is that they extend some piece of core functionality. They can also be turned off. However, this is generally done with configuration files that a typical desktop end user should never be expected to interface with. So, there is the technical aspect of modularity of features.

Assuming there were a robust, modular desktop architecture that allowed developers to turn features on and off, how would the desktop compel the user to use the new "better" features? Do the new features default to "on"? There is the whole usability question in addition to a very challenging technical problem.

Monday, June 15, 2009

Combining Multiprocessing And Threading

In Python, there are two ways to achieve concurrency within a given application: multiprocessing and threading. Concurrency, whether in a Python application or an application written in another language, often coincides with events taking place. These events can be written directly in code much more effectively when using an event framework. The basic need that the developer using this framework has is the ability to publish events. In turn, things happen in response to those events. Now, what the developer most likely isn't concerned with is the concurrency semantics involved with these event handlers. The circuits Python event framework will take care of this for the developer. What is interesting is how the framework manages the concurrency method used: multiprocessing or threading.

With the multiprocessing approach, a new system process is created for each logical thread of control. This is beneficial on systems with more than one processor because the Python global interpreter lock isn't a concern. This gives the application the potential to achieve true concurrency. With the threading approach, a new system thread, otherwise known as a lightweight process, is created for each logical thread of control. Using this approach means that the Python global interpreter lock is a factor. Even on systems with more than one processor, threads cannot execute Python code truly concurrently within the application itself. The good news is that both approaches can potentially be used inside a given application. A separate Python module exists for each method, and the abstractions inside each of these modules share nearly identical interfaces.

The circuits Python event framework uses an approach that will use either the multiprocessing module or the threading module. The framework will attempt to use the multiprocessing approach to concurrency in preference to the threading module. The approach to importing the required modules and defining the concurrency abstraction is illustrated below.
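The original illustration is not reproduced here. A minimal sketch of the import pattern described, and not the actual circuits source, might look like this; HAS_MULTIPROCESSING and describe_backend are names invented for the example.

```python
try:
    # Preferred: one operating system process per logical thread of control.
    from multiprocessing import Process
    HAS_MULTIPROCESSING = True
except ImportError:
    # Fallback: Thread exposes the same start()/join() interface.
    from threading import Thread as Process
    HAS_MULTIPROCESSING = False


def describe_backend():
    """Report which module ended up providing Process."""
    return "multiprocessing" if HAS_MULTIPROCESSING else "threading"
```

Code written against the Process name works with either backend, which is what makes this kind of fallback possible in the first place.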

As you can see, the core Process abstraction within circuits is declared based on what modules exist on the system. If multiprocessing is available, it is used. Otherwise, the threading module is used. The only downside to this approach is that as long as the multiprocessing module is available, threads cannot be used. Threads may be preferable to processes in certain situations.

Thursday, June 11, 2009

Fortran And Physics

An interesting entry got me thinking about hard science and low-level programming languages such as C and Fortran. Do the two fields really mesh well together? As the entry suggests, they do not. Initially, back when Fortran was in its prime, it would make sense for physicists to use a language such as Fortran to compute astonishingly complex mathematics very fast. But back then, there really wasn't any alternative. This clearly isn't the case today.

It seems that undergraduates in the hard sciences are still taught the Fortran programming language in an introductory programming course. Computer science students are also taught Fortran. I think the latter may make sense and the former not so much. In the end, it really comes down to how the student will benefit in the future. At least in the world of academia.

For computer science students to have a thorough understanding of low-level programming concepts is beneficial. And, there is only one way this insight is gained. Through the painstaking process of learning a language such as Fortran inside and out.

Physicists, however, may not share the same benefit as computer scientists years after having learned such a language. I think this makes sense. Why would physicists want to concern themselves with low-level software details when the low-level math problem at hand is probably much more interesting to them? Would a software developer be overly pleased to hear that he has to write a web browser from scratch in order to build a web application? Not likely.

I think that programming languages are sophisticated enough these days to allow for the scientists in other fields to focus on what matters to them. Leave the low-level software to the computer scientists. Languages such as Python are there for physicists to use as a result of computer scientists taking care of the low-level details. All parties are much happier in this scenario.

Wednesday, June 10, 2009

Sending Django Dispatch Signals

In any given software system, there exist events that take place. Without events, the system would in fact not be a system at all. Instead, we would have nothing more than a schema. In addition to events taking place, there are often, but not always, responses to those events. Events can be thought of abstractly or modeled explicitly. For instance, the method invocation "obj.do_something()" could be considered an invocation event or a "do something" event. This would be an abstract way of thinking about events in an object oriented system. Developers may not even think of a method invocation as an event taking place. However, the abstraction is there if needed. A method invocation is an event when it needs to be because it has a location in both space and time. Events can also be modeled explicitly in code. This is the case when designing a system that employs a publish-subscribe event system. Events are explicitly published while the responses to events can subscribe to them. Another form of event terminology that is often used is to replace event with signal. This is the terminology used by the Django Python web application framework dispatching system.

Django defines a single Signal base class in dispatcher.py, which is a crucial part of the dispatching system. The responsibility of the Signal class is to serve as a base class for all signal types that may be dispatched in the system. In the Django signal dispatching system, signal instances are dispatched to receivers. Signal instances can't just spontaneously decide to send themselves. There has to be some motivating party and in the Django signal dispatching system, this concept is referred to as the sender. Thus, the three core concepts of the Django signal dispatching system are signal, sender, and receiver. The relationship between the three concepts is illustrated below.

Senders of signals may dispatch a signal to zero or more receivers. The only way that zero receivers receive a given signal is if zero receivers have been connected to that signal. Additionally, receivers, once connected to a given signal, have the option of only accepting signals from a specific sender.

So how does one wire the required connections between these signal concepts in the Django signal dispatching system? Receivers can connect to specific signal types by invoking the Signal.connect() method on the desired signal instance. The receiver that is being connected to the signal is passed to this method as a parameter. If this receiver is to only accept these signals from specific senders, the sender can also be specified as a parameter to this method. Once connected, the receiver will be activated whenever a signal of this type is sent by a sender. A sender can send a signal by invoking the Signal.send() method. The sender itself is passed as a parameter to this method. This is a required parameter even though the receiver may not necessarily care who sent the signal. However, it is good practice to not take these chances. If, from a signal sending point of view, there is always consistency in regard to who the sender is, there is a new level of flexibility on the receiving end. Illustrated below is a sample interaction between a sender and a receiver using the Django signal dispatching system to send a signal.
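The original illustration is not reproduced here. To make the interaction concrete without requiring a Django installation, the sketch below uses a toy Signal class that only mimics the connect() and send() shape described above. It is a simplified model, not Django's implementation, and the Blog, Wiki, and entry_published names are hypothetical.

```python
class Signal:
    """Toy model of a signal: remembers receivers and optional sender filters."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver, sender=None):
        # sender=None means "accept this signal from any sender".
        self._receivers.append((receiver, sender))

    def send(self, sender, **named):
        # Call every receiver whose sender filter matches; collect responses.
        responses = []
        for receiver, wanted in self._receivers:
            if wanted is None or wanted is sender:
                response = receiver(signal=self, sender=sender, **named)
                responses.append((receiver, response))
        return responses


class Blog:
    """Hypothetical sender."""


class Wiki:
    """Another hypothetical sender."""


log = []


def on_any(sender, **kwargs):
    log.append(("any", sender.__name__))


def on_blog_only(sender, **kwargs):
    log.append(("blog", sender.__name__))


entry_published = Signal()
entry_published.connect(on_any)
entry_published.connect(on_blog_only, sender=Blog)

entry_published.send(sender=Blog, title="Hello")
entry_published.send(sender=Wiki, title="Front page")
```

After the two send() calls, on_any has been activated twice while on_blog_only has been activated only for the signal sent by Blog.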

The fact that the signal instances themselves are responsible for connecting receivers to signals as well as the actual sending of the signals may seem counter-intuitive at first. Especially if one is used to working with publish-subscribe style event systems. In these event systems, the publishing and subscribing mechanisms are independent from the publisher and subscriber entities. However, in the end, the same effect is achieved.

Tuesday, June 9, 2009

The Importance Of Name Servers In Cloud Computing

In cloud computing environments, there are nodes. These nodes are the high-level cloud elements. They are often represented by physical or virtual servers. Nodes need computing power to process any actions that may take place within the cloud environment. At the next level down, we have entities, or objects in the cloud. These objects live on one or more nodes in the cloud. These objects may represent any abstract software concept. An object in the cloud could be a file, a module, a class, an instance of a class, etc. However, these objects are hard to identify in a cloud environment. In the cloud, these objects could be identified by nothing more than a URI that makes the object unique.

Introducing a name server into a cloud environment allows objects in the cloud to be labeled. This adds meaning to the objects in the cloud that any party may be interested in. By meaning, I mean human-semantic meaning. As in "blog/123" as opposed to "data/123". We know we are now dealing with a blog instead of some unknown piece of data and can adjust the expected schema accordingly. This human-semantic meaning really isn't all that important in deployed systems that already know what type of data a given URI in the cloud refers to. However, when designing code that has even the slightest potential to move into a cloud, referring to a meaningful name as opposed to a URI can be extremely useful. This can be achieved by using a name server.

But what about URIs that have been well designed and already offer meaning to the developers that use them? Do these URIs have any use for a name server? In a cloud environment they certainly do. Take the following URI, "http://127.0.0.1/blog/entry/123/". It is quite obvious that this URI is referring to a specific blog entry. But which part of the URI gave us this meaning? It certainly wasn't "127.0.0.1". It must have been "/blog/entry/123". This is meaningfulness that could easily be captured by a name server. To the world living outside of this hypothetical "blog cloud", the "127.0.0.1" does have meaning (obviously there would be an actual domain name in the real world). However, naming entities in the cloud environment is what we are interested in here. In a cloud environment, nothing is certain. Nodes come, and nodes go, all for different reasons. And it is in this scenario that internal cloud name servers really come in handy.

Let's assume there is now a name server installed in our hypothetical blog cloud environment. What do all our nodes containing data elements and processing power do now? That is, how do they configure themselves to use the name server? This is where the concept of presence broadcasting is introduced. Each node in the cloud needs to inform the name server that it could potentially contain a named object that some other node in the cloud may be interested in. If this were possible, nodes in the cloud could come and go as they please.

The Pyro Python cloud computing framework supports all of the above. The concept behind Pyro is that behavior can be invoked on remote Python objects. In the cloud computing environment, this would mean that nodes can invoke behavior that is executed on other nodes. Pyro objects, which are essentially standard Python instances with a little extra Pyro declaration, can be named on the name server. This not only adds meaning to the objects in the cloud, but also maps the name to a specific node in the cloud. So, the "/blog/entry/123/" URI on the name server would map to "127.0.0.1" or some other node.

The Pyro framework also supports presence broadcasting. This means that before a Pyro object is used, the code that plans on using that object can broadcast itself to the name server. This means that any objects this code is exposing to other nodes in the cloud are now also available in the name server. The idea of presence broadcasting in the cloud is a powerful one because it allows the cloud as a whole to grow and to shrink as necessary. It gives the cloud its elasticity property. All of this could be implemented without a name server, but it would be far from seamless. Even cumbersome.
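The idea of names mapping to nodes that come and go can be sketched with a toy in-memory name server. This is purely illustrative and does not use or imitate Pyro's API; the NameServer class, its method names, and the "10.0.0.7" address are all assumptions made for the example.

```python
class NameServer:
    """Toy in-memory name server mapping object names to node addresses."""

    def __init__(self):
        self._names = {}

    def register(self, name, node):
        """Called by a node to announce that it hosts the named object."""
        self._names[name] = node

    def withdraw(self, node):
        """Called when a node leaves; forget every name it announced."""
        self._names = {n: a for n, a in self._names.items() if a != node}

    def lookup(self, name):
        """Return the node hosting the name, or None if nobody announced it."""
        return self._names.get(name)


ns = NameServer()
ns.register("/blog/entry/123/", "127.0.0.1")   # a node announces its object
first = ns.lookup("/blog/entry/123/")

ns.withdraw("127.0.0.1")                        # that node leaves the cloud
ns.register("/blog/entry/123/", "10.0.0.7")     # a replacement node announces
second = ns.lookup("/blog/entry/123/")
```

Code that looks up "/blog/entry/123/" never has to change, even though the node behind the name did.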

Wednesday, June 3, 2009

The Trac Ticket Database Model

In Trac trunk, we can get a glimpse into the schematics behind the Trac ticket. The internals of the Trac ticket database model are interesting to see because the ticket is such a central concept in the Trac system. Below is an illustration of the Ticket database model class.

Here are some brief highlights of what each method does.

__init__() - Performs initialization actions. If the ticket id was passed, the values of the ticket in the database will be populated in this instance. Otherwise, the default values will be used.
id_is_valid() - Return true if the specified ticket id is valid.
_get_db() - Return the database connection.
_get_db_for_write() - Return the database connection for writing.
_init_defaults() - Initialize the default field values. In addition to initializing the default field values, the default options that are available for a given ticket field are also initialized.
_fetch_ticket() - Initialize this ticket instance using specific ticket field values. This is done by first executing a query to load the stored ticket data from the database.
__getitem__() - Support the getitem operator to retrieve ticket field values.
__setitem__() - Support the setitem operator to store ticket field values.
get_value_or_default() - Return the value of the field or the default value. The default value is only returned if there is a problem retrieving the actual value because it does not yet exist.
populate() - Populate the ticket fields using the specified dictionary. Only valid keys in the supplied dictionary, that exist as ticket fields, will be populated.
insert() - Insert this ticket into the database. This method will not insert the ticket if the ticket id already exists in the database.
save_changes() - Update the database with any changes made to this ticket. This includes updating the ticket changelog.
get_changelog() - Retrieve the changelog data for this ticket.
delete() - Delete this ticket from the database.
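
The field-access methods in the list above can be illustrated with a small stand-in class. This is a simplified sketch, not Trac's implementation: it leaves out the database entirely, and the TicketSketch name, the field names, and the default values are assumptions chosen for the example.

```python
class TicketSketch:
    """Simplified stand-in for a ticket model, without any database."""

    # Hypothetical field names and defaults, for illustration only.
    DEFAULTS = {"summary": "", "status": "new", "priority": "major"}

    def __init__(self):
        self.values = {}

    def __getitem__(self, name):
        return self.values[name]

    def __setitem__(self, name, value):
        self.values[name] = value

    def get_value_or_default(self, name):
        # Fall back to the default only when no value has been stored yet.
        try:
            return self.values[name]
        except KeyError:
            return self.DEFAULTS.get(name)

    def populate(self, values):
        # Ignore keys that are not known ticket fields.
        for name, value in values.items():
            if name in self.DEFAULTS:
                self[name] = value


ticket = TicketSketch()
ticket.populate({"summary": "Crash on save", "bogus": 1})
ticket["status"] = "assigned"
```

Here the "bogus" key is silently dropped by populate(), while get_value_or_default("priority") falls back to the default because no priority was ever stored.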

Tuesday, June 2, 2009

Addressability Lost Within The Realm Of Ajax

With RESTful APIs and applications, and with HTTP in general, addressability plays a huge role. With web applications that project an Ajax user interface, this addressability is typically gone. There is no notion of URI as far as the end user is concerned. This is because Ajax web applications often have a single URI: the application itself. From this point forward, the end user navigates through the various application states while all the addressable URIs are contacted beneath the fancy Ajax interfaces. If you are an end user using these types of applications, do you necessarily care? That would depend on what you are using the application for and the type of user you are. For most end users, if the user interface is well designed, the addressability of the resources involved with the application is a non-issue. Even developers, the type one would think would be interested in the underlying application data, might be too enthralled with how useful the user interface is. If the user interface of an Ajax web application is poorly designed, things change. The data at the other end of the URI suddenly becomes much more interesting.

In more traditional web applications, the ones without the fancy asynchronous JavaScript toolkits, the URI of almost every resource is accessible from the address bar in the web browser. This is obviously the benefit of having addressable resources in a web application. The URIs are immediately apparent to the end user via the web browser address bar. With Ajax web applications, end users cannot point their web browser to a specific URI and expect the application to behave accordingly. This, I find, is one of the more annoying drawbacks of using Ajax web applications. Although the user experience of Ajax web applications is improving at an ever increasing rate, the sacrifice of addressability, a powerful concept, has to be made.

But does addressability really need to be lost completely in Ajax web applications? Just because the URI isn't in its normal location, the web browser address bar, doesn't mean the end user can't know about it. In fact, many web applications that provide an Ajax user interface will also provide a public RESTful API used by that interface, complete with documentation. However, this doesn't solve the problem of the end user who doesn't care about API documentation and just wants page X of the application to appear when they point their web browser to URI Y. I think something like this could potentially work. There would have to be two separate RESTful APIs: one for the application resource data, and one for the user interface. The user interface API would interest the end users because they could use these URIs to point the web browser to a specific application state. These user interface application URIs wouldn't even need to be reflected in the web browser address bar. As long as the current user interface application state is advertised somewhere in the user interface, it could work. And it would be incredibly useful.
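One way to picture the user interface side of this idea is a pair of functions that turn an application state into a URI and back again. This is only a sketch under stated assumptions: the "/ui/" prefix, the function names, and the "inbox" view are hypothetical and not taken from any particular application.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical prefix separating user interface URIs from resource data URIs.
UI_PREFIX = "/ui/"


def state_to_uri(view, **params):
    """Encode a user interface state as a URI the application can advertise."""
    query = urlencode(sorted(params.items()))
    return UI_PREFIX + view + ("?" + query if query else "")


def uri_to_state(uri):
    """Decode an advertised URI back into a (view, params) pair."""
    parts = urlsplit(uri)
    if not parts.path.startswith(UI_PREFIX):
        raise ValueError("not a user interface URI: %r" % uri)
    view = parts.path[len(UI_PREFIX):]
    return view, dict(parse_qsl(parts.query))


uri = state_to_uri("inbox", page=2, folder="spam")
state = uri_to_state(uri)
```

The application could display the value of uri somewhere in the interface, and restore the matching state whenever a user supplies such a URI. Note that parameter values come back as strings after the round trip.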

Monday, June 1, 2009

Open Source Freeloaders

In an interesting entry about leeches in open source software, the question of big corporations and open source software freeloading is raised. Does such a thing as freeloading on open source software exist? Well, according to the entry, some members of certain open source communities believe that doing so in a corporate environment without ever contributing back to the community would be considered freeloading. However, the open source license used in many popular open source projects does not require any contribution back to the community. Is this an ethical concern then? Do corporations feel bad for not contributing back to a software project that they are allowed to use for free? No. Individuals, maybe.

When you have invested significant time and effort in anything, you generally want it appreciated. It is easy to see how the core developers of a successful project become essentially unimaginative toward it. The willingness of someone to contribute back any kind of artifact boosts the overall project motivation. The project team no longer feels that they are working toward something that has already become a lost cause. However, there are also implicit contributions made to open source projects.

The mere public knowledge that a large corporation is using any given open source project is probably worth more to the project than anything tangible the corporation would be willing to contribute. People within large corporations didn't decide to use a particular open source solution for the good of their health. They use it because it does what it is supposed to do. This should be very motivating. I'm always impressed by the fact that I use a programming language NASA considers useful.

What about when large corporations complain loudly and thoroughly about an open source project? Well, this does two things for the project. First, it demonstrates that the corporation is using the software, otherwise they would never take the time to complain about it. Second, it sets the stage for the project. The corporation does all the leg work by pointing public attention toward the flaws in the software. Now all eyes are on the project. All that's left to do is fix it quickly and deliver in front of everyone. It seems that there isn't too much damage that freeloading can do to the open source software industry.
