Monday, March 30, 2009

Gaphor editor adapters

The Gaphor UML modeling tool, which is written in Python, uses a pop-up editor widget that allows in-line editing of certain diagram elements. The widget itself isn't overly interesting. It makes trivial changes to modeling elements quicker, which is always helpful in any software solution. What we are interested in here is the method used to display the widget based on the element type. Gaphor relies heavily on the Zope interface and component frameworks. Gaphor uses the Zope interface framework to define the interfaces provided by classes throughout the application, and the component framework to register components and adapters. What exactly are adapters? An adapter is a type of component that wraps an object and makes it provide a particular interface. When the interface and component frameworks are used together, Zope looks up and instantiates the appropriate adapter automatically. This doesn't happen by itself; adapters must be declared and registered according to a few carefully defined rules. Used correctly, Zope adapters are a very powerful tool for getting the most out of an interface. Gaphor defines an extensive set of Zope adapters. Here we are interested in the editor adapter.

There are actually several adapters registered for the IEditor interface, one for each diagram element that supports the editor widget. The Gaphor adapter approach is an alternative design for providing behavior that varies by type. A more traditional approach would have been to create a class hierarchy for the editor widget, with each class in the hierarchy corresponding to a different diagram element type. Shared behavior would stay in the base class, while varying behavior would be overridden in each subclass in a polymorphic manner. This is similar to how the editor adapters in Gaphor are defined and registered. One key difference in the design is how each class, or adapter, is instantiated when the need arises. With the class hierarchy approach, we would need extra logic to ensure that the correct class is instantiated for a given diagram element. With Zope adapters, we simply call the IEditor interface, passing the object being adapted as the argument. In the Gaphor case, IEditor is called with the diagram element as the argument. The Zope component framework then returns an instance of the correct adapter, complete with the functionality specific to that diagram element type. A similar effect can be achieved with the class hierarchy design, where the instantiated class would accept the element being adapted. However, choosing which class to instantiate would remain the caller's responsibility.
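To make the contrast concrete, here is a minimal, pure-Python sketch of type-based adapter lookup. It is not Zope's API. Gaphor uses zope.interface and zope.component, where adapters are registered (for example with zope.component.provideAdapter) and matched against the interfaces an object provides. This toy registry, by contrast, simply walks the object's class hierarchy. The item and editor class names are hypothetical.

```python
class AdapterRegistry:
    """Toy stand-in for a component registry (not the Zope API)."""

    def __init__(self):
        self._factories = {}

    def register(self, adapted_type, factory):
        self._factories[adapted_type] = factory

    def lookup(self, obj):
        # Walk the MRO so subclasses reuse adapters registered for a base class.
        for cls in type(obj).__mro__:
            factory = self._factories.get(cls)
            if factory is not None:
                return factory(obj)
        raise TypeError("Could not adapt %r" % (obj,))


class InterfaceSketch:
    """Calling an instance mimics the Zope idiom IEditor(item)."""

    def __init__(self, name):
        self.name = name
        self.registry = AdapterRegistry()

    def __call__(self, obj):
        return self.registry.lookup(obj)


IEditor = InterfaceSketch("IEditor")


# Hypothetical diagram items; Gaphor's real item classes differ.
class ClassItem:
    def __init__(self, name):
        self.name = name


class CommentItem:
    def __init__(self, body):
        self.body = body


# One editor adapter per item type; each wraps the item it adapts.
class ClassItemEditor:
    def __init__(self, item):
        self.item = item

    def get_text(self):
        return self.item.name


class CommentItemEditor:
    def __init__(self, item):
        self.item = item

    def get_text(self):
        return self.item.body


IEditor.registry.register(ClassItem, ClassItemEditor)
IEditor.registry.register(CommentItem, CommentItemEditor)

# The caller never chooses the editor class; the lookup does.
editor = IEditor(CommentItem("TODO: split this class"))
print(type(editor).__name__, editor.get_text())  # CommentItemEditor TODO: split this class
```

The calling code contains no type checks. Supporting a new diagram element only requires registering one more adapter.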

The adapter approach is a solid one because it emphasizes the importance of the interface contract provided by classes and the modularity offered by creating components. Being able to obtain an object simply by calling an interface makes it explicit which contract that object fulfills in the given context.

Friday, March 27, 2009

Does Plone really suck as a CMS?

Plone is a content management system written in Python. It is built on top of Zope, a web application framework also written in Python. To judge how well Plone performs as a CMS, we must identify what a CMS is supposed to do and what Plone actually does. Rather than give a rigorous definition of a CMS, it is safe to assume that any given CMS is supposed to manage the publishing of content for any number of users. This involves the content work-flow, which specifies the various states the content will go through in the publishing cycle. All of this needs to be as transparent as possible to the end user, because their ultimate goal is to publish content in an orderly manner and as painlessly as possible. The same questions of quality could be asked of any CMS in existence, such as TYPO3 and Drupal. Plone often gets misrepresented as a bad choice of CMS, usually for the wrong reasons. Many Python developers cringe at the mere thought of using Zope as a web application framework. Zope is probably the most powerful Python web framework in existence, and the reason for the hesitation is the time investment involved in learning how to use it. Since Plone is built on top of Zope, it relies heavily not only on Zope's functionality but also on the philosophy of how it is used. The traits of a framework tend to flow upward into the applications built on it, so Plone inherits the good, the bad, and the ugly. Thus, some of the complexities found in the Plone user experience are influenced by Zope.

Ultimately, the biggest driving factor in any CMS is the user experience. Everything else is secondary, including the setup of the system. Plone is strong in some user experience areas and weak in others. One of the weak spots in the Plone user experience is the back-end interface, which is essentially the Zope management interface. For anyone who is using the system for the sole purpose of managing content, this can be very intimidating. In a scenario where a site editor is temporarily given the task of managing a few users, this trivial task suddenly becomes a time-consuming challenge. However, that same Plone editor would excel at his actual job of managing content. Plone offers a great user experience for general content editing and work-flow management. In fact, Plone's work-flow component is probably superior to those of many other mainstream content management systems, simply because it is very straightforward for the people who need to use it. It doesn't require major customization by developers before content editors find it usable.

When the need for new custom functionality in the CMS arises, development in Plone is hard. That is, there is a steep learning curve when coming from another CMS or web framework. Again, this stems from the challenges of developing for Zope. The most common reason for custom development in a content management system is to create new content types with new specialized behavior. These new content types may in fact extend the work-flow functionality of the CMS if the work-flow framework allows for it. Plone also has something called Archetypes for extending the available content types in the CMS. The idea behind Archetypes is that developers can transform UML models into functional Plone extensions. This is a very cutting-edge concept that no other CMS offers.

Some of the arguments against Zope and Plone are not well grounded. For instance, a common complaint about Plone is that it is a challenge to install and configure. This may have been the case five years ago, but today it is one of the easier content management systems to install. In fact, most of the configuration is taken care of by the installer, which asks the user simple questions. Another argument is that Zope is very resource intensive. Users must look at why these resources are used and what they get in return. An alternative CMS offering the same functionality as a Plone installation is likely to have similar resource consumption. Additionally, a content management system isn't a flimsy desktop application. Any non-trivial CMS deployment is going to need at least modest system resources. This is just a fact of life.

Finally, this isn't a marketing scheme for Plone as a CMS. I haven't used it, other than experimentally, for years. If nothing else, I think that serves as further proof that you need to pick the right CMS for the job. Period. Focus on user experience. Which CMS can you install, set up with the functionality required by the people who manage the content, and deploy the fastest, with minimal complaints from content editors? After asking this question, you may realize that a CMS is not what you want at all, and that a plain old web application framework would do the trick after a month of custom development.

Thursday, March 26, 2009

The Trac component loader

All modern web application frameworks need replaceable application components, and the reasons for this requirement are plenty. Some applications will share common functionality with other applications, such as identity management. However, having the ability to extend this functionality or replace it entirely is absolutely crucial. Technology requirements change too fast to assume that a single implementation of some feature will ever be sufficient for any significant length of time. Moreover, better tools for the job your component currently does will emerge, and developers need a way to exploit them without having to hack the core system. Trac is a Python web framework in its own right. That is, it implements several capabilities found in other frameworks such as Django and TurboGears. Trac is highly specialized as a project management system, so you wouldn't want to use Trac as a generalized web framework for some other domain. Project management for software projects is such a huge domain by itself that it makes sense to have a web framework centered around it. Trac defines its own component system that allows developers to create new components that build on existing Trac functionality or replace it entirely. The component framework is flexible enough to load components in multiple formats: eggs and .py files. The component loader used by the Trac component system is in fact so useful that it sets the standard for how other Python web frameworks should load components.

The Trac component framework loads all defined components when the environment starts its HTTP server, and it uses the load_components() function to do so. This function is defined in the loader.py module. load_components() can be thought of as the aggregate component loader, since it is responsible for loading all component types. It accepts a list of loaders in a keyword parameter and uses these loaders to differentiate between component types. By default, this parameter holds two loaders: one for egg components and one for .py source file components. load_components() also accepts an extra path parameter, which allows the specified path to be searched for additional Trac components. This is useful because developers may want to maintain a repository of Trac components that do not reside in site-packages or the Trac environment. Finally, load_components() needs an environment parameter, which refers to the Trac environment in which the loader is currently executing. Various loaders need this environment to determine whether the loaded components should be enabled. A custom loader, should a developer be inclined to write one, would need it for the same reason. The environment also carries other information that a custom loader could use to provide more advanced functionality.
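The shape of the aggregate loader can be sketched as follows. This is a simplified illustration, not Trac's actual code. FakeEnvironment is a hypothetical stand-in for a Trac environment. Each loader is assumed to accept the environment, the search path, and an auto_enable directory.

```python
import os


class FakeEnvironment:
    """Hypothetical stand-in for a Trac environment."""

    def __init__(self, path):
        self.path = path
        self.plugins_dir = os.path.join(path, "plugins")
        self.enabled = set()  # names of components enabled in this environment


def load_components(env, extra_path=None, loaders=()):
    """Run every loader over the same search path (sketch only)."""
    # The environment's plugins directory is always searched first.
    search_path = [env.plugins_dir]
    if extra_path:
        search_path.extend(extra_path)
    for loader in loaders:
        # Components found in auto_enable are enabled automatically.
        loader(env, search_path, auto_enable=env.plugins_dir)


# A loader that only records what it was given, to show the call shape.
calls = []


def recording_loader(env, search_path, auto_enable=None):
    calls.append((list(search_path), auto_enable))


env = FakeEnvironment("/srv/trac/project")
load_components(env, extra_path=["/opt/trac-plugins"], loaders=[recording_loader])
print(calls)
```

Because every loader shares one signature, adding a new component format means adding one more entry to the loaders sequence.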

As mentioned, load_components() specifies two default loaders for Trac components. These loaders are actually factories that build and return a loader function. This way, data can be built into the loader function when it is created, without altering the loader signature that load_components() invokes. This offers maximum flexibility. The first default loader, load_eggs(), loads Trac components in the egg format. It does so by iterating through the component search paths, which include the plugins directory of the current Trac environment by default. For each egg found, the working set object from the pkg_resources package is extended with that egg. Next, the distribution object, which represents the egg, is checked for trac.plugins entry points, and each entry point found is loaded. What is interesting about this approach is that it allows subordinate Trac components to be loaded. If an egg distribution contains a large amount of code and a couple of small Trac components, only the Trac components are loaded. The same cannot be said about load_py_files(), the second default loader provided by load_components(). It works the same way as load_eggs() in that it searches the same paths, except that it looks for .py files instead of egg files. When one is found, the loader imports the entire file, even if there are no subordinate Trac components within the module. In both default loaders, if a component was found in the plugins directory of the current Trac environment, that component is automatically enabled. Placing a component in the plugins directory thus also enables it, which eliminates a step.
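The factory pattern used by the default loaders can be sketched for the .py case with the standard library's importlib. This is an illustration under assumptions, not Trac's load_py_files(). The returned list of modules and the env.enabled set are inventions for the example. Egg loading through pkg_resources is not shown.

```python
import glob
import importlib.util
import os
import tempfile
import types


def load_py_files():
    """Factory returning a loader for plain .py components (sketch only)."""

    def _load_py_files(env, search_path, auto_enable=None):
        loaded = []
        for path in search_path:
            for filename in sorted(glob.glob(os.path.join(path, "*.py"))):
                name = os.path.splitext(os.path.basename(filename))[0]
                spec = importlib.util.spec_from_file_location(name, filename)
                module = importlib.util.module_from_spec(spec)
                # The whole module is executed; unlike an egg entry point,
                # there is no way to select only the components inside it.
                spec.loader.exec_module(module)
                loaded.append(module)
                # Placing a file in the plugins directory also enables it.
                if auto_enable and os.path.abspath(path) == os.path.abspath(auto_enable):
                    env.enabled.add(name)
        return loaded

    return _load_py_files


with tempfile.TemporaryDirectory() as plugins:
    with open(os.path.join(plugins, "hello_plugin.py"), "w") as f:
        f.write("GREETING = 'hello'\n")
    env = types.SimpleNamespace(enabled=set())  # hypothetical environment
    loader = load_py_files()
    modules = loader(env, [plugins], auto_enable=plugins)
    print(modules[0].GREETING, env.enabled)  # hello {'hello_plugin'}
```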

There are some limitations with the Trac component framework. The _log_error() function nested inside the _load_eggs() loader shouldn't be a nested function, since there is no real rationale for doing so. Also, loading Python source files as Trac components is quite limiting, because we lose any notion of subordinate components. This is because entry points cannot be defined inside Python source files. When building Trac components, I would recommend using only the egg format.

Tuesday, March 24, 2009

Why we need a thread-safe publish/subscribe event system

Publish-subscribe event systems are a fairly common design pattern in modern computing. The concept becomes increasingly powerful in distributed systems, where many nodes can subscribe to an event or topic emitted from a single node. The name publish-subscribe, or pub-sub, is used because it has a close analogue in the real world: people with magazine or newspaper subscriptions receive updates when something is published. Because of this analogue, developers can more easily reason about events, and why they occurred, in complex software systems. In any given software system, some code will need to react to one or more events. These events can range from something as simple as a mouse click to a complete database failure. The publish-subscribe pattern is highly extensible because any number of observers may subscribe to a single event. Subscriptions can also be canceled, so the architecture scales in both directions: up and down. One bottleneck in a publish-subscribe framework occurs when the publisher must wait until all subscribers have finished reacting to the event. In some cases, this is unavoidable, such as when the publisher expects a value to be returned from one of the subscribers. In other cases, however, the publisher doesn't care about the subscribers or how they react to a published event. In a localized publish-subscribe system, that is, not a distributed one, we could run subscribers in threads. If we were to build such a framework, where subscriptions react to events in separate threads of control, we would also need the ability to turn threading off and still use the framework in exactly the same way. This is because threading is simply not an option in every scenario.
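For reference, the basic synchronous form of the pattern fits in a few lines. The EventBus class and the topic names below are illustrative only. The key point is that publish() does not return until every subscriber has run, which is the bottleneck described above.

```python
from collections import defaultdict


class EventBus:
    """Minimal synchronous publish-subscribe dispatcher."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def unsubscribe(self, topic, callback):
        self._subscribers[topic].remove(callback)

    def publish(self, topic, payload):
        # The publisher blocks here until every subscriber has returned.
        # Iterate over a copy so callbacks may unsubscribe safely.
        return [callback(payload) for callback in list(self._subscribers[topic])]


bus = EventBus()
bus.subscribe("db.failure", lambda msg: "paged on-call: " + msg)
bus.subscribe("db.failure", lambda msg: "logged: " + msg)
print(bus.publish("db.failure", "primary unreachable"))
```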

The boduch Python library offers a publish-subscribe event system like this. The library is still in its infancy, but it can run subscription event handles in new threads of control. The threading capability can be switched on and off, and the same code using the library runs in either mode. Events are declared by specializing a base "Event" class. Likewise, event handles, or subscriptions, are declared by specializing a base "Handle" class. Developers can then subscribe to an event by passing an event class and a handle class to the subscribe function. Multiple handles may be listening to a given event type, and if running in threaded mode, each handle will start a new thread of control. There is a limit on the number of threads allowed to run at a given time, but it can be adjusted either manually or programmatically. In either mode, published events may be marked as atomic. This only has an effect when the event system is running in threaded mode, because it forces all handles for that particular event to run in the publisher's thread. In non-threaded mode, an atomic publication behaves exactly like a regular one, since every handle already runs in the publisher's thread.
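The design described above can be approximated in a short sketch. This is not the boduch library's API. The EventSystem class, the threaded and max_threads parameters, the atomic flag on publish(), and the returned thread list are all hypothetical. Only the idea of subclassing Event and Handle and subscribing by class comes from the text. A bounded semaphore caps the number of concurrently running handle threads.

```python
import threading


class Event:
    """Base class for events; subclasses carry their own data."""

    def __init__(self, data=None):
        self.data = data


class Handle:
    """Base class for subscriptions; subclasses override run()."""

    def __init__(self, event):
        self.event = event

    def run(self):
        raise NotImplementedError


class EventSystem:
    """Sketch of a pub-sub system whose threading can be switched off."""

    def __init__(self, threaded=False, max_threads=10):
        self.threaded = threaded
        self._subscriptions = {}
        self._slots = threading.BoundedSemaphore(max_threads)

    def subscribe(self, event_class, handle_class):
        self._subscriptions.setdefault(event_class, []).append(handle_class)

    def _run(self, handle):
        try:
            handle.run()
        finally:
            self._slots.release()

    def publish(self, event, atomic=False):
        threads = []
        for handle_class in self._subscriptions.get(type(event), []):
            handle = handle_class(event)
            if self.threaded and not atomic:
                self._slots.acquire()  # blocks while too many handles run
                thread = threading.Thread(target=self._run, args=(handle,))
                thread.start()
                threads.append(thread)
            else:
                # Non-threaded mode, or an atomic publication: run the
                # handle in the publisher's own thread.
                handle.run()
        return threads  # callers may join() these if they need to wait


class UserCreated(Event):
    pass


class SendWelcome(Handle):
    sent = []

    def run(self):
        SendWelcome.sent.append(self.event.data)


# The same subscription code works with threading off and on.
for threaded in (False, True):
    system = EventSystem(threaded=threaded)
    system.subscribe(UserCreated, SendWelcome)
    for thread in system.publish(UserCreated("alice")):
        thread.join()
print(SendWelcome.sent)  # ['alice', 'alice']
```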

Since it is still in its infancy as a project, the boduch library has several limitations. For instance, there is no way to specify a filter for event subscriptions, even though subscribers may want to react to event types based on data contained within the event instance. There is also no way to track the source object that emitted the event. Finally, there is no real guarantee that proper ordering will be preserved when running in threaded mode, although this can be worked around. I haven't actually encountered a scenario where the ordering of instructions has been defective when running in threaded mode. This doesn't mean it isn't possible. I actually hope I do some day, so I can incorporate more built-in safety into the library.
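One possible ordering work-around, sketched here as an assumption rather than anything boduch does, is to hand publications to a single dispatcher thread through a queue.Queue. The publisher still returns immediately, and because one worker consumes the queue in FIFO order, events are handled in the order they were published. The trade-off is that handles no longer run in parallel with each other. OrderedDispatcher is a hypothetical name.

```python
import queue
import threading
import traceback


class OrderedDispatcher:
    """Runs callbacks off the publisher's thread while keeping FIFO order."""

    def __init__(self):
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is None:  # sentinel: stop the worker
                self._queue.task_done()
                break
            callback, payload = item
            try:
                callback(payload)
            except Exception:
                traceback.print_exc()  # keep the worker alive
            finally:
                self._queue.task_done()

    def publish(self, callback, payload):
        # Returns immediately; the single worker preserves publish order.
        self._queue.put((callback, payload))

    def close(self):
        # Everything queued before the sentinel is processed first.
        self._queue.put(None)
        self._worker.join()


received = []
dispatcher = OrderedDispatcher()
for n in range(5):
    dispatcher.publish(received.append, n)
dispatcher.close()
print(received)  # [0, 1, 2, 3, 4]
```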

Monday, March 23, 2009

Registering configuration values in ECP

The Enomaly Elastic Computing Platform has an extension module API that allows developers to register new ECP components. These components include new web controllers, new RESTful API controllers, and so on. One thing that cannot be registered, however, is a configuration value. Extension modules can be viewed as smaller applications that are executed within ECP, and like any application they will need to be configured. There are always going to be values that should be configurable within any application, such as storage locations. Currently, extension modules must implement their own settings abstractions. This functionality already exists in the ECP core, and the way configuration values are accessed and stored should be consistent, which is why the core provides a custom Settings class in the first place. It would make sense for extension modules in ECP to be able to register their own configuration values. This way, configuration values would be accessed and stored in exactly the same way across the platform. An additional complication arises when trying to use the configuration editor. The configuration editor is tightly coupled with TurboGears widgets and thus requires that all extension modules be tightly coupled with TurboGears widgets as well. Ideally, when configuration values are registered (which is currently not possible), additional meta-data suitable for generating a display widget for each value could also be registered.

The current implementation of the ECP Settings class uses managed Python attributes to seamlessly save and load configuration values. Every time a managed Settings attribute is accessed, the Variable class attempts to load the value. Likewise, when a managed Settings attribute is altered, the Variable class attempts to store the new value. Managed attributes make simple storage and retrieval operations easier. The alternative is to use the Variable class directly, and in fact earlier implementations of ECP did exactly that. Every time a configuration value was needed, we had to invoke Variable.load() and specify a default value in case the configuration value didn't exist. The new Settings class was introduced to alleviate some of this troublesome configuration access. A single instance of the Settings class is created in the configuration.py module, and this instance can then be used throughout the ECP application, including extension modules. Configuration categories are also incorporated into the Settings class by applying the same managed-attribute concept to a class for each category. This category class is then set as an attribute of Settings, which allows us to access configuration values in the form settings.kvm.bridge. This syntax makes code much more readable in context. However, a problem with this method of managing configuration values was soon realized: there will always be a need to add new configuration values. Most notably, extension modules are going to need this capability, since developers will want to access configuration values in the same way as the rest of ECP. Being able to register new configuration values would eliminate this extensibility problem.

If every new configuration value needed by an extension module, or by the core application for that matter, has to be added to the Settings class by hand, the class will keep growing and become very challenging to maintain. Additionally, extension modules are tightly coupled to TurboGears widgets, because they need to display their configuration values in the configuration editor. This is done by the extension module defining a hook that passes in the TurboGears widgets used to display its configuration values in the configuration editor. This isn't ideal, since it also couples extension modules to ECP dependencies (TurboGears). Ideally, the widgets for displaying configuration values should be generated by the configuration editor based on minimal meta-data provided by the extension module at registration time.
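The managed-attribute design can be sketched with Python properties. This sketch rests on several assumptions. Variable here is a hypothetical dict-backed stand-in, and ECP's real Variable persists values elsewhere, possibly with different load() and store() signatures. KVMSettings and the "br0" default are also invented for the example. It reproduces the settings.kvm.bridge syntax and shows why every value must be hand-written into a category class.

```python
class Variable:
    """Hypothetical stand-in for ECP's Variable; stores values in a dict."""

    _store = {}

    @classmethod
    def load(cls, name, default=None):
        return cls._store.get(name, default)

    @classmethod
    def store(cls, name, value):
        cls._store[name] = value


def managed(name, default):
    """Build a property that loads and stores through Variable."""

    def getter(self):
        return Variable.load(name, default)

    def setter(self, value):
        Variable.store(name, value)

    return property(getter, setter)


class KVMSettings:
    # Every configuration value must be hand-written here; this is the
    # maintenance problem described above.
    bridge = managed("kvm.bridge", "br0")


class Settings:
    def __init__(self):
        self.kvm = KVMSettings()


settings = Settings()
print(settings.kvm.bridge)  # br0 (the default)
settings.kvm.bridge = "virbr0"
print(settings.kvm.bridge)  # virbr0
```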

The new approach to ECP configuration value management is to have configuration values registered with the Settings class. The same approach of using managed attributes to access and store configuration values is still used. What is different is the ability to register a value and have the managed attributes built automatically for the developer. This is accomplished by introducing a new MetaSettings class, whose purpose is to dynamically construct the new category classes and methods that become attributes of the settings instance. There is also a new settings.register() method that can be used to register new configuration values. Values registered with settings.register() are used with the same syntax as before, and the name of the module passed to settings.register() becomes an attribute of the settings instance. The settings.register() method also accepts meta-data parameters that allow developers to specify a title and description for the configuration value. In the current ECP configuration management implementation, this information must be specified in the TurboGears widget. With the new implementation, the managed attribute functionality found in the ECP core no longer needs to be duplicated, and there is a much more uniform interface.
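Registration could look roughly like this. The register() signature (module, name, default, title, description) is an assumption for illustration, and so is the use of type() to build category classes. This is not ECP's actual MetaSettings implementation, and the dict-backed storage is a stand-in.

```python
class Settings:
    """Sketch: category classes and managed attributes built at registration."""

    def __init__(self):
        self._values = {}      # stand-in for ECP's persistent storage
        self._categories = {}  # module name -> category instance
        self.metadata = {}     # (module, name) -> title/description for editors

    def register(self, module, name, default, title="", description=""):
        if module not in self._categories and hasattr(self, module):
            raise ValueError("%r clashes with a Settings attribute" % module)
        key = "%s.%s" % (module, name)
        values = self._values

        def getter(category):
            return values.get(key, default)

        def setter(category, value):
            values[key] = value

        category = self._categories.get(module)
        if category is None:
            # Build a new category class on the fly and attach an instance
            # of it to the settings object, e.g. settings.kvm.
            category_class = type(module.capitalize() + "Settings", (), {})
            category = self._categories[module] = category_class()
            setattr(self, module, category)
        # Properties only work on classes, so the managed attribute is
        # added to the category's class.
        setattr(type(category), name, property(getter, setter))
        self.metadata[(module, name)] = {"title": title, "description": description}


settings = Settings()
settings.register("kvm", "bridge", "br0", title="KVM bridge",
                  description="Network bridge used by KVM guests.")
print(settings.kvm.bridge)  # br0 (the registered default)
settings.kvm.bridge = "virbr0"
print(settings.kvm.bridge)  # virbr0
```

The title and description stored at registration are exactly the meta-data a configuration editor would need to build its own display widgets.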

With this new configuration registration functionality in place, there is now an opportunity for great improvements in the configuration editor. We could now potentially eliminate the coupling to TurboGears widgets and have each configuration widget generated automatically. Grouping by extension module is now also possible in the configuration editor.
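As a hedged illustration of generating widgets from registration meta-data instead of TurboGears widgets, the hypothetical render_field() below turns a title, a description, and a current value into a plain HTML text input. It uses only the standard library. The meta-data keys and the markup are assumptions, not ECP's configuration editor.

```python
from html import escape


def render_field(key, meta, value):
    """Render one configuration value as an HTML form row (sketch)."""
    field_id = key.replace(".", "-")
    return (
        '<label for="{id}">{title}</label>\n'
        '<input type="text" id="{id}" name="{key}" value="{value}">\n'
        '<p class="help">{description}</p>'
    ).format(
        id=escape(field_id),
        key=escape(key),
        title=escape(meta.get("title", key)),
        value=escape(str(value)),
        description=escape(meta.get("description", "")),
    )


meta = {"title": "KVM bridge", "description": "Network bridge used by KVM guests."}
print(render_field("kvm.bridge", meta, "virbr0"))
```

Grouping by extension module would then amount to rendering the registered values sorted by their module prefix.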

Friday, March 20, 2009

SQL engine dialect in SQLAlchemy

Object relational mapping technology is used in object-oriented systems to provide a layer between the objects in the system and the underlying relational database system. ORMs help reduce the time investment, and thus the cost, of development, because developers can still think in an object-oriented way while interacting with the database. Tables in the database map to classes in the code, and instances of these classes map to rows in those tables. This is an over-simplification of the problem, however. It isn't as straightforward as generating standardized SQL statements based on what an instance is doing in the code. Not all database systems use the same standard set of SQL. To further complicate matters, the various popular database systems don't even support the same features.

In any given language, there are different dialects of that language. For instance, if two people from opposite coasts of the same country were engaged in a conversation, there would be bound to be some ambiguity even though they are speaking the same language. This could arise from a number of factors, such as politics or cultural customs. The same effect happens in technology. The language is SQL, and while some database systems are very similar in some ways, they can be just different enough to require a huge amount of duplicated code in an application that must support multiple database systems. And these are systems that are supposed to speak the same language. SQLAlchemy addresses this issue by introducing the concept of a dialect. Every database system supported by SQLAlchemy defines a dialect. This isn't a dialect for the SQL language alone but rather for the entire database system, because SQLAlchemy does a lot more than merely execute SQL statements. The Dialect interface specifies what each dialect for a specific database must look like. For instance, a dialect can specify whether the database accepts Unicode SQL statements. SQLAlchemy also has other classes that handle communication with the database, such as preparing SQL statements and generating schemas, and the Dialect interface specifies these as well.

SQLAlchemy also defines a DefaultDialect class, which provides the Dialect interface. This class implements the methods and attributes specified by the Dialect interface that are common across all supported database systems. The attributes and methods that are specialized for a given database are simply overridden by that database's dialect implementation.

When an engine is created with create_engine(), SQLAlchemy uses the dialect named in the connection URL. For example, a URL beginning with mysql:// selects the MySQL dialect. Method calls made through the engine instance are then delegated to the dialect, which acts accordingly for the database system it represents.
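The delegation from engine to dialect can be illustrated with a small pure-Python sketch. The classes here (Dialect, DefaultDialect, SQLiteDialect, MySQLDialect, Engine) and the quote_identifier() and limit_clause() methods are hypothetical. They are far simpler than SQLAlchemy's real classes and do not mirror their signatures. In real SQLAlchemy, create_engine("sqlite://").dialect.name reports "sqlite".

```python
from abc import ABC, abstractmethod


class Dialect(ABC):
    """Interface every database-specific dialect must provide."""

    name = None

    @abstractmethod
    def quote_identifier(self, identifier):
        ...

    @abstractmethod
    def limit_clause(self, limit):
        ...


class DefaultDialect(Dialect):
    """Behavior shared by most databases; subclasses override the rest."""

    identifier_quote = '"'

    def quote_identifier(self, identifier):
        return "%s%s%s" % (self.identifier_quote, identifier, self.identifier_quote)

    def limit_clause(self, limit):
        return "LIMIT %d" % limit


class SQLiteDialect(DefaultDialect):
    name = "sqlite"


class MySQLDialect(DefaultDialect):
    name = "mysql"
    identifier_quote = "`"  # MySQL quotes identifiers with backticks


DIALECTS = {cls.name: cls for cls in (SQLiteDialect, MySQLDialect)}


class Engine:
    def __init__(self, url):
        # "mysql+pymysql://..." -> "mysql": the URL scheme picks the dialect.
        scheme = url.split("://", 1)[0].split("+", 1)[0]
        self.dialect = DIALECTS[scheme]()

    def select_all(self, table, limit):
        # The engine builds the statement by delegating to its dialect.
        d = self.dialect
        return "SELECT * FROM %s %s" % (d.quote_identifier(table), d.limit_clause(limit))


print(Engine("sqlite:///app.db").select_all("users", 10))     # SELECT * FROM "users" LIMIT 10
print(Engine("mysql://localhost/app").select_all("users", 10))  # SELECT * FROM `users` LIMIT 10
```

Supporting another database means writing one more DefaultDialect subclass. The Engine code does not change.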

The Dialect interface and the implemented dialect classes in SQLAlchemy serve as a good example of polymorphism in object-oriented programming. There is a single interface that is used to invoke behavior, and that behavior varies by type. This is a resilient design because it is loosely coupled and highly replaceable. We can implement our own database dialect and plug it into SQLAlchemy if we are so inclined.

Thursday, March 19, 2009

The need for a REST interface in object-oriented applications

REST is a set of design criteria used for designing web-centric architectures. Much of the HTTP protocol incorporates ideas found in REST, such as connectedness, resources, and a uniform interface. This uniform interface consists of methods that operate on resources, such as GET, POST, PUT, and DELETE. These are the most common methods employed by RESTful applications. The idea of resources states that each resource within a system is uniquely addressable, and this is also part of the uniform interface found in RESTful designs. Many web clients other than the web browser use SOAP as their messaging framework. However, SOAP is not as flexible as a RESTful design, and yet many clients and client libraries exist for SOAP services, in several languages. There are also RESTful clients and client libraries, although not nearly as many. By the very nature of a RESTful design, objects in an object-oriented system map well to resources in a RESTful architecture. Perhaps developers should keep this in mind and have classes provide a RESTful interface.

What would a RESTful object-oriented interface look like? That is, what would its methods and attributes be? The first step in implementing a REST interface would be to define methods that map to the HTTP methods. Consider the following example.

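# Example: a RESTful Python interface.

class REST(object):
    """Interface mapping the HTTP methods onto Python methods.

    A class provides this interface by inheriting from REST and
    overriding every method.
    """

    def GET(self):
        raise NotImplementedError("GET()")

    def POST(self):
        raise NotImplementedError("POST()")

    def PUT(self):
        raise NotImplementedError("PUT()")

    def DELETE(self):
        raise NotImplementedError("DELETE()")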

Here, we have a Python class called REST. This class defines the GET(), POST(), PUT(), and DELETE() HTTP methods. Each method, when invoked, raises a NotImplementedError exception because this class is meant to be an interface. For a class to provide this interface, it would inherit from this class and redefine all the methods, providing an implementation. What about attributes? If an instance of REST were to act as a proxy to some RESTful resource, it would need to know its URI, so uri would be a good candidate for an interface attribute. There are many other meta-data attributes associated with the HTTP protocol that we aren't concerned with here. What we want to highlight is the REST interface developers could potentially use when designing objects. On the topic of attributes, another question springs to mind: what about resource attributes? If all we know about a particular resource is the methods it supports and its URI, how can we represent the resource in the context of an object-oriented system? This would most likely be another interface, used in conjunction with the REST interface to interpret the representation of the remote resource. An alternative is to use WADL to define what resources should look like. However, WADL is too much like SOAP, and the rigidity involved defeats the purpose of a RESTful architecture.

The REST interface discussed so far is really only useful as a proxy to a remote resource. That is, the object we are designing that provides this interface would use this interface to make an HTTP request to the HTTP server providing the resource. An analog would be the web browser application providing the REST interface and invoking the GET() method to retrieve a web page.
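A proxy of this kind can be sketched with the standard library's urllib.request. RESTProxy is a hypothetical name. It implements the same four methods as the REST interface, but POST() and PUT() take a body and send it as JSON, which is an assumption for illustration. Calling GET() and the other methods performs a real HTTP request. The example only builds the request.

```python
import json
import urllib.request


class RESTProxy:
    """Client-side object whose methods map onto HTTP methods (sketch)."""

    def __init__(self, uri):
        self.uri = uri

    def _request(self, method, body=None):
        data = None
        headers = {}
        if body is not None:
            data = json.dumps(body).encode("utf-8")
            headers["Content-Type"] = "application/json"
        return urllib.request.Request(self.uri, data=data, headers=headers, method=method)

    def _send(self, method, body=None):
        # Performs the network call and returns (status, raw body bytes).
        with urllib.request.urlopen(self._request(method, body)) as response:
            return response.status, response.read()

    def GET(self):
        return self._send("GET")

    def POST(self, body):
        return self._send("POST", body)

    def PUT(self, body):
        return self._send("PUT", body)

    def DELETE(self):
        return self._send("DELETE")


proxy = RESTProxy("http://example.com/users/42")  # hypothetical resource URI
request = proxy._request("PUT", {"name": "alice"})
print(request.get_method(), request.full_url)  # PUT http://example.com/users/42
```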

The "REST" interface could also be the resource itself. If the developer is designing an object-oriented HTTP web application, they could design objects within that system, exposed to the web, that provide the REST interface. The method information is always encoded in the HTTP request; otherwise it wouldn't be HTTP. If the base HTTP server forwards the request to an object that provides this interface, that object will always know what to do with it. This same object can also act as a proxy to another resource, and so on, forming a chain of RESTful resources.
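On the server side, the HTTP method name in the request can be used directly to pick the method to call. The sketch below uses a variant of the REST interface in which POST() and PUT() accept a request body. That variant, along with the UserResource class and the dispatch() function, is hypothetical. A method the resource does not implement maps to 405 Method Not Allowed.

```python
ALLOWED_METHODS = ("GET", "POST", "PUT", "DELETE")


class REST:
    """Variant of the REST interface whose POST/PUT accept a body."""

    def GET(self):
        raise NotImplementedError("GET()")

    def POST(self, body):
        raise NotImplementedError("POST()")

    def PUT(self, body):
        raise NotImplementedError("PUT()")

    def DELETE(self):
        raise NotImplementedError("DELETE()")


class UserResource(REST):
    """A hypothetical resource exposed at /users/<name>."""

    def __init__(self, uri, name):
        self.uri = uri
        self.name = name

    def GET(self):
        return 200, {"uri": self.uri, "name": self.name}

    def PUT(self, body):
        self.name = body["name"]
        return 200, {"uri": self.uri, "name": self.name}


def dispatch(resource, method, body=None):
    """Route an HTTP method name to the matching method on the resource."""
    method = method.upper()
    if method not in ALLOWED_METHODS:
        return 405, None
    handler = getattr(resource, method)
    try:
        return handler(body) if method in ("POST", "PUT") else handler()
    except NotImplementedError:
        # The resource inherited the interface but didn't implement this method.
        return 405, None


user = UserResource("/users/alice", "alice")
print(dispatch(user, "get"))
print(dispatch(user, "PUT", {"name": "alicia"}))
print(dispatch(user, "DELETE"))  # (405, None)
```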

However, as with all distributed computing, this chain of resources poses a design challenge. How does the system manage new resource locations? If the system is to scale at all, it will need to. That problem comes further down the road, though. Right now, the problem is implementing REST at the design level in object-oriented systems. With a RESTful interface in place, these later problems would be much easier to solve.

Wednesday, March 18, 2009

3D data and 3D UML

An interesting and experimental idea: the ability to design three-dimensional models using UML. This idea is shared among some visionary coders who have already demonstrated the ability to model UML in a 3D model space. Diagram elements can be layered, placed, and rotated in a three-dimensional manner. Glasshouse is another cutting-edge GUI, for visualizing relational data sets. Users can use SQL or spreadsheet data as input, and Glasshouse will present the user as an avatar within a 3D environment. The avatar then acts as the user within the data environment, allowing the user to manipulate the data in an interactive way never seen before. The remaining question is, what is wrong with the standard 2D visualization of data and UML models today?

The answer is that there is nothing inherently wrong with viewing data in two dimensions, and the same holds true for UML models. Before the graphical user interface common on most desktops today, there was the command line, and there is also nothing inherently wrong with the command line. However, the GUI was invented for a reason: so that human users can quickly comprehend what is displayed in front of them. Trying to grasp a relational data set displayed in the console is possible, although it would most likely take a seasoned professional two weeks to understand it fully. If that same data set is presented graphically, many more features become available, such as moving windows around. With a GUI, it might take that same professional a day or less to fully understand the data. Now imagine trying to display, edit, and understand a modest UML diagram in the console. For humans understanding data, the GUI was a big first step, and understanding UML models followed shortly. The next step is another dimension.

The Glasshouse project is a good example of the direction a first 3D data manipulation interface might take. It will most likely make insights about data sets possible that were not possible before. Users gain much more freedom in the perspective from which they view the data. Will the UML be able to follow this direction? Some tools have already started by making individual diagrams rotatable and stackable within the modeling space. This is where the 3D functionality in the UML ends. There is currently no tool that offers 3D UML elements such as classes, objects, or interactions. Could avatars be used as actors in use cases? For instance, when simulating a use case realization, the avatar (the actor) could actually move about in the collaboration among the other 3D UML elements. An instance of some class could expand and contract in three dimensions according to how many resources it is using.

These are some incredibly complex design challenges to implement. Building a two-dimensional UML modeling tool is by no means trivial. Building three-dimensional interfaces is also not trivial. Combining the two could take decades just to get a functional demo working. Is something like this worth the effort? Would this new 3D UML modeling interface produce better software, faster? In the end, this amounts to a tough decision because of the risk involved. But that hasn't stopped other ingenious software projects from being built in the past.

Evaluating file-monitoring techniques in Python

There is a general need to monitor changes made to files in any computer system. Why? The short answer is that when a file has changed, the state of the system has also changed, and there are going to be reactions to that change in state. The events that take place in response to file changes usually happen at a very low system level. At the application level, there is also a need to monitor the system state or sub-states such as files. For instance, if we are working with a web development framework such as TurboGears and we want the development HTTP server to reload every time a source code file changes, those files need to be monitored. Once a change has been detected by the monitoring process, the process can then reload the HTTP server. Files can also be used to communicate between different processes within a system. One process, or many processes, can monitor a given file and react accordingly when the state of that file changes.

There are two approaches I'm going to evaluate here. The first is a generic method, which is blocking. The second is the CherryPy method, which is non-blocking. Although the two methods differ at a higher level, they are the same in that they determine whether a file has changed state based on its modification date.

The first of the two methods is a blocking method. This method will block the flow of control within an application. This means that any code that comes after this logic will not be executed until the file monitoring code is complete. The reason this method blocks to begin with is that it involves a loop, and breaking out of that loop is the only way to terminate the file monitoring logic. The developer using this method can specify an interval at which the logic will test the specified files for changes. It will check the modification date of each file and compare it to the last modification date recorded by this method. If the date is later than the last recorded date, the file has changed. Obviously, the more files being monitored, the more overhead involved, so care must be taken not to overload the system by monitoring too many files. This method of file monitoring is generic and can be used in many contexts with little modification, since it wasn't designed with any particular application domain in mind.
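A minimal sketch of the blocking approach follows, written in Python 3. The names check_for_changes() and watch_blocking() are hypothetical and not taken from any library. The helper compares each file's current modification time, read with os.path.getmtime(), against the last one recorded; the outer loop never returns on its own, which is exactly what makes the approach blocking.

```python
# Sketch; a blocking file monitor (hypothetical names).
import os
import time


def check_for_changes(last_seen):
    """Return the paths whose modification time moved past the recorded one.

    ``last_seen`` maps each path to the last modification time we saw; it is
    updated in place so each change is reported only once.
    """
    changed = []
    for path, previous in list(last_seen.items()):
        current = os.path.getmtime(path)
        if current > previous:
            last_seen[path] = current
            changed.append(path)
    return changed


def watch_blocking(paths, interval=1.0, on_change=print):
    """Poll ``paths`` every ``interval`` seconds. Never returns on its own."""
    last_seen = {path: os.path.getmtime(path) for path in paths}
    while True:
        time.sleep(interval)
        for path in check_for_changes(last_seen):
            on_change(path)
```

Calling watch_blocking(["app.py"], interval=2) would print the path each time the file's modification time moves forward, and no code after that call would run until the loop is interrupted.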

The non-blocking method is based on the CherryPy Python web application framework, which uses it to monitor changes made to Python modules within a given CherryPy application. Once a module state change has been detected, it is an indication that the HTTP server should be restarted to reflect those changes in the running application. The monitoring logic is run periodically, at a specified interval, in a separate thread of control. This means that if the server is in the middle of processing a request, that request is not blocked by the monitoring logic. The method used to determine whether the file has changed is the same as in the blocking method: the modification date is compared to the previous modification date. This method of monitoring for file state changes on a file system is a very elegant solution. Its main downfall is that it is context-specific; it was designed with CherryPy in mind. However, it is not so tightly coupled to CherryPy that it could not be used somewhere else. Some minor changes would do the trick.
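For comparison, here is a rough Python 3 sketch of the non-blocking idea: the same modification-time check, moved into a background thread. This is not CherryPy's code; the BackgroundMonitor class, its interval default, and the on_change callback are assumptions made for illustration. threading.Event.wait() serves as an interruptible sleep, so stop() can end the thread between checks.

```python
# Sketch; a non-blocking monitor in the spirit of CherryPy's (not its code).
import os
import threading


class BackgroundMonitor(threading.Thread):
    """Check files for newer modification times from a daemon thread."""

    def __init__(self, paths, on_change, interval=1.0):
        super().__init__(daemon=True)
        self.interval = interval
        self.on_change = on_change
        self._stop_event = threading.Event()
        # Record starting modification times before the thread begins.
        self._last_seen = {path: os.path.getmtime(path) for path in paths}

    def run(self):
        # Event.wait() returns False on timeout, True once stop() is called.
        while not self._stop_event.wait(self.interval):
            for path, previous in list(self._last_seen.items()):
                current = os.path.getmtime(path)
                if current > previous:
                    self._last_seen[path] = current
                    self.on_change(path)

    def stop(self):
        """Ask the thread to finish and wait for it."""
        self._stop_event.set()
        self.join()
```

After monitor.start(), the calling thread continues immediately, which is what lets a server keep handling requests while the checks run.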

Lastly, if you need to monitor file system state changes, which method do you want to use? That is, which method is best suited for your application? There are a couple of factors to consider. First, if your application can only support a single process, the blocking method is out of the question. However, this is rarely the case. The application could simply spawn a new file monitoring process. This could introduce a new problem, though, because the file monitoring process would need to communicate with the main application process. Having done this, you will have introduced a new communication channel into your application, increasing its complexity considerably. The CherryPy, non-blocking, file monitoring approach could prove to be the better approach if the application you are developing needs to react to file system state changes as well as to changes in state from other resources. The challenge here is that it is not nearly as generic as the blocking method and would require a larger investment of development time. Do some investigation into which state changes your application must respond to. If only file system state changes, the blocking method may suffice. In most other cases, the non-blocking approach may be better suited.

Tuesday, March 17, 2009

ECP three-level machine abstraction

The Enomaly Elastic Computing Platform is a platform for managing distributed virtual machines. Therefore, we need some type of abstract representation of this concept. This requirement isn't really any different from any other software problem. There is a problem domain which contains concepts unique to that domain. Developers will then try to capture what each concept represents in that domain by creating an abstraction. By creating abstractions in this manner, we lower the representational gap between the domain and how concepts in that domain are realized in the solution. In the case of ECP, there is a real need to represent the idea of machines.

In any given software solution, the abstraction created by developers may be a very simple, single-layer architecture, or there could potentially be several layers within the architecture, yielding an extremely complex architecture. In the latter case, without a well-thought-out design, we start to lose the value that creating an abstraction brings in the first place. Sometimes, when dealing with a large abstraction, further dividing it into layers can help you as a developer better understand what you are actually implementing. Often, the abstraction design is further complicated by constraints imposed by the system or framework within which we are developing. Rationale, interfaces, and consistency in general need to be taken into consideration when constructing a layered abstraction architecture.

To implement the machine abstraction, ECP uses a three-level approach. In this architecture, each level is a class that realizes a different level of the "machine" concept, each for a different purpose. In this implementation, the three levels are hierarchical. At the top level, we have a class called ActualMachine which implements several methods for invoking machine behavior. The next level contains a class called DummyMachine that inherits from ActualMachine and doesn't do much. Finally, we have a Machine class that can store persistent data to the database. Hierarchically, the DummyMachine and Machine classes are at the same level since they both inherit from ActualMachine. In this discussion, however, the levels aren't necessarily based on the class hierarchy but rather on the rationale behind each class.

The ActualMachine class is meant to most closely represent the concept of "machine" in the context of ECP. The name signals that this is the underlying machine, not a Python object. Obviously, instances of ActualMachine are Python objects, but when using these objects, we are more interested in the underlying technology they represent. This class is where all the behavior for the machine concept is defined. This class doesn't define any data attributes.

The DummyMachine class is exactly what the name implies: a dummy. The class simply defines a constructor that allows attributes to be set. Also, the class inherits all the behavior from ActualMachine. Instances of DummyMachine can set attributes in the constructor and invoke behavior provided by ActualMachine.

The Machine class provides persistence for the machine abstraction in ECP. The class also inherits behavior from the ActualMachine class. Machine functions similarly to DummyMachine in that they both provide the same behavior. The difference between the two is that DummyMachine stores attributes in memory while Machine stores attributes in the database.

The rationale behind this architecture is that we want to be able to instantiate machine instances without affecting the database. The opposite is also true; we need to be able to instantiate machines that will have an immediate effect on the database. Within the context of the ECP RESTful API, machines that are not stored on the local machine (they are retrieved from another ECP host) will need to be instantiated. That is, we want to have an abstraction available to use once the remote machine data has arrived. This can be done by using some primitive data construct such as a list or a dictionary, but by doing this we lose the machine concept. The behavioral aspect of the machine concept is gone because you can't tell a dictionary to shut down.
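To make the three levels concrete, here is a small Python 3 sketch. It is not ECP's code: the shutdown() method, the FAKE_DATABASE dictionary standing in for the real database, and the attribute names are all hypothetical. Behavior lives in ActualMachine, DummyMachine holds its attributes in memory (for example, data received from a remote ECP host), and Machine reads its attributes back from the storage layer.

```python
# Sketch; the three-level machine abstraction (hypothetical, not ECP's code).

FAKE_DATABASE = {}  # stand-in for the real database, keyed by machine uuid


class ActualMachine:
    """Behavior only: defines what a machine can do, but no data attributes."""

    def shutdown(self):
        # A real implementation would talk to the hypervisor here.
        return "shutting down %s" % self.name


class DummyMachine(ActualMachine):
    """Attributes live in memory only; nothing touches the database."""

    def __init__(self, **attributes):
        for key, value in attributes.items():
            setattr(self, key, value)


class Machine(ActualMachine):
    """Same behavior, but attributes are stored in (and read from) the database."""

    def __init__(self, uuid, **attributes):
        self.uuid = uuid
        FAKE_DATABASE[uuid] = dict(attributes)

    def __getattr__(self, key):
        # Only called when normal lookup fails; fall back to the stored record.
        record = FAKE_DATABASE.get(self.__dict__.get("uuid"), {})
        if key in record:
            return record[key]
        raise AttributeError(key)


# Data that arrived from another ECP host becomes an object with behavior.
remote = DummyMachine(name="web01", memory=512)
print(remote.shutdown())

local = Machine("abc-123", name="db01")
print(local.shutdown())
print(FAKE_DATABASE)
```

Both kinds of instance answer shutdown(), but only creating a Machine touches FAKE_DATABASE, which is the trade-off the three levels are meant to capture.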

There are still several limitations to this approach. For instance, not all ActualMachine behavior will be supported by the DummyMachine instances that are created. This is simply a limitation of the three classes and their inter-relationships. It is still an improvement over representing domain concepts using primitive types. The three-level architecture gives us more control over what happens when requested behavior cannot be fulfilled. The DummyMachine layer is an example of mixing the problem domain with the solution domain. The class came into existence because the solution demanded it. But this design still allows machine instances to behave like "machines" without conforming too much to the solution constraints.

A similar approach is taken in ECP with other abstractions such as packages. The architecture hasn't been fully implemented for every abstraction within the platform. Hopefully it will strike some balance between constraints and offered functionality.

I'm sure this approach will prove useful in many other application areas. As objects become more and more distributed, we'll need a better way to represent their data when used locally while preserving the behavior of those objects.

Monday, March 16, 2009

Initializing the CherryPy server object

The CherryPy Python web application framework contains several abstractions related to an HTTP server. One of these is the Server class defined in _cpserver.py. In fact, the top-level server instance of the CherryPy package is an instance of this class. The Server class inherits from the ServerAdapter class, which is defined in servers.py. Interestingly, ServerAdapter serves as a wrapper for other HTTP server classes. The constructor of ServerAdapter accepts both a bus and an httpserver parameter. The bus parameter is a CherryPy website process bus as described here. The httpserver parameter can be any instance that implements the CherryPy server interface. Although this interface isn't formally defined, it can be easily inferred by looking at the CherryPy code.

So we have a Server class that inherits from ServerAdapter. The idea here is that many server instances may be started on a single CherryPy website process bus. One question I thought to ask was "if Server inherits from ServerAdapter and ServerAdapter is expecting an httpserver attribute, how do we specify this if the ServerAdapter constructor isn't invoked with this parameter?" In other words, the Server class doesn't give ServerAdapter a parameter that it needs.

It turns out that the start() method is overridden by Server. This method ensures that an httpserver object exists by invoking Server.httpserver_from_self(). Developers can even specify an httpserver object after the Server instance has been created by setting the Server.instance attribute to the server object they would like to use. This is the first attribute checked by the Server.httpserver_from_self() method. The Server.instance attribute defaults to None, and if this is still the case when Server.httpserver_from_self() is invoked, it will simply create and return a new CPWSGIServer instance to use.

Now we can feel safe, knowing that there will always be an object available for ServerAdapter to manipulate. We left off at the Server.start() method creating the httpserver attribute. Once this is accomplished, the ServerAdapter.start() method is invoked, because it is now safe to do so. One observation about the implementation: I'm not convinced that calling ServerAdapter.start() with self passed explicitly is the best design. This is the only way that Server instances can invoke behaviour on the ServerAdapter instance, even though in theory it is the same instance by inheritance. At the same time, we wouldn't be able to override the method and then call the inherited method if we were to call ServerAdapter.__init__() from Server.__init__(). The alternative would be to have unique method names between the two classes. Then again, this might end up taking away from the design quality of ServerAdapter. So the question is which class is more important in terms of design. Just something to think about; the current implementation is by no means defective. CherryPy is probably one of the more stable Python packages in existence.
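The pattern described above can be reduced to a few lines. The following Python 3 sketch uses simplified stand-ins for Server and ServerAdapter rather than CherryPy's real classes; DefaultHTTPServer is a hypothetical placeholder for CPWSGIServer, and the return strings exist only to make the flow visible. Server.start() fills in httpserver lazily and then calls the inherited method explicitly with self.

```python
# Sketch; simplified stand-ins for CherryPy's Server and ServerAdapter.


class ServerAdapter:
    """Wraps some HTTP server object and starts it on a bus."""

    def __init__(self, bus, httpserver=None):
        self.bus = bus
        self.httpserver = httpserver

    def start(self):
        # The adapter assumes httpserver has been supplied by now.
        if self.httpserver is None:
            raise RuntimeError("no httpserver to start")
        return "started %s" % self.httpserver


class DefaultHTTPServer:
    """Hypothetical placeholder for the default server (CPWSGIServer in CherryPy)."""

    def __str__(self):
        return "DefaultHTTPServer"


class Server(ServerAdapter):
    def __init__(self, bus="bus"):
        # No httpserver is handed to the adapter at construction time.
        self.bus = bus
        self.httpserver = None
        self.instance = None  # may be set to a server object before start()

    def httpserver_from_self(self):
        # Prefer a user-supplied instance; otherwise build a default one.
        if self.instance is not None:
            return self.instance
        return DefaultHTTPServer()

    def start(self):
        self.httpserver = self.httpserver_from_self()
        # Call the inherited method explicitly, passing self.
        return ServerAdapter.start(self)
```

Inside Server.start(), super().start() would reach the same inherited method; the explicit ServerAdapter.start(self) form simply names the base class directly.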

Saturday, March 14, 2009

Using predicates with the boduch library

With the latest release of the boduch Python library, there are two new predicate classes available: Greater and Lesser. These predicates do exactly what their names say. The Greater predicate will evaluate to true if the first operand is greater than the second. The Lesser predicate will evaluate to true if the first operand is less than the second. Here is an example of how we would use these predicates.

#Example; boduch predicates

from boduch.predicate import Greater, Lesser

if __name__=="__main__":
    is_greater=Greater(2,1)
    is_lesser=Lesser(1,2)

    if is_greater:
        print("is_greater is true.")
    else:
        print("is_greater is false.")
    if is_lesser:
        print("is_lesser is true.")
    else:
        print("is_lesser is false.")

Here, we have two predicate instances, is_greater and is_lesser. The is_greater variable is an instance of the Greater predicate and will evaluate to true in this case. The is_lesser variable is an instance of the Lesser predicate and will evaluate to true in this case.

With the latest release of the library, predicate instances can also accept function objects as parameters. For example, consider the following modified example.

#Example; boduch predicates

from boduch.predicate import Greater, Lesser

number1=0
number2=0

def op1():
    #Reading a module-level name needs no global statement.
    return number1

def op2():
    return number2

def results():
    if is_greater:
        print("is_greater is true.")
    else:
        print("is_greater is false.")
    if is_lesser:
        print("is_lesser is true.")
    else:
        print("is_lesser is false.")

if __name__=="__main__":
    #Construct predicate instances using function objects as operands.
    is_greater=Greater(op1,op2)
    is_lesser=Lesser(op1,op2)

    #Change the value of the operands.
    number1=2
    number2=1

    #Print results.
    results()

    #Change the value of the operands.
    number1=1
    number2=2

    #Print results.
    results()

Here, we now have two variables, number1 and number2, that will act as operands. Next, we have two functions, op1() and op2(), that return these values. The results() function simply prints the result of evaluating the predicates. In the main program, we construct the two predicate instances, passing the op1() and op2() functions as operand parameters. Next, we set the number1 and number2 variables and print the result of evaluating the predicates. Finally, we change the values of number1 and number2 and once more print the results. You'll notice that the results reflect the change in number1 and number2.
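For readers curious how a predicate might accept both plain values and function objects, here is a hypothetical Python 3 sketch. It is not the boduch implementation; the Predicate base class and its _resolve() helper are assumptions. The idea is that callable operands are invoked each time the predicate is tested for truth, so the outcome follows the current values of the underlying variables.

```python
# Sketch; predicates with plain or callable operands (not boduch's code).
import operator


class Predicate:
    """Compare two operands lazily, each time the predicate is tested."""

    compare = None  # subclasses supply a two-argument comparison

    def __init__(self, op1, op2):
        self.op1 = op1
        self.op2 = op2

    @staticmethod
    def _resolve(operand):
        # Function objects are called now, so they report current values.
        return operand() if callable(operand) else operand

    def __bool__(self):
        return bool(self.compare(self._resolve(self.op1), self._resolve(self.op2)))


class Greater(Predicate):
    compare = staticmethod(operator.gt)


class Lesser(Predicate):
    compare = staticmethod(operator.lt)


values = {"a": 0}
is_greater = Greater(lambda: values["a"], 1)
print(bool(is_greater))
values["a"] = 5
print(bool(is_greater))
```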

Friday, March 13, 2009

The need for a simplified pypi package.

Given the growing complexity of many Python applications these days, developers often use other packages and libraries to help manage this complexity. TurboGears, for instance, will fetch several other packages from PyPi when it is installed and install these packages as well. PyPi provides access to packages that the Python community has provided, possibly because they feel it will be useful in a different context.

However, the PyPi code itself isn't exactly a simple Python package used to host egg files. It is a full-featured, hosted solution. The setuptools package can fetch eggs listed on a simple HTML page. In this case, you wouldn't even need anything other than Apache. However, what would be nice is a middle ground: a Python package that uses CherryPy or some other web framework to host the actual packages and provides a very simple management interface. I think something like this would be very valuable for projects that are limited by having to retrieve dependencies from PyPi and need their own repository.
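As a first step toward that middle ground, the "simple HTML page" part is easy to generate. The Python 3 sketch below builds a bare page of links to the package files in a directory; the function name, the accepted extensions, and the base_url parameter are assumptions. A page like this is the kind of thing easy_install's --find-links (-f) option can be pointed at, and a small CherryPy front end could regenerate it whenever a package is uploaded.

```python
# Sketch; build a bare "find links" page for a directory of packages.
import html
import os

PACKAGE_EXTENSIONS = (".egg", ".tar.gz", ".zip")  # assumed set of file types


def build_find_links_page(package_dir, base_url=""):
    """Return HTML with one link per package file found in package_dir."""
    names = sorted(
        name for name in os.listdir(package_dir) if name.endswith(PACKAGE_EXTENSIONS)
    )
    links = [
        '<a href="%s">%s</a><br/>'
        % (html.escape(base_url + name, quote=True), html.escape(name))
        for name in names
    ]
    return "<html><body>\n%s\n</body></html>\n" % "\n".join(links)
```

Serving the directory and this page from Apache already covers the bare-bones case; the management interface is the part a small web application would add.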

I'm not too sure how difficult this would actually be to implement; I'm only thinking of the need for such a solution at the moment. Perhaps I'll do some experimentation and write about what I find.

ECP and the future extension module architecture

Developers who have been using the Enomaly Elastic Computing Platform over the past year, myself included, have encountered some bottlenecks in the ECP extension module architecture. These aren't show-stoppers in every case, but sometimes they are. It is mostly an issue of architectural design, such as "what is the rationale behind this API method?" and "if I build it this way, what impact will the change have on other areas?" Once we took a step back to think about such questions, we came to the conclusion that the answers should be apparent to any developer using the platform, or at least easily provided by Enomaly. Right now, it isn't apparent how to use much of the ECP extension module framework, and we even have a hard time explaining it.

So, this has led the ECP development team to address some of the issues highlighted here.

One of the first major problems is a lack of uniformity and consistency among the core extension modules that ship with ECP. These extension modules aren't exactly consistent with one another. Some modules will use sections of the API as intended while others will use different sections, and still others don't use the API at all. In fact, some extension module logic in ECP is coded directly in the core system. This doesn't necessarily cause any harm to anyone who wants to install the base system, because these modules are "part" of the base system; they are simply constructed as extension modules. Earlier on in ECP's lifespan, we needed to construct an extension module API, and these core modules were the perfect way to test out our ideas. So, either way, these core extension modules could have been built into the core code base. But it would be nice if they weren't so irreplaceable.

The first step is to introduce a new level of consistency among the extension modules that are distributed with ECP. If nothing else, they can serve as useful examples for extension module developers.

The next major defective area within the ECP extension module API is the API itself, or rather, the lack thereof. What I mean here is that there are plenty of smaller areas of ECP that should be extensible but aren't. For example, if there is something small in the ECP front-end GUI that a developer wants to extend, they must replace the entire template, duplicating many already existing elements. This means that many of the core elements, including GUI widgets, would need to become part of a bigger extension module framework. That is, we would need to move them to extension modules. And that is fine with me. A smaller core is easier to maintain and thus more stable.

There have already been some changes introduced to the ECP extension module framework in the past year. We identified the need for extension modules to store their own static data such as javascript and CSS files. To address this, we introduced methods to register these static components.
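Registering static components could look something like the following Python 3 sketch. The StaticRegistry class, its register_static(), url_for(), and resolve() methods, and the /static/<module>/ URL layout are hypothetical and are not ECP's actual API. Giving each extension module its own URL prefix keeps JavaScript and CSS files from different modules from colliding.

```python
# Sketch; registering per-module static files (hypothetical API, not ECP's).
import os


class StaticRegistry:
    """Map extension module names to the directories holding their static files."""

    def __init__(self):
        self._dirs = {}

    def register_static(self, module_name, directory):
        if module_name in self._dirs:
            raise ValueError("static files already registered for %r" % module_name)
        self._dirs[module_name] = directory

    def url_for(self, module_name, filename):
        # Each module gets its own prefix, so identical file names can't collide.
        if module_name not in self._dirs:
            raise KeyError(module_name)
        return "/static/%s/%s" % (module_name, filename)

    def resolve(self, url):
        """Turn /static/<module>/<file> back into a filesystem path, or None."""
        prefix, _, rest = url.partition("/static/")
        module_name, _, filename = rest.partition("/")
        if prefix or module_name not in self._dirs or not filename:
            return None
        if ".." in filename.split("/"):
            return None  # refuse paths that try to climb out of the directory
        return os.path.join(self._dirs[module_name], filename)
```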

There is still a decent amount of work to do in order to realize these changes. They have been identified and that is a very good thing. Also, this is by no means a closed list of issues with the extension module framework that need fixing. This is a good starting point. I've already begun fixing the consistency problem with the core extension modules.

How Pylons connects to the ORM

The Pylons Python web application framework manages database connections for any given application written in the framework. For SQLObject, it does so by using a PackageHub class. This is actually similar to how TurboGears manages database connections. The database functionality for the Pylons web framework is defined in the database.py module. Things are slightly different for SQLAlchemy support in Pylons. In order to provide SQLAlchemy support, Pylons will attempt to define several functions SQLAlchemy requires to connect to the database.

Something I find a little strange is the way the support for both SQLAlchemy and SQLObject is handled by Pylons. Pylons will attempt to import SQLAlchemy and handle the import error. However, Pylons will always attempt to import SQLObject and will not handle an import failure if the library isn't installed on the system. For instance, the following is a high level view of how the database ORM libraries are imported in Pylons.

There is a slight asymmetry here. At the very least, I think SQLObject import errors should be handled as well. But what would happen in the event that there are no supported ORM libraries available for import? That would obviously leave us with a non-functional web application. A nice feature to have, and this really isn't Pylons-specific, is the ability to specify, through configuration, which ORM library is preferred. The database.py module could then base the imports on this preference. For instance, illustrated below is how the ORM importing mechanism in Pylons might work if the ORM preference could be specified.

Here, the flow is quite simple. We load the configuration data, check which ORM was specified, and attempt to import it. On import failure, we complain about an ORM not being available. Of course, we will most likely want a default ORM preference if one is not provided. I think that would provide a much cleaner design than basing the ORM preference on what can be imported. For certain enhancement functionality, such as a plugin for a system, we can base availability on whether the library can be imported. But these are only enhancements. We can't really make these assumptions about a core component like a database connection manager.
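The configuration-driven flow can be sketched in a few lines of Python 3 with importlib. This is not Pylons code: the "orm" configuration key, the ORM_MODULES mapping, and the sqlalchemy default are assumptions. The preference decides what gets imported, and a missing library produces a clear error instead of a silent fallback.

```python
# Sketch; choose the ORM from configuration instead of from what imports.
import importlib

ORM_MODULES = {"sqlalchemy": "sqlalchemy", "sqlobject": "sqlobject"}
DEFAULT_ORM = "sqlalchemy"  # assumed default when no preference is given


def load_orm(config, modules=ORM_MODULES, default=DEFAULT_ORM):
    """Import and return the ORM module named by config["orm"]."""
    preference = config.get("orm", default)
    if preference not in modules:
        raise ValueError("unknown ORM preference: %r" % preference)
    try:
        return importlib.import_module(modules[preference])
    except ImportError as exc:
        # Complain loudly rather than silently trying another library.
        raise RuntimeError("configured ORM %r is not available" % preference) from exc
```

In a real application the config would come from the application's configuration file; a plain dictionary keeps the sketch self-contained.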

The SQLAlchemy connection functionality in Pylons has no notion of a connection hub. There is absolutely no problem with this. The functions are what is needed to establish a connection to SQLAlchemy, and they work. For illustration purposes, let's make up a class that doesn't exist, called SQLAlchemyHub, that groups all SQLAlchemy-related functions together. The following is what we would end up with when visualizing the database.py module.

It is important to remember that the SQLAlchemyHub class isn't real. It is there to help us visualize the SQLAlchemy abstraction within the context of the module.

Thursday, March 12, 2009

Colour as a visual cue in UML diagrams

I've been using ArgoUML quite often lately to produce some UML diagrams. One thing I've been experimenting with that I haven't done before is using colour as a visual cue within my diagrams. ArgoUML is one of the open source modeling tools I use that supports colour in the element canvas.

For instance, consider the following class element.

This is just a simplistic BlogEntry class. We're not so much concerned with the actual class definition here as we are with the presentation. The image above uses the default colour scheme in ArgoUML. What if, for some reason, I wanted to change the colour of this class to green? This is easy in ArgoUML. Simply select the class element and click the Presentation tab. You'll notice there are both fill and line colour fields for element colours. Generally, you will not want to touch the line colour. The line colour defaults to black, which means you should stick to light fill colours. In our case, we want to change the fill colour to green; afterward, we end up with the following.

As you can see, this presentation of the class stands out much better than the first. I'm not sure if that is because of the colour green, or simply because it contrasts with the surrounding colour, white. One feature I have noticed is missing from the ArgoUML presentation repertoire is the ability to change the colour of class attributes and operations. For instance, if I select an attribute or operation, the Presentation tab becomes disabled. I think this would really come in handy when discussing a particular aspect of a class. Even better would be an API documentation generator able to use this type of image generation for specific method documentation pages. However, this is a long way off, I think.

I've only shown a single element and how a simple change in colour can enhance the visual aspect of the idea you are trying to model. Things start to get more interesting when you start to add a visual colour distinction between element types. For instance, consider the following where I place my BlogEntry class into a package.

There are two elements here, a class and a package. Although the UML defines different notation for different element types, the colour distinction makes for easier reading.

One final example. If we have a new element within the package and we want to emphasize one element more than the other, our best bet is to do so with colour.

Here, we only have two elements within the package. The fact that they are coloured differently makes it clear that they are different ideas. However, be careful with elements that contain several sub-elements. You are creating a model not a rainbow and rainbows will only make understanding more difficult when it comes to software.

More open source migration

According to this entry, the French police force is in the middle of migrating their entire desktop infrastructure from Windows to Ubuntu and the move has already saved them millions of euros. A fascinating conclusion they came to after adopting the migration strategy is that it would have cost more to upgrade the existing Windows infrastructure! I would say that is a good indication that the Windows operating system needs some serious work to stay afloat in the coming decade.

Another interesting aspect of this migration strategy is how it all started by replacing Microsoft Office with Open Office. Open Office can run on Windows, like many other open source software solutions, and so gives users an opportunity to get familiar with the interface. Not only are users eased into the migration away from proprietary software, but they are also introduced to the concept of open source alternatives. Some of the users may already be familiar with how open source software works, but many wouldn't be.

Open Office is a good example of open source software that is closely modeled on its proprietary equivalent. There is almost zero learning curve involved. What I think would be really cool to see is the reaction of the users not familiar with Ubuntu after the switch is made. After switching to Open Office, the user may think "wow, this is really cool software, I can't believe it's free". After switching to Ubuntu, they might think "what were we paying for earlier?"

I think this story serves as an excellent example for companies currently engulfed in proprietary software to move to open source.

ECP 2.2.3 released

The latest stable version of the Enomaly Elastic Computing Platform has been released. Changes include:

New approach to handling an index error caused by SQLObject.
Better handling of ECP clustering when the host machine record does not exist in the database.
Fixed a defect in the way remote eggs are installed with the vmfeed extension module.
Better exception handling when importing existing Libvirt domains.
Fixed several invalid javascript references.
Small CSS fix.

ECP and SQLObject

It seems that every day we have a new reason to move away from using SQLObject as an object-relational mapper in ECP. The latest issue with SQLObject has been rather challenging to work around. The problem comes from the ErrorMessage class defined in SQLObject. Here is what the class looks like.

#SQLObject ErrorMessage class.

class ErrorMessage(str):
  def __new__(cls, e, append_msg=''):
      obj = str.__new__(cls, e[1] + append_msg)
      obj.code = int(e[0])
      obj.module = e.__module__
      obj.exception = e.__class__.__name__
      return obj

The problem we are experiencing with ECP is that this class raises an IndexError whenever the e parameter does not hold both values it expects. The ErrorMessage.__new__() method assumes that indices 0 and 1 will always be available in the e parameter, which is supposed to be an instance of Exception.

The question that now arises is: how do we handle this? In this case, we have exceptions being raised by other exceptions. The ErrorMessage class could simply be fixed by adding exception handling for IndexError exceptions. However, once the error is fixed, how do we ship this fix along with our application? ECP currently installs SQLObject from pypi. One solution would be to build our own SQLObject package and point the ECP setup to a custom repository that contains this patched version. One problem I find with this solution is that it could potentially introduce a myriad of other deployment problems.

Another solution is to perform the patch inside of ECP. In this scenario, we don't actually patch the SQLObject package. The SQLObject package would remain as is on the system so that other Python applications using SQLObject wouldn't experience any side-effects as a result of ECP providing a different SQLObject. And this is the approach we are taking. Once ECP has started up, we import the mysqlconnection module and replace ErrorMessage entirely. Here is how it is done.

#ECP approach to patching SQLObject.

from sqlobject.mysql import mysqlconnection

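class NewErrorMessage(str):
    def __new__(cls, e, append_msg=''):
        # If e is not an Exception instance, take the real error
        # from its first argument.
        if not isinstance(e, Exception):
            e = e.args[0]
        else:
            # Python 2 exceptions can be indexed (e[1] is e.args[1]).
            # If index 1 is missing, assume the real error is in args[0].
            try:
                e[1]
            except IndexError:
                e = e.args[0]

        obj = str.__new__(cls, e[1] + append_msg)
        obj.code = int(e[0])
        obj.module = e.__module__
        obj.exception = e.__class__.__name__
        return obj

# Rebind the name used by SQLObject's MySQL connection module.
mysqlconnection.ErrorMessage = NewErrorMessage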

What is shown here is a new implementation of the ErrorMessage class, NewErrorMessage. The interface of the original class is kept intact. What has changed is the exception handling inside the exception class itself. We first test whether the e parameter is in fact an Exception instance. Next, we test for IndexError exceptions and, if necessary, replace e with its first argument. The method then continues as in the original implementation. Finally, we replace the ErrorMessage class with NewErrorMessage. This all happens in enomalism2d so that the new error message class is in place right away, before it is actually needed.
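On Python 3, where exceptions can no longer be indexed, the same unwrapping idea has to go through e.args. The following is a minimal sketch under that assumption, covering only the IndexError branch of the fix; FakeOperationalError and the error values are hypothetical stand-ins for a MySQL driver error whose args are (code, message).

```python
class NewErrorMessage(str):
    """Python 3 sketch of the patched ErrorMessage (uses e.args, not e[i])."""

    def __new__(cls, e, append_msg=''):
        # If e does not carry (code, message) itself, assume the real
        # driver error is wrapped as its first argument and unwrap it.
        if len(e.args) < 2 and e.args and isinstance(e.args[0], BaseException):
            e = e.args[0]
        obj = str.__new__(cls, e.args[1] + append_msg)
        obj.code = int(e.args[0])
        obj.module = e.__module__
        obj.exception = e.__class__.__name__
        return obj


class FakeOperationalError(Exception):
    """Hypothetical driver error: args are (code, message)."""


inner = FakeOperationalError(2006, "MySQL server has gone away")
wrapped = Exception(inner)  # the real error hidden inside another exception

direct_msg = NewErrorMessage(inner)
wrapped_msg = NewErrorMessage(wrapped)
print(wrapped_msg, wrapped_msg.code, wrapped_msg.exception)
```

Both calls produce the same message, and the exception attribute reports the inner error's class rather than the wrapper's, which matches what the patched class is meant to do.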

As an afterthought, I'm wondering what led SQLObject into this issue in the first place. That is, how can a class so tightly associated with exception handling be the culprit behind a bigger problem like this one? Is it that this class takes on too many responsibilities and thus raises the risk of unforeseen exceptions of its own? I wouldn't think so. The ErrorMessage.__new__() method isn't exactly overwhelmed with code. Besides, the exceptions defined in ECP do a fair amount of work when instantiated (including interacting with SQLObject), and when they are raised, they never raise inadvertent exceptions of their own.

Perhaps special care needs to be taken when defining exceptions that do any work. If nothing else, SQLObject teaches us a lesson. As developers, we need to be absolutely certain that any exception we define ourselves can be raised under any circumstances; it cannot fail. It would be nice if no code failed anywhere in an entire application, but that is not a reality. Code will fail at some point, and having stable exceptions to deal with brings your code one step closer to being fail-safe.
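One way to act on this advice is to have an exception check its input before trusting it and fall back to something sensible instead of indexing blindly. This is a sketch under our own assumptions; SafeDBError is a hypothetical class, not part of ECP or SQLObject, and it simply keeps code as None when the source error does not carry a usable (code, message) pair.

```python
class SafeDBError(Exception):
    """Hypothetical exception whose constructor tolerates malformed input."""

    def __init__(self, source=None, append_msg=''):
        super().__init__(source)
        args = getattr(source, "args", ())
        if not isinstance(args, tuple):
            args = ()  # ignore an unexpected 'args' attribute
        self.code = None
        self.message = "unknown error" if source is None else str(source)
        # Only trust (code, message) when both pieces are actually present.
        if len(args) >= 2:
            try:
                self.code = int(args[0])
            except (TypeError, ValueError):
                pass  # keep code as None rather than raising
            self.message = str(args[1])
        self.message += str(append_msg)

    def __str__(self):
        return self.message


print(SafeDBError(Exception(2006, "MySQL server has gone away")))
print(SafeDBError(Exception("connection lost")).code)
```

The design choice is to degrade gracefully: a missing or non-numeric code becomes None and a one-argument error keeps its text, so raising SafeDBError never turns into a second, unrelated failure for the inputs it handles.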
