Thursday, January 29, 2009

Teachers and open source

An interesting entry about teachers and their ignorance toward open source software has made me happy. Not the teachers' ignorance, but the fact that people are willing to voice the problems caused by proprietary software in the education system.

The entry states that Canada is making larger steps than the US to improve open source adoption in schools. Unfortunately, I highly doubt that is true.

I think there should be a huge sense of urgency here. Instead, it seems some of the teachers that aren't completely oblivious think "OK, there is Linux, but there is nothing I can do about it".

Perhaps teachers should encourage students to use Linux and other open source projects outside of the classroom, rather than merely "allowing" it. Let the students insist on bringing open source into education. I would think this would be a much easier task with student support.

Wednesday, January 28, 2009

boduch API

The boduch API documentation is now available.

ECP 2.2 released

ECP 2.2 has finally been released. Outlined below are the changes.

Core

The ECP installer will now automatically generate a uuid for the host. Also, the installer will now synchronize with the local package repository if one exists. Several Xen fixes are now carried out by the installer that allow ECP to better manage Xen machines. The exception handling has also been drastically improved in the installation process.
The core ECP data module has many new features as well as many bug fixes. Several subtle but detrimental object retrieval issues have been resolved. This alone fixed several issues that were thought to be GUI related in previous ECP versions. The new features include added flexibility to existing querying components and newer, higher level, components have been implemented. These newer components build on the existing components and will provide faster querying in ECP.
The configuration system has gone through a major improvement. It is now much easier and more efficient to both retrieve and store configuration data. This affects nearly any ECP component that requires configuration values.
The extension module API now allows extension modules to register static directories as well as javascript. Some of the core extension modules are already taking advantage of this newly offered capability. This helps balance the distribution of responsibilities and increases the separation of concerns among ECP components.

GUI

There have been many template improvements that promote cross-browser compatibility. Many superfluous HTML elements have been removed and others now better conform to the HTML standard.
A new jQuery dialog widget has been implemented. This widget is much more robust and visually appealing than the dialog used in previous ECP versions.
General javascript enhancements will give the client a nice performance boost and improve on the overall client experience.

Testing

With an emphasis on improving the ECP RESTful API design in this release, the requirement for automatically invoking various ECP resources came about. Included in this release is a new client testing facility that can run tests on any ECP installation. Although the tests are limited, they continue to be expanded with each new ECP release.

vmfeed (extension module)

A big effort has been undertaken to analyze the deficiencies of the previous versions of the vmfeed extension module in order to drastically improve its design for this release. One of the major problems was the lack of consistency in the RESTful API provided by vmfeed. Some of the resource names within the API were ambiguous at best while some important resources were missing entirely. There has been a big improvement in both areas for this release.
Another problem was the design of the code that drives the extension module. Much of the code in vmfeed has been refactored in order to produce a coherent unit of functionality. As always, there is still room for improvement, which will come much more easily in future iterations as a result of these changes.

machinecontrol (extension module)

In previous ECP versions, when operating in clustered mode, removal of remote hosts was not possible. This has been corrected in this release.
The machinecontrol extension module will now take advantage of the new ECP configuration functionality.
Machines are now actually undefined by libvirt when they are deleted.

static_networks (extension module)

The static_networks extension module will now use the newer ECP core functionality in determining the method of the HTTP request.
Refactoring has taken place to move the static_networks javascript out of the core and into the actual extension module package. This improves the design of the static_networks extension module while reducing the complexity of the ECP core.
The static_networks extension module will now take advantage of the new ECP configuration functionality.

transactioncontrol (extension module)

The transactioncontrol extension module will now use the newer ECP core functionality in determining the method of the HTTP request.

clustercontrol (extension module)

Major improvement in the RESTful API design. Some invalid resources were removed while others were improved upon.
The clustercontrol extension module will now use the newer ECP core functionality in determining the method of the HTTP request.

C remains popular while Python is still low-key.

An interesting entry cites the C programming language as the most popular choice for new open source projects. There are also some other languages that followed C as popular choices. Python wasn't one of them.

So what does this mean? Absolutely nothing. It means that there are several existing and successful Python projects out there. Although I'm not a huge fan of some of the other languages mentioned, as a Python developer, I do like C. Python and C interoperability at the system level isn't too difficult to achieve.

Of course, this isn't a requirement. Ideally, if some new killer open source application written in C is made available, you can make bindings between it and your Python application.
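
As a rough illustration of the interoperability mentioned above, the standard library's ctypes module can call into an existing C library without any compiled glue code. This is only a sketch: it assumes a POSIX system where ctypes can load the C runtime, and strlen() is simply a convenient function to call. The c_strlen() wrapper is a made-up name.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library. On POSIX, a None name falls back
# to the symbols already loaded into the current process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

def c_strlen(text):
    """Return the byte length of text as computed by C's strlen()."""
    return libc.strlen(text.encode("utf-8"))

length = c_strlen("open source")
```

Full bindings for a larger C application involve more declarations than this, but the idea is the same.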

Tuesday, January 27, 2009

Use firefox

This is simply a plea for people to start using Firefox, or at least to spread the word if you already use Firefox. Attempts are being made to force Microsoft to bundle Firefox with Windows, but there is no need to wait for that.

IE 8 is nearing a "stable" release. Don't upgrade. Switch to Firefox.

Gaphor plugin example.

The Gaphor UML modeling application provides an example hello world plugin. Obviously this plugin doesn't serve any useful purpose in the real world but it does give a good example of how Gaphor can be extended. The general layout for Gaphor plugins is similar to most Python packages. It has a setup.py module which enables the plugin to be installed independently of Gaphor.

Gaphor discovers new plugins through entry-points. The plugin must declare an entry-point that is available in the application. In this case, the hello world plugin wants to insert itself into gaphor.services.

The overall goal of the hello world plugin is to alter the menu in Gaphor and display a simple dialog. The actual plugin logic is contained in a single class called HelloWorldPlugin as illustrated below.

Here, the HelloWorldPlugin class provides both the IService and IActionProvider interfaces. The important items of interest are the menu_xml attribute and the helloworld_action() method.

The menu_xml attribute is an XML string that specifies where the new menu item for the plugin is placed within Gaphor.

The helloworld_action() method is responsible for implementing the action. In the case of the hello world plugin, it will display a dialog. Although not illustrated in the diagram, this method is actually decorated as an action. The decorator provides the action id, label, and tooltip.
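
The shape of such a plugin class can be sketched in plain Python. This is not the actual Gaphor plugin source: the action() decorator below is a local stand-in, the interface declarations are left out, the XML layout is an assumption, and the method returns a string where the real plugin would open a dialog.

```python
def action(name, label, tooltip):
    """Stand-in action decorator: attach the action metadata to a method."""
    def decorator(method):
        method.action_info = {"name": name, "label": label, "tooltip": tooltip}
        return method
    return decorator

class HelloWorldPlugin:
    """Sketch of a plugin that adds a menu item and shows a greeting."""

    # XML fragment saying where the menu item goes (layout is assumed).
    menu_xml = """
    <ui>
      <menubar name="mainwindow">
        <menu action="help">
          <menuitem action="helloworld" />
        </menu>
      </menubar>
    </ui>
    """

    @action(name="helloworld", label="Hello world", tooltip="Say hello")
    def helloworld_action(self):
        # A real plugin would display a dialog here.
        return "Hello, world!"

message = HelloWorldPlugin().helloworld_action()
```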

Monday, January 26, 2009

boduch 0.0.9

The 0.0.9 release of the boduch Python library is now available. Changes include:

Completely replaced the LockManager class. The locking primitives for exchanging data between event threads are now handled by the Python queue module.
Added a new atomic parameter to the EventManager.publish() method. This allows handles to be executed by the same thread that published the event, even when the event manager is running in threaded mode.
Added a new max_threads attribute to the ThreadManager class. This is the maximum number of threads allowed to execute.
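
The queue-based hand-off from the first item can be pictured with the standard library alone. This is not boduch code; worker(), results and the squaring job are made up for illustration, and the module is spelled Queue on Python 2.

```python
import queue
import threading

# A thread-safe queue removes the need for explicit locks around shared data.
results = queue.Queue()

def worker(value):
    """Do some work in a thread and hand the result back through the queue."""
    results.put(value * value)

threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# Arrival order depends on thread scheduling, so sort before using the values.
collected = sorted(results.get() for _ in threads)
```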

ORM strengths and shortcomings

Object-Relational Mapper technology is used in object-oriented languages to try to reduce the amount of SQL contained inside application logic. ORM libraries exist for several dynamically-typed languages. Two popular Python ORM libraries are SQLObject and SQLAlchemy. The basic idea behind an ORM is that persistent objects within an application are mapped to a database table. The table schema is derived from the class declaration of the object to be stored.

For instance, here is an example of a BlogEntry class using SQLObject:

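#SQLObject declaration example.

import uuid

from sqlobject import SQLObject, StringCol, ForeignKey

def gen_uuid():
    """Stand-in for the UUID helper, which the original listing does not show."""
    return str(uuid.uuid4())

class BlogEntry(SQLObject):
    """An abstraction representing a BlogEntry."""
    class sqlmeta:
        #Table-specific settings used when the table is created.
        table = 'blog_entry'
        idName = 'id'

    #The uuid column doubles as an alternate lookup key: BlogEntry.by_uuid().
    uuid = StringCol(length=36,
                     unique=True,
                     alternateID=True,
                     alternateMethodName='by_uuid',
                     default=gen_uuid)
    title = StringCol(length=80, default="Title Placeholder")
    body = StringCol(default="")
    user = ForeignKey('User', default=None)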

Here, we have a blog entry abstraction. The BlogEntry class defines an inner class called sqlmeta. This class is used to specify table-specific information used by the database when the table is created. For instance, the underlying table in the example will be called blog_entry and will use the id column as the primary key. We have also defined several columns for our class. These columns will serve as attributes for any instances of this class. Once an instance of BlogEntry is created, the ORM will automatically create a table row in the database.

I consider this to be a real strength of ORM technology. It drastically simplifies the abstraction storage requirement. There is no need to write SQL CREATE statements, or INSERT and UPDATE statements for that matter. There is nothing specifically wrong with SQL. SQL is extremely expressive and powerful. The problem arises when combining SQL with application logic in an interleaved manner. This leads to unmaintainable systems.

One approach to decoupling the SQL required for persistent objects from the behavior implemented by the objects in an object-oriented system is to define SQL templates. For example, we might have a define_blog_entry.sql template file. This file could then be read by some database module that then executes the SQL. The developer would then write several other templates for UPDATE, INSERT, and various other database activities. ORM libraries handle this separation very well. There is a very transparent layer that manages persistence.
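
To make the template idea concrete, here is a small sketch using the standard sqlite3 module. The template names and the run_template() helper are hypothetical; in the scenario described above, each template would live in its own .sql file rather than in a dictionary.

```python
import sqlite3

# Each entry stands in for a template file such as define_blog_entry.sql.
SQL_TEMPLATES = {
    "define_blog_entry": (
        "CREATE TABLE blog_entry ("
        "id INTEGER PRIMARY KEY, uuid TEXT UNIQUE, title TEXT, body TEXT)"
    ),
    "insert_blog_entry": (
        "INSERT INTO blog_entry (uuid, title, body) "
        "VALUES (:uuid, :title, :body)"
    ),
    "update_blog_entry_title": (
        "UPDATE blog_entry SET title = :title WHERE uuid = :uuid"
    ),
}

def run_template(connection, name, **params):
    """Look up a named SQL template and execute it with bound parameters."""
    return connection.execute(SQL_TEMPLATES[name], params)

connection = sqlite3.connect(":memory:")
run_template(connection, "define_blog_entry")
run_template(connection, "insert_blog_entry", uuid="abc", title="Draft", body="")
run_template(connection, "update_blog_entry_title", uuid="abc", title="Final")
title = connection.execute("SELECT title FROM blog_entry").fetchone()[0]
```

The application logic only ever names a template, so the SQL stays in one place. An ORM takes the same separation further by generating the statements itself.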

OK, so how about querying? How do we get our objects back from the database? Well, the ORM also does this. From the example, our BlogEntry class inherits a select() class method from the SQLObject class. Using this method we can pass various criteria in order to retrieve BlogEntry instances.

I think this is the key weakness in the ORM. In a large percentage of cases, it serves well. All we want to retrieve are blog entries. What about when we need multiple types of objects? There is really no way to do this. At least not sensibly. In our example, all we can do is BlogEntry.select(). There is no BlogEntryUser.select() method to retrieve BlogEntry instances and User instances in the same query. Multiple types means multiple queries in ORM land.

SQL, along with relational databases, is indispensable. Especially the SELECT statement. It is by far the most effective way to retrieve complex data. ORM technology has done a great job exploiting most of the power SQL has to offer. I just don't think the querying functionality is as flexible as it could be in most cases.
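
For comparison, a single SELECT with a JOIN returns data from two tables at once. The sketch below uses the standard sqlite3 module; the table names loosely follow the BlogEntry example, while the user table's columns and the sample rows are assumptions.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE blog_entry (
        id INTEGER PRIMARY KEY,
        title TEXT,
        user_id INTEGER REFERENCES user (id)
    );
    INSERT INTO user (id, name) VALUES (1, 'alice');
    INSERT INTO blog_entry (title, user_id) VALUES ('First post', 1);
    INSERT INTO blog_entry (title, user_id) VALUES ('Second post', 1);
""")

# One query, two kinds of data: entry columns and user columns together.
rows = connection.execute(
    "SELECT blog_entry.title, user.name "
    "FROM blog_entry JOIN user ON user.id = blog_entry.user_id "
    "ORDER BY blog_entry.id"
).fetchall()
```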

Friday, January 23, 2009

boduch 0.0.8

The 0.0.8 release of the boduch Python library is now available. Changes include:

Implemented a new ThreadManager. This takes the responsibility of starting new threads away from the EventManager.
Created a new data package in boduch.event for the Set and Hash events.
Created a new data package in boduch.handle for the Set and Hash handles.
Minor bug fixes.

Thursday, January 22, 2009

When inception exceeds elaboration

The inception phase of any software development life cycle is supposed to be the first phase. Even if you are hacking away on something you dreamed of the night before, you still woke up and thought "I'm going to try this out. Even if it doesn't work, at least I'll know". And that was the inception. You thought of something cool and made the decision to code it.

Even in these trivial cases, there is still a very brief inception phase. Which is how it should be. Well, maybe a little longer than five minutes. But you should never be thinking "Oh wow. This idea is so great. This is going to revolutionize the way people use computers." for too long without actually building anything. Design something right away at the very least. Even jotting down some simple notes is often enough to let yourself or a team member know if you are completely out to lunch.

If you hold on to ideas for too long, you also run the risk of skipping elaboration entirely. Again, this applies to all software development. Even if you are hacking away, at least you started hacking right away. It is an infinitely bad idea to have this great idea for a very long time and assume that thinking about it, or even talking with team members about it, is the elaboration. Because it isn't. Thinking so spells nothing but failure.

Wednesday, January 21, 2009

Open source and government.

The BBC has an interesting entry regarding the new US government and the role open source software could potentially play in it.

The article outlines the standard benefits of using open source. What I think will be interesting is whether the government will live up to its "open and transparent government" promise. Another aspect of open source technology not usually mentioned is job creation. There are always going to be more open source developers in the world because there is no cost to learning it (aside from a computer and Internet connection). If the government were to adopt open source, it would also need people who understand it.

I also wonder when the Canadian government will get its act together and start looking at open source technology.

Tuesday, January 20, 2009

Understanding hooks in ECP.

Hooks in ECP are Python decorators that allow developers to replace existing methods by hooking into them. In fact, there is enough flexibility to decide at run time if the original invocation should be replaced by something new. Possibly depending on the state of some other object in the system.

The hook() decorator accepts two parameters: object and method. The object parameter specifies the object that defines the method to be hooked. The method parameter specifies the method to be hooked. The decorated function, the actual hook, must specify the same operation signature as the original method invocation.

For instance, let's say we want to hook the Package.get_name() method. We would define the hook as follows.

#ECP hook demonstration

from enomalism2.model import Package, hook

@hook(Package, Package.get_name)
def hook_get_name(fn, self):
  """Hook into the Package.get_name() method."""
  return 'My Package'

This is obviously not the most useful hook in the world: we replace the package name that would have been retrieved with a static string. In fact, the original Package.get_name() call is never actually invoked. It is replaced entirely.

Hooks can also be modeled as event subscriptions: each method invocation in the system can be considered a published event, and each defined hook an event subscription. For example, let's modify the previous example to simulate a pre and post event situation.

#ECP hook pre/post event demonstration 

import logging
from enomalism2.model import Package, hook

@hook(Package, Package.get_name)
def hook_get_name(fn, self):
   """Hook into the Package.get_name() method."""
   logging.info("Attempting to get the package name...")
   result=fn(self)
   logging.info("Success.")
   return result

Here, our pre-event functionality logs the fact that we are attempting to retrieve the package name. Our post-event functionality logs the fact that the package name retrieval was successful. In contrast to the first example, the original method is invoked here. We are simply extending the functionality. What is interesting is the fact that we can do so in any direction. We can add pre-invocation functionality or post-invocation functionality, or we can replace the invocation entirely if need be.
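
For readers curious how a decorator like this can work at all, here is a minimal stand-alone sketch. It is not the ECP implementation: this hook() is written from scratch, Package is a stand-in class, and a list takes the place of logging so the effect is easy to inspect.

```python
import functools

def hook(cls, method):
    """Return a decorator that replaces cls.<method> with the decorated hook."""
    def decorator(replacement):
        original = getattr(cls, method.__name__)

        @functools.wraps(original)
        def wrapper(self, *args, **kwargs):
            # The hook receives the original callable first, then the instance.
            return replacement(original, self, *args, **kwargs)

        setattr(cls, method.__name__, wrapper)
        return replacement
    return decorator

class Package:
    """Stand-in for the real model class."""
    def __init__(self, name):
        self.name = name

    def get_name(self):
        return self.name

calls = []

@hook(Package, Package.get_name)
def hook_get_name(fn, self):
    calls.append("before")   # pre-invocation behavior
    result = fn(self)        # invoke the original method
    calls.append("after")    # post-invocation behavior
    return result

name = Package("vmfeed").get_name()
```

Because the hook decides whether and when to call fn, the same mechanism covers replacing, wrapping, or conditionally skipping the original method.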

Monday, January 19, 2009

Ubuntu and laptop hard drives

I run the latest Ubuntu release on my laptop and it seems like my hard drive is failing. Slashdot has a couple of entries that may be related. Firstly, the original entry about the bug. Secondly, it looks like it is closer to being resolved. I think it is too late for me, but maybe someone else can salvage their hardware.

Sunday, January 18, 2009

boduch.event.EventManager

The boduch Python library provides publish/subscribe event functionality. At the heart of this functionality is the EventManager class. This class is meant to be static in the strictest sense; there should never be EventManager instances. This makes sense if you are using the library in several modules. Some modules may publish events while others may subscribe to them. There needs to be a central place to store subscriptions. This is solved by static class attributes and methods.

Here is a visualization of what the EventManager class looks like.

Every subscription is stored in the subscriptions attribute. The threaded attribute determines if the EventManager is running in threaded mode.

The get_subscriptions() method will return a list of handles corresponding to the specified event. If no event is specified, all subscriptions are returned.

The get_threaded() method simply returns the threaded attribute of the EventManager.

The publish() method will publish the specified event. The keyword parameters are passed along to each handler that is built for the event.

The subscribe() method attaches a handle to a specific event type.

The unsubscribe() method detaches a handle from a specific event type.

The prioritize() method sorts the handles attached to the specified event numerically. That is, the higher the sum of any given handle, the more likely it is to be executed first when attached to an event.

The build_handlers() method will instantiate all handles attached to the specified event.

Finally, start_event_thread() will start a new thread of control. Within it, each specified handle is executed.
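
The static-class idea can be sketched in a few lines. This is not the boduch source: threading and prioritizing are left out, and the handle's run() method and the GreetEvent and GreetHandle classes are assumptions made for the example.

```python
class EventManager:
    """Static publish/subscribe registry; never instantiated."""

    subscriptions = {}  # maps an event class to a list of handle classes

    @classmethod
    def subscribe(cls, event, handle):
        handles = cls.subscriptions.setdefault(event, [])
        if handle not in handles:  # the same handle is only attached once
            handles.append(handle)

    @classmethod
    def unsubscribe(cls, event, handle):
        handles = cls.subscriptions.get(event, [])
        if handle in handles:
            handles.remove(handle)

    @classmethod
    def publish(cls, event, **kw):
        # Build a handler from each subscribed handle, then run it.
        return [handle(**kw).run() for handle in cls.subscriptions.get(event, [])]

class GreetEvent:
    """Marker type for a hypothetical event."""

class GreetHandle:
    def __init__(self, name):
        self.name = name

    def run(self):
        return "Hello, %s" % self.name

EventManager.subscribe(GreetEvent, GreetHandle)
EventManager.subscribe(GreetEvent, GreetHandle)  # duplicate is ignored
greetings = EventManager.publish(GreetEvent, name="reader")
EventManager.unsubscribe(GreetEvent, GreetHandle)
after_unsubscribe = EventManager.publish(GreetEvent, name="reader")
```

Any module can import the class and subscribe or publish without passing an instance around, which is the point of keeping everything at class level.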

Saturday, January 17, 2009

boduch 0.0.7

The 0.0.7 release of the boduch Python package is now available. Changes include:

Implemented a new LockManager class for locking in threaded event handles.
Made some enhancements to the is_type() utility function.
Created some new type constants.

ECP e2_exception extension module.

The e2_exception extension module for ECP allows administrators to view exceptions that have been raised by ECP. By default, exceptions are stored in the database. This means that they can be viewed again at a later time. The e2_exception extension module, when installed, provides a table view of exceptions that have been stored in ECP.

You can also view the tracebacks for each individual exception.

The e2_exception extension module also adds exception resources to the ECP RESTful API. Clients can query for stored exceptions, and delete them. Future versions of the extension module will add more features to the API as well as visual enhancements to the GUI.

Thursday, January 15, 2009

Qt-based Gnome a step closer

In a 2008 interview, Mark Shuttleworth spoke of the possibility of a Qt-based Gnome desktop. It looks like that possibility has moved a step forward with the Qt library now being available under the LGPL.

I think there is nothing wrong with the visual appeal of the current Gnome. I also really like Qt so it is a win-win for me.

Wednesday, January 14, 2009

ECP 2.2 coming soon

ECP 2.2 has reached the final testing phase and the ECP development team is working hard to make this release a reality. It should be available early next week but I'll continue to post any release schedule changes.

Tuesday, January 13, 2009

The spreadsheet is 30 years old

It was 30 years ago that the spreadsheet application was invented. PC Magazine has an entry about how the spreadsheet has brought society nothing but trouble. For example, the article blames the spreadsheet for many miscalculations in the past rather than the people responsible for building the spreadsheet.

I think the spreadsheet was an ingenious idea that has much broader applications than accounting tasks. The spreadsheet is simply a tool to visualize and manipulate data. In the end, it is the human interpretation of that data that leads to the undesirable consequences. Blaming the spreadsheet application for a financial crisis is like placing the responsibility for car accidents on the invention of the automobile, rather than on the people who drive them.

boduch 0.0.6

The 0.0.6 version of the boduch Python package is now available. Changes include:

Fixed a major bug in EventManager.subscribe() that allowed the same handle for a single event to be subscribed more than once.
Type instances now have a uuid attribute which is generated by the constructor.
EventThread will now inherit from Type.
Improved the EventManager interface.
Improved the Event and Handle interfaces.
Implemented new Set and Hash event handles.

Monday, January 12, 2009

Flirting nerds?

Yep, it is true. They actually have a class for us now in Germany.

The class is supposed to teach the IT students flirtation and social skills. I think many nerds who would be considered "socially inept" may actually have the required skill set and choose not to be sociable. The class is a great idea for those who choose it. It shouldn't, however, be forced upon students.

Saturday, January 10, 2009

ECP permissions

Permissions in the ECP system use the CRUD permission architecture. The permissions for any object with a UUID may be validated for create, read, update, and delete operations. One benefit of the CRUD architecture is that it maps well to a RESTful architecture.

In ECP, there is a permissions table that has three columns: u_uuid, t_uuid, and perms. The t_uuid column is the target object. The u_uuid column is the user. The perms column represents the CRUD permissions on the target object. The queries executed on this table are generally quite fast. We can simply query the table for the target object, user, and permissions. If any results are returned, we know that the user has permission on the object.
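
A lookup of this kind can be sketched with the standard sqlite3 module. The three column names come from the description above. Everything else is an assumption: the column types, the idea that perms stores the granted operations as letters such as 'ru', the sample UUID values, and the has_permission() helper.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE permissions (u_uuid TEXT, t_uuid TEXT, perms TEXT)"
)
# Assumption: perms holds one letter per granted CRUD operation.
connection.execute(
    "INSERT INTO permissions VALUES (?, ?, ?)", ("user-1", "machine-1", "ru")
)

def has_permission(connection, user, target, operation):
    """Return True if any row grants the operation to user on target."""
    row = connection.execute(
        "SELECT 1 FROM permissions "
        "WHERE u_uuid = ? AND t_uuid = ? AND perms LIKE '%' || ? || '%'",
        (user, target, operation),
    ).fetchone()
    return row is not None

can_read = has_permission(connection, "user-1", "machine-1", "r")
can_delete = has_permission(connection, "user-1", "machine-1", "d")
```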

This table is actually created at ECP install time by the permissions_fast extension module. This module defines hooks that replace the identity framework method of checking permissions. It replaces the default functionality with the queries described above. Perhaps in the future, these queries should be the default functionality since the permissions_fast extension module is installed and enabled by default.

TurboGears 2.0 and SQLObject support

Since I use the TurboGears Python framework quite often, I was curious to see if SQLObject would still be supported by TurboGears 2.0. Searching around on the web yielded no results so I took a look at the TurboGears source.

It does not look like SQLObject will be around in TurboGears 2.0.

Friday, January 9, 2009

Template considerations in ECP

Recently, the ECP development team has started to question the validity of building HTML templates that essentially do the same thing as the ECP RESTful API. Of course we need to build HTML templates; it is a web application after all. What the discussion here comes down to is the age-old question of separation of concerns. How much logic should be contained in the template, if any? Kid templates, for example, offer several means of implementing logic within the template. I think the ability to construct page elements iteratively within the template itself is a powerful concept and solves a very basic problem. Iterating through a set and building elements does not necessarily mean you are not carefully considering the separation of concerns idea. Who knows, the set being iterated through may be data that was specifically built for the template. That is the idealistic case but it is also hard to come by in software development given the common time constraints today.

Anyway, back to the problem of how to build a web interface without building gigantic HTML templates. One may think that, in the case of ECP, the problem is simple: there already exists a RESTful API, so why not use a javascript library such as jQuery to build the entire HTML content from the results of querying the RESTful API? If it were that simple, I think everyone would build things that way, would they not? I mean, it would make sense to build a web application with a fancy GUI that is inherently flexible enough to support any client that was built to use it. Without the RESTful API, all you have is the stand-alone server with many browsers executing the same client code connecting to the server. Not a whole lot of diversity in this situation. As far as I'm concerned, the days of writing HTML parsers to fit a particular client purpose are over. The demands have risen and for good reason.

The moral here, if there is one, is that the software should do what it is supposed to do and clients should be able to interact with it somehow. If it is only a browser that needs to communicate with your application, build HTML templates. If there are only custom-built clients using your software, use JSON/XML. If you can do both, do it. That is the boat ECP is currently in. In the future, hopefully we can develop some javascript that is sophisticated enough to build the entire interface using the RESTful API data. I think this is a very real possibility.
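
Serving both kinds of client from the same data might look something like the sketch below. It uses only the standard library and is not ECP code; the render() helper, the machine fields, and the way the Accept value is checked are all assumptions.

```python
import json
from html import escape

def render_json(machines):
    """Representation for custom-built clients."""
    return json.dumps({"machines": machines}, sort_keys=True)

def render_html(machines):
    """Representation for browsers, built from the very same data."""
    rows = "".join(
        "<li>%s (%s)</li>" % (escape(m["name"]), escape(m["state"]))
        for m in machines
    )
    return "<ul>%s</ul>" % rows

def render(machines, accept):
    """Pick a representation based on what the client asked for."""
    if "application/json" in accept:
        return render_json(machines)
    return render_html(machines)

machines = [{"name": "web01", "state": "running"}]
page = render(machines, "text/html")
data = render(machines, "application/json")
```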

boduch 0.0.5

The 0.0.5 version of the boduch Python library is now available. This is a minor release:

Added more unit tests.
Added more API documentation.

Thursday, January 8, 2009

2008: Linux, Python, and free software

LWN has a 2008 timeline listing significant events in the open-source world. Glad to see Python 3.0 on the list!

Evolution of updating the cache in the vmfeed module.

Remote packages in ECP are managed through an extension module called vmfeed. This module is a core extension module and is distributed along with the application. Remote repositories are essentially RSS feeds that are read by vmfeed and each entry is then updated in the local database (the cached entries).

Within the vmfeed extension module there is a RepoFeed class that represents an installed repository. The RepoFeed.update_cache() method is responsible for reading the feed XML, and updating the database with each entry that is found. Here is what the ECP 2.1 version of the method looks like.

#ECP 2.1 version of RepoFeed.update_cache()

def update_cache(self):
 """This method will update the cache (the RepoEntry rows) with the new
    versions of all of the data in the database.
    @param self: The method class.
    @type self: L{vmfeed.model.RepoFeed}
    @return: None
    @rtype: None
    @raise None: No exceptions are raised by this method.
    @status: Stable
    @see: L{vmfeed.model.RepoFeed.validate_enclosure}"""
 if not self.cache:
     return False
 self.retrieved_on=datetime.datetime.now()
 feed=ET.fromstring(self.cache)
 e2_log('Got a good feed %s'%feed, location=__name__)
 feedname=None
 feedname=feed.find('channel/title')
 if feedname!=None and feedname.text.strip()!="":
     e2_log('Updating feed name to %s'%feedname, location=__name__)
     self.name=feedname.text.strip()
 description=None
 description=feed.find('channel/description')
 if description!=None and description.text.strip()!="":
     e2_log('Updating description to %s'%description,\
             location=__name__)
     self.description=description.text.strip()         
 items=feed.findall('channel/item')
 if not len(items):
     return 0
 for i in items:
     name=i.find('title').text.strip()
     description=i.find('title').text.strip()
     try:
         description=i.find('description').text.strip()
     except:
         pass
     U=None
     U=i.find('uuid')
     if U!=None:
         U=U.text.strip().lower()
     else:
         U=gen_uuid()
     enclosure=None
     enclosures=i.findall('enclosure')
      
     #Only the first enclosure matters, unless it is an egg, in which
     #case we need to look at ALL of them and get the matching python
     #release version.
     #This should all get refactored actually...
     for enclosure in enclosures:
         e2_log('Enclosure is %s'%enclosure.attrib,\
                 location=__name__)
         if enclosure!=None:
             mime=self.enclosure_2_mime(enclosure)
             if not mime:
                 enclosure=None
                 continue
             elif not self.validate_enclosure(enclosure,mime):
                 enclosure=None
                 continue #Only known mime types get stored.
             else:
                 break;
              
     if enclosure!=None:
         #mime=self.enclosure_2_mime(enclosure)
         if not mime:
             continue #Only known mime types get stored.
         e2_log('Found an enclosure %s'%enclosure.attrib['url'],\
                 location=__name__)
         url=enclosure.attrib['url']
         url=self.normalize_url(url)
         try:
             if U:
                 re=RepoEntry.by_uuid(U)
             else:
                 re=RepoEntry.by_url(url)
         except:
             re=RepoEntry(url=url,\
                          name=name,\
                          description=description,\
                          feed=self)
         re.set(description=description,\
                name=name,\
                url=url,\
                retrieved_on=datetime.datetime.now(),\
                mime=mime,\
                uuid=U,\
                )
         re.sync()
         #re.retrieved_on=datetime.datetime.now()
         #re.description=description
         #re.mime=mime
         #re.uuid=U
     else:
         e2_log('No enclosures found in entry %s'%name,\
                 location=__name__)
         if enclosure:
             e2_log('Enclosure had attribs %s'%enclosure.attrib,\
                     location=__name__)
 return 1

The success of the method's execution is communicated through its return value. This means that when the method fails, the invoking process is given no useful information about what went wrong.
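
One common alternative is to raise a specific exception so the caller learns why the call failed. The sketch below is generic and is not taken from vmfeed; the exception classes and the count_items() function are hypothetical.

```python
import xml.etree.ElementTree as ET

class FeedError(Exception):
    """Base class for feed update failures (hypothetical)."""

class EmptyCacheError(FeedError):
    """Raised when there is no cached feed XML to parse."""

class NoItemsError(FeedError):
    """Raised when the feed parses but contains no item elements."""

def count_items(cache):
    """Return the number of items in the feed, or raise a FeedError."""
    if not cache:
        raise EmptyCacheError("feed cache is empty")
    items = ET.fromstring(cache).findall("channel/item")
    if not items:
        raise NoItemsError("feed contains no items")
    return len(items)

try:
    count_items("<rss><channel><title>repo</title></channel></rss>")
except FeedError as error:
    # The caller can tell the failure modes apart and report the reason.
    reason = "%s: %s" % (type(error).__name__, error)
```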

The main problem the ECP development team found with this method is that it is not very cohesive. The responsibilities of this method are very broad:

Parse XML
Initialize repository entry parameters
Iterate through item elements (while performing XML operations)
Iterate through enclosure elements (while performing XML operations)
Check if the repository entry exists and create it if not.

Finally, there is excessive logging that doesn't help with the complexity.

Here is a taste of what the ECP 2.2 version of the same method will look like.

#ECP 2.2 version of RepoFeed.update_cache()

def update_cache(self, tx=None):
  """This method will update the cache (the RepoEntry rows) with the new
    versions of all of the data in the database.
    @param self: The method class.
    @type self: L{vmfeed.model.RepoFeed}
    @return: None
    @rtype: None
    @raise None: No exceptions are raised by this method.
    @status: Stable
    @see: L{vmfeed.model.RepoFeed.validate_enclosure}"""
  self.retrieved_on=datetime.datetime.now()
  feed_xml=get_element(self.cache)
  try:
      name=get_element_text(feed_xml, element='channel/title')
      self.name=name.strip()
  except AttributeError:
      pass
  try:
      desc=get_element_text(feed_xml, element='channel/description')
      self.description=desc.strip()
  except AttributeError:
      pass
  for item in VMFeedTools.get_items_xml(feed_xml):
      try:
          name=get_element_text(item, element='title')
          name=name.strip()
      except AttributeError:
          name=None
      try:
          desc=get_element_text(item, element='description')
          desc=desc.strip()
      except AttributeError:
          desc=None
      try:
          item_uuid=get_element_text(item, element='uuid')
          item_uuid=item_uuid.strip().lower()
      except AttributeError:
          item_uuid=gen_uuid()
      enclosure=VMFeedTools.get_valid_enclosure_xml(item)
      url=get_element_property(enclosure, None, 'url')
      mime=VMFeedTools.enclosure_2_mime(enclosure)
      try:
          entry_obj=VMFeedTools.get_repo_entry(item_uuid)
      except RepoEntryNotFound, e:
          e.store_traceback()
          entry_obj=RepoEntry(uuid=item_uuid,\
                              url=url,\
                              name=name,\
                              description=desc,\
                              retrieved_on=datetime.datetime.now(),\
                              mime=mime,\
                              feed=self)
      else:
          entry_obj.set(uuid=item_uuid,\
                        url=url,\
                        name=name,\
                        description=desc,\
                        retrieved_on=datetime.datetime.now(),\
                        mime=mime)
          entry_obj.sync()
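
The listing above repeats the same pattern several times: read the text of an element, strip it, and fall back to a default when the element is missing. As a rough sketch only, that pattern could be folded into one helper. This version uses the standard library's xml.etree.ElementTree rather than ECP's own get_element_text(), and stripped_text is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def stripped_text(parent, path, default=None):
    """Return the stripped text of the child at path, or default if absent."""
    text = parent.findtext(path)
    if text is None:
        # The element does not exist under parent.
        return default
    return text.strip()


# A made-up item element, for illustration only.
item = ET.fromstring(
    '<item><title>  Demo image </title><uuid> ABC-123 </uuid></item>')
name = stripped_text(item, 'title')
desc = stripped_text(item, 'description')
item_uuid = stripped_text(item, 'uuid', default='').lower()
```

Here desc ends up as None because the sample item has no description element, which mirrors the except AttributeError branches in the method.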

Mission critical or just plain critical?

I've really come to dislike the term mission-critical. The sad thing is, I use it all the time. The problem is, developers are hard-pressed to not use the term when referring to important application components. Is there an alternative? Not if you are speaking in terms of marketing. The "mission-critical RDBMS" sounds much more marketable than "really important RDBMS".

Another problem with the term is its overuse. I might as well have called my first Python program "mission_critical_hello_world.py".

Wednesday, January 7, 2009

Agile software development

The Hitchhikers Guide to Software has an interesting post on "why the waterfall model does not work". I think the author basically nailed the reasons why the waterfall approach to software development fails. The main reason is the huge number of optimistic assumptions it requires. There are simply too many variables associated with software development.

What makes agile software development superior to the waterfall approach? I think the name of the methodology speaks for itself. It allows developers to move ahead in the project quickly and easily. Also, as goes without saying, early feedback is invaluable to a software development team.

Inside Query.get_machines()

In ECP 2.2, there is a new Query class that defines a few class methods for querying the ECP database. The rationale behind creating this class was that there are several queries that are very similar but require slightly varying degrees of flexibility. Also, it makes logical sense to group query methods into a class called Query. We know that using this class will in no way affect the state of the database.

The most pressing need for this class is machine queries. For example, there is often a need to select machines based on a specific cluster. Also, we may need to order these results and combine the various filters on a per-query basis.

Here is what the method definition looks like:

#ECP - The Query.get_machines() method.

@classmethod
def get_machines(cls, *args, **kw):
   """Retrieve a list of machines.
    @status: Experimental"""
   filter=False
   join=[]
   if kw.has_key('order_by'):
       order_by=kw['order_by']
   else:
       order_by='machine_name'
   if kw.has_key('machine_uuid'):
       machine_uuid=kw['machine_uuid']
   else:
       machine_uuid=False
   if kw.has_key('cluster_uuid'):
       cluster_uuid=kw['cluster_uuid']
   else:
       cluster_uuid=False
   if kw.has_key('namefilter'):
       namefilter=kw['namefilter']
   else:
       namefilter=False
   if kw.has_key('children'):
       children=kw['children']
   else:
       children=False
   if cluster_uuid:
       try:
           cluster_obj=UUIDSearch.cluster(cluster_uuid)
       except E2ClusterNotFound, e:
           e.store_traceback()
           return []
       filter=[]
       machine_part=Machine.q.id==ClusterMachine.q.machine
       join_part=ClusterMachine.q.clusters==cluster_obj.id
       filter.append(machine_part)
       filter.append(join_part)
       if namefilter:
           name_part=LIKE(Machine.q.machine_name, "%s%%"%(namefilter))
           filter.append(name_part)
   elif machine_uuid:
       try:
           machine_obj=UUIDSearch.machine(machine_uuid)
       except E2MachineNotFound, e:
           e.store_traceback()
           return []
       if children:
           filter=[]
           machine_part=Machine.q.parent==Hypervisor.q.id
           join_part=Hypervisor.q.machine==machine_obj
           filter.append(machine_part)
           filter.append(join_part)
       else:
           return [machine_obj]
   elif namefilter:
       filter=LIKE(Machine.q.machine_name, "%s%%"%(namefilter))
   result=PermSelect.machine(perm='r',\
                             orderby=order_by,\
                             filter=filter,\
                             join=join)
   return result

The first two variables in the method, filter and join, are initialized first. They are always going to be sent to the final PermSelect.machine() invocation. Next, the keyword parameters are initialized. Here, the default values are assigned to the parameters if they are not provided.
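
The keyword handling described here can also be written with dict.get(), which returns a default when the key is absent. This is just a compact sketch of the same defaults, not the ECP implementation; read_options is a hypothetical helper.

```python
def read_options(**kw):
    """Collect the keyword parameters with the same defaults as the method."""
    return {
        'order_by': kw.get('order_by', 'machine_name'),
        'machine_uuid': kw.get('machine_uuid', False),
        'cluster_uuid': kw.get('cluster_uuid', False),
        'namefilter': kw.get('namefilter', False),
        'children': kw.get('children', False),
    }


# Only namefilter is supplied; everything else falls back to its default.
options = read_options(namefilter='web')
```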

If a cluster_uuid was specified, we retrieve the Cluster instance. We then construct the table-joining query component that joins the machine and cluster tables. If a namefilter is specified, we construct a query component based on it.
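
The name filter is a prefix match: the "%s%%" format string appends a single SQL wildcard to whatever was passed in. The sketch below shows the pattern that gets built, plus a rough plain-Python equivalent. Both function names are hypothetical, and the equivalent ignores database-specific details such as case sensitivity and wildcards inside the filter itself.

```python
def prefix_pattern(namefilter):
    """Build the LIKE pattern the same way the method does."""
    # "%%" becomes one literal percent sign after formatting.
    return "%s%%" % (namefilter)


def matches_prefix(machine_name, namefilter):
    """Approximate a prefix LIKE for filters that contain no wildcards."""
    return machine_name.startswith(namefilter)
```

For a filter of 'web', the pattern is 'web%', so a machine named 'web01' matches and 'db01' does not.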

If a machine_uuid was specified, we retrieve the machine instance. Then, we check if the children parameter was set to true. If so, we build the joining query components to select the children of the specified machine. Otherwise, we return the single machine instance.

Finally, we use the PermSelect class to execute the final query, which integrates permission checking in the selection. Another benefit of this method is that, as more code moves over to it, it should significantly cut down on the number of queries executed.

Python audio scrobbler

I just experimented with the audioscrobblerws Python package and, as a LastFM user, I have to say, it is pretty neat.

First, we initialize the service.

from audioscrobblerws import Webservice
ws=Webservice()

Next, if you need to retrieve a LastFM user, you can do so.

user=ws.GetUser('someuser')

You can view the functionality available to each user.

dir(user)

You will most likely see something like:

['AsDict', 'FillProfileData', 'GetCommonArtists', 'GetCompatibility', 'GetCurrentEvents', 'GetFriends', 'GetFriendsEvents', 'GetNeighbours', 'GetRecentBannedTracks', 'GetRecentLovedTracks', 'GetRecentPlayedTracks', 'GetRecentTracks', 'GetRelatedUser', 'GetSystemEventRecommendations', 'GetSystemRecommendations', 'GetTasteOmeter', 'GetTopAlbums', 'GetTopArtists', 'GetTopTags', 'GetTopTracks', 'GetWeeklyAlbumChart', 'GetWeeklyAlbumChartList', 'GetWeeklyArtistChart', 'GetWeeklyArtistChartList', 'GetWeeklyChart', 'GetWeeklyChartList', 'GetWeeklyTrackChart', 'GetWeeklyTrackChartList', '__class__', '__delattr__', '__dict__', '__doc__', '__eq__', '__getattribute__', '__hash__', '__init__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__str__', '__weakref__', 'age', 'attributeMap', 'avatar', 'cluster', 'country', 'currentEvents', 'friends', 'friendsEvents', 'fullProfile', 'gender', 'icon', 'id', 'mostRecentPlayedTrack', 'neighbours', 'playcount', 'realname', 'recentBannedTracks', 'recentLovedTracks', 'recentPlayedTracks', 'recentWeeklyAlbumChart', 'recentWeeklyArtistChart', 'recentWeeklyTrackChart', 'registered', 'systemEventRecommendations', 'systemRecommendations', 'topAlbum', 'topAlbums', 'topArtist', 'topArtists', 'topTag', 'topTags', 'topTrack', 'topTracks', 'url', 'username']

As you can see, if you are a fan of LastFM and feel like experimenting with this Python package, it offers quite a bit to experiment with.
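
Most of that dir() output is Python's own double-underscore machinery. A small generic helper can trim the listing down to the public names. This is a sketch that works on any object; FakeUser is a made-up stand-in for illustration and is not the audioscrobblerws user class.

```python
def public_names(obj):
    """Return the attribute names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith('_')]


class FakeUser(object):
    """Stand-in for illustration only; not the audioscrobblerws user class."""
    username = 'someuser'

    def GetTopArtists(self):
        return []


names = public_names(FakeUser())
```

Calling public_names(user) on the real user object from the session above should give the same list as before, minus the entries that begin with an underscore.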

Implementing a cache with the boduch Python library

I'm going to show a simple example of how to use the boduch.data.Set class to implement a cache. The Set class is nothing but a Python list that emits events. All we need to do is subscribe to the appropriate Set events.

#Using the boduch.data.Set class to implement a cache.

from boduch.data import Set
from boduch.event import subscribe, EventSetPush, EventSetGet
from boduch.handle import Handle

class HandleMySetPush(Handle):
   def __init__(self, *args, **kw):
       Handle.__init__(self, *args, **kw)
      
   def run(self):
       new_obj=self.data['event'].data['obj']
       print 'Cache updated.  Now updating DB with %s'%new_obj
      
class HandleMySetGet(Handle):
   def __init__(self, *args, **kw):
       Handle.__init__(self, *args, **kw)
      
   def run(self):
       index=self.data['event'].data['index']
       set_obj=self.data['event'].data['set']
       print 'Checking if %s is cached.'%index
       try:
           set_obj.data[index]
       except IndexError:
           print 'Not cached.  Need to retrieve from DB.'
      
if __name__=="__main__":
   subscribe(EventSetPush, HandleMySetPush)
   subscribe(EventSetGet, HandleMySetGet)
   set_obj=Set()
   set_obj.push('Hello World!')
   set_obj.get(0)

Here, we have created two event handles for our Set instance: HandleMySetPush and HandleMySetGet. They are invoked for the Set.push() and Set.get() methods, respectively.

In the main program, we create our Set instance and subscribe our new handles to two Set events.

The first handle, HandleMySetPush, is invoked after the actual Set instance is updated. This means that once the cache has been updated, we now have an opportunity to update a database with this new value. Updating the database is simply an example use of this handler. You could perform whatever action needed.

The second handle, HandleMySetGet, is invoked before the actual Set instance is accessed. This means that the handler can check if the requested object is cached. If not, it now has an opportunity to update the cache before the invokee can complain about a non-existent element.
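
For readers without the boduch library installed, the same idea can be sketched with plain callbacks. This is a stand-alone illustration of the "after push" and "before get" hooks described above and does not use or reproduce the boduch API; EventList, write_through and load_if_missing are hypothetical names.

```python
class EventList(object):
    """Hypothetical stand-in for an event-emitting list; not the boduch API."""

    def __init__(self):
        self.data = []
        self.before_get = []   # callbacks run before an element is read
        self.after_push = []   # callbacks run after an element is stored

    def push(self, obj):
        self.data.append(obj)
        for callback in self.after_push:
            callback(self, obj)

    def get(self, index):
        for callback in self.before_get:
            callback(self, index)
        return self.data[index]


written = []   # stands in for the database writes


def write_through(cache, obj):
    """After a push, record the new value as if writing it to a database."""
    written.append(obj)


def load_if_missing(cache, index):
    """Before a get, fill the cache up to index so the read does not fail."""
    while len(cache.data) <= index:
        cache.data.append('loaded-%d' % len(cache.data))


cache = EventList()
cache.after_push.append(write_through)
cache.before_get.append(load_if_missing)
cache.push('Hello World!')
```

A later cache.get(2) would trigger load_if_missing first, so the read succeeds instead of raising IndexError.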

Tuesday, January 6, 2009

boduch 0.0.4

The boduch 0.0.4 Python tool library has been released. Some of the changes are:

Fixed a minor bug in the ISet interface.
The new Hash data type with event emission has been implemented.
Implemented a new handle for EventSetPush events.

Python benchmarks

Just for fun, I decided to run some Python benchmarks that test the lookup time differences between list, tuple, and dictionary types. The list performance seems to come out on top every time. This is confusing to me because I hear that tuples are supposed to be faster because they are immutable. Here is the test I used:

from timeit import Timer

test_data1=[1,2,3,4,5]
test_data2=(1,2,3,4,5)
test_data3={0:1,1:2,2:3,3:4,4:5}

def test1():
  v=test_data1[2]

def test2():
  v=test_data2[2]

def test3():
  v=test_data3[2]

if __name__=='__main__':
  print 'Test 1:', Timer("test1()", "from __main__ import test1").timeit()
  print 'Test 2:', Timer("test2()", "from __main__ import test2").timeit()
  print 'Test 3:', Timer("test3()", "from __main__ import test3").timeit()

I'm running this on an Intel(R) Core(TM) 2 CPU T7200 @ 2.00GHz machine. I wonder if my test is flawed.
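
One possible flaw is that each test wraps the lookup in a function call, and the cost of the call may well dwarf the cost of the lookup itself. The sketch below times the bare subscript expressions instead and keeps the fastest of several runs to reduce noise. It is only one way to set this up; best_time and run_benchmarks are hypothetical names, and the numbers will vary by machine and Python version.

```python
from timeit import repeat

# Setup code executed once per timing run, outside the measured statement.
SETUP = """
test_list = [1, 2, 3, 4, 5]
test_tuple = (1, 2, 3, 4, 5)
test_dict = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}
"""


def best_time(stmt, number=100000, runs=5):
    """Return the fastest of several timing runs for stmt, in seconds."""
    return min(repeat(stmt, SETUP, repeat=runs, number=number))


def run_benchmarks():
    """Time the same lookup on a list, a tuple and a dictionary."""
    return {
        'list': best_time('test_list[2]'),
        'tuple': best_time('test_tuple[2]'),
        'dict': best_time('test_dict[2]'),
    }


if __name__ == '__main__':
    for kind, seconds in sorted(run_benchmarks().items()):
        print('%s: %.6f' % (kind, seconds))
```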

Future of visual modeling

Andrew Watson has a white paper about the past, present, and future of visual modeling. What I find interesting in this paper is the strength and solidity the MOF has given the other modeling standards offered by the OMG. For example, the SysML and BPMN standards both use MOF as their foundation.

The paper also gives some interesting statistics for development projects that derive at least some of their code directly from models. Good read.
