Friday, February 27, 2009

ECP update

Over the last week or so, the ECP development team has been fixing several issues raised by the ECP user community. I'd like to point out one issue that has been resolved.

The problem here is that during the ECP installation procedure, any existing Libvirt domains will be imported into the database. Obviously, the machine table must already exist in the database. When the user hits the "/install" url, the installer is run. The user may perform this action even after all the ECP database tables have been defined. This way, any Libvirt machines that have been created by some other machines may be imported.

One of the problems with this method is that sometimes, the machine import functionality is executed before the machine database table exists. Another requirement of the machine import functionality is that the local machine database record has been inserted into the machine database table. This is needed to determine what hypervisors are available on the local machine.

The fix in this case was to check that the machine table exists, since ECP cannot execute this functionality without it. Second, ECP will no longer assume that the local machine exists in the database. It will now check for both the machine table and the local machine record. If either is missing, no machines will be imported. In this case, the "/install" url can always be reloaded once the required tables and records have been created.
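To make the two checks concrete, here is a minimal sketch. It is not the actual ECP code. The sqlite3 database, the "machine" table layout, the is_local column, and both function names are assumptions made up for illustration.

```python
#Example; guarding the machine import (hypothetical sketch, not ECP code).

import sqlite3

def can_import_machines(conn):
    """Return True only if the machine table and the local machine record exist."""
    cur = conn.cursor()
    # Check 1: does the (assumed) "machine" table exist?
    cur.execute("SELECT name FROM sqlite_master "
                "WHERE type='table' AND name='machine'")
    if cur.fetchone() is None:
        return False
    # Check 2: has the local machine record been inserted?
    cur.execute("SELECT 1 FROM machine WHERE is_local=1")
    return cur.fetchone() is not None

def import_libvirt_domains(conn, domain_names):
    """Import the given domain names, or nothing if a precondition fails."""
    if not can_import_machines(conn):
        return 0
    for name in domain_names:
        conn.execute("INSERT INTO machine (name, is_local) VALUES (?, 0)",
                     (name,))
    conn.commit()
    return len(domain_names)
```

Calling import_libvirt_domains() before the table or the local record exists simply returns 0, so the caller can try again later, much like reloading the "/install" url.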

So, how do we end up in a situation like this in the first place? Shouldn't the ECP installation functionality always ensure that the required data exists before it is needed? This is a perfectly valid concern and the installer does work this way. The table creation is the first task executed by the installer. Then, important records such as the local machine are inserted. The only way this ordering can be altered is if some extension functionality "hooks" into the installer. The hooks have a choice as to the order in which the "hooked" method is executed. The original invocation may happen before the new functionality or after. It is entirely the responsibility of the extension module to ensure that nothing is interrupted in the original behaviour.
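The ordering problem is easier to see in code. The following is only a sketch of the general idea. The hook() helper and the two installer functions are hypothetical and do not reflect the real ECP extension API.

```python
#Example; hook ordering (hypothetical sketch, not the ECP extension API).

def hook(original, extension, before=True):
    """Wrap original so that extension runs either before or after it."""
    def hooked(*args, **kwargs):
        if before:
            extension(*args, **kwargs)
            return original(*args, **kwargs)
        result = original(*args, **kwargs)
        extension(*args, **kwargs)
        return result
    return hooked

def install(log):
    # Stands in for the core installer tasks, in their normal order.
    log.append("create tables")
    log.append("insert local machine")

def import_machines(log):
    # Stands in for extension functionality that needs the tables to exist.
    log.append("import machines")
```

With before=True the import runs before the tables exist, which is exactly the situation described above. With before=False the original ordering is preserved.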

That being said, there are several extension modules distributed along with ECP and we'll be keeping a close eye on those as usual. If anyone has noticed a potential defect in a core extension module, feel free to report it here.

Wednesday, February 25, 2009

Interfaces and errors

The basic idea behind defining interfaces and classes that provide those interfaces is to create a contract. This contract specifies that any instance of the class will faithfully carry out the behaviour specified within the contract. If an instance provides a given interface, and it is used in a context that requires the behaviour specified in the contract, then the instance is behaving as expected. The instance is error free.

On the other hand, if some behaviour is invoked on some instance in a required interface context, and the instance does not conform to the contract, the instance is not behaving correctly.

For example, consider a functional provided interface.

Here we have a simple interface called IDoable. This interface specifies that any classes that provide this interface must implement a do_something() method. The Doer class that provides this interface is valid and functional because it implements the do_something() method.
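A sketch of what such an interface and provider could look like in plain Python follows. Python has no built-in interface construct, so modelling IDoable as a base class that raises NotImplementedError is an assumption on my part, as is the run() function standing in for the requiring context.

```python
#Example; a functional provided interface (sketch).

class IDoable(object):
    """Interface: any provider must implement do_something()."""
    def do_something(self):
        raise NotImplementedError("do_something() is required by IDoable")

class Doer(IDoable):
    """Provides IDoable by implementing the required method."""
    def do_something(self):
        return "Doing something..."

def run(doable):
    """A context that requires the IDoable interface."""
    return doable.do_something()
```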

What about an invalid provided interface? Consider the following non-functional example.

This time, the Doer class does not faithfully carry out the contract specified by the IDoable interface. So if an instance of Doer is used in the context of a required IDoable interface, this will not work as expected.
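A sketch of the broken case, under the same assumptions (IDoable modelled as a base class raising NotImplementedError, and a hypothetical run() function as the requiring context):

```python
#Example; a non-functional provided interface (sketch).

class IDoable(object):
    """Interface: any provider must implement do_something()."""
    def do_something(self):
        raise NotImplementedError("do_something() is required by IDoable")

class Doer(IDoable):
    """Claims to provide IDoable but never implements do_something()."""
    def do_something_else(self):
        return "Doing something else..."

def run(doable):
    """A context that requires the IDoable interface."""
    return doable.do_something()
```

Here run(Doer()) fails at runtime with NotImplementedError, because the contract was never fulfilled.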

Depending on the implementation language, the instance may not even have an opportunity to behave in certain contexts because it lacks one or more required interfaces. Other languages don't care what type of instance is used in a given context, let alone what interfaces it provides. In either case, the failure of an instance to provide an interface will surface: there will be a compilation error or a runtime error.

This still leaves us with some unanswered questions. In the event of a runtime error due to the lack of a required interface, is this the result of a design error, or should the instance simply not be there? In the case of a design error, chances are that the instance is valid in the context and there is simply a mistake in the implementation of the class. This could in fact be as simple as a missing operation or an invalid operation signature. What about when the instance has no business being in the context which is attempting to invoke behaviour on it? This is obviously not a design flaw in terms of the required interface and the instance that does not provide it. How do you deal with such situations? In interpreted languages, this can be a little trickier. The question then becomes: why was this instance placed in the specified context in the first place if it doesn't belong there? After all, we know that all classes that provide the interface in question are implemented perfectly.

There really is no answer. You really have to look at the architectural layers of your application at this point. If the instances that cause these types of mishaps belong to the problem domain, you could be in good shape, provided the hierarchies of problem classes are constructed in such a way as to provide a means to handle these interface errors in the problem domain as well.

Another problem developers can stumble over is over-reliance on interface conformance. When instances provide the necessary interface in a given context, this doesn't mean that the invoked behaviour will execute error-free. Should these errors be handled outside of the problem domain? Again, this really depends on your architectural layers and how they are implemented. If rigorous testing results in flawless class implementations in both the problem domain and the application domain, we could then tie the interface errors to the problem domain. Anything else can be considered at the application level and be dealt with accordingly.

Building all these interfaces around the problem domain is a lot of work. Sometimes it is simpler to just define some common exceptions and deal with them, be it application or domain layer.

Tuesday, February 24, 2009

New ECP exception and enhanced state restoring behaviour

Over the past few days, the ECP team has made some notable enhancements in the trunk. The first being the addition of a new exception called E2LibvirtError. As the title suggests, this exception is raised for Libvirt-related issues. The Python Libvirt library already defines an exception class. However, there are many types of errors that can happen from within Libvirt. The idea behind this new E2LibvirtError class is to provide better information when something bad happens in libvirt. For instance, in the Python Libvirt library, there is only one exception type. If this exception gets raised, a short message is displayed. This is the default message that gets initialized with the Python base exception class.

The problem here is that Libvirt can manage several different hypervisors on any given system. Thus, there are several layers within Libvirt in which something can go wrong. In ECP, the Libvirt exception is caught, and the generic message is recorded.

The new E2LibvirtError exception exploits additional exception information encapsulated within the basic Libvirt exception instance. I don't mean encapsulated in the traditional object-oriented sense. I mean the information is there, and ECP should use it for the benefit of the end user. The new ECP exception, when instantiated, will take several error codes from the original Libvirt exception and produce a much more meaningful error message.
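To give a rough idea of the approach, here is a sketch. It is not the actual ECP implementation. The Python Libvirt exception offers accessor methods such as get_error_code(), get_error_domain() and get_error_message(), and the sketch relies on those. The message format and the FakeLibvirtError stand-in (with arbitrary values, so the example runs without Libvirt installed) are assumptions for illustration only.

```python
#Example; building a richer message from a Libvirt error (sketch).

class E2LibvirtError(Exception):
    """Sketch of an exception built from the details of a Libvirt error."""
    def __init__(self, original):
        # Pull the extra details out of the original exception instance.
        self.code = original.get_error_code()
        self.domain = original.get_error_domain()
        self.detail = original.get_error_message()
        message = "Libvirt error (code=%s, domain=%s): %s" % (
            self.code, self.domain, self.detail)
        Exception.__init__(self, message)

class FakeLibvirtError(Exception):
    """Hypothetical stand-in for the Libvirt exception; values are arbitrary."""
    def get_error_code(self):
        return 42
    def get_error_domain(self):
        return 7
    def get_error_message(self):
        return "example failure"
```

In real code, the Libvirt call would be wrapped in a try block, and the caught Libvirt exception would be passed to E2LibvirtError before re-raising.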

This leads me to the changes made in the restore_machines_state() functionality. The rationale hasn't changed, only the implementation. We simply handle table existence and local machine existence detection much better than the current version. If the function finds a machine that is not running and it should be (because that was the state the machine was in when ECP last shut down), it will attempt to start it. We've already added the new E2LibvirtError exception handling to this function when attempting to start the machine since this is a Libvirt operation. I've already been seeing much more useful error messages in the logs. These new error messages should also be viewable in the web front-end via the error dialog box when something Libvirt-related goes wrong.

This does increase the Libvirt coupling in ECP a considerable amount. However, given the level of functionality that ECP would have without Libvirt, I think it is a fair trade off.

Optimistic provisioning in the cloud

One of the technological problems that cloud computing technologies are supposed to solve is the lack of computing power when it is needed. Computing on demand, so to speak. The elasticity of the cloud enables this.

The classic example of this is a web site operating in the cloud that gets "slashdotted". If the site does not have the computing resources required to fulfil the requests, it dies, and readers (soon to be ex-readers) will be disappointed. Luckily, your site is running in a cloud environment and has the ability to "expand" its computing resources when demand requires it.

What happens when the actual expansion takes place? Generally, a new virtual machine is created and that machine's resources are now available to the process that requires them. The process in this context refers to the overall business requirement that caused the expansion event in the first place. The process that says "give me more computing power" may in fact result from a general discussion amongst several nodes in the cloud.

Here, we have a simple controlling process that handles requests. These may be client requests or requests from other nodes in the same cloud. The controlling process then forwards the request to a resource management process. It is the responsibility of the resource manager to ensure that computing resources are available to fulfil every request. This is where the bottleneck lies, in has_resources(). In the most common case, there are plenty of resources available and has_resources() has very little work to do. However, when resources start to dry up, it needs to make more resources. This is where the costly work of the resource manager lives. It would be great if there were some way to know ahead of time what the peak resource demand will be.

Unfortunately, there is really no reliable way to do this. The best we can do in this situation is guesswork. The resource manager could monitor how much the resource requirements change from one time interval to the next. Certain thresholds could then be set and, once reached, resources could be provided based on the probable resource demand in the near future.

For instance, let's say I have a simple blog running within a cloud environment. I post a new entry, "a ton of traffic". Before I post this entry, I have an average demand of 5 requests per hour. An hour after posting, the resource manager notices that my average has doubled to 10 requests per hour. This is something that could be handled very comfortably by my service. However, the suddenness of this relatively large change could put the resource manager on alert. Now, two hours after posting "a ton of traffic", the demand reaches 20 requests per hour. It seems that this rising demand trend is continuing. The resource manager would then proceed to make more resources available.

With this approach, there is always the risk of over-provisioning resources, since this type of data can be misleading. However, it does lend a guiding light toward proactive provisioning. Besides, if the statistical data is misleading, it is better to clean up over-provisioned resources than to attempt a huge provisioning job during the period of high resource demand.
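The scenario above can be sketched as a simple trend check. The function name, the growth threshold of 2.0, and the requirement of two consecutive growth periods are all assumptions chosen to match the numbers in the example. A real resource manager would need something more robust.

```python
#Example; detecting a rising demand trend (hypothetical sketch).

def should_provision(request_rates, growth_threshold=2.0, periods=2):
    """Return True when demand grew by at least growth_threshold in each
    of the last `periods` consecutive intervals."""
    if len(request_rates) < periods + 1:
        # Not enough history yet; stay on alert but do nothing.
        return False
    recent = request_rates[-(periods + 1):]
    for previous, current in zip(recent, recent[1:]):
        if previous <= 0 or float(current) / previous < growth_threshold:
            return False
    return True
```

With the rates 5 and 10 the function returns False, since a single doubling only puts the manager on alert. Once the rates are 5, 10 and 20, it returns True and more resources would be provisioned.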

Monday, February 23, 2009

An argument against XML

My argument against using XML as a data format in certain situations is that it is too verbose. In other situations, however, the verbosity provided by XML is needed, such as for human consumption. This is why XML exists: it is easy for both humans and computers to use and read.

The verbosity problem with XML stems from the use of tags. Every entity represented in XML needs to be enclosed in a tag. The opening tag indicates that a new entity definition has started and the ending tag indicates the end of that definition. For example, consider the following XML.

<person>
<name>adam</name>
</person>

This is a trivial example of a person entity with a single name attribute. Notice the duplication of the text "person" and "name" in the metadata. With XML this is required. However, tags may also have attributes. Our person definition could be expressed as follows.

<person name="adam"/>

Here there is no metadata duplication. But I think the second example negates the readability philosophy behind XML. What exactly is the difference between attributes and child entities in XML? Semantically, there is none. A child entity is still an attribute of the parent entity.

With JSON, there is no duplication of metadata or any confusion of how an entity is defined. This is because the JSON format is focused on lightweight data, not readability. For instance, here is our person in JSON.

{"person":{"name":"adam"}}

Now, if a person were reading this, the chances of them getting the meaning right are greatly reduced when compared to the XML equivalent. However, it is much less verbose in most cases. And verbosity counts when data is being transferred over a network. Another plus, the XML is not lost. JSON can easily be converted to XML and back. So if JSON-formatted data must be edited by humans as XML, this is not difficult to achieve.

Here is a simple Python demonstration of reducing the size of XML data with JSON.

#Example; XML string and JSON string

xml_string="""
<entry><title>mytitle</title><body>mybody</body></entry>
"""

# JSON requires object keys to be quoted strings.
json_string="""
{"entry":{"title":"mytitle","body":"mybody"}}
"""

if __name__=="__main__":
    print('XML Length: %d' % len(xml_string))
    print('JSON Length: %d' % len(json_string))
    # Size of the JSON data as a percentage of the XML data.
    pcent=float(len(json_string))/len(xml_string)*100
    print('JSON size as a percentage of XML: %.1f%%' % pcent)

Finally, since XML is based on tags, there is no opportunity for sets of primitive types. For example, some client says to the server "give me a list of names and nothing else". The client will likely receive something along the lines of the following.

<list>
<item name="name1"/>
<item name="name2"/>
<item name="name3"/>
</list>

Here is the JSON alternative.

["name1", "name2", "name3"]
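To compare the two list representations directly, here is a small sketch that builds both from the same Python list using only the standard library. The helper function names are my own.

```python
#Example; the same list of names as JSON and as XML (sketch).

import json
import xml.etree.ElementTree as ET

names = ["name1", "name2", "name3"]

def names_as_json(names):
    # Compact separators drop the optional whitespace.
    return json.dumps(names, separators=(",", ":"))

def names_as_xml(names):
    root = ET.Element("list")
    for name in names:
        # Each name needs its own element, since XML has no bare list type.
        ET.SubElement(root, "item", name=name)
    return ET.tostring(root).decode("utf-8")

if __name__ == "__main__":
    print("JSON length: %d" % len(names_as_json(names)))
    print("XML length: %d" % len(names_as_xml(names)))
```

Both strings carry the same three names, but the JSON version has no per-item tag overhead.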

Friday, February 20, 2009

Open source too insecure for government use?

Well, according to an InfoWorld entry, security firms seem to think so. Rather than go through all the odds and ends of comparing the security differences between proprietary and open source software, I propose a simple experiment. Set up two desktops. Install some variant of Linux on one and install some popular proprietary operating system on the other. Perform some simple everyday tasks and see which one gives you security problems first.

Thursday, February 19, 2009

How to manage technical documentation for varying levels of competency?

An interesting question on slashdot asks exactly this. Two things spring immediately to mind for me:

Trac
Is this possible?

The Trac wiki system would be my first choice simply because I'm familiar with it. Anyone with moderate Trac experience can teach the concepts fairly easily to developers and anyone else using the system.

So, the basic problem restated: "how do I provide a simple and easy way for people with different levels of knowledge of a given subject to access that information?". Using Trac, you could start by getting all required content into a page. This includes every possible detail imaginable.

Next, suppose we have written a Trac plugin that defines processors you can use to wrap around sections of text based on the required expertise. For instance, you could have the following processors defined:

Expert-topic
Intermediate-topic
Moderate-topic
New-topic

The second part of this theoretical plugin would need to extend the user accounts to allow the ability to specify what the user knows and at what level. Of course, each page would also need to be categorized as well.

The question of whether this is possible comes not from the technical end but from the business side of things. My answer to this is another question. How accurately can a user's knowledge of a given topic be rated? This problem is eliminated if we allow users to rank themselves in regard to topic competence.

Strange Gnome screenshot behaviour

I noticed some strange Gnome behaviour when attempting to take a screenshot. This only seems to happen in Firefox. If I want to take a shot of an expanded menu on a form, Gnome won't let me. I assume that this has to do with the window focusing. This may not be a common requirement, but it is a failed use case nonetheless.

Wednesday, February 18, 2009

New ECP community site.

I'm pleased to announce that the new Enomaly ECP community site is up and running. Feel free to report bugs, request features, or check out the documentation.

Python memory Usage

Here is an example in Python of how to retrieve the system memory usage. This example was adapted from an entry on stackoverflow.

#Example; Get the system memory usage.

import subprocess

class MemUsage(object):
  def __init__(self):
      self.total=0
      self.used=0
      self.free=0
      self.shared=0
      self.buffers=0
      self.cached=0
      self.init_data()

  def init_data(self):
      # Run "free" and read its output as text.
      process=subprocess.Popen(["free"],
                               stdout=subprocess.PIPE,
                               universal_newlines=True)
      stdout_list=process.communicate()[0].split('\n')
      for line in stdout_list:
          data=line.split()
          try:
              # Assumes the "Mem:" columns are, in order:
              # total, used, free, shared, buffers, cached.
              if data[0]=="Mem:":
                  self.total=float(data[1])
                  self.used=float(data[2])
                  self.free=float(data[3])
                  self.shared=float(data[4])
                  self.buffers=float(data[5])
                  self.cached=float(data[6])
          except IndexError:
              # Blank or short lines are skipped.
              continue

  def calculate(self):
      return ((self.used-self.buffers-self.cached)/self.total)*100

  def __repr__(self):
      return str(self.calculate())

if __name__=="__main__":
  print(MemUsage())

Here we have a simple class called MemUsage. The constructor initializes the attributes of the class needed to compute the memory usage. The init_data() method is what MemUsage invokes in order to retrieve the required system data. This is done by using the subprocess module to execute the free command. The resulting data is then mapped to the corresponding attributes. We compute the memory usage as a percentage by subtracting the buffers and cache from the used memory and dividing the result by the total memory.
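As a worked example of the calculation, the sketch below applies the same formula to a canned sample of free output. The numbers in the sample are made up, and the column layout (total, used, free, shared, buffers, cached) is an assumption. Newer versions of free may report different columns, in which case the parsing would need to change.

```python
#Example; the memory usage formula applied to sample output (sketch).

# Made-up sample in the assumed column layout.
SAMPLE = """             total       used       free     shared    buffers     cached
Mem:       2000000    1500000     500000          0     100000     400000
Swap:      1000000          0    1000000
"""

def memory_usage_percent(free_output):
    """Return (used - buffers - cached) / total as a percentage."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            total, used, free, shared, buffers, cached = [
                float(value) for value in fields[1:7]]
            return (used - buffers - cached) / total * 100
    raise ValueError("no 'Mem:' line found")
```

For the sample, (1500000 - 100000 - 400000) / 2000000 * 100 gives 50.0, so half of the memory is in use once buffers and cache are discounted.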

Tuesday, February 17, 2009

Trac RecaptchaRegisterPlugin problems

The RecaptchaRegisterPlugin Trac extension has a few minor defects I noticed while testing it out for production use. The first problem was that the captcha input would disappear if any other fields were invalid. The second problem was that if the captcha field was invalid, any existing form data was lost. Here is my updated version of process_request() that addresses both issues.

#RecaptchaRegister trac plugin fix.

# IRequestHandler methods
def process_request(self, req):
  self.check_config()
  action = req.args.get('action')

  if req.method == 'POST' and action == 'create':
      response = captcha.submit(
          req.args['recaptcha_challenge_field'],
          req.args['recaptcha_response_field'],
          self.private_key,
          req.remote_addr,
          )
      if not response.is_valid:
          data = {'acctmgr' : { 'username' : req.args['user'],
                                'name' : req.args['name'],
                                'email' : req.args['email'],
                              },
          }
          data['registration_error'] = 'Captcha incorrect. Please try again.'
          data['recaptcha_javascript'] = captcha.displayhtml(self.public_key)
          data['recaptcha_theme'] = self.theme
          return "recaptcharegister.html", data, None
      else:
          ret = super(RecaptchaRegistrationModule, self).process_request(req)
          h, data, n = ret
          data['recaptcha_javascript'] = captcha.displayhtml(self.public_key)
          data['recaptcha_theme'] = self.theme
          return "recaptcharegister.html", data, n
  else:
      ret = super(RecaptchaRegistrationModule, self).process_request(req)
      h, data, n = ret
      data['recaptcha_javascript'] = captcha.displayhtml(self.public_key)
      data['recaptcha_theme'] = self.theme
      return "recaptcharegister.html", data, n

Saturday, February 14, 2009

Custom Python iterators.

I've been experimenting with writing custom Python iterators. In Python, any sequence type can be iterated over. For example, when you say "for i in list_obj", Python asks list_obj for an iterator behind the scenes. For this common case, the developer need not be concerned with the iterator. They know that it will always behave correctly in a for loop.

If you have a class that needs to be iterated over, you can define a custom iterator for your class as demonstrated below.

#Example; Custom iterators.

class MyIterator:
   def __init__(self, obj):
       self.obj=obj
       self.cnt=0
      
   def __iter__(self):
       return self
  
   def next(self):
       try:
           result=self.obj.get(self.cnt)
           self.cnt+=1
           return result
       except IndexError:
           raise StopIteration
      
class MyClass:
   def __init__(self):
       self.data=["Item 1", "Item 2", "Item 3"]

   def __iter__(self):
       return MyIterator(self)

   def get(self, index):
       print "Getting item..."
       return self.data[index]
  
if __name__=="__main__":
   for item in MyClass():
       print item

Here, we have a custom iterator defined, MyIterator. The constructor initializes the object it will iterate over as well as the counter. The __iter__() method is required to return the current MyIterator instance so the iterator can also be iterated over. Sort of like meta-iteration.

The important method is next() which returns the next item in the iteration. In this case, every time next() is invoked, we attempt to return the value of get() and increment the counter. Once there are no more items, we raise a StopIteration exception, which will gracefully exit the iteration.

Finally, you'll notice that MyClass defines an __iter__() method. This is what gives instances of MyClass the ability to be iterated over. The method simply returns a MyIterator instance.
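For comparison, the same iteration can be written without a separate iterator class by turning __iter__() into a generator. This is just an alternative sketch of the same idea. Note also that in Python 3 the iterator method is named __next__() rather than next(), so the generator form avoids that difference entirely.

```python
#Example; generator-based iteration (alternative sketch).

class MyClass(object):
    def __init__(self):
        self.data = ["Item 1", "Item 2", "Item 3"]

    def get(self, index):
        return self.data[index]

    def __iter__(self):
        # Because this method contains yield, calling it returns an
        # iterator object that Python builds for us.
        index = 0
        while True:
            try:
                item = self.get(index)
            except IndexError:
                return  # no more items; ends the iteration
            yield item
            index += 1
```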

Friday, February 13, 2009

Relational databases are not going anywhere

I stumbled upon this entry which argues against the existence of the RDBMS in distributed applications. I must say I disagree. The "key-value" database movement does attempt to solve some valid concerns. For instance, the complexity involved with deploying a clustered RDBMS can be daunting at best. The ease with which application developers can use these key-value databases is very powerful. I also agree with the assertion that a single schema distributed across n nodes will not be able to scale very easily.

However, at a lower level, there is still no substitute for the RDBMS when it comes to reading and writing persistent data. I often sense a stereotype among developers toward RDBMSes as being a bloated overhead. Again, I disagree. There exist several open source, lightweight RDBMSes that add zero or very little additional effort.

On the schema end of things, they are only fixed at the data level. The application can easily abstract new and dynamic schemas during its lifetime.

I think several of the new "key-value" database concepts (such as the document abstraction) belong in the application or server layer, not the data layer. As far as nodes in the cloud, each node will benefit from an RDBMS for the foreseeable future.

ECP 2.2.2 released

The 2.2.2 maintenance release of ECP is now available. Changes include:

SECURITY FIX: Removed the automatic installation of egg files from ECP start-up.
Added more exception handling to failed ECP startups due to missing database tables.

Thursday, February 12, 2009

Community involvement is not open source

The NY times has an entry on "open source game design". I'm not sure if the title of this entry is a simple misunderstanding of what open source means. However, the title is misleading. Open source, in the context of software development means code. Thousands and thousands of lines of code. I could not find a single mention of source code in the article.

Open source software is really gaining a lot of traction. Maybe this is why several people who wouldn't otherwise care are talking about it. But titles like the mentioned NY times entry worry me when it comes to people who are new to open source and genuinely interested in what it is and how it works.

The community surrounding projects plays a huge role. The community, in several cases, could even be more important than the code itself. However, the code is available in the open. In the case mentioned in the NY times entry, it is not.

Trouble extending Trac navigation.

I'm in the process of building a new Trac site. I wanted to add new menu items to the main navigation. Hiding or rearranging the default menu items in Trac 0.11 is quite straightforward. It can all be done in the Trac configuration. However, new menu items need to be part of a component. Hence, the need for the NavAdd plugin.

It looks like the plugin has not yet been updated to fit the 0.11 plugin architecture, although it does work with 0.11. The Trac plugin API changes weren't too drastic. I did notice some strange behaviour though. It turns out that any menu items added with the NavAdd plugin cannot be considered active.

What? No active menu items? Well, it isn't really the NavAdd plugin's fault. It turns out that in order to have an active menu item, the current request needs to be handled by the same component that produces the menu item. I discovered this by looking at the timeline Trac component. This component actually implements the IRequestHandler interface. This is necessary because the timeline component handles requests to the /timeline URI. The menu item produced by this component becomes active anytime the /timeline URI is visited.

So, this design works well for components that have URIs. But what if I want to add menu items that point to wiki pages? And what if I want to have my menu item become active when visiting those pages? There is currently no way to do this. My suggestion would be that the INavigationContributor interface adds a new uri field to be returned from get_navigation_items(). When the page corresponding to this uri becomes active, so does the custom menu item.

boduch 0.1.1

The 0.1.1 version of the boduch Python library is now available. Some changes include:

New Set and Hash functionality. Both object types now support the Python key/index notation.
More unit tests.

Here is an example that demonstrates the new Set and Hash functionality.

#Example; boduch data types.

from boduch.data import Set, Hash
from boduch.event import threaded

def test_set_data():
   set_obj=Set()
   set_obj.push("test1")
   set_obj.push("test2")
   print "SET:",set_obj[0]
   print "SET:",set_obj[1]
   set_obj[0]="updated test1"
   set_obj[1]="updated test2"
   print "SET:",set_obj[0]
   print "SET:",set_obj[1]
   del set_obj[0]
   del set_obj[0]
      
def test_hash_data():
   hash_obj=Hash()
   hash_obj.push(("test1", "value1"))
   hash_obj.push(("test2", "value1"))
   print "HASH:",hash_obj["test1"]
   print "HASH:",hash_obj["test2"]
   hash_obj["test1"]="updated value1"
   hash_obj["test2"]="updated value2"
   print "HASH:",hash_obj["test1"]
   print "HASH:",hash_obj["test2"]
   del hash_obj["test1"]
   del hash_obj["test2"]
      
if __name__=="__main__":
   test_set_data()
   test_hash_data()
   threaded(True)
   test_set_data()
   test_hash_data()

As you can see, we can now treat Set and Hash instances as though they were Python list and dictionary instances respectively. The main difference, of course, is that all actions are carried out by event handlers. This enables threaded behaviour for some of the actions. For example, pushing new items and updating existing items will be handled in separate threads if threading is enabled.

Tuesday, February 10, 2009

ECP 2.2.1 released

The 2.2.1 release of the Enomaly Elastic Computing Platform is now available. This is a bugfix release. Changes and fixes include:

Fixed a potential spoofing exploit in the enomalism startup process.
Fixed an import error in the permission_control extension module.
Removed non-functional GUI widgets for host machines.
Fixed several invalid jQuery references.
Fixed a unicode issue in the exceptions module.
Fixed database exception handling in the exceptions module.
Fixed database exception handling in the restore machine state functionality.

The best way to call Python methods

What is the best way to invoke behaviour on objects in Python? Is there any better way than simply invoking a method which defines behaviour for a given instance? Or, given a class method, does the same principle not apply?

I would say it depends how you are using the instances or the classes. In the majority of cases, an instance is created, and some behaviour is invoked from that instance. However, the invoking code doesn't always know what behaviour is supported by the instance. For instance, consider the following example.

#Example; testing if a method is callable.

class MyClass:
 def do_something(self):
     print 'Doing something...'
  
def do_something(obj):
 if hasattr(obj, 'do_something') and callable(obj.do_something):
     obj.do_something()
  
if __name__=="__main__":
 do_something(MyClass())

Here, we have a simple class, MyClass, that will "do something". We also have a function that will "do something". The purpose of the do_something() function is to accept some object as a parameter and invoke the "do something" behaviour on that object.

The problem I'm interested in here is that the do_something() function has no way of knowing for sure that it will be able to "do something" with the provided instance. It is the responsibility of the do_something() function to determine the invoking capability of the instance.

Now, what if we take the responsibility of knowing the invoking capabilities of the instance away from the do_something() function and give it to the MyClass class? For example.

#Example; using a managed attribute instead of a method.

class MyClass:
   def _do_something(self):
       print 'Doing something...'
   do_something=property(_do_something)
      
def do_something(obj):
   try:
       obj.do_something
   except AttributeError:
       pass
      
if __name__=="__main__":
   do_something(MyClass())

In this example, obj.do_something is a managed attribute instead of a method. The do_something() function no longer needs to test the invoking capabilities of the provided instance. However, more responsibility has been added to MyClass.

Which solution is better? It depends. If our system were simple and we only needed managed attributes for all instances passed to do_something(), the latter approach may simplify things. I think in the majority of cases, the first approach offers more flexibility.
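A slightly more compact take on the first approach uses getattr() with a default value, which folds the attribute lookup and the missing-attribute case into one step. This is just a sketch of an alternative. Unlike the examples above, the method here returns its message rather than printing it.

```python
#Example; looking up a method with getattr() (alternative sketch).

class MyClass(object):
    def do_something(self):
        return 'Doing something...'

def do_something(obj):
    # getattr() returns None if the attribute does not exist at all.
    method = getattr(obj, 'do_something', None)
    if callable(method):
        return method()
    # The object cannot "do something"; quietly do nothing.
    return None
```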

Monday, February 9, 2009

Testing types with boduch

The boduch Python library provides a simple type testing utility function that allows truth tests for both primitive and user-defined data types. Python does offer built-in type testing utilities, but there is currently no unification between user-defined classes and primitive types. Here is an example of how to use the is_type() function.

#boduch type testing example.

from boduch.type import is_type

class MyBaseType:
   pass

class MyType(MyBaseType):
   pass

if __name__=="__main__":
   print "Testing string type."
   print is_type("my string", "str")
   print "Testing boolean type."
   print is_type(False, "boolean")
   print "Testing a class type."
   print is_type(MyType(), "MyType")
   print "Testing for a base class."
   print is_type(MyType(), "MyBaseType")

As you can see, we can significantly cut down the number of lines required for type testing with a single function. Note that is_type() can also test whether the specified instance belongs to an inheritance lattice, as demonstrated in this example.
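
To give a feel for how a name-based test like this could work, here is a minimal Python 3 sketch. It is not the boduch implementation; the function name is_type_sketch() and the TYPE_ALIASES table are assumptions made for this illustration only.

```python
# Hypothetical aliases so that a name like "boolean" maps onto a built-in type.
TYPE_ALIASES = {"boolean": "bool", "string": "str", "integer": "int"}


def is_type_sketch(obj, type_name):
    """Return True if type_name names the class of obj or one of its bases."""
    wanted = TYPE_ALIASES.get(type_name, type_name)
    # Walk the method resolution order so base classes are matched too.
    return any(cls.__name__ == wanted for cls in type(obj).__mro__)


class MyBaseType:
    pass


class MyType(MyBaseType):
    pass


if __name__ == "__main__":
    print(is_type_sketch("my string", "str"))
    print(is_type_sketch(False, "boolean"))
    print(is_type_sketch(MyType(), "MyType"))
    print(is_type_sketch(MyType(), "MyBaseType"))
```

Matching on class names alone cannot tell apart two classes with the same name in different modules, so treat this as an illustration of the idea rather than a robust utility.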

Friday, February 6, 2009

Python CPU Usage

I recently needed to obtain the CPU usage in a Pythonic way. After some searching, I found this example. I've adapted it to fit my needs and to be more general purpose. Here is my version of the example.

#Python CPU usage example.

import time

class CPUsage:
   def __init__(self, interval=0.1, percentage=True):
       self.interval=interval
       self.percentage=percentage
       self.result=self.compute()
      
   def get_time(self):
       # The first line of /proc/stat holds the aggregate CPU counters.
       stat_file=open("/proc/stat", "r")
       time_list=stat_file.readline().split()[1:5]
       stat_file.close()
       # Convert the user, nice, system and idle counters to integers.
       return [int(value) for value in time_list]
  
   def delta_time(self):
       x=self.get_time()
       time.sleep(self.interval)
       y=self.get_time()
       for i in range(len(x)):
           y[i]-=x[i]
       return y   

   def compute(self):
       t=self.delta_time()
       if self.percentage:
           result=100-(t[len(t)-1]*100.00/sum(t))
       else:
           result=sum(t)
       return result
  
   def __repr__(self):
       return str(self.result)
  
if __name__ == "__main__":
   print CPUsage()

Here, we have a class called CPUsage. This class is capable of getting the CPU usage as a percentage or as a raw count of clock ticks. The counters in /proc/stat are measured in USER_HZ units (typically 1/100 of a second), not milliseconds. The output format is selected by the percentage constructor parameter. The interval constructor parameter specifies the length of time we are measuring. This defaults to 0.1 seconds. Finally, the constructor invokes the compute() method to store the result.

The get_time() method returns the user, nice, system, and idle counters from the first line of /proc/stat. These are the numbers needed to compute how many ticks the CPU has been in use for the interval we are measuring.

The delta_time() method takes no arguments. It calls get_time() twice, sleeping for the configured interval in between, and returns the difference between the two readings in the same list format.

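The compute() method calculates the CPU usage percentage as 100 minus the idle share of the elapsed ticks. When percentage is false, it simply returns the sum of all four counter deltas, idle ticks included.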

To me, this makes for a more object-oriented approach. This class can also be modified to suit a different usage scenario.
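
One possible modification, sketched here in Python 3 syntax, is to separate the parsing and the arithmetic from the file access so that each piece can be checked against sample lines. The function names parse_cpu_line(), busy_percentage() and read_cpu_counters() are assumptions for this illustration and are not part of the class above.

```python
def parse_cpu_line(line):
    """Return the [user, nice, system, idle] counters from the 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(value) for value in fields[1:5]]


def busy_percentage(before, after):
    """Return the non-idle share of the ticks elapsed between two readings."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total == 0:
        # No ticks elapsed, for example when the interval was too short.
        return 0.0
    idle = deltas[-1]
    return 100.0 - (idle * 100.0 / total)


def read_cpu_counters(path="/proc/stat"):
    """Read the counters from a Linux /proc/stat style file."""
    with open(path) as stat_file:
        return parse_cpu_line(stat_file.readline())


if __name__ == "__main__":
    # Two made-up readings, used instead of a live /proc/stat.
    first = parse_cpu_line("cpu  100 0 50 850 0 0 0")
    second = parse_cpu_line("cpu  130 0 60 910 5 0 0")
    print(busy_percentage(first, second))
```

On a Linux machine, two calls to read_cpu_counters() with a time.sleep() in between would supply the before and after readings.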

Wednesday, February 4, 2009

How to count Python virtual instructions

Recently, I've had a need to count how many virtual instructions will be executed by the Python virtual machine for a given function or method. It turns out, there is no standard way to do this. But, Python being Python, there is always an easy way out.

The standard library dis module allows us to disassemble the byte-code for any given Python object. However, the challenge is that the results are printed rather than returned. Also, even if the output were returned, we would still need to perform some action that counts the number of instructions.

Here is a simple example of what I came up with.

#Virtual instruction count

import sys
import dis

def my_function(op1, op2):
   result=op1**op2
   return result

class VICount:
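   def __init__(self, obj):
       self.instructions=[]
       # Redirect stdout so that write() receives the dis.dis() output.
       sys.stdout=self
       try:
           dis.dis(obj)
       finally:
           # Always restore stdout, even if the disassembly fails.
           sys.stdout=sys.__stdout__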
      
   def write(self, string):
       if string.strip() in dis.opname:
           self.instructions.append(string)
      
   def count(self):
       return len(self.instructions)

print VICount(my_function).count()

In this example, I have a function called my_function(). I would like to determine the number of virtual instructions the Python virtual machine will execute when this function is invoked. For this purpose, I've created a VICount class that is used to count the virtual instructions. The constructor of this class will accept an object to disassemble. We also initialize the list that will store the names of the virtual instructions that are found.

Next, in the constructor, we need to change where the print statements executed by the dis.dis() function go. We do this by changing the sys.stdout file object to self. This is legal since VICount provides a writable file object interface by implementing a write method.

Finally, in the constructor, we need to restore the sys.stdout file object to its original state.

The write() method is invoked by the print statements executed by dis.dis(). If the string being written is a valid instruction, it is appended to the instruction list.

The count() method simply returns the length of the instruction list. In the case of my_function(), we have six virtual instructions.
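
For readers on Python 3.4 or later, the dis module also provides dis.get_instructions(), which yields the instructions as objects instead of printing them. The sketch below uses it to get the same kind of count without redirecting sys.stdout. The helper names count_instructions() and instruction_names() are made up for this illustration. The exact count varies between Python versions, and it is a count of the instructions in the compiled function, not of the instructions executed at run time.

```python
import dis


def my_function(op1, op2):
    result = op1 ** op2
    return result


def count_instructions(obj):
    """Count the bytecode instructions that dis reports for obj."""
    return sum(1 for _ in dis.get_instructions(obj))


def instruction_names(obj):
    """Return the opcode names in the order they appear in the bytecode."""
    return [instruction.opname for instruction in dis.get_instructions(obj)]


if __name__ == "__main__":
    print(count_instructions(my_function))
    print(instruction_names(my_function))
```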

Tuesday, February 3, 2009

Web-based applications a good idea?

An entry on InfoWorld doesn't think so. Apparently the web browser sucks for applications, and anyone using them is reverting to the client-server model. Well, it is a client (the browser) and it is a server (whatever). The strong case for the web application in the browser is deployment. There is no easier way to distribute a GUI than the web browser. Unless, that is, everyone in the world used Windows.

Of course, the case against the browser is strong too. The same interoperability problems that operating systems experience are shared in the browser domain. In short, your HTML/CSS/AJAX GUI may not work across all platforms (platforms being web browsers in this case).

In terms of GUI interoperability and deployment, the same problems persist. There is no easy way to make everyone happy. And that will always be the case as long as people use different operating systems. There exists no simple unifying solution. Again, with web browsers, you have the interoperability problem. With interpreted languages, it is quite simple to achieve interoperability amongst various operating systems. I think the problem there is deployment. It is a lot tougher to deploy desktop applications that use any kind of data center.

The main problem I see with the suggested solution in the InfoWorld entry is that there is no mention of distributed data. Let's say I take the advice from the entry and all our customers now use our new Java desktop application. Where did the data go? Any modern requirements will scream distributed data. And that is another complexity there is no alternative for.

All these challenges (distributed computing, distributed deployment, and interoperability) have been around for a while now. I agree with the author that, for the most part, web browsers suck for any kind of reliable user interactivity. However, I still don't see any solutions over the immediate horizon that will make the browser go away.

Monday, February 2, 2009

libvirt 0.6.0

Looks like libvirt 0.6.0 is now available. There are several new features, bug fixes, and improvements. I just tested this version with ECP 2.2 and everything works just fine.

Sunday, February 1, 2009

boduch 0.1.0

The 0.1.0 release of the boduch Python library is now available. Changes:

Minor release.
Refactored the interface package.
More API documentation.
