Thursday, October 21, 2010

Exceptions From Within

How do you properly raise and handle exceptions? This seems like a straightforward design problem, but exception handling in your code comes with questions about depth. For example, function a() raises an exception. Function b() calls a(). Should b() handle the exception? Or should the exception continue outward in the call stack? The distance in the call stack between where an exception is raised and where it is caught is up to the developer. This includes raising exceptions from within the exception handler itself.

Are these concerns a matter of coding style? Or can we establish a pattern that helps us decide where to raise and handle exceptions? Defining a generic exception handling pattern is hard to do because anything has the potential to raise an exception - intended or otherwise. From a functional perspective, exception handling is all about preventing unexpected events from disrupting your program. From a coding-style perspective, exception handling is all about knowing exactly where any given exception originated. Both are hard to do.

Picture the low-level functions in your code: the atomic functions that don't call anything else. These functions will, at one point or another, raise exceptions. The exceptions raised here will propagate upward in the call stack. Think of an exception as a signal that notifies the rest of your application that something went wrong. The signal is received at each point in the call stack. The only way an exception will stop propagating upward is if a handler for that exception type lives at that point.

Now, picture a higher-level function, one that calls other, low-level functions. If this higher-level function knows about the exceptions that might be raised by the functions it calls, it can define a handler for them. This is how exceptions are handled in a context outside the one in which they are raised. All the handler needs to know about are the types of exceptions that might be raised. This is a key exception-handling concept - exceptions are classes, and each class represents a type of thing that can go wrong. Any number of problems may cause an IOError to be raised. The handler only cares that it is an IOError. Anything else will continue upward in the call stack.
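As a rough illustration of handling by type, here is a minimal Python sketch. The function names (low_level_read, high_level_load, main_entry) and the failure conditions are hypothetical, made up only to show one handler catching IOError while a TypeError continues upward to a handler further out.

```python
def low_level_read(name):
    """Hypothetical atomic function that can fail in two different ways."""
    if not isinstance(name, str):
        raise TypeError("name must be a string")
    if name == "missing":
        raise IOError("cannot read %r" % name)
    return "contents of " + name


def high_level_load(name):
    """Handles IOError only; any other exception type keeps propagating."""
    try:
        return low_level_read(name)
    except IOError:
        # The handler does not care why the IOError happened, only its type.
        return None


def main_entry(name):
    """Sits further out in the call stack and handles what was let through."""
    try:
        return high_level_load(name)
    except TypeError:
        return "bad argument"
```

Calling high_level_load("missing") stops the IOError at the first handler, while main_entry(42) shows the TypeError passing through high_level_load untouched before being caught one level up.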

An exception handler has two components. The first, a try block, attempts to execute something. The second, an except block, conditionally runs if the first block fails. Is it plausible for us to raise exceptions from within the try block? Imagine an exception handler that handles a TypeError. If the try block itself raises a TypeError, the except block will handle it and the exception signal will stop propagating.

Raising exceptions from within the try block can reduce code verbosity. If you call a function that may raise an exception, you call that function inside a try block. Inside that same try block, you may call another function that signals failure without raising an exception. For instance, the function might not return what you expected. In this case, you want your except block to run. It won't, however, because no exception was raised by the function call. Rather than alter the low-level function, you can check the result and explicitly raise an exception that triggers the except block.
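The idea might look something like the following Python sketch. Everything here is an assumption made for illustration: the helper names (parse_count, find_user, describe) are hypothetical, and ValueError is an arbitrary choice of exception type.

```python
def parse_count(text):
    # Hypothetical low-level function: int() raises ValueError on bad input.
    return int(text)


def find_user(users, name):
    # Hypothetical low-level function: never raises, returns None on a miss.
    return users.get(name)


def describe(users, name, count_text):
    try:
        count = parse_count(count_text)  # may raise ValueError on its own
        user = find_user(users, name)    # signals failure with None instead
        if user is None:
            # Check the result and raise explicitly so the except block runs.
            raise ValueError("unknown user: %s" % name)
        return "%s x%d" % (user, count)
    except ValueError:
        # One handler covers both the implicit and the explicit failure.
        return "invalid request"
```

With users = {"ann": "Ann"}, both describe(users, "ann", "x") and describe(users, "bob", "3") end up in the same except block, without touching find_user.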

Although this method of exception handling is useful, it doesn't always feel right. Exceptions are better suited to propagating outward in the call stack than to being raised and handled in the same statement. Another drawback is that you are potentially blocking exception handlers even higher up in the call stack that might take a different approach to handling the same type of exception.

Friday, October 8, 2010

Cloud Data

Cloud data is probably a better term for what is conventionally referred to as cloud computing. Rather than treating the cloud as a computing resource, we're using it as a massive data store. This is only logical given that we're running out of places to put our information. Under typical circumstances, cloud data will suffice: a place where we can instantaneously fetch the information we need. Despite monumental storage capabilities, applications in the cloud can only process so much data without more CPU cycles and memory. How, exactly, is cloud data complemented by cloud computing resources?

First, I want to walk through how I put my data in the cloud. I have some data, lots of data probably, that I want accessible from anywhere in the world. I find some cloud computing service that allows me to do this. I give them my data, and I later retrieve the bits and pieces I need. Everything is good. This is a little over-simplified - we put structured information into the cloud. We give social networks profile information. We give photo sharing platforms categorized images. We give cloud service providers virtual machines that are run on our behalf. All these things amount to data we put into the cloud and later retrieve.

Virtual machines are unique in that they represent both data and computing resources. The ultimate purpose behind a virtual machine is to run software. But they also occupy a significant amount of disk space, as does any other cloud data. If virtual machines fall into the cloud computing category, how is running them in the cloud beneficial? In the end, a virtual machine can only run on one physical machine at a time.

The problem is figuring out how to better utilize computing resources in the cloud - giving applications more CPU cycles and more memory. The distributed nature of the cloud is crucial for concurrency. Each node in the cloud adds another level of concurrency for your application to capitalize on. This is how applications running in the cloud can utilize computing resources. Executing two or more things at the same time will increase your throughput. The nice thing about concurrency in the cloud is that it implies more CPU cycles and more memory because of the distributed physical nodes. Applications equipped to run on multiple nodes are cloud-worthy.

One way an application can adapt to multiple cloud nodes is by implementing distributed algorithms. A distributed algorithm is an algorithm that takes into account the notion of time and space. Time and space on a single node are taken care of by the operating system. Your application might use some form of a map/reduce algorithm. It might implement a spanning tree, or a routing table of some kind. These are all examples of how an application can utilize cloud computing resources.
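To give a feel for the map/reduce shape, here is a small Python sketch that counts words. It is only a single-machine stand-in: a thread pool plays the role of cloud nodes, and the function names (map_chunk, merge, word_count) are hypothetical. A real deployment would hand the chunks to separate machines, which this sketch does not attempt.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(chunk):
    # Map step: each worker counts the words in its own chunk of text.
    return Counter(chunk.split())


def merge(left, right):
    # Reduce step: combine two partial counts into one.
    return left + right


def word_count(chunks, workers=4):
    # Threads stand in for nodes here (an assumption for illustration only).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge, partials, Counter())
```

The map step needs no shared state, so each chunk could be processed wherever it happens to be stored. Only the small partial counts have to travel for the reduce step.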

Cloud data by itself isn't enough to meet the computing resource requirements of applications running in the cloud. The cloud is proficient when it comes to storing vast amounts of information and making it globally available. We have yet to utilize the available cloud computing resources to their full potential. The cloud, as a whole, is nothing more than a networked group of hardware nodes. We've put the hard disks to good use. It's up to the developers to design applications that appropriate the CPU and memory in a meaningful way.