Friday, October 8, 2010

Cloud Data

Cloud data is probably a better term for what is conventionally referred to as cloud computing. Rather than treating the cloud as a computing resource, we're using it as a massive data store. This is only logical given that we're running out of places to put our information. Under typical circumstances, cloud data will suffice: a place where we can instantaneously fetch the information we need. Despite monumental storage capabilities, applications in the cloud can only process so much data without more CPU cycles and memory. How, exactly, is cloud data complemented by cloud computing resources?

First, I want to walk through how I put my data in the cloud. I have some data, lots of data probably, that I want accessible from anywhere in the world. I find some cloud computing service that allows me to do this. I give them my data and later retrieve the bits and pieces I need. Everything is good. This is a little over-simplified, since what we put into the cloud is structured information. We give social networks profile information. We give photo sharing platforms categorized images. We give cloud service providers virtual machines that are run on our behalf. All these things amount to data we put into the cloud and later retrieve.

Virtual machines are unique in that they represent both data and computing resources. The ultimate purpose of a virtual machine is to run software. But a virtual machine also occupies a significant amount of disk space, as does any other cloud data. If virtual machines fall into the cloud computing category, how is running one in the cloud beneficial? In the end, a virtual machine can only run on one physical machine at a time.

The problem is figuring out how to better utilize computing resources in the cloud: giving applications more CPU cycles and more memory. The distributed nature of the cloud is crucial for concurrency. Each node in the cloud adds another level of concurrency for your application to capitalize on. This is how applications running in the cloud can utilize computing resources. Executing two or more things at the same time will increase your throughput. The nice thing about concurrency in the cloud is that it implies more CPU cycles and more memory because of the distributed physical nodes. Applications equipped to run on multiple nodes are cloud-worthy.
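To make the idea of spreading work across nodes a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: each worker thread stands in for one cloud node, and the names process_chunk, split, and run_on_nodes are hypothetical. Threads on a single machine only model the structure; the extra CPU cycles and memory would come from real, separate nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    """Hypothetical unit of work: sum the squares of a chunk of numbers."""
    return sum(n * n for n in chunk)


def split(data, parts):
    """Split data into at most `parts` contiguous chunks of similar size."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_on_nodes(data, node_count):
    """Process each chunk on its own worker and combine the partial results."""
    chunks = split(data, node_count)
    # Assumption: one worker thread plays the role of one cloud node.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(run_on_nodes(list(range(10)), node_count=3))
```

The application only has to know how to split its work and how to combine the partial results. How many nodes take part is a parameter, which is the property that makes it a candidate for running on more than one machine.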

One way an application can adapt to multiple cloud nodes is by implementing distributed algorithms. A distributed algorithm is an algorithm that takes into account the notion of time and space: the work is spread across separate nodes, and messages between those nodes take time to arrive. Time and space on a single node are taken care of by the operating system. Your application might use some form of a map/reduce algorithm. It might implement a spanning tree, or a routing table of some kind. These are all examples of how an application can utilize cloud computing resources.
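As a rough illustration of the map/reduce idea mentioned above, here is a small word-count sketch in Python. It runs on a single machine; the function names map_phase, reduce_phase, and word_count are hypothetical, and the assumption is that in the cloud each map_phase call could run on a different node, with only the small partial counts travelling over the network.

```python
from collections import Counter
from functools import reduce


def map_phase(document):
    """Map step: count the words in one document, independent of the others."""
    return Counter(document.lower().split())


def reduce_phase(left, right):
    """Reduce step: merge two partial word counts into one."""
    return left + right


def word_count(documents):
    """Count words across all documents by mapping, then reducing."""
    # Each map_phase call touches only its own document, so in principle
    # the calls could be handed to separate nodes (an assumption here).
    partials = [map_phase(doc) for doc in documents]
    return reduce(reduce_phase, partials, Counter())


if __name__ == "__main__":
    counts = word_count(["the cloud", "the data in the cloud"])
    print(counts.most_common(2))
```

The map step needs no shared state, which is why it distributes well. The reduce step is where the nodes' results meet, so keeping the partial results small matters.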

Cloud data by itself isn't enough to meet the computing resource requirements of applications running in the cloud. The cloud is proficient when it comes to storing vast amounts of information and making it globally available. We have yet to utilize the available cloud computing resources to their full potential. The cloud, as a whole, is nothing more than a networked group of hardware nodes. We've put the hard disks to good use. It's up to the developers to design applications that appropriate the CPU and memory in a meaningful way.