Wednesday, April 4, 2012

Appliances For The Cloud

For things that run in the cloud, I like the term appliance because it is generic.  And generic is what we're after when designing stuff that runs in heterogeneous environments.  Appliance is meaningful as a term because it encapsulates the notion of hardware and the raw binary that is the operating system.  This is great if we can manage to deploy bare-bones operating systems to multiple cloud providers using a single appliance.  But appliance design should also take into consideration the larger business problems of the distributed application: virtual machines rarely exist in isolation.

That said, what types of problems can we solve at appliance design time?  This is the challenge: ideally we would retain some flexibility for the application code once it is provisioned, but we also want to reuse the appliance in more than one environment.  There are important factors to consider when designing appliances for the cloud, two of which I think deserve close attention.

System Requirements
When you install a piece of software on your computer, the installer is more often than not going to perform some sanity checking.  It checks for things like "this is an x86_64 architecture, I'm not touching that" or "512MB RAM isn't going to cut it".  These are the minimum requirements that affect the software's reliability, its performance, or both.  Not meeting these requirements means you simply cannot expect the software to work as described.
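The kind of rigid, up-front check described above can be sketched in a few lines of Python.  This is a minimal illustration, not any real installer's logic: the supported architectures and the `MIN_RAM_MB` figure are hypothetical, and RAM is passed in as a parameter rather than queried from the OS.  The only library call is `platform.machine()` from the standard library, which reports the host's architecture string.

```python
import platform
from typing import List

# Hypothetical minimums for an imaginary package; real installers define their own.
SUPPORTED_ARCHITECTURES = {"x86_64", "amd64"}
MIN_RAM_MB = 1024


def meets_requirements(arch: str, ram_mb: int) -> List[str]:
    """Return the reasons this host fails the check (an empty list means it passes)."""
    problems = []
    if arch.lower() not in SUPPORTED_ARCHITECTURES:
        problems.append(f"unsupported architecture: {arch}")
    if ram_mb < MIN_RAM_MB:
        problems.append(f"{ram_mb}MB RAM is below the {MIN_RAM_MB}MB minimum")
    return problems


if __name__ == "__main__":
    # A traditional installer simply refuses to continue if anything is reported.
    for problem in meets_requirements(platform.machine(), ram_mb=512):
        print("refusing to install:", problem)
```

The check is all-or-nothing: the host either meets the bar or the software is not installed at all.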

When programmers create software systems, they weigh the limitations of the architecture against the end user's hardware.  A dominant influence on what a system can and cannot do is the hardware resources available to it.  For example, if you're building a game, you would look at the consumer hardware available to your audience and aim for something that will run on a selection of those platforms.  If you're writing a multi-tenant web application, you're going to look at typical server hardware configurations and strive to run well on them.

Designing an appliance for the cloud requires the same hardware considerations, but with a more adaptive attitude.  Deploying to the cloud is a different paradigm than installing software on physical hardware.  With traditional software systems installed on traditional hardware, there is no sense of urgency.  It isn't the end of the world if, say, your current server doesn't have the RAM required by a web application you'd like to use.  You can take time to evaluate alternative hardware configurations or alternative software packages.  Physical hardware is a long-term investment, so it is important to achieve the right mix of system resources and software that can best utilize them.  Hardware in the cloud, on the other hand, is transient, as is the demand for it.

Systems deployed on cloud provider infrastructures succeed because the cloud is the environment best suited to handling fluctuations in demand.  Collective insight over the past several years has been that end users (customers) control the need for hardware resources, not the software itself.  That is, unless you're deploying a large-scale autonomous analytical engine that will suck back any CPU cycles you throw at it.  So let's assume that isn't the case, and that your software doesn't necessitate long-running idle processes in order to function.  Let's also assume that we have no way of predicting what our users will do, or how many of them will do it.

That being the case, our appliances will need to be replicated whenever increased activity creates a need for more hardware resources.  Can we get away with deploying copies of an application that requires at least 2GB RAM?  That is, can we safely assume that the cloud will give us the necessary resources?  I think this is a dangerous assumption to make about provider infrastructures.  Don't forget that clouds are just lots of physical machines; they cannot pack in every single virtual machine while rigidly honouring its hardware requirements.  But if a cloud cannot give my appliance what it wants, perhaps it will make a counter-offer of 1GB RAM.  Should I take the offer?

Remember, we have a backlog of demand to fulfil at this point.  The need for hardware is now, not later.  Urgency trumps ideal hardware environments.  So at the application level, I should be able to make use of whatever limited resources can contribute to satisfying current demand.  I don't have time to think; my application needs to react.  What I can do, however, is use this scenario to my advantage when I'm designing my appliance, knowing in advance that I will need to take whatever hardware I can get, without much vetting.
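One way to make that decision ahead of time is to settle, at design time, on a hard floor below which the appliance refuses an offer, and to make up for smaller instances with more replicas.  The Python sketch below is illustrative only: the 2048MB preferred size, the 768MB floor, the per-instance throughput figure, and the assumption that throughput scales linearly with RAM are all hypothetical, not taken from any real provider or workload.

```python
import math
from dataclasses import dataclass


@dataclass
class Offer:
    ram_mb: int  # memory the provider is willing to allocate to one instance


# Hypothetical figures: the appliance would like 2048MB but can still do useful work at 768MB.
PREFERRED_RAM_MB = 2048
FLOOR_RAM_MB = 768
# Assumption: one preferred-size instance serves this many requests/s, scaling linearly with RAM.
REQUESTS_PER_SEC_AT_PREFERRED = 100.0


def accept(offer: Offer) -> bool:
    """Take any offer at or above the hard floor; urgency beats the ideal size."""
    return offer.ram_mb >= FLOOR_RAM_MB


def capacity(offer: Offer) -> float:
    """Estimated requests/s one instance of the offered size can serve (linear assumption)."""
    return REQUESTS_PER_SEC_AT_PREFERRED * offer.ram_mb / PREFERRED_RAM_MB


def replicas_needed(backlog_rps: float, offer: Offer) -> int:
    """How many instances of the offered size are needed to absorb the backlog."""
    if not accept(offer):
        raise ValueError(f"{offer.ram_mb}MB is below the {FLOOR_RAM_MB}MB floor")
    return math.ceil(backlog_rps / capacity(offer))
```

Under these assumptions, a backlog of 250 requests/s that would need three preferred-size instances needs five of the 1GB counter-offers: `replicas_needed(250, Offer(ram_mb=1024))` returns `5`.  The linear-scaling assumption is the weakest part of the sketch; real capacity figures would come from load testing.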

Ephemeral Existence
Uncertainty.  This is what we're up against in any kind of software development.  The degree to which we cannot predict the events our software must deal with grows along with user counts.  To combat this uncertainty, we try to exploit the elastic nature of cloud environments.  The elasticity comes from our application provisioning and releasing hardware resources to meet demand.  So I think the transient nature of virtual machines deserves as much emphasis at appliance design time as their hardware requirements.

Another challenge with designing software appliances targeted for cloud environments is configuration.  When a virtual machine instantiated from an appliance comes to life, it must inspect its virtual hardware environment, not in great detail, but as a basic guide to what can feasibly be done.  The software needs to determine whether the currently allocated resources can handle a given load.  If not, as in the scenario where the virtual machine is operating with less-than-optimal hardware resources, the software needs to make adjustments.  Perhaps it needs to adjust simple configuration values that tell the system to take on less work.  On the other hand, maybe the virtual machine was provisioned with more hardware capacity than it is used to.  In this case, it might be beneficial for the system to configure itself with a greedy strategy with regard to resources, simply because it can get away with it.
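The boot-time self-configuration described above might look something like the following Python sketch.  It assumes a hypothetical baseline of 2048MB that the appliance was tuned for and a hypothetical 256MB footprint per worker process; the "conservative" and "greedy" strategies mirror the two cases in the paragraph, and the queue limits are placeholders rather than recommendations.  Memory detection uses `os.sysconf`, which is available on Linux and other POSIX systems.

```python
import os

# Hypothetical tuning constants; real values would come from profiling the application.
BASELINE_RAM_MB = 2048  # the hardware profile the appliance was tuned for
MB_PER_WORKER = 256     # assumed memory footprint of a single worker process


def detect_ram_mb() -> int:
    """Total physical memory in MB. Linux-oriented: relies on POSIX sysconf names."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") // (1024 * 1024)


def plan_workers(cpus: int, ram_mb: int) -> dict:
    """Choose a worker count and load-shedding limit from the resources this VM received."""
    cpus = max(1, cpus)
    memory_cap = max(1, ram_mb // MB_PER_WORKER)  # never plan more workers than RAM allows
    if ram_mb < BASELINE_RAM_MB:
        # Less than we are used to: at most one worker per core, and shed load early.
        return {"strategy": "conservative", "workers": min(cpus, memory_cap), "max_queue": 50}
    # As much as or more than we are used to: oversubscribe cores because we can get away with it.
    return {"strategy": "greedy", "workers": min(cpus * 2, memory_cap), "max_queue": 500}
```

At boot, the appliance might call `plan_workers(os.cpu_count() or 1, detect_ram_mb())` once and write the result into the application's configuration; the `or 1` fallback is there because `os.cpu_count()` can return `None`.  Keeping the decision in a pure function of the detected resources means the same appliance image behaves sensibly on whatever hardware the provider hands it.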

Above all else, appliances built for cloud environments have a whole batch of design problems to acknowledge.  I would go so far as to say that appliance design is a separate design activity.  But at the same time, the goals of appliance design (optimal performance and reuse) cannot be achieved independently of the contained software system.  What is necessary, then, is an integrated strategy that places emphasis on appliance quality attributes while the software is being developed.  We need to raise important deployment questions during the software design phase, questions that may not have been relevant in the past.  Even previous releases of current software projects may not have considered these questions about cloud environments.  Cloud infrastructures aren't perfect, and appliances need to handle these imperfections.  The best way to do that is to adopt a strategy that accommodates the transient nature of both demand and the availability of resources.