Tuesday, January 10, 2012

Cloud SLA

Levels of customer service that go beyond the basics are difficult to achieve. Disaster recovery in the cloud is perhaps the most difficult thing to offer customers. Never mind the myriad other factors that make implementing cloud infrastructure services difficult in the first place; the sheer volume of data to replicate in order to satisfy SLAs is disheartening. Performing any task, regardless of its simplicity, becomes complex when the data won't fit on a single hardware node. Put simply, cloud infrastructures are the new, more complex systems of today.

Now imagine saying this to a potential customer — "the technology is new, the ideas are new, the problems are challenging, you understand why I can't support you, right?"  Wave goodbye.  There has to be some level of service you're willing to give customers, at least enough that they're able to operate their business using your services.

There are two competing obstacles here. Customers need all the support they can get when facing challenges with your system. On the other hand, cloud customers are typically new customers. They're used to operating in more conventional environments, where the problems with implementing a cloud infrastructure don't manifest themselves. That is, disaster recovery isn't why customers choose one cloud provider over another. That type of SLA has lost some of its persuasive power in a cloud provider context. Yet it remains a reasonable expectation on the customer's part. Are there other solutions?

Clouds and Support
The cloud is different from a public parking lot. In a public lot, I'm free to pull my car in, but I've got to find an appropriate spot. I've got to make sure my car will fit in any prospective spot. If, when I want to leave, someone has taken the liberty of parking directly behind my car, I'm on my own. The cloud is probably more like a private parking lot where I need to pay to park. The difference is that I get at least some guidance on where I can take my car once I've entered. I can feel relatively at ease that nobody is blocking me in.

Support for any software system is essential. Guaranteed support is essential in some circumstances, as is the case with critical services where downtime and other disasters mean lost money for the customer. The trouble with SLAs, for any given software system, is that they're imaginary. They say they'll give your software 99% up-time, for example. If that requirement isn't met, the customer is compensated somehow. But how do you measure lost business opportunity as a result of this SLA violation? How are you compensated for that?
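To put that 99% figure in perspective, here is a rough sketch of how much downtime an up-time guarantee still permits. The function name and the choice of a 365-day year are my own assumptions for illustration; a real SLA defines its own measurement period and exclusions.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year as the measurement period


def allowed_downtime_hours(uptime_percent, period_hours=HOURS_PER_YEAR):
    """Return the hours of downtime an up-time guarantee still permits."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (100 - uptime_percent) / 100


# Compare a few commonly quoted guarantees.
for pct in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(pct)
    print(f"{pct}% up-time allows about {hours:.2f} hours of downtime per year")
```

Under these assumptions, 99% up-time leaves room for roughly 87.6 hours of downtime a year, more than three and a half days. The calculation says nothing about when those hours fall or what business is lost during them, which is exactly the part the SLA doesn't measure.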

Obviously it's much safer to put your system, the critical lifeblood of your business, in a place where these types of events simply do not happen. Operating your application within a cloud service provider definitely helps your chances of survival. Staying responsive, replicating applications and data so they're always available: this is where cloud infrastructures shine. By its very nature, everything inside the cloud environment is virtual and transient, so it can be shuffled around physical hardware effortlessly.

But just like any other service out there, clouds are susceptible to real disasters. The worst possible scenario is highly unlikely, but the consequences of it happening are severe. Remember, though, that however unlikely it is for something bad to happen, you're always better off decreasing those odds. An enticing SLA policy might be providers staying open to one another.

Gardens Ajar
There has been much talk lately about walled gardens and how they're harmful. They're bad not just from a vendor lock-in perspective, but also for the very nature of the web. Without the freedom to move your data around, you're stuck where you are, for better or for worse. A similar problem presents itself in the cloud domain. Sure, it's possible to move your data around from one infrastructure to the next. But how feasible is it? Just because there are no hard limitations, such as technical or legal barriers, doesn't mean it's practical for the customer.

Even if it were an easy feat — taking your virtual infrastructure from one provider to the next — practicality dictates that you're better off finding a quality provider and giving them your business.  After all, you're not going to continually shop around for the best local place to eat.  Once you've found something decent, you stick with it.

Unfortunately, this isn't food we're talking about here — it's a lot more complex.  A lot more unpredictable.  Even your favorite restaurant is going to have an off day here or there, but they're your favorite because they're consistently good.  An overdone piece of meat isn't the end of the world.  But we're talking about systems that need better results than that.  Possibly better than any one provider can offer. So is the solution to distribute your infrastructure across multiple providers?  I think so.  I also think this plays a part in the SLA of any given cloud service.
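Here is a back-of-the-envelope sketch of why spreading across providers could beat any single one. It assumes provider outages are completely independent of each other, which is optimistic (shared networks, shared software, and regional disasters all break that assumption), and the function name and the 99% figures are mine, purely for illustration.

```python
from math import prod


def combined_availability(availabilities):
    """Probability that at least one provider is up, assuming independent failures."""
    values = list(availabilities)
    for a in values:
        if not 0.0 <= a <= 1.0:
            raise ValueError("each availability must be between 0 and 1")
    # The system is down only when every provider is down at the same time.
    return 1.0 - prod(1.0 - a for a in values)


single = 0.99  # hypothetical provider offering 99% up-time
print(f"one provider:  {combined_availability([single]):.4%}")
print(f"two providers: {combined_availability([single, single]):.4%}")
```

Under the independence assumption, two 99% providers together come out at roughly 99.99%. The real number will be lower, and it only holds if you can actually fail over between them, which is where openness between providers comes in.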

Customers use the cloud because they want to be agile, a mindset of those who care deeply about the availability of their systems. Part of being agile means having the ability to adapt when the situation calls for it. So is it up to the service provider to dictate how the customer is able to respond in the face of adversity? If the provider cannot give customers what they need, it's up to the provider to accommodate their migrating elsewhere. Not just as an afterthought, but as a coherent set of APIs that are of practical value and can be utilized on a day-to-day basis. The best cloud SLA is a promise to your customers to be open, to give them the freedom they require to operate in true cloud fashion.