Sunday, December 18, 2011

Cloud Controllers

The cloud is all about making new resources available to computing consumers. Not hardware consumers, but processing, memory, and storage consumers.  This distinction, I think, is sometimes overlooked by those of us in the industry.  These resources, ultimately, belong to the application — the customer doing the provisioning doesn't necessarily care about these things.  In the end, their job is to make sure the software they're administering runs effectively.

The availability of physical hardware, or in the cloud's case, virtual hardware, is how we're able to achieve optimal usage of deployed software systems.  Of course, as a consumer, I want to give my application everything it needs to perform.  Not just at the barely acceptable level, but at the screaming fast level.

If I'm to get any of these things from a cloud service, I've got to make sure that the resources my application needs are there ahead of time.  So maybe I'll be proactive and give it more memory than it actually needs?  The issue is that I'm looking at the application from outside of the box.  I can peek my head inside to get a general idea of what's happening and how I can improve the situation.  But I get nothing more than a general idea.  The question is, how can I really know what the application is expecting based on current conditions?  What does it need?  And can I automate this process without writing any code?  Maybe, but I think we need to step back and look at cloud services and what they offer to the applications, not just the users who're initiating the environment.

The User Focus
Cloud infrastructure services have friendly control panels for customers.  Control panels, at least in the context of a cloud environment, should hide some of the ugliness involved with provisioning new resources.  A customer sees a limited set of choices in how they can deploy their application — select from a list of different memory profiles.  Select your required bandwidth.  How much storage will you need?  A form not all that different from something typical of any web application we're used to.

The end result of this process?  The application is deployed — with the hardware it needs.  All the work involved with finding the right physical node in which to place this new virtual machine takes place under the covers.

As with any application, this is a sound principle — take a complex task such as provisioning new virtual machine resources and encapsulate the complexity behind a user interface.  This is what successful applications do well — they're successful because they're easy to use.  There is one broad category of user and we're catering to them — to making life easier in how they interact with the cloud.

The problem that I see, or oversight anyway, with this approach is one of priority. When we're designing software systems, one of the first activities is identifying who'll be using the system.  So who uses cloud service providers?  Well, folks who want to provision new applications without the overhead involved with allocating physical hardware to fit their needs.  An opportunity I see with cloud is automation. Not just a simplified interface for system administrators, but a means for deployed applications to start making decisions on what needs to happen in order to perform optimally.

The Application Focus
Some environmental changes that take place are obvious — like the number of registered users jumping from one hundred thousand to five hundred thousand.  These changes are somewhat straightforward for the administrator to handle — they're not exactly critical to how the service will respond to demand over the next few hours.  This type of environmental change takes place over larger amounts of time — a duration suitable for humans to step in and relieve the situation.  If we're seeing a growth trend in terms of registered users, maybe we'd be smart to assume that we'll need a more robust collection of hardware in the near term.

Now, what about when the timeline of these events — the rising demand on our application's availability — is compressed into something much smaller, like under an hour?  If all the resources our application has to store, compute, and transfer aren't enough to handle the current state of usage, then we'll see a change in behavior.  Sorry, the users will see a change.  But, thankfully, advanced monitoring tools we deploy to the cloud beside our applications can easily tell us when the application is experiencing trouble and the cloud needs to send more virtual hardware to the rescue.

Even if this isn't an automated procedure, it's still trivial for the application's monitoring utilities to notify the administrator, who can then provision another instance of the application server to cope with the request spike.  In this scenario, there may only be a limited window in which users experience unacceptably poor response times.  But this is often automated too — it doesn't take a system administrator to determine that there aren't enough available resources to fulfill the application's requirements under current conditions.

Google App Engine is a good example of how something like this is automated. Each application deployed to App Engine has what are called serving instances. These are decoupled instances of the application that don't share state with other services.  As the load increases, so does the number of serving instances to help cope with the peak.  Just as importantly, as the peak slowly winds down and the pattern of user behavior returns to normal, App Engine kills off superfluous serving instances.
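To make the idea concrete, here's a minimal sketch of a load-based scaling decision.  This is not how App Engine actually decides anything.  The function name, the per-instance capacity figure, and the instance limits are all assumptions I've made up for illustration.

```python
import math


def desired_instances(requests_per_second, capacity_per_instance,
                      min_instances=1, max_instances=20):
    """Return how many serving instances the current load calls for.

    All parameters are hypothetical; a real platform weighs far more
    than raw request rate.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(requests_per_second / capacity_per_instance)
    # Clamp so we never scale to zero or grow past the allowed ceiling.
    return max(min_instances, min(max_instances, needed))


# Load climbs to a peak and then winds back down.
for load in (40, 250, 900, 300, 20):
    print(load, desired_instances(load, capacity_per_instance=100))
```

The same function handles both directions: as the load figure falls, the desired count falls with it, which is the "kill off superfluous instances" half of the story.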

There are many ways to automate application components to help cope with what users are doing — to prevent one user from sucking available CPU cycles away from others.  Provisioning new serving instances within the cloud environment, for example.  But does this really take into account what the application is actually doing and what's likely to change in the imminent future?  To do that, code inside the application needs to take samples of internal state and propagate these changes outward — toward the outer shell of the application — perhaps even into the cloud operating environment in some circumstances.
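Here's a rough sketch of what sampling internal state and pushing it outward might look like.  The class name, the probe idea, and the sink callable are my own assumptions.  In practice the sink would be whatever the cloud environment offers for receiving measurements, which is exactly the piece I'd like providers to supply.

```python
import json
import threading
import time


class InternalStateSampler:
    """Collects named measurements from inside the application (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._probes = {}

    def register(self, name, probe):
        # A probe is any zero-argument callable returning a JSON-friendly value.
        with self._lock:
            self._probes[name] = probe

    def sample(self):
        # Copy the probes under the lock, then call them outside of it.
        with self._lock:
            probes = dict(self._probes)
        snapshot = {name: probe() for name, probe in probes.items()}
        snapshot["sampled_at"] = time.time()
        return snapshot

    def publish(self, sink):
        # The sink stands in for the outer shell: a log, a socket, a cloud API.
        sink(json.dumps(self.sample(), sort_keys=True))


pending_jobs = ["resize", "encode", "upload"]
sampler = InternalStateSampler()
sampler.register("queue_depth", lambda: len(pending_jobs))
sampler.publish(print)
```

The application only describes itself here.  What to do about a deep queue is left to whoever is listening on the other side of the sink.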

The trouble isn't that it's not possible to take into account the inner workings of our applications — it's that it isn't a high priority for cloud service providers.  It's easy to alter applications deployed to the cloud — to take measurements and make them available to other services that could potentially react to those measurements.  The trouble is, there is simply too much code to write — too much of the burden is put on the customer and not the service provider to offer APIs that can help applications operate effectively in the cloud.

Thursday, December 8, 2011

Threads And Progress Indicators

If your application is to exhibit any simultaneity, it's likely going to use threads. That's assuming you haven't already divided the work and responsibilities up into process chunks — all executing independently of one another and communicating via inter-process messaging.  The multi-processing route, even if there is a framework for building applications with this type of concurrency, is hard.  Multi-threading is hard, let alone having to manage the additional problems of multiple processes communicating with one another.

In an ideal world, there wouldn't be any communication at all between two concurrent flows of control.  Once forked, our logic would happily flow, uninterrupted by the wants and needs of others, to its final destination.  Sadly, no such magic orchestration exists in the real world.  It doesn't exist in the abstract software world either because like it or not, the software we write is a reflection of what we experience and it's to this degree that we're limited in creating something more sophisticated.

Having said that, and recognizing the dependencies between concurrent flows of control, how can we make use of them?  Is there really a means by which our seemingly independent flows of logic can collaborate with one another — producing useful behavior larger than the individuals?

Overall Progress
If we're setting out to write a largely asynchronous application — one that uses threads to conduct activities concurrently — the ability of the application to gauge its progress is valuable.  That is, the overall progress of a task composed of multiple, dependent execution threads.  Threads of control have to collaborate to produce anything measurable in terms of completed work.  Disjoint threads — threads that are truly independent and don't concern themselves with whatever else the application is doing — don't need to produce progress indicators.

However, for those larger tasks where we have multiple threads of control all working toward the end goal, we would like to know how far along each thread is. Is there an end in sight?  Is one of our workers starved for resources and simply spinning its wheels — not contributing to the big picture?  This, of course, assumes we're even able to gauge the completeness of a thread.  Sometimes, by their very nature, threads are intended to be long running — pseudo-processes within the parent if you will.

The only real type of progress we can actively measure from concurrent swim-lanes is that of swimmers racing to the end of the pool.  Imagine the swimming pool is a task the application decided to launch.  Inside this pool, each swim-lane represents a thread's route to completion.  The threads in the pool, of course, are swimmers taking part in a race.  Only when all swimmers have crossed the finish line do we have a completed task.

It's during the race that the application is interested in gauging the progress of both the race as a whole and the individual swimmers.  Perhaps the application itself is performing several tasks — all in different pools, all with different swimmers. Which race is more interesting to spectators?  The application can't know unless individual threads are continuously supplying progress indication.  But we really shouldn't be exchanging data between threads, right?  Isn't the intent to isolate individual threads of control as much as possible?  Maybe so, but we're not necessarily exchanging information relevant to the state of the task's data — only the overall application.
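The swimming pool can be sketched in a few lines of Python.  This is only one way to do it, and the names (RaceProgress, swim, the lane labels) are ones I've invented for the example.  Each swimmer thread reports a fraction of completed laps, and the application reads either a single lane or the race as a whole.

```python
import threading


class RaceProgress:
    """Progress of one task (the pool) made up of several threads (the swimmers)."""

    def __init__(self, lanes):
        lanes = list(lanes)
        if not lanes:
            raise ValueError("a race needs at least one lane")
        self._lock = threading.Lock()
        self._progress = {lane: 0.0 for lane in lanes}

    def report(self, lane, fraction):
        # Called by a swimmer thread; the value is clamped to the 0.0-1.0 range.
        with self._lock:
            self._progress[lane] = max(0.0, min(1.0, fraction))

    def lanes(self):
        with self._lock:
            return dict(self._progress)

    def overall(self):
        with self._lock:
            return sum(self._progress.values()) / len(self._progress)

    def finished(self):
        with self._lock:
            return all(p >= 1.0 for p in self._progress.values())


def swim(lane, laps, race):
    for lap in range(1, laps + 1):
        # The real problem-domain work for this lap would happen here.
        race.report(lane, lap / laps)


lane_names = ["lane-1", "lane-2", "lane-3"]
race = RaceProgress(lane_names)
swimmers = [threading.Thread(target=swim, args=(lane, laps, race))
            for lane, laps in zip(lane_names, (4, 10, 25))]
for swimmer in swimmers:
    swimmer.start()
for swimmer in swimmers:
    swimmer.join()
print(race.overall(), race.finished())
```

Notice that the only thing crossing thread boundaries is the progress fraction.  Nothing about the laps themselves, the problem-domain data, ever leaves the swimmer.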

Application Thread Data
Data that pertains to the overall well-being of the application — task progress for instance — isn't the same as data that pertains to the nucleus of the task itself. Something that needs to be computed for the sake of the user's benefit usually means that it's part of the problem domain — a space that would ideally be encapsulated tightly by threads that make the computation happen.  This insulation layer, the one that separates application management from problem domain logic, is essential to sharing data between concurrent flows.

So if we're able to establish such tight boundaries by defining what constitutes the problem space, and how the tasks that solve that problem are spawned and subsequently managed by the application, we can identify data that's safe to move around.  Exchanging data between threads in this way isn't such a design pitfall.  There is a separation of concerns here — the problem and the implementation.  Making sure these two items are distinct in their implementation is a good programming practice anyway — how we go about doing it is another matter entirely.  That's because making these clean-cut distinctions between the different tasks that form a computational solution to the problem our application solves isn't easy.  Or even possible in some cases.

But you may still manage to find some value in exchanging data between threads: data that serves the overall aim of performing better.  Or performing smarter.  Or another goal that falls outside of the context of the problem's solution.  Of course, this requires some abstract thought — perhaps beyond what is worthwhile in the majority of software we're writing.

I chose the progress indicator as an example here because it may prove to be of practical value to the application as a whole.  Code that manages tasks can make informed decisions based on the progress — a simple piece of state about what one of the active swim-lanes is doing.  And that's all it comes down to, really — ensuring that you're not sharing state between threads that is part of the solution. Other information — data about the tasks themselves — might be a smart thing to pass around the thread pool.
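As one example of an informed decision, the managing code could spot swimmers that have stopped making progress.  The sketch below assumes each lane records a progress fraction and the time of its last report.  LaneStatus, stalled_lanes, and the timeout value are hypothetical names and numbers, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class LaneStatus:
    fraction: float      # share of the lane's work completed, 0.0 to 1.0
    last_update: float   # time of the most recent progress report, in seconds


def stalled_lanes(statuses, now, timeout):
    """Return the unfinished lanes that haven't reported within `timeout` seconds."""
    return sorted(
        lane for lane, status in statuses.items()
        if status.fraction < 1.0 and now - status.last_update > timeout
    )


statuses = {
    "lane-1": LaneStatus(fraction=1.0, last_update=10.0),  # finished long ago
    "lane-2": LaneStatus(fraction=0.4, last_update=12.0),  # silent for 48 seconds
    "lane-3": LaneStatus(fraction=0.7, last_update=58.0),  # reported 2 seconds ago
}
print(stalled_lanes(statuses, now=60.0, timeout=30.0))  # prints ['lane-2']
```

What the manager does with a stalled lane (restart it, reassign the work, or just log it) is a separate decision.  The point is that it decides using data about the task, not data from the task.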

Monday, December 5, 2011

Mind The Aperture

Picture each software system as a gigantic puzzle — an admittedly over-simplified analogy.  Each piece, so the visualization goes, is an encapsulated component with boundaries and responsibilities.  Only when each puzzle piece is connected do we have a complete software system — the big picture coalesces on the landscape.

There's a problem with this analogy, though — one that might reflect some of the challenges we face in grasping the true depth of problems our code aims to solve. You see, the big picture — the one resulting from an assembly of puzzle pieces — isn't a fundamental given.  We haven't anything concrete to serve as a reference picture.  How do we know when the system is ready?

Gaps In Foresight
The waterfall approach to software development doesn't work because of a fundamental misstep in presumed knowledge.  Knowledge before the construction phase starts.  Apertures inevitably rear their heads only after development has started.  If we are to assume that we won't hit these gaps in knowledge during the development of our product, we're ill-prepared to deal with them right from the get-go.

So what do these gaps represent?  Do they mean that there are serious problems with our understanding of what we've set out to build?  Do they mean that we've simply miscalculated the complexity involved?  Was our team naive in choosing the required technology platform on which our software runs?  Any one of these could be true, and the only way you'll find out is to try building it.  The trick is, if you can eliminate the painfully obvious gaps in knowledge before you start your endeavor, you can be sure that you're on the right path.

Wide gaps in foresight are preventable.  Perhaps even mid-size gaps can be identified and evaluated with no more than a little research.  It's the smaller ones to look out for.  These gaps in our assumed know-how will come back to bite us once we've put in any real development effort.  The trouble with small gaps in understanding what the software project is all about is that they tend to progress alongside the project.  As the project grows in size, the components that we've designed — the interfaces that serve as the basis of our architecture — are lodged throughout the code.  And in the same way the newly-fashioned design parades through the system, establishing its roots as the basis on which the rest of the code builds, the small gaps in our understanding take hold and forever change the future of the project.  That is, unless we can learn to differentiate between good gaps and bad gaps.

Leaving Gaps Alone
Who says all missing pieces are necessary, that all potential gaps of any software system must be filled?  I think this is another misconception that we all experience at one point or another during our lives of building software.  We have to somehow understand and implement all potential use cases of the system, including those that stakeholders have yet to consider.  These aren't exactly easy to come up with either.  We have to put ourselves in the shoes of users.  Of administrators.  Of support staff.  Anyone.

The trouble is, once we start thinking about where the requirements fall short, we start playing a guessing game.  We generally ask whoever the gap concerns what they want done about it — how should we implement this scenario? Chances are, they don't know.  If they'd thought about it, it'd be in the requirements now wouldn't it?  But at this point in the game, it's difficult to think responsibly about these gaps in our understanding of what the software does.  There is no system in production yet. So how does one quantify changes that should be introduced without any knowledge of how the system works in reality?

Asking a stakeholder about a gap in the system during development won't necessarily help you much because they need to think about it.  Or worse, no thought goes into how to fill holes in the system and it's just a workaround that plugs the gap.  In the best case, we will sit down with the stakeholder and actually think about what the gap represents, how it should be dealt with, and what the significant architectural changes are.

Keep in mind — this is still development.  We've now got a road block in place, slowing the continuous development effort.  While the issue details are being ironed out, there really isn't much that can be accomplished in terms of writing code. Pushing forward without a full understanding of what the system goals are, we could be writing the wrong thing.

Having said all that, I think it's better to leave gaps alone during the development phase.  Just as long as they're understood.  Understanding the implications of missing requirements, from all perspectives of the project, will enable us to move forward in pounding out lines of code.  Nothing makes stakeholders happier than a system delivered sooner rather than later — regardless of gaps in what the system does. The key is, as I mentioned, how the system deals with gaps in understanding.  If you allow for stakeholders to view a functional system, and can point out the gaps in a meaningful way, they're more apt to make the right decision and fill them with something useful.  Just be sure to produce a working system, one that knows about the gaps but will still work.  Point out what the gap means for every aspect of the software.  It's much easier to make these arguments when you've got a running software system to back up your claims.
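One way a system might "know about the gaps but still work" is to register each unresolved scenario explicitly and give it a harmless placeholder behavior.  This is just a sketch of that idea.  The known_gap decorator, the KNOWN_GAPS registry, and the refund scenario are all invented for illustration.

```python
import functools

# Registry of acknowledged gaps: function name -> plain-language description.
KNOWN_GAPS = {}


def known_gap(description, fallback=None):
    """Mark a function as an acknowledged gap that returns a fallback value."""
    def decorate(func):
        KNOWN_GAPS[func.__name__] = description

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # The undecided logic is never run; the system keeps working.
            return fallback
        return wrapper
    return decorate


@known_gap("No rule yet for refunds on partially shipped orders",
           fallback="needs-manual-review")
def refund_partial_shipment(order_id):
    raise NotImplementedError


print(refund_partial_shipment(42))
for name, description in KNOWN_GAPS.items():
    print(name, "->", description)
```

The registry doubles as the list you walk through with stakeholders once they have a running system in front of them.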