Thursday, December 8, 2011

Threads And Progress Indicators

If your application is to exhibit any simultaneity, it's likely going to use threads. That's assuming you haven't already divided the work and responsibilities up into process chunks — all executing independently of one another and communicating via inter-process messaging.  The multi-processing route, even if there is a framework for building applications with this type of concurrency, is hard.  Multi-threading is hard, let alone having to manage the additional problems of multiple processes communicating with one another.

In an ideal world, there wouldn't be any communication at all between two concurrent flows of control.  Once forked, our logic would happily flow, uninterrupted by the wants and needs of others, to its final destination.  Sadly, no such magic orchestration exists in the real world.  It doesn't exist in the abstract software world either because, like it or not, the software we write is a reflection of what we experience, and to that degree we're limited in creating something more sophisticated.

Having said that, once we accept the dependencies between concurrent flows of control, how can we make use of them?  Is there really a means by which our seemingly independent flows of logic can collaborate, producing useful behavior larger than any individual flow?

Overall Progress
If we're setting out to write a largely asynchronous application, one that uses threads to conduct activities concurrently, the ability of the application to gauge its progress is valuable.  By progress I mean the overall progress of a task composed of multiple, dependent threads of execution.  Dependent threads have to collaborate to produce anything measurable in terms of completed work.  Disjoint threads, those that are truly independent and don't concern themselves with whatever else the application is doing, don't need to produce progress indicators.

However, for those larger tasks where we have multiple threads of control all working toward the end goal, we would like to know how far along each thread is. Is there an end in sight?  Is one of our workers starved of resources and simply spinning its wheels, not contributing to the big picture?  This, of course, assumes we're even able to gauge the completeness of a thread.  Sometimes, by their very nature, threads are intended to be long running: pseudo-processes within the parent, if you will.

The only real type of progress we can actively measure comes from concurrent swim-lanes that race to the end of the pool.  Imagine the swimming pool is a task the application decided to launch.  Inside this pool, each swim-lane represents a thread's route to completion.  Each thread in the pool, of course, is a swimmer taking part in a race.  Only when all swimmers have crossed the finish line do we have a completed task.

It's during the race that the application is interested in gauging the progress of both the race as a whole and the individual swimmers.  Perhaps the application itself is performing several tasks, all in different pools, all with different swimmers. Which race is more interesting to spectators?  The application can't know unless individual threads are continuously supplying progress indication.  But we really shouldn't be exchanging data between threads, right?  Isn't the intent to isolate individual threads of control as much as possible?  Maybe so, but we're not exchanging information relevant to the state of the task's data, only information about the overall application.
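The swimming pool analogy can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names RaceProgress, swim, and run_race are made up for the example, and each "length" stands in for whatever real work a thread would do. Each swimmer reports how far along it is, and the application can ask about a single lane or about the race as a whole.

```python
import threading


class RaceProgress:
    """Thread-safe progress board for one task (one 'pool'). Hypothetical name."""

    def __init__(self, lanes):
        self._lock = threading.Lock()
        # Fraction completed per lane, from 0.0 (at the start) to 1.0 (finished).
        self._done = {lane: 0.0 for lane in lanes}

    def report(self, lane, fraction):
        """Called by a swimmer thread to publish its own progress."""
        with self._lock:
            self._done[lane] = min(1.0, max(0.0, fraction))

    def lane(self, lane):
        """Progress of a single swimmer."""
        with self._lock:
            return self._done[lane]

    def overall(self):
        """Progress of the race as a whole: the mean of all lanes."""
        with self._lock:
            return sum(self._done.values()) / len(self._done)

    def finished(self):
        """The task is complete only when every swimmer has finished."""
        with self._lock:
            return all(done >= 1.0 for done in self._done.values())


def swim(lane, lengths, board):
    """One thread of control: do the work, reporting progress along the way."""
    for completed in range(1, lengths + 1):
        # The real work for this length would happen here.
        board.report(lane, completed / lengths)


def run_race(lane_lengths):
    """Launch one thread per lane and wait for all of them to finish."""
    board = RaceProgress(lane_lengths)
    threads = [
        threading.Thread(target=swim, args=(lane, lengths, board))
        for lane, lengths in lane_lengths.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return board
```

Note that the only thing the threads share is the progress board. Nothing about the work itself crosses from one lane to another.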

Application Thread Data
Data that pertains to the overall well-being of the application, task progress for instance, isn't the same as data that pertains to the nucleus of the task itself. Something that needs to be computed for the user's benefit is usually part of the problem domain, a space that would ideally be encapsulated tightly by the threads that make the computation happen.  This insulation layer, the one that separates application management from problem domain logic, is essential to sharing data between concurrent flows.

So if we're able to establish such tight boundaries by defining what constitutes the problem space, and how the tasks that solve that problem are spawned and subsequently managed by the application, we can identify data that's safe to move around.  Exchanging data between threads in this way isn't such a bad design pitfall.  There is a separation of concerns here: the problem and the implementation.  Keeping these two items distinct is a good programming practice anyway.  How we go about doing it is another matter entirely, because making clean-cut distinctions between the different tasks that form a computational solution to the problem our application solves isn't easy, or even possible in some cases.
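One way to picture this boundary, again as a rough sketch rather than a recommended design: the worker below keeps its problem-domain state (a running subtotal) entirely to itself, and the only thing it publishes while running is a small status record. StatusUpdate, sum_of_squares, and run are names I've invented for the example, and the sum of squares is just a stand-in for a real computation.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusUpdate:
    """Application-management data: safe to pass between threads."""
    worker: str
    completed: int
    total: int


def sum_of_squares(name, numbers, status_queue):
    """Worker: the subtotal is problem-domain state and never leaves this thread
    until the computation is finished."""
    subtotal = 0
    for count, n in enumerate(numbers, start=1):
        subtotal += n * n
        # Only progress metadata crosses the thread boundary while running.
        status_queue.put(StatusUpdate(name, count, len(numbers)))
    return subtotal


def run(chunks):
    """Spawn one worker per chunk; collect final results and status updates."""
    status_queue = queue.Queue()
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        futures = {
            name: pool.submit(sum_of_squares, name, numbers, status_queue)
            for name, numbers in chunks.items()
        }
        totals = {name: future.result() for name, future in futures.items()}
    # All workers are done, so the queue can be drained without racing them.
    updates = []
    while not status_queue.empty():
        updates.append(status_queue.get())
    return totals, updates
```

The status record has no field for the subtotal at all, so the management side of the application can't accidentally come to depend on the state of the solution.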

But you may still manage to find some value in exchanging data between threads: data that serves the overall aim of performing better, or performing smarter, or another goal that falls outside of the context of the problem's solution.  Of course, this requires some abstract thought, perhaps beyond what is worthwhile in the majority of software we're writing.

I chose the progress indicator as an example here because it may prove to be of practical value to the application as a whole.  Code that manages tasks can make informed decisions based on the progress, a simple piece of state about what one of the active swim-lanes is doing.  And that's all it comes down to, really: ensuring that you're not sharing state between threads that is part of the solution. Other information, data about the tasks themselves, might be a smart thing to pass around the thread pool.
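As a small illustration of what an informed decision might look like, here is a sketch that compares two snapshots of per-lane progress and picks out the swimmers who appear to be spinning their wheels. The function name stalled_lanes and the plain-dictionary snapshot format (lane name mapped to a fraction between 0.0 and 1.0) are assumptions made for the example. What the application does with a stalled lane, whether restarting it, reassigning its work, or just logging it, is left open.

```python
def stalled_lanes(previous, current):
    """Return the lanes that are unfinished and made no progress
    between the previous snapshot and the current one."""
    return sorted(
        lane
        for lane, done in current.items()
        if done < 1.0 and done <= previous.get(lane, 0.0)
    )


# Two snapshots taken some time apart by the code that manages the task.
earlier = {"lane-1": 0.2, "lane-2": 0.5, "lane-3": 1.0}
later = {"lane-1": 0.4, "lane-2": 0.5, "lane-3": 1.0}

# lane-1 advanced and lane-3 already finished, so only lane-2 looks stuck.
print(stalled_lanes(earlier, later))
```

The decision is made purely from data about the tasks. The manager never needs to look at what any swimmer is actually computing.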