Thursday, June 28, 2012

From File-Like to Stream-Like

The new io module, designed for Python 3 but available in 2.6, gives file handling in Python a commonsensical layer of abstraction. The intent behind the new module is to replace the built-in open function and the methods of file-like objects. But why?

The file-like objects we've used in Python for over ten years already provide a nice abstraction layer between the programmer and the API. The nicest thing about file-like objects is that the API is small, and so easily emulated by other objects that we'd like to use in a file context. The facilities available for working with files in Python aren't perfect, nor is the new io module. The new ideas brought forth by io have had quite a while to coalesce. And since the module is available in Python 2, code written against the new io module should not have to change during the transition to Python 3.
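To make the "small API" point concrete, here is a minimal sketch. The names UpperCaseReader and first_chars are hypothetical, invented for illustration. The wrapper implements nothing but read(), and that is enough for it to stand in wherever a readable file is expected.

```python
import io


class UpperCaseReader(object):
    """Hypothetical file-like wrapper that only implements read()."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def read(self, size=-1):
        # Delegate to the wrapped object and transform the result.
        return self._wrapped.read(size).upper()


def first_chars(source, count):
    """Works with anything that offers read(), real file or not."""
    return source.read(count)


plain = io.StringIO(u"hello stream")
shouting = UpperCaseReader(io.StringIO(u"hello stream"))

print(first_chars(plain, 5))
print(first_chars(shouting, 5))
```

The function never checks what kind of object it was given. That is the whole appeal of the file-like convention, and also why it is so hard to move away from.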

Over the course of many years, the Python community has been busy writing IO code using the native file interface. Lots of code over significant amounts of time yields some common usage patterns that aren't optimal for the vast majority of users. That is to say, there is nothing wrong with the native file implementation — it works as expected. The improvements in the io module aim more toward writing less band-aid code surrounding the file interface. I'm certainly in favour of writing less code, especially for common problems that are encapsulated behind the interface. My question is this — can Python projects successfully transition to a new methodology of treating file-like objects as stream-like?

Good Old Days
Interestingly enough, the good old days of file-like objects are still with us. This is good, because should they have suddenly disappeared, we'd be left with exactly zero functional Python packages. That isn't what the Python development effort is intending — the goal isn't to profess what a select few think is good for the language. Instead, the file-like interface will likely never go away. Like many other provisional decisions made by the core Python development effort, the largest influence is maintaining a community of developers that can keep their code stable.

The Python 2 series has seen some changes, mostly additions, that are by and large aimed at Python 3 adoption, the io module being one of those additions. Needless to say, you can start making your code Python 3 friendly now. The trouble is, with the built-in file interface in particular, where is the motivation to start rewriting code? Because there is a new way of doing it? I think the simplicity of the Python 2 file interface doesn't exactly inspire mass exodus from the norm. What is needed is a real kick that will really excite developers. Something to really demonstrate that writing new code for working with files is worthwhile.

There is a problem there too. The file interface is so useful that we often emulate the API around other objects: file-like objects. So we don't know that we're using a file, and we probably don't care much for that matter, as long as the implementation is hidden from view. The file-like concept is so pervasive that it's almost as though Python has a concept dependency. Python has no notion of a required interface, but is staying file-like mandatory? I whole-heartedly support the idea that, given an opportunity to fix some common use cases without introducing mammoth amounts of code, developers would jump on it.

Brave New API
The key differentiator that the new io module brings to the table is a class hierarchy. The most general class, IOBase, is an abstract base class that does little more than define the API shared by every stream. From here, traversing downward through the hierarchy, we're presented with more specific IO capabilities. For example, RawIOBase is the base class for raw binary streams, and its concrete subclass FileIO deals with low-level system calls. But rather than interact with raw streams directly, we'd probably want to use one of the buffered or text classes layered on top of them. And this is the beneficial design tactic brought forth by this new library: low-level operations that we would typically have to create our own abstractions around are taken care of for us.
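A quick way to get a feel for the hierarchy is to ask a stream which layer it belongs to. The sketch below uses only classes from the io module. The helper layer_of is a hypothetical name of my own, not part of the library.

```python
import io
import os


def layer_of(stream):
    """Report which branch of the io hierarchy a stream sits in."""
    if isinstance(stream, io.TextIOBase):
        return "text"
    if isinstance(stream, io.BufferedIOBase):
        return "buffered"
    if isinstance(stream, io.RawIOBase):
        return "raw"
    return "unknown"


# In-memory streams land in the text and buffered branches.
print(layer_of(io.StringIO(u"some text")))
print(layer_of(io.BytesIO(b"some bytes")))

# Opening in binary mode with buffering disabled hands back a raw FileIO.
with io.open(os.devnull, "rb", buffering=0) as raw:
    print(layer_of(raw))
```

Every one of these objects is also an IOBase, which is what lets code written against the general API ignore the layer entirely.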

A prominent use case would be in dealing with text versus bytes. TextIOBase, for example, will take care of Unicode issues that we're used to dealing with ourselves using the traditional file interface. The distinction between raw bytes and text is a key philosophy in Python 3. How the io module handles these different types by means of the class hierarchy offers a glimpse into this philosophy. If you're writing Python 2 code, which I think most of us still are if we write code that's used in production, this is a simple means to write less boiler-plate code to deal with types. I think the io module works well in taking Python back to the concept of typeless languages, by hiding some of the type woes behind an API.
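Here is a small sketch of the text versus bytes split, using io.open so that it behaves the same on Python 2.6+ and Python 3. The roundtrip helper and its temporary file are assumptions made for the example.

```python
import io
import os
import tempfile


def roundtrip(text, encoding="utf-8"):
    """Write text through the text layer, then read it back both ways."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Text mode: the stream encodes for us on the way out.
        with io.open(path, "w", encoding=encoding) as writer:
            writer.write(text)
        # Binary mode: we see exactly what landed on disk.
        with io.open(path, "rb") as raw_reader:
            raw = raw_reader.read()
        # Text mode again: the stream decodes for us on the way in.
        with io.open(path, "r", encoding=encoding) as reader:
            decoded = reader.read()
    finally:
        os.remove(path)
    return raw, decoded


raw, decoded = roundtrip(u"caf\u00e9")
print(repr(raw))
print(decoded == u"caf\u00e9")
```

There are no explicit encode() or decode() calls anywhere in the helper. The choice of mode and encoding at open time is the only place the distinction shows up.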

The PEP for this module states outright that it took some influence from Java IO. That means we now have buffered IO classes to work with. Why is this important? Predictable performance. Python runs on a lot of disparate operating systems and devices. That means that there are a number of differences between system calls for the same Python application running in different places in terms of latency. Buffered IO provides the necessary means to read or write as much as possible, while hiding the underlying system calls from the programmer. Any application that reads or writes from more than one place concurrently will likely run into issues of unpredictable responsiveness, at least from the user's perspective. There are definitely ways around this using the built-in file API, but pushing that code into a core library seems like the better choice.
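To see what the buffered layer buys us, here is a rough sketch. CountingRaw is a hypothetical raw stream written for this example that counts how often it is asked for data. The exact count depends on the interpreter's buffering strategy, so treat the number as illustrative.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical raw stream that counts requests for data."""

    def __init__(self, payload):
        self._source = io.BytesIO(payload)
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, buffer):
        # Each call here stands in for one trip to the operating system.
        self.calls += 1
        chunk = self._source.read(len(buffer))
        buffer[:len(chunk)] = chunk
        return len(chunk)


raw = CountingRaw(b"x" * 1000)
buffered = io.BufferedReader(raw, buffer_size=4096)

# A thousand one-byte reads against the buffered layer...
data = b"".join(buffered.read(1) for _ in range(1000))

# ...reach the raw layer far fewer times than that.
print(len(data), raw.calls)
```

The caller writes the naive loop and the buffered class absorbs the cost. That is exactly the kind of band-aid code we used to write ourselves.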

The built-in file API isn't going away. Conceptually, the new io module isn't all that different. The core abstractions in the class hierarchy are virtually the same as file-like objects we use today. The challenge is going to be figuring out where it is worthwhile to replace old code with this module. Or, maybe Python IO concepts going forward will force us to rethink our application data in terms of streams and what the rewards of doing so are.

Monday, June 25, 2012

Good Architects Write Code

Writing code makes the difference between a mediocre architect and an outstanding one. There are a number of factors that influence this thinking, not the least of which is that architects are essentially programmers with additional responsibilities. Part of thinking like a programmer is thinking in code, and part of thinking like an architect is thinking like a programmer. That is, no matter how many practices and procedures stand between you as an architect and the software, code must be written.

What is the best way to make this happen? After all, the big picture falls to the architect to sort out. All the stakeholders' expectations must be aligned with what we're actually able to produce. So, then, maybe I can help by contributing some code, or at the very least, reading some code and reviewing it. But does that leave time for my other important architectural activities? Too often, code, and the process of curating it, is neglected by not just architects, but the development process as a whole. Developers recognize that the success of any modern organization rests heavily on the shoulders of its technical foundation: its ability to write and understand code.

When an architect suggests that they have other activities to perform, ones that seem to place code on the back-burner, it's no wonder the negative stigma surrounding software architecture is so prevalent. Architects are just big picture guys, out to do nothing but draw diagrams without consideration of the technological challenges facing developers. That, and other negative perceptions about software architecture, could be partially resolved if architects wrote code.

Team Player
Writing code is a team sport: you try out, you make the team, you pass the ball around, you win, you lose. A narrow analogy might place the architect in the role of coach. Observing the game, commentating on events as they happen, providing feedback, and devising plans. That might work well if software development teams were structured the same as organized sports teams. We often don't have a fixed roster of developers. The teams we're up against are often composed of many more players than ours. Despite these limitations, there is an obstruction in sports that doesn't exist in software development.

The architect can throw on a pair of skates, jump over the boards, and score a goal or two. It's as simple as that really. Being a team player, that is, getting your hands dirty and helping out with the code-level work will bode well for team morale. This is how, as an architect, you can earn some serious respect. Bend over backward to make life easier for programmers, and they'll return the favor, I assure you. Put this way, the architect is playing the role of captain and coach.

Now, this certainly isn't an easy feat. Architects can't simply ignore their duty of ensuring stakeholder satisfaction. As much fun as programming in the wild west can be, architects are programmers with additional responsibilities. Let me put it this way: you're still supposed to look after the architectural description, and basically ensure that the software system itself maintains its architectural integrity. So no, don't forget about those things. Don't drop what you're doing and start your new programmer position just yet; these purely architectural day-to-day jobs are important. Make sure you do get involved with the code from time to time. Even if you're not capable of contributing functional components, ask questions. Earn some respect by taking an active interest in the nuts and bolts.

Nuts and Bolts
Beyond earning respect from developers, it is important that architects understand the nuts and bolts used to glue their system components together. Understanding the big picture and the impact it has on every stakeholder is the most important deliverable in any software architecture. This includes the nuts and bolts.

Developers that disregard the software architect role as nothing more than big up-front design have good reason for this attitude. If we're merely handing down ideals without sympathizing as to how difficult these implementation jobs actually are, then we're doing big up-front design while leaving reality as an afterthought. Reality cannot be an afterthought. Reality is the low-level details, the subtle but important code we need to write that realizes the requirements of the stakeholders.

As an architect, you're also a programmer. Never forget that. Like it or not, if you don't understand the nuts and bolts, you had better make sure you allot time to study them. The low-level components have a big impact on what is ultimately delivered as a software product. This impact ripples from the bottom, all the way up to the users, touching on every use case we've put together that ensures the software can satisfy all requirements. If the implementation was executed well, these nuts and bolts will be encapsulated inside robust APIs that we've designed. As you know, abstractions only go so far in protecting ourselves against unanticipated behaviour of the nuts and bolts.

The software architect is in a good position to understand the quality of the system's abstractions and how vigorously they're prepared to defend against unexpected behaviour. To approach this job, I would look at things in two directions. First, the top-down approach, looking at the requirements and conversing with the stakeholders who may or may not know what they want. This is the business side of the game, important to understanding what the software must do. For everything else, I would take the bottom-up approach, starting with the nuts and bolts. Along this route, you'll eventually meet your business path. Eventually, not immediately; this is an iterative approach to software architecture.

Thursday, June 7, 2012

Making Implicit Use Cases Explicit

If I could attach a number to something, it would be how many implicit use cases materialize during the development phase of any given project. They outnumber the explicitly spelled-out use cases. It's a little difficult to define what, exactly, an implicit use case is. Is it all the valid use cases of the system that were just overlooked? Do use cases include malicious intent by the user?

The trouble is, use cases generically capture the intended usage patterns by users. The user wants to get to point C, so we need to get them through A and B. And this works because it explicitly states the what. If we have a collection of these generic use case statements, we can use them to ensure we are in fact solving the problem we've set out to solve. But when new software is crafted, new problems are introduced — new opportunity for miscommunication and human error.

What happens is the implicit use cases that went unnoticed saturate the design and any code written. Not only do you have to deal with these cases when they're inevitably discovered, but you're also taking a mental hit. Finding missed cases when you think you've got it all covered hurts morale. This is no different from the waterfall approach — when big upfront design fails, it fails hard. Explicating all possible use cases isn't feasible either. This is almost waterfallesque, trying to foresee the unknown. Perhaps a better organizational schema for tackling the problem of implicit use cases is to group them by personae.

Generic Use Cases
The practical aspect of use cases is they capture a generic scenario. For any given function of the system, we can make statements about what that functionality looks like from the actor's point of view. And they're the same for all actors. So we have this relationship between an actor, also generic, and a use case. This says that we can reasonably expect a certain commonality across the users of our software. This assumption is necessary, because you can't sit down with each individual that plans on using your software, interview them, draw up the use cases based on their personal preferences, and make the corresponding code modifications. Practicality dictates that we come up with a template. One that best represents why we're building what we're building.

Generic use cases are challenging enough to manage, even with the reduction in complexity due to common user expectations. Imagine that your software's user-base were all clones of the same person with the same personality traits, the same knowledge of your application, the same everything. With this in mind, you could simply go about documenting how these clones would go about interacting with your application. Every situation presented to them would prompt the same response. So with the human variables removed, you're free to focus on the use cases themselves, and avoid the relationship between the actor and the use case.

By and large, this is how we design software today. We take the set of human clones and apply formulated use cases. We establish the generic link between actor and use case. The positive angle to this approach is that we can easily manage these relationships. Just like when you design a class hierarchy, you put the common structural and behavioral features near the top. This allows our brains to think generically, as opposed to having to consider specializations straight away. Use case definitions work in the same way: we focus on the top level of the class hierarchy, trying to document the common use cases for the common user. This is a good way to get started with the requirements of your system, but just as with class hierarchies, specializations must be put forward.

Specialized Use Cases
The generic, templated approach to use case design work is where implicit use cases originate. Consider our group of user clones again. When they're using the application, everything is fine. But introduce another set of clones, based on a different individual, and we start to see some interesting things happen. Because the generic use cases that our system is based on really only consider one persona, we can't accommodate another. Of course there is some level of overlap between the two groups of human user clones, and that is why we design use cases generically to begin with. But not everything overlaps. There are individual differences that need consideration.

The differences in user personae are the reason we require specializations in the use cases of our software. They're the reason why the relationship between actor and use case is important.

To remedy this problem, we employ user experience design, as is the standard practice today. More often than not, the user experience design activity occurs after the generic use cases of our system have been captured and finalized. This is what our software will do, so let's make sure the experience of doing those things is pleasant for each persona. This works from a user interface perspective, but the improvements seldom end there. Whichever persona we're improving upon will often require more than a simple rearrangement of user interface components. The changes in improving the experience for any software tend to trickle, top-down, into its roots.

The generic approach to use case design is feasible, which is why we do it. My suggestion is that we take it one step further and think more deeply on the user experience design at this stage. User expertise at use case time, perhaps even before the user interface code exists, is invaluable. There are bound to be a few light bulbs that have far-reaching consequences in terms of how your code looks all throughout the system. You don't have to go about this histrionically either. Even two or three personae taken into consideration during use case discussions would help.

Friday, June 1, 2012

Clouds and Innovation

If I were to launch a new business today, I'd be bombarded with the latest and greatest cloud technologies and how they'll make my life better. I'll save on hardware costs because I can virtualize; I'll save on operational overhead because I can operate everything I need to within a single dashboard; I can scale up when demand necessitates.

We all know that marketing doesn't entirely reflect reality. We can do a lot of the purported cloud things that we love to talk about, but there are a lot of other complexities that pop into existence as a result of the cloud culture. You now have a deluge of product and configuration options to sort out. Of these, which assist with what you're trying to accomplish from a business perspective? Not all of them, but certainly some of the cloud product capabilities are relevant to your cause.

What I find interesting about cloud is that we're in a hurry to design and build this mega-infrastructure in a box. An impossible task by most sane measurements, but we're trying. It's not as though we have an end goal. There is no finish line marking the perfect cloud product that everybody must own. But we're trying. And through all this, even the development of features you're not necessarily interested in as they don't directly impact the quality of your business today, we're witnessing some really cool innovations. What impact will this race to build the cloud have on software development projects in the future?

New Contexts
As it turns out, solutions to problems are often applicable in contexts outside the one in which the problem originated. Especially in IT, where software is such a fluid contrivance, easily replicated and easily changeable. One would think with all the software frameworks and libraries out there today, that we would be capable of creating something generic and useful enough to tackle any problem we encounter. Just make a few configuration changes, and you've got the answer to your problem. That, however, could never work because problems that need solving proliferate much faster than software. There is no way that we could design something generic enough to keep pace.

We do have some really good frameworks and commercial products in the technology domain that weren't born out of the desire to tackle every single problem by means of being all-encompassing. Many successful projects that are somewhat generic today, that is, they can be applied in several contexts, were conceived to solve a smaller problem.

Django is perhaps the easiest example I can point to in the Python world. This is the de facto web framework for Python programmers, but that certainly wasn't Django's goal when it was first conceived. Django, in its early days, had the more modest goal of providing the facilities necessary to rapidly build web applications for newspaper content. I won't drill down into all the right decisions that were made over the course of Django's history that gave it such popularity in the Python web community. I will, however, reiterate that because newspaper applications needed to be developed quickly, we now have a web framework that has been used to develop a wide array of web applications.

It's always interesting to watch solutions to problems applied in new contexts. The most successful solutions to technology problems are almost always adaptations from other contexts. The innovations that are currently under way in cloud technology most certainly have practical implications for software design and development in general.

Principles
Sometimes, a technology solution doesn't have a one-to-one translation in a context different from the problem it was meant to solve. Sometimes, you have to derive principles from the solution, and these are valuable. Principles gained from innovating could be as simple as lessons learned. We know what not to do because such and such went wrong in environment this and that. The by-products of innovation are limitless. Sometimes they're real software components that have been simplified and generalized for the masses. Sometimes it's general knowledge.

In the case of cloud technology, we're trying to solve a number of problems that wouldn't otherwise prevail. Scalability, for instance, is an important aspect to cloud infrastructures. How do you accommodate such vast amounts of data, coupled with fluctuating demand, and fortify architectural quality properties? Interoperability — there is a cloud ecosystem out there, and vendors need to emulate one another's API. Migratory data. Moving large chunks of data around networks while retaining consistency isn't an easy problem to solve.

What's interesting about cloud technology is that we're seeing the inverse of the regular innovation pattern. Cloud, depending how you want to define it, solves a general problem. So the long-lasting principles that will someday be beneficial in other contexts, weren't aimed at solving a specific problem as Django did in the newspaper industry. I will say though, there are a lot of hard technical lessons to be learned during this innovation period in cloud technology. We're in the incubation period for ideas and principles that will find themselves in many future software systems.