Wednesday, March 28, 2012

Naming Things

How do we tell two people apart? That is, how do we know that Fred is a different person from William? The easiest way is to compare their names: Fred is a different handle than William, so there's no need for a deeper comparison. This is a great feature of names. They're a shortcut for telling apart things of the same classification, in this case people. But we also need a way to identify the classifications themselves, so these classes need names too: humans, dogs, and laptops. The two naming activities are similar, and yet they present two very different problems.

Classification names, by their very nature, refer to many objects, or more accurately, to many instances of those classifications. If you've got several instances, you need a way to tell them apart. Sure, we could examine in fine detail the differences between two instances of "dog", but it's easier to just learn that the brown dog's name is Bark and the white dog's name is Bite. And therein lies the problem with naming things in a global context: my pet's name carries much meaning for me, my family, and probably my neighbors too, but outside of that tightly-knit group, Bark is just another dog.

Naming instances in computer programs is subject to this same loss of meaning outside of a localized context. What is that context, exactly? It could be any number of things, but I think the problem mostly shows up in large software applications, where the names attached to a large population of instances lose much of their power to identify anything. Is there a need to attach meaning to individual instances in large-scale software architectures? Or can we get by with pools of anonymous instances, their existence a mere side effect of a larger cause?

Why Instances Have Names
Writing software is a telling experience when it comes to understanding why and how things get named. When you really think about it, a large part of your responsibility as a software developer is devising meaningful names. You instantiate an object in your code. Further down in the editor, you tell that object to do something, or maybe you change its state. From very early on, we're told, again and again, to use meaningful names. That advice still helps me now, just as it did back when I was learning the fundamentals. Instance names shed light on the attributes and methods attached to them. Quite literally, those methods need a context in which to operate: the instance. Think of the name as the same sort of context, one our programmer's mind requires in order to comprehend how the object works.

There is something about naming instances that has an almost elusive quality to it: giving something a good, meaningful name that will help my brain understand it when I return several hours later is hard. It isn't just a matter of using real words as opposed to a letter and a number, such as obj1; it's choosing words that convey several intangibles about the instance at a glance. And this really is a powerful code-writing tool. It really does save time. If a good name keeps you from having to second-guess how to use that object, you've just saved yourself a few seconds. Doesn't sound like much, does it? But it's the cumulative effect we're after. It's the comfortable confidence we need to be productive when programming something useful, rather than constantly verifying our assumptions.

Giving a single instance of a particular classifier a meaningful name is difficult enough without having to give many instances good names. And this is where the scaling factor comes into play. As our software systems grow more complex, they're going to create more instances. There is a simple limit to how meaningful any one name can be in a large population.

Scaling With Anonymity
What if, instead of giving instances names, we gave them an arbitrary identity? Well, it turns out we do this already. In programs where we have a list of objects, we can refer to individuals by their index. Perhaps more commonly, we only reference instances indirectly through iterative patterns. By this I mean iterating over a set and interacting with each object in it, either by unconditionally calling a method or by performing more sophisticated logic. The point is that when we designed this iterative behaviour, we didn't visualize an instance of a class; we wanted something done to a group of things. So whenever we iterate over a set like this, we're using instances that we never had to identify directly. We might have given the set itself a name, but we use it to represent a population of objects instead of uniquely identifying each one.
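To make the iteration idea concrete, here is a minimal Python sketch. The `Dog` class, its `feed()` method and the `feed_all()` helper are hypothetical, invented only for illustration. Only the collection, `kennel`, carries a name; the individual instances are reached either by index or through a loop.

```python
from dataclasses import dataclass


@dataclass
class Dog:
    # Hypothetical class: the only state is whether the dog has been fed.
    fed: bool = False

    def feed(self) -> None:
        self.fed = True


def feed_all(dogs: list[Dog]) -> None:
    # "dog" is a placeholder, not an identity: the same call is made,
    # unconditionally, on every member of the population.
    for dog in dogs:
        dog.feed()


# Only the collection is named; the three instances never get
# variables of their own.
kennel = [Dog() for _ in range(3)]

# Referring to one individual by its index rather than by a name.
kennel[1].feed()

feed_all(kennel)
print(all(dog.fed for dog in kennel))  # True
```

The design choice here is deliberate: `feed_all()` was written with a group in mind, not a particular dog, so adding a fourth or a four-hundredth instance requires no new names at all.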

There are many places in code where we can get away with anonymous objects, ones we never reference directly. As with sets, we might have behaviour that interacts with a single anonymous instance. For example, maybe you've designed a method that expects an object implementing a particular interface. This method might accept the instance as a parameter, or it might create the instance and destroy it after the work is done. Either way, it's the interface that matters, not the individual's identity. In smaller contexts, such as within methods, it's easier to give short-lived instances meaningful identities, since they're not competing for the programmer's attention.
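A minimal Python sketch of both cases, assuming a hypothetical `Speaker` interface expressed with `typing.Protocol`. The classes `Dog` and `Laptop` and the functions `announce()` and `announce_default()` are invented for illustration. `announce()` only cares that its argument has a `speak()` method, while `announce_default()` creates an instance, uses it once, and never binds it to a name in its own scope; once the call returns, nothing refers to it and Python is free to reclaim it.

```python
from typing import Protocol


class Speaker(Protocol):
    # Hypothetical interface: anything with a speak() method qualifies.
    def speak(self) -> str: ...


class Dog:
    def speak(self) -> str:
        return "Woof"


class Laptop:
    def speak(self) -> str:
        return "Beep"


def announce(speaker: Speaker) -> str:
    # Only the interface matters here; which individual was passed,
    # or even which class it came from, is irrelevant.
    return speaker.speak().upper()


def announce_default() -> str:
    # The Dog instance is created, used once and discarded without
    # ever being assigned to a variable in this function.
    return announce(Dog())


print(announce(Laptop()))  # BEEP
print(announce_default())  # WOOF
```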

But what about the interfaces themselves? The classes that define how instances will function when they come to life? I'm afraid naming classes and interfaces is not immune to the problem of naming things in large systems. When a system gets bigger, we naturally require more classes and interfaces. The alternative is to stuff more responsibility into existing classes, which doesn't help anyone. We could try to solve the problem by defining anonymous behaviour, as we would with instances. And therein lies the challenge. With anonymous instances, we can at least gain some insight into what they are, or why they exist at a particular point in time. That is, we can infer the class that made them what they are. If you have an anonymous classifier that dictates how a thing operates, how can you reason about it?
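Python makes this asymmetry easy to observe. In the hypothetical sketch below, the helpers `describe_instance()` and `describe_behaviour()` are invented for illustration. An unnamed `Dog()` instance can still be traced back to its class's name, whereas a lambda reports only the placeholder `<lambda>`. A class built at run time with the built-in `type()` and an empty name stands in for the fully anonymous classifier the paragraph above worries about.

```python
from typing import Callable


class Dog:
    def speak(self) -> str:
        return "Woof"


def describe_instance(obj: object) -> str:
    # An anonymous instance still points back to its (named) class.
    return type(obj).__name__


def describe_behaviour(fn: Callable[..., object]) -> str:
    # An anonymous function has no meaningful name to report.
    return fn.__name__


print(describe_instance(Dog()))                 # Dog
print(describe_behaviour(lambda s: s.upper()))  # <lambda>

# A class created at run time with an empty name: its instance works,
# but there is nothing to read about what it is for.
mystery = type("", (), {"speak": lambda self: "..."})()
print(mystery.speak())                   # ...
print(repr(describe_instance(mystery)))  # ''
```

The last two lines are the interesting ones: the object behaves correctly, yet when you ask what made it what it is, the answer is an empty string.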

We can only partially solve the naming problem in software design. There comes a point when you're simply moving the problem around or creating a new one. Names hold meaning for us humans, but there are fundamental limits to how many names we can manage mentally. It's an interesting problem, and one that I'm confident will become more prevalent in the near term.