Monday, February 7, 2011

Implicit Encapsulation

Understanding large data structures in software is hard. Using abstract entities helps us understand what we're building by hiding information about the structure that we aren't interested in. This is what encapsulation is all about. Software objects are simplifications of some idea, so, anything. The cohesiveness of abstractions in a software system, how understandable they are, what information they expose, the impact they have on other abstractions in the system - we don't consider things things once we've hacked something together that works. Of utmost importance is that it works, not how comprehensible the classes are. How do we measure abstraction quality? Encapsulation is a design principle, one that allows us to remove irrelevant aspects from the problem at hand, which makes it hard to objectify. We can look at some trade-offs of various encapsulation approaches. For instance, some programming languages allow us privatize object attributes - hide them from other objects. As developers, have an amplified ability to hide irrelevant details that aren't beneficial to us. How valuable is explicitly changing the visibility of an object attribute so as to hide it?

What makes an attribute irrelevant to the outside world? Not only irrelevant, but dangerous if exposed due to misuse by other developers. Subscribing to this attitude when it comes to information hiding is somewhat useless - your code is always going to be abused no matter how you try to protect it. We can safeguard against misuse to some degree by hiding attributes as you can't change what you can't see. Once I've marked my attribute as private, it is as though it never existed. If we wanted to get fancy with how our abstraction is discerned, we could implement and endless combination of public, private, and protected visibility settings. This type of configuration, we don't want to be stuck with. I can see the code, so I can see the attribute marked as private in one way or another - it isn't as though I'm completely ignorant of it's existence. I just can't use it there is usually no apparent reason as to why. What makes an attribute irrelevant to the outside world?
Private attributes inaccessible to other objects exist as part of the internal structure of the object. They give the object it's character, even if we can't see it from the outside. For instance, public attributes can be derived from private ones. Say I have a Product class with two private attributes - sellerCost and profitMargin. A public attribute, buyerCost, could then derive it's value from the two private attributes. The outside world only cares about the buyerCost, not how it was computed. The two private attributes are encapsulated - nobody can see them. Attempts to access sellerCost or profitMargin are met with failure. Where do we set the value of these private attributes?

The constructor is a good place to set attribute values. Passing object attribute values to the constructor isn't the same as modifying the attributes after instantiation. Think of it this way - creating an object is more than just saying "create a new product". You create objects by saying things like "create a new product with a seller price of $15 and a profit margin of 25%". This creates a product instance, different from those with a $500 seller price and 16% profit margin. If we're able to change these values after the object's existence has been established, we're in effect creating a new object. This is an abstract idea, you can obviously change attribute values all you want once an object is created as long as it isn't static. If we know we're supposed to set attributes in the constructor, we know that we're using an object that relies on initial values that don't change throughout the duration of it's life. Knowing this leads to better understanding of the system under construction because we can see the expected values used by derived attributes.
Objects change state in a running system, meaning, an object's attributes change value. Otherwise, we'd have a completely static system that just creates objects and doesn't do anything interesting with them. In the spirit of encapsulation, one approach to changing the state of an object is by using setters. Setters are nothing more than simple methods that set the value of a private attribute. There are also getters - methods that return the value of a private attribute. All we're doing by adopting this methodology is forcing developers to take a scenic route to storing and retrieving object values. Why not just read the attribute directly? Why not assign a value to an attribute directly? If you have a method that does this for you, you're not improving the encapsulated design of your code. Remember, the idea is to hide data that is irrelevant to the outside world, not to impose an unnecessary data conduit. When the state of an object changes or when data is read from an object, you may want to trigger some event. This is easy to do with setters and getters - set the data, trigger the event. However, this isn't within the scope of encapsulation - if setters and getters provide you with convenience, then by all means, go for it. Just don't assume that you've hidden all irrelevant attributes properly.

We've established that the visibility of an attribute doesn't necessarily dictate the level of encapsulation an object exhibits. It is up to the developer who designs that class to implement an outer wrapper that the outside world can see. Is this possible without explicitly saying so in the code? Can I implement a class that when instantiated, provides an adequate interface to fulfill the object's responsibilities? An interface says more about information hiding than the attribute visibility does. We can explicitly hide stuff all we want, but the interface, the contract says how it'll be used. No developer in their right mind is going to say "how can I abuse this object as much as possible by playing with it's innards even though I shouldn't". We really don't care to know how things work under the hood. We like to think about it as it just works. If the desire to tinker was just too strong, we'd write our own code to do the same thing. Reinventing the wheel is so last decade. When change happens, when inane requirements arrive, the provided level of encapsulation no longer provides the basic necessities of the system. Developers need to start playing out of bounds - we need to rethink what is relevant and what isn't.

At this point, we need to step back and say "hey, looks like a lot of hacking is going on with the Logger class, I think we'd better rethink visibility there". But this never happens. Developers aren't going to say to one another how much a decision to make something private has made their lives miserable. No, it just goes unspoken - privacy is irrevocable. The same predicament holds true for publicly visible attributes - are they forever available to anyone interested or at one point or another does it make sense to hide them? I've always worked under the assumption that visibility is a "set in stone" type of ideology. The reason is simple - if you have something public and you suddenly restrict access, you're asking for trouble. Obviously this is a lot of work to go and fix because we'll no doubt find many subtle issues even after we find an alternative way of doing things since we no longer have access to the weight of the product. You take an integer field that your using somewhere in your application, suppose, innocently enough, your just reading it because you need to perform some calculation. Imagine that. Now this simple act of hiding the integer value, because it is causing problems somewhere else in your code, is now creating a new problem. Now for the other side of the coin - pulling the curtain and displaying something that we didn't even know existed. Great, now I don't have any problems doing what I need to do. I'm free to read any attribute values, I can use them to compute whatever I want and I don't have to worry about picking and choosing or about implementing workarounds. But what about the other problem, the whole thing about developers having access and abusing it? Wouldn't this just break things entirely since I can't trust anyone to do things right and only the most minimalist interface feasible is ever given out to developers? Not exactly. This is actually much safer to do because having something visible, waiting to be read, updated, or deleted entirely for that matter, will not break anything. This is contrasted with taking access away - big problems here. And can we actually plan for this kind of change? We're not exactly going to have error handling for invisible object properties. This chore is for humans to manage.

Another way to think about visibility with regard to object design is file system permissions. These are a lot more fine-grained than those of attribute visibility. With file system permissions, we have the ability to say that someone can read a file, but can't update it. Someone can execute a file while others can't. The number of combinations that can be used on just a handful of files is staggeringly complex. No wonder we don't have something like this in code. Its just trouble waiting to reek havoc. How would that work if we were to think about doing something more fine-grained, like file system permissions? Who would permissions be assigned to? Other software objects? Developers? Would permissions be granted on classes, or individual attributes? Suppose we could manage complexity at this level - a little intimidating isn't it? The problem here is that we're taking some of the creativity out of software design. And this is part of the problem with even simple restrictions like private vs public - we need at least some level of freedom to break things by playing with the internals of the map class. Mistakes will be made, no doubt, but this is part of the process. Object design isn't some black magic that will eliminate the burden of trial and error.
Complexity aside, let's revisit the idea of interfaces. They're more powerful for designing an encapsulated structure than visibility is, or so I'm claiming here anyway. If the contract designates what an object is supposed to do, and not how, we should be content with this as our restriction on how we go about deciding who can access what. Can interfaces guide our decisions regarding access policies? I would certainly think so. Remember, interfaces appropriate the visible attributes, not the hidden ones. How can I be sure that hiding the size attribute of my file class is a good idea until I see how the instances interact with others in the system?

If we don't worry about explicitly hiding information, we save time and effort. We're not concerning ourselves with what makes sense to expose, and what doesn't. In Python, we don't specify the visibility of an attribute or operation. It feels like a weight has been lifted when we adopt this attitude. It feels like I've put up a sign on my class that says "use at your own risk". Even in languages such as Python, there are feeble attempts made to make data private. One such method is prefixing the attribute name with an underscore. This doesn't actually reflect the verity of encapsulation. All we're doing here is sending a visual cue to human readers from the source code. This isn't necessarily wrong, it looks bad, but isn't wrong. I think it sends the wrong message however, because, using an arbitrary marker to signify an out of bounds entity, only signifies privacy, not why. It is probably better to use a descriptive name, one that makes readers think - I bet that attribute is private.

So what is the benefit to explicit data encapsulation in object-oriented software? Information hiding is the key idea that allows us as developers to take a concept and transform it into something more abstract. It is the inner details we often don't care about when we encounter something useful. In, the real world, the products we use on a daily basis have replaceable components. They also have internal parts that we're blissfully ignorant of. Had these inner pieces of an object's core been exposed, we'd risk damaging it. This is why we want to hide these details. This is the secret sauce that makes the thing work. It is the nature of the object. So applied to software, this idea works well - we don't want to expose the innards of our software components to other developers. I'm sure that enforcing these privacy characteristics isn't necessary. It is better to use a contract, this means defining an interface for you're object, either explicitly, or implicitly. Remember, interfaces do not describe the inner-workings of software objects. Interfaces are contracts and are only capable of describing the externally-visible behavior and or data of software objects. Using interfaces as the blueprint for visible properties, instead of explicitly hiding information is more descriptive for developers and thus more valuable.