Thursday, January 5, 2012

Network Protocol Methods

Computer systems need to communicate with one another. One computer sends data to another, assuming the other is listening. Down at the wire, the medium over which network information travels, communications are nothing more than electrons or photons. At this level, the protocol is governed by the laws of physics. The mechanics of how the information is physically transported from point A to point B aren't relevant from the software perspective. My system has data it wants to deliver to a remote region of the world. As far as my software is concerned, the hardware could relay it via horseback. It might take a year, but it would get there.

Of course, only the fastest delivery of information over networks will satisfy the demands of today's information-hungry applications.  There is so much information to absorb, it could not possibly fit in an isolated environment.  Information moves so fast that the only practical means of staying current is to continuously synchronize with other computers that have newer information.  In this regard, the barrier between stored information and information elicited from foreign sources has been permanently blurred.

How did we get to where we are today, and are we doing it effectively? Big data is only big because software systems are decoupled, conjoined by cables and airwaves. Big data translates into big connectivity. Big connectivity can only mean that software systems that talk to one another also understand one another. There is an agreed-upon protocol that computer systems use to pack and unpack transmitted data. But there is more to network protocols than how to extract information from data. The web works the way it does because the protocol has application-level constructs that influence behavior. What can we expect to see with future network protocols? Will current protocols evolve into something that can support the ever-growing vastness of our connectivity?

Sending and Receiving
Data doesn't traverse a network spontaneously. Applications direct the data, influencing where it goes, how it gets there, and what form it assumes during its journey. These properties of network traffic are controlled through network methods. A method pertaining to a network is something that an application uses to send and receive data. Developers have a choice: we can get down to low-level socket details, specifying properties on individual network packets, or we can use higher-level protocol methods that are more reflective of the architecture used at the application level.

Before we delve into the stuff applications use to make networks comprehensible from a development perspective, maybe it'll help us to think a little more about why additional semantics are necessary. That is, why can't we simply use send and receive methods to exchange our data?

On the one hand, send and receive are simple and straightforward. A handful of properties can tweak how our data is transferred and accepted on the remote end. Contrast this with a dozen methods to choose from, each with its own properties, and a matrix of complexity emerges. Conceptually, two methods mean there are relatively few abstractions in the protocol to concern ourselves with. We've got senders and receivers. We've got packets and routes. We've got destinations and sources. What this amounts to is point A and point B, with a few added details underneath.

The fact that we're able to use these seemingly low-level methods to control the flow of information over a network is due to even lower-level networking ideas. These ideas sit beneath the IP layer of any network protocol stack. The work the operating system handles at the link layer takes care of a tangled mess for us. If not for the operating system providing us with abstractions to overcome challenges associated with hardware compatibility, just imagine what the networking code for any application would look like. So with the lowest-level network protocol details out of the way, the operating system has provided us with two basic methods, at the most fundamental conceptual level.

These two basic methods give applications everything they need to establish a connection and to communicate with remote entities, with very little networking code inside the application. The challenge introduced by having such simple network protocol methods is in the data itself. It's easy to send and receive data. It's difficult to make sense of that data.
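To make the point concrete, here is a minimal sketch of the two basic methods in Python. It assumes a pair of already-connected sockets from socket.socketpair() as a stand-in for two machines, and the payload is a made-up record. Notice that the receiving side gets its bytes back intact but has no idea what they mean.

```python
import socket

# socketpair() returns two connected endpoints; here they stand in for a
# local application and a remote one (an assumption for illustration).
sender, receiver = socket.socketpair()

payload = b"42,widget,3"  # hypothetical record; the wire sees only bytes
sender.sendall(payload)
sender.close()  # closing tells the receiver that no more data is coming

# recv() may hand back the data in pieces, so read until the stream ends.
chunks = []
while True:
    chunk = receiver.recv(1024)
    if not chunk:
        break
    chunks.append(chunk)
receiver.close()

received = b"".join(chunks)
print(received)
```

Is 42 an identifier or a price? Is 3 a quantity or a version? Nothing in send or receive can answer that, which is the gap the rest of this post is about.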

Application Protocols
With the amount of data that needs to be transferred between modern applications over modern networks, the simple send and receive methods do not suffice on their own. Even small applications without many complex features struggle to make sense of the data they're exchanging. This is due to the complexity of the data, and the diversity of it. Chances are, any given application you're using communicates with another service. The two parties need to understand one another, and the receiver needs to know how to unpack the data upon arrival. This works well at the operating system level, where we can send and receive raw data with ease. But in the context of an application, there needs to be an agreed-upon format, a standard, before that data becomes information.

This is why standardized data formats exist: to make communicating over a network easier for applications. If two applications want to talk to one another, they might decide to use XML as the exchange format. This way the sender knows how to pack the data and the receiver knows how to unpack and use that data. The idea of standardized formats definitely makes life in a network much easier for developers in organizations independent of one another. One service makes interesting data available in a particular format, perhaps even a selection of formats. External entities can then consume that data, knowing how to read it and how to transform it into information of value.
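As a rough sketch of what that agreement looks like, here is a sender packing a record as XML and a receiver unpacking it, using Python's standard xml.etree.ElementTree module. The order element, its id attribute, and the item and quantity children are hypothetical names chosen for illustration, not part of any real schema.

```python
import xml.etree.ElementTree as ET

def pack(order_id, item, quantity):
    """Sender side: turn application values into XML bytes for the wire."""
    root = ET.Element("order", attrib={"id": str(order_id)})
    ET.SubElement(root, "item").text = item
    ET.SubElement(root, "quantity").text = str(quantity)
    return ET.tostring(root, encoding="utf-8")

def unpack(data):
    """Receiver side: turn XML bytes back into typed application values."""
    root = ET.fromstring(data)
    return {
        "id": int(root.get("id")),
        "item": root.findtext("item"),
        "quantity": int(root.findtext("quantity")),
    }

wire_bytes = pack(42, "widget", 3)  # what actually travels over the network
order = unpack(wire_bytes)          # what the receiving application works with
print(order)
```

Both sides only need to agree on the element names and their types. Neither needs to know anything about how the other is implemented.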

Standardized formats that applications send over the wire are different in nature from network methods. The data and its assumed form are the what of the larger network picture. The methods are the how. Part of the problem we're seeing with the ever-growing complexity of network-enabled applications is that the how isn't exactly clear-cut. Why is this data being sent? How should I respond to it? With standardized formats in place, it's easy to embed these types of semantics in the messages. But it's difficult to sustain this approach on an Internet scale, because every pair of applications ends up with its own private vocabulary of intent.

This is where HTTP comes in handy. It's a protocol web applications and clients can use and readily understand without trying to stuff in new behavior. That is, HTTP already has a set of methods as part of the specification that allows unambiguous intent to be transferred with each request. You can GET a resource, POST a new resource, PUT new information on an existing resource, and DELETE a resource. Like the concept of sending and receiving methods, the HTTP methods are simple. There are only four in everyday application use (the specification also defines HEAD, OPTIONS, and a few others, which do supporting work and are rarely invoked directly by application code). Also like send and receive, HTTP methods are utilitarian: they don't pertain to any one specific application domain.
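Here is a small sketch of those four methods at work, using only Python's standard http.server and http.client modules. Everything specific in it is an assumption for illustration: the in-memory store, the /notes paths, the ResourceHandler class, and the choice to return a new resource's path in the response body (a real service would more commonly use a Location header).

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

store = {}  # hypothetical in-memory resources: path -> bytes

class ResourceHandler(BaseHTTPRequestHandler):
    def _body(self):
        length = int(self.headers.get("Content-Length", 0))
        return self.rfile.read(length)

    def _reply(self, status, body=b""):
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):  # read a resource
        if self.path in store:
            self._reply(200, store[self.path])
        else:
            self._reply(404)

    def do_POST(self):  # create a new resource; the server picks its path
        new_path = f"{self.path}/{len(store) + 1}"
        store[new_path] = self._body()
        self._reply(201, new_path.encode())

    def do_PUT(self):  # store new information at a known path
        created = self.path not in store
        store[self.path] = self._body()
        self._reply(201 if created else 200)

    def do_DELETE(self):  # remove a resource
        if store.pop(self.path, None) is None:
            self._reply(404)
        else:
            self._reply(200)

    def log_message(self, *args):
        pass  # keep the example quiet

# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), ResourceHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

def request(method, path, body=None):
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    conn.request(method, path, body=body)
    response = conn.getresponse()
    data = response.read()
    conn.close()
    return response.status, data

results = [
    request("POST", "/notes", b"first draft"),
    request("GET", "/notes/1"),
    request("PUT", "/notes/1", b"second draft"),
    request("GET", "/notes/1"),
    request("DELETE", "/notes/1"),
    request("GET", "/notes/1"),
]
server.shutdown()
server.server_close()
```

The intent of every request is visible in the method alone. Nothing in the body has to say "please delete this", which is exactly what send and receive could not give us.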

How well does it work? How much longer will HTTP be able to support the activities of the web given its current capabilities?

The Future of Network Methods
What does the future hold for methods that make computer networks functional, and not just a jumbled mess of connected cables? Is what we have good enough to take us into the next five or ten years? Well, if we consider how fast the web is growing today, that means a whole lot of new resources will come into being. That's many URIs to which HTTP methods can be applied. Does that mean that we'll finally see a weakness of the protocol exposed, that four methods for interacting with networks in fact weren't enough to do everything we needed?

It's a difficult question to answer because much of the global network activity isn't necessarily web based. Not every agent wired to a network is a web browser, and HTTP isn't a silver bullet. For example, real-time publish-subscribe systems use different standards such as AMQP. Here we have an entirely different set of protocol methods with their own unique advantages, constraints, and limitations. So which network communications protocol are you going to adopt for your application? Is it even a choice of one or the other, or can you switch between two or more sets of defined network methods?

In the fast-approaching future, the trouble is going to be dealing with large amounts of data, compounded by the large volume of network activity triggered by the growing online population. We're shown examples of successful implementations that can handle anything thrown at them. So you would think that we're safe with everything we have in place, right? Well, that may not be the case. After all, ten years ago we thought we had enough IPv4 addresses. So what's wrong with the current implementations? If we can just hire the right people who've got the experience in constructing these hugely resilient infrastructures that can survive the apocalypse with minimal downtime, we don't have much to worry about.

Well, not everyone in IT is a rocket surgeon capable of cranking out enterprise-grade networked application suites while blindfolded. We still need to strive for simplicity in the face of an ever-growing number of variables. The small number of methods bundled with HTTP is why it's been such a success. It's a small improvement over the send and receive methods, small enough to remain lucid yet powerful enough to transfer application operations over a network.

The real danger, I think, is in how some of the more powerful network protocols have a tendency to be misused. HTTP is a good example. Probably because it is so widely used, it is also widely misused. For instance, sending POST requests to update existing resources where PUT would state the intent, or deleting resources through means other than DELETE requests. If enough applications continue to evolve in this manner, a lot of the value of HTTP is lost because different applications are inventing their own protocols on top of it. We're simply inventing methods when they should be part of a standard, as was the case when we decided to improve on the send and receive methods by using a standardized format. In reality, that's all there is to HTTP, and many other protocols: extra data added to what is transferred over the network. But we need to abide by the standards because deviations will only proliferate as the web evolves into something much too large to handle all these broken-down contracts of communication.
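One way to see what gets lost is to look at a request the way an intermediary such as a proxy or a retrying client library does: it can read the method, but it has no idea what an application-specific body means. The sketch below relies on the sets of safe and idempotent methods defined by the HTTP specification. The can_retry helper, the /notes/1 path, and the action=delete body are hypothetical, made up for illustration.

```python
# Methods the HTTP specification defines as safe (read-only) ...
SAFE = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})
# ... and as idempotent (repeating the request has the same effect as
# sending it once).
IDEMPOTENT = SAFE | {"PUT", "DELETE"}

def can_retry(method):
    """Decide, from the method alone, whether a failed request may be resent."""
    return method.upper() in IDEMPOTENT

# The same operation expressed two ways: (method, path, body).
standard = ("DELETE", "/notes/1", b"")
tunneled = ("POST", "/notes/1", b"action=delete")  # intent hidden in the body

retry_standard = can_retry(standard[0])
retry_tunneled = can_retry(tunneled[0])
print(retry_standard, retry_tunneled)
```

Both requests ask for the same thing, but only the first one says so in a language the rest of the network understands. The tunneled version forces every party in between to either learn this one application's private convention or treat the request as an opaque POST.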

The end result of such a breakdown? The networks will always be there because we have such a firm grip on the low-level fundamentals that no developer need touch them. Will the networks of the future, the web in particular, be all that they can be? Will the web provide the best possible service to all its inhabitants under an array of circumstances? It doesn't do that now. I think it has the potential to, but it'll take better cooperation between those who build the most flourishing systems on the web. To standardize would be simply fantastic, because we could sit down and objectively measure how well our computer networks perform with the protocols we have in place at the application level. Maybe they don't perform well. But the solution isn't to invent on top of existing standards. Let's make HTTP work the way it's supposed to. Maybe then we'll see if the methods we have will do just fine ten years from now.