Tuesday, April 27, 2010

Passing URIs

Uniform Resource Identifiers (URIs) are what enable us to find things on the web. These things refer to a resource, uniformly identified by a string. A resource is any digital media that has been made available on the Internet. At a higher level, search engines are what allow us to find resources that live somewhere in the web. Without a URI, there would be nothing useful for the search engines to display in their results. Additionally, it would be impossible for search engines to crawl websites without theURIs that make up the structure of the site.

APIs can be built with a set of URIs as well. These URI-centric APIs are sometimes referred to as RESTful APIs. RESTful APIs have a close association with the HTTP protocol. Because of this we can pass parameters to resources through GET or POST requests made by a client. But these are often primitive types that can be represented as a string. For instance, if I'm using someAPI to update my user profile, a numeric user ID might be a POST parameter I need to send. This is necessary so the application knows which user to update. But what if I were able to pass an entire URI as the identifier for the resource I want to update? Does that even make sense? Well, lets first think about how applications identify resources internally.

The most common way for a web application to identify a resource internally is by a primary key in a database table. This key is typically an integer value that is automatically incremented every time a new record is inserted. This key is unique for each record in that table. This makes the primary key of a database table an ideal candidate for using as part of a URI. You'll often see integers as part of a URI, for instance "/user/4352/". There is a good chance that the number is a unique primary key in the database. This uniqueness maps well toURIs because every URI should be unique in one way or another.

One potential problem with using primary database keys in URIs is that different records in different database tables may share the same key. This doesn't necessarily weaken the uniqueness of the URI because it is still referring to a different type of resource. Consider two database records in two different database tables. These records both have the same integer primary key value, 5. TheURIs for these two resources are still unique because they are referring to two entirely different concepts. So the first URI might be "/user/5/" and the second URI might be "/group/5/". But what if you don't care about the resource type?

A canonical URI might be composed of a UUID instead of the primary key of a database table. UUIDs themselves are unique and may refer to any resource. That is, a UUID doesn't need a context in order to be unique. If our above two URIs were to use UUIDs, they might look something like "/user/cadb1d94-5305-11df-98a5-001a929face2/" and "/group/d8eee85c-5305-11df-8d08-001a929face2". As you can see, we really don't need "user" or "group" as part of the URI. We could refer to a resource canonically with something like "/data/cadb1d94-5305-11df-98a5-001a929face2/". This could refer to either a user or a group. This can be both flexible and dangerous.

Having a canonical URI based on a UUID can be flexible because the client requesting the resource doesn't need to know the context. The client might have acquired this URI and has no idea what exactly it is a representation of. Even with just theUUID , the client now has the ability to discover the type of resource this URI is pointing to based on the data it returns. This can also be dangerous for exactly the same reason. If a client doesn't know how to handle the data returned by a canonical URI, chances of the the client malfunctioning are higher. The data representations returned by URI resources are a lot like interfaces; different data types can still provide the same interfaces by having a subset of common keys.

The location part of a URI might also be useful for passing as parameters to web applications. Until now, I've only been talking about the path in which the server must look for the resource. But this is making the assumption that the resource in question still lives on the same server. By only passing primary database keys or UUIDs as parameters, we leave the location aspect out of the equation. It might be more flexible to pass a full URI as a parameter. Even if the URI location is pointing to the same location in which the request arrived. It really isn't a big deal for a server to check when processing requests. If the resource lives here, we simplydissect the URI and process as usual. Otherwise, we forward the request to the appropriate location. I realize I'm oversimplifying this a little too much but the idea is to think about passing wholeURIs as parameters, not so much the details of how we should go about implementing a full-fledged distributed computing platform.

So remember that canonical URIs composed of UUIDs can be useful when treated with care. If context is important, don't use them. Stick to using primary database keys if it helps keep things simple. Try experimenting with a simple web application that will accept a full URI instead of an ID string of some sort. A flexible solution might even accept either or.