Thursday, October 22, 2009

Publishing CSV

I recently came across this Python recipe for serving CSV data from a CherryPy web controller. The recipe itself is fairly straightforward. It is a simple decorator that can be applied to a method which is also a CherryPy controller. The decorator will take the returned list data and transform it into a CSV format before returning the response.

You sometimes have to wonder about the CSV format, it is quite non-descriptive for humans to read. Fortunately, that isn't the reason it was created. The CSV format is easy for software to understand without requiring a lot of heavy lifting. This is why it is still widely used. Virtually any data can be represented with it. But what happens when something goes wrong? Typically, if an application complains about some kind of data it is trying to read, a developer of some sort needs to take a look at it. Good luck with trying to diagnose malformed CSV data, especially a large chunk of it.

The fact of the matter is, the CSV format is still a requirement in many contexts. Especially in terms of providing the data as opposed to consuming it. It is most likely going to be easier to provide a CSV format to a client than it is to say the client needs to be altered to support SOAP messages. If that were the case, there would certainly be many upset people using your service, or, nobody using your service at all.

As the recipe shows, transforming primitive data structures into CSV format isn't that difficult. Especially with high level languages that have a nice CSV library like Python does. The HTTP protocol is more than likely going to be the protocol of choice. In this case, the important HTTP headers to set are Content-Type and Content-Length.

Odds are, the CSV format isn't the format of choice for most web application designs. That is, the developers building APIs aren't going to use CSV. They are probably going to use something more verbose like JSON or some XML variant. This just makes the clients that need to interact with this data much easier. Also, they much more common. Chances are that CSV support will be an afterthought. This isn't an uncommon change request for any web application project though. No one building a web application can expect the initial chosen format to suffice throughout the lifetime of the application. What this means is that exposing the data to the web is the easy part. It is coming up with a sustainable design that is the challenge.

Many web application frameworks provide support for multiple response formats. And, if CSV isn't one of them, chances are that there is a plugin that will do it.

So, even if the framework doesn't support CSV data transformation functionality, as is the case with CherryPy, the Python CSV functionality will do just fine on its own. That is, the controller can be extended with CSV capabilities as is the case in the recipe. Below is an illustration of a web controller with CSV capabilities.



Here the Controller class inherits CSV capabilities. The DomainObject class, which for our purposes here could be anything that is part of the application domain and needs to be exposed to the web. With this design, as is the case with CSV functionality offered by the web framework of choice, the responsibility of the CSV transformation falls outside of the domain entirely.

Below is an alternate design that give CSV data transformation capabilities directly to the DomainObject class.



So which design is more realistic? Probably the former simply because the use case for CSV data transformation outside of the web controller context isn't all that common. But does the latter design even make sense? Are we giving too much responsibility to a business logic class? Well, I would argue that it depends on how portable the domain design classes need to be. Sure, with the CSV capabilities being given to a domain class, we are violating a separation of concerns principle, albeit, only slightly. It isn't as though the class itself is being altered to support a specific data format. If this is a valid trade-off in a given context, I would strongly consider keeping data format transformation out of the web controllers.