Friday, November 12, 2010

API Documentation

Creating API documentation is a tedious task at best. Automating this process removes most drudgery involved. Programmers glimpse at API documentation at several points throughout the day. We do this to confirm doubts or to make sure that the function was in fact returning an integer type. Not only does automated documentation generation safe us effort, it is also a time saver. Is there a need for human intervention, given that we have tools at our disposal to write API documentation for us? Are the tools really that good? And if not, if they're still lacking in one respect or another, will they ever replace us entirely?

What exactly do automatic code generators do? Put simply, they collect oodles of software system information and render it in a programmer-friendly format. The basic idea is to parse the source code tree of the project and write the documentation output. The format of the documentation can be anything really, but common forms are PDF and HTML. HTML is probably the best in most situations as it is easier to navigate and usually looks better.

Automatically generated API documentation depends largely on code higher-level code structures, not necessarily the low-level file operations and so-on. Modules, classes, and functions are the preeminent descriptions that make API documentation useful. Finer details about these high-level units can also be included by the documentation generator. Function parameters, return types, and class attributes - all relevant things programmers want to read about.

We annotate these these high-level language constructs using comments which are also captured by some of these documentation-generating tools. If you use Python, you have the added privilege doc-strings, which are part of the code objects themselves. In either case, you, the developer, have the ability to extend the content of the generated API documentation. You're effectively writing documentation and code at the same time. Sounds great, doesn't it? The difficulty with this approach is intent. Are code documentation and API documentation the same thing?

Documenting code is all about explaining the "nitty-gritty" to those with the misfortune of reading your code. You've agonized over strenuous details and need to walk any reading eyes through it. Probably this isn't the kind of thing we care to see in API documentation. If I need to look up something specific in the API documentation, I don't want to be bothered by big chunks of "this attribute needs to be reset after the....". Encapsulation also applies to documentation.

How do programmers format the comments lifted from the code by generators? Documentation tools do a fine job of formatting language constructs like classes, functions, variables, etc. What they can't do is format text embedded inside an explanation. In the documentation of a function, I'll sometimes want to see a brief paragraph that tells me why the state of the file object stored in an output stream queue changes. It is helpful to format text inside these explanations as well as link to other documents. A popular format for writing API documentation seems to be reStructuredText. What this amounts to is programmers using a markup language within the programming language. This adds an unnecessary burden - especially since comments should be geared more toward the low-level.

There really is no substitute for automatically generating the code structure used in API documentation. The only alternative is to do it manually, which is excessively tedious. But at the same time manually writing the API documentation gives you freedoms you don't necessarily have when generating it via code comments. For one thing, you're not legally bound for life to a single tool. You can format as you deem necessary. You're also not restricted by the configuration constraints generation tools impose - you can document as much or as little code as you want. Whichever method you choose, just be sure that good documentation is the result.