Monday, April 27, 2009

Python BaseHTTPRequestHandler Request Parsing

When a client sends an HTTP request to a server, the request travels as a raw string of HTTP request data. While in transit, the components of the request are not yet distinguishable; the request is a single monolithic string. Parsing is what transforms this raw string into a coherent set of HTTP request components. Python's standard distribution includes a module with basic HTTP request parsing capabilities, and the class used for this is BaseHTTPRequestHandler. As the name suggests, it is the most basic HTTP request handler Python provides, and it must be subclassed to carry out any meaningful functionality.

What we are interested in here isn't so much extending the BaseHTTPRequestHandler class as how the class breaks down HTTP request parsing. This fundamental action of web programming in Python is rarely seen because many web application frameworks do it for us. Even when using BaseHTTPRequestHandler directly, developers need not concern themselves with parsing individual HTTP requests; the class takes care of this as a step in its request handling. It is useful, however, to see how the most fundamental HTTP request parsing works and whether the approach has any weaknesses.

The following is an illustration of how an HTTP request is parsed by the BaseHTTPRequestHandler class. Keep in mind, this is only a high-level view of what actually happens. There are many smaller steps kept out of the illustration for simplicity.
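The parsing step can also be exercised directly in code. The following is a minimal sketch, assuming Python 3, where the class lives in the http.server module. ParsedRequest is a hypothetical helper name, not part of the standard library: it feeds raw bytes to the handler's parse_request() method without opening a socket, and overrides send_error() so that a malformed request is recorded rather than answered.

```python
from http.server import BaseHTTPRequestHandler
from io import BytesIO


class ParsedRequest(BaseHTTPRequestHandler):
    """Hypothetical helper: parse raw request bytes without a socket."""

    def __init__(self, raw_request):
        # Deliberately skip the base-class __init__, which expects a live
        # connection and would start handling a request immediately.
        self.rfile = BytesIO(raw_request)
        # Step 1: read the request line (e.g. "GET /path HTTP/1.1").
        self.raw_requestline = self.rfile.readline()
        self.error_code = None
        self.error_message = None
        # Step 2: split the request line and initialize the headers.
        self.parse_request()

    def send_error(self, code, message=None, explain=None):
        # Record the failure instead of writing a response to a client.
        self.error_code = code
        self.error_message = message


raw = (
    b"GET /index.html?x=1 HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Accept: text/html\r\n"
    b"\r\n"
)
request = ParsedRequest(raw)

print(request.command)          # GET
print(request.path)             # /index.html?x=1
print(request.request_version)  # HTTP/1.1
print(request.headers["Host"])  # example.com
```

After parse_request() returns, the command, path, request_version and headers attributes hold the individual components of what was a single raw string. If the request line is malformed, error_code is set and headers is never initialized.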

One question to ask here is whether any of these actions can be parallelized. One candidate might be the initialization of the HTTP headers, because everything header initialization needs as input is already available before the preceding actions execute. Suppose this action were carried out in a new thread: would there really be any significant performance gain? Unlikely. Is it an alternative design worth exploring? I would think so. See below for an illustration of the header initialization executing in parallel with the other HTTP parsing actions.
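In code, the alternative design might look something like the sketch below. It is not how BaseHTTPRequestHandler works; it is an exploration under stated assumptions. It assumes Python 3, and parse_request_line and parse_concurrently are hypothetical names introduced for illustration. The header parsing is delegated to http.client.parse_headers, an undocumented helper in the standard library, which runs in a worker thread while the main thread splits the request line.

```python
import http.client
from concurrent.futures import ThreadPoolExecutor
from io import BytesIO


def parse_request_line(line):
    """Hypothetical, simplified request-line parser (three-word form only)."""
    words = line.decode("iso-8859-1").rstrip("\r\n").split()
    if len(words) != 3:
        raise ValueError("bad request line: %r" % line)
    command, path, version = words
    return command, path, version


def parse_concurrently(raw_request):
    """Parse the request line and the headers in separate threads."""
    rfile = BytesIO(raw_request)
    request_line = rfile.readline()
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Header parsing only needs the stream positioned just past the
        # request line, so it can start before that line is interpreted.
        headers_future = pool.submit(http.client.parse_headers, rfile)
        command, path, version = parse_request_line(request_line)
        headers = headers_future.result()
    return command, path, version, headers


command, path, version, headers = parse_concurrently(
    b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
)
print(command, path, version, headers["Host"])
```

In standard CPython builds the global interpreter lock keeps two threads from executing Python bytecode at the same time, and both tasks here are small and CPU-bound, so the thread start-up cost would likely outweigh any saving. That is consistent with the expectation above that a significant performance gain is unlikely.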