Wednesday, March 18, 2009

Evaluating file-monitoring techniques in Python

There is a general need to monitor changes made to files in any computer system. The question being, why? The short answer being that when a file has changed, the state of the system has also changed and there are going to be reactions to that change in state. These events that take place in response to file changes usually happen at a very low system level. At the application level, there is also a need to monitor the system state or sub-states such as files. For instance, if we are working with a web development framework such as TurboGears and we want the development HTTP server to reload every time a source code file changes, those files need to be monitored. Once a change has been detected by the monitoring process, the process can then reload the HTTP server. Another use for files is to communicate between different processes within a system. One process, or many processes can monitor a given file and react accordingly when the state of that file changes.

There are two approaches I'm going to evaluate here. The first is the CherryPy method which is non-blocking. The second is the generic method which is blocking. Although the two methods differ at a higher level, they are the same in that they determine if a file has changed state based on the modification date.

The first of the two methods is a blocking method. This method will block the flow of control within an application. This means that any code that comes after this logic, will not be executed until this file monitoring code is complete. The reason this method is blocking to begin with is it involves a loop and breaking out of it is the only way to terminate the file monitoring logic. The developer using this method can specify an interval at which the logic will test the specified files for changes. It will check the modification date of the file and compare it to the last modification date recorded by this method. If the date is later than the last recorded date, the file has changed. Obviously the more files being monitored, the more overhead involved so care must be taken to not overload the system by monitoring too many files. This method of file monitoring is generic and can be used in many contexts with little modification since it wasn't designed with any particular application domain in mind.

The non-blocking method is based on the CherryPy Python web application framework. It uses this non-blocking method to monitor changes made to Python modules within a given CherryPy application. Once a module state change has been detected, it is an indication that the HTTP server should be restarted to reflect those changes in the running application. The monitoring logic is periodically run in a separate thread of control at a specified interval. This means that is the server is in the middle of processing a request, the control flow does not block in the middle of the request entirely. The method used to determine if the file has been changed is the same as the blocking method. The modification date is compared to the previous modification date. This method of monitoring for file state changes on a file system is a very elegant solution. The main downfall is that it is context-specific. It was designed with CherryPy in mind. However, it is not so tightly-coupled to CherryPy that it could not be used somewhere else. Some minor changes would do the trick.

Lastly, if you need to monitor file system state changes, which method do you want to use? That is, which method is best suited for your application? There are a couple factors to consider. First, if your application can only support a single process, the blocking method is out of the question. However, this is rarely the case. The application could simply spawn a new file monitoring process. This could introduce a new problem though because the file monitoring process would need to communicate to the main application process. Having done this, you will have introduced a new communication channel in your application and thus increasing the complexity considerably. The CherryPy, non-blocking, file monitoring approach could prove to be the better approach if the application you are developing needs to react to file system state changes as well as changes in state from other resources. The challenge here is that it is not nearly as generic as the blocking method and would require a larger development time investment. Do some investigation as to what state changes your application must respond to. If only file system state changes, the blocking method may suffice. In must other cases, the non-blocking approach may be better suited.