Monday, June 29, 2009

Python CPUs

When building a Python application with any level of concurrency, it is often useful to know how many CPUs are available for instruction processing on the system. With this information, an application can decide at runtime how best to utilize the system for concurrent work.

In Python, a common way to retrieve the number of CPUs on a given system is to read it from system configuration values, as shown in this entry. Here is the actual function taken from the entry:
import os

def detectCPUs():
    """Detects the number of CPUs on a system. Cribbed from pp."""
    # Linux, Unix and MacOS:
    if hasattr(os, "sysconf"):
        if "SC_NPROCESSORS_ONLN" in os.sysconf_names:
            # Linux & Unix:
            ncpus = os.sysconf("SC_NPROCESSORS_ONLN")
            if isinstance(ncpus, int) and ncpus > 0:
                return ncpus
        else:  # OSX:
            return int(os.popen2("sysctl -n hw.ncpu")[1].read())
    # Windows:
    if "NUMBER_OF_PROCESSORS" in os.environ:
        ncpus = int(os.environ["NUMBER_OF_PROCESSORS"])
        if ncpus > 0:
            return ncpus
    return 1  # Default
Using this detectCPUs() function, developers can retrieve the number of CPUs available on any major platform. It works by checking which system configuration values exist and using them to determine if and where the number of CPUs is stored.

There is one basic problem with this approach: it introduces code at the application level that is already implemented at the language level in the multiprocessing module. There is a much simpler and more elegant way to retrieve the number of CPUs available on the system.
import multiprocessing

multiprocessing.cpu_count()

This method of retrieving the CPU count is superior with respect to the separation of concerns principle. The multiprocessing module is concerned with CPU matters, such as how many CPUs exist. Your application is obviously also concerned with this information, otherwise it wouldn't be using multiprocessing to begin with. However, your application only cares about the return value, not the implementation of how the number of CPUs is retrieved. Since multiprocessing was only introduced in Python 2.6, there is a backport for older versions on PyPI. This simply means that your application setup needs to depend on that package in order to support older Python versions.
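For projects that need to run on both old and new interpreters, the dependency can be declared conditionally at setup time. Here is a minimal sketch, assuming the backport is published on PyPI under the name multiprocessing:

```python
import sys

# Hypothetical setup.py fragment: pull in the PyPI backport only where
# multiprocessing is missing from the standard library (Python < 2.6).
install_requires = []
if sys.version_info < (2, 6):
    install_requires.append("multiprocessing")
```

With this in place, the application can simply import multiprocessing everywhere and let packaging supply it where the standard library does not.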

A use case for knowing the number of CPUs available on a system, within the context of a Python application, is maximizing concurrency efficiency. The multiprocessing and threading modules share the same interfaces for most abstractions. This means the application can decide at runtime which module is best suited for the job based on the number of available CPUs. If there is a single CPU on the system in question, the threading module might be better suited. If there are multiple processors available, then multiprocessing might be better suited.
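That runtime decision can be sketched as follows. The helper names choose_worker_class and make_workers, and the single-versus-multiple CPU threshold, are illustrative choices rather than anything from the entry itself:

```python
import multiprocessing
import threading

def choose_worker_class():
    # On a multi-CPU machine real parallelism is possible, so prefer
    # processes; on a single CPU, threads avoid the process overhead.
    if multiprocessing.cpu_count() > 1:
        return multiprocessing.Process
    return threading.Thread

def make_workers(target, args_list):
    # Process and Thread share the same constructor and start()/join()
    # interface, so the caller never needs to know which one it got.
    Worker = choose_worker_class()
    return [Worker(target=target, args=args) for args in args_list]
```

Because the two classes are interface-compatible, the rest of the application starts and joins the workers identically either way.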


  1. The backport of multiprocessing for 2.5 and earlier is really not all that great - Jesse Noller found some bugs in the core of the interpreter that cause it to crash under certain circumstances while using multiprocessing, and if at all possible you'll be worlds better off using 2.6 instead of trying to play with fire.

  2. @augie The backport works, but lacks fixes to deadlock-after-fork issues we found in core. The backport is very similar to the original pyprocessing code in that respect. If people have been using pyprocessing, then the backport should work for them.

  3. I use the backport and it works great. It runs for a few hours a day spawning lots of processes (10s of thousands).