Posted By George W - 2005-04-22
Speed up Python with Pyrex
- Pradeep Kishore Gowda
Language: Pyrex, Python, C
Platform: Linux, Windows
Level : Intermediate
Pyrex lets you write code that mixes Python and C data types any way you want, and compiles it into a C extension for Python.
The often-heard complaint about Python, mostly from C/C++ programmers, is that Python code executes slowly. Python is fast enough for most I/O bound, DB and GUI applications. But when it comes to pure number crunching, Python can be many times slower than an equivalent C application. The solution is to write the performance-intensive parts in C, as an extension.
However, writing C extensions to Python is a non-trivial task. Even before you start writing your first extension function, there is a certain amount of wrapper code you need to write. Then there is data type conversion between Python and C. Basic types such as integers and strings are easily converted, but user-defined types are much trickier. As with any C program, you are saddled with memory management chores. So, you might end up chasing nasty bugs when you should be concentrating on writing good code that does the work.
Tools like the Simplified Wrapper and Interface Generator (SWIG) take away the burden of writing extension code to a certain extent. SWIG takes a definition file, consisting of a mixture of C code and specialised declarations, and produces an extension module. But SWIG is not very helpful when you want to create new Python types.
Other projects like PyInline take a different approach, allowing the C code to be embedded in the Python code. PyInline then extracts the C code from the Python source and compiles it into an extension. But the problem with types still remains.
Pyrex provides an elegant solution to these problems. Pyrex is a Python-like language specifically designed for writing Python extension modules. Its syntax is very close to Python's: most Python code is valid Pyrex, and Pyrex code without C declarations is mostly valid Python. In short,
Pyrex is Python with C data types.
The following (primes.pyx) is Pyrex code which computes the first 'n' prime numbers.
def primes(int kmax):
    cdef int n, k, i
    cdef int p[1000]        # C array holding the primes found so far
    result = []
    if kmax > 1000:
        kmax = 1000
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] <> 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result
This code reads as easily as ordinary Python; the only difference is the type declarations for the variables. Running pyrexc (the Pyrex compiler) on this code generates a C file (primes.c).
bash$ pyrexc primes.pyx
This file can be easily compiled into a C extension. For example with gcc,
bash$ gcc -c -fPIC -I/usr/include/python2.3/ primes.c
This produces the object file primes.o, which has to be linked to produce an extension module.
bash$ gcc -shared primes.o -o primes.so
We can try out this newly created module in the Python interpreter as follows:
>>> import primes
>>> primes.primes(12)
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
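For reference, a pure-Python counterpart of primes.pyx could look like the sketch below. It is an assumption about what such a module contains, not the author's actual file: the C array p is replaced by a Python list, and != is used instead of <> so that the code runs on both Python 2 and Python 3.

```python
def primes(kmax):
    """Return the first kmax primes (capped at 1000) by trial division."""
    p = [0] * 1000          # plays the role of the C array in primes.pyx
    result = []
    if kmax > 1000:
        kmax = 1000
    k = 0                   # number of primes found so far
    n = 2                   # current candidate
    while k < kmax:
        i = 0
        # try dividing n by each prime found so far
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:          # no earlier prime divides n, so n is prime
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result
```

Calling primes(12) on this version returns the same twelve primes shown in the interpreter session above.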
Let us use a small script (compare.py) to see how this C extension compares with a pure Python function in terms of execution speed. In the above example, removing the cdef declarations and the int type, and using a Python list in place of the C array p, gives a working Python module (pyprimes.py).
# compare.py
# Calculate prime numbers

import time
import pyprimes  # pure Python implementation
import primes    # C extension created using Pyrex

if __name__ == "__main__":
    n = 1000
    print "Time taken for %d prime numbers : " % n
    start = time.time()
    x = pyprimes.primes(n)
    end = time.time()
    print "PURE PYTHON : %f" % (end - start)
    start = time.time()
    x = primes.primes(n)
    end = time.time()
    print "PYREX: %f" % (end - start)
bash$ python compare.py
Time taken for 1000 prime numbers :
PURE PYTHON : 0.317926
Whoa! That’s a nearly 12-fold increase in speed. Unlike the same module written in C, the Pyrex code is about as short as the native Python code, and just as readable.
How does Pyrex achieve this? If you recall, everything in Python is an object. Even a basic type like int is an object in Python. So, whenever you use an int, the interpreter has to box and unbox it to get at the actual data. This adds overhead to every computation involving even the simplest type. This is the price we pay for automatic memory handling and intelligent interaction with other types. On the other hand, a C or Pyrex int is simply a location in memory, so an operation on a C/Pyrex int involves no such indirection.
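The size difference between a boxed Python int and a raw C int can be observed from Python itself. The sketch below uses only the standard library (sys and ctypes); exact byte counts depend on the platform and Python version, so none are shown.

```python
import ctypes
import sys

# A Python int is a full object: besides the numeric value it carries
# bookkeeping such as a reference count and a pointer to its type.
boxed_size = sys.getsizeof(12345)

# A C int is just the raw value stored in memory.
raw_size = ctypes.sizeof(ctypes.c_int)

print("Python int object: %d bytes" % boxed_size)
print("C int:             %d bytes" % raw_size)
```

On CPython the first number is several times larger than the second, which gives a feel for the per-value overhead described above.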
Creating New Types
In Pyrex, code which manipulates Python values and C values can be freely intermixed, with conversions being handled automatically whenever possible. Reference count maintenance and error checking of Python operations are automatic, and the full power of Python's exception handling facilities is available even when dealing with C data. Pyrex also lets you write code to convert between user-defined Python data structures and C data structures in an almost transparent manner. The power of this is evident when compared to traditional methods of writing C extensions, where a good knowledge of the Python/C API is required. Let us look at an example:
cdef class Account:
    cdef float balance
    cdef char *name
    def __new__(self, name):
        self.name = name
        self.balance = 0
    def incrAmt(self, amt):
        self.balance = self.balance + amt
    def getBalance(self):
        return self.balance
With this, we have created a new type that looks almost like a Python class and is accessible from both C and Python. The first line defines the new type, Account. The C variables are declared immediately after the class declaration; you cannot declare C variables inside the constructor, __new__. The rest of the code is just Python.
Note: __new__ is called when the object is created, after its memory has been allocated and before any __init__ method runs.
>>> from Account import Account
>>> myac = Account('Pradeep')
The Python usage of this new Type is transparent, as you can see from the code above. The Python programmer using this Type is completely unaware of its C underpinnings!
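To get a feel for the calling interface without compiling anything, the sketch below is a pure-Python stand-in for the Account type. It is not the Pyrex extension itself: the class body and the private attribute _balance are assumptions made for illustration, and ctypes.c_float is used only to mimic the single-precision C float that holds the balance.

```python
import ctypes


class Account(object):
    """Pure-Python stand-in mimicking the Pyrex Account type (illustrative)."""

    def __init__(self, name):
        self.name = name
        # Mimic 'cdef float balance': a single-precision C float
        self._balance = ctypes.c_float(0.0)

    def incrAmt(self, amt):
        # The sum is computed in double precision, then stored back
        # into a C float, which rounds it to single precision.
        self._balance = ctypes.c_float(self._balance.value + amt)

    def getBalance(self):
        return self._balance.value


myac = Account('Pradeep')
myac.incrAmt(100)
print(myac.getBalance())    # prints 100.0
```

One design point worth noting: because the balance is stored as a C float, amounts such as 0.1 are rounded to single precision, so the value read back may differ slightly from the Python float that was added.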
The integer ‘for’ loop in Python usually makes use of the range() function. Because range() is a Python function, looping through it is slower than a plain C loop. Pyrex provides another form of for-loop:
for i from 0 <= i < n:
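The from-form above expresses a plain counting loop. As a rough illustration in ordinary Python (not Pyrex), the two hypothetical functions below compute the same sum: the first goes through range(), the second spells out the counter handling with the same bounds as the Pyrex loop.

```python
def sum_with_range(n):
    total = 0
    for i in range(n):      # range() supplies the values of i
        total = total + i
    return total


def sum_with_counter(n):
    total = 0
    i = 0
    while i < n:            # same bounds as: for i from 0 <= i < n
        total = total + i
        i = i + 1
    return total
```

In pure Python the second version is generally not faster; the benefit of the Pyrex form appears when the loop variable and bound are declared as C integers.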
Pyrex does not support all the functionality of Python. Some gotchas to remember:
· import * is not allowed. However, other forms of import are allowed
· Generators cannot be defined in Pyrex
· In-place arithmetic operators (+=, etc) are not yet supported
· List comprehensions are not yet supported
· Functions cannot be defined inside other function definitions
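Most of these restrictions have straightforward workarounds using constructs that are also valid in plain Python. The sketch below (function names are hypothetical) shows an explicit loop in place of a list comprehension and a spelled-out assignment in place of +=.

```python
def squares(values):
    # instead of: [v * v for v in values]
    result = []
    for v in values:
        result.append(v * v)
    return result


def total(values):
    # instead of: acc += v
    acc = 0
    for v in values:
        acc = acc + v
    return acc
```

Similarly, a function that would normally be nested inside another can usually be moved to module level and passed whatever values it needs as arguments.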
Pyrex and the Programmer
Whenever we talk of speed, we usually refer to code execution. However, as any seasoned Pythonista would vouch, the increase in the programmer's productivity is what makes Python such an attractive language. With Python, the programmer can cut down the coding time by orders of magnitude. Pyrex allows the Python programmer to keep his lazy and Pythonic way of coding, and yet achieve execution efficiency. Just imagine all the time you would be saving by not having to hunt down memory leaks in C.
Where do we use Pyrex?
If you have blocks of code that deal with numerical computations in tight loops, then you should consider moving that code into a Pyrex module. If done correctly, those performance-intensive parts can give you a boost of anywhere between 10-50 times the Python speed. Code that deals mainly with I/O bound operations and library calls is not going to benefit much from Pyrex.
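Before moving code into Pyrex, it helps to confirm that a function really is a hot spot. The following sketch uses the standard timeit module to time a hypothetical number-crunching function (sum_of_squares is made up for illustration); no timings are shown because they vary by machine.

```python
import timeit


def sum_of_squares(n):
    # A tight numeric loop: the kind of code worth considering for Pyrex
    total = 0
    i = 0
    while i < n:
        total = total + i * i
        i = i + 1
    return total


# Time 100 calls; timeit.timeit accepts a callable and returns total seconds
elapsed = timeit.timeit(lambda: sum_of_squares(10000), number=100)
print("100 calls took %f seconds" % elapsed)
```

If a function like this dominates the running time, it is a candidate for a Pyrex module; if most of the time goes to I/O or library calls, it probably is not.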
Pyrex Site: http://nz.cosc.canterbury.ac.nz/~greg/python/Pyrex/
Pyrex Guide: http://ldots.org/pyrex-guide
David Mertz's article: http://www-128.ibm.com/developerworks/library/l-cppyrex.html
Simplified Wrapper and Interface Generator (SWIG): http://swig.org
Pradeep Kishore Gowda is a Senior Software Engineer with ZeOmega, Bangalore, a company deeply committed to Open Source Software development. He has been a GNU/Linux user since 1997 and is an active member of BangPypers, the Bangalore Python User Group. He blogs at http://btbytes.com.