Tuesday, 9 March 2010

Is Python too slow?

Python is a wonderful programming language in so many ways. It is immensely productive. You can prototype code and get things running in minutes, explore changes and mess around.

The best thing, though, is that Python code is clean and uncluttered. All that messing around doesn’t have to lead to messy code. This is pristine, pseudo-code-like logic you can walk through, even with non-programmers. The result: Python libraries tend to be beautifully designed.

So what is not to like about Python? One word: execution speed! Don’t get me wrong, though. This is seldom a problem. Most of the time, it is more important to write code quickly, even if it might run a little slower.

My general rule of thumb is that drastic optimisations for speed are only worthwhile if they result in speedups of orders of magnitude. In other words, for production code, the difference between a programme which completes in an hour and one which takes days may be worth sweating over.

The other general rule of thumb is that this sort of speedup is only possible with changes to the algorithm you are using. Algorithms are language agnostic. Or so I used to think.

Unfortunately, along with many other dynamic languages (Perl, for one), Python can be dog slow. When I was a Perl programmer, I would impress my colleagues by rewriting scripts line-for-line in C++ and getting 3 orders of magnitude improvement. Since C++ code can be much better abstracted (read some typical non-OOP hacky Perl code with 6 nested levels of references!), I was often able to get significantly better quality, more maintainable code as well.

Rewriting Python in C++ takes a little more work, and in the best cases I only get 2.5 orders of magnitude improvement. In case you misunderstand me: we are not talking about a 2.5x speedup; the C++ code can be up to 300x faster.
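For anyone who wants to check the arithmetic, here is a minimal sketch of how orders of magnitude convert to plain speedup factors. The helper names are made up for illustration and are not from any library.

```python
import math


def orders_to_factor(orders):
    """Convert orders of magnitude (powers of ten) into a plain speedup factor."""
    return 10 ** orders


def factor_to_orders(factor):
    """Convert a plain speedup factor back into orders of magnitude."""
    return math.log10(factor)


if __name__ == "__main__":
    # 2.5 orders of magnitude is roughly a 300x speedup, not 2.5x.
    print(round(orders_to_factor(2.5)))
    # 3 orders of magnitude is a 1000x speedup.
    print(orders_to_factor(3))
    # A 300x speedup is a little under 2.5 orders of magnitude.
    print(round(factor_to_orders(300), 2))
```

So "2.5 orders of magnitude" means a factor of about 10^2.5, which is where the "up to 300x" figure comes from.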

So why not just rewrite all long-running critical code in C++ and be done with it? Good programmers definitely need more than one tool / language under their belts for different occasions.

The problem is that the seductive promise that dynamic languages offer, going back to the days of Smalltalk, is a more exploratory (agile?) mode of programming. If you wonder whether some approach is feasible: try it. Programme at speed and accomplish more, differently. Alas, slow execution speed prevents the fast turnaround needed for such experimentation. I am not talking about scripts which take hours to run either. Usually it is the “crippling” difference between a minute and a half and 8 seconds which kills you.

I would gladly trade off some of the much-vaunted flexibility of Python to win back some of that 300x slowdown. Is Cython or Boost.Python the answer?