Tuesday, 2 November 2010

Dreaming of a new language

I have a rich imagination.

When I was (much) younger, I used to fantasize that Microsoft would be taken over by a billionaire (!), and start giving away free copies of a standards-conformant C++ compiler...

Ten years later, I have started to fantasize about my ideal programming language.

Now that I have started hacking away at both ends of the spectrum, so to speak, with Python and C++, I naturally want something that has the best of both worlds. The best way, I thought, was to try and come up with all the things I could do without.


Declaring Python
The thing with programming in Python is that half the time, I just cannot understand conceptually why it is so %$^*&!! slow. Otherwise, in most ways Python's language design is beautiful and orthogonal. But let us not go into details. What could I do without in Python?

Surprisingly, after 5 years of declaration-less heaven and dynamic typing, I can live with declarations. This is especially true for object attributes. Of course it is very convenient to add these on the fly, but as my objects grew richer, and when I set aside my code for a few days, weeks, months, I found that I no longer knew what properties my classes had in which situation. Instead, my objects were slowly accreting richer and richer functionality in a non-deterministic fashion as they ambled gently through the data. While this is a beautiful metaphor for the real world, it was driving me around the bend.

Eventually, I had to lay down the law. If I am adding an attribute to a class, it has to happen in one place: __init__. This does mean some instances are wandering around with a lot of None members. So be it. This is usually a sign that I have to refactor the class hierarchy anyway.
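A minimal sketch of that rule, using a hypothetical Record class whose name and attributes are invented purely for illustration. Every attribute is declared once in __init__, with None standing in for values that arrive later. The __slots__ line is an optional extra, not something the rule requires: it makes Python itself reject any attribute that was never declared.

```python
class Record:
    """Hypothetical example class: every attribute is declared up front."""

    # Optional: with __slots__, Python refuses attributes not listed here.
    __slots__ = ("name", "score", "tags")

    def __init__(self, name):
        self.name = name
        self.score = None   # not known yet, filled in later
        self.tags = None    # not known yet, filled in later


record = Record("first")
record.score = 0.5          # fine: declared in __init__

try:
    record.colour = "red"   # never declared anywhere
    undeclared_rejected = False
except AttributeError:
    undeclared_rejected = True

print(undeclared_rejected)
```

Without the __slots__ line the stray assignment would silently succeed, which is exactly the slow accretion described above.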

Once I decided that all attributes were being added in __init__, it was a small step to thinking that maybe this whole traditional OOP thing of defining classes statically was not such a bad thing. Messing around with cython made me realise that declarations are not that onerous anyway. Python is not Perl: everything is strongly typed. At some level, you are thinking about what sort of thing your variables are. So why not make it more explicit?
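To make the "strongly typed" point concrete, here is a small sketch (the variable names are only illustrative). Perl would quietly evaluate the same mixed expression as 7, whereas Python refuses to guess and insists that the conversion be spelled out.

```python
# Python will not silently mix a string and an integer.
try:
    total = "3" + 4
except TypeError:
    total = None

# The conversion has to be written explicitly, which means the programmer
# already knows what type each value is supposed to be.
explicit_total = int("3") + 4

print(total, explicit_total)
```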

I can hear pythonistas being sick in the aisles at this point. Don't worry, I am not talking about messing with Python, per se. Python is fine the way it is. I am fantasising about a new language, remember!


Stealing from C++

O.K. What about the other side? What does C++ bring to the table? 

Don't get me wrong. I love C++ to bits, warts and all. But the syntax is fairly evil. I never notice this normally, but once in a blue moon, I try to show someone else what I am doing: "See how elegant and simple this C++ code is?", and turn around to see my colleagues choking to death.

The declaration syntax inherited from C was a brilliant idea from the 1970s which alas didn't quite work out. I think Dennis Ritchie would do it differently today. Ken Thompson certainly has.


Generic programming
But that is a smallish wart most people quickly get used to. The real problem is C++ templates. Generic programming in C++ is amazing, and offers tremendous power and flexibility unequalled among mainstream languages. But generics in C++ sit on top of the template syntax.

I first came across Erwin Unruh's ridiculous use of "template metaprogramming" to calculate and display prime numbers in compiler error messages back in the mid-90s. I remember laughing at this clearly useless practical joke.

Well, it turned out that C++ templates are in themselves a Turing-complete language, but one for compile-time (static) operations.


Those unfamiliar with "template metaprogramming" must be wondering what the fuss is all about. The point is that making our problem domain more concrete or more specific can often produce code which is orders of magnitude faster and more efficient. The trivial example is when we use a packed array (or vector) of integers rather than a list of references or pointers to integer objects. The latter is more general because the same list can hold strings, real numbers and so on. But if we are only interested in integers, the packed array will be tremendously faster, smaller, less fussy to deal with and so on.
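The same trade-off can be seen even inside Python. The sketch below assumes CPython and uses the standard array module as a stand-in for a packed C++ vector. The byte counts are rough and platform-dependent (small integers are shared between lists, for instance), so treat them as indicative only.

```python
import sys
from array import array

values = range(10_000)

as_list = list(values)          # references to separate int objects
as_array = array("i", values)   # packed C ints, stored back to back

# The list only holds pointers; each int object costs extra memory on top.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(v) for v in as_list)
array_bytes = sys.getsizeof(as_array)

print(f"list + int objects: {list_bytes:,} bytes")
print(f"packed array:       {array_bytes:,} bytes")

# The price of specialisation: the array accepts integers and nothing else.
as_list.append("a string")      # fine, a list can hold anything
try:
    as_array.append("a string")
    array_is_general = True
except TypeError:
    array_is_general = False
```

The list stays fully general, while the packed array gives up that generality in exchange for a much smaller footprint.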


Boosting our Spirits
Type-specific data structures are of course the least of what C++ can do. 

boost::spirit is now my number one example of why C++ (still) rocks. Spirit contains a parsing library where the incoming data format is specified in a modified form of EBNF. For those who are unfamiliar, these grammars read very much like regular expressions. The trick is that the library takes EBNF-like C++, and compiles it into the equivalent of hand-crafted, down-to-the-metal, faster-than-anything-else-on-earth code for reading and analysing your data and spitting out C++ objects or whatever you want to do. The parsing rules are first class C++, so you can assign, manipulate, read from them, handle errors gracefully etc.

This is what C++ is best for: allying high level productivity with super-duper run-time speed.


C++ template syntax is evil!
The headline says it all. C++ meta-programming hijacks the template syntax to manipulate and compute with types much as normal run-time code manipulates and computes with values. Many of the methods in template metaprogramming perforce have a functional rather than an imperative flavour, much as if there is a sort of hobbled Scheme or Haskell sitting secretly on top of C++; and many of the most useful data structures accordingly look like (nested) trees. This is a bit disconcerting to most programmers but probably no bad thing in itself.

The real problem is that this whole hidden meta-programming language in C++ sprang up organically, without proper design, out of the clay Bjarne Stroustrup left behind in his play room. Everything is convoluted, appallingly difficult to understand and reason about, and the ability to fully exploit it hence confined to a high-priesthood with exactly 25.43 members (Some programmers have a fractional understanding...).

My dream programming language
This then is it:
My perfect programming language would have a python-esque resemblance to pseudo-code for the easy cases, and a full type system which is fully orthogonal (equally easy) in compile-time and run-time manipulations. Oh, I would also like transparently easy multi-threading/multi-processing in the manner of Scala actors or Go channels.

Is all this too much to ask?

Tuesday, 9 March 2010

Is Python too slow?

Python is a wonderful programming language in so many ways. It is immensely productive. You can prototype code and get things running in minutes, explore changes and mess around.

The best thing, though, is that Python code is clean and uncluttered. All that messing around doesn’t have to lead to messy code. This is pristine, pseudo-code-like logic you can walk through, even with non-programmers. The result: Python libraries tend to be beautifully designed.

So what is not to like about Python? One word: execution speed! Don’t get me wrong though. This is seldom a problem. Most of the time, it is more important to quickly write code which might run a little slower.

My general rule of thumb is that drastic optimisations for speed are only worthwhile if they result in speedups of orders of magnitude. In other words, for production code, the difference between a programme which will complete in an hour instead of days may be worth sweating over.

The other general rule of thumb is that this sort of speedup is only possible with changes in the algorithm you are using. Algorithms are language agnostic. Or so I used to think.

Unfortunately, along with many other dynamic programming languages (Perl), Python can be dog slow. When I used to be a Perl programmer, I would impress my colleagues by re-writing scripts line-for-line in c++ and getting 3 orders of magnitude improvement. Since c++ code can be much better abstracted (read some typical non-OOP hacky Perl code with 6 nested levels of references!), I often was able to get significantly better quality, more maintainable code as well.

Rewriting python in c++ takes a little more work and in the best cases I only get 2.5 orders of magnitude improvement. In case you misunderstand me: we are not talking about a 2.5x speedup; c++ code can be up to 300x faster.
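For anyone who wants to check the arithmetic, an order of magnitude is a factor of ten, so n orders of magnitude is a factor of 10 to the power n. A quick sketch (the helper name speedup_factor is made up for this example):

```python
import math


def speedup_factor(orders_of_magnitude):
    """Convert orders of magnitude into a plain multiplier."""
    return 10 ** orders_of_magnitude


perl_rewrite = speedup_factor(3)       # 1000
python_rewrite = speedup_factor(2.5)   # about 316

# Going the other way: a 300x speedup expressed in orders of magnitude.
orders_for_300x = math.log10(300)      # about 2.48

print(perl_rewrite, round(python_rewrite), round(orders_for_300x, 2))
```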

So why not just rewrite all long-running critical code in c++ and be done with it? Good programmers definitely need more than one tool / language under their belts for different occasions.

The problem is that the seductive promise that dynamic languages offer, going back to the days of Smalltalk, is a more exploratory (agile?) mode of programming. If you wonder whether some approach is feasible: try it. Programme at speed and accomplish more, differently. Alas, slow execution speed prevents the fast turnaround needed for such experimentation. I am not talking about scripts which take hours to run either. Usually it is the “crippling” difference between a minute and a half and 8 seconds which kills you.

I would gladly trade off some of the much vaunted flexibility of python for some of that 300x speedup. Is Cython or Boost.Python the answer?

Sunday, 28 February 2010

Good looking code

Some programmers have the strange idea that if programming is a science or even a branch of engineering, then there should be no room for aesthetics. In my experience, the best programmers always care about aesthetics: beautiful code is effective, maintainable code. Refactoring code to make it look better removes bugs.

Python...
Like many programmers who have the choice, I chose my main programming language because it feels good. After several years of Perl-ing, Python has been a lovely language to use.

One constant surprise, after 20 years of programming, has been the high quality of the design of standard python modules. You can fix the innards of your library but mistakes in the public interface stay forever, unless you are prepared to piss off half your users by breaking their code. The quality of python libraries is a good sign the python community cares, and this is the best reflection of the language itself.

.. vs C++
Lately, for performance as well as professional reasons, I have been turning back to my first love, c++. Admittedly, c++ syntax appears full of unnecessary warts, ... if you are in the business of inventing a language from scratch. I haven't heard anybody talk about the beauty of the c++ language. Nevertheless, programming intensively in c++ again after a little break has been a forgotten pleasure.

It is a different aesthetic: the hunt for lean mean code, to write logic at a high level which can nevertheless be compiled down to the fastest executable imaginable. I can remember poring over Stepanov's STL in 1996, knowing that this would be a game-changer for c++, and being astounded by Veldhuizen's Blitz++. This is the raison d'être of c++.

Back to aesthetics
This has reminded me that each programming community brings its own appreciation of attractive code. Most languages have to start off by attracting a core of enthusiasts, persuading them that programming in XXX is more fun as well as more effective than what they are using now. There is no point in inventing another ugly, painful language, however useful it might be. We have more than enough kludges, thank you very much.

Scala, Clojure and Google Go each have their own aesthetic, which is happily defended with much vigour by their inventors and communities. The resulting buzz means that they are accordingly much more likely to have significant communities in 10 or 20 years' time. Is this where we should be putting our money? Time will tell.