Tuesday, 2 November 2010

Dreaming of a new language

I have a rich imagination.

When I was (much) younger, I used to fantasize that Microsoft would be taken over by a billionaire (!), and start giving away free copies of a standards-conformant C++ compiler...

Ten years later, I have started to fantasize about my ideal programming language.

Now that I have started hacking away at both ends of the spectrum, so to speak, with Python and C++, I naturally want something that has the best of both worlds. The best way, I thought, was to try to come up with all the things I could do without.


Declaring Python
The thing with programming in Python is that half the time, I just cannot understand conceptually why it is so %$^*&!! slow. Otherwise, in most ways Python's language design is beautiful and orthogonal. But let us not go into details. What could I do without in Python?

Surprisingly, after 5 years of declaration-free heaven and dynamic typing, I can live with declarations. This is especially true for object attributes. Of course it is very convenient to add these on the fly, but as my objects grew richer, and when I set aside my code for a few days, weeks, months, I found that I no longer knew what properties my classes had in which situation. Instead, my objects were slowly accreting richer and richer functionality in a non-deterministic fashion as they ambled gently through the data. While this is a beautiful metaphor for the real world, it was driving me around the bend.

Eventually, I had to lay down the law. If I am adding an attribute to a class, it has to happen in one place: __init__. This does mean some instances are wandering around with a lot of None members. So be it. This is usually a sign that I have to refactor the class hierarchy anyway.

Once I decided that all attributes were being added in __init__, it was a small step to thinking that maybe this whole traditional OOP thing of defining classes statically was not such a bad thing. Messing around with Cython made me realise that declarations are not that onerous anyway. Python is not Perl: everything is strongly typed. At some level, you are already thinking about what sort of thing each of your variables is. So why not make it explicit?

I can hear Pythonistas being sick in the aisles at this point. Don't worry, I am not talking about messing with Python, per se. Python is fine the way it is. I am fantasising about a new language, remember!


Stealing from C++

O.K. What about the other side? What does C++ bring to the table? 

Don't get me wrong. I love C++ to bits, warts and all. But the syntax is fairly evil. I never notice this normally, but once in a blue moon, I try to show someone else what I am doing: "See how elegant and simple this C++ code is?", and turn around to see my colleagues choking to death.

The declarative syntax inherited from C was a brilliant idea from the 1970s which alas didn't quite work out. I think Dennis Ritchie would do it differently today. Ken Thompson certainly has.


Generic programming
But that is a smallish wart most people quickly get used to. The real problem is C++ templates. Generic programming in C++ is amazing, and offers tremendous power and flexibility unequalled among mainstream languages. But generics in C++ sit on top of the template syntax.

I first came across Erwin Unruh's ridiculous use of "template metaprogramming" to calculate and display prime numbers in compiler error messages back in the mid-90s. I remember laughing at this clearly useless practical joke.

Well, it turned out that C++ templates are in themselves a Turing-complete language, albeit one that computes at compile time (static operations only).
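Unruh's original program is not reproduced here. As a hedged sketch of the same idea, the following computes a factorial and tests primality entirely at compile time, using nothing but template instantiation and specialisation. The names `Factorial`, `HasDivisor` and `IsPrime` are my own, invented for illustration; `static_assert` needs C++11.

```cpp
// Hypothetical illustration: compile-time arithmetic via templates only.

// Factorial<N> is defined in terms of Factorial<N - 1>: recursion, not a loop.
template <unsigned N>
struct Factorial {
    static const unsigned long long value = N * Factorial<N - 1>::value;
};

// An explicit specialisation plays the role of the base case.
template <>
struct Factorial<0> {
    static const unsigned long long value = 1;
};

// HasDivisor<N, D>: does any d in [2, D] divide N?
template <unsigned N, unsigned D>
struct HasDivisor {
    static const bool value = (N % D == 0) || HasDivisor<N, D - 1>::value;
};

// Partial specialisation stops the recursion once D reaches 1.
template <unsigned N>
struct HasDivisor<N, 1> {
    static const bool value = false;
};

// N is prime if nothing in [2, N - 1] divides it.
template <unsigned N>
struct IsPrime {
    static const bool value = !HasDivisor<N, N - 1>::value;
};

// 0 and 1 are not prime; specialising them also avoids N % 0.
template <>
struct IsPrime<0> {
    static const bool value = false;
};
template <>
struct IsPrime<1> {
    static const bool value = false;
};

// These checks run inside the compiler; a failure stops compilation.
static_assert(Factorial<5>::value == 120, "5! is 120");
static_assert(IsPrime<97>::value, "97 is prime");
static_assert(!IsPrime<91>::value, "91 is 7 * 13");
```

Note how a loop turns into recursion and an `if` turns into a specialisation: that is the hidden language at work.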


Those unfamiliar with "template metaprogramming" must be wondering what the fuss is all about. The point is that making our problem domain more concrete or more specific can often produce code which is orders of magnitude faster and more efficient, and templates let the compiler generate those specialised versions for us. The trivial example is when we use a packed array (or vector) of integers rather than a list of references or pointers to integer objects. The latter is more general because the same list can hold strings, real numbers and so on. But if we are only interested in integers, the packed array will be tremendously faster, smaller, less fussy to deal with and so on.
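To make the packed-versus-general contrast concrete, here is a minimal C++17 sketch. The names `Value`, `Boxed`, `sum_boxed` and `sum_packed` are hypothetical, made up for illustration, and the "general" list is modelled, by assumption, as a vector of heap-allocated `std::variant`s, standing in loosely for a Python list of object references.

```cpp
#include <cassert>   // assert, for the usage checks
#include <memory>    // std::unique_ptr, std::make_unique
#include <numeric>   // std::accumulate
#include <string>
#include <variant>   // std::variant, std::get_if (C++17)
#include <vector>

// Hypothetical "general" element: may hold an int, a double or a string.
using Value = std::variant<int, double, std::string>;
// Each element lives in its own heap allocation, reached through a pointer.
using Boxed = std::unique_ptr<Value>;

// Summing the integers in the general list costs a pointer dereference
// and a run-time type check per element.
long long sum_boxed(const std::vector<Boxed>& items) {
    long long total = 0;
    for (const auto& p : items) {
        if (const int* i = std::get_if<int>(p.get())) {
            total += *i;
        }
    }
    return total;
}

// The specific version: ints packed contiguously, so the loop is a plain
// walk over an array with no indirection and no type checks.
long long sum_packed(const std::vector<int>& items) {
    return std::accumulate(items.begin(), items.end(), 0LL);
}
```

Both functions return the same total for the same integers; the difference lies in memory layout. The packed vector stores each `int` directly, while the general one stores a pointer per element plus a separate heap block, and must check each element's type at run time.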


Boosting our Spirits
Type-specific data structures are of course the least of what C++ can do. 

boost::spirit is now my number one example of why C++ (still) rocks. Spirit contains a parsing library where the incoming data format is specified in a modified form of EBNF. For those unfamiliar with EBNF, its rules look very much like regular expressions. The trick is that the library takes EBNF-like C++, and compiles it into the equivalent of hand-crafted, down-to-the-metal, faster-than-anything-else-on-earth code for reading and analysing your data, and spitting out C++ objects or whatever else you want. The parsing rules are first-class C++ objects, so you can assign them, manipulate them, read from them, handle errors gracefully, etc.

This is what C++ is best for: allying high level productivity with super-duper run-time speed.


C++ template syntax is evil!
The headline says it all. C++ metaprogramming hijacks the template syntax to manipulate and compute with types, much as normal run-time code manipulates and computes with values. Many template metaprogramming techniques perforce have a functional rather than an imperative flavour, much as if a sort of hobbled Scheme or Haskell were sitting secretly on top of C++; accordingly, many of the most useful data structures look like (nested) trees. This is a bit disconcerting to most programmers but probably no bad thing in itself.
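To illustrate that functional flavour, here is a hedged sketch of a type-level list built from nested `Cons` pairs, with `Length`, `Contains` and `Map` written as recursion plus pattern matching on partial specialisations. These names are hypothetical, chosen to echo Lisp and Haskell; only `std::is_same`, `std::integral_constant` and `std::add_pointer` come from the standard `<type_traits>` header (C++11).

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical type-level list: Cons<int, Cons<double, Nil>> means (int, double).
struct Nil {};
template <typename Head, typename Tail>
struct Cons {};

// length []     = 0
// length (_:xs) = 1 + length xs
template <typename List>
struct Length;

template <>
struct Length<Nil> {
    static const std::size_t value = 0;
};

template <typename Head, typename Tail>
struct Length<Cons<Head, Tail>> {
    static const std::size_t value = 1 + Length<Tail>::value;
};

// Contains<T, List>: does type T appear anywhere in the list?
template <typename T, typename List>
struct Contains;

template <typename T>
struct Contains<T, Nil> : std::false_type {};

template <typename T, typename Head, typename Tail>
struct Contains<T, Cons<Head, Tail>>
    : std::integral_constant<bool, std::is_same<T, Head>::value ||
                                       Contains<T, Tail>::value> {};

// Map<F, List>: apply a one-argument type "function" F to every element.
template <template <typename> class F, typename List>
struct Map;

template <template <typename> class F>
struct Map<F, Nil> {
    typedef Nil type;
};

template <template <typename> class F, typename Head, typename Tail>
struct Map<F, Cons<Head, Tail>> {
    typedef Cons<typename F<Head>::type, typename Map<F, Tail>::type> type;
};

typedef Cons<int, Cons<double, Cons<char, Nil>>> Numbers;

static_assert(Length<Numbers>::value == 3, "three types in the list");
static_assert(Contains<double, Numbers>::value, "double is in the list");
static_assert(!Contains<float, Numbers>::value, "float is not");
static_assert(std::is_same<Map<std::add_pointer, Numbers>::type,
                           Cons<int*, Cons<double*, Cons<char*, Nil>>>>::value,
              "Map turned every element into a pointer type");
```

The Haskell equivalents fit in three short lines; the C++ version needs a page of angle brackets, which is rather the point of this section.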

The real problem is that this whole hidden metaprogramming language in C++ sprang up organically, without proper design, out of the clay Bjarne Stroustrup left behind in his play room. Everything is convoluted and appallingly difficult to understand and reason about, and the ability to fully exploit it is hence confined to a high priesthood with exactly 25.43 members (some programmers have a fractional understanding...).

My dream programming language
This then is it:
My perfect programming language would have a Python-esque resemblance to pseudo-code for the easy cases, and a type system which is fully orthogonal (equally easy to use) for compile-time and run-time manipulations. Oh, I would also like transparently easy multi-threading/multi-processing in the manner of Scala actors or Go channels.

Is all this too much to ask?