Update 2006-01-30. Did a lot of surgery on the Ruby code to make it more idiomatic, based on many good comments I received.
Update 2006-01-30. A note on lines-of-code comparisons. The example code isn't trying to do a full-on SLOC comparison of the languages, but you can get a feel for the differences. Ruby and Python are much shorter than Java, which is much shorter than C++. Consider this a very casual comparison, however, because there was no attempt to compress lines of code, make all the comments equal length, etc.
Update 2006-01-28. I got a lot of good comments on the article and code, and a lot of suggested improvements, mostly to Ruby and some to Python. Java didn't get any comments. For C++ I got one that is pretty embarrassing but sort of makes a point: my brother and his colleagues pointed out that I had memory leaks in the C++ code. Oops. The leaks are now fixed. My excuse is that I was writing code in 'garbage collected mode' and got burned for not mentally switching to 'C++ mode'.
I am a language-agnostic journeyman programmer. I am not a fan of any particular language (I almost said 'fanboy', but that's a bit inflammatory). I just want to write useful programs and have fun doing it. I know C++ and Java pretty well, and I have done some beginner work in Python and Ruby. From that I came up with the following conclusions. But before you flame, read the whole article.
When running various distributions of Linux, I always ran into the choice of KDE or GNOME. There are plenty of advocates on both sides, but there was no overriding authority. Then recently Linus Torvalds came out with a definitive opinion: he took the unequivocal position that KDE is best. He isn't necessarily the final arbiter of user interfaces, but he provides a strong data point, and since he is smarter than me and all the other opinions seem to come from biased sources, I could pick KDE and feel better about it. Paraphrasing the old IBM adage: 'no one was ever fired for following a Linus directive'. Heh, after all that, it turned out I wanted to use Ubuntu, which works best with GNOME, so I ended up with that for now. 'Most practical' won out over 'best'.
In the meantime I realized I needed to learn a new language, and the current buzz is Python and Ruby. Again I couldn't find a definitive answer to which one is best. From all the buzz, I came up with a vague impression that Ruby is purer and set to win in the long run, but that Python is more practical for now. And Google uses Python, which is a significant data point. They aren't idiots over there.
To see what I could figure out myself, I decided to code up something in Java, then port it to Python and Ruby and see how I felt about each, and try to identify where the big wins are for each language.
One caveat. If something significant is missing from a language, like garbage collection, then I don't want to hear a response that says "well, if you use XYZ unsupported library, or you do ABC convoluted technique, then you can do the same thing in [put language name here]". I am trying to evaluate the STANDARD here, since of course you can probably do anything in any language including assembler if you work hard enough. And the problem with using a nonstandard library is not just the extra integration work, it's that you are basing your code on something that may fall by the wayside later on and then you are stuck. Sometimes it's worth it but that has to be proven on a case-by-case basis as far as I am concerned.
I started in C in 1985, learned C++ in 1990 (Zortech C++), and have been using it ever since. I learned Java in the mid-'90s when it was first coming out, and found three big wins for Java over C++: garbage collection, portability, and simplicity. Garbage collection and simplicity created big productivity gains, and portability is portability. Not having used a garbage-collected language before, the productivity gain was readily apparent. The simplicity of Java over C++ was really nice. When coding C++, I needed Meyers' Effective C++ on my desk at all times to be sure I wasn't invoking some weird type coercion or copy constructor/assignment operator anomaly. And don't even start with templates. With Java I never needed that, because it is just simpler. The Java libraries were also more comprehensive, and string handling was easier. So in general I was more productive coding away in Java. I still liked C++, but it seems that when programming C++, the fun is in figuring out the language and library, like solving a puzzle. That leaves less time to spend on solving the application domain problem.
The attached code samples are implementations of a Red-Black tree algorithm adapted from descriptions in "Algorithms in C++" by Sedgewick and "Introduction to Algorithms" by Cormen, Leiserson, and Rivest. I picked this because it was short but had some complexity.
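For readers who haven't seen the structure, here is a minimal sketch in Python of the plumbing the algorithm relies on. This is not the attached code, just an illustration with names I made up: each node carries a color bit, and rotations rebalance the tree after inserts.

    # Minimal sketch (hypothetical names, not the attached implementation).
    RED, BLACK = True, False

    class Node:
        def __init__(self, key, value):
            self.key = key
            self.value = value
            self.color = RED      # new nodes start out red
            self.left = None
            self.right = None

    def rotate_left(h):
        """Rotate the subtree rooted at h left; return the new subtree root."""
        x = h.right
        h.right = x.left
        x.left = h
        x.color = h.color
        h.color = RED
        return x

The real implementations also handle color flips and the insert fixup cases; this is just enough to show the shape of the thing.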
Code notes and disclaimers:
It was surprisingly easy to port the Java code to Python and then port the Python to Ruby. A lot of it was regular expression search and replace, getting some naming conventions right and adapting to a few language differences. During the porting process, the two big gotchas I ran into were Python block indenting errors and Ruby's horrible compiler diagnostics.
Porting the Java code to C++ was much more of a hassle. I attempted to use as many static type checking mechanisms as I could: in Java I used generics for the tree, and in C++ I used templates for the container and 'const' where appropriate. The big gotcha in porting to C++ was memory management; as the leak episode above shows, I paid for staying in 'garbage collected mode'.
It took me a while to get my editor (JEdit) happy with Python and to stop using tabs. Fortunately I never screwed up the file so badly that the code didn't work, but I always had an unpleasant uncertainty about the indentation simmering in the background. Some (or all) of this may be prejudice. I really liked braces better than either Ruby's do-end or Python's indenting, at least when I was coding. On the other hand, a properly indented Python file looks much, much cleaner and is easier to read than any of the others, because you don't need all the block closing symbols. However, the explicit 'self' argument makes it look less clean than it could.
Many of the compiler diagnostics I got during the Ruby porting simply said "syntax error" and gave the line number for the last line in the file. Great. I spent a lot of time doing binary searches on my code to find the error source.
One thing I did differently in each language was to adapt a 'Visitor' pattern (for traversing the tree) to the preferred idiom of each language. You could of course simply code up a Visitor class that is nearly identical in each language, but instead I used a statically typed visitor in Java and C++, a function passed as a named parameter in Python, and a lambda (plus a separate block version) in Ruby.
The Java and C++ approaches give you static type checking but take a lot more cruft to get going. I found the Python named function parameter very convenient, but it doesn't carry any state, so if you need state you use a class. Surprisingly, I found the Ruby lambda easier to understand and implement than the Ruby 'block'. That is because my traversal algorithm is recursive, and the lambda just gets passed around as a parameter (like the Python named function parameter). I didn't exploit the full potential of a lambda closure.
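To make the Python version concrete, here is a hedged sketch of the idiom (my names, not the attached code): the visitor is an ordinary function handed down through the recursion as a named parameter.

    # Sketch of the Python visitor idiom: the visitor travels down the
    # recursion as a plain named parameter.
    def in_order(node, visit=lambda n: print(n.key)):
        if node is None:
            return
        in_order(node.left, visit=visit)
        visit(node)
        in_order(node.right, visit=visit)

    # Usage, assuming a tree rooted at 'root':
    in_order(root)                                  # default visitor prints keys
    in_order(root, visit=lambda n: print(n.color))  # swap in another visitor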
The Ruby block scheme (pun not intended) requires some tricky syntax in the recursive calls, and I could not find a good explanation of how to handle recursive use of blocks in the Ruby documentation. I found a single web hit with an example, and after fiddling with it I got it to work. I think I understand blocks now, but it is still a bit fuzzy: I know what to do, but it takes some concentration to figure out what exactly is happening and why the code looks the way it does. I found that viewing a Ruby block as a co-routine (per the documentation) rather than as a subroutine was the best way to understand the whole thing.
All that said about the Visitors, I am a Python/Ruby novice so possibly I did things the hard way :)
There has always been this tradeoff. In fact, the Python/Ruby vs. Java performance controversy sounds a lot like the C/assembler/Forth discussions in the embedded systems world of the early 1980s. Forth was interpreted; it didn't need a compile cycle, and it had (supposedly) productivity-enhancing features that C didn't have. Development cycles were much shorter with Forth. Performance was not as fast as C or assembler, but it was close. The drawback of Forth was the weirdness of the language. C won out, and Forth went to the dark corner of mostly forgotten languages.
Interpreted languages give you a much quicker development cycle, especially on big programs. There is no doubting that. It's simply a tradeoff of execution speed vs. productivity. Some applications need the speed. I think it is premature optimization to say "I like C/Java better than Python/Ruby because they execute faster". Interpreted is better if you can get it. When I was testing the code I experienced the advantage of interpreted firsthand. I didn't really measure performance, but other sources show the differences. And since Python/Ruby seem to interface to C/C++ pretty readily, I would be very comfortable working in the interpreted world and descending into the netherworld of compiled C/C++ when required. Yes, Python is actually compiled to bytecode for a VM, but since there is no explicit compile step, it acts like an interpreted language to the user.
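As a sketch of what that descent might look like from Python, here is a call into a C shared library via the ctypes module. The library and function names here are hypothetical; the point is how thin the glue is.

    # Hypothetical: assumes a compiled libfast.so exporting
    #     int fast_sum(const int *xs, int n);
    import ctypes

    lib = ctypes.CDLL("./libfast.so")
    lib.fast_sum.argtypes = [ctypes.POINTER(ctypes.c_int), ctypes.c_int]
    lib.fast_sum.restype = ctypes.c_int

    xs = (ctypes.c_int * 4)(1, 2, 3, 4)   # a C int[4] built from Python
    print(lib.fast_sum(xs, len(xs)))      # hot loop in C, glue in Python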
OK, I like the productivity increase provided by dynamic typing because it eliminates a lot of scaffolding. I found it quite interesting to see errors pop out at runtime that would have been compile-time errors in C++/Java. These runtime errors were obviously influenced by the paths taken in the test program (or how far it got before it barfed). For a given run, I clearly wasn't seeing all the instances of this class of error that I would have seen with static type checking.
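A contrived sketch of the path dependence I mean (my example, not from the ported code):

    def key_length(node):
        return len(node.key)   # quietly assumes every key has a length

    # With string keys, every run passes. Insert a single integer key and
    # you get "TypeError: object of type 'int' has no len()" -- but only
    # on a run whose traversal actually reaches that node. A statically
    # typed version would have refused to build.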
Coming from a statically typed language background, my gut says that dynamic typing creates a risk. The Python/Ruby bloggers say that if you just unit test properly, there is no problem. Bruce Eckel has a well-reasoned essay on the issue. I would argue that expecting unit tests to catch typing errors is itself problematic; as I saw above, a run only surfaces the type errors on the paths it actually executes.
We static typers may be wrong. I had a similar experience moving from PVCS to Subversion version control. Oh, My, God, no locking? The code will be completely ruined in a week. But it turned out to be a non-issue, and Subversion added so much less friction to the development cycle that productivity improved maybe 10%. The collective good experience overrode the prejudice-based theoretical 'proof'. The same argument can be made for dynamic vs. static typing.
I wouldn't mind a separate 'lint' tool for Python/Ruby (is it possible?). I use lint for C/C++ religiously. The whole compiled language community is moving towards more static type checking (Java Generics, for example) rather than less. Are they all idiots? (don't answer that)
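For what it's worth, such tools do exist for Python: pyflakes and pylint, for example, flag undefined names and similar slips without running the code. A trivial sample of what they catch:

    def depth(node):
        if node is None:
            return 0
        # pyflakes reports: undefined name 'dpeth' -- no test run required
        return 1 + max(depth(node.left), dpeth(node.right))

They won't catch everything a real type system would, but they are better than nothing.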
My conclusion is that dynamic 'duck' typing is more productive, more pleasant, and gives cleaner-looking code, but it incurs the risk that you will get a runtime type exception in your application at some later date. The risk is there; quit denying it.
These results are meant to cover issues I noticed in the porting and testing I did, not to be an overall evaluation. If I mention something I didn't run into first hand, throw it out.
None of these are that important.
A final thought on C++. To me the C++ Standard Template Library is distinguished from the other languages' libraries in that it seems much more mathematically thought out. The containers and algorithms in the STL all come with explicit runtime complexity guarantees. There seems to be much more computer science in the STL than in the other libraries. Java is somewhat like that, whereas the Ruby and Python libraries seem much more ad hoc. That probably has a lot to do with their open, community-driven approach to libraries. I really like how the STL was thought out and designed.
Java is more productive than C/C++. Use C/C++ only when speed or bare-metal access is called for. Python/Ruby are more productive than Java and more pleasant to code in. There is a big open question on static vs. dynamic typing. I contend that static typing has to be better for program correctness, but the required cruft reduces productivity. If actual practice in large systems shows that runtime typing errors don't in fact occur often and that the productivity gain is worth the risk, then I will bow to dynamic typing. I can't come up with a definitive answer to Python vs. Ruby. They seem very equivalent; I would choose based on practicality in a given situation. My general feeling was that Python annoyed me in ways that Ruby didn't, but I think those annoyances would disappear if I were using Python all the time.
Crap, the whole point was to pick Python or Ruby, but I am back where I started.
Code to HTML conversion done with JEdit and the CodeToHTML plugin by André Kaplan.