I've decided to try out Apache Thrift as the RPC mechanism in a big project I am working on. Thrift is currently an Apache Incubator project, but was originally developed at Facebook.
You might ask yourself why companies like Google and Facebook develop their own RPC when the world is overflowing with different flavors of RPC? Well, the answer is easy: they need something that works. Most RPC flavors do not work for any meaningful definition of "works". They are either too slow, too complex, too fragile, too cumbersome to use or too limited in which languages they work with. Usually RPC mechanisms fit more than one category.
SOAP is in a special place because it is quite possibly the most ridiculous attempt at solving the problem to date. It combines the horrible mess that is XML with the sort of wrongineering that is so prevalent among the majority of Java developers. The sort of people Douglas Adams no doubt had in mind when he wrote about Vogons. Indeed, if you read through the cruft that is a WSDL file you can't help but think that this is really just thinly disguised Vogon Poetry.
But I digress.
Now, Thrift would not be my first choice. My first choice would be the RPC mechanism that Google uses internally. Unfortunately they have not open sourced it yet. If you have looked at Protocol Buffers you have seen part of the technology that is used there. I periodically pester Chris DiBona about Google releasing their RPC mechanism as open source, but the problem seems to be that they don't have any takers. Everyone at Google is terribly busy :-).
(If you work at Google and you want to do something about it, please talk to Chris. He should be able to help you figure out what you need to do in order to release it as open source).
I chose Thrift because it is used in production at a large site (well, several large sites) and because it has support for a great number of languages. The latter is important to me. For an RPC mechanism to be meaningful, it needs to support at least Java, C++, Python, Ruby, PHP and C#. It would be nice if they could get the Thrift libraries into the official Maven repositories soon too, but from what I gather this is happening Any Minute Now.
(There are open source projects for creating Protocol Buffer-based RPC mechanisms, but none of them gave me the confidence to choose them for an important project.)
The one thing I don't like about Thrift is that the compiler is written in C++ and depends on Boost. The fact that the compiler is written in C++ makes it awkward to integrate nicely into multi-platform builds, since you end up needing binaries for each platform. I'm on a Mac right now, and as we speak I am installing the Thrift compiler on my laptop. But before I can do that I have to install Boost.
Boost alone is reason enough to become seriously grumpy about Facebook choosing to implement the Thrift compiler in C++. It is a monstrosity, and just installing it via the ports system takes forever.
I can't help but think they chose the wrong approach here. It isn't like the Thrift compiler needs to be extremely fast. It just needs to work, be fairly easy to extend and it should run "anywhere".
Which leads me to the next question: what should they have chosen?
For me it would be convenient if it were written in Java. But that would require firing up a Java VM during builds, which may not be palatable to those who aren't using Java. A camp I find myself in from time to time. One should be careful not to foist one's religion upon others.
I think this leaves us with Perl, Python and Ruby.
I am not sure Ruby has achieved enough penetration yet; that is, I am not sure you can expect Ruby to be installed everywhere. Also, for those who worry about these things: Ruby is kinda falling out of fashion. (You need to be hacking Erlang to be part of the in-crowd nowadays).
Universal availability suggests Perl -- but seriously: Perl!? You can write neat code in Perl, but almost nobody does. Its bad reputation is somewhat undeserved, but things do get messy when the leading stars of the community cultivate terse, unreadable code even though perfectly readable code can be written without much loss of performance. So scratch Perl.
This leaves us with Python. Mind you, I have never been a big fan of Python. I am not sure why, but the language just doesn't fit my taste.
Still, I think Python is probably the right language for the job. It is reasonably clean, it is old enough to probably be semi-universal and it should be doable to implement a compiler that can run anywhere Python runs without modification. Not only that, you could run it on the JVM as well using Jython, to ease integration with Java build systems.
After all, we are talking about what should be a relatively small compiler: a program that takes text input, does a bit of pondering and then spits out text. Even for a huge service definition the input would be rather modest, and it doesn't matter if code generation takes 0.1 seconds or 1.0 seconds. (Yes, I know it is a lot slower for typical web services code generators, but there you have to contend with an order of magnitude more complexity and fairly ratty implementations. I would not aspire so low as to measure myself against the likes of Apache Axis).
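To make the "text in, a bit of pondering, text out" point concrete, here is a minimal sketch in Python. It is not the Thrift compiler and it does not implement the real Thrift grammar; it only handles a toy subset (flat structs with numbered fields of a few basic types). The names parse, generate, TYPE_MAP and EXAMPLE_IDL are made up for illustration.

```python
import re

# Toy subset of a Thrift-like IDL: only flat structs with numbered fields.
# These patterns are an assumption for this sketch, not the real grammar.
STRUCT_RE = re.compile(r"struct\s+(\w+)\s*\{(.*?)\}", re.DOTALL)
FIELD_RE = re.compile(r"(\d+)\s*:\s*(\w+)\s+(\w+)")

# Hypothetical mapping from IDL type names to Python type names.
TYPE_MAP = {"i32": "int", "i64": "int", "string": "str",
            "bool": "bool", "double": "float"}


def parse(idl_text):
    """Return a list of (struct_name, [(field_id, type, name), ...])."""
    structs = []
    for name, body in STRUCT_RE.findall(idl_text):
        fields = [(int(fid), ftype, fname)
                  for fid, ftype, fname in FIELD_RE.findall(body)]
        structs.append((name, fields))
    return structs


def generate(structs):
    """Emit Python source text, one plain class per parsed struct."""
    lines = []
    for name, fields in structs:
        lines.append(f"class {name}:")
        args = "".join(
            f", {fname}: {TYPE_MAP.get(ftype, 'object')} = None"
            for _, ftype, fname in fields
        )
        lines.append(f"    def __init__(self{args}):")
        if not fields:
            lines.append("        pass")
        for _, _, fname in fields:
            lines.append(f"        self.{fname} = {fname}")
        lines.append("")
    return "\n".join(lines)


EXAMPLE_IDL = """
struct User {
  1: i32 id,
  2: string name,
}
"""

if __name__ == "__main__":
    # Text in, text out: print the generated Python source.
    print(generate(parse(EXAMPLE_IDL)))
```

A real compiler obviously needs a proper parser, services, exceptions, containers and one backend per target language, so this says nothing about the final line count. It only shows that the shape of the problem is text processing, which a scripting language handles comfortably.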
sloccount puts the current implementation at 21740 lines of C++, and if the smug people who prattle on about DSLs at conferences aren't completely full of it, it should have been doable to implement the Thrift compiler in Python instead, and in fewer lines (consider that a challenge, Pythonistas of the world! :-)).
I'd be happy to hear YOUR opinion on RPC mechanisms, implementation of portable tools for DSLs etc.