2010-03-24

Musings on RPC mechanisms and DSLs

I've decided to try out Apache Thrift as the RPC mechanism in a big project I am working on.  Thrift is currently an Apache Incubator project, but was originally developed at Facebook.

You might ask why companies like Google and Facebook develop their own RPC mechanisms when the world is overflowing with different flavors of RPC.  Well, the answer is easy:  they need something that works.  Most RPC flavors do not work for any meaningful definition of "works".  They are either too slow, too complex, too fragile, too cumbersome to use or too limited in which languages they support.  Usually an RPC mechanism falls into more than one of these categories.

SOAP is in a special place because it is quite possibly the most ridiculous attempt at solving the problem to date.  It combines the horrible mess that is XML with the sort of wrongineering that is so prevalent among the majority of Java developers.  The sort of people Douglas Adams no doubt had in mind when he wrote about Vogons.  Indeed, if you read through the cruft that is a WSDL file you can't help but think that this is really just thinly disguised Vogon Poetry.

But I digress.

Now, Thrift would not be my first choice.  My first choice would be the RPC mechanism that Google uses internally.  Unfortunately they have not open sourced it yet.  If you have looked at Protocol Buffers you have seen part of the technology that is used there.  I periodically pester Chris DiBona about Google releasing their RPC mechanism as open source, but the problem seems to be that they don't have any takers.  Everyone at Google is terribly busy :-).

(If you work at Google and you want to do something about it, please talk to Chris.  He should be able to help you figure out what you need to do in order to release it as open source).

I chose Thrift because it is used in production at a large site (well, several large sites) and because it has support for a great number of languages.   The latter is important to me.  For an RPC mechanism to be meaningful, it needs to support at least Java, C++, Python, Ruby, PHP and C#.  It would be nice if they could get the Thrift libraries into the official Maven repositories soon too, but from what I gather this is happening Any Minute Now.

(There are open source projects for creating Protocol Buffer-based RPC mechanisms, but none of them gave me enough confidence to choose them for an important project.)

The one thing I don't like about Thrift is that the compiler is written in C++ and depends on Boost.  Because the compiler is written in C++, it is awkward to integrate nicely into multi-platform builds, since you end up needing binaries for each platform.  I'm on a Mac right now, and as we speak I am installing the Thrift compiler on my laptop.  But before I can do that I have to install Boost.

Boost alone is reason enough to become seriously grumpy about Facebook choosing to implement the Thrift compiler in C++.  It is a monstrosity, and just installing it via the ports system takes forever.

I can't help but think they chose the wrong approach here.  It isn't like the Thrift compiler needs to be extremely fast.  It just needs to work, be fairly easy to extend and it should run "anywhere".

Which leads me to the next question: what should they have chosen?

For me it would be convenient if it were written in Java.  But that would require firing up a Java VM during builds, which may not be palatable to those who aren't using Java, a camp I find myself in from time to time.  One should be careful not to foist one's religion upon others.

I think this leaves us with Perl, Python and Ruby.

I am not sure Ruby has achieved enough penetration yet;  that is, I am not sure you can expect Ruby to be installed everywhere.  Also, for those who worry about these things:  Ruby is kinda falling out of fashion.  (You need to be hacking Erlang to be part of the in-crowd nowadays).

Universal availability suggests Perl.  But seriously: Perl!?  You can write neat code in Perl, but almost nobody does.  Its bad reputation is somewhat undeserved, but things do tend to get messy when the leading stars of the community cultivate terse, unreadable code, even though you can write perfectly readable Perl without much loss of performance.  So scratch Perl.

This leaves us with Python.  Mind you, I have never been a big fan of Python.  I am not sure why, but the language just doesn't fit my taste.

Still, I think Python is probably the right language for the job.  It is reasonably clean, it is old enough that it is probably installed almost everywhere, and it should be doable to implement a compiler that runs unmodified anywhere Python runs.  Not only that, you could run it on the JVM as well using Jython, to ease integration with Java build systems.

After all, we are talking about what should be a relatively small compiler: a program that takes text input, does a bit of pondering and then spits out text.  Even for a huge service definition the input would be rather modest, and it doesn't matter whether code generation takes 0.1 seconds or 1.0 seconds.  (Yes, I know typical web services code generators are a lot slower, but there you have to contend with an order of magnitude more complexity and fairly ratty implementations.  I would not aspire so low as to measure myself against the likes of Apache Axis.)

sloccount puts the current implementation at 21,740 lines of C++, and if the smug people who prattle on about DSLs at conferences aren't completely full of it, it should have been doable to implement the Thrift compiler in Python instead, and in fewer lines (consider that a challenge, Pythonistas of the world! :-)).
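To put the "text in, a bit of pondering, text out" point in concrete terms, here is a toy sketch in Python.  It is not a Thrift compiler and not what Thrift actually generates: it assumes a made-up subset of the IDL (only struct blocks whose fields look like "1: i32 id", no containers, services, enums or includes), and the parse/generate functions and the plain Python classes they emit are hypothetical illustrations.

```python
import re

# Toy subset of Thrift IDL (an assumption for illustration only):
# "struct Name { 1: type field, ... }" with simple, non-container types.
STRUCT_RE = re.compile(r"struct\s+(\w+)\s*\{([^}]*)\}")
FIELD_RE = re.compile(r"(\d+)\s*:\s*(\w+)\s+(\w+)")

# Map a few Thrift base types to Python type names; anything else -> object.
TYPE_MAP = {"bool": "bool", "i32": "int", "i64": "int",
            "double": "float", "string": "str"}


def parse(idl):
    """Text in: return [(struct_name, [(field_id, thrift_type, field_name), ...]), ...]."""
    structs = []
    for match in STRUCT_RE.finditer(idl):
        name, body = match.group(1), match.group(2)
        fields = [(int(fid), ftype, fname)
                  for fid, ftype, fname in FIELD_RE.findall(body)]
        structs.append((name, fields))
    return structs


def generate(structs):
    """Text out: emit Python source with one plain class per struct."""
    out = []
    for name, fields in structs:
        params = ", ".join(
            f"{fname}: {TYPE_MAP.get(ftype, 'object')} = None"
            for _, ftype, fname in fields)
        out.append(f"class {name}:")
        out.append(f"    def __init__(self, {params}):")
        for _, _, fname in fields:
            out.append(f"        self.{fname} = {fname}")
        if not fields:
            # An empty struct still needs a valid method body.
            out.append("        pass")
        out.append("")
    return "\n".join(out)


if __name__ == "__main__":
    idl = """
    struct User {
      1: i32 id,
      2: string name
    }
    """
    print(generate(parse(idl)))
```

Run as a script, it prints a User class whose constructor takes id and name.  Anything beyond this tiny subset would need a proper parser rather than two regular expressions, but the overall shape (read text, build a small model, write text) stays the same.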

I'd be happy to hear YOUR opinion on RPC mechanisms, the implementation of portable tools for DSLs, etc.

3 comments:

  1. Whether having to set up Java and Maven correctly is a more excruciating exercise than installing Boost and building Thrift is really down to taste.

    Boost is available as a package in every OS I'm currently using, and Thrift builds cleanly with autotools. Yes, it takes time to install it from source, but come on... And you don't have to go on that wild goose chase to find what your JAVA_HOME should be ;-)

    That being said, I agree that Thrift, through process of elimination, looks like the best alternative for open, compact, cross-language RPC. It's too bad that Google has not been able to get their act together and release their protobuf-based RPC system (I tried searching for it, the first page of results was full of tools of an entirely different purpose).

    I'm going to take a stab at using Thrift at Zedge. Our primary needs are PHP and Java support (including Android). Will let you know how it works out.

  2. Indeed, I have seen no OS platform where Java is set up correctly from the get-go. (Note that I haven't used Solaris in years, so it might actually be done correctly there.) OSX is probably the platform that comes closest to doing it right, but as with all the other OSes I have used, JAVA_HOME is not set, and in my opinion Java is not properly installed until it works as intended out of the box.

    Maven is easy and quick to set up, but it does require that Java is properly installed.

    Learning how to *use* Maven is a whole other issue and indeed unnecessarily painful. But again, that's a different discussion.

    On the Mac, the best way I have come across for installing Boost is to use Darwin ports. This, of course, implies that you build Boost, which is amazingly time-consuming.

    Thrift doesn't build cleanly on MacOS 10.5 btw. It requires you to manually copy the pkg.m4 file into aclocal. If you don't the build process fails in a rather spectacular manner and ends up firing up a headless X11 instance(!). Of course, ideally Thrift would be distributed as a patched port for OSX which compiles cleanly on all current versions of OSX.

    On a different note: I talked to some people yesterday and today, and people are really excited about Apache Avro. Last time I checked I wasn't really excited about it, but they released 1.3 recently, so it is worth a second look.

  3. Your points on Java are well-stated. I administer an application that is built using Sun's application server (based on Tomcat) and runs on Sun Solaris using Sun's Java VM. It has a command-line tool, written in Java, for performing simple administration tasks. The actual program executes almost instantaneously, but starting a Java VM and loading the relevant classes takes about 8 seconds of wall-clock time (and about 13 seconds of user CPU time) on a quite powerful and capable machine:

    username@hostname:~$ time cptool list sessions
    Active sessions on hostname.domain.tld: 0
    Active sessions on hostname2.domain.tld: 0

    Total number of active sessions: 0


    User name
    ---------

    real 0m8.238s
    user 0m13.096s
    sys 0m0.485s
