2010-03-30

C++ was a child of its time

I posted this as a comment to "Never trust a programmer who says he knows C++" and figured I'd post it on my blog as well.

There is a kind of symmetry to the incremental teaching of C++, from “C with classes” to the whole of “modern C++”: it was invented that way and inflicted upon the world that way. Consequently, C++ exists in all its intermediate forms in the wild. C++ apologists tend not to want to talk about what this really means.

As an experimental language it did have some value; in much the same manner that experiments with eating the wrong sort of mushroom have value — unless you are the test subject. You learn what not to ingest. This Frankensteinian mishmash should have stayed in the lab. Unfortunately, at the time this pathogen seeped under the door and out into the world, it was just too tempting to make use of it. It offered familiarity and performance.

We have better computers now. We also have a much broader palette of good languages that can more clearly express intent. We have also become slightly better at building systems out of several languages, so that each problem can be attacked in a language better suited to its domain.

The generation of programmers that matured in the 00s is both more fortunate and far more aware of the rich fauna of programming languages and pragmatic approaches than the generations that matured in the decades before. It is not uncommon to encounter developers who are proficient in at least half a dozen languages and who have some knowledge of at least a dozen or so proper programming languages.

Most proponents of “modern C++” are, I’m afraid, still on the wrong side of the valley. Much of what is claimed about how it is perfectly possible to write clear, performant and simple C++ does not take into account that while this may be true for smaller systems that have been implemented from scratch, it is somewhere between myth and fiction in the harsh reality of systems with legacy code or even modest integration requirements.

With C++ things invariably get ugly. Very ugly.

I have yet to meet a “C++ purist” who can look me in the eye and tell me that he or she has not had to compromise his or her code as wishful thinking gets its nose bloodied on harsh reality. I have met plenty of people who consider themselves competent C++ hackers who are, at best, the software engineering equivalent of drunk drivers.

2010-03-27

One man's game is another man's crowdsourced computation.

I stumbled across an iPhone app today called "The travelling salesman". To those of you with a background in computer science:  yes, it is exactly what you think it is.  From the blurb on the page:
The Traveling Salesman is a new puzzle for the iPhone. The object of the game is simply to find the shortest route between a set of cities, visiting each city once and returning to the starting city. In game are presented cities from each of the 50 states. Each state has four levels of difficulty for 200 puzzles in total.
As I read the blurb it struck me that for any computation where you need to find a viable solution to an NP-complete problem, you could map the concrete problem you need to solve into a suitable on-line game, possibly disguising the problem somewhat, and have the faceless millions go at it Mechanical Turk-style, picking what is at any given time the best solution that has been discovered.

That is, if you succeed in making the game interesting enough and succeed in getting mass adoption.
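
If you wanted to prototype the harvesting side of this, the bookkeeping is almost trivial.  Here is a minimal sketch in Python of the part that matters: scoring each submitted tour and keeping the best one seen so far.  The city coordinates and the scoring are made up; think of them as hypothetical stand-ins for whatever concrete problem you would be smuggling into the game.

    import math

    # Hypothetical problem instance: city name -> (x, y) coordinates.
    CITIES = {"A": (0, 0), "B": (3, 4), "C": (6, 0), "D": (3, -4)}

    def tour_length(tour):
        """Total length of a closed tour visiting each city exactly once."""
        total = 0.0
        for i, name in enumerate(tour):
            x1, y1 = CITIES[name]
            x2, y2 = CITIES[tour[(i + 1) % len(tour)]]
            total += math.hypot(x2 - x1, y2 - y1)
        return total

    class BestSolutionTracker:
        """Keeps the shortest tour any player has submitted so far."""

        def __init__(self):
            self.best_tour = None
            self.best_length = float("inf")

        def submit(self, tour):
            # Reject malformed submissions: every city exactly once.
            if sorted(tour) != sorted(CITIES):
                return self.best_length
            length = tour_length(tour)
            if length < self.best_length:
                self.best_tour, self.best_length = list(tour), length
            return self.best_length

    tracker = BestSolutionTracker()
    tracker.submit(["A", "C", "B", "D"])         # a poor, crossing tour
    print(tracker.submit(["A", "B", "C", "D"]))  # the shorter tour wins

All the cleverness lies elsewhere: in generating puzzle instances from real problem data, and in the players themselves.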

I am sure I am not the first person to think of this and I will no doubt be bombarded with links to clever projects that have found ways to do this systematically for broad classes of problems.  Nothing would please me more.

If I am mistaken in this assumption I do hope that someone will explore the subject further, and that this someone is not averse to the idea of actually writing code rather than just making the idea an object of dull academic masturbation.

A brisk stroll down memory lane.

As I was feeding the shredder with old bank statements and expense reports I spotted a receipt that brought back fond memories.

A few years ago I was in Courmayeur skiing(*). I suddenly found that I needed to be in California 24 hours later -- for some meeting I can no longer remember.  So I needed to get from Courmayeur to Geneva.  In a hurry.  I asked the receptionist if she could arrange for a taxi.  I jokingly said "get a competent driver".  She nodded.  I thought nothing of it.

The driver turned up a bit later.  A man in his mid 70s who only spoke Italian, save for a few standard phrases in English.  The car was some French minivan piece of junk.  I glanced worriedly at my wristwatch.

- "We go now.  Please... the seat-belt."

I have spent some time driving various forms of machinery and have a fairly good grasp of what it takes to cane a car around a track.  The choice of lines, the choice of gear, how you downshift an unsynchronized gearbox while braking, etc.

As we headed out of Courmayeur I noticed that the taxi driver was using heel-and-toe technique.  Which was curious, as the French donkey-cart no doubt had a perfectly modern, well-synchronized gearbox.  As we picked up speed his choice of lines through the curvy alpine roads was impeccable.  He did not lift unless he had to, he used every inch of usable road, and despite there being patches of ice here and there, kept the car together and balanced.  Precise, flowing moves.

Somewhere after the Mont Blanc tunnel an M3 cut in front of us.  I heard a faint curse, he switched the headlights to high beam and placed the nose of the miserable French car almost in the back seat of the Bavarian masterpiece of automotive engineering.  He was not happy.  This driver was not worthy.  He would have to be punished.

Now, ordinarily you would think that an M3 with... what, 350 or so ponies under the hood would outrun the French shitbox easily.  Probably, but the guy in the M3 was a mediocre driver at best; and the Italian gentleman next to me seemed to think he was Tazio Nuvolari.

For all I know, he could have been Nuvolari.  He was certainly old.  And he was certainly a skilled driver.

After a few kilometers the M3 driver misjudged a sequence of corners and had to brake.  The taxi driver never took his foot off the throttle.  There wasn't much to floor, but he kept his foot down.  His left wheels were outside the white line on the opposite side of the road before turning in, collecting the apex and heading out again.  Minutes later we could no longer see the headlights of the beemer.  Gone.  Foot still planted to the floor.

The trip was supposed to take over an hour.  I can't remember exactly how long it took, but I can remember being shocked at how quickly we had gotten there.  As well as a bit disappointed that the brilliant display of driving skill had come to an end.

I paid the man €230.  As I handed him the money he saw my membership card for the local Alfa Romeo owners club in Norway.  He smiled.

- "Forza Alfa Romeo!"

I guess the receptionist at the hotel got me what I asked for.  A competent driver.  And he was indeed a very competent driver.



The best €230 I have ever spent in the name of scalable computing.


(*) When I say "skiing" I am exaggerating greatly.  I was learning how to snowboard.  And I wasn't doing well.  I repeatedly fell down the mountain whilst dressed up like a stuffed sausage.  But it was fun.  All 8 hours of it.  The following 3 days ... not so much.  I was in pain.

2010-03-25

Scalable

"Scalable" is probably the most misused word in the software industry right now.  The most misused word that actually has a reasonably clear meaning -- unlike "cloud" which isn't a useful term at all because people use it about anything and everything.

Having spent a decade worrying about "scalable" I can tell you that most of the things onto which this label is slapped are decidedly un-scalable.

For instance, the other day I saw someone claim that some general purpose language was "scalable".  What on earth does it mean that a language is scalable?

Sure, you could claim that purely functional languages make implementing concurrent systems easier, but the fact that a language helps you not shoot yourself in the foot when you write concurrent programs in it does not make it inherently "scalable".  And sure, you could limit the discussion to special-purpose languages used to describe distributed computing problems (such as Pig, or programming models such as MapReduce), but saying that, for instance, Java, Ruby or Scala is "scalable" makes absolutely no sense.

I've also seen the word "scalable" being used about systems that can only run on one computer -- systems that cannot make use of additional nodes.  I am sorry, but if your design only works on a single computer, then your design is not scalable.  It may provide great performance within the theoretical limits of the fastest iron you can buy, but it isn't scalable.  Once you have exhausted the resources you have nowhere to go.

"Scalable" means that there is a near-linear or sub-linear relationship between problem size and cost.
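
To put some toy numbers on that definition: if doubling the problem size roughly doubles the cost, you are in near-linear territory; if the only way to grow is to buy ever bigger single machines, the cost curve runs away from you.  The sketch below uses made-up cost functions purely to illustrate the point.

    def scaling_factor(cost, n):
        """How much the cost grows when the problem size doubles."""
        return cost(2 * n) / cost(n)

    def cluster_cost(n):
        # Near-linear: add nodes roughly in proportion to the problem size.
        return 100 * n

    def big_iron_cost(n):
        # Single machine: bigger iron gets disproportionately expensive
        # (a made-up super-linear cost curve, for illustration only).
        return 100 * n ** 2.5

    for n in (10, 100, 1000):
        print(n, scaling_factor(cluster_cost, n), scaling_factor(big_iron_cost, n))
    # The cluster doubles in cost every time the problem doubles (factor 2.0);
    # the single-machine design grows by a factor of about 5.7 per doubling.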

Of course, the above is subject to practicality.  A solution may be near-linear within certain limits, but if the problem size never pushes you near these limits, what lies beyond is irrelevant in practice.  That being said: ask Microsoft, Google and Amazon what single component they would want to have vastly improved in their data centers, and there's a good chance the response you would get is "better switches".

2010-03-24

Musings on RPC mechanisms and DSLs

I've decided to try out Apache Thrift as an RPC mechanism in a big project I am doing.  Thrift is currently an Apache Incubator project, but was originally developed at Facebook.

You might ask yourself why companies like Google and Facebook develop their own RPC mechanisms when the world is overflowing with different flavors of RPC.  Well, the answer is easy:  they need something that works.  Most RPC flavors do not work for any meaningful definition of "works".  They are either too slow, too complex, too fragile, too cumbersome to use or too limited in which languages they support.  Usually an RPC mechanism fits more than one of these categories.

SOAP is in a special place because it is quite possibly the most ridiculous attempt at solving the problem to date.  It combines the horrible mess that is XML with the sort of wrongineering that is so prevalent among the majority of Java developers.  The sort of people Douglas Adams no doubt had in mind when he wrote about Vogons.  Indeed, if you read through the cruft that is a WSDL file you can't help but think that this is really just thinly disguised Vogon Poetry.

But I digress.

Now, Thrift would not be my first choice.  My first choice would be the RPC mechanism that Google uses internally.  Unfortunately they have not open sourced it yet.  If you have looked at Protocol Buffers you have seen part of the technology that is used there.  I periodically pester Chris DiBona about Google releasing their RPC mechanism as open source, but the problem seems to be that they don't have any takers.  Everyone at Google is terribly busy :-).

(If you work at Google and you want to do something about it, please talk to Chris.  He should be able to help you figure out what you need to do in order to release it as open source).

I chose Thrift because it is used in production at a large site (well, several large sites) and because it has support for a great number of languages.   The latter is important to me.  For an RPC mechanism to be meaningful, it needs to support at least Java, C++, Python, Ruby, PHP and C#.  It would be nice if they could get the Thrift libraries into the official Maven repositories soon too, but from what I gather this is happening Any Minute Now.

(There are open source projects for creating Protocol Buffer-based RPC mechanisms, but none of them gave me the confidence to choose them for an important project.)
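
To give a flavor of what you get for your trouble: once the Thrift compiler has generated bindings from a service definition, calling the service from Python is only a handful of lines.  The transport and protocol classes below are the standard ones from the Thrift Python library; the SearchService module and its search() method are hypothetical stand-ins for whatever your .thrift file defines.

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol

    # Hypothetical module generated by the Thrift compiler from a .thrift file.
    from search_service import SearchService

    # Plain socket transport, buffered, using the binary wire protocol.
    socket = TSocket.TSocket("localhost", 9090)
    transport = TTransport.TBufferedTransport(socket)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)

    client = SearchService.Client(protocol)
    transport.open()
    try:
        # The generated client exposes the service methods as ordinary calls.
        results = client.search("thrift rpc")
        print(results)
    finally:
        transport.close()

The same definition gives you equivalent client and server stubs in the other supported languages, which is the whole point.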

The one thing I don't like about Thrift is that the compiler is written in C++ and depends on Boost.  The fact that the compiler is written in C++ makes it awkward to integrate nicely into multi-platform builds, since you end up needing binaries for each platform.  I'm on a Mac right now, and as we speak I am installing the Thrift compiler on my laptop.  But before I can do that I have to install Boost.

Boost alone is reason enough to become seriously grumpy about Facebook choosing to implement the Thrift compiler in C++.  It is a monstrosity, and just installing it via the ports system takes forever.

I can't help but think they chose the wrong approach here.  It isn't like the Thrift compiler needs to be extremely fast.  It just needs to work, be fairly easy to extend, and run "anywhere".

Which leads me to the next question: what should they have chosen?

For me it would be convenient if it were written in Java.  But that would require firing up a Java VM during builds, which may not be palatable to those who aren't using Java.  A camp I find myself in from time to time.  One should be careful not to foist one's religion upon others.

I think this leaves us with Perl, Python and Ruby.

I am not sure Ruby has achieved enough penetration yet;  that is, I am not sure you can expect Ruby to be installed everywhere.  Also, for those who worry about these things:  Ruby is kinda falling out of fashion.  (You need to be hacking Erlang to be part of the in-crowd nowadays).

Universal availability suggests Perl -- but seriously: perl!?  You can write neat code in Perl, but almost nobody does.  Its bad reputation is somewhat undeserved, but things do tend to get messy when the leading stars of the community cultivate terse unreadability even though you can write perfectly readable code without much loss of performance.  So scratch Perl.

This leaves us with Python.  Mind you, I have never been a big fan of Python.  I am not sure why, but the language just doesn't fit my taste.

Still, I think Python is probably the right language for the job.  It is reasonably clean, it is old enough to probably be semi-universal and it should be doable to implement a compiler that can run anywhere Python runs without modification.  Not only that,  you could run it on the JVM as well using Jython, to ease integration with Java build systems.

Because we are talking about what should be a relatively small compiler.  We are talking about a program that takes text input, does a bit of pondering and then spits out text.  Even for a huge service definition the input would be rather modest and it doesn't matter if it takes 0.1 seconds or 1.0 seconds to perform code generation.  (Yes, I know it is a lot slower for typical web services code generators, but there you have to contend with an order of magnitude more complexity and fairly ratty implementations.  I would not aspire so low as to measure myself against the likes of Apache Axis).
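
To illustrate just how modest the core job is, here is a deliberately tiny sketch of that text-in, text-out shape.  This is not Thrift's grammar -- just a made-up one-struct-per-line format -- but the essential task is the same: parse a definition, emit code for a target language.

    import re

    # Toy IDL, one struct per line, e.g.:  struct User { id:i64, name:string }
    TOY_IDL = """
    struct User { id:i64, name:string }
    struct Query { text:string, max_hits:i32 }
    """

    STRUCT_RE = re.compile(r"struct\s+(\w+)\s*\{\s*(.*?)\s*\}")

    def generate_python(idl_text):
        """Emit a Python class for each struct in the toy IDL."""
        out = []
        for name, body in STRUCT_RE.findall(idl_text):
            fields = [f.split(":")[0].strip() for f in body.split(",")]
            out.append("class %s(object):" % name)
            out.append("    def __init__(self, %s):" % ", ".join(fields))
            for field in fields:
                out.append("        self.%s = %s" % (field, field))
            out.append("")
        return "\n".join(out)

    print(generate_python(TOY_IDL))

A real implementation has to deal with includes, namespaces, nested types and a dozen output languages, but none of that changes the basic shape of the program.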

sloccount puts the current implementation at 21,740 lines of C++, and if the smug people who prattle on about DSLs at conferences aren't completely full of it, it should have been doable to implement the Thrift compiler in Python instead.  In fewer lines (consider that a challenge, Pythonistas of the world! :-)).

I'd be happy to hear YOUR opinion on RPC mechanisms, implementation of portable tools for DSLs etc.

2010-03-23

Library design.

Yesterday I was looking for a library to extract some data from a well-known website.  The website in question has a rich REST-based API that allows you to retrieve and edit your content on the site.  As with all good Web 2.0 services, there is a considerable flora of libraries for different languages to interact with the site.  (I have omitted naming the library in question since this isn't a critique of that library in particular, but rather musings on sloppiness in general.)

However, after surveying the libraries available in the programming language I wanted to use I was somewhat disappointed.  Of the two libraries available, one isn't being maintained anymore and the other...well, the other isn't really a proper library as such.

The point of creating a client library for an API is twofold.

First and foremost it is a mechanism for just making the primitives of the API available to the programmer -- or in a more abstract sense, to map between languages.  The languages being, for instance, the REST API you want to use and the programming language you use to interact with the API.  For the most part this is just grunt-work and often requires very little thinking.  Indeed, for Web Services this is, in principle, just a matter of pointing a WSDL parser at some definitions and having it generate the code you need. (In practice this process is very fragile when it comes to Web Services, but that is a different discussion).

The second task of a client library is to hide ugly implementation detail, and if possible, introduce a convenient model for interacting with the low level API without having implementation details seep out between the cracks unnecessarily.

The library I was faced with went to great lengths to capture the full breadth and depth of the underlying REST API.  Sadly it did not provide a meaningful model for interacting with the service.  Just from inspecting the API documentation for the library it was relatively cumbersome to divine how you would perform a given task.  It would only make sense if you first studied the underlying REST API and then imagined how a lazy programmer would map these concepts into a programming language.

But the worst aspect was the leakiness of the library.  Not only did it force you to bother with the idiosyncrasies of the underlying REST API, but the damn thing was leaking implementation detail as well.  In particular, at every turn there were cascades of exceptions being thrown that would not make any sense to an application programmer. 

For instance,  there is absolutely no point in throwing bare XML exceptions from the XML parser used by the library.  What is the application programmer supposed to do with them?  They convey no meaningful information at this abstraction level and when these exceptions do occur the only code in any real position to make sound decisions on what this means and how this should be handled is the library.

There is another issue of a more fundamental character.  By throwing exceptions generated by the library's dependencies, you force the user of the library to depend on those dependencies as well.  Not a good programming practice.
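
The fix is not complicated: catch the low-level exception at the library boundary and re-raise it as the library's own exception type, so the caller neither sees nor has to depend on the underlying XML machinery.  A minimal sketch, with hypothetical names, of what I would have expected the library to do:

    import xml.etree.ElementTree as ET

    class ClientLibraryError(Exception):
        """The only exception type users of this (hypothetical) library see."""

    def parse_item(xml_text):
        """Parse a service response into a plain dict, hiding the XML details."""
        try:
            root = ET.fromstring(xml_text)
        except ET.ParseError as e:
            # Translate the parser's exception to our own abstraction level.
            raise ClientLibraryError("malformed response from service") from e
        return {child.tag: child.text for child in root}

    # The caller only ever deals with ClientLibraryError:
    try:
        item = parse_item("<item><title>Hello</title></item")  # broken XML
    except ClientLibraryError as e:
        print("could not fetch item:", e)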


If you create a client library, using it should save people work.  Not generate more work.

2010-03-21

Lost opportunities.

As Murdoch and others are trying to put the toothpaste back into the tube and get people to pay for content, I am still somewhat surprised that it is 2010 and content rights holders are still not displaying any appropriate sense of urgency in addressing the real issues that threaten to disrupt and kill their business empires.  Here is a short list of things to consider:
  • Regional licensing is the #1 problem for content owners.  Whenever consumers can legally obtain a particular piece of content in one geographic region but not another, you create demand without providing supply.  This invariably leads to consumer-driven piracy.  The content-consuming world is now largely global, but the content owners still live in the past.
  • The fragmentation of available channels is a variation on the theme in the previous point.  Content owners are being too picky about whom they do deals with, leading to some content being available only from certain vendors.  This is not congruent with consumer expectations.
  • Convenience is more important than price -- but only for as long as the consumer is conditioned to pay for content or to feel bad about pirating it.  Due to DRM and the two previous points there are often fewer hoops to jump through if the consumer pirates content.  The convenience factors to consider are:
    • Immediacy: consumers want immediate access to content.
    • Universal: consumers want to consume the content on the device of their choosing.
    • Reasonable terms: people do not want to be screwed over.  Reasonable pricing, reasonable terms of use.
It is hard to force consumers to do what you want.  The carrot is more powerful than the stick.  Besides, a few carrots are cheaper than truckloads of sticks.

2010-03-19

Photoshop thoughts

When the next Photoshop release comes out, Adobe have the opportunity to really shoot themselves in the foot.  If they decide to make the upgrade path from CS3 unavailable or overly expensive, my bet is that even if the next version has compelling features, more low-budget users will either just stick with CS3 or stop buying legitimate licenses and go pirate.

By now it should be relatively obvious even to Adobe's top brass that the CS3 to CS4 upgrade just was not worth the money to a lot of users.  Given that I have very little faith in Adobe management in general, I think it is entirely within the realm of the possible that they'll dig themselves even deeper into the hole they are in by continuing to display utter ignorance of how the premises for their product have changed over the last decade.

If I were the owner of a potential "Photoshop killer" I would time my pounce on Adobe to coincide with the next Photoshop release.


(As for myself, I have a CS3 license.  I looked at CS4 and I even tried to upgrade at some point, but since I bought CS3 while I was in the US there were untold hassles with the upgrade and Adobe's ultra-slow and ultra-annoying web store.  I gave up, and now I am kinda glad I didn't waste my money on what was essentially a pointless upgrade.  If the next Photoshop version doesn't have really, really compelling new features, I am not upgrading.  CS3 does what it needs to do, and the software is priced so exorbitantly that I cannot possibly justify shelling out for a tool I only use occasionally.)

Doing what I like.

For the last decade I've been in the lucky position of working for some of the more exciting companies on the planet.  I've worked for FAST, Overture, Yahoo and Google -- in that order.  Before that I had my own startup with 3 friends, which got acquired by FAST.

Last year I started working as an independent consultant.  After a decade of full-time positions where a paycheck arrives every month regardless of how inspired I feel or how productive I am, it was a bit scary to be "on my own" again.  Turns out I had no reason to be scared.  There is always plenty of work to do and there are always more interesting projects around than I can shake a stick at.

Over the past months I have worked with a large company to formulate a plan for how they should approach the future.  This work is starting to bear fruit and by the looks of it, I will be working with a lot of exciting people on several exciting projects over the next 12 months.  It looks as if I will get to do what I am good at.  At least part of the time :-)

2010-03-09

Soundtracks for cars.

As new goodies are unveiled in Geneva I get a steady stream of links to videos of cars in my inbox.  The one thing that strikes me about a lot of these videos is how they are not targeted at anyone even remotely interested in cars.  They are touchy-feely, dull marketing wankery of the worst kind.

When Alfa Romeo launched the 8C they released a very nice video with some vintage footage followed by various gorgeous shots of the Alfa Romeo 8C driving along country roads.  There was just one problem.  A big one.  The soundtrack.  Dull muzak.

I mention this example because the one defining characteristic of the 8C is not so much its amazing looks, but the lovely sound it makes.  Anyone who has been in the presence of an 8C, or better yet, driven one, knows that the sound it makes is pure glory.  It growls and howls and the transition from one to the other, and back again, is just wonderful.  Even next to most Ferraris, the 8C has the better engine and exhaust sound.

While the 8C film released by Alfa Romeo a few years ago was probably the most egregious example of automotive promotion incompetence, most manufacturers seem to hire equally clueless directors.  Why not get someone who actually cares about cars?

A couple of years later Alfa Romeo released another film.  This time for the MiTo GTA.  While there was a bit of engine noise at the start of the film, the rest of the thing had some of the tackiest rubbish music I've ever heard.  Why!?