"Scalable" is probably the most misused word in the software industry right now.  Or at least the most misused word that actually has a reasonably clear meaning, unlike "cloud", which isn't a useful term at all because people apply it to anything and everything.

Having spent a decade worrying about "scalable", I can tell you that most of the things onto which this label is slapped are decidedly un-scalable.

For instance, the other day I saw someone claim that some general purpose language was "scalable".  What on earth does it mean that a language is scalable?

Sure, you could claim that purely functional languages make implementing concurrent systems easier, but the fact that a language helps you not shoot yourself in the foot when you write concurrent programs in it does not make it inherently "scalable".  And sure, you could limit the discussion to special-purpose languages used to describe distributed computing problems, such as Pig, or to programming models such as MapReduce.  But saying that, for instance, Java, Ruby or Scala is "scalable" makes absolutely no sense.

I've also seen the word "scalable" being used about systems that can only run on one computer -- systems that cannot make use of additional nodes.  I am sorry, but if your design only works on a single computer, then your design is not scalable.  It may provide great performance within the theoretical limits of the fastest iron you can buy, but it isn't scalable.  Once you have exhausted the resources of that one machine, you have nowhere to go.

Scalable is when cost grows near-linearly or sub-linearly with problem size.
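To make that definition a bit more concrete, here is a minimal sketch in Python.  It assumes you have measured cost at a handful of problem sizes and that cost roughly follows a power law, cost ≈ a · size^k, so that k can be estimated as the slope of a straight-line fit in log-log space.  The function names, the 0.1 tolerance and the sample numbers are all made up for illustration; they are not measurements of any real system.

```python
import math


def scaling_exponent(sizes, costs):
    """Estimate k in cost ~ a * size**k.

    This is the least-squares slope of log(cost) against log(size).
    Assumes at least two distinct sizes and strictly positive values.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(c) for c in costs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def looks_scalable(sizes, costs, tolerance=0.1):
    """True if cost grows near-linearly or sub-linearly with size.

    The tolerance is an arbitrary illustrative choice for what
    counts as "near" linear.
    """
    return scaling_exponent(sizes, costs) <= 1.0 + tolerance


# Hypothetical numbers: one design whose cost doubles when the
# problem doubles, and one whose cost quadruples.
sizes = [1_000, 10_000, 100_000, 1_000_000]
linear_costs = [2.0 * n for n in sizes]
quadratic_costs = [0.001 * n * n for n in sizes]

print(looks_scalable(sizes, linear_costs))
print(looks_scalable(sizes, quadratic_costs))
```

With these invented numbers the first design fits an exponent of about 1 and the second about 2.  A fit like this only tells you something about the range you actually measured.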

Of course, this definition is subject to practicality.  A solution may be near-linear within certain limits, but if the problem size never pushes you near these limits, what lies beyond is irrelevant in practice.  That being said: ask Microsoft, Google and Amazon what single component they would want to have vastly improved in their data centers, and there's a good chance the response you would get is "better switches".
