2010-11-25

Microdesign and red flags.

I don't know a good name for the art/science/practice of thinking about and formulating code at a micro-level, so the term I have used informally for the past couple of years to talk about this is "microdesign".

If you go to conferences most presentations on design and architecture are on lofty topics.  Why service buses are a good/bad idea, why this set of patterns is good, why some semantically obese framework is better than other semantically obese frameworks etc.  There is precious little being said about the basics.  The practice of writing good, readable code.  This is, of course, unfortunate.  Because it is more important than people think and because the majority of programmers need to work on their basic programming skills.  Becoming a good programmer starts with attention to detail.

In this posting I thought I'd say something about two "red flags":  things that should make you stop and think about what you are doing if you put them in your code.


Beware of "else" clauses.

The first red flag is the presence of an "else" clause in your conditionals.  I am not saying that you should never use "else" in your code, but when you do, you should stop and think about what you are doing, because most of the time you are about to write code that is unnecessarily complex.  Let's consider the following:

public void someMethod() {
  if (condition1) {
    if (condition2) {
      // do something
    } else {
      // do something when condition1 is 
      // true and condition2 is false
    }
  } else {
    // do something when condition1 is
    // false
  }
}

Now, the above code is not an uncommon sight. Unfortunately, it is not very readable code. You often see this sort of code when people do things like trying to acquire some resources, like opening files. The key insights here are:
  • If both condition1 and condition2 are true, the "normal" program flow is already nested in two blocks.
  • If condition1 does not hold, the program flow continues far down, so to understand what happens if condition1 is false the reader has to skip down, taking care to match up braces visually to identify where program flow continues.
  • As the blocks grow in size or more conditionals are added, some of which may have "else" clauses and some of which may not, the whole thing becomes messier and harder to read.
So let's see how we can improve this:

public void someMethod() {
  if (! condition1) {
    // deal with condition1 being false and return
    return;
  }

  if (! condition2) {
    // deal with condition2 being false and return
    return;
  }

  // Invariant: condition1 and condition2 are true
  // continue normal program flow
}

Some people call the above "guard based" programming -- the idea being that the conditionals act as "guards" that catch the cases where some condition does not hold and normal program flow cannot continue. Negative conditions are detected and dealt with as early as possible.

The main idea is that the method can be read top to bottom and that, for any given line in the method, it is easy to say exactly which invariants are in effect. When you have passed the first if, you know that condition1 must have been true. You also know that if condition1 is not true, the matter has been dealt with and the rest of the code need not concern itself with condition1.

The above method was relatively short, yet it was still hard to read in its original incarnation. Imagine what happens as more conditions are added and the various blocks acquire more logic to deal with both error conditions and normal program flow. Things rapidly grow out of hand and you will find yourself straining to match up curly braces to figure out which else clauses belong to which ifs.

In the 80s it was rather popular to enforce a single point of return in functions or methods. Some misguided programmers still try to enforce this practice. Having a single point of return does not lead to clearer code -- quite often it leads to the opposite, in particular if you need to allocate additional state on the stack to navigate your way to the return statement.
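To make the difference concrete, here is a minimal sketch of the same lookup written both ways. The class, the method names and the messages are made up for illustration; the point is only the extra "result" variable and the nesting that the single-return version needs.

```java
import java.util.Map;

// Hypothetical example: both methods compute the same answer.
class Lookup {

    // Single point of return: a "result" variable has to be carried
    // through every branch just to reach the one return statement.
    static String describeSingleReturn(Map<String, String> users, String id) {
        String result;
        if (id != null) {
            String name = users.get(id);
            if (name != null) {
                result = "user " + name;
            } else {
                result = "unknown id";
            }
        } else {
            result = "missing id";
        }
        return result;
    }

    // Guard style: each negative condition is dealt with and left behind.
    static String describeWithGuards(Map<String, String> users, String id) {
        if (id == null) {
            return "missing id";
        }

        String name = users.get(id);
        if (name == null) {
            return "unknown id";
        }

        // Invariant: id and name are both non-null
        return "user " + name;
    }
}
```

Both versions are short, but only in the second one can you point at a line and say which conditions are known to hold there without tracing braces.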


Boolean return values on non-predicate methods.

Another red flag is boolean return values for non-predicate methods.  A predicate method is a method which simply tells you whether something is true or not and which has no side-effects -- i.e. it does not alter the object.

It is quite common for some programmers to return boolean values from methods that have side-effects.  Sometimes this is perfectly okay, but in some cases this is indicative of the programmer not having thought through the API properly.  Let's say that we have a method that persists an object into some kind of storage:

public boolean addEntry(SomeEntity entity) {
  // assign an id to entity and store it.
}

Now if the above method call succeeds we return true and everyone is happy. But what if the operation fails? We get a false return value and then what? We know it didn't succeed, but we have no way of knowing why. Of course, we can log the error condition, but that effectively disconnects this information from any piece of code using this method; it becomes programmatically complex to determine why the operation failed and whether it is something that we should, or can, deal with in our code.

Merely logging an error does not amount to providing error handling.

This is a very common pattern, for instance when people open files. In code you often see these "semantic bottlenecks" -- where information is discarded and the caller is left with no information if the operation fails. There are examples of APIs in the wild that do this -- for instance there are certain file system interfaces that are unable to report whether an operation failed because of an IO error or because of missing permissions to perform some operation. The root cause is APIs that have been designed to naively discard vital information.

There are several strategies for dealing with the above. You could throw an exception (which opens up a whole can of worms with regard to the checked/unchecked exception debate), or you can have a return type that can communicate error conditions. The latter can be an attractive option if the method plays a role in remote procedure calls or any scenario where you are dealing with asynchronous invocation.
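As an illustration of the second option, here is a minimal sketch of a return type that carries the reason for a failure. Everything in it is an assumption made for the example: the AddStatus codes, the AddResult class and the in-memory EntryStore are hypothetical, and the entity has been simplified to a key and a value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical outcome codes; a real store would define its own.
enum AddStatus { OK, INVALID_KEY, DUPLICATE }

// Carries both the outcome and a human-readable detail to the caller.
final class AddResult {
    final AddStatus status;
    final String detail;

    AddResult(AddStatus status, String detail) {
        this.status = status;
        this.detail = detail;
    }

    // A genuine predicate: no side-effects, just answers a question.
    boolean isOk() {
        return status == AddStatus.OK;
    }
}

// In-memory stand-in for "some kind of storage".
class EntryStore {
    private final Map<String, String> entries = new HashMap<>();

    AddResult addEntry(String key, String value) {
        if (key == null || key.isEmpty()) {
            return new AddResult(AddStatus.INVALID_KEY, "key must be non-empty");
        }

        if (entries.containsKey(key)) {
            return new AddResult(AddStatus.DUPLICATE, "key already present: " + key);
        }

        // Invariant: key is valid and not yet stored
        entries.put(key, value);
        return new AddResult(AddStatus.OK, "");
    }
}
```

The caller can now branch on result.status and decide whether a failure is something it should, or can, deal with, instead of being handed a bare false.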

I am not going to recommend any given strategy for addressing the above problem. The main point I am trying to make is that when you see this in code you need to stop and think about what you are doing. It is a red flag.



Updates:
  • Some people pointed out that guard-based programming doesn't mesh well with resource acquisition and subsequent release -- for instance in C++.   Well, this is what happens when you do not pay attention to your field:  you shouldn't be doing this manually anyway.  If you are doing this manually without making use of RAII-techniques then please don't waste my time commenting.  Update your programming knowledge first and then perhaps we can talk. Ok?
  • People still defending "Single Entry, Single Return" as valid dogma is a bit...odd in 2010.  This stuff was considered a good idea nearly 40 years ago when "structured programming" was hot stuff.  We are now in the post-OOP era and if you are using any language with exceptions, you have multiple returns all over the place anyway and there is not one thing you can do about it.

2010-11-13

Java, MySQL, Oracle and Apple

Oracle and Apple recently announced the OpenJDK project for OSX.  While this looks promising on the surface, it doesn't really change anything: Apple won't be distributing Java as part of OSX in the future.  It will be a separate download from Oracle, so you won't have Java on OSX unless you explicitly decide to install it.

This is fine for people like me.  In fact, it is more than fine because the default install of Java is usually broken on most operating systems.  (On Linux you risk having GCJ or similar garbage pre-installed and then tangled into the broken and inflexible "alternatives" mechanism in Debian or Ubuntu.  To add insult to injury, JAVA_HOME is usually not set.  On any OS. Java is not properly installed until JAVA_HOME is set.  Always and for every user.  Interestingly, on OSX you have a configuration panel to choose your Java version.  It doesn't work properly.  It never has.  So you have to edit symlinks manually.  But I digress).

As with any piece of software from Oracle, this means going to the website, agreeing to a license and then downloading it manually.  This is a pain in the neck.  Some open source projects have unwisely chosen to depend on certain libraries that are only available in this manner -- which means you can't automate dependency management properly:  you'll have to manually download dependencies and then decide how you want to incorporate them into your projects.  This is very "developer hostile".

To summarize: Java is going to be a second class citizen on OSX,  but we may have reason to expect easier access to current versions of Java on OSX.  Good news for developers who write server side Java software -- bad news for developers who want to write desktop applications for OSX users.

Oracle and "premium".

Much noise has been made over the fact that Oracle will be offering premium versions of MySQL and the JVM.  Strictly speaking, this changes nothing in the short term.  What has been free all along will stay free -- modulo some uncertainty over the InnoDB engine.  But in essence:  this doesn't change anything so there is no reason to panic.

Strangely enough, when I pointed out that what Oracle said about a premium JVM was basically the same thing they have been communicating about MySQL some people interpreted this in weird ways.  I was pointing out the similarity -- not that what they were doing was inherently evil.

If Oracle wishes to offer "premium" versions of free software then that doesn't automatically imply that the free versions will disappear.  No reason to panic.

However...

Oracle has been very weak on messaging.  As a company they have no real grasp on how to deal with a developer community and they still have a lot of learning to do.  There are also some stories circulating that make people like me wary of licensing software from Oracle.  Building something with an inherent dependency on Oracle software borders on the irresponsible.

In addition there is the Google lawsuit that casts a shadow over the whole Java community.  Oracle may be able to extort some pennies from Google, but in doing so will sacrifice a good chunk of goodwill and developer trust.

The Java technologies are now fairly toxic IPR-wise.  People feel betrayed.  And rightly so.  Android may be the last big contribution to the Java community. (Let us remind ourselves:  Java and mobile pre-Android was a laughable matter.  JavaME has been around forever and has been a chaotic, fragmented, frustrating, irrelevant, and under-powered affair.  And Android is now the second biggest smartphone platform and overall a major operating system).

Nobody in their right mind would tie themselves to the mast of Larry Ellison's boat when creating a new major software ecosystem (say, for the "web of things" or machine to machine applications).

Mobile computing is the most aggressively growing market, and Ellison is willing to poison the one really big thing Oracle has going for them in this area for some short term gains.  I'd be reluctant to make any bets on technologies owned by a company that isn't in it for the long haul.  And Oracle's short term greed leaves us with the question of whether they even have any credible long term plans.

2010-11-05

The Wrong Stuff

Imagine for a moment that Tim Berners-Lee had been a complete moron.  Imagine he had taken the single most important innovation of the web, namely the URL, and split it into two parts.  One part being the server address

someserver.example.com

and the other part being the path.

/my/file/here.html

Now imagine that the server part would have been part of the browser configuration -- so that the web browser knows about a small but configurable set of web servers.  Web page addresses would simply be paths, which when combined with the browser configuration would form what we today know as a URL.

But of course, unlike the URL it would only exist in the browser after combining the browser configuration and the page address (the path) in question.

Next, imagine that any path given to the browser is blindly probed by the browser until it hits a server that actually has the page you refer to. So for the page /my/page.html you would have a probing sequence of this type:

someserver.example.com/my/page.html
otherserver.example.com/my/page.html
thatserver.example.com/my/page.html

Not the brightest resolution scheme there is, right?  In light of how the web we know and love works, this seems like a pretty stupid idea today.  What if /my/page.html exists on multiple of these servers?  What if there were millions of servers and we would have to wait for the browser to resolve the page by asking about half of them before scoring a hit?
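For the programmers in the audience, the probing scheme above can be sketched in a few lines. This is a toy model, not anybody's actual implementation: the ProbingResolver class is hypothetical and the "servers" are just in-memory sets of paths, probed in the order they were configured.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Toy model of the scheme: the address is only a path, and the
// configured servers are asked one by one until one of them has it.
class ProbingResolver {
    // host -> paths it serves; LinkedHashMap keeps the configured probe order
    private final Map<String, Set<String>> servers = new LinkedHashMap<>();

    void addServer(String host, String... paths) {
        servers.put(host, new HashSet<>(Arrays.asList(paths)));
    }

    // Returns the first host + path combination that exists, if any.
    Optional<String> resolve(String path) {
        for (Map.Entry<String, Set<String>> server : servers.entrySet()) {
            if (server.getValue().contains(path)) {
                // First hit wins, even if other servers hold a
                // different page under the same path.
                return Optional.of(server.getKey() + path);
            }
        }
        return Optional.empty();
    }
}
```

Note what the sketch makes obvious: which page you get for a given path depends entirely on the order of the configuration, and a miss costs one probe per configured server.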

(Don't laugh yet: at the time I set up my first web server there were about 50 well known web servers in the world so this isn't that far fetched.  If you account for the existence of unimaginative people, that is.)

To mitigate some of the pain of downloading large pages you introduce a cache locally on your machine.  A caching layer that doesn't really differentiate between pages that you have stuffed into the cache locally and pages you have downloaded from one of the servers.  Or, in practice, differentiate between which pages were downloaded from which server.

You might think that nobody would be stupid enough to come up with a lame scheme like that.  Especially not in a post-web world.

Well, you can stop laughing. Someone has come up with a scheme like that for a similar problem.  In post-web times. It is called "Apache Maven" and it is used for dealing with software artifacts.   So the next time you come across some developer stating that Maven is a great solution for sorting out software dependencies you have my permission to point and laugh. Even if it is usually socially unacceptable to poke fun at the intellectually challenged.

Maven was well intended, but designed and implemented by people who were The Wrong Stuff.

2010-11-01

On Facebook hiring ex-Googlers

I've read several blog postings the past few days criticizing Facebook for poaching employees from Google.  There's the usual speculation about the motives of the defecting Googlers.  Pure greed for modestly priced pre-IPO stock being the most obvious motive cited.

Some people have also pointed out that it is sort of pointless to quit Google to start working for Facebook when you will find yourself in a company with pretty much the same people.

For Facebook I think the motives are as obvious as they are unexciting: Google has high hiring standards.  The hiring process itself has been the cause of much debate, but it is hard to deny that Google have high standards.  If you hire someone who was deemed good enough to work for Google, chances are that they'll be a relatively safe hire.

As for the ex-Googlers: I doubt that the roles and teams at Facebook will be mirror images of the roles and teams they were in at Google and thus it isn't like they will be doing the same job in the same team just with a different logo over the door.

I think the Google engineering culture has a lot to offer to other companies.  Google did a lot of things right and it is only healthy that good practices and ways of doing things are spread to new companies.  It will do little to line Schmidt's pockets, but it will be good for the industry.  Of course, as in any company, Google got a lot of things wrong as well, but this too is valuable knowledge.

As for myself, I am a bit curious if Facebook will ever come here (Trondheim), now that Google has left town, Microsoft have moved in and Yahoo are still here. Most of the brilliant people I've enjoyed working with for the past decade are still here -- but sadly they have been dispersed across too many companies and activities.

And if Facebook or Amazon are not planning on opening new offices here: it might be time to gather some of the troops to find new and exciting adventures ;-).