2010-11-25

Microdesign and red flags.

I don't know a good name for the art/science/practice of thinking about and formulating code at a micro-level so a term I have used informally the past couple of years to talk about this is "microdesign".

If you go to conferences most presentations on design and architecture are on lofty topics.  Why service buses are a good/bad idea, why this set of patterns is good, why some semantically obese framework is better than other semantically obese frameworks etc.  There is precious little being said about the basics.  The practice of writing good, readable code.   This is, of course, unfortunate.  Because it is more important than people think and because the majority of programmers need to work on their basic programming skills.  Becoming a good programmer starts with attention to detail

In this posting I thought I'd say something about two "red flags":  things that should make you stop and think about what you are doing if you put them in your code.


Beware of "else" clauses.

The first red flag is the presence of an "else" clause in your conditionals.  I am not saying that you should never use "else" in your code, but when you do, you should stop and think about what you are doing, because most of the time you are about to write code that is unnecessarily complex.  Let's consider the following

public void someMethod() {
  if (condition1) {
    if (condition2) {
      // do something
    } else {
      // do something when condition1 is 
      // true and condition2 is false
    }
  } else {
    // do something when condition1 is
    // false
  }
}

Now, the above code is not an uncommon sight. Unfortunately, it is not very readable code. You often see this sort of code when people do things like trying to acquire some resources, like opening files. They key insights here are:
  • If both condition1 and condition2 are true, the "normal" program flow is already nested in two blocks.
  • If condition1 does not hold, the program flow continues far down, so to understand what happens if condition1 is false the reader has to skip down, taking care to match up braces visually to identify where program flow continues.
  • As the blocks grow in size or more conditionals are added, some of which may have "else" clauses and some of which may not, the whole thing becomes messier and harder to read.
So let's see how we can improve this:

public void someMethod() {
  if (! condition1) {
    // deal with condition1 being false and return
    return;
  }

  if (! condition2) {
    // deal with condition2 being false and return
    return;
  }

  // Invariant: condition1 and condition2 are true
  // continue normal program flow
}

Some people call the above "guard based" programming -- the idea being that the conditionals are "guards" if some condition does not hold and normal program flow cannot continue. Negative conditions are detected and dealt with as early as possible.

The main idea is that the method can be read top to bottom and it is easy for a given line number in the method to say exactly which invariants are in effect. When you have passed the first if, you know that condition1 must have been true. You also know that if condition1 is not true, the matter has been dealt with and the rest of the code need not concern itself with condition1.

The above method was relatively short, yet it was still hard to read in its original incarnation. Imagine what happens as more conditions are added and the various blocks acquire more logic to deal with both error conditions and normal program flow. Things rapidly grow out of hand and you will find yourself straining to match up curly braces to figure out which else clauses belong to which ifs.

In the 80s it was rather popular to enforce a single point of return in functions or methods. Some misguided programmers still try to enforce this practice. Having a single point of return does not lead to clearer code -- quite often it leads to the opposite. In particular if you need to allocate additional state on the stack to navigate your way to the return statement.


Boolean return values on non-predicate methods.

Another red flag is boolean return values for non-predicate methods.  A predicate method is a method which simply tells you whether something is true or not and which has no side-effects -- ie. it does not alter the object.

It is quite common for some programmers to return boolean values from methods that have side-effects.  Sometimes this is perfectly okay, but in some cases this is indicative of the programmer not having thought through the API properly.  Let's say that we have a method that persists an object into some kind of storage:

public boolean addEntry(SomeEntity entity) {
  // assign an id to entity and store it.
}

Now if the above method call succeeds we return true and everyone is happy. But what if the operation fails? We get a false return value and then what? We know it didn't succeed, but we have no way of knowing why. Of course, we can log the error condition, but that effectively disconnects this information from any piece of code using this method; it becomes programatically complex to determine why the operation failed and whether it is something that we should, or can, deal with in our code.

Merely logging an error does not amount to providing error handling.

This is a very common pattern for, for instance, when people open files. In code you often see these "semantic bottlenecks" -- where information is discarded and the caller is left with no information if the operation fails. There are examples of APIs in the wild that do this -- for instance there are certain file system interfaces that are unable to report whether an operation failed because of an IO error or because of missing permissions to perform some operation. The root cause being APIs that have been designed to naively discard vital information.

There are several strategies for dealing with the above. You could throw an exception (which opens up a whole can of worms with regard to the checked/unchecked exception debate), or you can have a return type that can communicate error conditions. The latter can be an attractive option if the method plays a role in remote procedure calls or any scenario where you are dealing with asynchronous invocation.

I am not going to recommend any given strategy for addressing the above problem. The main point I am trying to make is that when you see this in code you need to stop and think about what you are doing. It is a red flag.



Updates:
  • Some people pointed out that guard-based programming doesn't mesh well with resource acquisition and subsequent release -- for instance in C++.   Well, this is what happens when you do not pay attention to your field:  you shouldn't be doing this manually anyway.  If you are doing this manually without making use of RAII-techniques then please don't waste my time commenting.  Update your programming knowledge first and then perhaps we can talk. Ok?
  • People still defending "Single Entry, Single Return" as valid dogma is a bit...odd in 2010.  This stuff was considered a good idea nearly 40 years ago when "structured programming" was hot stuff.  We are now in the post-OOP era and if you are using any language with exceptions, you have multiple returns all over the place anyway and there is not one thing you can do about it.

3 comments:

  1. Mhhh... what about single point of exit?

    Something like:
    someMethod() {
    var result;
    if(! condtion1) {
    } else if (! condition2) {
    } else {
    //Both true
    }
    return ....;
    }

    ReplyDelete
  2. juancn: I don't think that piece of code was very readable. I had to read it twice to be sure that I understood it correctly and even then I wasn't sure if I should read it a third time, just to make sure.

    ReplyDelete
  3. I completely agree. Safeguards are the only sane way to go about it. With safeguards, it's also much easier to refactor the code and split out parts of it into separate methods. It also works much better when throwing exceptions instead of just returning from the method.

    ReplyDelete