2011-05-07

Failed library design

In the past week I've spent a considerable amount of time being angry and frustrated with what I view as inept API and library design.

The point of having libraries is to make a given problem domain more accessible, by creating meaningful abstractions that insulate the programmer from some of the underlying complexity, to help the programmer produce correct code and to reduce the workload on the programmer.  

It is important to note that the latter is more about time and effort than the number of lines of code necessary.  It isn't about the amount of typing that has to be done.  This also extends to the documentation and whether it is necessary to spend days reading up on something just to use parts of a library.  If you have to spend a considerable amount of time to understand the design and implementation of a library just to use it in a simple context, the library designer has failed.

Failed library design is counterproductive and uncool.

In the past couple of weeks I have tried to make use of various SAML-related libraries and I have to say that I am thoroughly unimpressed.  In fact, I am angry.  I haven't seen such massive amounts of inept library design and sloppy programming since the last time I had to deal with ... well, certain XML technologies.

It is for good reason I become wary when people tell me they have written infrastructure code that deals with XML because it would seem that this particular niche in programming attracts people who simply are no good at what they do.  Before hiring people with extensive XML-experience on their CV I need to see two things:  I need to see code they have written which they think represents their best effort, and I need to talk to them to understand if they are prone to pointlessly complex library design.

If what I see is along the lines of what you can often find in certain XML libraries, then they can't expect to be hired.  I'm sorry, but I firmly believe that the job of a programmer is to shun complexity and create simplicity wherever possible and I have no desire to work with people who are either too lazy or too dumb to at least aspire to usefulness.

An example.

So let's have a look at just a single line of code (admittedly with a line break inserted by me to make it fit into the blog) from the sort of library I despise.  This is the recommended, idiomatic way you are supposed to create an Assertion.  Or rather, create a builder that can eventually be used to create an assertion:
SAMLObjectBuilder<Assertion> builder
= (SAMLObjectBuilder) builderFactory.getBuilder(Assertion.DEFAULT_ELEMENT_NAME);
The above code tells you a lot about what's in store for you if you should be so foolish to waste time trying to use the OpenSAML library.

So, the programmer wishes to create an Assertion.

In order to do this he first has to get hold of a builderFactory (not shown in the above code).  This in itself is sloppy.  Why do we need a builder factory?  Under what circumstances would an ordinary user need to even be aware of there existing multiple sets of builders for any given type?  This is plumbing sticking out of the walls for no good reason.

Next the programmer has to get a builder for the assertion.  The first thing I find interesting is that the designer fails to abstract away the underlying complexity.  The programmer has to know the element name of the Assertion.  Why?  Isn't the point of having a library to help the user deal with the problem domain in a more abstract manner?   The fact that the Assertion class defines a handy constant for this does not excuse this.  This is a level of detail that the user should not have to worry about.  Besides, you will note that the constant holding the DEFAULT_ELEMENT_NAME has "default" in its name so it isn't like the designer didn't have an intuitive sense that there was an expected default.  A good designer would have thought about this and concluded that if the default element name is what will be used in 99% of the circumstances, then the programmer should not have to type this in.  Ever.

The second interesting thing is this whole SAMLObjectBuilder affair.   To get one you have to hand-crank the creation process with casting and the assigning it to a genericized SAMLObjectBuilder type.  Why?   Would it not have been better to just have an Assertion.Builder type that you simply instantiate and then use?  Why would you first want to have a factory, and then design it so badly you have to cast the results it produces?  In fact there is a AssertionBuilder that extends an AbstractSAMLObjectBuilder that extends an AbstractXMLObjectBuilder which is defined god knows where,  but instantiating it directly isn't the intended way to do it, and if you start to mess with those types then it is going to take you a while, and a few screenfuls of windows with various Java source files, to make any sense of the whole giant mess. There's a lot of structure for structure's sake.

Also, why would you call something a builder when it is in fact a factory?  While some literature may be overly vague about what a builder is, good practice is to use builders as an instantiation helper that ensures the produced object is in the desired state.   Often producing an immutable object -- sometimes to insulate the programmer from ardous initialization.  Assertion has both getters and setters and the builder, near as I can tell, is just used for object creation with the inane buildObject() method.  In other words, the whole builder thing is just another bit of plumbing sticking out of the walls.

If you want a good example of builders you can have a look at the API of code generated by Google Protobuffer compiler.  Here's a typical example of what object construction looks when you use protobuffers:
Timber.LogEvent logEvent = Timber.LogEvent.newBuilder()
.setTimestamp(rec.getMillis())
.setLevel(rec.getLevel().intValue())
.setHost(hostName)
.setServiceName(serviceName)
.setTid(rec.getThreadID())

.setType("T")
.addPayload(
  Timber.Payload.newBuilder()
  .setContentType("text/plain")
  .setPayload(ByteString.copyFromUtf8(rec.getMessage())))
.build();
In the above code snippet we create a log event builder, set some values, add a nested payload and then construct the needed objects. Understanding how to use this is dead simple.  All generated types have a newBuilder() that produces a correct builder without any silly fluff.  All builders have setters for their values and adders for nested collections.  Setters are chainable.  The build() method validates and produces a finished instance.  Done.  All this takes you about 5 minutes to learn and is so simple you won't have to read the source or the javadoc again for months or years.

It should be noted that we have still only investigated a SINGLE line of idiomatic OpenSAML use, and already we have trodden in a miserable swamp of shoddy design and had to tour the source for answers (which is like peeling an onion:  layer upon layer with little or no substance in any one layer and a lot of crying) -- even before actually looking at how you would accomplish anything of use with this library.

Believe me, it gets a lot worse when you do.

You may ask why I am singling out OpenSAML for criticism.  Well, why not?  I had to pick an example and it might as well be OpenSAML because I have been trying to use it lately -- and have come to the conclusion that it is actually easier, faster and safer to just read the SAML specs and write things from scratch.  It is a pain in the ass, but considerably less risky than trusting heaps of code too messy to read.  Granted, the SAML specs are a dull read and I still have to deal with heaps of terribly designed XML infrastructure, but it beats wasting even more time on an unhelpful library that fails to hide the details anyway.

Besides, it will mean that I do not force those who come after me to take on the design debt of OpenSAML as well.

I am also using OpenSAML to make a point.  It must be okay to point out bad design.  It may not be nice, but then again, wasting my time is also not nice. If we avoid naming and shaming bad design, people will not learn.

Now, if you have worked on OpenSAML I would imagine that you are feeling defensive right now.  Perhaps a bit angry.  You may think that I am unfair because you just did what everyone else did when designing and writing the code. (The idioms found in OpenSAML sure are plentiful in other XML code, though that doesn't make them "good").

If you feel like responding (please don't.  I am not interested in hearing a defense for OpenSAML.  There really is no point in even trying to defend it): before you respond, I think you should sit down and ask yourself if you are really sure that OpenSAML is helpful to anyone but those who are prepared to a) spend a lot of time understanding the library, or b) are utterly comfortable with cutting and pasting code they barely understand to put something together that apparently works.

Then you should ask yourself if what OpenSAML does is really so complicated that the complex, unfriendly, messy API is warranted. (I can give you a hint:  it isn't).

If you think that people should have to spend a lot of time understanding OpenSAML you are wrong and you should probably never write library code ever again.

No comments:

Post a Comment