If you are trying to figure out what a program is doing, chances are that you would like to look at the log files or the log output emitted on the console.  Depending on your luck this may give you the answers you seek ... or hair-related problems (discoloration, loss of etc).  Most programmers add logging code to their programs in some form -- from just printing messages on the console to adding and configuring logging libraries and logging messages that way.

Unfortunately, logging is by a lot of programmers, considered a sort of by-product of the development process.  Log messages are often added to aid the programmer while developing and then unceremoniously forgotten or ignored.  Decent logging often isn't an explicit requirement in many projects and thus programmers allow themselves to be sloppy about it.

Why is it important?

Being responsible for running software that was authored by other people you quickly learn how important log messages can be.  Especially if you do not have access to the source code.  The logs are the only real means by which a piece of software can communicate what it is trying to accomplish, and possibly, how it is failing (except for error messages and traces displayed as part of the UI).  Not knowing what a failing piece of software is doing leads to guesswork, and being forced to guess what is going on is rarely a good way to solve problems.  Knowing is preferable.

What to log?

The first rule of logging is to ask yourself if the information you are logging is needed.  Some programs that have some logging facility are inordinately chatty.  While logging too much is usually preferable to logging too little, the main problem with logging too much is that you drown the user in irrelevant information.

For every log message you add you should ask yourself if the log message is stricly needed.  Does it tell the user anything important about the state, or state transitions, of the program?   Is the program trying to access some external resource that it would be useful to know about?
 Most things that are part of the main program flow usually do not mandate logging.  If you are importing a file of records into a database, you do not need to log every record you are adding.  Though it might be a good idea to say which file you are reading and what database you are importing the records into.  Depending on how the log messages are consumed (appended to file, displayed on console etc) you may want to provide the user with some progress feedback for long-running tasks.  Possibly a summary after the task has completed with some pertinent information, such as the number of records added.

Of course, any exceptional conditions that occur need to be logged.

Debug output can be nice to have, but if you include messages that only make sense to the developer and which are largely superflous to normal operation, you should ensure that these messages are logged at a lower log level so they can be filtered out.


If you are taking the trouble of emitting a log message, you might as well do it properly.  Be sure to include all the critical information needed to understand what is going on.  If you get a parse error in a file, include the name of the file and possibly the line number or offset into the file.  If you are trying to access an external resource, like a web server, include the URL or the hostname etc.

If you need to track the lifecycle of some object, then try your best to be consistent and track at least creation and death of the objects you track so that it is possible to analyze the logs.  I once worked on a project where we needed to figure out why the system was losing a lot of data and nobody really knew where the problem was.  Since the system had incomplete logging just analyzing the logs didn't work.  Sometimes the developer logged when a resource it was handling was introduced and sometimes not.  The number of conditions where a resource could disappear was fairly huge, and few of them were logged in any meaningful way.   Unfortunately, the system didn't have just one problem, so we spent months trying to figure out what was going wrong.


Human readable log formats are mostly a good thing, but that doesn't mean that you should design log messages that only make sense for humans.  When designing a log format you are, in effect, designing a protocol and all the considerations you take when designing a protocol also apply to log messages.  You should take care to design log messages so that they lend themselves to easy-to-implement parsers.  If you have little experience with writing parsers then I suggest you sit down and write a parser for the log output of your program right now.

The most important part of a log format is to get the "framing" right -- that is:  it should be trivial to determine the boundaries between log messages.  Using linefeed to signal the end of a log message is a trivial way of framing log messages (if you have ensured that log messages will never contain an un-escaped linefeed).  Expecting the user to parse XML output is not.  XML parsers are fragile, slow, clumsy and not much fun to work with.  Especially not if you have terabytes of log messages to chew through.  Also ensure that the contents of log messages do not confuse the parser -- for instance that it doesn't get de-railed if you embed a log message within a log message.

Many log formatters burn CPU cycles on pointless formatting jobs.  A good example is formatting dates.  If you produce terabytes of log messages and most of the log lines are consumed by other programs rather than humans there is no point in formatting the date in a human readable format only to burn even more CPU parsing those dates.  Also, chances are you are not smart enough to pick a consistent time-zone for those dates and timestamps.

If you really need human-readable formats then provide a simple log filter program that you can pipe log messages through so that people can display log messages in whatever way they prefer.

If you use a line-oriented format you should also take care to choose the order of fields properly.  This can help immensely when working with log files using the UNIX command line tools.  In a company I used to work for the first field of any log message was the log level.  This meant that merging and sorting log files could not be done with trivial use of the sort program because it would jumble the log lines out of time sequence.  A thoughtless design decision that cost lots of time working around it -- quite possibly motivated by the original authors faulty understanding of how one might want to work with the log output (I suspect he formatted the log messages for screen use only).

A bad example.

A piece of software worth studying is the Maven build system.  This widely used piece of software pretty much breaks all the rules in the book and is a good example of horribly sloppy logging design.  It is overly chatty, which means that the information you really need is buried in heaps and heaps of unimportant information.  The formatting is very sloppy -- being designed only with a human reader in mind.  This means that writing a parser for the Maven output is a relatively daunting task -- most people wouldn't even bother trying.

It also illustrates what happens when you are not disciplined from the outset:  people take cues from those who went before them and emulate what they do.  So as new plugins are added to Maven, the new plugins tend to be just as bad.

No comments:

Post a Comment