2014-07-08

Ignorance and uncertainty.

Years ago I was involved in building a web-scale search engine.  At the time, few people really understood what such an undertaking involved.  Information retrieval had existed for many years and was a mature field, but almost nobody had any practical experience applying it at a massive scale.

And back in those days, 100 million documents was massive.  Computers were dog slow, we didn't have a lot of them, and there wasn't much open source software to help you do things at scale.  Not like today, when searching 100 million documents on a device that fits in your pocket is within practical reach.

What was hard back then is easy now.  Well, easier, at least.

But when I think back on those times it isn't really the technology that interests me.  By today's standards what we did was pretty crude, but it did represent some of the best work that had been done up to that point.  Ever.  Anywhere in the world.  This is increasingly rare today: the opportunity to work on something very few people have figured out how to do really well.

What interests me are two things.

The first is: what it takes to do something much better than anyone else.  I think the biggest advantage we had was that we had no idea what we were doing.  Which meant that we were not anchored by whatever other people did.  We truly had to explore and invent ourselves.

If you spend too much time trying to solve a given problem using whatever approach other people are following, you will be held back by the same limitations that they are up against.  Worse yet, you will be tempted not to spend enough time trying to understand the problem you are solving.  Understanding the problem or problems you need to solve is always where you should start, which is why you should always try to spend a few weeks thinking about a given problem yourself.

For the same reason, I avoid paying too much attention to how the people who came before me approached a problem.  I might inform myself of their general approach, but I try to develop my own before devoting time to understanding theirs.  This doesn't always yield results, but if inspiration or deep understanding does strike, I will have an advantage.  Once I feel that I am starting to understand the problem and I know how I want to attack it, I start to peek at what other people have done.

This is applicable to a surprising breadth of fields.

The other thing that interests me is the continuum from ignorance and uncertainty to cocksure certainty.  While we were building a web-scale search engine there was the engineering effort on one side and the salespeople on the other.

On the engineering side the answer to the basic question "can we do it?" wasn't a given, whether that meant solving some problem of a deeply theoretical nature, turning ideas into a working implementation, or just delivering a product within an acceptable set of parameters (time, cost, quality, etc.).  The sales side, as seen from the engineering side, seemed to take for granted that we had an infinite supply of magic hats from which we could pull rabbits at the last moment.  Theirs was a world that was about promises and at least the pretense of certainty.  Ours was one of sleep deprivation and raw panic.

I used to say that the salespeople sold our customers technology that we hadn't yet imagined we didn't know how to realize.

I liked to compare much of what we were doing to when Grumman built the Apollo Lunar Module. Nobody had built a lunar lander before.  Which meant that there was no known right way to do it.  Nor was there any meaningful way to estimate how long it would take or how much it would cost -- or indeed if it was at all possible.

In particular I got a faceful of this when we built one of our very first indices and ran searches against it.  The results were atrocious: so full of duplicates that for some searches you would get pages and pages of identical results.  This was a Wednesday.  I was asked to look into the problem.  The initial guess was that we should have a workable solution within a week.  On Sunday I had to call my boss and say "listen, the duplication problem is much harder than we had feared -- it'll take a bit longer".  It turns out there are several classes of duplication on the web, and sorting them out is properly hard.

Note that it wasn't just about being able to do it -- it was about being able to do it in a practical manner.  Most published efforts up to that point assumed small document sets and lots of time in which to do the deduplication.  At our scale, a "solution" with quadratic (or worse) complexity simply doesn't work.

We eventually came up with a battery of solutions.  All of them much faster than the feared quadratic solutions.  Many of them completely novel.
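
To make the complexity point concrete: comparing every document against every other one means roughly n(n-1)/2 comparisons, which for 100 million documents is about 5 × 10^15.  The Python sketch below is an illustration only, not a reconstruction of any of the solutions mentioned above.  It handles just the simplest class of duplication, pages whose text is identical after lowercasing and collapsing whitespace, and the names `normalize` and `group_exact_duplicates` are made up for this example.  The idea is that hashing each document once and grouping by hash takes a single pass instead of a pass over all pairs.

```python
import hashlib
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed, simplistic notion of 'same content')."""
    return " ".join(text.lower().split())


def group_exact_duplicates(docs: dict[str, str]) -> list[list[str]]:
    """Group document ids by a hash of their normalized text.

    Each document is hashed exactly once, so the work grows with the
    number of documents rather than with the number of document pairs.
    """
    buckets: dict[str, list[str]] = defaultdict(list)
    for doc_id, text in docs.items():
        digest = hashlib.sha1(normalize(text).encode("utf-8")).hexdigest()
        buckets[digest].append(doc_id)
    # Keep only the groups that actually contain more than one document.
    return [ids for ids in buckets.values() if len(ids) > 1]


# Tiny made-up corpus: "a" and "b" differ only in case and spacing.
docs = {
    "a": "Hello   World",
    "b": "hello world",
    "c": "Something else entirely",
}
duplicate_groups = group_exact_duplicates(docs)
```

This catches only exact matches after normalization.  Pages that are merely similar (the same article wrapped in different navigation, say) hash to different values, which is one reason the other classes of duplication are so much harder.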

However, in retrospect, the interesting bit is the dynamic where people who like to deal with certainty and hard promises need to deal with uncertainty and ignorance.  For most complex technology projects, delays, disappointments and setbacks are inevitable because there are huge unknowns.  And more so for the projects that are really worth doing.

On one hand, hiding uncertainty behind a wall is dangerous because it makes our approach to hard problems fragile.  On the other hand, the pressure it creates can be valuable in spurring on discovery and progress -- by assuming "you will be able to figure this out".
