2009-10-14

"Discovered"

Andrew Wiles "discovered" the proof for Fermats last theorem much in the same way I "discovered" two matching socks in my sock-drawer this morning. I can assure you that neither undertaking is by any means trivial and the use of the word "discovered" seems to suggest an almost casual coincidence rather than laborious process.

Andrew Wiles did not discover the proof. The guy dug it out of granite with his bare hands by an act of sheer stubborn will.

2009-10-08

Private clouds.

During the past year there has been much excitement in the press about cloud computing and the prospect of being able to buy computing resources as they are needed. For some companies the benefit is very obvious -- in particular in scenarios where there is an intermittent need for resources, or where one needs to be prepared for rapid growth without having to commit to huge capital expenditures up front.

A lot less has been said about "private clouds" -- having an elastic, self-service computing infrastructure in place for use within an organization. Personally, I find this scenario much more exciting, because in many large companies it is a reality that computing resources are scarce for individual teams or parts of the organization while, on the whole, the hardware the company owns is often under-utilized.

Acquisition of hardware is usually an arduous task as well. Not to mention how many large organizations struggle with bureaucratic IT departments that seem to have nothing better to do than to present obstacles (and conversely, annoying users that break the system administrators' neat and tidy systems with outrageous requests).

This is not a new problem. About 15 years ago a friend of mine worked for a large company that bought servers by the truckload and did a lot of number-crunching. Yet the average utilization of CPUs across the company was less than 10%. (Which is quite terrible given that most of the software they ran was CPU-bound.) Meaning they were wasting most of their capacity. His finding was that this was mostly due to the fact that the computers were "owned" by various departments within the company, and those departments would guard their treasured computers jealously -- not allowing anyone outside the department access to them even while they sat idle.

Of course, for a lot of uses, it makes sense to have dedicated servers and it doesn't matter if they are not running at capacity. Some applications require isolation, predictability and a defined level of security.

But in some organizations you have a lot of computing jobs that do not impose very stringent requirements. For instance, if you need to mine a large dataset for knowledge, that is usually not something which is extremely time-critical (as in "when" rather than "how long"). You just want to chew through the data as fast as possible. If you want to perform periodical system tests that require a number of servers, that usually isn't extremely time-critical either. There are also scenarios where you need to store lots of data in a cost-effective yet secure manner, but where random access times are less important than, say, throughput.

Now let's say you have an allowance of X dollars to solve some computation task -- for instance, a recurring data mining task. It is likely that you would finish your mining job much quicker on a large set of computers that you "lease" for a short amount of time than by performing the same amount of work on the much smaller set of dedicated machines that X dollars will buy you.
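
To make that concrete, here is a back-of-envelope sketch in Python. Every number below (the job size, the machine counts) is made up purely for illustration, and the job is assumed to parallelize perfectly, which real jobs rarely do.

    # Back-of-envelope comparison; all figures are hypothetical.
    JOB_MACHINE_HOURS = 1000    # total amount of work in the mining job
    DEDICATED_MACHINES = 4      # what X dollars buys as owned, dedicated hardware
    LEASED_MACHINES = 200       # what X dollars buys as briefly leased capacity from a shared pool

    dedicated_hours = JOB_MACHINE_HOURS / DEDICATED_MACHINES   # 250 hours of wall-clock time
    leased_hours = JOB_MACHINE_HOURS / LEASED_MACHINES         # 5 hours of wall-clock time

    print(f"dedicated: {dedicated_hours:.0f} h, leased: {leased_hours:.0f} h")

Same machine-hours, roughly the same cost, but a very different turnaround time for the person waiting for the results.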

So, from a utilization point of view there are definite benefits to a private cloud.

There are several companies today that maintain their own "private clouds" where various projects/groups/departments can "buy" the resources they need as they need them. Variation in loads on production services, as well as the intermittent nature of ad-hoc jobs, testing, etc., usually means there are spare resources that can be used for running lower priority jobs.

In a previous life I used to work for such a company. Resource utilization was, of course, always paid close attention to, but the variations in demand meant that there were always some spare resources available to me if I needed to do some testing or run ad-hoc jobs. In fact, I was rarely unable to find spare resources to run test instances of the systems I worked on -- some of them quite large. (And when I say "quite large", I mean "quite large".)


Challenges.

The main challenge in adopting a "private cloud" model is that it requires a considerable education effort. Merely purchasing a datacenter full of machines, and possibly virtualizing the hell out of them doth not a cloud make, yet many decision makers are entertaining such thoughts. (Worse yet, there are companies that claim to do cloud computing when in reality they are just serving up the same old gunk only this time on virtualized hardware...)

A lot has happened in certain parts of the computer and Internet industry over the past decade, and we are only beginning to see the technologies employed by these companies become available to a bigger audience. Projects like Hadoop provide open source versions of several technologies originally pioneered by Google -- technologies that make operating at scale possible. A plethora of databases not based on the relational model is emerging. Both commercial and open source virtualization technologies are available.

Still, adoption in the mainstream industry has been somewhat slow. It takes time for architects and developers to understand more modern approaches and it takes even longer for managers and executives to understand the implications and to build confidence.

You are more likely to find these types of technologies in young companies that have no choice but to be innovative, cost-efficient and forward-thinking.

It takes more of an effort to teach old dogs new tricks.

In much of the computer industry, people still think in terms of traditional components: relational databases, ordinary file systems, application servers etc. All neatly deployed on cutely nicknamed, dedicated machines. Additionally, when you look at how companies deploy applications, there is usually a lot of static configuration. It is not uncommon that merely moving a system from one set of machines to another involves weeks of hunting down concrete references to IP addresses, hostnames, ports and so on to get everything up and running again.
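
A minimal sketch of the alternative, in Python, with a purely hypothetical registry endpoint (the URL, service names and record format below are not from any real system): instead of baking addresses into the application, the service asks a registry where its dependencies currently live.

    import json
    import urllib.request

    # The static way -- moving the system means hunting these down and editing them:
    # DATABASE = ("10.1.2.3", 5432)
    # INDEX_SERVER = ("10.1.2.7", 8100)

    # The dynamic way -- ask a registry at runtime. The endpoint is hypothetical.
    REGISTRY_URL = "http://registry.internal/services"

    def lookup(service_name):
        """Return the current (host, port) of a named service from the registry."""
        with urllib.request.urlopen(f"{REGISTRY_URL}/{service_name}") as resp:
            record = json.load(resp)
        return record["host"], record["port"]

    # db_host, db_port = lookup("customer-database")

The same binary can then be dropped onto any node in the data center without anyone editing a single address.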

For a cloud approach to make sense, systems have to be engineered to work in a dynamic environment. One could say that the time has come to "walk the walk" and deliver on the promises made by architects talking about Service Oriented Architecture and similar ideas to which lip service is often paid, but which rarely materialize in practice.

Services need to be far more agnostic of the underlying physical infrastructure -- it shouldn't matter to a service which node it runs on or where its peers run within the same data center. Also, in order to address availability, stability, and performance, services need to be designed in a manner that allows for redundancy as well as incremental scalability.

Incremental scalability is hard. For some components, like relational databases, it is immensely hard (as well as expensive) once you start dealing with enormous amounts of data and/or very high transaction rates.

You usually end up at some point where throwing money at the problem doesn't really help -- and throwing a lot of money at the problem only has marginal benefits. It is not unlike trying to accelerate a ship past its hull speed -- possible, but vastly inefficient, bordering on insane.
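
As an illustration of what incremental scalability can look like, here is a toy sketch of consistent hashing, a technique commonly used by the non-relational systems mentioned above. It is not taken from any particular product, and the node names and key counts are made up; the point is simply that adding a node remaps only a small fraction of the keys instead of reshuffling everything.

    import bisect
    import hashlib

    def _hash(value):
        """Map a string to a large integer on the hash ring."""
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    class HashRing:
        def __init__(self, nodes, replicas=100):
            self.replicas = replicas
            self._ring = []          # sorted list of (hash, node) virtual nodes
            for node in nodes:
                self.add(node)

        def add(self, node):
            # Each node gets several virtual positions to even out the load.
            for i in range(self.replicas):
                bisect.insort(self._ring, (_hash(f"{node}:{i}"), node))

        def node_for(self, key):
            # A key belongs to the first virtual node clockwise from its hash.
            idx = bisect.bisect(self._ring, (_hash(key), chr(0x10FFFF)))
            return self._ring[idx % len(self._ring)][1]

    keys = [f"record-{i}" for i in range(10000)]
    before = HashRing(["node-a", "node-b", "node-c"])
    after = HashRing(["node-a", "node-b", "node-c", "node-d"])
    moved = sum(1 for k in keys if before.node_for(k) != after.node_for(k))
    print(f"{moved / len(keys):.0%} of keys moved when one node was added")  # roughly a quarter

Growing the cluster from three nodes to four moves roughly a quarter of the keys -- the share the new node takes over -- rather than forcing a wholesale redistribution of the data.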


Fortunately, with the emergence of companies like Cloudera and the availability of open source implementations of scalable technologies, there is hope that some of the challenges posed by the knowledge gap can eventually be overcome.

2009-10-07

iPod Touch suggestions

I often listen to audio books on my iPod Touch in bed. This means I sometimes fall asleep while listening and the iPod just keeps on playing. Figuring out where I should start listening again the day after is a bit of a hassle as most audio books are divided into big chunks. Usually I have to do some back and forth scrubbing to figure out what I can remember listening to.

What if a feature were added so that you could tap the display to drop a bookmark in the audio book?

The feature could also be configured so that if you do not do this for some configurable number of minutes, the iPod just stops playing back (so you won't run out of batteries if you fall asleep).

It would be very helpful if you could see the bookmarks along the timeline and if you could have some annotations on them (though there should be an easy way to just add a bookmark without having to mess with annotating). Annotations should automatically be marked with a timestamp saying when you added the bookmark etc.

The iPod does have some support for remembering where you left off playback, but for some reason this feature is flaky. Sometimes the iPod just forgets where you were. I haven't paid enough attention to what causes this, so I can't really say, but it happens frequently.

Oh, and while you are at it, Apple, please make it possible to turn off cover-flow mode in the music player app. It is incredibly annoying to use the iPod when lying down and it keeps going into cover-flow. Yes, I know that you think it looks very cool, but it isn't really usable when you have as much content on your iPod as I have.

2009-10-05

What I like about TV

I've finally found a redeeming feature of television.

Hilde is abroad this week. Meaning I am home alone and thus: I eat alone. Now, I easily get bored while eating. I am in that in-between generation. The generation that gets a headache from the offensive filth that is the MySpace aesthetic, yet grasps what is cool about twitter on some unconscious level.

This means I get terribly bored while tending to my bodily needs.

To alleviate the boredom of eating alone I sometimes switch on The Altar Of Consumerism And All Things Cheap, Shallow And Dull. Or the "television set" as the pudding-munching, planet-wrecking generation would say.

It doesn't really matter what is on. It is all more or less dreadful. Including the weather forecast, which is downright depressing if you live where I live.

While watching TV today it dawned on me what the TV reminds me of. It reminds me of those annoying people you sometimes fail to spot in time at parties so you can take evasive action, hide in the bathroom and avoid having to listen to their inane babblings about whatever they managed to dig out of their navel this morning. The sort of people who pin you down for what feels like a quantum of time best measured in the sort of time units plate-tectonics nerds use.

Which brings us back to the one redeeming feature TVs have: the off-switch. Tedious people unfortunately do not.