February 2005 Archives

Steve Mallett

In the last couple of years one of the greatest software engineering projects has surfaced and become a household name. Google. One thing powers its greatness. Software.

The open source world prides itself rightly on its incredible successes. Apache Server, Linux, all email software worth mentioning, and recently Firefox. These are, have been, and will forever be marvelous feats.

What real technological competition have they been up against? Firefox vs. the long-abandoned IE. Apache against IIS. Linux vs. Windows Server (Unix is technologically great, but cut its own throat before it could succeed en masse). Frankly, these successes have depended more on getting the word out that they exist, disarming FUD, and the willingness of people to try something new.

Google. Its technological greatness is revered by all. Others like Yahoo are chasing it, but at best they’ll do nothing more than chase it. They have no real advantage over Google.

A search site takes a lot more than just bitchin’ software. There are a lot of costs. Bandwidth for crawling is the biggie, and then there’s serving results, hardware, and people.

Enter Nutch. Nutch is an open source search engine: crawler, indexer, and so on. The project appears to have been a bit dormant since its first media splash a few years ago, but it has just recently entered incubation at the Apache Software Foundation.

As I write this I have Nutch crawling a few sites just to test it out on my own. It’s the fifth of my tests. I’m increasing the search depth, and playing with a few of its knobs & buttons. The first few tests worked, but weren’t terribly compelling. Not that the Nutch site doesn’t give you the straight goods upfront. Their site says, “Nutch has not yet been tuned for quality. There are ten or twenty knobs that we can twiddle to adjust the ranking formula. We are developing software to do this tuning automatically, but the current code just contains guesses. With a little tuning we should be able to get results that are competitive with those of major search engines.”

Attract some more developers and I bet this happens sooner rather than later.

I think a commercial search engine based on Nutch could be a huge deal. Such an operation requires a ton of money for equipment and bandwidth, so it would have to pay its own bills. However, the open source software component would give such an operation a scrappy little advantage. If open source can take on a truly great competitor, the operation would have the distinct advantage of better results without the personnel overhead that Google, Yahoo!, and their ilk carry. The new search site would want to hire key people so they don’t have to worry about paying the rent and feeding the kids, but that still leaves a lot more talent available for less.

I think, and I’m really only guessing, that Nutch hasn’t prospered to where I would like it to be because of the costs of running the operation. To truly test the system you need a big index. You need to spend a lot of money crawling. To test it against Google anyway. How big? Well, the Internet Archive hosts “some work” of Nutch’s. They seem to have more bandwidth than the average bear.

Back to the main point… given the resources, could an open source software-based search engine beat a great proprietary competitor? There’s only one way to find out, and what counts most of all is real results.

Steve Mallett

Related link: http://osdir.com/Article3992.phtml

It’s not every day the retired CTO of the World Bank wants to write an article for you. It’s not every day that someone challenges your thinking like he does, either.

At first I had no idea who was really at the other end of this article. I ask for article submissions all the time. I also read a lot of people who come up with their own financial reasons why businesses should adopt Linux on their corporate desktops.

While reading through this lengthy and well thought out article I was almost instantly averse to it. I sensed that it was right, but something was nagging me: “Who does this guy think he is to make these assertions?” Assertions like: it makes more sense financially to convert employees from MS Office to OpenOffice and stay on Windows than to change to Linux entirely. Or that Linux advocates stupidly include Linux-on-the-desktop scenarios when they should be focusing on Linux in the server room. Or that we are deluding ourselves as much as Microsoft’s Get the Facts campaign deludes the “facts”.

Hey! That’s not what I want to hear. That certainly is a hard truth.

Personally, I’ve never concerned myself with adding up the numbers. They’re self-evident, aren’t they? No licensing fees, no per-seat nonsense, upgrade at will, work on the cutting edge (or as cutting as you need).

Still, these arguments were sound. Damn, I hate bad news. CNN doesn’t attract ratings by saying, hey, maybe we don’t know what we’re talking about and neither do our viewers. One paragraph, from the yet-to-be-published Part II, resonated with me: “Here (OSDir.com) people believe in Open Source in general, and Linux in particular. But a regimen of only agreeable points of view, while comfortable, may not be the best fodder for growth and improvement.”

Ok, I was hooked. But I had to find out who was really writing to me. All I knew was that his name was W. McDonald Buck. I wrote to him, “This first thing, I think, people will ask is how qualified are you to answer these questions or present this analysis. Do you feel qualified to answer this? How so?” In other words, are you ready to be attacked for being heretical? I also pointed out some arguments that I felt were contrary to what he wrote. He simply wrote back, soundly detailing how those arguments weren’t quite accurate. That was enough for me.

Later in the article, in the as-yet-unpublished Part III, he mentions that he’s the ex-CTO of a large, multi-national corporation. I was editing this piece having already given the green light, so I wrote to him again asking who the mystery organization was. After all, the ex-CTO of Enron isn’t exactly in our corner, is he? He wrote back: the World Bank.

As I wrote back to him, I sat up a lot straighter in my seat when I read that. It’s one thing to say you’re a Linux advocate, but now it’s pretty obvious that he writes some things that are hard to hear but are, in the end, for our own good.

So, if you’re a Linux advocate, or a CTO, you should tune into what “Dee” has to say. The news isn’t all bad. It’s just a good adjustment of what you’re used to hearing… or advising.