There have been many significant changes to the face of search over the last several years, with engines becoming more intelligent than ever before. Today’s users expect fast, easy, relevant and satisfying search results. The industry has responded by giving users more control over search results than ever, through the emergence of alternative search engines.

One of these so-called alternative search engines goes by the name of Nutch. Nutch is a two-year-old open source project that was previously hosted at SourceForge and backed by a non-profit organization. It has since been determined that the Apache license is the most appropriate, and with the move to Apache, Nutch no longer requires the overhead of an independent non-profit organization. Both the board of directors and the developers were in favor of the move to the Apache Foundation.

Nutch builds on Lucene technology, which was developed under the watchful eye of Doug Cutting, the primary developer of both open source projects. Doug has been working in the field for almost two decades and has spent three years at Apple, four years at Excite and five years at Xerox PARC, so it is safe to say that Doug knows his stuff. Lucene is suitable for nearly any application that requires full-text search, especially cross-platform. It is a full-featured, high-performance text search engine library written entirely in Java. Nutch, by contrast, is an application built on that library to implement web search; you can download it and run it. It adds a crawler and other web-specific components to Lucene, as well as its very own search algorithm and a link analysis module. Nutch aims to search the entire web like Google or Yahoo! but has a few tricks up its sleeve thanks to the beauty of open source licensing.

I recently had the privilege of interviewing Mel Strocen, the CEO of Jayde Online, Inc., one of the Web’s major online publication and search companies. Mel had some very exciting news to report on how Jayde is planning to utilize the Nutch application.

Jayde has been developing a customized version of Nutch for the last eight months and is planning to launch a search engine based on the Nutch technology within the next few weeks. The initial beta version will consist of a network of dedicated servers with an index of between 20 and 30 million website listings.

The real potential of this new search engine, and others using the Nutch technology, lies in the fact that it is open source and uses a “Plug-In Architecture”. This means that the engine can keep evolving and improving to better serve the needs of searchers. One terrific example of just how beneficial this type of open source plug-in technology can be is the Firefox web browser.

Firefox, in its short existence, has eaten up a significant portion of the once almighty Internet Explorer’s market share. The popularity of this browser is due to the fact that it is constantly making itself smarter. You can now find a plug-in for virtually anything that you require, ranging from web developer, downloading and search tools to privacy, security, website integration and humorous plug-ins. You name it, there is an extension for it. The extension library consists of nearly six hundred different plug-ins and is growing daily thanks to the help of contributors everywhere.

Now just imagine applying this type of plug-in technology to a search engine, with one plug-in for, say, searching MP3s and another for downloading PDFs. The possibilities of this new open source search technology are vast. The term “open-source search engine” may make a lot of people’s minds wander towards the idea of Black Hat search engine optimization. The primary developer of Nutch, Doug Cutting, feels that the closed-source advantage is not nearly as much of a factor as one might imagine it to be. The fact that the search engine is open source allows spammers to be detected far faster than they are by the latest spam-detecting algorithms of closed-source search engines. Either way, you know that the spammers will eventually figure out how it works; the only difference is how quickly. So the top anti-spam techniques, closed or open source, are those that continue to function even when their mechanism is known.
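To make the plug-in idea a little more concrete, here is a minimal sketch in Python of a search front end that hands each query to whichever plug-in is registered for a content type. Everything in it is hypothetical: the `PLUGINS` registry, the `register` decorator, the toy `INDEX` and its URLs are invented for illustration. Nutch itself is written in Java and defines its own plug-in interfaces, which this sketch does not attempt to reproduce.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a content type (e.g. "mp3") to its plug-in.
PLUGINS: Dict[str, Callable[[str], List[str]]] = {}

# Toy stand-in for a crawled index; the URLs are made up for illustration.
INDEX = [
    "http://example.com/music/jaguar-blues.mp3",
    "http://example.com/papers/jaguar-habitat.pdf",
    "http://example.com/papers/search-engines.pdf",
]


def register(content_type: str):
    """Return a decorator that registers a function as the plug-in for content_type."""
    def decorator(func: Callable[[str], List[str]]):
        PLUGINS[content_type] = func
        return func
    return decorator


def match(extension: str, query: str) -> List[str]:
    """Return indexed URLs with the given extension that contain the query text."""
    return [url for url in INDEX
            if url.endswith(extension) and query.lower() in url.lower()]


@register("mp3")
def search_mp3(query: str) -> List[str]:
    return match(".mp3", query)


@register("pdf")
def search_pdf(query: str) -> List[str]:
    return match(".pdf", query)


def search(content_type: str, query: str) -> List[str]:
    """Dispatch the query to whichever plug-in handles the content type."""
    if content_type not in PLUGINS:
        raise ValueError(f"no plug-in registered for {content_type!r}")
    return PLUGINS[content_type](query)
```

The point of the design is that the core `search` function never changes: supporting a new kind of content only means registering one more function, which is what lets outside contributors extend the engine without touching its internals.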

Another alternative search technology, recently released in beta, is “Relevancy Rank” from the Claria Corporation, the minds behind Gator. I had the pleasure of conducting an interview with the Vice President and Executive Chief of Marketing, Scott Eagle. He had some very interesting things to say about the launch of this new product and exactly how Relevancy Rank benefits the user. This search technology takes the results from the top search engines and applies its own algorithms to present the user with the most relevant results.

Relevancy Rank combines personalization, localization, time spent at any one site, click-through rates and conversions. These are all taken into account to provide the most relevant results. “For example, if you happened to be a zoologist who loved to search for different animals and information relating to animals and you entered the word ‘Jaguar’, you would be returned far different results from, say, a car enthusiast who searched frequently for different types of vehicles and also typed in the word ‘Jaguar’,” noted Scott. Relevancy Rank helps to provide you with the most relevant results based on your previous search behavior.
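The “Jaguar” example can be illustrated with a small re-ranking sketch in Python. This is not Claria’s algorithm, which is not described here in any detail; it only shows the general idea of blending a user’s interest profile with signals such as click-through and conversion rates. The `Result` fields, the weights and the sample figures are all assumptions made up for the illustration, and localization and time on site are left out to keep it short.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Result:
    url: str
    topic: str                 # hypothetical topic label for the page
    click_through_rate: float  # fraction of searchers who clicked, 0..1
    conversion_rate: float     # fraction of visitors who converted, 0..1


# Assumed weights, chosen only for illustration.
W_PERSONAL = 0.6
W_CLICKS = 0.3
W_CONVERSIONS = 0.1


def score(result: Result, interests: Dict[str, float]) -> float:
    """Blend the user's interest in the result's topic with behavioral signals."""
    personal = interests.get(result.topic, 0.0)
    return (W_PERSONAL * personal
            + W_CLICKS * result.click_through_rate
            + W_CONVERSIONS * result.conversion_rate)


def rerank(results: List[Result], interests: Dict[str, float]) -> List[Result]:
    """Return the results ordered from highest to lowest score for this user."""
    return sorted(results, key=lambda r: score(r, interests), reverse=True)


# Made-up results that an underlying engine might return for "Jaguar".
results = [
    Result("http://example.com/jaguar-cars", "cars", 0.30, 0.05),
    Result("http://example.com/jaguar-cats", "animals", 0.20, 0.02),
]

# Interest profiles inferred from earlier searches (assumed values).
zoologist = {"animals": 0.9, "cars": 0.1}
car_enthusiast = {"cars": 0.9, "animals": 0.1}
```

With these numbers, `rerank(results, zoologist)` puts the animal page first, while `rerank(results, car_enthusiast)` puts the car page first, even though both users typed the same word.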

With end users’ expectations continuing to grow, these twists on the way that results are gathered and displayed are an enormous help in satisfying the user’s hunger to get to the results they are looking for. I am quite anxious to see how these new forms of search technology fare over the next several months. One thing is for sure: these new technologies are set to revolutionize the way that web search is conducted and pave a new path for the evolution of search.

Source by Tyler Houston