August 10, 2007

Google faces more than just a new rival in Wikia

Wikia's open source project will drastically reduce the cost of making a search engine, opening the door to potentially hundreds of new competitors

Google and other search engines face far more than just a new rival in Wikia: they face the prospect of hundreds, even thousands, of new competitors.

The entire search engine project Wikia is working on will be released as open source, drastically reducing the cost for just about anyone to make a search engine, said Gil Penchina, CEO of Wikia. Instead of paying millions of dollars to index the Web and to create the software for a search page, a filter for empty or spam pages, and an algorithm to calculate and rank findings, new search companies will find these items free online thanks to the open source and free software communities.

"In search, it still costs about $5 million to $10 million to build a site," said Penchina during an interview in Taipei. "We want to make it possible for anyone to build a search site for $500. We don't view Google as the competition, we view cost as the competition."

The project, which was started by Wikipedia co-founder Jimmy Wales, consists of four components: indexing the Web, developing a search engine application, creating a ranking algorithm, and using people to help filter sites and rank results.

One of the most expensive components of a search engine is the effort needed to index the Web. Companies have to buy servers and software to crawl the Web, examining every page in order to create a comprehensive list of what's there.

"Your average search startup will spend over $1 million buying servers and collecting data. That's bad for a couple of reasons. One is that everyone spends millions of dollars doing what is essentially the same work, which is like writing an encyclopedia all over again. Well, what if all of that data was available over the GNU Free Documentation License, which is the free content license? So our goal is to make a crawl of the Web publicly available," said Penchina.

The cost of indexing the Web is one of the main hurdles to starting a search engine, and for-profit companies have raised the bar year after year by indexing the Web more and more often. It used to be catalogued once a week, or once a day. Now it's once an hour, or even more often. The high cost of running these crawls has become a competitive weapon.

Wikia believes its crawl of the Web will cost nearly nothing, because it's asking Internet users to help out by downloading Web crawling software from Grub, which will use their computers during idle time to crawl the Web, and send results back to Wikia for the index. So far, a thousand people have downloaded the application, and Penchina is hoping for 100,000 or more. The goal is to post the entire index online, as well as regular updates, so anyone can use them.

Asking the Internet community for help this way is reminiscent of SETI@home, the Search for Extraterrestrial Intelligence (SETI) project that asks users to run a free application that downloads and analyzes radio telescope data and sends the results back to a computer operated by the SETI@home group.


©1994-2010 Infoworld, Inc.