June 04, 2008

Jimmy Wales unimpressed with Powerset's Wikipedia search

Wikipedia founder says he's not yet sold on semantic search technology in general

As startups and established players strive to develop Google-killer search technology, conceptual differences in their approaches make for interesting discussion.

Case in point: Jimmy Wales' Wikia Search and rival Powerset, which is using Wikipedia, the free online encyclopedia project Wales founded, to argue that its semantic engine represents the future of search.

During an interview this week, Wales was asked for his opinion about Powerset, and he declared himself unimpressed, at least so far.

For starters, finding content on Wikipedia is quite easy, he said. "I find that search at Wikipedia works perfectly fine." Plus, indexing Wikipedia content doesn't pose a major challenge, Wales said.

When Powerset unveiled a test version of its much-awaited semantic search engine last month, Wikipedia played a major role in the marketing push as one of only two Web sites included in the index.

While acknowledging that the scope of its index was extremely limited, Powerset executives said that the engine's ability to -- in their view -- improve Wikipedia search reflects what it will do later for Web search in general.

Wales isn't convinced.

"It's really hard to judge right now [the quality of Powerset's engine] because searching Wikipedia is a pretty easy thing to do. It doesn't present much of a challenge. Wikipedia isn't a very large data set, and it's a pretty simple thing to do, to index Wikipedia," Wales said. "So whether their approach is going to be useful on a bigger data set [is hard to tell]."

It will be interesting to see how Powerset's technology evolves, said Wales, but he added that he isn't sold on semantic search technology in general. "I haven't been very persuaded so far by what I've seen about the semantic approach. At least so far, I'm not that interested in it," he said.

According to Powerset, its users do find value in its Wikipedia search. Because Powerset can "understand" the pages it indexes, it can do more than return the proverbial 10 blue links for search results. For example, it can assemble a collection of facts related to the query and summarize the information it finds. It can also provide direct answers to factual questions.

"Our early users tell us that Powerset’s automatic extraction and aggregation of key facts about topics is extremely useful, since that information is often strewn across many different pages in Wikipedia," said Scott Prevost, Powerset’s general manager and director of product, via e-mail.

"We've also received a lot of positive feedback about our automatically derived summaries of each Wikipedia page, which helps users to scan a page’s content and easily navigate to relevant parts of the text," he added.

Prevost also defended the promise of semantic search, which is designed to extract meaning from the Web pages it crawls rather than focus on keywords, the approach taken by all major search engines, including Google's.

©1994-2009 Infoworld, Inc.