The public will get its first chance Monday to test a search engine from start-up Powerset that eschews conventional keyword technology and is instead designed to understand the meaning of Web pages.
In principle, Powerset's approach could fundamentally change what people expect from search by offering a smarter, more efficient experience.
However, Powerset's beta version, while delivering impressive results, is limited in scope and index size, which leaves open the question of whether it can work its magic at the massive scale of Google's keyword-based search engine.
"We're changing the way information is searched by doing a much deeper analysis of the pages we index," said Scott Prevost, Powerset's product director.
Keyword engines treat pages as bags of words, indexing their content without grasping its meaning, he said. Powerset's engine, by contrast, uses technology developed in-house as well as licensed from Xerox's PARC subsidiary to build a semantic representation of each page by parsing every sentence and extracting its meaning. "Meaning is what we index," he said.
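To make the "bag of words" idea concrete, here is a minimal Python sketch of a toy keyword index. It is an illustration only, not how Powerset, Google or any other engine is actually built; the function names and sample pages are hypothetical. Because each page is reduced to word counts, word order is thrown away, so two sentences with opposite meanings look identical to the index.

```python
from collections import Counter, defaultdict


def bag_of_words(text: str) -> Counter:
    """Reduce text to word counts; word order (and thus who did what) is discarded."""
    words = (w.strip(".,!?;:\"'").lower() for w in text.split())
    return Counter(w for w in words if w)


def build_inverted_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page IDs containing it (a basic keyword index)."""
    index: dict[str, set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for word in bag_of_words(text):
            index[word].add(page_id)
    return dict(index)


def keyword_search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return pages containing every query word (boolean AND), ignoring meaning."""
    terms = list(bag_of_words(query))
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample pages with opposite meanings but the same words.
pages = {
    "a": "The dog bit the man.",
    "b": "The man bit the dog.",
}
index = build_inverted_index(pages)

print(bag_of_words(pages["a"]) == bag_of_words(pages["b"]))  # True
print(sorted(keyword_search(index, "dog bit man")))  # ['a', 'b']
```

A semantic index of the kind Prevost describes would instead need to record something like who bit whom for each sentence, which is exactly the information the word counts above lose.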
In an interview in October with IDG News Service, Marissa Mayer, Google's vice president of Search Products & User Experience, acknowledged that the company's search engine should -- and will -- overcome its keyword dependence in time.
"People should be able to ask questions and we should understand their meaning, or they should be able to talk about things at a conceptual level. We see a lot of concept-based questions -- not about what words will appear on the page but more like 'what is this about?' A lot of people will turn to things like the semantic Web as a possible answer to that," she said.
But she added that Google's search engine acts smart thanks to the enormous amount of data it crunches. "With a lot of data, you ultimately see things that seem intelligent even though they're done through brute force," she said. As an example, she cited the query "GM," which the engine interprets as "General Motors"; if the query is "GM foods," however, it delivers results for "genetically modified foods." "Because we're processing so much data, we have a lot of context around things like acronyms. Suddenly, the search engine seems smart, like it achieved that semantic understanding, but it hasn't really," she said.
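The "GM" example can be illustrated with a toy Python sketch of context-based acronym expansion. This is emphatically not Google's method, which the article does not describe; the counts below are invented for illustration, and the names `PRIOR`, `COOCCURRENCE` and `expand_acronym` are hypothetical. The point is Mayer's: a purely statistical rule that picks whichever expansion co-occurs most often with the surrounding words can look "smart" without any real understanding.

```python
# Hypothetical counts, invented for illustration: how often each expansion of
# "gm" appears on its own (PRIOR) and next to particular words (COOCCURRENCE).
PRIOR = {"general motors": 900, "genetically modified": 150}
COOCCURRENCE = {
    "general motors": {"cars": 120, "stock": 80, "foods": 2},
    "genetically modified": {"foods": 400, "crops": 300, "cars": 1},
}


def score(expansion: str, context: list[str]) -> int:
    """Use the prior when the acronym stands alone, else sum co-occurrence counts."""
    if not context:
        return PRIOR[expansion]
    return sum(COOCCURRENCE[expansion].get(word, 0) for word in context)


def expand_acronym(query: str, acronym: str = "gm") -> str:
    """Replace the acronym with its highest-scoring expansion given the other words."""
    words = query.lower().split()
    if acronym not in words:
        return query.lower()
    context = [w for w in words if w != acronym]
    best = max(PRIOR, key=lambda expansion: score(expansion, context))
    return " ".join(best if w == acronym else w for w in words)


print(expand_acronym("GM"))        # general motors
print(expand_acronym("GM foods"))  # genetically modified foods
```

Nothing here parses meaning; the "right" answer falls out of the counts, and with more data the counts simply get better.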
For now, Powerset's index is very limited, consisting only of millions of pages from Wikipedia and Metaweb Technologies' Freebase, a Web-based structured database of information. Prevost vows, however, that the index will begin growing within a month of launch and will eventually rival those of Google, Yahoo and others in size. "Our technology fully scales," he said.
