Anyone who has been transfixed by a gymnast or a figure skater knows that the magic happens when they perform flawlessly and
yet make it seem easy. That’s how a search should work: Enter a query, and the right results appear in simple, elegant fashion
-- even if it took countless hours of preparation to make the magic possible.
Yet most enterprise users still stumble as they try to extract data from multiple repositories, each with its own search engine.
Enterprises seem awash in a rising tide of structured and unstructured data. And even though users are often forced to tag
documents manually across various content management systems in hopes that those documents will be easier to retrieve, searches
still yield a surfeit of irrelevant, time-wasting results.
ESPs (enterprise search platforms) are on a mission to change all that. These new, comprehensive bundles of search and integration
technologies unlock information tucked away in data stores across the enterprise. The goal of ESPs is deceptively simple:
to take fairly basic queries and return the most relevant results possible, all in one place. But under the hood, ESPs aggregate
a host of emerging technologies such as autocategorization, entity extraction, and NLP (natural language processing). With
an ESP as a foundation, businesses can build customized search applications while automating the process of preparing documents
for archiving and indexing.
“The building blocks are converging so that you don’t have to cobble together all the pieces yourself,” observes Susan Feldman,
vice president of content technology research at IDC. These advanced search platforms establish sophisticated gateways to
silos of information -- even those with their own search engines. ESPs also provide a common set of data and search logic
that can be tuned on an application-by-application basis to improve the relevance of search results.
IBM last month came out swinging with its DB2 Information Integrator, code-named Masala, which contains an advanced search
engine designed to complement the company’s other heavy hitters in the content management arena, DB2 Content Manager and WebFountain.
With Masala, IBM joins the ranks of Autonomy, Convera, EasyAsk, Endeca, Fast Search & Transfer (FAST), iPhrase, and Verity,
each of which offers search-application platforms with a different mix of features.
Breaking down the walls
ESPs are transforming the way the enterprise conducts a federated search, the process by which a single query is passed to
multiple search engines and the user is presented with aggregated results. A federated search can augment searches of similar
data stores but loses traction when it runs up against external databases that require specific syntax.
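To make the idea concrete, here is a minimal sketch of a basic federated search in Python. It is an illustration under stated assumptions, not any vendor's implementation: the three in-memory "engines" and the names `ENGINES`, `search_engine`, and `federated_search` are hypothetical stand-ins for real repositories and their search engines.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "engines"; each list stands in for one
# repository with its own search engine.
ENGINES = {
    "cms": ["Nuclear facility safety report", "Quarterly budget memo"],
    "email": ["Re: nuclear facility inspection", "Lunch on Friday?"],
    "office": ["Uranium enrichment overview.doc"],
}


def search_engine(name, query):
    """Return (engine, document) pairs whose text contains the query."""
    q = query.lower()
    return [(name, doc) for doc in ENGINES[name] if q in doc.lower()]


def federated_search(query):
    """Pass one query to every engine in parallel and aggregate the hits."""
    with ThreadPoolExecutor() as pool:
        # map() preserves the order of ENGINES, so results are grouped by engine.
        per_engine = list(pool.map(lambda name: search_engine(name, query), ENGINES))
    return [hit for hits in per_engine for hit in hits]


if __name__ == "__main__":
    for engine, doc in federated_search("nuclear"):
        print(f"{engine}: {doc}")
```

Note what the sketch leaves out: it simply concatenates whatever each engine returns, with no relevance ranking across engines and no per-engine query syntax.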
Basic federated search, which has been in existence for years, “doesn’t protect the user from another kind of infoglut --
getting irrelevant results from multiple search engines instead of just one,” observes Hadley Reynolds, vice president and
director of research at Delphi Group. “Without some additional sense-making, it’s a blunt instrument.”
Compounding matters, enterprises typically have multiple search engines embedded in various applications -- for instance,
one in a content management system, one in the Microsoft Office environment, and another in an e-mail program. The ESP transcends
these search-engine silos and corresponding data repositories and applies syntax translation and other linguistic processing,
such as spell-check and phrase detection, to the query before searching the data stores.
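The query-side steps named above (spell-check, phrase detection, and syntax translation) can be sketched in a few lines of Python. This is a toy pipeline under loud assumptions: the vocabulary, the phrase list, and the two target syntaxes ("boolean" and "plus") are invented for illustration and do not describe any particular ESP.

```python
import difflib

# Hypothetical vocabulary and phrase list; a real platform would derive
# these from its indexes and dictionaries.
VOCABULARY = ["nuclear", "facility", "uranium", "enrichment", "report"]
KNOWN_PHRASES = {("nuclear", "facility"), ("uranium", "enrichment")}


def spell_correct(term):
    """Replace a term with its closest vocabulary word, if one is close enough."""
    term = term.lower()
    matches = difflib.get_close_matches(term, VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else term


def detect_phrases(terms):
    """Merge adjacent terms that form a known two-word phrase."""
    units, i = [], 0
    while i < len(terms):
        if i + 1 < len(terms) and (terms[i], terms[i + 1]) in KNOWN_PHRASES:
            units.append(f"{terms[i]} {terms[i + 1]}")
            i += 2
        else:
            units.append(terms[i])
            i += 1
    return units


def translate(units, syntax):
    """Render query units in one of two made-up engine syntaxes."""
    quoted = [f'"{u}"' if " " in u else u for u in units]
    if syntax == "boolean":
        return " AND ".join(quoted)
    if syntax == "plus":
        return " ".join("+" + q for q in quoted)
    raise ValueError(f"unknown syntax: {syntax}")


def prepare_query(raw, syntax):
    """Spell-correct, detect phrases, then translate for the target engine."""
    terms = [spell_correct(t) for t in raw.split()]
    return translate(detect_phrases(terms), syntax)
```

Under these assumptions, `prepare_query("nuclaer facility report", "boolean")` yields `"nuclear facility" AND report`, while the same input with `"plus"` yields `+"nuclear facility" +report`: one user query, rewritten once per back-end engine.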
At the indexing layer, the ESP aids the user by returning lists of improved query choices based on the context of the original,
sometimes vague, query. Take FAST’s ESP, which powers the public-facing Scirus.com. If you type the word “nuclear” in an effort
to retrieve published science-journal entries related to that topic, the keyword will return more than 700,000 results. A refined
keyword search selected from the list of suggestions on the right-hand side of the page -- “nuclear facility” -- whittles
that to approximately 1,000. Click once more, on “uranium enrichment,” and you’re down to about 10.
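The narrowing effect is easy to reproduce in miniature. The Python sketch below assumes a toy index in which each document is reduced to the set of phrases extracted from it. The data and the names `DOCS`, `search`, and `suggest` are hypothetical, and this is not a description of how FAST or Scirus works internally.

```python
from collections import Counter

# Hypothetical toy index: each document is the set of phrases extracted from it.
DOCS = [
    {"nuclear", "nuclear facility", "uranium enrichment"},
    {"nuclear", "nuclear facility"},
    {"nuclear", "nuclear medicine"},
    {"nuclear"},
    {"solar"},
]


def search(selected):
    """Return the documents that contain every selected phrase."""
    wanted = set(selected)
    return [doc for doc in DOCS if wanted <= doc]


def suggest(selected, limit=5):
    """Rank phrases that co-occur with the current results as refinements."""
    counts = Counter(
        phrase
        for doc in search(selected)
        for phrase in doc
        if phrase not in selected
    )
    # Most frequent first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


if __name__ == "__main__":
    query = ["nuclear"]
    print(len(search(query)), suggest(query))
    query.append("nuclear facility")
    print(len(search(query)), suggest(query))
    query.append("uranium enrichment")
    print(len(search(query)))
```

Each click adds a phrase to the query, so the result set can only shrink: in this toy data it goes from four documents to two to one, echoing the 700,000-to-10 progression at a much smaller scale.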