Search takes smarts

High-level computer science drives enterprise search 

Google has spoiled us. We type in a few keywords and a screenful of Web site links magically appears, ripe for the clicking.

Of course, the Google engine searches a homogeneous environment, consisting primarily of Web sites and documents, which expose their information through standard interfaces. Not to belittle everyone’s favorite search engine — which has done more to make the Net accessible than anything since Marc Andreessen created the Mosaic browser — but enterprise-based search apps face a more daunting challenge.

For one thing, ESPs (enterprise search platforms) like those described in "Refining enterprise search" must deal with a bewildering array of nonstandardized sources, from unstructured documents to legacy databases. They also need to address questions of access, identity, and security, because not all members of the organization should have equal entrée to information. And all this has to take place inside the firewall, adding yet another layer of complexity.

Indeed, under the hood, enterprise-level search technology brings together some of today’s most advanced, cross-disciplinary computer science. “The technical nitty-gritty of search is remarkable,” explains Associate Editor Richard Gincel, who wrote our cover story. “All these separate disciplines come together in the indexing layer, which contains phenomenal algorithms, permissions, rules, and mechanisms for optimizing relevant results,” he notes.
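To make the indexing layer a little more concrete, here is a minimal sketch in Python of an index that stores permissions alongside its content and filters results by who is asking. It is an illustration only, not how any particular ESP works: the `PermissionAwareIndex` class, the group names, and the document IDs are all hypothetical, and real products add relevance ranking, linguistic analysis, and far richer security models.

```python
from collections import defaultdict


class PermissionAwareIndex:
    """Toy inverted index that keeps an access-control list per document.

    Hypothetical example: names and structure are assumptions made for
    illustration, not the design of any real enterprise search platform.
    """

    def __init__(self):
        self.postings = defaultdict(set)  # term -> IDs of documents containing it
        self.allowed_groups = {}          # document ID -> groups permitted to read it

    def add(self, doc_id, text, groups):
        """Index a document and record which groups may see it."""
        self.allowed_groups[doc_id] = set(groups)
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query, user_groups):
        """Return IDs of documents that match every term and are visible to the user."""
        terms = query.lower().split()
        if not terms:
            return []
        # A document must contain every query term.
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        # Drop anything the user's groups are not entitled to see.
        groups = set(user_groups)
        visible = [d for d in matches if self.allowed_groups[d] & groups]
        return sorted(visible)


index = PermissionAwareIndex()
index.add("hr-001", "salary review schedule", groups={"hr"})
index.add("eng-007", "release schedule for search appliance", groups={"engineering", "hr"})
index.add("pub-100", "holiday schedule", groups={"everyone"})

print(index.search("schedule", user_groups={"engineering", "everyone"}))
# prints ['eng-007', 'pub-100']
```

The same query returns different results for different users, which is the point: the security check lives inside the search path rather than being bolted on afterward.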

To make all this search business work, the typical ESP draws heavily upon advances in computational linguistics and employs processes such as “entity extraction,” which helps to create metadata and make associations across indexed content. Then there are natural language processing, autocategorization, and a host of security and permissions technologies, as well as domain expertise for customizing ESPs to specific vertical industries and lines of business. Customization represents the future of enterprise search: “There’s a crusade among many search pioneers to banish the generic search model,” Gincel says. “The consensus is that knowledge-driven and search-driven apps should be tuned to the professional domain.” By his account, that future is not all that far away.

As regular readers of InfoWorld know, we believe that technology discussions should not be simply theoretical. So to see where the rubber meets the road, we set Test Center Contributing Editor Mike Heck loose on leading search appliances from Google (yes, it does enterprise search, too) and Thunderstone (see "Google and Thunderstone deliver plug and search to the enterprise"). After putting both devices through their paces, Heck came away impressed by their capabilities and hard-pressed to declare a winner. It seems all that under-the-hood computer science must be paying off.

Copyright © 2004 IDG Communications, Inc.
