Dave Girouard, general manager for enterprise search at Google, cautions that ESPs “are putting a bigger burden on the user. As long as the results show up in the first page, [users] don’t care what’s behind it. … We have the right relevancy algorithms. So, in terms of [too much] content, we’re saying, ‘Bring it on.’ ”
The Google appliance may save the day for enterprises with broken search technology: Just open up the repositories and rev up the Google engine. But Delphi Group’s Reynolds thinks that “IT should stop investing in generic search tools and start concentrating on their professional domains. At the same time, the business side should be more involved, to ensure that IT commits the resources to develop business-oriented applications of search.”
Andrew McKay, vice president of direct sales at FAST, agrees but adds that vendors “aren’t necessarily fighting over a percentage of the pie. It’s about making the pie dramatically larger,” as information stores expand exponentially.
It’s all in the pipeline
For years, businesses have been fighting to get searches of unstructured data -- information that resides outside enterprise applications and databases -- to achieve the kind of accuracy and precision expected with structured data. According to Delphi Group’s Reynolds, with ESPs, the search-indexing process for unstructured information is evolving into a pipeline of different search algorithms and advanced technologies. These allow for dynamic categorizations or targeted text analytics to take place within the processes that parse documents when they come into the search platform, and within the processes that evaluate queries and return relevant information.
A relatively new addition to the pipeline is entity extraction, in which a search engine uses grammatical analysis to pull terms out of content on the fly as it is indexed. The process identifies proper nouns, builds a list of the people, places, and things named in a document, and then inserts that list into the document as a new level of metadata.
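To make the idea concrete, here is a minimal sketch of entity extraction in Python. It does not reflect any vendor's implementation: it replaces real grammatical analysis with a crude capitalization heuristic, and the names `extract_entities`, `annotate`, and `SENTENCE_STARTERS` are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical stop list: capitalized words that usually just open a sentence.
SENTENCE_STARTERS = {"The", "A", "An", "It", "This", "That", "These", "Those"}

# One or more consecutive capitalized words, e.g. "Delphi Group".
CAPITALIZED_RUN = re.compile(r"\b[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*")


def extract_entities(text):
    """Return candidate proper nouns in order of first appearance.

    Assumption: a run of capitalized words is a proper noun. A production
    engine would rely on grammatical analysis instead of this heuristic.
    """
    entities = []
    for match in CAPITALIZED_RUN.finditer(text):
        words = match.group().split()
        # Drop leading words that are capitalized only because they open a sentence.
        while words and words[0] in SENTENCE_STARTERS:
            words = words[1:]
        candidate = " ".join(words)
        if candidate and candidate not in entities:
            entities.append(candidate)
    return entities


def annotate(document):
    """Return a copy of the document with an 'entities' metadata field added."""
    enriched = dict(document)
    metadata = dict(document.get("metadata", {}))
    metadata["entities"] = extract_entities(document["body"])
    enriched["metadata"] = metadata
    return enriched
```

Calling `annotate({"body": "..."})` returns a new dictionary whose `metadata["entities"]` list holds the extracted names, which mirrors the "new level of metadata" step described above. The heuristic will miss lowercase entities and misread some capitalized ordinary words, which is why real platforms use linguistic analysis.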
As for metadata, the old practice of manually defining a document’s properties is giving way to autotagging, in which an intelligent search platform assigns tags based on users’ “custom logic,” according to FAST’s McKay.
ESPs can discover patterns in the content and enhance the value of that content within the search platform infrastructure by automatically creating metadata elements. Thanks to the rapid spread of XML across search environments, this metadata can then be used for a wide range of application processing, query enhancements, and presentation options.
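As a rough illustration of pattern-driven autotagging with XML output, the following Python sketch applies a few regular-expression rules to a document and emits the matching tags as XML metadata. The rule set, the tag names, the element names (`document`, `metadata`, `tag`), and the functions `autotag` and `metadata_xml` are all assumptions made for this example, not any product's schema or API.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical "custom logic": each tag fires when its pattern appears in the text.
TAG_RULES = {
    "invoice": re.compile(r"\binvoice\b|\bamount due\b", re.IGNORECASE),
    "contract": re.compile(r"\bagreement\b|\bhereinafter\b", re.IGNORECASE),
}


def autotag(text):
    """Return the tags whose patterns match the text, in rule order."""
    return [tag for tag, pattern in TAG_RULES.items() if pattern.search(text)]


def metadata_xml(doc_id, text):
    """Serialize the automatically assigned tags as a small XML fragment."""
    root = ET.Element("document", id=doc_id)
    metadata = ET.SubElement(root, "metadata")
    for tag in autotag(text):
        ET.SubElement(metadata, "tag").text = tag
    return ET.tostring(root, encoding="unicode")


print(metadata_xml("42", "Invoice 17: amount due in 30 days"))
```

Because the result is plain XML, a downstream application could read the tags to filter queries or to drive presentation without knowing how they were assigned.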
Enhanced classification and taxonomy come into play by letting users browse information by subject area rather than relying solely on the blank search field and their ability to construct an effective query. Dynamic classification capabilities can modify the presentation of subject areas based on the query’s context.
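One simple way to picture dynamic classification is as subject counts recomputed for each query, so the browse list reflects only the documents that matched. The Python sketch below assumes documents already carry subject labels. The sample data and the functions `search` and `subject_facets` are hypothetical, and the title-only keyword match stands in for a real relevancy engine.

```python
from collections import Counter

# Hypothetical, pre-classified documents.
DOCUMENTS = [
    {"title": "Q3 sales forecast", "subjects": ["Finance", "Sales"]},
    {"title": "Sales onboarding guide", "subjects": ["Sales", "HR"]},
    {"title": "Benefits overview", "subjects": ["HR"]},
]


def search(query, documents):
    """Return documents whose title contains every query term (case-insensitive)."""
    terms = query.lower().split()
    return [doc for doc in documents
            if all(term in doc["title"].lower() for term in terms)]


def subject_facets(results):
    """Count subject areas across the result set, most frequent first."""
    counts = Counter(subject for doc in results for subject in doc["subjects"])
    return counts.most_common()


results = search("sales", DOCUMENTS)
for subject, count in subject_facets(results):
    print(f"{subject} ({count})")
```

A query for "sales" surfaces Sales, Finance, and HR as browsable subject areas, while a query for "benefits" would surface only HR. That is the sense in which the subject areas on offer depend on the query's context.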