Without an enterprise search engine that can locate the report or its underlying data, a user has few ways of getting the information. This situation leads to undesirable results: the employee either forgoes the search or expends significant effort culling the data from other sources and re-creating the report. The latter duplicates effort and creates the risk that two reports purporting to present the same data show differing figures. Even when users can find reports, the documents often lack the desired data. And because the reports are template-driven, users cannot easily modify them to provide different data.
Regulatory compliance is also driving the need for BI search. Compliance officers need to be able to search through CRM databases and e-mail stores, for example, looking for dangerous phrases such as "We guarantee" or "I shouldn't be telling you… ."
How BI search is different
Search makes it easier to extend access to BI in part because users already know how to use it, thanks to their familiarity with Web-based search engines. With little training, users can be shown how to use additional options, similar to those in the Advanced Search features of the Web engines.
What happens behind the scenes in an enterprise search, however, varies significantly from the operation of Web search engines. Most Web queries today target unstructured data, such as HTML, PowerPoint presentations, and PDF files. Because these resources are document-oriented, the engine can make intelligent decisions about the meaning and the relevance of the data. (Web pages even have specific tags to facilitate this process.)
Structured data, by contrast, does not generally provide this contextual information. Open a database and read a column of figures called "part" and you have very little knowledge of what that number refers to (part number, cost, inventory, location, among other things). As Baya points out, "this problem will eventually be solved by use of metadata; and this is already happening via the support for XML in databases. But with regard to the vast majority of structured data today, there is no easy solution."
BI software solves this problem in part through templates and through data relationships defined by trained analysts. For this reason, many enterprise search engines today, such as Google and X1, hand off searches of structured data to the BI software and then federate (that is, combine) the results with items from their own search index.
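The hand-off-and-federate pattern can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Hit` type, the `search_index` and `search_bi` functions, the sample documents, and the term-overlap scoring are stand-ins invented for illustration, not the API of Google, X1, or any BI product. It also assumes the two sources return scores on a comparable scale, which real systems cannot take for granted.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    score: float  # relevance in [0, 1]; assumed comparable across sources
    source: str   # "index" or "bi"


def search_index(query: str) -> list[Hit]:
    """Hypothetical stand-in for the engine's own index of unstructured documents."""
    docs = {
        "Q3 sales memo": "summary of q3 sales by region",
        "Travel policy": "rules for booking travel",
    }
    terms = query.lower().split()
    hits = []
    for title, text in docs.items():
        words = text.split()
        matched = sum(1 for term in terms if term in words)
        if matched:
            hits.append(Hit(title, matched / len(terms), "index"))
    return hits


def search_bi(query: str) -> list[Hit]:
    """Hypothetical stand-in for the BI software, which knows its reports' subjects."""
    reports = {"Quarterly Sales Report": {"sales", "q3", "revenue"}}
    terms = set(query.lower().split())
    return [
        Hit(name, len(terms & tags) / len(terms), "bi")
        for name, tags in reports.items()
        if terms & tags
    ]


def federated_search(query: str) -> list[Hit]:
    """Send the query to both sources and combine the hits into one ranked list."""
    hits = search_index(query) + search_bi(query)
    return sorted(hits, key=lambda hit: hit.score, reverse=True)


if __name__ == "__main__":
    for hit in federated_search("q3 sales"):
        print(f"{hit.source:5s} {hit.score:.2f} {hit.title}")
```

The design point is that the search engine never interprets the structured data itself; it only merges whatever the BI layer returns with its own document hits.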
Unstructured data has its own challenges. The first is pure volume. As Mark Andrews, program director of IBM's Information Management Strategy, points out, a typical business user will deal with 70 e-mails per work day (counting both those received and those sent). In a company of 25,000 employees, that's nearly half a billion e-mails per year that must be stored (for compliance purposes) and made searchable. Add to this all the other documents (HTML, word processing, spreadsheets, and presentations) and you have a tremendous capacity issue that translates into another challenge. With many searches turning up thousands of results, how do you rank the results for relevance?
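The e-mail volume figure can be checked with a quick calculation. The article gives 70 e-mails per user per work day and 25,000 employees; the figure of 250 work days per year is an assumption added here, since the article does not state one.

```python
def emails_per_year(employees: int, emails_per_day: int, work_days: int = 250) -> int:
    """Yearly e-mail volume; the default of 250 work days is an assumption."""
    return employees * emails_per_day * work_days


if __name__ == "__main__":
    total = emails_per_year(employees=25_000, emails_per_day=70)
    print(f"{total:,} e-mails per year")
```

Under that assumption the total comes to 437.5 million messages per year, which is consistent with "nearly half a billion."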
As Matthew Glotzbach, head of products at Google Enterprise, observes, "Unlike Web searches, you don't typically have spamming sites that are trying to fool your algorithms, but you also don't have a large set of usage data to guide you." Google doesn't reveal its algorithms, but it does try to establish the "authoritativeness" of specific entries.