Exclusive: IBM enters enterprise search fray
WebSphere Information Integrator finds almost everything -- but slowly
Instead, OmniFind weights its searches with data such as how often a keyword appears in the page, whether it appears in the title or subheads, and how often it appears in anchor text. OmniFind also uses a dynamic mechanism that tracks how often previous searches on a specific keyword have resulted in clicks to a particular page. So, as more searches are performed, the quality of the ranking improves significantly.
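To make the ranking signals concrete, here is a minimal sketch of a scorer that combines them. It is not IBM's formula: the `Page` fields, the `WEIGHTS` values, and the way click feedback is folded in are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    title: str
    subheads: List[str]
    body: str
    anchor_texts: List[str]  # text of links that point at this page


# Hypothetical weights; the real product's values are not public.
WEIGHTS = {"body": 1.0, "title": 5.0, "subhead": 3.0, "anchor": 2.0, "click": 4.0}


def count(keyword: str, text: str) -> int:
    """Count whole-word, case-insensitive occurrences of keyword in text."""
    return text.lower().split().count(keyword.lower())


def score(page: Page, keyword: str, clicks: int, searches: int) -> float:
    """Combine static keyword signals with a click-through signal.

    clicks   -- past searches on this keyword that ended in a click to this page
    searches -- total past searches on this keyword
    """
    static = (
        WEIGHTS["body"] * count(keyword, page.body)
        + WEIGHTS["title"] * count(keyword, page.title)
        + WEIGHTS["subhead"] * sum(count(keyword, s) for s in page.subheads)
        + WEIGHTS["anchor"] * sum(count(keyword, a) for a in page.anchor_texts)
    )
    click_rate = clicks / searches if searches else 0.0
    return static + WEIGHTS["click"] * click_rate
```

With a scheme like this, two pages with identical text start out tied, and the one users actually click pulls ahead as search history accumulates.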
Users have limited access to the ranking mechanism: They can specify links that must show up first for a given keyword, but they can't do much more to tweak rankings. This could prove a limitation for companies that have considerable material for a given keyword and want to make specific documents more salient. Administrators can also schedule reindexing for times when systems will be least affected.
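The one ranking override described here, forcing chosen links to the top for a keyword, can be sketched as a post-processing step on the ranked list. The function name `apply_pins` and the `pins` dictionary are hypothetical; OmniFind's actual administration interface is not shown in this review.

```python
from typing import Dict, List


def apply_pins(keyword: str, ranked_urls: List[str], pins: Dict[str, List[str]]) -> List[str]:
    """Return ranked_urls with any links pinned to this keyword moved to the front.

    pins maps a lowercase keyword to the links that must appear first,
    in the order the administrator listed them.
    """
    pinned = pins.get(keyword.lower(), [])
    # Keep the engine's own ordering for everything that is not pinned.
    rest = [url for url in ranked_urls if url not in pinned]
    return list(pinned) + rest
```

Note what this cannot do: it reorders the head of the list, but it offers no way to nudge an unpinned document up or down, which is the limitation noted above.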
The results display offers a broad range of ways to select search items. A keyword search is the base level, but a user can also ask for specific records or data items using an SQL-like query language. If the results derive from a database, they are shown in complete field detail.
OmniFind's security is currently coarse-grained. The display mechanism checks authorization levels before displaying data to make sure an employee is entitled to see a given result. Unfortunately, OmniFind lacks document-level security. Moreover, it has no way to query an LDAP directory to retrieve an employee's credentials automatically, although this feature is forthcoming.
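A display-time authorization check of this coarse-grained kind might look like the sketch below. It assumes, purely for illustration, that access is decided per collection by a numeric level; the `Result` type, the `collection` field, and the level scheme are not taken from the product.

```python
from typing import Dict, List, NamedTuple


class Result(NamedTuple):
    url: str
    collection: str  # the data source the document was crawled from


def visible_results(
    results: List[Result],
    user_level: int,
    required_level: Dict[str, int],
) -> List[Result]:
    """Filter search hits just before display.

    A hit is shown only if the user's level meets the level required by
    the hit's collection. Collections with no configured level are hidden.
    """
    deny = float("inf")
    return [r for r in results if user_level >= required_level.get(r.collection, deny)]
```

Because the decision is made per collection, every document in a collection is either visible or hidden together, which is why the absence of document-level security matters.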
OmniFind is an impressive tool in terms of the sheer volume of data that it can federate. It is clearly designed for enterprise use and scales to handle huge amounts of data.
I was surprised, however, by some limitations. For example, the crawler doesn't open .zip or .tar files. Help files, which frequently contain a wealth of searchable information, are also skipped.
Performance was hard to assess. IBM claims a minimum of 30 dps (documents per second) for crawling and the same rate for indexing, with bursts of 100 dps. In my experience, these numbers were optimistic: Indexing is gated by disk I/O and, in the demo I saw, it was not near 30 dps. Because all the documents were local, the test system did not simulate a true crawl, so crawl performance was even more difficult to ascertain.
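To put the claimed rates in perspective, a quick calculation shows how long a pass over a corpus would take at each one. The one-million-document corpus is a hypothetical figure, not one from IBM or from the test system.

```python
def hours_to_process(documents: int, docs_per_second: float) -> float:
    """Hours needed to crawl or index a corpus at a steady rate."""
    return documents / docs_per_second / 3600


CORPUS = 1_000_000  # hypothetical corpus size

for rate in (30, 100):
    print(f"{rate} dps: {hours_to_process(CORPUS, rate):.1f} hours")
# prints:
# 30 dps: 9.3 hours
# 100 dps: 2.8 hours
```

At the claimed minimum rate, a corpus of that size fits in an overnight window; any sustained shortfall below 30 dps stretches the job proportionally.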
OmniFind is, for the most part, an easy-to-run, configurable, scalable, and intelligent enterprise search engine. However, the lack of document-level security, the absence of LDAP support, and the ignored file types suggest OmniFind's first release needs some tweaks.