In 2000, computer researchers at Carnegie Mellon University started a project to fundamentally shift how search results are organized. The idea behind the approach, called clustering, was to find meaningful connections among Internet search results to speed and improve research.
Over the next five years the resulting company, Vivísimo, commercialized the original clustering engine and extended its offerings to include meta (federated) search and its own search engine. Bundled as Velocity, this trio of technologies should have wide appeal. Academics, scientists, government analysts, market researchers, online publishers, and product managers in any industry stand to benefit from Velocity 4.2, because they all must search through and make sense of large, diverse data sources.
Velocity specifically includes three components: Vivísimo Clustering Engine, which automatically categorizes search results on the fly into meaningful hierarchical folders (it overlays any search or database query engine); Vivísimo Content Integrator, for simultaneously querying multiple content sources -- such as search engines and databases -- in one step and combining the retrieved information; and the Vivísimo Search Engine.
Enterprises will typically start with clustering and metasearch because most already have some type of search engine in place. I tested all three components, however, on an Intel-based server running Red Hat Enterprise Linux 3.0.
Velocity is an especially deep product, as reflected in the number of options available from the UI. But the UI may confuse first-time users. For example, some menus are several layers deep and not always labeled intuitively. Vivísimo developers are working with usability experts to address this shortcoming.
Still, in the more important performance areas, Velocity delivers. To evaluate clustering I connected Velocity to an existing Verity Ultraseek search engine. The process involves completing two Web forms, one that describes the XML output from the search engine and the second that indicates how to parse the results. Although this does require knowledge of your original search implementation, I had clustering running in approximately 30 minutes. Vivísimo has done an excellent job organizing results into clusters by intelligently using words and phrases contained in the original searches.
Although Velocity didn’t have a specific setup for Ultraseek, there are clustering templates for other common enterprise search engines, including the Google Search Appliance; these prepopulated forms should save administrators time and reduce setup errors.
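To make the two-step setup more concrete (describe the engine's XML output, then say how to parse it), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the XML layout, the `parse_results` and `cluster_by_term` helpers, and the shared-word grouping rule are hypothetical and are not Vivísimo's clustering algorithm or Ultraseek's actual output format.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical XML standing in for a search engine's result output.
SAMPLE_XML = """
<results>
  <result><title>Expense report policy</title><url>http://intranet.example/a</url></result>
  <result><title>Travel expense form</title><url>http://intranet.example/b</url></result>
  <result><title>Travel booking guide</title><url>http://intranet.example/c</url></result>
</results>
"""

STOPWORDS = {"the", "a", "of", "and", "for"}


def parse_results(xml_text):
    """Step 1: pull title and URL out of each <result> element."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "title": r.findtext("title", default=""),
            "url": r.findtext("url", default=""),
        }
        for r in root.iter("result")
    ]


def cluster_by_term(results, min_size=2):
    """Step 2: group results that share a title word (a toy stand-in for clustering)."""
    groups = defaultdict(list)
    for item in results:
        # Use a set so a word repeated in one title is counted once.
        for word in set(item["title"].lower().split()):
            if word not in STOPWORDS:
                groups[word].append(item["url"])
    # Keep only terms shared by at least min_size results.
    return {term: urls for term, urls in groups.items() if len(urls) >= min_size}


clusters = cluster_by_term(parse_results(SAMPLE_XML))
for term, urls in sorted(clusters.items()):
    print(term, urls)
```

With this sample data, the result that mentions both "travel" and "expense" lands in two folders, which is the kind of overlap a clustering overlay has to handle.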
Configuring Vivísimo’s search engine required little effort. I easily created a source by defining the starting URL of an intranet Web site. Then I selected a few other options, such as the maximum link depth. Again, clustered results were very precise; no fine-tuning was required.
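The two crawl settings mentioned above, a starting URL and a maximum link depth, can be illustrated with a short sketch. This is not Velocity's crawler: the `LINKS` graph, the `crawl` function, and the `max_depth` parameter name are hypothetical, and the example walks an in-memory link map rather than fetching real pages.

```python
from collections import deque

# Hypothetical link graph standing in for an intranet site.
LINKS = {
    "http://intranet.example/": [
        "http://intranet.example/hr",
        "http://intranet.example/it",
    ],
    "http://intranet.example/hr": ["http://intranet.example/hr/forms"],
    "http://intranet.example/it": [],
    "http://intranet.example/hr/forms": ["http://intranet.example/hr/forms/archive"],
}


def crawl(start_url, max_depth):
    """Breadth-first walk from start_url, following links at most max_depth hops."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # Depth limit reached: index this page but do not follow its links.
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order


pages = crawl("http://intranet.example/", max_depth=2)
print(pages)
```

Here a depth of 2 reaches the HR forms page but stops before the archive beneath it, which is three links from the start.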
Next, I used the built-in search to index documents on a file server and a Microsoft SQL Server database. Besides handling typical file formats (Microsoft Office, PDF, e-mail archives, and Zip archives), the search engine crawls sources that require authentication, such as a content management system. In the latter case the software correctly hid results from users not authorized to view them.
| Test Center Scorecard | 20% | 20% | 20% | 20% | 10% | 10% | Overall score |
|---|---|---|---|---|---|---|---|
| Vivísimo Velocity 4.2 | 8 | 9 | 9 | 9 | 8 | 9 | 8.7 (Very Good) |
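The overall score is consistent with a weighted average of the six criterion scores. That reading is an assumption, since the scorecard does not state its formula. The sketch below checks the arithmetic using only the weights and scores shown.

```python
# Weights and scores as listed in the scorecard, in column order.
weights = [0.20, 0.20, 0.20, 0.20, 0.10, 0.10]
scores = [8, 9, 9, 9, 8, 9]

# Assumed formula: sum of weight * score, rounded to one decimal place.
overall = round(sum(w * s for w, s in zip(weights, scores)), 1)
print(overall)
```

The four 20 percent criteria contribute 7.0 and the two 10 percent criteria contribute 1.7, which together give the published 8.7.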
