May 23, 2005

Vivísimo Velocity brings structure to enterprise search

Search platform clusters data into useful, relevant categories

In 2000, computer researchers at Carnegie Mellon University started a project to fundamentally shift how search results are organized. The idea behind the approach, called clustering, was to find meaningful connections among Internet search results to speed and improve research.

During the next five years the resulting company, Vivísimo, commercialized the original clustering engine and extended its offerings to include meta (federated) search and its own search engine. Called Velocity, this technology threesome should have wide appeal. Academics, scientists, government analysts, market researchers, online publishers, and product managers in any industry will benefit from Velocity 4.2, as they all must search through and make sense of large, diverse data sources.

Velocity includes three components: the Vivísimo Clustering Engine, which overlays any search or database query engine and automatically sorts search results on the fly into meaningful hierarchical folders; the Vivísimo Content Integrator, which queries multiple content sources, such as search engines and databases, simultaneously in one step and combines the retrieved information; and the Vivísimo Search Engine.
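The review does not describe how the Clustering Engine chooses its folders. To show only the general idea, here is a minimal Python sketch that groups result snippets into flat, one-level folders labeled by terms that several results share. The function name, the stopword list, and the `min_size` threshold are hypothetical. Vivísimo's hierarchical clustering is presumably far more sophisticated.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list; real engines use much larger ones.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "with"}


def cluster_by_terms(results, min_size=2):
    """Group (doc_id, text) results into folders labeled by a shared term.

    Returns a dict mapping each folder label to the sorted doc_ids it contains.
    A result may appear in more than one folder.
    """
    # Map every non-stopword term to the set of documents containing it.
    docs_by_term = defaultdict(set)
    for doc_id, text in results:
        for word in text.lower().split():
            word = word.strip(".,;:!?()\"'")
            if word and word not in STOPWORDS:
                docs_by_term[word].add(doc_id)
    # Only terms shared by at least min_size documents become folders.
    return {
        term: sorted(ids)
        for term, ids in docs_by_term.items()
        if len(ids) >= min_size
    }


results = [
    (1, "Jaguar speed in the wild"),
    (2, "Jaguar habitat and diet"),
    (3, "Jaguar car dealer prices"),
    (4, "Used car prices"),
]
print(cluster_by_terms(results))
# {'jaguar': [1, 2, 3], 'car': [3, 4], 'prices': [3, 4]}
```

Result 3 lands in both the "jaguar" and "car" folders. Overlap of this kind is what lets a clustering engine separate different senses of an ambiguous query.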

Enterprises will typically start with clustering and metasearch because most already have some type of search engine in place. I tested all three components, however, on an Intel-based server running Red Hat Enterprise Linux 3.0.

Velocity is an especially deep product, as the number of options available in the UI reflects. But the UI may confuse first-time users. For example, some menus are several layers deep and not always labeled intuitively. Vivísimo developers are working with usability experts to address this shortcoming.

Still, in the more important performance areas, Velocity delivers. To evaluate clustering, I connected Velocity to an existing Verity Ultraseek search engine. The process involves completing two Web forms: one describing the XML output from the search engine and another indicating how to parse the results. Although this does require knowledge of your original search implementation, I had clustering running in approximately 30 minutes. Vivísimo has done an excellent job of organizing results into clusters by intelligently using words and phrases contained in the original searches.
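In Velocity, you supply this mapping by filling in Web forms rather than by writing code, and the review does not show what Ultraseek's XML looks like. The minimal Python sketch below only illustrates what describing the XML output and saying how to parse it amounts to. It uses the standard library's `xml.etree.ElementTree`. The element names (`result`, `url`, `title`, `snippet`), the `Hit` record, and `parse_results` are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Hit:
    url: str
    title: str
    snippet: str


# Hypothetical mapping from Hit fields to the engine's XML element names.
# Adjust the right-hand side to whatever your search engine actually emits.
FIELD_MAP = {"url": "url", "title": "title", "snippet": "snippet"}


def parse_results(xml_text, record_tag="result", field_map=FIELD_MAP):
    """Convert one page of XML search results into a list of Hit records."""
    root = ET.fromstring(xml_text)
    hits = []
    # iter() yields every element named record_tag, at any depth.
    for elem in root.iter(record_tag):
        # findtext() returns the default when a child element is missing.
        values = {
            name: elem.findtext(tag, default="").strip()
            for name, tag in field_map.items()
        }
        hits.append(Hit(**values))
    return hits


SAMPLE = """<results>
  <result>
    <url>http://intranet.example.com/q1.html</url>
    <title>Quarterly report</title>
    <snippet>Sales rose in the first quarter.</snippet>
  </result>
  <result>
    <url>http://intranet.example.com/travel.html</url>
    <title>Travel policy</title>
  </result>
</results>"""

for hit in parse_results(SAMPLE):
    print(hit.title, "|", hit.url)
```

The per-engine knowledge is kept in `FIELD_MAP` and `record_tag`. Supporting a different engine therefore means editing that mapping rather than the parsing code, which is roughly the work a prepopulated template saves.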

Although Velocity didn’t have a specific setup for Ultraseek, there are clustering templates for other common enterprise search engines, including the Google Search Appliance; these prepopulated forms should save administrators time and reduce setup errors.

Configuring Vivísimo’s search engine required little effort. I easily created a source by defining the starting URL of an intranet Web site. Then I selected a few other options, such as the maximum link depth. Again, clustered results were very precise; no fine-tuning was required.
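The review mentions two settings: a starting URL and a maximum link depth. Together they bound a breadth-first crawl. The minimal Python sketch below illustrates that idea. `crawl` and its `get_links` callback are hypothetical names, and the in-memory link graph stands in for real page fetching and link extraction. Velocity's crawler options are not shown in the review.

```python
from collections import deque


def crawl(start_url, get_links, max_depth):
    """Breadth-first crawl that follows links at most max_depth hops from start_url.

    get_links(url) returns the outgoing links of a page (a stand-in for
    fetching and parsing it). Returns a dict mapping visited URL -> link depth.
    """
    depth = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if depth[url] == max_depth:
            continue  # pages at the depth limit are indexed but not expanded
        for link in get_links(url):
            if link not in depth:  # skip pages already discovered (handles cycles)
                depth[link] = depth[url] + 1
                queue.append(link)
    return depth


# Hypothetical intranet link graph: page -> pages it links to.
SITE = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/a"], "/c": ["/d"], "/d": ["/"]}

print(crawl("/", lambda u: SITE.get(u, []), max_depth=1))
# {'/': 0, '/a': 1, '/b': 1}
```

Because the traversal is breadth-first, each page is first reached by its shortest path. The recorded depth is therefore the true link distance from the starting URL.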

Next, I used the built-in search to index documents on a file server and a Microsoft SQL Server database. Besides handling typical file formats (Microsoft Office, PDF, e-mail archives, and Zip archives), the search engine crawls sources that require authentication, such as a content management system. With such sources, the software correctly hid results from users not authorized to view them.
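The review does not say how Velocity decides which results a user may see. One common approach records each document's allowed groups at crawl time and filters results against the searching user's groups at query time. The Python sketch below shows that approach as a hypothetical example only. The `Doc` class, its `allowed_groups` field, and `trim_results` are illustrative names, not Velocity's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Doc:
    doc_id: str
    title: str
    # Groups permitted to view this document, captured when it was crawled.
    allowed_groups: frozenset = field(default_factory=frozenset)


def trim_results(results, user_groups):
    """Keep only results whose allowed groups overlap the user's groups."""
    user_groups = frozenset(user_groups)
    return [doc for doc in results if doc.allowed_groups & user_groups]


DOCS = [
    Doc("1", "Benefits FAQ", frozenset({"everyone"})),
    Doc("2", "Salary bands", frozenset({"hr"})),
    Doc("3", "Product roadmap", frozenset({"engineering", "execs"})),
]

visible = trim_results(DOCS, {"everyone", "engineering"})
print([doc.doc_id for doc in visible])  # ['1', '3']
```

A document with no recorded groups is hidden from everyone. Denying access by default is the safer failure mode when permission data is missing.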

Test Center Scorecard

Criterion weights: 20% / 20% / 20% / 20% / 10% / 10%
Vivísimo Velocity 4.2 criterion scores: 8 / 9 / 9 / 9 / 8 / 9
Overall score: 8.7 (Very Good)
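The scorecard lists six criterion weights (20%, 20%, 20%, 20%, 10%, 10%), Velocity's per-criterion scores (8, 9, 9, 9, 8, 9), and an overall score of 8.7. The short Python check below assumes that the overall score is the weighted average of the criterion scores, which the scorecard does not state explicitly. It shows that the listed figures are consistent with that assumption. `overall_score` is a hypothetical helper.

```python
# Weights (percent) and scores as listed in the Test Center Scorecard.
WEIGHTS = [20, 20, 20, 20, 10, 10]
SCORES = [8, 9, 9, 9, 8, 9]


def overall_score(weights, scores):
    """Weighted average of criterion scores; weights are percentages summing to 100."""
    if sum(weights) != 100:
        raise ValueError("weights must sum to 100 percent")
    # Sum integer products first and divide once, avoiding floating-point drift.
    return sum(w * s for w, s in zip(weights, scores)) / 100


print(overall_score(WEIGHTS, SCORES))  # 8.7
```

The weighted products are 160 + 180 + 180 + 180 + 80 + 90 = 870, and 870 / 100 = 8.7, which matches the published overall score.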

©1994-2009 Infoworld, Inc.