Google and Thunderstone deliver plug and search to the enterprise

Search appliances serve up admirable results

If you need to overhaul an aging or inadequate intranet or Web search service, search appliances are a suitable option. There are plenty of software solutions, such as Verity Ultraseek, but by the time you install and configure the software on your server, you could have several sites fully indexed with an appliance.

I tested Google Search Appliance GB-1001 Version 4 and Thunderstone Search Appliance Version 5 to see how they fared against each other. I also compared their search results with those of two software solutions, Convera RetrievalWare and the search service bundled with Microsoft SharePoint Portal 2003.

After searching multiple intranet and public corporate sites -- a total of 25,000 pages -- with all four solutions, the appliances clearly won, boasting easier implementation and significantly more relevant results.

Overall, the appliances are similar in many areas. Both install in a few minutes, and search quality was virtually indistinguishable. Thunderstone's user interface is a bit less polished, but it gives you more flexibility in configuring crawls. Significantly, Thunderstone lets developers customize the search software for special needs. Plus, you're paying far less than you would for Google's appliance.

On balance, either one would suffice in most situations. I give a slight edge to Google because the system's security is more hardened and its RAID configurations provide enterprise-level fault tolerance. And, if you can afford larger configurations, Google scales up. However, Thunderstone shows it can keep up in the important search-quality area -- and certainly puts pressure on Google to open its own system and provide better value.

Google Search Appliance GB-1001 Version 4

Arguably, Google is synonymous with fast, accurate, simple Web searching -- a reputation that sets high usability expectations for employees when you roll out enterprise search. First introduced in early 2002, the "Google-in-a-box" appliance does a great job transferring the experience of the company's public product behind your firewall. Setup and ongoing administration are minimal. Just as significant, compared with the version in our previous review, the latest version adds enterprise-specific functions, such as continuous crawling and unlimited collections, and it supports forms-based SSO (single sign-on).

Google's taken plug and play to the extreme. I merely connected a power cord to the 2U rack appliance, plugged my laptop into a spare Ethernet port to enter network settings, and was off and running. Logging in to the Admin console from a browser presents a simplified environment for creating collections, managing crawls, customizing the layout of search pages, and testing queries.

Crawling and indexing are unusually simple, with the software automatically performing many complex tasks for you, such as figuring out what content to recognize, including metatags. Just enter the starting URLs and a few other basics, such as which types of files to exclude from crawls, and you're in business. I particularly liked the graphs that chart crawl performance.

In general, the appliance includes capabilities similar to those of the public search software, including an automatic spelling checker and KeyMatch, which allows administrators to highlight the top search result for a given query.

Of the additions aimed at enterprise users, foremost is Collections, which allowed me to divide the main index into a number of segments that could be searched separately. For example, you might have a site within your intranet for the R&D group; a Collection would allow R&D employees to search their site exclusively. There are also options to crawl Web servers protected by user authentication and to create synonyms -- words or phrases that should be treated as equivalents -- specific to a given Collection.

The appliance continuously crawls and automatically detects the optimal crawling frequency for certain documents or pages; however, I could force the system to crawl certain URLs more or less often. For emergencies, you can inject a specific URL into the queue to be crawled.

Google also provides what it calls Front Ends, which allowed me to easily modify the output format of the search box and results to match the style of an intranet site. The Page Layout Helper makes the entire process foolproof. Investigating one of Google's options, I employed the XSLT (XSL Transformation) style sheet editor to make more intricate changes to the underlying formatting code.

For end-users, the results are pure Google: Relevant pages -- based on word matches, links, and about 100 other criteria -- are shown first, with summaries that highlight the matched word in the content of the page. Overall, searches of my Web content were very relevant and extremely fast.

Both the Thunderstone and the Google appliances generate some log-based reports, including search activity and the top 100 keywords and queries. The big difference between the two is that Google allows you to export usage logs for analysis in other packages.

The Google Search Appliance highlights both the advantages and pitfalls of the appliance approach. It's exceedingly simple to set up and requires none of the elaborate document preparation of other enterprise search solutions. Plus, it delivers a familiar, pleasant search experience for employees who search intranets and for visitors to your public Web sites. But enterprises needing federated search should note that the appliance is limited to Web resources. If your needs are within those bounds, this is a wise choice.

Thunderstone Search Appliance Version 5

Thunderstone has been supplying search engine technology since 1981, with the company's core Texis application powering its Thunderstone Search Appliance. The software is particularly suitable when you need to sort through a large amount of structured data and unstructured content and then return results in grouped categories. Besides general Web search, catalogs, classified advertising, and document management are a few other typical apps.

Similar to the Google Search Appliance, Thunderstone's small Linux-based appliance is a true turnkey solution. I connected the server to my network and logged in from a PC. The Web interface and forms are organized based on common tasks such as basic walk (crawl). As such, I had my first collection built and a basic search running within a few minutes.

Options on the advanced configuration page are less clear, but once you figure out what they do -- good context help is provided -- Thunderstone's Search Appliance shows exceptional breadth. For example, I enabled searching from JavaScript-based menus, crawled password-protected areas, and specified various file types to search, including Macromedia Flash movies. Additionally, I indexed documents on multiple servers within one collection. Conversely, I easily excluded URLs and content from being crawled.

Released at the end of August, Texis Version 5 includes a number of improvements over the version InfoWorld tested a year ago. Foremost is adaptive indexing. Put simply, the crawler revisits each page or document on a separate schedule based on how often it has changed in the past.
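To make the idea of adaptive indexing concrete, here is a minimal sketch of one way a crawler could adjust a per-page revisit schedule. It is not Thunderstone's actual algorithm, which the review does not describe; the halve-or-double rule, the interval bounds, and the names PageSchedule and update_interval are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical bounds: never revisit more often than hourly,
# never less often than every 30 days.
MIN_INTERVAL_HOURS = 1.0
MAX_INTERVAL_HOURS = 24.0 * 30


@dataclass
class PageSchedule:
    """Revisit schedule for a single page (illustrative, not a vendor type)."""
    url: str
    interval_hours: float = 24.0


def update_interval(schedule: PageSchedule, changed: bool) -> PageSchedule:
    """Return a new schedule after one crawl of the page.

    Assumed rule: if the page changed since the last visit, halve the
    interval so it is revisited sooner; otherwise double it. The result
    is clamped to the bounds above.
    """
    factor = 0.5 if changed else 2.0
    new_interval = schedule.interval_hours * factor
    new_interval = max(MIN_INTERVAL_HOURS, min(MAX_INTERVAL_HOURS, new_interval))
    return PageSchedule(schedule.url, new_interval)


if __name__ == "__main__":
    # A page that keeps changing is visited more and more often.
    page = PageSchedule("http://intranet.example/news")
    for changed in (True, True, False):
        page = update_interval(page, changed)
        print(page.interval_hours)
```

Under this rule, a frequently updated news page drifts toward the minimum interval, while a static policy document drifts toward the maximum, so crawl effort goes where content actually changes.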

Akin to sponsored links on commercial search engines, Best Bets has potential, but it falls short due to poor implementation. The idea is to enter some keywords and the associated pages you want to fall at the top of the results when someone enters that query. Unfortunately, making this work required completing a number of steps spread across multiple menus.

That wrinkle aside, the Thunderstone Search Appliance performed very well where it counts: delivering relevant results. Thunderstone's advanced search options allowed me to change ranking factors such as the importance of a word's frequency in a document, which helped the few times I wanted better results. Plus, I had no problem configuring search settings so that results were formatted using a custom XSL style sheet.

The system offers some basic real-time reporting based on logs, including Top Queries and Top Query Words. It would be helpful to have charting -- or at least a way to export report data to another app.

Thunderstone crawls a variety of Web content and produces highly relevant results. Administration options are more than enough, as are the ways to customize the search interface's appearance. The Web Services API enables the appliance to be integrated into other applications, such as portals, via SOAP. (If you need to do heavy customization, Thunderstone offers an upgrade path to their full Texis application.) There's room for improvement in the UI, however, and it would be helpful to have more options, such as related concepts, to format results.

Ultimately, both appliances do what appliances are supposed to do: They just work. Although they don't claim the federated search possible with expensive enterprise search implementations, both Google's and Thunderstone's search appliances are affordable, require little support, and deliver relevant results.

InfoWorld Scorecard

Criterion (weight)      Google Search Appliance     Thunderstone Search Appliance
                        GB-1001 Version 4           Version 5
Value (10%)             7.0                         9.0
Performance (20%)       8.0                         9.0
Ease of use (20%)       9.0                         8.0
Management (20%)        8.0                         7.0
Integration (20%)       7.0                         8.0
Scalability (10%)       9.0                         7.0
Overall Score (100%)    8.0                         8.0
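The scorecard does not spell out how the overall score is derived, but the published numbers are consistent with a weighted average of the six criteria. The short sketch below checks that arithmetic, assuming the weights act as straightforward percentages; the function name overall_score is illustrative.

```python
# Criterion weights in percent, as listed in the scorecard.
WEIGHTS = {
    "Value": 10,
    "Performance": 20,
    "Ease of use": 20,
    "Management": 20,
    "Integration": 20,
    "Scalability": 10,
}

# Per-criterion scores as published in the scorecard.
SCORES = {
    "Google Search Appliance GB-1001 Version 4": {
        "Value": 7.0, "Performance": 8.0, "Ease of use": 9.0,
        "Management": 8.0, "Integration": 7.0, "Scalability": 9.0,
    },
    "Thunderstone Search Appliance Version 5": {
        "Value": 9.0, "Performance": 9.0, "Ease of use": 8.0,
        "Management": 7.0, "Integration": 8.0, "Scalability": 7.0,
    },
}


def overall_score(scores: dict) -> float:
    """Weighted average of the criterion scores, rounded to one decimal.

    Assumes the overall score is a plain weighted mean; the scorecard
    itself does not state the formula.
    """
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return round(total / sum(WEIGHTS.values()), 1)


if __name__ == "__main__":
    for product, scores in SCORES.items():
        print(f"{product}: {overall_score(scores):.1f}")
```

Both products come out at 8.0, matching the published overall scores: Thunderstone's lead in value, performance, and integration is offset by Google's lead in ease of use, management, and scalability.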

Copyright © 2004 IDG Communications, Inc.
