If you need to overhaul an aging or inadequate intranet or Web search service, a search appliance is a practical option.
There are plenty of software solutions, such as Verity Ultraseek, but by the time you install and configure the software on
your server, you could have several sites fully indexed with an appliance.
I tested Google Search Appliance GB-1001 Version 4 and Thunderstone Search Appliance Version 5 to see how they fared against
each other. I also compared their search results with those of two software solutions, Convera RetrievalWare and the search
service bundled with Microsoft SharePoint Portal 2003.
After searching multiple intranet and public corporate sites -- a total of 25,000 pages -- with all four solutions, the appliances
clearly won, boasting easier implementation and significantly more relevant results.
Overall, the appliances are similar in many areas. Both install in a few minutes, and search quality was virtually indistinguishable.
Thunderstone's user interface is a bit less polished, but it gives you more flexibility in configuring crawls. Significantly,
developers can customize the search software for special needs. Plus, you're paying far less than you would for Google's appliance.
On balance, either one would suffice in most situations. I give a slight edge to Google because the system's security is more
hardened and its RAID configurations provide enterprise-level fault tolerance. And, if you can afford larger configurations,
Google scales up. However, Thunderstone shows it can keep up in the important search-quality area -- and certainly puts pressure
on Google to open its own system and provide better value.
Google Search Appliance GB-1001 Version 4
Arguably, Google is synonymous with fast, accurate, simple Web searching -- a reputation that sets high usability expectations
for employees when you roll out enterprise search. First introduced in early 2002, the "Google-in-a-box" appliance does a
great job of bringing the experience of the company's public search engine behind your firewall. Setup and ongoing administration
are minimal. Just as significant, the latest version adds enterprise-specific functions that were missing from the version we
previously reviewed, such as continuous crawling and unlimited collections, and it supports forms-based SSO (single sign-on).
Google's taken plug and play to the extreme. I merely connected a power cord to the 2U rack appliance, plugged in my laptop
to a spare Ethernet port to enter network settings, and was off and running. Logging in to the Admin console from a browser
presents a simplified environment for creating collections, managing crawls, customizing the layout of search pages, and testing
queries.
Crawling and indexing are unusually simple, with the software automatically performing many complex tasks for you, such as figuring
out what content to recognize, including metatags. Just enter the starting URLs and a few other basics, such as which types
of files to exclude from crawls, and you're in business. I particularly liked the graphs that chart crawl performance.
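To illustrate what those few basics amount to, here is a minimal Python sketch of a crawl-scope check: a URL is crawled only if it sits under one of the starting URLs and does not end in an excluded file type. The start URLs, the extension list, and the `in_scope` function are hypothetical illustrations, not the appliance's actual configuration format.

```python
from urllib.parse import urlparse

# Hypothetical crawl settings; the appliance's real configuration fields may differ.
START_URLS = ["http://intranet.example.com/", "http://www.example.com/"]
EXCLUDED_EXTENSIONS = {".exe", ".zip", ".mp3"}


def in_scope(url, start_urls=START_URLS, excluded=EXCLUDED_EXTENSIONS):
    """Return True if url falls under a start URL and is not an excluded file type."""
    if not any(url.startswith(prefix) for prefix in start_urls):
        return False  # outside every starting point, so never crawled
    path = urlparse(url).path.lower()  # compare extensions case-insensitively
    return not any(path.endswith(ext) for ext in excluded)
```

A crawler would call a check like this on every link it discovers before adding the link to its queue.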
In general, the appliance includes capabilities similar to those of the public search software, including an automatic spelling
checker and Keymatch, which allows administrators to highlight the top search result for a given query.
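Conceptually, Keymatch works like an administrator's override layered on top of normal ranking. The Python sketch below assumes a hypothetical table that maps a query to one URL and simply moves that URL to the front of the organic results. The appliance displays Keymatch results in its own way, and the `KEYMATCHES` table and `apply_keymatch` function are illustrative only.

```python
# Hypothetical admin-defined table: normalized query -> URL to promote.
KEYMATCHES = {
    "expense report": "http://intranet.example.com/finance/expenses.html",
}


def apply_keymatch(query, ranked_urls, keymatches=KEYMATCHES):
    """Put the administrator's chosen URL first, followed by the organic results minus duplicates."""
    pinned = keymatches.get(query.strip().lower())
    if pinned is None:
        return list(ranked_urls)  # no Keymatch for this query: ranking is unchanged
    return [pinned] + [url for url in ranked_urls if url != pinned]
```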
Of the additions aimed at enterprise users, foremost is Collections, which allowed me to divide the main index into a number
of segments that could be searched separately. For example, you might have a site within your intranet for the R&D group;
a Collection would allow R&D employees to search their site exclusively. There are also options to crawl Web servers protected
by user authentication and to create synonyms -- words or phrases that should be treated as equivalents -- specific to a given
Collection.
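The sketch below models the two ideas in that paragraph, assuming collections are defined by URL prefix and synonyms are stored per collection. It is a toy keyword filter over an in-memory dict, not Google's index, and `COLLECTIONS`, `SYNONYMS`, and `search` are hypothetical names.

```python
# Hypothetical collections, each defined by URL prefixes.
# An empty prefix matches every URL, so "default" spans the whole index.
COLLECTIONS = {
    "rnd": ["http://intranet.example.com/rnd/"],
    "default": [""],
}

# Hypothetical synonyms, scoped per collection: term -> equivalent terms.
SYNONYMS = {
    "rnd": {"spec": ["specification"]},
}


def search(index, query, collection):
    """Return sorted URLs in `collection` whose text contains a query term or one of its synonyms.

    `index` is a plain dict of url -> page text standing in for a real inverted index.
    """
    prefixes = COLLECTIONS[collection]
    synonyms = SYNONYMS.get(collection, {})
    terms = set()
    for word in query.lower().split():
        terms.add(word)
        terms.update(synonyms.get(word, []))  # expand only with this collection's synonyms
    hits = []
    for url, text in index.items():
        if not any(url.startswith(p) for p in prefixes):
            continue  # page lies outside this collection
        if terms & set(text.lower().split()):
            hits.append(url)
    return sorted(hits)
```

Because the synonym list is keyed by collection, a query for "spec" expands to "specification" only when it runs against the `rnd` collection.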
The appliance continuously crawls and automatically detects the optimal crawling frequency for certain documents or pages;
however, I could force the system to crawl certain URLs more or less often. For emergencies, you can inject a specific URL
into the queue to be crawled.
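The review doesn't say how the appliance picks its crawl frequencies, so the following is only a plausible model of the behavior described above. It uses a priority queue in which a page that changed since the last visit is revisited sooner and an unchanged page later. An administrator can pin a fixed interval or inject a URL for immediate recrawl. The halve/double rule and the interval bounds are assumptions of this sketch, and `CrawlScheduler` and its methods are hypothetical.

```python
import heapq


class CrawlScheduler:
    """Toy continuous-crawl queue; a hypothetical model, not the appliance's algorithm."""

    def __init__(self, base=3600.0, lo=600.0, hi=86400.0):
        self.base, self.lo, self.hi = base, lo, hi
        self.interval = {}  # url -> adaptive revisit interval in seconds
        self.forced = {}    # url -> administrator-pinned interval in seconds
        self.due = {}       # url -> currently valid due time
        self.queue = []     # min-heap of (due_time, url); may hold stale entries

    def _schedule(self, url, due):
        self.due[url] = due
        heapq.heappush(self.queue, (due, url))

    def add(self, url, now=0.0):
        self.interval.setdefault(url, self.base)
        self._schedule(url, now)

    def force_interval(self, url, seconds):
        self.forced[url] = seconds

    def inject(self, url, now):
        # Emergency recrawl: supersedes whatever due time the URL had.
        self._schedule(url, now)

    def pop_next(self):
        """Return (due_time, url) for the next page to crawl, or None if the queue is empty."""
        while self.queue:
            due, url = heapq.heappop(self.queue)
            if self.due.get(url) == due:  # skip entries superseded by a reschedule
                del self.due[url]
                return due, url
        return None

    def record_visit(self, url, now, changed):
        # Halve the interval for pages that changed, double it for pages that did not.
        iv = self.interval.get(url, self.base)
        iv = iv / 2 if changed else iv * 2
        self.interval[url] = max(self.lo, min(self.hi, iv))
        # A pinned interval overrides the adaptive one when scheduling the next visit.
        self._schedule(url, now + self.forced.get(url, self.interval[url]))
```

Rescheduling leaves the old heap entry in place. `pop_next` discards any entry whose due time no longer matches, which avoids searching the heap on every injection.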
Google also provides what it calls Front Ends, which allowed me to easily modify the output format of the search box and results
to match the style of an intranet site. The Page Layout Helper makes the entire process foolproof. For more intricate changes,
I used the XSLT (XSL Transformations) style sheet editor to modify the underlying formatting code directly.
For end-users, the results are pure Google: Relevant pages -- based on word matches, links, and about 100 other criteria --
are shown first, with summaries that highlight the matched words in the content of the page. Overall, searches of my Web content
were very relevant and extremely fast.
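Google's ranking criteria are proprietary, but the summary highlighting is easy to picture. This Python sketch uses a hypothetical `highlight` function and an arbitrary window width. It returns a slice of page text around the first matched query word and wraps every match in bold tags. A production system would also avoid cutting words at the window edges.

```python
import re


def highlight(text, query, width=60, open_tag="<b>", close_tag="</b>"):
    """Return up to `width` characters around the first query match, with every match tagged."""
    terms = [re.escape(w) for w in query.split() if w]
    if not terms:
        return text[:width]
    # Whole-word, case-insensitive match on any query term.
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    first = pattern.search(text)
    start = max(0, first.start() - width // 2) if first else 0
    window = text[start:start + width]
    return pattern.sub(lambda m: open_tag + m.group(0) + close_tag, window)
```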