InfoWorld

Google and Thunderstone deliver plug and search to the enterprise

Search appliances serve up admirable results

By Mike Heck
October 15, 2004
 

If you need to overhaul an aging or inadequate intranet or Web search service, search appliances are a suitable option. Plenty of software solutions exist, such as Verity Ultraseek, but in the time it takes to install and configure that software on your server, you could have several sites fully indexed with an appliance.

I tested Google Search Appliance GB-1001 Version 4 and Thunderstone Search Appliance Version 5 to see how they fared against each other. I also compared their search results with those of two software solutions, Convera RetrievalWare and the search service bundled with Microsoft SharePoint Portal 2003.

After I searched multiple intranet and public corporate sites -- a total of 25,000 pages -- with all four solutions, the appliances clearly won, offering easier implementation and significantly more relevant results.

Overall, the appliances are similar in many areas. Both install in a few minutes, and search quality was virtually indistinguishable. Thunderstone's user interface is a bit less polished, but it gives you more flexibility in configuring crawls. Significantly, Thunderstone also lets developers customize the search software for special needs, and it costs far less than Google's appliance.

On balance, either one would suffice in most situations. I give a slight edge to Google because the system's security is more hardened and its RAID configurations provide enterprise-level fault tolerance. And, if you can afford larger configurations, Google scales up. However, Thunderstone shows it can keep up in the important search-quality area -- and certainly puts pressure on Google to open its own system and provide better value.

Google Search Appliance GB-1001 Version 4

Arguably, Google is synonymous with fast, accurate, simple Web searching -- a reputation that sets high usability expectations among employees when you roll out enterprise search. First introduced in early 2002, the "Google-in-a-box" appliance does a great job of bringing the experience of the company's public product behind your firewall. Setup and ongoing administration are minimal. Just as significant, compared with the version we previously reviewed, the latest release adds enterprise-specific functions, such as continuous crawling and unlimited collections, and it supports forms-based SSO (single sign-on).

Google has taken plug and play to the extreme. I merely connected a power cord to the 2U rack appliance, plugged my laptop into a spare Ethernet port to enter network settings, and was off and running. Logging in to the Admin console from a browser presents a simplified environment for creating collections, managing crawls, customizing the layout of search pages, and testing queries.

Crawling and indexing are unusually simple: the software automatically handles many complex tasks for you, such as determining which content to index, including metatags. Just enter the starting URLs and a few other basics, such as which types of files to exclude from crawls, and you're in business. I particularly liked the graphs that chart crawl performance.
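
To make the idea of starting URLs and excluded file types concrete, here is a minimal Python sketch of a crawl-scope filter. It is an illustration only, not the appliance's actual configuration format: the names START_URLS, EXCLUDED_EXTENSIONS, and should_crawl, along with the example host, are assumptions.

```python
from urllib.parse import urlparse

# Hypothetical crawl settings; not the appliance's real configuration format.
START_URLS = ["http://intranet.example.com/"]
EXCLUDED_EXTENSIONS = {".zip", ".exe", ".iso"}


def should_crawl(url: str) -> bool:
    """Return True if url sits under a start URL and is not an excluded file type."""
    path = urlparse(url).path.lower()
    if any(path.endswith(ext) for ext in EXCLUDED_EXTENSIONS):
        return False
    return any(url.startswith(start) for start in START_URLS)
```

A crawler built this way would call should_crawl on every link it discovers and follow only those that pass.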

In general, the appliance includes capabilities similar to those of the public search software, including an automatic spelling checker and KeyMatch, which allows administrators to highlight the top search result for a given query.
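
One plausible way to picture keyword-based promotion of this kind is a lookup table that places an administrator-chosen link ahead of the regular results. The Python sketch below assumes exactly that; the table contents, the apply_keymatch function, and the example URL are hypothetical and do not reflect how Google implements the feature.

```python
# Hypothetical keyword-to-promoted-link table maintained by an administrator.
KEYMATCHES = {
    "benefits": ("HR Benefits Portal", "http://intranet.example.com/hr/benefits"),
}


def apply_keymatch(query: str, results: list[str]) -> list[str]:
    """Put an administrator-chosen link ahead of the regular results."""
    for word in query.lower().split():
        if word in KEYMATCHES:
            title, url = KEYMATCHES[word]
            promoted = f"{title} <{url}>"
            # Avoid listing the promoted link twice if it is also a regular hit.
            return [promoted] + [r for r in results if r != promoted]
    return results
```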

Of the additions aimed at enterprise users, foremost is Collections, which allowed me to divide the main index into a number of segments that could be searched separately. For example, you might have a site within your intranet for the R&D group; a Collection would allow R&D employees to search their site exclusively. There are also options to crawl Web servers protected by user authentication and to create synonyms -- words or phrases that should be treated as equivalents -- specific to a given Collection.
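
The combination of index segments and per-Collection synonyms can be sketched in a few lines of Python. This is a toy model under stated assumptions: the index is a plain dictionary mapping each URL to its set of words, and COLLECTIONS, SYNONYMS, and the function names are hypothetical rather than anything the appliance exposes.

```python
# Hypothetical collection definitions: collection name -> URL prefixes.
COLLECTIONS = {
    "rnd": ["http://intranet.example.com/rnd/"],
    "hr": ["http://intranet.example.com/hr/"],
}

# Hypothetical per-collection synonyms: a query term -> equivalent terms.
SYNONYMS = {
    "rnd": {"spec": {"spec", "specification"}},
}


def in_collection(url, collection):
    """Return True if the URL falls under one of the collection's prefixes."""
    return any(url.startswith(prefix) for prefix in COLLECTIONS[collection])


def expand_query(terms, collection):
    """Replace each term with its synonym set, or a one-item set if none is defined."""
    table = SYNONYMS.get(collection, {})
    return [table.get(term, {term}) for term in terms]


def search(index, terms, collection):
    """Return sorted URLs in the collection whose words match every expanded term."""
    groups = expand_query(terms, collection)
    return sorted(
        url
        for url, words in index.items()
        if in_collection(url, collection) and all(group & words for group in groups)
    )
```

In this sketch a search for "spec" in the R&D collection also matches pages containing "specification", while the same query in another collection does not expand.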

The appliance continuously crawls and automatically detects the optimal crawling frequency for certain documents or pages; however, I could force the system to crawl certain URLs more or less often. For emergencies, you can inject a specific URL into the queue to be crawled.
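
Google does not document how it picks crawl frequencies, so the following Python sketch substitutes a simple, common heuristic: shorten a page's recrawl interval when it has changed and lengthen it when it has not. The interval bounds, the CrawlQueue class, and its inject method are all assumptions made for illustration.

```python
import heapq

MIN_INTERVAL = 1.0    # hours; hypothetical lower bound
MAX_INTERVAL = 168.0  # hours; hypothetical upper bound (one week)


def next_interval(current: float, changed: bool) -> float:
    """Halve the interval if the page changed, double it otherwise, within bounds."""
    proposed = current / 2 if changed else current * 2
    return max(MIN_INTERVAL, min(MAX_INTERVAL, proposed))


class CrawlQueue:
    """Priority queue of (due_time, url); the earliest due time is popped first."""

    def __init__(self):
        self._heap = []

    def schedule(self, url: str, due: float) -> None:
        heapq.heappush(self._heap, (due, url))

    def inject(self, url: str) -> None:
        """Place a URL at the front of the queue for an urgent recrawl."""
        self.schedule(url, float("-inf"))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[1]
```

An injected URL sorts ahead of everything already scheduled, which mirrors the "emergency" case described above.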

Google also provides what it calls Front Ends, which allowed me to easily modify the output format of the search box and results to match the style of an intranet site. The Page Layout Helper makes the entire process foolproof. Exploring one of Google's more advanced options, I used the XSLT (XSL Transformations) style sheet editor to make more intricate changes to the underlying formatting code.

For end-users, the results are pure Google: Relevant pages -- based on word matches, links, and about 100 other criteria -- are shown first, with summaries that highlight the matched word in the content of the page. Overall, searches of my Web content were very relevant and extremely fast.





Google Search Appliance GB-1001 Version 4

Google, google.com/services/

Very Good  8.0

Criteria      Score  Weight
Ease-of-use   9      20%
Integration   7      20%
Management    8      20%
Performance   8      20%
Scalability   9      10%
Value         7      10%

Cost:
$32,000 for searching as many as 150,000 documents

Bottom Line:
Google takes the same algorithms that power Google.com and packages them in a ready-to-run system for in-house use. Although the initial cost is higher than Thunderstone's, maintenance costs are included in the price. The solution delivers highly relevant results and requires next to no IT intervention to set up and keep running.




Thunderstone Search Appliance Version 5

Thunderstone Software, thunderstone.com

Very Good  8.0

Criteria      Score  Weight
Ease-of-use   8      20%
Integration   8      20%
Management    7      20%
Performance   9      20%
Scalability   7      10%
Value         9      10%

Cost:
$10,000 for searching 250,000 documents; $20,000 for searching 1 million documents

Bottom Line:
Thunderstone's search appliance gives you a quickly deployed, affordable solution for adding search results to Internet or intranet sites. The system handles more than 45 queries a second, making it appropriate for larger sites. Efficient indexing returns relevant results from a variety of file types, although the administration UI could be improved.
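
The overall ratings in the two scorecards appear to be weighted averages of the criteria scores. The Python sketch below checks that reading using the scores and weights as printed; the weighted-average formula itself is an assumption, and the variable and function names are illustrative.

```python
# Weights in percent, as printed in both scorecards.
WEIGHTS = {
    "Ease-of-use": 20, "Integration": 20, "Management": 20,
    "Performance": 20, "Scalability": 10, "Value": 10,
}

# Criteria scores as printed in each scorecard.
GOOGLE = {
    "Ease-of-use": 9, "Integration": 7, "Management": 8,
    "Performance": 8, "Scalability": 9, "Value": 7,
}
THUNDERSTONE = {
    "Ease-of-use": 8, "Integration": 8, "Management": 7,
    "Performance": 9, "Scalability": 7, "Value": 9,
}


def weighted_score(scores):
    """Weighted average of the criteria scores (assumed scoring formula)."""
    return sum(scores[name] * WEIGHTS[name] for name in WEIGHTS) / 100


print(weighted_score(GOOGLE), weighted_score(THUNDERSTONE))
```

Under this assumption both products come out at 8.0, matching the printed ratings: Google's stronger ease-of-use and scalability scores are offset by Thunderstone's stronger performance and value scores.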




 


 
Mike Heck is a contributing editor for the InfoWorld Test Center.
 
