Desktop search gets down to business
Solutions from dtSearch, Google, ISYS, and X1 let users explore knowledge assets throughout the enterprise
When enterprises roll out search applications, it's usually a big IT effort to keep indexes refreshed and the overall systems running. Because of this complexity and the reality that most enterprise knowledge resides on workers' PCs, consumer desktop search technology has infiltrated organizations -- and has caught IT executives off guard.
There's no questioning the benefit of quickly finding that e-mail or spreadsheet squirreled away months ago. Yet there are still red flags concerning security of consumer desktop tools, such as revealing private personal or corporate information or introducing spyware to the enterprise network. More significantly, these tools lack the centralized administration so essential for enterprise deployments.
What, then, distinguishes tools that are free or for personal use from those you'd consider purchasing for your organization? To answer this question, I looked at enterprise products from dtSearch, ISYS Search Software, and X1 Technologies, along with Google's Desktop Search, which has recently been outfitted with corporate features.
I checked the breadth of file types, total number of documents, and systems that each enterprise product indexes, as well as how each accomplishes this. Accuracy is of utmost importance, of course, along with usability. The end-user experience is not, however, just about forming queries and displaying readable results; the operational side, which includes the building and sharing of indexes, is equally significant.
Search performance goes beyond how fast a product indexes and returns results. Thus, in testing these products, I also considered what lies beneath, such as the index size and system resources consumed. Given that IT staff resources come at a premium, I examined how customizable each product is -- and whether rollouts and updates could be performed with existing software management tools.
Last but not least, security is paramount even when these search tools are used behind a corporate firewall. Desktop search applications should respect Windows authentication and related permissions, such as log-ins to file servers, Web sites, applications, and local workstations.
dtSearch 7.01
The old man of the bunch, dtSearch introduced its desktop text retrieval software in 1991, and Version 7.01 further improves the product's usability and performance. Besides full-text scanning of Outlook e-mail, indexed documents can be in HTML, PDF, XML, Word, Excel, PowerPoint, WordPerfect, RTF, and ZIP formats. The system can also search unindexed documents, or indexed and unindexed documents together. The network version adds scans of remote file servers.
This application offers a wide range of search options (12 in all), including fuzzy, phonic, natural language, Boolean logic, and proximity. Search results appear in a customizable browser. Navigation commands permit quick scans through documents, although dtSearch lacks results clustering.
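The review doesn't document dtSearch's query syntax or internals, but the idea behind proximity search is easy to illustrate. The minimal Python sketch below builds a positional inverted index over a few in-memory documents and returns the documents in which two terms appear within a given number of words of each other. The function names and sample documents are hypothetical, and tokenizing on whitespace is a deliberate simplification.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each lowercase term to {doc_id: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index

def within(index, term_a, term_b, max_distance):
    """Return sorted doc ids where term_a and term_b occur within max_distance words."""
    postings_a = index.get(term_a.lower(), {})
    postings_b = index.get(term_b.lower(), {})
    hits = []
    # Boolean AND first: only documents containing both terms can qualify.
    for doc_id in postings_a.keys() & postings_b.keys():
        if any(abs(pa - pb) <= max_distance
               for pa in postings_a[doc_id]
               for pb in postings_b[doc_id]):
            hits.append(doc_id)
    return sorted(hits)

docs = {
    "memo1": "quarterly budget review for the sales team",
    "memo2": "the budget was approved after a long and detailed sales review",
    "memo3": "sales figures only",
}
index = build_positional_index(docs)
print(within(index, "budget", "sales", 4))  # ['memo1']
```

Widening the window to 8 words would also match memo2; a fuzzy or phonic option would expand each term into several index terms before this same position check runs.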
Among the products reviewed here, dtSearch offers the most options for managing indexes, including merging and creating libraries. You can index Web sites to any level you want, and the spider works on both static HTML pages and dynamic sites, such as those driven by content management systems. One improvement I'd like to see is password encryption -- passwords entered to crawl protected sites could potentially be read by anyone with access to your PC.
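How dtSearch's spider is built isn't described here; the sketch below only illustrates what indexing a site "to any level" means, as a breadth-first crawl capped at a chosen link depth. The in-memory link graph and the `crawl` function are hypothetical stand-ins; a real spider would fetch pages over HTTP, parse their links, and handle log-ins.

```python
from collections import deque

def crawl(start_url, get_links, max_depth):
    """Breadth-first crawl that stops following links beyond max_depth.

    get_links is any callable returning a page's outgoing links; here it
    stands in for fetching and parsing HTML.
    """
    seen = {start_url: 0}
    queue = deque([start_url])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        depth = seen[url]
        if depth == max_depth:
            continue  # index this page but do not follow its links
        for link in get_links(url):
            if link not in seen:
                seen[link] = depth + 1
                queue.append(link)
    return order

# Hypothetical intranet link graph used instead of live HTTP requests.
site = {
    "/": ["/products", "/about"],
    "/products": ["/products/a", "/products/b"],
    "/about": ["/"],
    "/products/a": ["/products/a/specs"],
    "/products/b": [],
    "/products/a/specs": [],
}
print(crawl("/", lambda u: site.get(u, []), max_depth=1))
```

Raising `max_depth` pulls in deeper pages; the `seen` map keeps cyclic links such as `/about` pointing back to `/` from being crawled twice.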
A separate application, which adds two more interfaces, handles searching and displaying results. The search interface puts most options on the main tab, so it can be a bit daunting for first-time users. Once you learn your way around, however, the design saves time. For example, you can select the indexes to search, the search features, and relevance settings all at once.
When I searched local e-mail and Office documents, dtSearch always returned results in less than a second, living up to the company's claims. The software will store original documents or text equivalents, too, bringing the time to search remote servers or Web sites to less than a second as well -- although I found that this setting increases index size by about 20 percent. For the most part, indexes were reasonably sized -- about one-fifth of total document size -- because they're compressed in ZIP format.
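As a rough sizing aid, the ratios observed above (an index about one-fifth of total document size, and about 20 percent larger when original text is stored) work out as follows for a hypothetical 10GB document set $D$. Actual ratios will vary with the mix of file types.

```latex
\begin{aligned}
S_{\text{index}} &\approx \tfrac{1}{5}\,D = \tfrac{1}{5}\times 10\ \text{GB} = 2\ \text{GB},\\
S_{\text{index+text}} &\approx 1.2 \times S_{\text{index}} = 1.2 \times 2\ \text{GB} = 2.4\ \text{GB}.
\end{aligned}
```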
I was especially impressed with the transparent features that boost accuracy. For example, dtSearch automatically recognizes fields in XML files and meta information embedded in PDF documents, resulting in more relevant returns.
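Field recognition of this kind means each XML element's text is indexed under its element name, so a query can target, say, only the author field. The sketch below uses Python's standard `xml.etree.ElementTree` to flatten a document into (field, word) pairs. The sample record and the pair representation are assumptions for illustration, not dtSearch's internal format.

```python
import xml.etree.ElementTree as ET

def extract_fields(xml_text):
    """Return (field, word) pairs from every leaf element that has text."""
    root = ET.fromstring(xml_text.strip())
    terms = []
    for elem in root.iter():
        # Leaf elements carry the field values; container elements are skipped.
        if len(elem) == 0 and elem.text and elem.text.strip():
            for word in elem.text.lower().split():
                terms.append((elem.tag, word))
    return terms

record = """
<report>
  <title>Annual Budget</title>
  <author>Jane Smith</author>
  <body>Budget figures for the year</body>
</report>
"""
terms = extract_fields(record)
# A field-restricted query: does "smith" appear in the author field?
print(("author", "smith") in terms)  # True
print(("title", "smith") in terms)   # False
```

A plain full-text index would treat "smith" in the author field and "smith" in the body identically; keeping the field name alongside each word is what lets a query favor the more relevant occurrence.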
Besides being precise, the results lists are easy to use: the top pane shows each document's name along with a relevance score, and the lower pane previews the document in its original form with the hits highlighted. I had no problem jumping back and forth among documents or opening them in their associated applications.
To share indexes, users merely make a shortcut to a shared data folder on a networked server -- running dtSearch from my PC automatically searched indexes listed on the server.
IT staff can automatically deploy dtSearch's main executable file using Group Policy Objects in Microsoft's Active Directory or by employing Microsoft SMS (Systems Management Server). It's also easy to create and deploy the separate policy file that specifies options such as the location of shared index libraries. An optional client/server version of dtSearch, Network with Spider, is essentially the same software run from a central location. Users access the shared index using a menu in the client software.
I found dtSearch to be a versatile application that allows you to hunt quickly through large local and remote data stores using practically any search formula, from keywords to fuzzy logic. In some spots the design looks a bit dated, and usability could be improved by integrating indexing and search functions. Still, performance is great and makes dtSearch useful in almost any endeavor.
Google Desktop Search for Enterprise
Almost identical to the consumer version, Google Desktop Search for Enterprise indexes content on your local hard drive from various e-mail apps, business file types, Web pages viewed with the top four browsers, and AIM chats. Employees search and see results using the familiar Google interface.
This version has a legitimate claim to the "enterprise" tag because of its centralized administration and security. IT staff can restrict indexing of secure sites, and users' local indexes can be encrypted to protect them from unauthorized access. Plus, the Google software works on workstations with multiple log-ins: each user searches only the files he or she can access, while content associated with other accounts remains secure.
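Google doesn't detail how the multi-user separation is implemented, and in practice it rests on Windows accounts and file permissions. Conceptually, though, it amounts to checking every hit against the searching user's access rights before display. In the sketch below, the per-file reader sets are a hypothetical simplification of Windows ACLs, and the paths are made up.

```python
from dataclasses import dataclass, field

@dataclass
class IndexedFile:
    path: str
    text: str
    readers: set = field(default_factory=set)  # hypothetical stand-in for an ACL

def search(files, user, term):
    """Return paths containing term, restricted to files the user may read."""
    term = term.lower()
    return [f.path for f in files
            if user in f.readers and term in f.text.lower()]

files = [
    IndexedFile("alice/plan.doc", "Merger plan draft", {"alice"}),
    IndexedFile("bob/plan.doc", "Vacation plan", {"bob"}),
    IndexedFile("shared/plan.txt", "Team plan", {"alice", "bob"}),
]
print(search(files, "bob", "plan"))  # ['bob/plan.doc', 'shared/plan.txt']
```

Filtering at query time, as here, is the simplest approach; an indexer could instead keep a separate index per account, which also keeps one user's index contents off another user's disk space.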
After a quick setup, Desktop Search automatically performs an initial index of all PC files, which takes about an hour, and then continually refreshes the catalog.
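The review doesn't describe how Desktop Search keeps its catalog current. A common approach, assumed here, is to compare each file's modification time with the one recorded on the previous pass and re-index only what changed. The `refresh` function and the path-to-mtime catalog are hypothetical; a real indexer would also parse and tokenize each changed file.

```python
import os
import tempfile
import time

def refresh(catalog, root):
    """Re-index files under root that are new or modified since the last pass.

    catalog maps path -> mtime recorded when the file was last indexed.
    Returns the paths (re)indexed on this pass.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            if catalog.get(path) != mtime:
                catalog[path] = mtime  # a real indexer would parse the file here
                changed.append(path)
    # Drop catalog entries for files deleted since the last pass.
    for path in [p for p in catalog if not os.path.exists(p)]:
        del catalog[path]
    return changed

with tempfile.TemporaryDirectory() as root:
    catalog = {}
    a = os.path.join(root, "a.txt")
    b = os.path.join(root, "b.txt")
    for p in (a, b):
        with open(p, "w") as fh:
            fh.write("draft")
    first_pass = refresh(catalog, root)   # initial full index: both files
    second_pass = refresh(catalog, root)  # nothing changed: no work
    later = time.time() + 10
    os.utime(a, (later, later))           # simulate an edit to a.txt
    third_pass = refresh(catalog, root)
    os.remove(b)                          # simulate a deletion
    refresh(catalog, root)
    remaining = sorted(catalog)

print(len(first_pass), len(second_pass), len(third_pass))
```

The expensive first pass is why initial indexing takes far longer than the refreshes that follow.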
If you'd rather not wait for the auto-indexing, you may select which items to crawl, such as e-mail, Word documents, or PDF files. Google doesn't recognize as many file formats as do the other products reviewed here -- it covers a little more than 25 -- but it hits all the major types, including Outlook and Notes e-mail and Office documents. I didn't find any documents that I couldn't index.
In addition to built-in formats, plug-ins are available for download, many of which are freeware contributed by outside developers. I recommend looking into these if you have special search needs; the plug-ins are conveniently linked from Google Desktop's Help site and at desktop.google.com/plugins.
Google's numerous search operators include phrase, site, file type, and advanced e-mail; these apply whether you search using a toolbar search box or Web page. By combining appropriate operators, I quickly limited an e-mail search to messages on a specific topic from a certain person. Although this package doesn't match ISYS:desktop's advanced clustering, Google nevertheless does a solid job grouping all e-mail search results on the same topic.
File, chat, and Web search results are also well organized. Each entry on the file results page shows an icon indicating the file's type, a snippet of the content, a link to open the file, and a link to cached versions of it. Web results show a small thumbnail of the page's layout.
What impressed me most were the functions that should endear Google to your IT operations staff. Desktop Search for Enterprise includes a Group Policy Administrative Template, which permits policy settings targeted at each user. Policies are very granular (such as disabling indexing of certain document types or Web sites), permit securing of indexes with EFS (Encrypted File System), and are especially well documented. Distribution is performed via Microsoft Active Directory server or SMS, and staff can test updated versions of the software before distributing them.
To scan intranets and other corporate data, you'll need a Google Search Appliance or Google Mini in addition to Google Desktop Search. That adds cost to this solution, but not necessarily complexity. Although I wasn't able to test the latest version of the Google Search Appliance software, it has been improved during the past year to address earlier limitations on searching enterprise content. For example, you can now search IBM DB2, SQL Server, MySQL, Oracle, and Sybase databases.
"Free" is a bit misleading when factoring in an appliance to search business repositories, but even considering that addition, the complete solution can come in below the cost of other, similarly configured solutions -- throwing in premium support, however, may alter that.
ISYS:desktop 7
A significant upgrade that streamlines the ISYS:desktop search solution, Version 7 gives users more relevant results. It indexes as many as 64 million documents per index and can chain as many as 128 indexes, for a total of roughly 8 billion documents.
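The total follows directly from the per-index limit, taking a million as $10^6$:

```latex
64\times10^{6}\ \tfrac{\text{documents}}{\text{index}} \times 128\ \text{indexes}
  = 8.192\times10^{9} \approx 8\ \text{billion documents}
```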
A new taskbar object lets users search at any time, with options to select indexes or launch the full query interface. Either way, users drill down through results faster because ISYS:desktop 7 now categorizes results on the fly. Also contributing to a fine experience is ISYS:desktop's ability to search more than 140 structured, unstructured, and semistructured file formats.
Like dtSearch, ISYS:desktop 7 employs a separate utility for creating and managing indexes. Using this utility, I was able to define what each index should catalog, choosing among documents, e-mail, Web sites, and specific folders and file types. Each index has a setup wizard, and every wizard includes the option to set a daily update schedule and an agent that alerts you when new documents appear that meet your search criteria. Unlike the other products tested here, however, ISYS:desktop offers no built-in or optional server component for crawling and storing searches centrally, although indexes can be shared.
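ISYS's alert agent isn't documented in detail here; the sketch below shows the general idea of a standing query. After each scheduled update, newly indexed documents are checked against saved criteria and any matches are reported. The `SavedSearch` class, the all-terms matching rule, and the sample documents are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SavedSearch:
    name: str
    terms: tuple  # every term must appear (Boolean AND)

    def matches(self, text):
        words = set(text.lower().split())
        return all(t.lower() in words for t in self.terms)

def run_alerts(saved_searches, new_docs):
    """Return {search name: [doc ids]} for new documents meeting each saved search."""
    alerts = {}
    for search in saved_searches:
        hits = [doc_id for doc_id, text in new_docs.items() if search.matches(text)]
        if hits:
            alerts[search.name] = hits
    return alerts

saved = [SavedSearch("contracts", ("contract", "renewal")),
         SavedSearch("outages", ("outage",))]
# Documents added since the last scheduled update (hypothetical).
new_docs = {
    "doc17": "Contract renewal terms for Acme",
    "doc18": "Lunch menu",
}
print(run_alerts(saved, new_docs))  # {'contracts': ['doc17']}
```

Checking only the documents added since the last update keeps the agent cheap even when the full index is large.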
Indexing speed was reasonably good, with typical throughput of 10GB per hour; a full index takes 24MB of disk space. An advanced feature, Rich SQL, displays database records in a readable format, and I used the built-in HTML Editing Suite to customize database display templates. ISYS also indexes documents and binary objects stored in SQL and Lotus Notes databases.
In an unusual move for a desktop search tool, ISYS:desktop automatically creates categories when indexing, based on metadata, folder names, database table names, and related attributes. Typically you only see this with specialized products, such as Vivísimo Velocity, and it makes finding relevant information much easier. There are five ways to search indexes. Menu-assisted is by far the easiest method to build exact queries because it guides novices through using conditional operators. I also constructed natural-language queries easily and refined results with Word Wheel ("sounds like" or "starts with"). Web-style search uses a syntax common to public Internet search engines, and the command-line option will appeal to expert users.
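ISYS doesn't say which phonetic algorithm drives its Word Wheel "sounds like" option, so the sketch below substitutes classic American Soundex, a well-known phonetic code, alongside a plain prefix match for "starts with." The `word_wheel` function and the term list are hypothetical; the point is how a word wheel can expand one typed term into several candidate index terms.

```python
# American Soundex digit groups; vowels, H, W, and Y get no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    """American Soundex: first letter plus three digits, e.g. Robert -> R163."""
    word = "".join(ch for ch in word.upper() if ch.isalpha())
    if not word:
        return ""
    first = word[0]
    digits = []
    prev = CODES.get(first, "")
    for ch in word[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        # H and W do not separate letters with the same code; vowels do.
        if ch not in "HW":
            prev = code
    return (first + "".join(digits) + "000")[:4]

def word_wheel(term, index_terms, mode):
    """Expand a typed term into matching index terms."""
    if mode == "sounds like":
        key = soundex(term)
        return sorted(t for t in index_terms if soundex(t) == key)
    if mode == "starts with":
        prefix = term.lower()
        return sorted(t for t in index_terms if t.lower().startswith(prefix))
    raise ValueError(f"unknown mode: {mode}")

index_terms = {"smith", "smyth", "smithson", "schmidt", "jones"}
print(word_wheel("Smith", index_terms, "sounds like"))  # ['schmidt', 'smith', 'smyth']
print(word_wheel("smi", index_terms, "starts with"))    # ['smith', 'smithson']
```

The expanded terms would then be ORed together and run as an ordinary query, which is why phonetic matching tends to raise recall at some cost in precision.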
After entering a query, ISYS:desktop responds in less than a second with a well-designed results list, including the most relevant documents grouped into categories, those with secondary interest, and a large review pane with hits highlighted. I would like to see a high-fidelity preview that shows PDF or Word documents in their original form; right now, ISYS:desktop displays only a plain-text preview of a document's content. That said, you can immediately launch the original document from the ISYS:desktop 7 toolbar.
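ISYS doesn't publish its categorization rules. As a simplified illustration using one input the review mentions, folder names, the sketch below groups a ranked hit list by each document's parent folder while preserving rank order within each group. The paths and scores are hypothetical, and forward-slash paths are assumed (`PureWindowsPath` would be the choice for drive-letter paths).

```python
from collections import defaultdict
from pathlib import PurePosixPath

def group_by_folder(hits):
    """Group (path, score) hits by parent folder name, best-scoring groups first."""
    groups = defaultdict(list)
    for path, score in sorted(hits, key=lambda h: h[1], reverse=True):
        groups[PurePosixPath(path).parent.name].append((path, score))
    # Dicts keep insertion order, so groups appear in order of their best hit.
    return dict(groups)

hits = [
    ("/projects/spec.doc", 0.72),
    ("/finance/budget.xls", 0.95),
    ("/projects/notes.txt", 0.40),
    ("/finance/forecast.xls", 0.81),
]
for category, docs in group_by_folder(hits).items():
    print(category, [p for p, _ in docs])
```

Grouping this way lets a user skip an entire irrelevant category at once rather than scrolling past its documents one by one.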
ISYS:desktop 7 worked well without much tuning. I searched within results, applied various filters, and hid results that didn't interest me, all of which helped refine results. ISYS:desktop 7 also searches most Asian and European languages -- a boon for international companies.
Those working with a lot of structured data will like ISYS:desktop 7's spiffed-up metadata handling. There are now more metadata search operators, and you can see a document's known metadata by hovering over the results. ISYS:desktop 7 is extensible with scripting tools; for example, you could inject metadata into documents from an external database programmatically to improve results. dtSearch 7 is the only other product in this roundup that offers that type of control.
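The review doesn't show ISYS's scripting interface, so the sketch below uses only Python's standard `sqlite3` module to illustrate the idea: look up each document in an external table (here, a hypothetical document-management database with made-up `client` and `matter` columns) and attach those fields as searchable metadata before indexing.

```python
import sqlite3

def enrich(docs, conn):
    """Attach client and matter fields from an external table to each document's metadata."""
    enriched = {}
    for doc_id, meta in docs.items():
        row = conn.execute(
            "SELECT client, matter FROM doc_info WHERE doc_id = ?", (doc_id,)
        ).fetchone()
        extra = {"client": row[0], "matter": row[1]} if row else {}
        enriched[doc_id] = {**meta, **extra}  # documents with no record keep their metadata
    return enriched

# Hypothetical external database, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc_info (doc_id TEXT PRIMARY KEY, client TEXT, matter TEXT)")
conn.execute("INSERT INTO doc_info VALUES ('D-100', 'Acme Corp', 'Lease dispute')")
docs = {"D-100": {"title": "Memo"}, "D-200": {"title": "Letter"}}
result = enrich(docs, conn)
print(result["D-100"])  # {'title': 'Memo', 'client': 'Acme Corp', 'matter': 'Lease dispute'}
```

Once the client name travels with the document as metadata, a search restricted to "Acme Corp" can find a memo that never mentions the client in its text.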