exalead and Siderean guide users down differing paths to data troves
Competing search tools effectively group data and guide users
As a rule, search engines should return reliable results. So when unanticipated responses appear, it’s possible that the initial query was too broad or ambiguous. To help users sharpen their searches, vendors have turned to grouping results into manageable collections. Clustering and faceted categorization are two popular methods.
By placing related results together, clustering helps users discover unforeseen patterns in documents. Importantly, clustering doesn’t require organizations to preprocess documents or add special metatags.
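To make the idea concrete, here is a minimal sketch of clustering without any preassigned metatags. It uses a greedy single-pass grouping by Jaccard similarity of word sets, a technique chosen only for illustration; it is not exalead's algorithm, and the sample documents, function names, and the 0.25 threshold are all assumptions.

```python
def tokenize(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets: shared tokens / all tokens."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(docs, threshold=0.25):
    """Greedy single-pass clustering (illustrative only).

    Each document joins the first cluster whose seed document is similar
    enough; otherwise it starts a new cluster. No metatags are consulted.
    """
    clusters = []  # list of (seed_tokens, [member documents])
    for doc in docs:
        tokens = tokenize(doc)
        for seed_tokens, members in clusters:
            if jaccard(tokens, seed_tokens) >= threshold:
                members.append(doc)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append((tokens, [doc]))
    return [members for _, members in clusters]


# Hypothetical sample documents.
docs = [
    "email security policy review",
    "quarterly sales forecast",
    "email security audit checklist",
    "sales forecast for next quarter",
]
groups = cluster(docs)
```

With these four strings, the two e-mail security documents land in one group and the two sales forecasts in another. Lowering the threshold merges more aggressively, which is also how unrelated documents can end up sharing a cluster.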
Besides offering these advantages, exalead’s affordable search solution is available as a standalone desktop client -- and as workgroup or enterprise servers. A well-done interface presents users with clustered results.
Without prep work, however, clustered results aren’t always as relevant as you’d hope. For example, documents might appear in clusters where they don’t belong, or they might overlap categories and confuse users. This doesn’t happen as often with an alternate technique, formally called HFC (hierarchical faceted categories), or facets. Unfortunately, this added accuracy requires preassigning categories to each document.
Siderean’s Seamark Navigator automates this step while precalculating relationships among documents. As a result, Seamark Navigator is especially valuable for regulatory compliance and business intelligence apps.
exalead one:desktop, one:workgroup, and one:enterprise 4.0
exalead’s three products, designed specifically for enterprise search, share a common engine technology called exalead:search. exalead one:desktop Professional Edition, an end-user piece, indexes your hard disk as well as Outlook and Lotus Notes documents. exalead one:workgroup server lets desktop users extend their searches to network file servers. exalead one:enterprise searches diverse databases; moreover, one:enterprise is based on Java and XML, so admins can customize the interface and integrate it with other apps.
exalead one:workgroup, a service running under Windows Server, has the same straightforward indexing setup. In this case, I selected shared folders on various file servers for crawling. one:desktop users then add a link to the one:workgroup server when they want to search these networked resources.
This ease extends to exalead’s federated search process. From one:desktop, I merely checked off the indexes (PC, workgroup server, or Web) and a navigation pane immediately appeared that summarized the structure and concepts contained in the combined search results.
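Federating several indexes amounts to merging their ranked result lists into one. The sketch below shows one simple way to do that, assuming each index returns hits already sorted by descending score and that scores are comparable across indexes. Both are assumptions, as are the tuple layout and the sample data; the review does not describe how exalead merges results internally.

```python
import heapq
from itertools import islice


def federate(result_lists, limit=10):
    """Merge per-index result lists into a single ranking.

    Each list holds (score, source, title) tuples sorted by descending
    score. heapq.merge walks the lists lazily, so only `limit` hits are
    ever materialized.
    """
    merged = heapq.merge(*result_lists, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, limit))


# Hypothetical hits from the three index types mentioned in the review.
pc = [(0.9, "pc", "draft.doc"), (0.4, "pc", "notes.txt")]
workgroup = [(0.7, "workgroup", "policy.pdf")]
web = [(0.8, "web", "infoworld.com article"), (0.2, "web", "forum thread")]

results = federate([pc, workgroup, web], limit=4)
sources = [hit[1] for hit in results]
```

Unchecking an index in the client corresponds to leaving its list out of `result_lists`.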
In exalead’s model, you can conduct a general search with a few words and then focus the results. For example, exalead initially found files broadly related to my search topic in 20 categories; clicking on an undesired category title immediately removed those results. Another strength is how quickly exalead lets you take an initial search in almost any direction. After I located an e-mail from one person about my search term, I found all e-mails from that author with a single click.
Search speed was excellent in my test of about 25,000 documents -- with results typically displayed in 0.05 seconds. Just as significant, searches produce easily understandable information. exalead generates thumbnails of documents and Web pages, shows a summary of the information (including where it resides in the source directory structure), and provides a preview window.
exalead’s categorization also truly enhances the whole search process. For instance, I searched for a certain data leak review I wrote last year. Not only did exalead:one find that article on infoworld.com and my local drafts in Microsoft Word format, but exalead returned related Web links to security executives and articles about managing e-mail security.
Although exalead:one websearch has indexed more than 4 billion public pages to date, Google and others needn’t fret about exalead encroaching on their leadership in consumer search. In fact, exalead offers a feature to federate Google public searches. But exalead:one represents a key trend of organizing -- and integrating -- public sources for specific research.
Indexing up to 200,000 documents, exalead:one enterprise’s straightforward GUI let me set up connectors for crawling SQL and Notes databases, file shares, and intranet Web sites. During crawls, exalead converts data to XML, analyzes it, and indexes it. Additionally, I had great control over the process, such as specifying which categories appeared in search results. Crossing over into the facet realm, admins can import, reuse, and edit classifications from existing taxonomy projects -- or create new cataloging systems by extracting metadata from documents.
Organizations can install one:enterprise as their sole search solution or couple it with one:desktop. In the latter case, I merely added the enterprise server’s index and thereby federated local and enterprise results.
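The convert, analyze, and index sequence can be pictured with a small sketch. It turns crawled records into XML elements and then builds an inverted index from the element text. The record fields, element names, and whitespace tokenizer are assumptions made for illustration; exalead's actual XML schema and analysis steps are not described in this review.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def to_xml(record):
    """Convert one crawled record (a plain dict) into an XML element."""
    doc = ET.Element("document", id=str(record["id"]))
    for field, value in record.items():
        if field == "id":
            continue  # the id is stored as an attribute, not a child
        ET.SubElement(doc, field).text = str(value)
    return doc


def index_documents(xml_docs):
    """Build an inverted index: term -> set of document ids containing it."""
    index = defaultdict(set)
    for doc in xml_docs:
        doc_id = doc.get("id")
        for child in doc:
            for term in (child.text or "").lower().split():
                index[term].add(doc_id)
    return index


# Hypothetical records, as a SQL or Notes connector might deliver them.
records = [
    {"id": 1, "source": "sql", "title": "Data leak review", "body": "email security"},
    {"id": 2, "source": "notes", "title": "Sales forecast", "body": "quarterly review"},
]
xml_docs = [to_xml(r) for r in records]
index = index_documents(xml_docs)
```

Looking up `index["review"]` returns both document ids, while `index["email"]` returns only the first. Normalizing every source to one XML shape is what lets a single indexing routine serve databases, file shares, and Web pages alike.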
exalead:one products all provide simple, but not simplistic, search, with quick and easy setup. A unified interface combines results from multiple sources, and the applications were strong performers in deriving structure from documents and automatically generating categories.
Siderean Seamark Navigator 4.0
Siderean’s enterprise search solution includes three main modules. Seamark Navigator 4.0 finds and indexes content found within RSS feeds and in enterprise databases by recognizing existing metadata -- which is then encoded according to the RDF (Resource Description Framework) open standard. After aggregating sources you specify, Navigator organizes the information into “facets” presented in a browser interface.
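RDF describes each resource as a set of subject, predicate, object statements. As a rough illustration of what encoding feed metadata in RDF might look like, the sketch below maps one feed item to triples using Dublin Core element URIs and prints them in N-Triples syntax. The item fields, the example.com link, and the choice of Dublin Core are assumptions; the review does not say which vocabularies Seamark uses.

```python
# Dublin Core element namespace, used here only as a familiar vocabulary.
DC = "http://purl.org/dc/elements/1.1/"


def item_to_triples(item):
    """Turn one feed item (a dict) into (subject, predicate, object) triples."""
    subject = item["link"]
    return [
        (subject, DC + field, item[field])
        for field in ("title", "creator", "subject")
        if field in item
    ]


def to_ntriples(triples):
    """Serialize triples as N-Triples lines with plain string literals.

    Escaping is minimal (backslashes and double quotes); literals are
    assumed to contain no line breaks.
    """
    lines = []
    for s, p, o in triples:
        literal = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{literal}" .')
    return lines


# Hypothetical RSS item.
item = {
    "link": "http://example.com/feed/1",
    "title": "Data leak review",
    "creator": "Ana",
}
triples = item_to_triples(item)
lines = to_ntriples(triples)
```

Each distinct predicate (title, creator, and so on) is a natural candidate for a facet, which is one reason a triple store suits faceted navigation.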
Seamark MAPP (Metadata Assembly Processing Platform) is an entity-extraction system that harvests metadata from unstructured sources, including Microsoft SharePoint and file systems. This application, which uses IBM’s open source UIMA (Unstructured Information Management Architecture) framework, also integrates with commercial products such as Lexalytics and Lockheed Martin’s AeroText.
Compared to the other solutions, Seamark required additional time for me to engineer a working search app. However, the Seamark Administration UI clearly maps out the necessary steps, so I didn’t expend much effort learning the system. I started by specifying feeds -- which can be XML documents, database queries, and direct input from supported enterprise search engines.
Once Navigator transforms the feeds into RDF and stores the descriptions in a relational database, Seamark automatically builds a default XRBR (XML for Retrieval by Reformulation) query containing all facets required for navigation. An information architect could then massage the XRBR for special needs.
At this point, Seamark creates a JSP search page that can be placed on a Web app server. Alternatively, a SOAP API allows other applications to send queries to Seamark and receive responses in a SQL-like format; this capability enables Seamark to be customized for specialized e-commerce, business intelligence, or similar needs.
For my testing, I used Seamark Navigator’s stock Web search interface and found it worked as intended. The initial contextual view showed all content related to my query, organized into expected facets. Seamark’s relevance ranking algorithms performed well, displaying facets with my search terms first, and then placing likely documents at the top of the list within each facet.
With a few clicks I then pivoted searches to look at different paths and zoomed into results on particular facets.
Navigator dynamically updated the number and type of items in each facet as I changed views. Further, the system provided summaries of each item and highlighted keywords. Spelling correction, along with advanced search functions (fuzzy, proximity, Boolean, and grouping), helped me focus on the information I wanted.
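The way facet counts change as you drill down can be sketched in a few lines. The example below recomputes per-facet value counts over whatever documents remain after a selection. The document fields and function names are hypothetical, and this is a sketch of the general faceted-navigation idea rather than Seamark's implementation.

```python
from collections import Counter


def facet_counts(docs, facets):
    """Count how many documents carry each value of each facet."""
    return {f: Counter(d[f] for d in docs if f in d) for f in facets}


def drill_down(docs, facet, value):
    """Keep only the documents whose facet has the selected value."""
    return [d for d in docs if d.get(facet) == value]


# Hypothetical result set with preassigned facet values.
docs = [
    {"title": "Leak review", "author": "Ana", "type": "article"},
    {"title": "Leak memo", "author": "Ben", "type": "email"},
    {"title": "Audit plan", "author": "Ana", "type": "email"},
]

before = facet_counts(docs, ["author", "type"])
emails = drill_down(docs, "type", "email")
after = facet_counts(emails, ["author", "type"])
```

Before the selection, the author facet shows Ana with two items and Ben with one; after zooming into e-mails, each author shows one. Pivoting is the same operation applied to a different facet.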
Navigator’s RDF processing took about 15 percent longer than exalead’s indexing. Seamark’s Java search functions and presentation, however, didn’t introduce any measurable lag.
Seamark Navigator let me establish roles and then limited results based on a user’s authorization. However, some enterprises will want more security options, such as integration with Active Directory or LDAP servers.
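Role-based trimming of results can be pictured as a filter applied before anything is shown. In this minimal sketch, each result carries a set of roles allowed to see it; the `allowed_roles` field and the role names are assumptions for illustration, not Seamark's data model.

```python
def authorized_results(results, user_roles):
    """Return only the results that at least one of the user's roles may see."""
    roles = set(user_roles)
    return [r for r in results if roles & set(r["allowed_roles"])]


# Hypothetical results with per-document role lists.
results = [
    {"title": "Public FAQ", "allowed_roles": ["staff", "auditor"]},
    {"title": "Audit findings", "allowed_roles": ["auditor"]},
    {"title": "Payroll export", "allowed_roles": ["hr"]},
]

visible_to_staff = authorized_results(results, ["staff"])
visible_to_auditor = authorized_results(results, ["auditor"])
```

Directory integration would change where `user_roles` comes from, not how the filter works.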
One of the most interesting Seamark Navigator capabilities let me add my own tags to results. Not only did this allow me to reference certain pages in future searches, but it also helps in building communities within your organization. For instance, experts within a department could tag certain documents, and those documents would then be elevated in results for anyone in a certain role. What’s more, new tags can be published as an RSS feed.
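One way to picture tag-driven elevation is as a score bonus applied when the searcher's role matches a role the document was tagged for. The sketch below assumes a simple additive boost; the data layout, the `boost` value, and the role names are hypothetical, and the review does not describe how Seamark actually weights tags.

```python
def rank_with_tags(results, tags, role, boost=1.0):
    """Re-rank results, lifting documents tagged for the searcher's role.

    `tags` maps a document id to the set of roles it was tagged for.
    """
    def score(result):
        bonus = boost if role in tags.get(result["id"], set()) else 0.0
        return result["score"] + bonus

    return sorted(results, key=score, reverse=True)


# Hypothetical relevance scores and one expert tag.
results = [
    {"id": "a", "score": 0.9},
    {"id": "b", "score": 0.6},
    {"id": "c", "score": 0.5},
]
tags = {"c": {"compliance"}}

for_compliance = [r["id"] for r in rank_with_tags(results, tags, "compliance")]
for_sales = [r["id"] for r in rank_with_tags(results, tags, "sales")]
```

A compliance user sees document `c` first, while a sales user gets the untouched relevance order.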
Siderean Seamark Navigator has many strong points -- from its open architecture and unique use of RDF to aggregate information from many independent sources to an intuitive search interface that dynamically pivots facets. As a result, users can sort through content without knowing beforehand precisely what they’re looking for. However, it’s not an out-of-box solution and requires work if your data is not already in XML format.
For large enterprises with the bulk of information in databases, Siderean Seamark Navigator 4.0 should be a very good fit, because structured information is processed -- and then presented for searching -- with relative ease. The system also has comprehensive features to extract metadata from unstructured sources, which does increase the solution’s complexity. I consider exalead:one’s components some of the up-and-coming leaders in enterprise search -- for allowing users to effortlessly explore clustered results, for offering many deployment options, and because of their affordable pricing.
InfoWorld Scorecard | Integration (20.0%) | Scalability (10.0%) | Value (10.0%) | Performance (20.0%) | Ease of use (20.0%) | Management (20.0%) | Overall Score (100%) |
---|---|---|---|---|---|---|---|
exalead one:desktop, one:workgroup, and one:enterprise 4.0 | 9.0 | 9.0 | 9.0 | 8.0 | 9.0 | 8.0 | |
Siderean Seamark Navigator 4.0 | 9.0 | 8.0 | 7.0 | 9.0 | 8.0 | 7.0 | |
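
The scorecard's overall column is empty in this copy. If the overall score is a weighted average of the six criteria, and if each row lists its scores in the same order as the header, it can be worked out as in the sketch below. Both conditions are assumptions, so the resulting figures are derived here and are not taken from the published scorecard.

```python
# Criterion weights in percent, in header order (they sum to 100).
WEIGHTS = {
    "Integration": 20,
    "Scalability": 10,
    "Value": 10,
    "Performance": 20,
    "Ease of use": 20,
    "Management": 20,
}


def overall(scores):
    """Weighted average of criterion scores; weights are percentages."""
    return sum(WEIGHTS[name] * score for name, score in scores.items()) / 100


# Scores assumed to follow the header order of the scorecard.
exalead = dict(zip(WEIGHTS, [9.0, 9.0, 9.0, 8.0, 9.0, 8.0]))
siderean = dict(zip(WEIGHTS, [9.0, 8.0, 7.0, 9.0, 8.0, 7.0]))

exalead_overall = overall(exalead)    # 8.6 under these assumptions
siderean_overall = overall(siderean)  # 8.1 under these assumptions
```

Because Integration, Performance, Ease of use, and Management each carry twice the weight of Scalability or Value, a one-point gap in any of those four moves the overall figure by 0.2.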
Copyright © 2006 IDG Communications, Inc.