InfoWorld
Refining enterprise search

Enterprise search is reaping relevant results thanks to new platforms and technologies

By Richard Gincel
October 15, 2004
 

Anyone who has been transfixed by a gymnast or a figure skater knows that the magic happens when they perform flawlessly and yet make it seem easy. That’s how a search should work: Enter a query, and the right results appear in simple, elegant fashion -- even if it took countless hours of preparation to make the magic possible.

Yet most enterprise users still stumble as they try to extract data from multiple repositories, each with its own search engine. Enterprises seem awash in a rising tide of structured and unstructured data. And even though users are often forced to tag documents manually across various content management systems in hopes that those documents will be easier to retrieve, searches still yield a surfeit of irrelevant, time-wasting results.

ESPs (enterprise search platforms) are on a mission to change all that. These new, comprehensive bundles of search and integration technologies unlock information tucked away in data stores across the enterprise. The goal of ESPs is deceptively simple: to take fairly simple queries and return the most relevant results possible, all in one place. But under the hood, ESPs aggregate a host of emerging technologies such as autocategorization, entity extraction, and NLP (natural language processing). With an ESP as a foundation, businesses can build customized search applications while automating the process of preparing documents for archiving and indexing.

“The building blocks are converging so that you don’t have to cobble together all the pieces yourself,” observes Susan Feldman, vice president of content technology research at IDC. These advanced search platforms establish sophisticated gateways to silos of information -- even those with their own search engines. ESPs also provide a common set of data and search logic that can be tuned on an application-by-application basis to improve the relevance of search results.

IBM last month came out swinging with its DB2 Information Integrator, code-named Masala, which contains an advanced search engine designed to complement the company’s other heavy hitters in the content management arena, DB2 Content Manager and WebFountain. With Masala, IBM joins the ranks of Autonomy, Convera, EasyAsk, Endeca, Fast Search & Transfer (FAST), iPhrase, and Verity, each of which offers search-application platforms with a different mix of features.

Breaking down the walls

ESPs are transforming the way the enterprise conducts a federated search, the process by which a single query is passed to multiple search engines and the user is presented with aggregated results. A federated search can augment searches of similar data stores but loses traction when it runs up against external databases that require specific syntax.
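The basic federated pattern described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the `make_engine` and `federated_search` names, the in-memory repositories, and the term-count scoring are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical engine interface: a function from a query string
# to a list of (doc_id, score) pairs.
Engine = Callable[[str], List[Tuple[str, float]]]


def make_engine(docs: Dict[str, str]) -> Engine:
    """Build a toy keyword engine over an in-memory repository."""
    def search(query: str) -> List[Tuple[str, float]]:
        terms = query.lower().split()
        hits = []
        for doc_id, text in docs.items():
            words = text.lower().split()
            # Score = how many times the query terms occur in the document.
            score = float(sum(words.count(t) for t in terms))
            if score > 0:
                hits.append((doc_id, score))
        return hits
    return search


def federated_search(query: str,
                     engines: Dict[str, Engine]) -> List[Tuple[str, str, float]]:
    """Pass one query to every engine and merge the hits into one list."""
    merged = []
    for source, engine in engines.items():
        for doc_id, score in engine(query):
            merged.append((source, doc_id, score))
    # Highest score first; source and doc_id break ties deterministically.
    merged.sort(key=lambda hit: (-hit[2], hit[0], hit[1]))
    return merged


engines = {
    "cms": make_engine({"c1": "quarterly sales report", "c2": "travel policy"}),
    "email": make_engine({"e1": "re: sales report draft and sales forecast"}),
}
results = federated_search("sales report", engines)
for source, doc_id, score in results:
    print(source, doc_id, score)
```

The sketch merges raw scores as if they were comparable across engines, which real engines do not guarantee. It also sends the same query string to every engine, so it would fail against a repository that expects its own syntax.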


Basic federated search, which has been in existence for years, “doesn’t protect the user from another kind of infoglut -- getting irrelevant results from multiple search engines instead of just one,” observes Hadley Reynolds, vice president and director of research at Delphi Group. “Without some additional sense-making, it’s a blunt instrument.”

Compounding matters, enterprises typically have multiple search engines embedded in various applications -- for instance, one in a content management system, one in the Microsoft Office environment, and another in an e-mail program. The ESP transcends these search-engine silos and their corresponding data repositories, applying syntax translation and other linguistic processing, such as spell-check and phrase detection, to the query before it is run against the data stores.
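A rough sketch of that query-side processing follows: spell-check, then phrase detection, then translation into each engine's syntax. The vocabulary, the phrase list, and the two dialect names (`boolean` and `plus`) are hypothetical stand-ins; a real platform would derive them from its indexes and connectors.

```python
import difflib
from typing import List

# Illustrative data only: a real system would build these from its index.
VOCABULARY = ["uranium", "enrichment", "nuclear", "facility", "report"]
KNOWN_PHRASES = [("nuclear", "facility"), ("uranium", "enrichment")]


def spell_check(terms: List[str]) -> List[str]:
    """Replace each term with its closest vocabulary word, if one is close enough."""
    corrected = []
    for term in terms:
        match = difflib.get_close_matches(term, VOCABULARY, n=1, cutoff=0.8)
        corrected.append(match[0] if match else term)
    return corrected


def detect_phrases(terms: List[str]) -> List[str]:
    """Join adjacent terms that form a known two-word phrase."""
    units = []
    i = 0
    while i < len(terms):
        if i + 1 < len(terms) and (terms[i], terms[i + 1]) in KNOWN_PHRASES:
            units.append(terms[i] + " " + terms[i + 1])
            i += 2
        else:
            units.append(terms[i])
            i += 1
    return units


def translate(units: List[str], dialect: str) -> str:
    """Render the query units in one target engine's (hypothetical) syntax."""
    quoted = ['"%s"' % u if " " in u else u for u in units]
    if dialect == "boolean":
        return " AND ".join(quoted)
    if dialect == "plus":
        return " ".join("+" + q for q in quoted)
    raise ValueError("unknown dialect: " + dialect)


def prepare(raw_query: str, dialect: str) -> str:
    """Run the full pipeline: normalize, spell-check, find phrases, translate."""
    terms = raw_query.lower().split()
    return translate(detect_phrases(spell_check(terms)), dialect)


print(prepare("nuclaer facility report", "boolean"))
print(prepare("Uranium enrichment", "plus"))
```

Spell-check runs before phrase detection so that a misspelled word can still be recognized as part of a phrase. Translation runs last because it is the only step that depends on the target engine.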


At the indexing layer, the ESP aids the user by returning lists of improved query choices based on the context of the original, sometimes vague, query. Take FAST’s ESP, which powers the public-facing Scirus.com. If you type the word “nuclear” in an effort to retrieve published science-journal entries related to that topic, the keyword returns more than 700,000 results. A refined keyword search selected from the list of suggestions on the right-hand side of the page -- “nuclear facility” -- whittles that to approximately 1,000. Click once more, on “uranium enrichment,” and you’re down to about 10.
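The narrowing effect in the Scirus example can be imitated with a toy index. The sketch below is only an illustration of the idea: the article does not describe how FAST generates its suggestions, so the co-occurrence counting used here, the `DOCS` data, and the `search` and `suggest` functions are all assumptions.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical index: each document is reduced to the set of phrases it contains.
DOCS: Dict[str, Set[str]] = {
    "d1": {"nuclear", "nuclear facility", "uranium enrichment"},
    "d2": {"nuclear", "nuclear facility"},
    "d3": {"nuclear", "nuclear medicine"},
    "d4": {"nuclear", "nuclear facility", "reactor safety"},
    "d5": {"solar", "photovoltaics"},
}


def search(required: List[str]) -> List[str]:
    """Return ids of documents that contain every required phrase."""
    return sorted(doc_id for doc_id, phrases in DOCS.items()
                  if all(r in phrases for r in required))


def suggest(required: List[str], limit: int = 3) -> List[str]:
    """Suggest refinements: phrases that co-occur in the current hits."""
    hits = search(required)
    counts: Counter = Counter()
    for doc_id in hits:
        for phrase in DOCS[doc_id]:
            if phrase not in required:
                counts[phrase] += 1
    # A phrase found in every hit would not narrow anything, so skip it.
    ranked = sorted(((p, c) for p, c in counts.items() if c < len(hits)),
                    key=lambda pc: (-pc[1], pc[0]))
    return [p for p, _ in ranked[:limit]]


query = ["nuclear"]
print(len(search(query)), suggest(query))

query.append("nuclear facility")      # the user clicks a suggestion
print(len(search(query)), suggest(query))

query.append("uranium enrichment")    # and clicks once more
print(search(query))
```

Each click adds a required phrase, so the result set can only shrink, which is the same behavior as the drop from 700,000 results to about 10 in the Scirus example.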





 


 
Richard Gincel is an associate editor at InfoWorld.
 
