InfoWorld

OpenPipeline seeks to ease document prep for search

Dieselpoint's open-source project helps companies standardize the processing of information before it gets pushed into a search-engine indexer


Enterprise search vendor Dieselpoint is behind a new open-source project centering on a document "pipeline" -- or as the Chicago company's CEO, Chris Cleveland, puts it, "all the boring stuff you need to make enterprise search work."

Enterprise search implementations often cover an array of document sources and components; pipelines allow companies to standardize the processing of information before it gets pushed into a search-engine indexer.

"We're connecting the crawler companies to the text analytic companies to the search engine companies," Cleveland said.

Dieselpoint had been having trouble integrating its own pipeline with third-party document analyzers and content connectors, so it has open-sourced that pipeline as the basis for the project, which is dubbed OpenPipeline.

Its Web site is scheduled to open to the public on Monday, and a fully functional version of the software will be downloadable under the Apache 2.0 license. It is available under a commercial license as well, according to the site.

The software features a point-and-click user interface and provides several connectors, including Web and SQL crawlers. It also supports commercial connectors for products such as SharePoint and Exchange, as well as a number of portals.

Dieselpoint is pursuing the project both to make bigger, more complex implementations easier and in hopes that it will draw some customers to its search engine.

"The single biggest barrier to adoption of enterprise search is doing integration," Cleveland said. "Of course, it means enormous consulting engagements, so it's a source of revenue for the industry, but it's a deterrent."

While major search vendors have pipelines, they are "all proprietary and all closed," he said.

Several other vendors and consultants have signed on to the effort's advisory board, including Alias-i, Applied Relevance and Raritan Technologies. Cleveland anticipates that more companies will join soon.

Conceptually, an open-source pipeline makes sense for the industry as a whole "because each component is worthless on its own," he suggested.

Guy Creese, an analyst with Burton Group, compared OpenPipeline to an existing project.

"IBM attempted to fix this issue with UIMA (Unstructured Information Management Architecture), its framework for letting multiple vendors work together on a text analytics pipeline. However, UIMA has not done especially well in the market," he said. "It's unclear whether that's due to the complexity of UIMA or the fact that the market isn't quite there yet (I believe it's the latter)."

"In short, OpenPipeline is an interesting, open-source alternative to UIMA. However, its appeal will still remain small in the market, as many enterprises aren't at the point where they need to mix and match text analytics modules," he added.

But Cleveland countered that even basic aspects of an enterprise search implementation can involve a lot of "drudgery," which OpenPipeline can help alleviate: "It's the simple stuff. 'Can I get [data] out of the system, add security to it and send it to the search engine?'"
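
The three steps Cleveland describes (get data out, add security, send it to the search engine) can be pictured with a minimal Python sketch. This is an illustration of the general pipeline idea only, not OpenPipeline's actual API: the `Document` class, the stage functions, the `allowed_groups` field and the in-memory "indexer" are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Document:
    """A unit of content moving through the pipeline (hypothetical shape)."""
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)


# A stage takes a document and returns it, possibly modified.
Stage = Callable[[Document], Document]


def normalize_whitespace(doc: Document) -> Document:
    # Collapse runs of whitespace so every source yields comparable text.
    doc.text = " ".join(doc.text.split())
    return doc


def add_security(doc: Document) -> Document:
    # Assumed rule for this sketch: documents without an explicit
    # access list are treated as visible to everyone.
    doc.metadata.setdefault("allowed_groups", ["everyone"])
    return doc


def run_pipeline(source: Iterable[Document],
                 stages: List[Stage],
                 index: Callable[[Document], None]) -> int:
    """Pass each document through every stage in order, then index it."""
    count = 0
    for doc in source:
        for stage in stages:
            doc = stage(doc)
        index(doc)
        count += 1
    return count


# Stand-ins for a crawler's output and a search-engine indexer.
crawled = [
    Document("1", "  Quarterly   report \n draft "),
    Document("2", "HR policy", {"allowed_groups": ["hr"]}),
]
indexed: List[Document] = []
processed = run_pipeline(crawled, [normalize_whitespace, add_security],
                         indexed.append)
```

Because every stage shares one signature, a crawler, a text-analytics module or an indexer from a different vendor could in principle be swapped in without touching the rest of the chain, which is the kind of standardization the project is aiming for.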


Copyright © 2008. All Rights reserved.