CONSIDER THE amount of information your company has generated over its lifetime, and the many formats in which that information resides. Now consider having to sift through the backlog of data to encode every page in an effort to build a unified, searchable database for customers and employees.

That task is similar to one faced by the National Library of Medicine (NLM), which, as you can imagine, had a fair amount of information to handle.

"We're not talking about 200-word abstracts. Our largest book is 1,000 printed pages," says Maureen Prettyman, computer specialist and project leader at the NLM, in Bethesda, Md. "We're also talking about consumer pamphlets that were anywhere from two to 25 pages and clinical reference guides that are around 600 pages. That's a morass of paper to get through."

The NLM, a federally funded library under the auspices of the National Institutes of Health, gathers a variety of government-sponsored medical information into several databases and publishes that information to the Web, creating what Prettyman describes as the "last resort for information" for medical professionals as well as average citizens.

In 1990, when a Congressional panel first required the information to be made available electronically in full-text form, the Agency for Health Care Policy and Research -- one of the library's largest data providers -- handed NLM "a lump of money" to set about the task of encoding its information.

It quickly became apparent that the task was too daunting for NLM's staff to handle alone.

"If you are building a project like this from scratch, there's a real learning curve to it, so your manpower costs are going to be substantial," Prettyman says. "At first, I was doing most of the encoding myself, writing programs to automate as much as I could and getting our other programmers to help. But we were getting the data in so many formats that we realized we couldn't worry about all the details."

Instead of hiring an army of programmers, Prettyman chose to outsource the task to Data Conversion Laboratory (DCL), in Fresh Meadows, N.Y. That didn't mean the project went away, but it was, Prettyman says, a fairly painless way of catching up on the backlog, and DCL has since become an integral partner in keeping the databases updated.

"It has been our hoped-for policy to make a document available on the Web the same day it is announced to the public as a printed version," Prettyman says. "That means as little as a six- to eight-day turnaround for documents as large as 800 pages."

Because DCL is familiar with the library's systems, that hasn't been a problem.

Outsourcing the library's data conversion has freed Prettyman and her staff for other projects, notably designing a new Web interface, making the transition to an object-oriented database that will improve performance and provide a better way to update text, and building tools so that data providers can submit SGML-encoded data directly to the database.

The results speak for themselves, even if money is not the measure.

"In our case it has nothing to do with making money but with making the best and most current information available. It's a service," Prettyman says. "The first three years saw a significant jump in the number of hits, and there has been a steady incline since then -- up over 2 million a month now."