InfoWorld
Improve the quality of enterprise data

Ensuring data quality is always harder than it seems, but new tools are making the toughest task in IT a bit easier


When I was a young programmer at an investment bank, my desk was next to the department of “data integrity,” a small group with the thankless job of making sure that the databases held accurate records of stock transactions. The bank’s computers could process millions of transactions in seconds, but a mistyped key or a missing value could jam the entire assembly line for data.


When things were running smoothly, we would amuse ourselves with philosophical discussions about just what it meant for data to have integrity. At the time, the bank didn’t want insight or truth in its databases; it just wanted the books to balance and the system to hum along. It was almost as if data integrity were an afterthought.

That view has changed. Data integrity (or data quality, as the current parlance goes) has become a hot topic in many IT departments. The CEO who used to be impressed by the Web site with forms for customers to fill out is now wondering why the data is such a mess. The marketing group wants real leads backed by real data, not a bit dump filled with inconsistencies and inaccuracies.

A number of software vendors are tackling the problem by offering tools and packages that treat data as more than a pile of bits: They are building sophisticated, logical frameworks for information and tossing around philosophical words such as "ontology" to describe their models for numbers and strings in the database fields. After all, the problems of data quality exist because bits can never be perfect reflections of the underlying information.

Scrubbing data clean
These systems often have a sophisticated gloss but are typically practical tools designed to help an IT shop remove the most glaring and expensive problems. So while the problems may be framed in elevated terms, the solutions generally take the form of plain old if-then-else statements. The systems scrub, or cleanse, the data by applying rules that reduce each value to one canonical form, so that variant spellings of the same person or place no longer show up as separate records. They might replace all instances of “Bob” with “Robert,” for example, or recognize that all old telephone numbers from Palo Alto, Calif., must now come with a 650 area code.
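
The kind of if-then-else scrubbing described above can be sketched in a few lines of Python. This is only an illustration, not any vendor's implementation: the nickname table, the record fields (`first_name`, `phone`, `city`, `state`), the function names, and the single area-code rule (which assumes Palo Alto's old code was 415) are all assumptions made for the example.

```python
import re

# Hypothetical nickname table; a real system would load a much larger, curated list.
NICKNAMES = {"bob": "Robert", "bill": "William", "liz": "Elizabeth"}

# Hypothetical rule table: (city, state) -> (old area code, current area code).
# The 415 -> 650 entry is an assumption used for illustration.
AREA_CODE_RULES = {("palo alto", "ca"): ("415", "650")}


def canonical_first_name(name):
    """Replace a known nickname with its formal name; otherwise just trim."""
    cleaned = name.strip()
    return NICKNAMES.get(cleaned.lower(), cleaned)


def canonical_phone(phone, city, state):
    """Apply the area-code rule for the city, then format as (NNN) NNN-NNNN."""
    digits = re.sub(r"\D", "", phone)  # keep digits only
    rule = AREA_CODE_RULES.get((city.strip().lower(), state.strip().lower()))
    if rule:
        old_code, new_code = rule
        if len(digits) == 7:
            # No area code on file: add the current one.
            digits = new_code + digits
        elif len(digits) == 10 and digits[:3] == old_code:
            # Old area code on file: swap in the current one.
            digits = new_code + digits[3:]
    if len(digits) == 10:
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return phone  # leave anything unrecognized untouched


def scrub(record):
    """Return a cleaned copy of a record without modifying the original."""
    out = dict(record)
    out["first_name"] = canonical_first_name(record["first_name"])
    out["phone"] = canonical_phone(record["phone"], record["city"], record["state"])
    return out
```

Once two records have been scrubbed this way, a plain equality check on the cleaned fields is enough to flag them as likely duplicates, which is the point of reducing every value to one canonical form first.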

One of the oldest and most common applications for data quality software is address "cleansing," the process whereby a company takes a mailing list and ensures that all of the addresses are current, valid, and as complete as possible. Pitney Bowes Group 1 Software helped the U.S. Postal Service develop the technology for parsing and correcting addresses, and Pitney Bowes is now selling it for more general applications. The technology aggregates rules for understanding addresses into a modular application that can recognize errors, correct them, and add the most complete ZIP code. It can tell apart the two identical abbreviations in "St. Paul’s St." (the first means "Saint," the second "Street") and understand that "Saint Pauls Street" is the same road.
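
A toy version of the "St." disambiguation might look like the sketch below. It is not the Pitney Bowes technology, whose rules are not described here; it assumes one simple positional heuristic (a trailing "St" is a street suffix, an earlier "St" is "Saint"), and the suffix table and the `normalize_street` name are hypothetical.

```python
import re

# Hypothetical, deliberately tiny suffix table for illustration.
SUFFIXES = {"st": "STREET", "street": "STREET", "ave": "AVENUE", "avenue": "AVENUE"}


def normalize_street(raw):
    """Reduce a street name to an uppercase canonical form for comparison."""
    # Drop periods and both straight and curly apostrophes, then tokenize.
    tokens = re.sub(r"[.'’]", "", raw).lower().split()
    out = []
    for i, tok in enumerate(tokens):
        is_last = i == len(tokens) - 1
        if is_last and tok in SUFFIXES:
            # A recognized abbreviation in final position is a street suffix.
            out.append(SUFFIXES[tok])
        elif tok == "st":
            # "St" anywhere before the final token is read as "Saint".
            out.append("SAINT")
        else:
            out.append(tok.upper())
    return " ".join(out)
```

Under this heuristic, `normalize_street("St. Paul’s St.")` and `normalize_street("Saint Pauls Street")` both produce the same string, so the two spellings compare as the same road. A production parser would need far more rules than position alone, since street names do not always end with a suffix.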

Peter Wayner is contributing editor of the InfoWorld Test Center.
Copyright © 2008. All rights reserved.