InfoWorld
The perils of dirty data

How important is data cleansing and validation? Read these tales of horror, and beware


Few IT projects are more frightening than data integration and reconciliation. Actually, let us rephrase that. One thing is more frightening -- when data integration goes bad.

Sometimes it's a problem of starting out with bad data, through user error or even deliberate sabotage. Sometimes the data starts out good but gets lost, truncated, or altered when it moves from one system or database to another. Your data may go stale, or it may become collateral damage in a turf war inside your organization -- everyone clinging to their own little piece of the data store, nobody willing to share. The task certainly isn't helped by the overwhelming volume of data companies generate each day.

Data projects can go bad in many ways. Here are five of the most common: what went wrong, what happened as a result, and what you can do to avoid having the same thing happen to you. The names of the companies involved have been obscured to protect the guilty. Don't let your own project become someone else's horror story.

1. The “Dear Idiot” letter
Be careful where you get your data -- it may come back to haunt you. This tale of terror comes from the customer call center of a large financial services institution. As at nearly all help desks, service reps take calls and enter customer information into a shared database.

This particular database had an editable salutation field. Instead of being constrained to Mr., Ms., Dr., etc., the field could accept 20 or 30 characters of whatever the rep typed. As service reps listened to the complaints of angry customers, some of them began adding their own, not entirely kind, notes to each record, such as “what an idiot this customer is.”

This went on for years. No one noticed because no other system in the organization pulled data from that salutation field. Then, one day, the marketing department decided to launch a direct mail campaign to promote a new product. They came up with a brilliant idea. Instead of purchasing a list, why not use the service desk database?

So the letters went out: “Dear Idiot Customer John Smith.”

Strangely, no customers signed up for the new service. It wasn't until the organization began examining its outgoing mail that it figured out why. The moral of this story? 

“We don't own our data any more,” says Arvind Parthasarathi, vice president of product management and data quality for data integration specialist Informatica. “The world is so interconnected that it's likely someone will pick up your information and use it in a way you never anticipated. Because you're pulling data from everywhere, you need to make sure you have the right level of data quality management before you use it for anything new.”
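
What might that check look like in practice? Here is a minimal sketch in Python, not the institution's actual system. It assumes each customer record is a dictionary with hypothetical `salutation` and `name` keys, and that the allowed titles are a short list of our own choosing (the story only mentions Mr., Ms., and Dr.). The idea is to audit the free-text field before a new consumer, such as a mail merge, ever reads it.

```python
# Hypothetical allowlist: the story names only "Mr., Ms., Dr., etc.",
# so the exact set here is an assumption for illustration.
ALLOWED_SALUTATIONS = {"Mr.", "Ms.", "Mrs.", "Dr."}


def audit_salutations(records):
    """Split records into (clean, suspect) based on the salutation field.

    A record is 'clean' only if its salutation, after trimming whitespace,
    is one of the allowed titles. Everything else is set aside for review.
    """
    clean, suspect = [], []
    for record in records:
        salutation = (record.get("salutation") or "").strip()
        if salutation in ALLOWED_SALUTATIONS:
            clean.append(record)
        else:
            suspect.append(record)
    return clean, suspect


def greeting(record):
    """Build a letter greeting, dropping any salutation not on the allowlist."""
    salutation = (record.get("salutation") or "").strip()
    name = record["name"]
    if salutation in ALLOWED_SALUTATIONS:
        return f"Dear {salutation} {name}"
    # Fall back to the bare name rather than trusting free text.
    return f"Dear {name}"


if __name__ == "__main__":
    records = [
        {"salutation": "Mr.", "name": "John Smith"},
        {"salutation": "Idiot Customer", "name": "John Smith"},
    ]
    clean, suspect = audit_salutations(records)
    print(f"{len(clean)} clean, {len(suspect)} need review")
    for record in records:
        print(greeting(record))
```

Running the audit first tells you how dirty the field is before anything ships; the fallback in `greeting` is a second line of defense for records that slip through. Constraining the field at data entry would be better still, but that fix does nothing for years of notes already in the database.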

Dan Tynan is contributing editor at InfoWorld.


Copyright © 2008. All rights reserved.