
InfoWorld

The perils of dirty data

How important are data cleansing and validation? Read these tales of horror, and beware.


The result? Individual companies ended up in the database 700 or 800 times, making the system even slower and less accurate.

Unfortunately, the application was so deeply embedded in the company's other systems that management was reluctant to spend the money to rip and replace. Finally, the carrier's IT department made the business case that the company's aging data app would ultimately prevent it from being able to add new customers, costing it $750,000 a day in new premiums.

At that point, the company used SSA-Name3 by Identity Systems to clean the data, ultimately weeding out 36,000 duplicate records.

Dupes are one of those problems that keep IT managers up at night. The larger your database, the worse the problem usually is, says Ramesh Menon, a director at Identity Systems, which provides identity searching and matching software for organizations such as AT&T, FedEx, and the Internal Revenue Service.

Unfortunately, nobody knows how big their problem is, he says. “If anybody tells you 'I have exactly 2.7 percent duplicates in my customer database,' they are wrong.”

There's no magic bullet, either. Menon says the solution lies in using data matching technology to isolate “the golden record,” a singular view of information across multiple data repositories. Even then, the hardest part may be getting all the vested parties in an organization to agree on what data they're willing to share, as well as what constitutes a match.

“Two different sections of the same organization may have completely different definitions of what a match or duplicate contact is,” he says. “These kinds of integrations fall apart because people can't agree about who owns the data or what information can be exchanged with others.”
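To make the disagreement over "what constitutes a match" concrete, here is a minimal sketch of name matching with an adjustable similarity threshold. It is not how SSA-Name3 or any Identity Systems product works; it uses Python's standard `difflib` module, and the function names, the suffix list, and the default threshold of 0.85 are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical list of company suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "co", "llc", "ltd", "company",
            "corporation", "incorporated"}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and strip common company suffixes."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    words = [w for w in cleaned.split() if w not in SUFFIXES]
    return " ".join(words)


def is_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two names as duplicates if their similarity meets the threshold.

    The threshold is the negotiable part: each department may want its own.
    """
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


def group_duplicates(names, threshold: float = 0.85):
    """Greedily group names, comparing each one to the first name of a group."""
    groups = []
    for name in names:
        for group in groups:
            if is_match(name, group[0], threshold):
                group.append(name)
                break
        else:
            # No existing group matched, so this name starts a new one.
            groups.append([name])
    return groups


records = ["Acme Corp.", "ACME Corporation", "Acme, Inc.", "Globex LLC", "Globex"]
for group in group_duplicates(records):
    print(group)
```

With a loose threshold, "Jon Smith" and "John Smith" count as the same contact; with a strict one, they do not. That single number is exactly the kind of definition two departments would have to agree on before their records could be merged.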

4. When data decays
Remember text-based adventure games such as Zork? Apparently, somebody somewhere is still making these things. Worse, they're using data that's equally ancient.

MailChimp co-founder Ben Chestnut tells the story of an old-school game developer who used MailChimp's e-marketing service to contact 10,000 previous customers, alerting them that he'd finally finished version two. Most of the addresses were at least 10 years old -- some of them Hotmail accounts discarded so long ago that Microsoft was using the addresses as spam traps. Within a day, all MailChimp e-mail was blacklisted by Hotmail's spam filter.

Fortunately for MailChimp, the developer had kept pristine records, down to the IP address each customer had used to download his games. That's what saved them, says Chestnut. “We fired off a quick note to Hotmail's abuse desk -- proved they were legitimate customers, just old. The next day we got delisted. That's pretty rare.”

All data ages quickly, but contact data ages faster than most.

“You have to make the assumption that data decays like a radioactive sample,” says Informatica's Parthasarathi. “You have to go into every system and periodically update it.”
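One way to act on the radioactive-decay analogy is to record when each contact was last verified and flag the ones that have aged past a cutoff. The sketch below is only an illustration: the `last_verified` field, the one-year cutoff, and the two-year half-life are assumptions, not figures from Informatica or MailChimp.

```python
from datetime import date, timedelta


def expected_valid_fraction(age_days: float, half_life_days: float) -> float:
    """Exponential-decay estimate of how much of a list is still accurate.

    The half-life is a made-up parameter; measure your own bounce rates
    before trusting any particular value.
    """
    return 0.5 ** (age_days / half_life_days)


def stale_contacts(contacts, today: date, max_age_days: int = 365):
    """Return the contacts whose address was last verified before the cutoff."""
    cutoff = today - timedelta(days=max_age_days)
    return [c for c in contacts if c["last_verified"] < cutoff]


# Hypothetical records; the addresses and dates are invented for the example.
contacts = [
    {"email": "old@example.com", "last_verified": date(1998, 3, 1)},
    {"email": "recent@example.com", "last_verified": date(2008, 6, 1)},
]

for contact in stale_contacts(contacts, today=date(2008, 9, 5)):
    print("Re-verify before mailing:", contact["email"])
```

Run against a 10-year-old mailing list, a check like this would have flagged nearly every address before a single message went out.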

Dan Tynan is contributing editor at InfoWorld.


Copyright © 2008. All rights reserved.
