Dupes are one of those problems that keep IT managers up at night. The larger your database, the worse the problem usually is, says Ramesh Menon, a director at Identity Systems, which provides identity searching and matching software for organizations such as AT&T, FedEx, and the Internal Revenue Service.
Unfortunately, nobody knows how big their problem is, he says. “If anybody tells you 'I have exactly 2.7 percent duplicates in my customer database,' they are wrong.”
There's no magic bullet, either. Menon says the solution lies in using data matching technology to isolate “the golden record,” a single view of information across multiple data repositories. Even then, the hardest part may be getting all the vested parties in an organization to agree on what data they're willing to share, as well as what constitutes a match.
“Two different sections of the same organization may have completely different definitions of what a match or duplicate contact is,” he says. “These kinds of integrations fall apart because people can't agree about who owns the data or what information can be exchanged with others.”
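To see how two departments can reach different answers from the same records, consider a minimal sketch. It assumes three hypothetical contact records and two hypothetical match rules: one treats records as duplicates when their normalized e-mail addresses agree, the other when normalized name and postal code agree. Neither rule comes from Identity Systems or any vendor; both are illustrations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    name: str
    email: str
    postcode: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't block a match."""
    return " ".join(text.lower().split())


def match_by_email(a: Contact, b: Contact) -> bool:
    # Hypothetical rule (e.g., a marketing team): same mailbox means same customer.
    return normalize(a.email) == normalize(b.email)


def match_by_name_and_postcode(a: Contact, b: Contact) -> bool:
    # Hypothetical rule (e.g., a billing team): same person at the same location.
    return (normalize(a.name) == normalize(b.name)
            and normalize(a.postcode) == normalize(b.postcode))


def duplicate_pairs(records, rule):
    """Return index pairs (i, j) that the given rule considers duplicates (O(n^2), fine for a sketch)."""
    return [
        (i, j)
        for i in range(len(records))
        for j in range(i + 1, len(records))
        if rule(records[i], records[j])
    ]


records = [
    Contact("Pat Lee", "pat@example.com", "30301"),
    Contact("Pat  Lee", "PAT@example.com", "30302"),  # moved, same mailbox
    Contact("Pat Lee", "plee@example.org", "30301"),  # new mailbox, same address
]

print(duplicate_pairs(records, match_by_email))              # → [(0, 1)]
print(duplicate_pairs(records, match_by_name_and_postcode))  # → [(0, 2)]
```

Each rule finds one duplicate, but not the same one. Merging on either rule alone would produce a different "golden record," which is why agreeing on the definition has to come before choosing the tool.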
4. When data decays
Remember text-based adventure games such as Zork? Apparently, somebody somewhere is still making these things. Worse, they're using data that's equally ancient.
MailChimp co-founder Ben Chestnut tells the story of an old-school games developer who used MailChimp’s e-marketing service to contact 10,000 previous customers, alerting them that he'd finally finished version two. Most of the addresses were at least 10 years old -- some of them Hotmail accounts discarded so long ago that Microsoft was using the addresses as spam traps. Within a day, all MailChimp e-mail was blacklisted by Hotmail's spam filter.
Fortunately for MailChimp, the developer had kept pristine records, down to the IP address each customer had used to download his games. That's what saved the company, says Chestnut. “We fired off a quick note to Hotmail's abuse desk -- proved they were legitimate customers, just old. The next day we got delisted. That's pretty rare.”
All data ages quickly, but contact data ages faster than most.
“You have to make the assumption that data decays like a radioactive sample,” says Informatica's Parthasarathi. “You have to go into every system and periodically update it.”
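Parthasarathi's radioactive-decay analogy can be made concrete with a minimal sketch. It assumes, hypothetically, that a fixed fraction of contact records goes stale each year regardless of age; under that assumption the share of still-accurate records shrinks geometrically, and the "half-life" is the time until half of them are stale. The 20 percent annual rate below is a placeholder, not a figure from Informatica or anyone quoted here.

```python
import math


def fraction_current(annual_decay_rate: float, years: float) -> float:
    """Fraction of records still accurate after `years`, assuming a constant annual decay rate."""
    return (1.0 - annual_decay_rate) ** years


def half_life_years(annual_decay_rate: float) -> float:
    """Years until half the records are stale, under the same constant-rate assumption."""
    return math.log(0.5) / math.log(1.0 - annual_decay_rate)


rate = 0.20  # hypothetical: 20 percent of contacts go stale each year
for years in (2, 5, 10):
    print(f"after {years} years: {fraction_current(rate, years):.0%} still current")
print(f"half-life: {half_life_years(rate):.1f} years")
```

With the placeholder rate this prints 64, 33, and 11 percent still current, and a half-life of about 3.1 years. Under this toy model, a list left untouched for a decade would be mostly stale, which is the case for periodic re-verification over a one-time cleanup.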
Jigsaw.com, an online contacts database geared toward sales professionals, takes a wiki-style approach to data cleansing. Its 335,000 members earn points for uploading their own contacts to Jigsaw and for correcting entries other members have added. Every record must be complete, and if Jigsaw users enter information that's incorrect or out of date, they lose points. Members spend their points to buy contact information for people they want to reach.
Jigsaw CEO Jim Fowler says an Atlanta-based technology company recently asked his firm to compare its contacts databases to Jigsaw's and weed out the bad data.
“They had 40,000 records,” he says. “Only 65 percent of them were current and 100 percent were incomplete. We're finding that most of our corporate customers have sets of data so cruddy no one can match to them. Corporations spend millions on CRM, and it's amazing how bad that data is.”
The real value is not the data itself, but the ability to keep up with how quickly it changes.
“The power of Jigsaw is complete data and self-cleansing,” says Fowler. “If our self-correcting mechanisms don't work, we're just another crappy data company.”