Dirty IT job No. 3: Data cleansing drone
Wanted: Detail-oriented individual to pore over endless amounts of repetitive data looking for errors. Requires high tolerance for mindless drudgery; clinical diagnosis of obsessive-compulsive disorder a plus.
Data is a harsh mistress. The same name spelled two different ways or slight variations in addresses can wreak havoc with your inventory, screw up your billing, break the supply chain, make customer service a living hell, and cause the suits to make bad decisions. That's why thousands of organizations hire drones to comb through company files looking for inaccuracies, inconsistencies, discrepancies, duplicates, and other data glitches.
"We call it the 'Monk factor,'" says Stefanos Damianakis, CEO of Netrics, a maker of data matching software. "Like the detective in the 'Monk' TV show, every organization has obsessive-compulsive guys who pore over the data and try to make it perfect."
Forget perfect data. Getting the data to where it's usable is hard enough, he says. "The job is dirty because the data is relentless. You're just sitting there looking at the same things over and over. It's mind-numbing, and the tools available to do the job are typically antiquated."
Even if the data is consistent across all fields, organizations still need people to figure out what it really means, says Leonard Dubois, senior vice president of marketing and sales support for Harte-Hanks Trillium Software, maker of data quality solutions.
"In large organizations there are hundreds of people poring over Excel spreadsheets and Word documents trying to determine what the business meaning of a specific term -- like 'customer' -- might be," says Dubois. "And every silo in the organization might have a different definition. If I order a book from Amazon for my wife, who's the customer? To the billing department, it's me. To marketing, it's my wife. To shipping, it's the address where the book got sent."
The data drone has to go in and figure out which definition is the correct one for each group -- an expensive and time-consuming process. Data quality software like Netrics' or Trillium's can automate many of these tasks, detect errors, and reduce guesswork. More often than not, though, you still end up with outliers that have to be handled by humans.
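To show how that split between software and humans can play out, here's a minimal sketch in Python using only the standard library's `difflib.SequenceMatcher`. It is not how Netrics or Trillium actually work. The helper names (`normalize`, `similarity`, `triage`), the token-sorting step, and the 0.9 / 0.75 thresholds are all hypothetical choices made for illustration. Pairs that score high are treated as probable duplicates, while pairs in the gray zone get dumped in a review queue for a human drone.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and sort the words so that
    'Smith, John' and 'John Smith' compare as the same string.
    (Token sorting is an illustrative assumption, not a standard rule.)"""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(sorted(cleaned.split()))


def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0 on the normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def triage(records, auto_threshold=0.9, review_threshold=0.75):
    """Compare every pair of records. Thresholds are hypothetical.
    Returns (likely_duplicates, needs_human_review) as lists of
    (record_a, record_b, score) tuples."""
    duplicates, needs_review = [], []
    for a, b in combinations(records, 2):
        score = similarity(a, b)
        if score >= auto_threshold:
            duplicates.append((a, b, score))
        elif score >= review_threshold:
            needs_review.append((a, b, score))
    return duplicates, needs_review


if __name__ == "__main__":
    records = ["Jon Smith", "John Smith", "Smith, John", "Jonathan Smith", "Jane Doe"]
    dups, review = triage(records)
    for a, b, score in dups:
        print(f"likely duplicate: {a!r} ~ {b!r} ({score:.2f})")
    for a, b, score in review:
        print(f"needs a human:    {a!r} ~ {b!r} ({score:.2f})")
```

With this sample list, the three spellings of John Smith are flagged as likely duplicates. "Jonathan Smith" lands in the review queue, because software alone can't tell whether he's the same person. "Jane Doe" ends up in neither list. Tuning those thresholds is itself a judgment call, which is one reason the human drones haven't been automated out of a job.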
"They call it data cleansing for a reason," Dubois adds. "It's a tedious process to go through data files and figure out the meanings of each term."