"My solution is to have some sort of code of professional conduct that data scientists would voluntarily agree to follow to protect people's private data," says Walker. By creating a kind of Hippocratic Oath for analytics professionals, Walker says data scientists will have both the moral and legal grounds for refusing to slice and dice numbers in ways that threaten to violate consumer privacy rights.
Walker isn't the first to conceive of a code of ethics for analysts. Earlier this year, the Institute for Operations Research and the Management Sciences (INFORMS) drafted a code of ethics to accompany the launch of its Certified Analytics Professional (CAP) certification program.
Yet Davis believes that despite lofty intentions, it's far too easy for a code of ethics to wind up "written on a piece of paper and put in a drawer." The challenge, he says, "is that you have to get real about understanding what you actually do with your data and whether or not that aligns with the shared values in your organization." Unfortunately, he says, determining what your values are as an organization, and whether or not your data practices reflect these priorities, "is a very different conversation than what we're used to having in a business setting."
And then there are IT professionals who maintain that it's simply not a data scientist's job to protect privacy. Instead, "their job is to extract interesting insights from the data," says Ryan Kalember, chief product officer at WatchDox, a Palo Alto, Calif.-based vendor of security tools.
Market will drive answers
Whether privacy is the purview of consumers, corporate executives or data scientists, one thing is certain: Data privacy is a hot-button issue. Even the U.S. government is investigating organizations that collect and manage big data and pressuring them to provide consumers with appropriate control over their personal data. But industry observers aren't holding their breath for sweeping governmental action. "It's not like the Founding Fathers are getting together in Philadelphia," says Davis.
The ongoing revelations about the NSA's PRISM data-collection program have, if anything, further eroded the public's confidence that the government will do anything to protect consumers' privacy. Indeed, Tawakol says that shifts in consumer awareness about data privacy (or the lack thereof) are more likely than federal investigations to drive reforms in data collection practices.
"The market will provide a mechanism quicker than legislation will," he says. "There is going to be more and more control of your data, and more clarity on what you're getting in return. Companies that insist on not being transparent are going to look outdated."
Walker shares that vision of the future. "There are lots of benefits to having data analyzed and having companies narrowly tailor specific products and services to customer preferences. But it's actually in a company's best interest to respect people's private data," he says, adding that companies are going to lose customers "if consumers find out that a company has been spying on them and using data in a way that's unethical."
If a data scientist questions the quality of data or evidence, he must disclose this to the client. If a data scientist has offered material evidence and later comes to know that it is false, he shall take reasonable remedial measures, including disclosure to the client. A data scientist may disclose and label evidence he reasonably believes is false.
-- Cindy Waxer
Waxer is a Toronto-based freelance journalist. She has written articles for various publications and news sites, including The Economist, MIT Technology Review and CNNMoney.com.
This story, "Big data blues: The dangers of data mining" was originally published by Computerworld.