Big data blues: The dangers of data mining

Big data might be big business, but overzealous data mining can seriously damage your brand. Will new ethical codes be enough to allay consumers' fears?

More than simply bits and bytes, big data is now a multibillion-dollar business opportunity. Savvy organizations, from retailers to manufacturers, are fast discovering the power of turning consumers' ZIP codes and buying histories into bottom-line-enhancing insights.

In fact, the McKinsey Global Institute, the research arm of McKinsey & Co., estimates that a retailer making full use of big data could increase its operating margin by more than 60 percent. And a recent Boston Consulting Group study reveals that personal data can help companies achieve greater business efficiencies and customize new products.


But while harnessing the power of data analytics is clearly a competitive advantage, overzealous data mining can easily backfire. As companies become experts at slicing and dicing data to reveal details as personal as mortgage defaults and heart attack risks, the threat of egregious privacy violations grows.

Just ask Kord Davis. A digital strategist and author of Ethics of Big Data: Balancing Risk and Innovation, Davis says, "The values that you infuse into your data-handling practices can have some very real-world consequences."

Take Nordstrom, for example. The upscale retailer used sensors from analytics vendor Euclid to cull shopping information from customers' smartphones each time they connected to a store's Wi-Fi service -- a move that drew widespread criticism from privacy advocates. (Nordstrom is no longer using the analytics service.)

Hip clothing retailer Urban Outfitters is facing a class-action lawsuit for allegedly violating consumer protection laws by telling shoppers who pay by credit card that they had to provide their ZIP codes -- which is not true -- and then using that information to obtain the shoppers' addresses. Facebook is often at the center of a data privacy controversy, whether it's defending its own enigmatic privacy policies or responding to reports that it gave private user data to the National Security Agency (NSA). And the story of how retail behemoth Target was able to deduce that a teenage shopper was pregnant before her father even knew is the stuff of marketing legend.

Online finger-wagging, lawsuits, disgruntled customers -- they're the unfortunate byproducts of what many people perceive to be big data abuses. According to a September 2013 study from data privacy management company Truste, 1 in 3 Internet users say they have stopped using a company's website or have stopped doing business with a company altogether because of privacy concerns.

Honesty really is the best policy

But IT professionals are discovering that balancing the power of sophisticated algorithms with consumer rights is about more than avoiding bad publicity or lost sales. These days, it pays to be honest -- literally. "Organizations that are transparent about their use of data will be able to use that as a competitive advantage," predicts Davis. "People are starting to become very interested in what's going on out there with their data, so organizations that have practices in place to share that information ethically are going to be in a much better position to be trusted."

Yet many CIOs and data scientists are struggling with the question of how to derive real value and actionable insights from confidential data while still respecting consumers' rights and even earning their trust. As the store of data grows, and techniques for manipulating data multiply, some IT professionals are taking matters into their own hands with innovative approaches to preventing data abuse.

Retention Science is a perfect example. The Santa Monica, Calif.-based data analytics firm uses predictive algorithms and data such as aggregated household income, purchasing histories and credit scores to help companies predict a customer's purchase probability and build retention-marketing campaigns. In addition to the data supplied by a client, Retention Science also relies on the data it licenses from third-party providers to target the right consumers at the right time.

To create targeted campaigns while still respecting consumer privacy, Retention Science has established hard-and-fast rules governing its use of consumer data. For one, Retention Science refuses to share data across clients. For example, if Gap Inc. were a client, and had supplied Retention Science with consumer data, that information would never be shared -- even anonymously -- with other retail clients.

In another effort to preserve consumer privacy despite handling terabytes of confidential data, Retention Science insists that all of its data scientists, many of whom are professors and researchers, sign confidentiality agreements. "They are not allowed to share or use data anywhere else or for their own publications," says Retention Science CEO Jerry Jao.

In addition to holding its own employees accountable, Retention Science also "works only with businesses that are fully committed to getting their consumers' consent in advance to use their data," says Jao. "We don't want to include information from individuals if they didn't grant access in the first place."

Full cookie disclosure

While setting internal controls can help, you can go a step further by offering consumers a firsthand look at what's known about them. One company with that kind of open-book policy is BlueKai, a Cupertino, Calif.-based vendor whose data management platform lets marketers and publishers manage and activate data to build targeted marketing campaigns. In 2008, BlueKai launched an online portal where consumers can find out exactly what cookies BlueKai and its partners have been collecting about them, item by item, based on their browsing histories.

Consider, for example, a woman who is shopping online for a red bicycle. As she visits different sporting goods sites that partner with BlueKai, a collection of anonymous cookies is stored on her browser. Based on this browsing history, BlueKai marketing partners will display behavioral ads on the woman's computer that are relevant to her bike-shopping quest.

These days, most online shoppers realize that it's not a coincidence when they see ads that are clearly tied to their browsing histories. But the BlueKai Registry makes the process more transparent, and even allows visitors to opt out of the registry altogether or update their anonymous profiles by changing their preferences.
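
For readers who want to picture the mechanics, here is a minimal sketch of how a consumer-facing registry of this kind might behave. It is an illustration only and not BlueKai's actual system or API: the `AnonymousProfile` and `Registry` names, the interest categories and the rule that opting out clears stored interests are all assumptions made for the example.

```python
from dataclasses import dataclass, field
import uuid


@dataclass
class AnonymousProfile:
    """Interest categories tied to a random cookie ID, never to a name."""
    cookie_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    interests: set = field(default_factory=set)
    opted_out: bool = False


class Registry:
    """Hypothetical consumer-facing registry (an assumption, not a real API)."""

    def __init__(self):
        self._profiles = {}

    def record_visit(self, profile, category):
        # A partner site adds an interest category unless the user opted out.
        if not profile.opted_out:
            profile.interests.add(category)
        self._profiles[profile.cookie_id] = profile

    def view(self, cookie_id):
        # Show the consumer what is stored about the cookie, item by item.
        profile = self._profiles.get(cookie_id)
        return sorted(profile.interests) if profile else []

    def update(self, cookie_id, remove):
        # Let the consumer drop categories they do not want used.
        profile = self._profiles.get(cookie_id)
        if profile:
            profile.interests -= set(remove)

    def opt_out(self, cookie_id):
        # Assumed policy: opting out clears interests and stops collection.
        profile = self._profiles.get(cookie_id)
        if profile:
            profile.interests.clear()
            profile.opted_out = True


registry = Registry()
shopper = AnonymousProfile()
registry.record_visit(shopper, "cycling")
registry.record_visit(shopper, "sporting goods")
print(registry.view(shopper.cookie_id))

registry.opt_out(shopper.cookie_id)
registry.record_visit(shopper, "cycling")  # ignored after opt-out
print(registry.view(shopper.cookie_id))
```

The profile is keyed to a random identifier rather than a name, and the same interface that records interests also lets the consumer inspect, trim or switch off the profile.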

BlueKai CEO Omar Tawakol says the thinking behind the registry is that, "If there's data known about you that's tradable between any two entities, it should be completely controlled by the consumer." For this reason, BlueKai also encourages its partners to post private versions of the registry on their own websites to allay consumers' concerns and promote greater transparency.

"The beauty of what we do is we don't know who you are," says Tawakol. "We don't want to know anybody's name. We don't want to know anything recognizable about them. All we want to do is show you that your cookies are accessible, and that they have these attributes associated with them."

BlueKai isn't the only big data rock star that's handing out backstage passes. Marketing technology company Acxiom recently made headlines by launching AboutTheData.com, a free site where people can view some of the information the Little Rock, Ark.-based company has gathered about them. Details range from marital status to what kind of vehicle you drive. Visitors simply enter key personal information to find out what data advertisers are using to help tailor their marketing messages.

The fact that powerful data brokers such as Acxiom are helping to demystify data-driven marketing initiatives is no surprise to BlueKai's Tawakol. He believes that companies have no choice now but to respond to changing consumer sentiment around data privacy. "Years ago, people built data companies in the shadows where consumers had no control," he says. "It's a different age now -- consumers should be in control."

Davis' perspective on the move toward greater transparency is more cynical. Noting that "organizations are starting to face increasingly close scrutiny around their data practices," he says companies have an ulterior motive for coming clean about how they use information like ZIP codes and credit scores: Doing so helps them avoid legal entanglements and bad press. What's more, Davis says, many initiatives that are touted as offering people insight into how they're being tracked are more about public relations than full disclosure. "What they're still not telling me is who's buying that data and what they're doing with it," he says.

Policies under fire

Unfortunately, greater transparency doesn't always translate into greater understanding. The privacy policies of industry titans such as Facebook and Google have recently come under fire for being hard to understand and far too long to slog through. Presented as 70-page novellas filled with vague terms like "non-personally identifiable information," some policies have even sparked probes by regulators at the Federal Trade Commission.

In fact, the results of an April 2012 survey by strategic branding firm Siegel+Gale indicated that users have little understanding of how Facebook and Google track, store and share their information. Survey participants were asked to review Facebook's and Google's privacy policies and then rate how well they understood them on a scale of zero to 100 (with 80 indicating good comprehension). Facebook scored 39 and Google 36 -- indications of poor comprehension.

"People don't understand what they're agreeing to," says Davis. "Organizations make it a lot more complicated than it should be." Besides, he adds, "reading all of the terms of services that we receive would take us 76 days a year."

That's not to suggest that privacy policies have no value in the world of big data. Rather, says Nans Sivaram, a client partner at IT consultancy and outsourcer Infosys, instead of sharing terms and conditions, companies need to "[communicate] the value consumers will receive if they part with certain information."

In a recent Infosys global survey, 39 percent of the respondents said that they consider data mining invasive. And 72 percent said they don't feel that the online promotions or emails they receive speak to their personal interests and needs. Yet, Sivaram says, "consumers are willing to part with personal information, provided there's good reason to."

The result is a high-tech Catch-22: On the one hand, consumers want to receive highly targeted and personalized products and services. On the other hand, they don't want to feel as if their personal data is up for commercial grabs.

"Retailers need to do a much better job of using the data that they already have to reach their customers," says Sivaram. "At the same time, they have to be careful about being seen as invasive because they don't want to get into trouble and lose the trust of their customers."

So what's the solution? According to Sivaram, the answer is for big data collectors "to establish the right incentives" for people to divulge their personal details. For example, by showing people that sharing their information can earn them loyalty points or discounts, companies can create greater value for their customers while converting consumer trust into a competitive advantage.

The same rule of reciprocity applies to online content as well. Says BlueKai's Tawakol: "When we have asked people in surveys, 'Would you prefer to pay for your content or would you prefer to have targeted ads alongside your content?' it's usually in the high 90 percent of people who would prefer sponsored content."

Setting a code of conduct

However, not everyone believes that the burden should be placed on consumers to blithely agree to share their data, decipher confusing privacy policies or swap credit scores for grocery coupons. For example, Michael Walker says that big data professionals should adopt a code of ethics. A managing partner at Rose Business Technologies, a Denver-based systems integrator and IT services provider, Walker has drafted a 12-page data science code of professional conduct covering everything from the role of data scientists to their daily responsibilities (see story below).

"Companies are starting to understand the danger of secondary uses of information and how people's personal data can be abused," says Walker. "Once they start to think about it, they're very much in favor of an ethical code."

In fact, in an August 2013 survey conducted by statistical software company Revolution Analytics, 80 percent of the respondents said they agreed that there should be an ethical framework for collecting and using data. And more than half of data scientists surveyed agreed that ethics already play a big part in their research.

"My solution is to have some sort of code of professional conduct that data scientists would voluntarily agree to follow to protect people's private data," says Walker. By creating a kind of Hippocratic Oath for analytics professionals, Walker says data scientists will have both the moral and legal grounds for refusing to slice and dice numbers in ways that threaten to violate consumer privacy rights.

Walker isn't the first to conceive of a code of ethics for analysts. Earlier this year, the Institute for Operations Research and the Management Sciences (INFORMS) drafted a code of ethics to accompany the launch of its Certified Analytics Professional (CAP) certification program.

Yet Davis believes that despite lofty intentions, it's far too easy for a code of ethics to wind up "written on a piece of paper and put in a drawer." The challenge, he says, "is that you have to get real about understanding what you actually do with your data and whether or not that aligns with the shared values in your organization." Unfortunately, he says, determining what your values are as an organization, and whether or not your data practices reflect these priorities, "is a very different conversation than what we're used to having in a business setting."

And then there are IT professionals who maintain that it's simply not a data scientist's job to protect privacy. Instead, "their job is to extract interesting insights from the data," says Ryan Kalember, chief product officer at WatchDox, a Palo Alto, Calif.-based vendor of security tools.

Market will drive answers

Whether privacy is the purview of consumers, corporate executives or data scientists, one thing is certain: Data privacy is a hot-button issue. Even the U.S. government is investigating organizations that collect and manage big data and pressuring them to provide consumers with appropriate control over their personal data. But industry observers aren't holding their breath for sweeping governmental action. "It's not like the Founding Fathers are getting together in Philadelphia," says Davis.

The ongoing revelations about the NSA's Prism data-collection program have, if anything, further eroded the public's confidence that the government will do anything to protect consumers' privacy. Indeed, Tawakol says that shifts in consumer awareness about data privacy (or lack thereof) are more likely than federal investigations to drive reforms in data collection practices.

"The market will provide a mechanism quicker than legislation will," he says. "There is going to be more and more control of your data, and more clarity on what you're getting in return. Companies that insist on not being transparent are going to look outdated."

Walker shares that vision of the future. "There are lots of benefits to having data analyzed and having companies narrowly tailor specific products and services to customer preferences. But it's actually in a company's best interest to respect people's private data," he says, adding that companies are going to lose customers "if consumers find out that a company has been spying on them and using data in a way that's unethical."

From Walker's data science code of professional conduct: If a data scientist questions the quality of data or evidence, he must disclose this to the client. If a data scientist has offered material evidence and later comes to know that it is false, he shall take reasonable remedial measures, including disclosure to the client. A data scientist may disclose and label evidence he reasonably believes is false.

-- Cindy Waxer

Waxer is a Toronto-based freelance journalist. She has written articles for various publications and news sites, including The Economist, MIT Technology Review and CNNMoney.com.


This story, "Big data blues: The dangers of data mining" was originally published by Computerworld.
