How to build a great data science team

Select the right mix of skills for your data analytics business goals

Enterprises that want to launch big data initiatives -- or, even more ambitiously, seek to create an "analytics culture" -- should invariably answer a handful of critical questions before spending money and allocating resources: What's the business case for analytics? Which big data tools should we use? Should we hire a data analytics vendor to handle everything? If we build an in-house team, where do we get the analytics talent?

The last question is rooted in the reported (and sometimes disputed) shortage of data scientists needed to meet growing demand as enterprise and consumer data continue to increase exponentially. But if an enterprise is fully committed to data analytics, it will find or develop the talent.


Beyond talent acquisition, the fundamental challenge facing enterprises trying to build an effective data analytics team is determining the optimum combination of skills, background, and personality.

Two senior data scientists who lead their own data science operations talked with CITEworld about what enterprises should consider when assembling a team.

"The first step is to define a clear business goal, or at least one the company is working toward," says Kevin Lyons, senior vice president of analytics for eXelate, a digital marketing data management platform vendor. "If you can't identify it, there's no way you'll be able to achieve it."

Data scientists for companies such as Google and Facebook, for example, must produce analytics for computers that "learn" about consumers and can predict behavior. These types of data scientists typically have strong mathematical and computational skills.

Conversely, data scientists charged with producing analytics for humans to make product or operational decisions usually need stronger "soft" skills.

"You need at least one person who can communicate," says Claudia Perlich, chief scientist at Dstillery, a marketing company that analyzes web browsing data to help brands target ads. "Somebody who can sit down with the CTO or CMO or CEO and have a good enough understanding of the business problems to help frame what role and what specific task data science should work on."

As essential as soft skills are to data scientists who must interact with colleagues in business units and executive suites, Perlich emphasizes that they need some fundamental technical chops.

"They don't need super coding skills, but they need to be able to access data," she says. "They need at least a scripting language, say Perl or Python, in order to manipulate data once it's out of wherever they found it. And they need a practical understanding of statistics. They don't need probability theory, but they need to understand empirical distributions of data and how the mean can be super misleading when you have a long-tail distribution."
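
Perlich's point about the mean can be illustrated with a short Python sketch. The page-view counts below are invented purely for illustration and do not come from Dstillery, and the `summarize` helper is a hypothetical name rather than anything either team uses.

```python
from statistics import mean, median


def summarize(values):
    """Return (mean, median) for a list of numbers."""
    return mean(values), median(values)


# Hypothetical data: pages viewed per visitor. Most visitors view only a
# few pages, while a handful of heavy users (the long tail) view thousands.
page_views = [1] * 60 + [2] * 25 + [3] * 10 + [5000] * 5

avg, mid = summarize(page_views)
print(f"mean   = {avg}")
print(f"median = {mid}")
```

With this made-up data the mean comes out above 250 pages per visitor, even though 95 of the 100 visitors viewed three pages or fewer. The median of 1 describes the typical visitor far better, which is why a practical feel for empirical distributions matters.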

Lyons goes a step further, saying he is "something of a purist" regarding the technical knowledge of data scientists.

"If you're going to have a successful data science team, you need the necessary data science skills," he says. "By that I mean a solid foundation in something like computer science and modeling statistics, probably a master's or better, a familiarity with procedural languages like Java or C, scripting languages such as Python, and familiarity with Unix and Linux."

Lyons also suggests a functional approach to building your data team, one that is followed by eXelate.

"Every data project has four components," he says. "One is understanding the business need. The second is gathering and massaging and preparing the data. The third is doing the modeling and fourth is operationalizing the outcome.

"We have people here who have a very good business sense who can understand what the business needs and turn it into a plan," Lyons says. "We have people with data management roles who can prepare the data in either an ad hoc or automated fashion. We have people who can do the modeling and do the visualization of that modeling. And we have people who can write the code that turns those things into automated systems that we can put live."

Likewise, Perlich says the Dstillery team features members with specific strengths that cover all the roles (communicator, statistician, coder) required of an effective data analytics operation.

Both Perlich and Lyons champion diversity on data science teams.

"I try to have as much diversity on my team as possible," Lyons says. "Currently we have someone with a Linux administration background, someone with a computational finance background, someone with a geography background who is one of the best data visualization experts I've ever encountered, someone from actuarial science, someone with data management and someone with agency and training desk experience."

"There are a lot of really curious smart people who have learned data and who come from extremely diverse backgrounds," says Perlich.

Finally, one qualification that many enterprises may be looking for in a data scientist is entirely unnecessary, according to Perlich.

"They don't need to understand the industry you're in," she says. "If they're smart enough to be data scientists, trust me, they can learn about your industry in a month or so. Don't worry about it."

This story, "How to build a great data science team," was originally published by CITEworld.

Copyright © 2014 IDG Communications, Inc.
