A career path that began with studying infectious diseases and led to analyzing terabytes of game data may seem a circuitous route. For Brendan Burke, though, the applied math skills he picked up as an undergraduate biology and political science major, the programming skills he added as a bioengineering graduate student, and his use of the two as a research scientist led to a job in the booming IT field of data science.
"A lot of the skill set I developed very specifically for biology could be applied in very commercially viable ways," says Burke, who earned both of his degrees from Stanford University and worked at the California school as a scientist. As head of player science at Playnomics, a Silicon Valley company that uses game data to develop player analytics, the math and computer science skills he used to determine how many touch points a virus requires to spread across a population now help him understand how people interact with games.
"Something in data science gets your creative juices flowing when you see something that you built for an entirely different purpose can be used in all of these other ways," he says.
Data science also excites companies that want to use the data they've amassed to make strategic decisions that will benefit the bottom line.
A range of industries are using data to guide business decisions and bring in revenue, says Laura Kelley, a vice president at technology staffing firm Modis. "Companies are using this information to launch products and services. Whether it's what customers are buying, what products or services get the better ROI, [data] comes into strategic decisions."
Businesses, though, are struggling to find employees to handle big data, the term assigned to gathering and analyzing massive quantities of information. This field is relatively new to enterprise IT, and although many companies are exploring data science programs, the necessary talent is still maturing, say technology and staffing executives.
This puts people with applicable skills in demand now and in the future, say hiring experts. The U.S. faces a substantial shortage of workers with data science skills, according to a much-talked-about report published last year by consulting firm McKinsey and Company. The report predicted that by 2018, the country will lack 1.5 million analysts who can make strategic decisions using big data and between 140,000 and 190,000 workers with the proper data-processing technology skills.
"There [will be] more career opportunities in the future for this type of strategic analysis," says Kelley, who has seen the business intelligence analyst job change into a data scientist position in the last 18 months. "We've always used information, but not to this level. With the amount of data companies are capturing on everything and everybody, it's just amazing what can be done with that."
Colleges have realized the need to train people for those careers and are developing degree and certification programs targeting undergraduate and graduate students as well as IT professionals. To address immediate data-science staffing needs, which include technical and business roles, companies have adopted assorted tactics.
To handle the more than 100TB of data processed each week by BrightEdge, a San Mateo, California, startup that helps companies manage their search engine rankings, CEO and founder Jim Yu wants data workers who grasp the entire scope of big data processing.
People know how to query databases, but there is "an extra layer of understanding" when handling large data sets, which at BrightEdge includes tracking data on more than 150 billion URLs. Experience working with traditional SQL relational databases helps, but big data's scale requires a different processing mindset, Yu says.
"There's a nuanced leap there when you move into this big data environment," Yu says. "You're really looking at the optimal configuration of taking these massive processing jobs and figuring how do you distribute this load on servers that are much less monolithic and much more distributed."
In addition to database knowledge, Yu notes that strong backgrounds in computer science, algorithms, and operating systems are helpful foundations for a BrightEdge data science career.
"If they have a good foundation in that, then you pair that up with a [training] program that allows them to understand how to translate into this new architecture," he says. This buddy system, which matches workers who have worked on the big data stack with people who are learning the system, leads to knowledge sharing, he said.
This method also helps people new to extreme data crunching learn which data processing jobs call for big data technology and when to use traditional relational databases, Yu says.
"With big data, one of the advantages is the scale of what you can do," he explains. "But it also means you don't have the same speed of development from having the really simple, flexible standardized SQL language that you can apply to the data set. There are tradeoffs that you're making. It's important for the technology staff to have a good appreciation of that."
DataXu, a Boston company that offers a product for managing online advertising campaigns, also takes a team approach to filling data science jobs, says CTO Bill Simmons. Big data workers there have strong math and coding skills and some business savvy, he says.
People who excel in one area, are strong at a second, and have a grasp of the third allow the company to form teams based on different strengths.
Possessing "two out of the three is what you need to get the job done," he says, adding that finding people who have a strong background in one of those areas is fairly easy. Standouts in all three skills are harder to come by. "I would be delighted finding someone who is a star in all three areas."
Employers also seek workers whose software skills and data backgrounds match their work environments.
Companies select database software that can handle their data sets, which can be complex, says Rob Byron, a principal consultant in the information technology division of staffing firm Winter, Wyman. Employees, for their part, prefer to stick with the software they know.
"The general outlook is if we have a SQL server data warehouse I'm looking for Microsoft [skills]. Oracle people need not apply. And vice versa. And quite frankly, a lot of candidates don't want to learn new skills," he says.
Given the amount of data companies are dealing with, they only want candidates who have handled that volume, says Modis' Kelley. A person's data experience, not the industry they're in, is what matters to employers, she notes.
"Data is data," she says. "Industry vertical really isn't going to be the key driver. Its going to be what did you do with the data, how large of an environment was it."
Firms will avoid candidates who have only worked in smaller environments because at "very large enterprise big data programs ... you're talking about huge amounts of data, and that could be very overwhelming to someone."
While IT professionals have a grasp of what traits work for data science's technical positions, defining what backgrounds make for a good analyst proves difficult.
These positions go beyond possessing strong technology skills, so being a solid developer does not necessarily translate to an analyst job. Companies need employees who can make the data work for the business.
"Companies are really looking for higher level quantitative skill sets for these roles," says Kelley. "It's not every developer you come across. It's someone with that business acumen who can parlay those skills into strategic decision making."
Filling the data analysis roles at BrightEdge entails finding candidates who "understand the right questions to ask around this data and how to tease this into actions that result in business outcomes for our customers," says Yu. The challenge is finding those who can "artfully bridge the technical capabilities of the cool things you can do and connect the dots to the value. The challenge there is not so much it's new but that's just a harder skill set to find."
DataXu's data analysts need to understand how to boil petabytes of data down into two charts that get to the essential information, says Simmons. "It is an emerging skill that no one learns in college. You have to learn this on the job. We hire smart people and train them."
Playnomics' Burke learned about translating data analysis into tangible business information on an early assignment that involved predicting user longevity on games. His model identified a segment of people that was 85 percent likely to spend significant time playing a game. When executives and marketing staff asked Burke to define that group, his reply of "cluster 32" was not the response they wanted.
"They were looking for us to describe what does the average player look like in that cluster in terms they could understand," he says. "Being a data specialist requires you to not just be somebody who's good at statistics but also good at using those statistics to tell a story."
Burke encourages research scientists whose backgrounds may translate into a data science career to consider the field. Trading academia for the private sector allowed him to continue to solve puzzles with data. Now instead of solving theoretical problems on how disease travels, his models have the real-world effect of helping game companies better understand their business.
"All problems in biology aren't necessarily commercial ones," he says. "The things that we try solve may be purely for academic or knowledge reasons. It's easy to get caught up in those because they're very interesting. But for anyone who is in research science and is inspired by real work applications of that stuff, working in the private sector is an excellent experience."
And the data science jobs will be there for them. Winter, Wyman's Byron has seen an uptick in clients looking for people who can build and maintain large data warehouses.
"The demand outweighs the supply because the demand keeps growing," he says. "It's not like this has been around forever ... so your pool of candidates is smaller."