"Hadoop is only now moving from R&D domain to mainstream corporate arena. There are not very many professionals out there in the market," notes Timothy Diep, business development manager at DCKAP, a technology consulting company that provides Hadoop development and consulting. "There is a premium on people who know enough about the guts of Hadoop."
Diep says customers ask for three main skill sets: data analysts/data scientists, data engineers, and data management professionals.
Analysts should have experience in SAS, SPSS, and programming languages like R. "These are the professionals who will generate, analyze, share, and integrate intelligence gathered and stored in Hadoop environments," he says.
Data engineers, meanwhile, are responsible for creating the data processing jobs and building the distributed MapReduce algorithms that the data analysts use, Diep explains. Finally, data management personnel do three things, he says: decide whether to deploy Hadoop in the cloud or on premises, including which vendors and distributions to use; determine the size of the cluster; and decide whether the cluster will run production applications or be used for quality testing.
Lunexa, a boutique technology consulting firm, is also pursuing Hadoop work. The company focuses on helping customers develop business solutions on top of the Hadoop platform, says David Cole, partner at Lunexa.
CIOs and IT managers aren't just looking for Hadoop skills. They also seek knowledge of how Hadoop interacts with business intelligence tools, Cole says, adding that "virtually every BI vendor is putting together a Hadoop story."
Lunexa launched eight years ago as a traditional ETL (extract, transform, load) and data warehousing company. The company deployed an in-house Hadoop cluster about a year ago, Cole says, and invested in Hadoop training for its consultants at the same time.
Much of Lunexa's Hadoop business is in developing MapReduce code for complex analytics or ETL, Cole says. Demand cuts across a range of industries, but Cole notes that Lunexa sees a bit more work coming from large media companies and financial services firms, where large data volumes are common.
Lunexa also works with Hive, a data warehousing system for Hadoop. A consulting firm may also find a role in Hadoop administration. "There are multiple facets of working on Hadoop," Cole says.
Training is another Hadoop service, and one that distributors provide. Hortonworks, for instance, offers courses in developing solutions with Hadoop and in administering it. The developer class spans four days; the administrator course takes two days.
Renee Beckloff, senior director of global delivery services for Hortonworks, cites the scarcity of Hadoop professionals. "I've been in education for the last 15 years, and this is probably the shortest supply of quality personnel for any type of platform or any type of technology that I've seen," she says.
MapR Technologies offers training for administrators, data modelers, and developers. Cloudera also offers a combination of developer and administrator training.
Omer Trajman, vice president of technology solutions for Cloudera, says the company has trained about 12,000 people since launching its training program three years ago. The company is currently training Hadoop personnel at the rate of 1,500 per month.
"Training is in high demand, and the skills are in high demand," Trajman says. "Not enough people have learned how to use Hadoop."
John Moore has written on business and technology topics for more than 20 years. His areas of focus include mobile app development, health IT, cloud computing, government IT and distribution channels.