Such people may be in short supply. McKinsey & Company has estimated that by 2018 the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills, as well as 1.5 million managers and analysts with the know-how to use big data analysis to make effective decisions.
Another skill you will need on hand is the ability to wrangle the large amount of hardware needed to store and parse the data. Managing 100 servers is a fundamentally different problem from handling 10 servers, Maitland pointed out. You may need to hire a few supercomputer administrators from a local university or research lab.
No. 4: Big data doesn't require organization beforehand
CIOs who are used to rigorously planning out every sort of data that would go into an Enterprise Data Warehouse (EDW) can breathe a little easier with big data setups. Here, the rule is to collect the data first and worry about how you will use it later.
With a data warehouse, you have to lay out the data schema before you can start laying in the data itself. "This basically means you have to know what you are looking for beforehand," said Jack Norris, vice president of marketing for MapR. As a result, "you are flattening the data and losing some of the granularity," he said. "Later on, if you change your mind, or want to do a historical analysis, you've limited yourself."
"You can use a [big data repository] as a dumping ground, and run the analysis on top of it, and discover the relationships later," Norris said. Many organizations may not know what they are looking for until after they've culled the data, so this kind of freedom "is kind of a big deal," he said.
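To make Norris's contrast concrete, here is a minimal Python sketch. It is not drawn from MapR or any particular product: the event records, field names, and the functions `load_with_fixed_schema` and `purchases_by_referrer` are all hypothetical. It compares loading data into a schema fixed up front with keeping the raw records and deciding which fields matter only when a question is asked.

```python
import json
from collections import Counter

# Hypothetical raw machine events, kept exactly as they arrived (the "dumping ground").
RAW_EVENTS = [
    '{"user": "alice", "action": "view", "page": "/pricing", "referrer": "search"}',
    '{"user": "bob", "action": "buy", "page": "/checkout", "amount": 49.0}',
    '{"user": "alice", "action": "buy", "page": "/checkout", "amount": 19.5, "referrer": "email"}',
]

# Schema-on-write (hypothetical warehouse layout): the columns are chosen before
# loading, so any field not listed here (page, referrer, amount) is discarded.
WAREHOUSE_COLUMNS = ("user", "action")


def load_with_fixed_schema(lines):
    """Flatten each raw record into a row holding only the predefined columns."""
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append(tuple(record.get(col) for col in WAREHOUSE_COLUMNS))
    return rows


def purchases_by_referrer(lines):
    """Schema-on-read: parse the raw records at query time to answer a question
    nobody planned for, namely how much revenue each referrer drove."""
    totals = Counter()
    for line in lines:
        record = json.loads(line)
        if record.get("action") == "buy":
            # Records without a referrer are grouped under "unknown".
            totals[record.get("referrer", "unknown")] += record.get("amount", 0.0)
    return totals


if __name__ == "__main__":
    print(load_with_fixed_schema(RAW_EVENTS))
    print(purchases_by_referrer(RAW_EVENTS))
```

The flattened rows can still count actions per user, but the referrer and amount fields were dropped at load time, so only the raw copy can answer the later revenue question. That is the loss of granularity Norris describes.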
No. 5: Big data is not only about Hadoop
When people talk about big data, they are usually referring to the Hadoop data analysis platform. "Hadoop is a hot-button initiative, with budgets and people being assigned to it" in many organizations, Kobielus pointed out. Ultimately, however, you may go with other software.
Recently, legal research giant LexisNexis, no slouch at big data analysis itself, open-sourced its own analysis platform, HPCC Systems. MarkLogic has also outfitted its database for unstructured data, MarkLogic Server, for big data-style jobs. Another tool gaining favor is the Splunk search engine, which can be used to search and analyze machine-generated data, such as a server's log files. "Whatever data you can extract from your logs, there is a good chance that Splunk can help," noted Curt Monash of Monash Research.