The now-trendy concept of Big Data usually implies ever-growing volumes of data, including unstructured information posted on Facebook and Twitter, and ways of gleaning intelligence from all of it to create business opportunities. The concept, however, also carries risks for anyone opening up about themselves on the Internet, and it raises questions about who exactly owns all this data.
Big Data is associated with technologies such as the Apache Hadoop distributed computing platform and is prompting some technology companies, including IBM, to make major acquisitions. But the term "Big Data," claims GigaOm analyst Derrick Harris, is a bit of a misnomer; it's really about data from different sources, including social networks and even cell phones. "It's coming from sensors, it's coming from computers, it's coming from the Web," he says.
The strong interest by both IT and business units in Big Data is "about being able to harness it, and it's about being able to do something with it" -- in essence, analyzing it, says Harris. "The great thing about Big Data is we accumulate this amount of information and we have systems in place where we can use that for good," such as analyzing human genome information or making government data available, says Mozilla developer evangelist Christian Heilmann. Business analysts can study large data sets by renting servers for an hour, using technology such as Hadoop, he says.
Given that growing interest, it's no surprise that vendors are starting to make moves to take advantage of Big Data. Harris cites IBM's recent $1.7 billion acquisition of Netezza, which offers data warehouse appliances. Meanwhile, Teradata is buying data warehousing startup Aster Data Systems, which offers advanced analytics and management of unstructured data.
Mining the social networks' Big Data
Companies such as Echo and Cloudera are seeking their niche in the Big Data and social network data spaces. "The Big Data play right now for these big multi-million-dollar companies is around activity data," says Chris Saad, vice president of strategy at Echo. Both enterprise IT and individual users are sure to see a growing menu of Big Data services available as data gathering grows in prominence.
For example, Echo StreamServer, which serves ventures such as media companies and ad agencies, pulls social media data relevant to a client into a single stream. Echo, which cites companies such as Reuters as customers, captures data about clients on sites such as Twitter and Facebook, as well as from the client's own sites. The clients can then create real-time experiences out of the data, Saad says. Clients get a "big unified data set" to develop applications such as forums and live blogging.
Cloudera offers its own distribution of Hadoop that serves as a platform for data management, and its Cloudera Enterprise provides large-scale data storage and analysis. Amr Awadallah, Cloudera's CTO, says the Hadoop distribution enables organizations to collect and combine social data and keep it in a centralized data store. Users can then run MapReduce jobs over this data to extract insights, such as newly formed relationships.
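To give a rough sense of what a MapReduce job over social data might look like, here is a minimal in-memory sketch in Python. It is not Hadoop or Cloudera code and does not use their APIs; it only imitates the map, shuffle, and reduce steps on a single machine. The sample posts, the (author, text) record layout, and the function names are all assumptions made up for illustration. The job counts how often one user mentions another, which is one simple way relationships could be surfaced.

```python
from collections import defaultdict
import re

# Hypothetical sample data: (author, text) pairs. The schema is an assumption.
POSTS = [
    ("alice", "Great talk by @bob and @carol today"),
    ("bob", "Thanks @alice!"),
    ("alice", "Lunch with @bob"),
]

MENTION = re.compile(r"@(\w+)")


def map_phase(post):
    """Emit ((author, mentioned_user), 1) for every @mention in one post."""
    author, text = post
    for mentioned in MENTION.findall(text):
        yield (author, mentioned.lower()), 1


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one (author, mentioned_user) key."""
    return key, sum(values)


def run_job(posts):
    """Run map, shuffle, and reduce in sequence and return the counts as a dict."""
    mapped = (pair for post in posts for pair in map_phase(post))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    for (author, mentioned), count in sorted(run_job(POSTS).items()):
        print(f"{author} -> {mentioned}: {count}")
```

In a real cluster the map and reduce functions would run in parallel across many machines and the framework would handle the grouping; the sketch only shows the shape of the computation.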