At last week's SVForum Big Data Event, you could have been forgiven for thinking that it would be all Hadoop, all the time. Even though Hadoop was featured prominently, it was by no means the only topic.
It's no secret that Hadoop and big data are, well, big. The explosion of new data sources such as social media and website clickstreams offers new types of information and, potentially, the opportunity to generate new insights. Hadoop has been front and center in this trend, applying the MapReduce method across distributed data collections so that organizations can analyze far larger volumes of data than traditional approaches can manage.
However, it's clear that Hadoop, despite its obvious benefits, is not a business intelligence magic bullet. A number of speakers noted the shortcomings of its batch nature: it can take hours to return results that, once examined, are clearly not what was wanted, which sets off yet another batch submission.
A number of other speakers promoted real-time BI, which in practice seems to mean mining sentiment streams from sources like Twitter and Facebook. This sort of real-time analysis is undoubtedly valuable, but it's unlikely to be the main source of insight for most companies going forward. It's more likely to be an adjunct to other forms of insight generation, and it may prove most valuable for pointing out interesting areas to examine.
Big data investments will be open source
I moderated a panel discussing the investment opportunities in big data. While investors on these kinds of panels are always cagey (after all, they're not likely to trumpet an area they're about to invest in, are they?), my takeaway from their remarks says something important about the big data space and, more significantly, about the nature of IT infrastructure innovation in the future.
All three panelists agreed that the new infrastructure offerings in big data would not be venture-backed proprietary products but would instead be open source-licensed, shared development products. This is partly because it is so expensive to bring a proprietary infrastructure product to market ($200 million was mentioned as the right level of funding), and partly because innovation today is dispersed, making it difficult to create a fundable entity that can capture and contain it.
Where will proprietary big data investment make sense? The panelists concurred that vertical applications built on big data will be fruitful areas for investment, and that they will be delivered as SaaS. The only open question is whether these vertical providers will build out their own computing infrastructure or rely on Amazon Web Services.