Why you should jump into big data

At a low cost of entry, the emerging technologies that define the big data trend are already delivering value, so first consider the problems you need to solve -- then dive in

1 2 3 Page 2
Page 2 of 3

Problem No. 2: I can't get what I want from BI
Business intelligence always seems to rank among the top few technology priorities for big companies. Yet year after year, few seem very happy with the results.

It all boils down to the questions you want to ask. If you have queries related to, say, the regional distribution of your transactions or trends in the costs of your materials -- or if you want to make some predictions about how all that may play out next year -- conventional business intelligence and analytics systems probably remain your best bet.

But if you want to ask something like, "How are millions of customers using my Web applications and how might I improve them?," you're better served by a solution built around an engine, such as Hadoop, that can handle semistructured data. In fact, Hadoop is mentioned in the same breath as big data so often you'd think the two were interchangeable.

Hadoop was purpose-built to process very large quantities of semistructured data, such as the clickstream of events left by Web users. InfoWorld's Andrew Lampitt has explored some great examples of this, including the use of Hadoop by Facebook, Experian, and Evernote. Hadoop has two components: HDFS (Hadoop Distributed File System), which provides cheap scale-out storage; and MapReduce, the data processing layer that provides a framework for developing analytics applications.
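To make the MapReduce model concrete, here is a minimal single-process sketch of its three steps (map, shuffle, reduce) applied to a toy clickstream. It is not Hadoop's own API: the tab-separated event format, the `CLICKSTREAM` sample, and the function names are all hypothetical, chosen only to illustrate how a job counts page views across events. A real Hadoop job distributes the same steps across many machines, with HDFS supplying the input.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical clickstream sample: one tab-separated line per event
# (user_id, page, action). The last line is deliberately malformed.
CLICKSTREAM = [
    "u1\t/home\tview",
    "u2\t/search\tview",
    "u1\t/search\tclick",
    "u3\t/home\tview",
    "u2\t/checkout\tclick",
    "bad line",
]

def map_event(line):
    """Map step: emit (page, 1) for each well-formed event."""
    fields = line.split("\t")
    if len(fields) != 3:
        return []  # semistructured input: skip junk instead of failing
    _user, page, _action = fields
    return [(page, 1)]

def shuffle(pairs):
    """Shuffle step: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one page."""
    return key, sum(values)

def run_job(lines):
    """Chain the three steps; returns {page: view_count} sorted by page."""
    mapped = chain.from_iterable(map_event(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_counts(k, v) for k, v in sorted(grouped.items()))

if __name__ == "__main__":
    print(run_job(CLICKSTREAM))
    # prints {'/checkout': 1, '/home': 2, '/search': 2}
```

Skipping malformed lines rather than rejecting the whole input is one reasonable design choice for messy clickstream data; a production job might count or log the rejects instead.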

But it's important to note what the CEO of Cloudera, Mike Olson, told InfoWorld recently: "Nobody stands up Hadoop by itself. It's usually next to a relational database and maybe in service of a document system." Hadoop tends to sit on the back end of big data solutions, delivering results to other databases or applications. For example, at Evernote, Hadoop connects to a ParAccel MPP analytics system, with JasperReports ultimately delivering insights to Evernote staffers.
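The back-end pattern Olson describes, where Hadoop computes results and hands them to a relational system for reporting, can be sketched as follows. This is an illustration under stated assumptions: SQLite stands in for the downstream database (ParAccel, in Evernote's case), and the `page_views` table and the `JOB_OUTPUT` rows are hypothetical results of an upstream batch job.

```python
import sqlite3

# Hypothetical output of an upstream Hadoop batch job: (page, views) pairs.
JOB_OUTPUT = [("/checkout", 1), ("/home", 2), ("/search", 2)]

def load_results(conn, rows):
    """Load batch results into a relational table that reporting tools can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_views (page TEXT PRIMARY KEY, views INTEGER)"
    )
    # INSERT OR REPLACE keeps the load idempotent if the job is rerun.
    conn.executemany("INSERT OR REPLACE INTO page_views VALUES (?, ?)", rows)
    conn.commit()

def top_pages(conn, limit):
    """A reporting-style SQL query over the loaded results."""
    cur = conn.execute(
        "SELECT page, views FROM page_views ORDER BY views DESC, page LIMIT ?",
        (limit,),
    )
    return cur.fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_results(conn, JOB_OUTPUT)
    print(top_pages(conn, 2))
    # prints [('/home', 2), ('/search', 2)]
```

The split of responsibilities is the point: the heavy scan over raw events happens on the Hadoop side, while the relational layer holds compact results that familiar SQL and reporting tools can work with.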

While anyone can download and play with a Hadoop distribution, be aware that you're ultimately going to need to grow or acquire experts to get Hadoop to do the tricks you want it to do. That skills requirement is a moving target, though, thanks to a parade of new SQL querying interfaces, prebuilt applications, and Hadoop distributions preintegrated with conventional RDBMS offerings. Soon a lot more people will be able to query vast stores of semistructured data to get the answers they need.

Problem No. 3: Help! I can't move fast enough!
In the good old days, databases were a lot easier to spec out. If you were a big enterprise scoping out a new order entry system, you probably had a solid idea of how many people would use it, when the peak demand would be, and how frequently (or infrequently) the data model would change.

That was before the "agile" days of the Web. Now companies experiment with all kinds of new applications, many of them public-facing Web apps. Some wither quickly because no one finds them compelling; others may explode in popularity and turn the database into a bottleneck overnight. Moreover, shifting customer needs, ideas for new enhancements, and the like all demand a more fluid data model.
