A conversation with IBM's Mr. Big Data

Rod Smith, IBM's vice president of emerging Internet technologies, tells InfoWorld about IBM's exploits in Big Data -- this year's hottest trend

Knorr: And was this using MapReduce techniques?

Smith: Yes, this was all Hadoop-based. But then we heard back from customers, like the BBC. Our first try at giving an application to an end-user wasn't very good. But they gave us lots of feedback, such as, "Here's what we had in mind. Think about it maybe as a spreadsheet more. We know spreadsheets. How close could you get to doing that?" That's the type of iteration you go through with customers when you're not part of a product team, because if they don't like it, they're very forthright about it and lay it out. And that's what they want: They know that time is valuable, they appreciate our talking to them, they appreciate the kind of insights we bring, but if it's not going to work, they don't want us to lose time.

Knorr: So how would you put Big Data in context with the larger sphere of business intelligence? Obviously it's unstructured versus structured SQL data, but what about in terms of applications? To me, it seems there's a higher failure rate in business intelligence projects than in other IT projects.

Smith: I'll give you some interesting facts. IBM put out a CIO study two years ago, and business intelligence was the No. 1 thing CIOs cared about. Virtualization was second, mobile third. Now, you'd think the CIO would make every effort to get his data into the hands of the BI experts. The opposite was true. We asked them, "Do you make your data easily available for the lines of business?" It was something like 12 percent. So the poor guy at the top of the business who says, "I should be able to use BI to get good answers," doesn't realize the IT department is not really helping him. And the poor BI folks are trying to extract enough information, but in many cases they're doing it on a shoestring.

That's No. 1. I think the part that you're seeing around Big Data is you'd like to ... discover repeatable business patterns. Then you want to go to BI or Cognos and say, "Ah, now how do I model this?" And that up-front preparation, without the new technologies we're talking about, can mean a long cycle. You have to sit with the customer and figure out what they're doing and figure out the data, and it takes a lot of time. So I think what Big Data starts to offer us is a discovery dimension. I'd like to be able to discover patterns and actionable insights, so I can now turn to my experts in business analytics and business intelligence and say, "I can now describe this better to you. I can tell you which data is going to be important. I can tell you how I'm thinking about how you should figure your models or build your models around this."

Knorr: So you see it as an exploratory tool.

Smith: I think it's an exploratory tool that we haven't had before that adds to this whole area of business analytics/business intelligence. It's been very labor-intensive up front, and now we're able, with technologies like BigInsights, to answer the question of how do we help people sift through data where maybe 90 percent of it is not very useful?

Knorr: Well, and you don't have to worry about the usual cleansing and coherency and all that other stuff because by nature it's dirty, it's all over the place.

Smith: And in many cases, when data isn't clean, it tells you more stories about itself. We did a proof of concept with a customer -- and then figured out how we could do it live with real data when the iPhone 4 came out. So we went to Twitter, and over 36 hours we collected 375,000 tweets. And then we used sentiment analysis to go through and find only those people who were interested in certain phones and whether they were interested in buying -- you look for certain words for that.

And then we did a tag cloud. You look and say, "OK, which phones were more popular?"

So what happens with dirty data? Well, in the tag cloud you saw Android -- and Android misspelled. But if you had cleaned the data, you might have removed that entry and the weight factor wouldn't have been the same. But now that you see it, you can go back and correct it and get a more realistic view. Traditional data folks say cleansing is important, and I agree, but I don't want to cleanse out the context. And context is going to be the shell around this that allows you then to say, "Yes, now I know how I want to work!" -- and prep the data for other types of intelligence.
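
To make Smith's point concrete, here is a minimal Python sketch of that workflow: keep only tweets that contain a buying-intent word, count phone mentions as tag-cloud weights, and compare the counts before and after correcting a misspelling. The sample tweets, the keyword lists, and the `tag_weights` helper are all hypothetical; the interview does not describe IBM's actual pipeline, and simple keyword matching stands in here for real sentiment analysis.

```python
import re
from collections import Counter

# Hypothetical sample tweets; the real data set is not published in the interview.
TWEETS = [
    "Thinking about buying the iPhone 4 this weekend",
    "I want an Android phone instead",
    "Just got coffee, nothing to do with phones",
    "Might purchase an Andriod, the iPhone is too pricey",
    "Going to buy an Android next month",
]

# Assumed vocabularies, chosen only for illustration.
INTENT_WORDS = {"buy", "buying", "purchase", "want"}
PHONE_TERMS = {"iphone", "android", "andriod"}  # includes a known misspelling
CORRECTIONS = {"andriod": "android"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tag_weights(tweets, corrections=None):
    """Count phone mentions in tweets that contain a buying-intent word.

    If `corrections` is given, misspelled terms are folded into their
    corrected form instead of being counted separately.
    """
    corrections = corrections or {}
    counts = Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        if not INTENT_WORDS.intersection(tokens):
            continue  # skip tweets with no sign of buying intent
        for token in tokens:
            if token in PHONE_TERMS:
                counts[corrections.get(token, token)] += 1
    return counts


raw = tag_weights(TWEETS)                  # misspelling shows up as its own tag
merged = tag_weights(TWEETS, CORRECTIONS)  # misspelling folded into "android"
print(raw)
print(merged)
```

In the raw counts the misspelled tag stays visible as its own entry, so you can decide to fold it into the correct term; dropping it during an up-front cleanse would have quietly lowered Android's weight instead.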
