Knorr: Well, is that its only role for Big Data, as the front end of a longer, more conventional business intelligence process?
Smith: I don't think so. For example, there's the idea that you can ... take more than one data source and put them together very quickly. If the database has to do the join, that's a lot of work. For us it's putting the data into a distributed file system; then we know how to go through and read it at that point. So there's combining data, there's sifting through the data, working with a very broad array, depending on where the customer wants to go. So the analytics get built up in a more interpreted way -- rather than fixed, which costs a lot to change.
Knorr: The scenarios you always hear about are Yahoo and Facebook. Clickstream data -- terabytes of it. Never having seen these applications, I'm thinking that maybe there's some sort of visualization that emerges from a Hadoop process that you can just use in and of itself and say, "We'd better change our stupid security stuff" on Facebook or whatever.
Smith: I'll give you an example. You have to put data in the context of business patterns. So a person who is a chief legal officer does mergers and acquisitions. One such person asked us, "Could you read in all the patent office information?" So if you wanted to do that in a database, you could, but you'd have a lot of stuff to manage and keep around, as opposed to what we actually did, which was to read it in and then look just for that company's patents, which are maybe one percent of the total. We ranked them according to how many people were referencing them. And the customer says, "That's interesting. That gives some weight value to it."
Only a couple of patents turned out to be referenced more than once. And then the customer says, "You know what? Could you pull in Federal 9th District Court information so I could see if anything is being litigated around these?" But it's much more of an "ah ha!" I mean, "If you can do that, then can you go get that other piece of information? I'd like to interpret more information." So it's much more of a collaboration with a line of business to determine what they're after, because they don't know until they see something.
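As an illustration only, the workflow Smith describes (read everything in, keep one company's patents, rank them by how often they are referenced, then pull in court data) might be sketched in plain Python as below. This is not IBM's or BigInsights' actual code. The record layout, the field names (`id`, `assignee`, `cites`, `case_id`, `patents`), the company name "Acme", and both function names are assumptions made up for the example.

```python
from collections import Counter

def rank_company_patents(patents, company):
    """Return (patent_id, inbound_citation_count) pairs for one company, most-cited first."""
    # Count how often each patent id appears in any other patent's citation list.
    inbound = Counter(cited for p in patents for cited in p["cites"])
    # Keep only the patents assigned to the company of interest.
    owned = [p["id"] for p in patents if p["assignee"] == company]
    # Sort by citation count (descending), breaking ties by patent id.
    return sorted(((pid, inbound[pid]) for pid in owned),
                  key=lambda pair: (-pair[1], pair[0]))

def attach_litigation(ranked, cases):
    """Extend each ranked entry with the ids of court cases that mention the patent."""
    by_patent = {}
    for case in cases:
        for pid in case["patents"]:
            by_patent.setdefault(pid, []).append(case["case_id"])
    return [(pid, count, by_patent.get(pid, [])) for pid, count in ranked]

# Hypothetical sample data; real patent and court records would look different.
patents = [
    {"id": "P1", "assignee": "Acme", "cites": []},
    {"id": "P2", "assignee": "Acme", "cites": ["P1"]},
    {"id": "P3", "assignee": "Other", "cites": ["P1", "P2"]},
    {"id": "P4", "assignee": "Other", "cites": ["P1"]},
]
cases = [{"case_id": "C-100", "patents": ["P1"]}]

ranked = rank_company_patents(patents, "Acme")
report = attach_litigation(ranked, cases)
print(report)
```

The litigation step is kept as a separate function because, in Smith's account, it was a follow-up request that came only after the customer had seen the first ranking.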
Knorr: It sounds like a different mindset.
Smith: It's very different. What most folks have been used to is that you ask IT to do it, they go off for a long period of time, and they come back with it. And not to be unfair to them -- they're trying to think about it in the context of other IT applications -- but the end result might have no value to me at all. I'd like to know quickly if it's going to be good or bad, not have you build it as if it's going to be good and then have me look like an idiot because it's not very good.
Knorr: Put Big Data in the context of IBM's business intelligence acquisitions. You've got Cognos, you've got Netezza, you've got SPSS -- there have been a bunch of acquisitions in the past several years.
Smith: That's for sure. I think we'll see Big Data as a resource for all those types of solutions. It depends on where the customer is in their business cycle, if you will.
Let's say a customer is in the discovery phase. Then things like BigInsights that we're doing help them with that. But then they find that it's a good repeatable pattern; they'd like to export the data they sifted down into Cognos or into SPSS. And SPSS and Cognos can do more of the modeling around those things at that point.
As for Netezza, you can think of it as the appliance that you can put BigInsights on and really crank up the processing. It's really solving business problems that you wouldn't have thought were traditional analytics or intelligence. And I think that's the part that we like: How can we help you look at data early on and get some insights on how to change your business? And then how do we help you with Cognos or SPSS or other things to work through the different stages of that?
Knorr: And when you say, "How do we help you?" do you anticipate using this as a tool in, say, consulting engagements or applications?
Smith: I think it'll be services, as well as what we're going to do from a software standpoint. But I think both of them are important topics.