Hubba hubba: Get ready for data hubs

Data hubs promise real-time connections between data updates and application workflow events

Data hubs — also called data repositories or master data records — are the next evolutionary step in solving the problem of data integration.

The idea stems from the recognition that traditional ways of managing customer data are no longer competitive. In the old days, for example, customer data mainly consisted of a record of what each customer bought and when. Today, the thinking goes, a company needs to know not just about purchases but about every interaction it has had with a customer.

Companies want to know if a customer called the help hotline, abandoned an online purchase after eight seconds, or asked about the interest rate on the credit card being used to pay for the purchase. All of this data is relevant to keeping customers from taking their dollars elsewhere.

The traditional solution is to aggregate data from the various sources into a data warehouse for analysis. That works fine when static data is copied from the various systems (SAP ERP, Siebel CRM, and so forth), pulled into some kind of OLAP cube, and analyzed. A data hub, however, has a different goal, and cynics should keep that in mind before they dismiss the concept.

The value of a hub lies not in the reporting. If all we have is cleansed data sitting in a master repository, there really isn’t much difference between a data hub and a data warehouse. What we are talking about here, though, is a system whereby changes in that master data record trigger workflow events in the application that sits above it.
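To make the distinction concrete, here is a minimal sketch of a master record store that fires workflow callbacks whenever a value actually changes. It is a generic observer-pattern illustration, not any vendor's product; `MasterDataHub`, `subscribe`, `update`, and the `mailing_address` workflow are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical handler signature: (record_id, attribute, old_value, new_value)
Handler = Callable[[str, str, object, object], None]


@dataclass
class MasterDataHub:
    """Hypothetical master record store that notifies workflows on change."""
    records: Dict[str, Dict[str, object]] = field(default_factory=dict)
    handlers: List[Handler] = field(default_factory=list)

    def subscribe(self, handler: Handler) -> None:
        self.handlers.append(handler)

    def update(self, record_id: str, attr: str, value: object) -> None:
        record = self.records.setdefault(record_id, {})
        old = record.get(attr)
        if old == value:
            return  # nothing changed, so no workflow event fires
        record[attr] = value
        for handler in self.handlers:
            handler(record_id, attr, old, value)


# Example workflow: queue an address-verification task on change.
tasks: List[str] = []


def on_change(record_id, attr, old, new):
    if attr == "mailing_address":
        tasks.append(f"verify address for {record_id}: {new}")


hub = MasterDataHub()
hub.subscribe(on_change)
hub.update("cust-42", "mailing_address", "12 Elm St")
hub.update("cust-42", "mailing_address", "12 Elm St")  # unchanged: no event
print(tasks)  # ['verify address for cust-42: 12 Elm St']
```

A warehouse would simply store the new address; the point of the hub is the second half of `update`, where the change itself drives application behavior.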

This dynamic, real-time connection between data and business processes is something new, and IBM and Siebel are on the right track. Solutions from these companies update data in both directions as changes occur.

Siebel calls its technology CDI (customer data integration). It consists of a customer data repository surrounded by a business process integration capability, which Siebel calls UAN (Universal Application Network).

UAN turns data changes into actionable events. If a customer logs into the company Web site to change his or her mailing address, CDI detects that change through UAN. All of the feeder systems, such as Siebel or SAP call center applications, then immediately receive the update for their own records.

“CDI is always the single source of truth. So when this change occurs, it takes the change and brings it to CDI which will de-dupe, then feed it back into the system,” says Nimish Mehta, group vice president of customer data integration at Siebel Systems.
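The flow Mehta describes (take the change, de-dupe it against the master record, feed it back out) can be sketched roughly as follows. This is an illustration under stated assumptions, not Siebel's CDI or UAN interface: `FeederSystem`, `CustomerHub`, `apply_change`, and the rule that two records match when their e-mail addresses agree after trimming and lowercasing are all hypothetical.

```python
from typing import Dict, List


class FeederSystem:
    """Hypothetical downstream system (e.g. a call center) with its own copy."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: Dict[str, dict] = {}

    def receive(self, master_id: str, record: dict) -> None:
        self.records[master_id] = dict(record)  # store a private copy


class CustomerHub:
    """Hypothetical hub: de-dupes incoming changes, then fans them out."""

    def __init__(self, feeders: List[FeederSystem]) -> None:
        self.master: Dict[str, dict] = {}
        self.index: Dict[str, str] = {}  # match key -> master id
        self.feeders = feeders

    @staticmethod
    def match_key(record: dict) -> str:
        # Assumed matching rule: case- and whitespace-insensitive e-mail.
        return record["email"].strip().lower()

    def apply_change(self, record: dict) -> str:
        key = self.match_key(record)
        master_id = self.index.get(key)
        if master_id is None:  # no existing match: create a new master record
            master_id = f"M{len(self.master) + 1}"
            self.index[key] = master_id
            self.master[master_id] = {}
        self.master[master_id].update(record)  # merge into the single source
        for feeder in self.feeders:  # push the merged record back out
            feeder.receive(master_id, self.master[master_id])
        return master_id


call_center = FeederSystem("call_center")
erp = FeederSystem("erp")
hub = CustomerHub([call_center, erp])
a = hub.apply_change({"email": "Pat@Example.com", "address": "1 Oak Ave"})
b = hub.apply_change({"email": "pat@example.com ", "address": "9 Pine Rd"})
print(a == b, erp.records[a]["address"])  # True 9 Pine Rd
```

Real matching logic is far fuzzier than an e-mail comparison; the sketch only shows the ordering of steps, with de-duplication happening before any feeder sees the change.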

IBM, on the other hand, offers what Leon Katsnelson, program director for information management at IBM Toronto Lab, calls a “distributed optimizer.” It leaves the data wherever it is, say, in an SAP ERP application on top of an Oracle database. A query against that database, however, can be joined with results from other data sources, such as an Excel spreadsheet, an Exchange server, or a DB2 database. Changes made to the data in SAP are then reflected in external, linked data sources, Katsnelson says.
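The general idea of a federated query, leaving each source in place and joining results at query time, can be illustrated with a toy example. This is a plain hash join in a hypothetical federation function, not IBM's distributed optimizer; an in-memory SQLite table stands in for the ERP database and a CSV string stands in for the spreadsheet, and all table, column, and function names are assumptions.

```python
import csv
import io
import sqlite3

# Source 1: stands in for the relational database beneath an ERP application.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer_id TEXT, amount REAL)")
db.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("c1", 120.0), ("c2", 75.5), ("c1", 30.0)],
)

# Source 2: stands in for a spreadsheet, exported as CSV text.
SHEET = "customer_id,region\nc1,East\nc2,West\n"


def federated_totals_by_region(conn, sheet_text):
    """Hypothetical federation layer: query each source where it lives."""
    # Push the aggregation down to the database rather than copying rows out.
    totals = dict(conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"))
    # Join the two result sets on customer_id (a simple hash join).
    result = {}
    for row in csv.DictReader(io.StringIO(sheet_text)):
        cid = row["customer_id"]
        if cid in totals:
            result[row["region"]] = result.get(row["region"], 0.0) + totals[cid]
    return result


first = federated_totals_by_region(db, SHEET)
print(first)   # {'East': 150.0, 'West': 75.5}

# Nothing was copied, so a change in the source shows up on the next query.
db.execute("INSERT INTO orders VALUES ('c2', 24.5)")
second = federated_totals_by_region(db, SHEET)
print(second)  # {'East': 150.0, 'West': 100.0}
```

The design choice to illustrate is that the data never moves into a central store; a real optimizer would also decide which work to push down to each source, which this sketch does only for the single `SUM`.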

This year you’ll be hearing a lot about hubs, repositories, and master records. You’ll also hear all the vendors promise one version of the truth, but that’s only half the story. Make sure the other half includes the promise that changes to data can trigger workflow events in your applications.