No one likes data integration. It’s painstaking, hard to automate, and hard to measure in terms of business ROI. Yet it’s required for making systems work together, whether as a result of an acquisition, as part of a migration to new tools, or in an effort to consolidate existing assets.
“The first question is always, ‘What database are we going to use as our customer source?’ ” notes John Kolodziejczyk, IT director at Carlson Hotels Worldwide. Rather than keep asking -- and answering -- that question, the hospitality company devised a common data architecture, and a platform for managing it, for all its applications as part of the migration to a service-oriented architecture. Similarly, ball-bearing manufacturer GGB decided it needed a central product information hub to ensure consistent data mapping among its Oracle e-Business Suite and three aging ERP systems, rather than try to maintain a raft of point-to-point connectors, says Matthias Kenngott, IT director at GGB.
Much enterprise data is either locked away in data stores or encapsulated within applications. Traditionally, applications “know” what the data means and what the results of their manipulations mean, in essence creating a consistent data model, at least locally. As modern enterprises mix and match functions across a variety of applications, however, the data models get mixed together as well -- often without the IT developer being aware of it.
“The more you distribute the data, the more likely there will be problems,” says Song Park, director of pricing and availability technology at Starwood Hotels. The result could be what Don DePalma, president of the Common Sense Advisory consultancy, calls “frankendata”: data stitched together from so many sources that the accuracy of the results generated by the services and applications comes into question.
“There’s always a context to data. Even when a field is blank, different applications impose different assumptions about what that means,” notes Ron Schmelzer, senior analyst at SOA research company ZapThink.
Ultimately, frankendata can make a set of integrated applications or a vast web of services both unreliable and hard to repair. Many relationships must be traversed to understand not only the original data components but how they were transformed along the way. The antidote to frankendata is to provision data needed for multiple applications as a service -- incorporating contextual metadata where needed and reconciling discrepancies among distributed data sources.
The SOA imperative
SOA offers a twofold advantage: Creating services that perform oft-used functions reduces redundant development, and making application functionality available across a variety of systems through standardized interfaces and wrappers increases agility. The loosely coupled, abstracted nature of SOA has profound implications for the data that the services use, manipulate, and create.
“Do you divvy it up, or do you provide a central service?” asked Starwood Hotels’ Park when the company began its SOA effort. That question led it down a path many enterprises must travel en route to SOA: a services approach to data based on knowing what data means no matter where it comes from. “SOA raises the fact that data is heterogeneous,” Schmelzer says.
As services exchange data, the potential for mismatches and unmapped transformations grows considerably. “SOA propels this problem into the stratosphere,” Common Sense’s DePalma says. “Put together your first three- or four-way data service,” and you’ll quickly discover the pain of data management. Without an initial data-architecture effort, an SOA won’t scale across the enterprise, says Judith Hurwitz, president of Hurwitz Group.
The solution, according to analysts and consultants, is to develop a data services layer that catalogs the correct data to use and exposes its context to other services. This approach decouples the data logic from the business logic and treats data access and manipulation as a separate set of services invoked by the business processes. Without such a scheme, enterprises will find themselves with loosely coupled business processes that rely on tight data dependencies, eliminating SOA’s core benefit of loose coupling.
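As a rough illustration of that decoupling, the sketch below shows business logic asking a data services layer for customer data rather than reaching into a data store directly. It is a minimal sketch in Python, not a description of any vendor's product: the names DataService, DataServicesLayer, and booking_greeting, the in-memory customer record, and the loyalty_tier field are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class DataService:
    """One catalog entry: how to fetch a kind of data, plus its context."""
    fetch: Callable[[str], Dict[str, Any]]  # hypothetical accessor keyed by record ID
    context: Dict[str, str] = field(default_factory=dict)  # metadata describing fields


class DataServicesLayer:
    """Catalog that business services call instead of reaching into data stores."""

    def __init__(self) -> None:
        self._catalog: Dict[str, DataService] = {}

    def register(self, name: str, service: DataService) -> None:
        self._catalog[name] = service

    def get(self, name: str, key: str) -> Dict[str, Any]:
        service = self._catalog[name]
        # Return the data together with its context so callers need not guess
        # what a blank or unusual value means.
        return {"data": service.fetch(key), "context": service.context}


# Hypothetical in-memory store standing in for a real system of record.
_CUSTOMERS = {"c1": {"name": "A. Guest", "loyalty_tier": None}}

layer = DataServicesLayer()
layer.register(
    "customer",
    DataService(
        fetch=lambda key: dict(_CUSTOMERS[key]),
        context={"loyalty_tier": "None means the guest is not enrolled"},
    ),
)


def booking_greeting(customer_id: str) -> str:
    """Business logic: depends only on the layer, not on where the data lives."""
    result = layer.get("customer", customer_id)
    tier = result["data"]["loyalty_tier"]
    suffix = f" ({tier})" if tier else ""
    return f"Welcome, {result['data']['name']}{suffix}"
```

Because the greeting logic sees only the layer's interface, the customer source could be swapped for another store without touching the business process, which is the loose coupling the analysts describe.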
This effort is a change from past data integration approaches. “We used to solve data integration by imposing controls at critical choke points,” ZapThink’s Schmelzer recalls. “SOA eliminates these choke points, so I now have a data integration problem everywhere. That means every data access point has to be able to transform and manage data,” he says.
“Data integration and process integration are inexorably linked,” says Henry Morris, group vice president of integration systems at IDC. “You need to think of services to manage data. Think about the processes that affect the master data wherever it lives,” he advises.
SOA also raises concurrency issues, notes Nikhil Shah, lead architect at the Kanbay International consultancy. For example, how data changes during the process may affect the results, especially in a composite application, as old data is propagated through the process, or when multiple services access the data at different times. Shah recommends that IT implement monitoring services -- or at least services that notify other services when changes occur -- so that they can determine whether to restart the process or adjust their computations.
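One way to picture the notification approach Shah describes is a small publish-and-subscribe service that tracks a version number for each piece of data. The following Python sketch is an assumption-laden illustration, not Kanbay's design: ChangeNotifier, PricingProcess, and the "rate:123" key are invented for the example, and a real deployment would sit on messaging infrastructure rather than in-process callbacks.

```python
from typing import Callable, Dict, List

Callback = Callable[[str, int], None]


class ChangeNotifier:
    """Tracks a version per data key and tells subscribers when it changes."""

    def __init__(self) -> None:
        self._versions: Dict[str, int] = {}
        self._subscribers: Dict[str, List[Callback]] = {}

    def version(self, key: str) -> int:
        return self._versions.get(key, 0)

    def subscribe(self, key: str, callback: Callback) -> None:
        self._subscribers.setdefault(key, []).append(callback)

    def publish_change(self, key: str) -> int:
        new_version = self.version(key) + 1
        self._versions[key] = new_version
        # Notify every service that registered interest in this key.
        for callback in self._subscribers.get(key, []):
            callback(key, new_version)
        return new_version


class PricingProcess:
    """Hypothetical long-running process that read its data at start-up."""

    def __init__(self, notifier: ChangeNotifier, key: str) -> None:
        self.key = key
        self.version_read = notifier.version(key)
        self.stale = False
        notifier.subscribe(key, self._on_change)

    def _on_change(self, key: str, new_version: int) -> None:
        # The data moved on after we read it; the process can now decide
        # whether to restart or adjust its computations.
        if new_version > self.version_read:
            self.stale = True
```

The process only learns that its input is stale; whether it restarts or recomputes remains a business decision, as Shah suggests.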
Moreover, the more granular the data services, the greater the impact orchestration overhead has on processes, which could slow response time and create synchronization issues, Shah says. He advises IT to model data management requirements before a service can consume that data. Generally speaking, the more transactional the service, the more the specific data manipulation should be hard-coded into the business logic, he says.
Another SOA data issue is the “snowplow effect,” which occurs when services pass on the context about their data manipulations to subsequent services in a composite application, says Ken Rugg, vice president of data management at Progress Software, which provides caching technology for data management in SOA environments.
Publishing those transformations can help later services understand the context of the data they are working with, IDC’s Morris says. But that can also flood the system with very large data files and slow down each service. IT needs to consider carefully how much context is passed through as aggregated parameters versus limiting that metadata and having the service interface look for exceptions, Kanbay’s Shah says.
The return of master data
The rise of SOA has given vendors reason to revisit their tools to simplify data management, for both SOA and non-SOA environments. Many are now promoting MDM (master data management) tools to help ensure that applications or services use only correct, current data in the correct context. “Master data” incorporates not only the data itself but attributes, semantics, and context (metadata) needed to understand its meaning for proper use by a variety of systems. (Some vendors call these systems enterprise information integration, or EII, tools.)
Although not new, the concept was largely relegated to after-the-fact data systems such as data warehouses and business intelligence, notes Bill Swanton, research director at AMR Research. Before SOA, enterprises could largely get away without worrying about master data because most information resided in application suites, where the vendors had at least an implicit, internal master data architecture in place. IT could thus focus just on transmitting processed or raw data between application suites -- by creating connectors -- and allowing the applications to handle most of the contextual issues, he notes.
SOA’s many-to-many architecture no longer allows IT to leave the problem to application vendors and to limited integration conduits. Even non-SOA environments, though, benefit from moving from the one-off approach of creating connectors to a more rationalized data architecture that makes integration simpler, Swanton says.
Some providers -- including IBM, Informatica, Oracle, and Siperian -- approach the issue as an operational data warehouse, providing one or more data hubs that act as a trusted broker: Services access the hubs, which draw both on stores of cleansed data and on services that generate validated data from other applications. These emulate the hub-and-spoke architecture common to traditional enterprise environments. Others -- such as BEA Systems, i2 Technologies, and Xcalia -- approach the issue at a more federated level to better mirror the loosely coupled, abstracted nature of an SOA.
Analysts and consultants warn that today’s technology is very immature and at best can help only specific data management processes. “There is no silver bullet,” says Shawn McIntosh, senior manager at consultancy AGSI. For example, Starwood’s Park notes that his IT group is hopeful that IBM’s planned Systems Integration Bus will provide a way to manage the data services in the hotelier’s SOA. “But we can’t wait for the tools to come out,” he says.
Many of the data hubs currently offered are geared to one data subject, such as customer or product information. That’s fine as an initial building block; later, however, IT will have to generalize the hub or work with a federation of specific data hubs, says Satish Krishnaswamy, senior director of MDM business at i2. “We won’t ever get to one single hub, so IT should instead work toward a standard canonical [hierarchical] view” of data across its sources, IDC’s Morris says.
To make the scope manageable, IT organizations generally define the rules and context for one subject area and then extend the system out to other subject areas over a period of time. That’s what Carlson Hotels is doing, starting with the customer-oriented hub IBM acquired from DWL. According to Carlson’s Kolodziejczyk, however, the hospitality company is not yet sure whether it will extend that hub to include product data or use the product hub IBM acquired from Trigo.
Deciding whether to start with a subject-specific system -- such as product information within SCM -- or a generalized system depends on how targeted the integration efforts are to specific application suites. It may make more sense to start with a subject-specific hub if your focus is on interactions with an ERP or SCM system, whereas a generic hub makes more sense if your focus is on an SOA in which services interact with a wide variety of applications.
Building a data architecture
MDM tools can help, but they do little good if the enterprise doesn’t understand its data. “I see a fair amount of hype around the concept of master data management,” says Fred Cummins, a fellow at EDS. Because centralized data stores typically deal with after-the-fact results, not with states and transactions, the more an MDM system looks like a traditional data warehouse or master database, the less likely it is to meet the needs of a transactional system, whether in a traditional or SOA environment, Cummins says. “It’s unrealistic to expect that there is one master database that everything reads or feeds. Some of the data is transactional,” concurs Paul Hernacki, CTO of consultancy Definition 6.
For an SOA, MDM tools that simply repackage EAI tools are not very helpful, Cummins says. That’s because an SOA should be driven by business processes, whereas EAI typically focuses on connecting applications together without worrying about the underlying data context for each. Even for traditional integration efforts, “you can’t just put in middleware and off you go,” adds Brian Babineau, an information integration analyst at Enterprise Strategy Group.
“Primarily, it’s a design issue,” echoes ZapThink’s Schmelzer. “We have great tools for databases, messaging, transformation, etc.,” to implement the design, he adds. Designing the architecture and the specific services correctly requires that developers understand all the data used and generated by services and the applications they interact with -- a labor-intensive process.
That’s why IT needs a commonly accessible set of data services or at least data mappings. “At some point this will have to be formalized as a repository,” Common Sense’s DePalma says. Critical for an SOA, this approach is also very useful in traditional environments, he adds.
With those mappings created, IT can then focus on building the connectors or services that implement them. IT must understand which mappings should be available to multiple services and applications -- and thus implemented as separate processes -- and which are endemic to specific business logic and should be encapsulated with that business logic, consultant Hurwitz says.
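A shared mapping repository of the kind described above might look something like the following Python sketch, in which each source system registers one mapping into a common canonical record. Everything here is hypothetical: the MappingRepository class, the "erp_a" and "erp_b" source names, and field names such as ITEM_NO and sku are invented for illustration and do not reflect GGB's or any vendor's actual schema.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]
Mapping = Callable[[Record], Record]


class MappingRepository:
    """Holds one mapping per source system into the canonical product record."""

    def __init__(self) -> None:
        self._mappings: Dict[str, Mapping] = {}

    def register(self, source: str, mapping: Mapping) -> None:
        self._mappings[source] = mapping

    def to_canonical(self, source: str, record: Record) -> Record:
        # Raises KeyError for an unregistered source, which surfaces
        # unmapped data instead of silently passing it through.
        return self._mappings[source](record)


def from_erp_a(record: Record) -> Record:
    """Hypothetical legacy ERP: padded string IDs, upper-case descriptions."""
    return {
        "product_id": record["ITEM_NO"].strip(),
        "description": record["DESC"].strip().lower(),
    }


def from_erp_b(record: Record) -> Record:
    """Hypothetical second ERP: numeric SKUs, mixed-case names."""
    return {
        "product_id": str(record["sku"]),
        "description": record["name"].strip().lower(),
    }


repo = MappingRepository()
repo.register("erp_a", from_erp_a)
repo.register("erp_b", from_erp_b)
```

Mappings registered this way are the ones meant to be shared across services; a transformation that matters to only one business process would stay inside that process's own logic, in line with Hurwitz's distinction.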