No one likes data integration. It’s painstaking, hard to automate, and hard to measure in terms of business ROI. Yet it’s required for making systems work together, whether as a result of an acquisition, as part of a migration to new tools, or in an effort to consolidate existing assets.
“The first question is always, ‘What database are we going to use as our customer source?’ ” notes John Kolodziejczyk, IT director at Carlson Hotels Worldwide. Rather than keep asking -- and answering -- that question, the hospitality company devised a common data architecture, and a platform for managing it, for all its applications as part of the migration to a service-oriented architecture. Similarly, ball-bearing manufacturer GGB decided it needed a central product information hub to ensure consistent data mapping among its Oracle e-Business Suite and three aging ERP systems, rather than try to maintain a raft of point-to-point connectors, says Matthias Kenngott, IT director at GGB.
Much enterprise data is either locked away in data stores or encapsulated within applications. Traditionally, applications “know” what the data means and what the results of their manipulations mean, in essence creating a consistent data model, at least locally. As modern enterprises mix and match functions across a variety of applications, however, the data models get mixed together as well -- often without the IT developer being aware of it.
“The more you distribute the data, the more likely there will be problems,” says Song Park, director of pricing and availability technology at Starwood Hotels. The result could be what Don DePalma, president of the Common Sense Advisory consultancy, calls “frankendata,” calling into question the accuracy of the results generated by the services and applications.
“There’s always a context to data. Even when a field is blank, different applications impose different assumptions about what that means,” notes Ron Schmelzer, senior analyst at SOA research company ZapThink.
Ultimately, frankendata can make a set of integrated applications or a vast web of services both unreliable and hard to repair. Many relationships must be traversed to understand not only the original data components but how they were transformed along the way. The antidote to frankendata is to provision data needed for multiple applications as a service -- incorporating contextual metadata where needed and reconciling discrepancies among distributed data sources.
The SOA imperative
A twofold advantage of SOA is that creating services that perform oft-used functions reduces redundant development -- and increases agility by making application functionality available across a variety of systems using standardized interfaces and wrappers. The loosely coupled, abstracted nature of SOA has profound implications for the data that the services use, manipulate, and create.
“Do you divvy it up, or do you provide a central service?” asked Starwood Hotels’ Park when the company began its SOA effort. That question led it down a path many enterprises must travel en route to SOA: a services approach to data based on knowing what data means no matter where it comes from. “SOA raises the fact that data is heterogeneous,” Schmelzer says.
As services exchange data, the potential for mismatches and unmapped transformations grows considerably. “SOA propels this problem into the stratosphere,” Common Sense’s DePalma says. “Put together your first three- or four-way data service,” and you’ll quickly discover the pain of data management. Without an initial data-architecture effort, an SOA won’t scale across the enterprise, says Judith Hurwitz, president of Hurwitz Group.
The solution, according to analysts and consultants, is to develop a data services layer that catalogs the correct data to use and exposes its context to other services. This approach decouples the data logic from the business logic and treats data access and manipulation as a separate set of services invoked by the business processes. Without such a scheme, enterprises will find themselves with loosely coupled business processes that rely on tight data dependencies, eliminating SOA’s core benefit of loose coupling.
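The decoupling the analysts describe can be illustrated with a minimal sketch, with all names hypothetical: business logic codes against a stable data service interface and never touches the underlying stores directly, so reconciliation logic lives in one place.

```python
# Minimal sketch of a data services layer (hypothetical names throughout).
# Business logic depends only on the CustomerDataService interface; the
# concrete service hides which store a record comes from and reconciles
# field-level differences before returning a single view.

from abc import ABC, abstractmethod

class CustomerDataService(ABC):
    """Stable contract the business processes code against."""
    @abstractmethod
    def get_customer(self, customer_id: str) -> dict: ...

class ReconcilingCustomerService(CustomerDataService):
    """One possible implementation: merge two source systems,
    preferring the CRM record when both define a field."""
    def __init__(self, crm_store: dict, billing_store: dict):
        self.crm = crm_store
        self.billing = billing_store

    def get_customer(self, customer_id: str) -> dict:
        merged = dict(self.billing.get(customer_id, {}))
        merged.update(self.crm.get(customer_id, {}))  # CRM wins on conflicts
        return merged

# Business logic invokes the service, not the databases:
def loyalty_tier(svc: CustomerDataService, customer_id: str) -> str:
    nights = svc.get_customer(customer_id).get("nights_stayed", 0)
    return "gold" if nights >= 50 else "standard"
```

Because `loyalty_tier` depends only on the interface, the stores behind it can be consolidated or swapped without touching the business logic, which is the loose coupling the data services layer is meant to preserve.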
This effort is a change from past data integration approaches. “We used to solve data integration by imposing controls at critical choke points,” ZapThink’s Schmelzer recalls. “SOA eliminates these choke points, so I now have a data integration problem everywhere. That means every data access point has to be able to transform and manage data,” he says.
“Data integration and process integration are inexorably linked,” says Henry Morris, group vice president of integration systems at IDC. “You need to think of services to manage data. Think about the processes that affect the master data wherever it lives,” he advises.
SOA also raises concurrency issues, notes Nikhil Shah, lead architect at the Kanbay International consultancy. For example, data that changes midprocess can affect the results, especially in a composite application, whether because old data is propagated through the process or because multiple services access the data at different times. Shah recommends that IT implement monitoring services -- or at least services that notify other services when changes occur -- so that they can determine whether to restart the process or adjust their computations.
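A lightweight way to realize the notification services Shah describes is a publish/subscribe registry. The sketch below, with hypothetical names, lets a consuming service register a callback on a record and decide for itself whether a change invalidates its in-flight work.

```python
# Sketch of a data-change notification service (hypothetical names).
# Services that consumed a record subscribe to it; when the master copy
# changes, each subscriber is told which field changed and can decide
# whether to restart its process or adjust its computation.

from collections import defaultdict
from typing import Callable

class ChangeNotifier:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, record_id: str,
                  callback: Callable[[str, object], None]) -> None:
        self._subscribers[record_id].append(callback)

    def publish(self, record_id: str, field: str, new_value: object) -> None:
        for cb in self._subscribers[record_id]:
            cb(field, new_value)

# A consuming service marks itself stale instead of blindly reusing old data:
class RateCalculator:
    def __init__(self, notifier: ChangeNotifier, record_id: str):
        self.stale = False
        notifier.subscribe(record_id, self._on_change)

    def _on_change(self, field: str, new_value: object) -> None:
        if field == "room_rate":  # only rate changes invalidate this service
            self.stale = True
```

The point of the design is that the decision to restart or recompute stays with the consumer, which knows which fields actually matter to its result.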
Moreover, the more granular the data services, the greater the impact orchestration overhead has on processes, which could slow response time and create synchronization issues, Shah says. He advises IT to model a service's data management requirements before allowing it to consume that data. Generally speaking, the more transactional the service, the more the specific data manipulation should be hard-coded into the business logic, he says.
Another SOA data issue is the “snowplow effect,” which occurs when services pass on the context about their data manipulations to subsequent services in a composite application, says Ken Rugg, vice president of data management at Progress Software, which provides caching technology for data management in SOA environments.
Publishing those transformations can help later services understand the context of the data they are working with, IDC’s Morris says. But that can also flood the system with very large data files and slow down each service. IT needs to consider carefully how much context is passed through as aggregated parameters versus limiting that metadata and having the service interface look for exceptions, Kanbay’s Shah says.
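One hedge against the snowplow effect is to cap how much transformation history travels with the payload. The sketch below, with hypothetical names and an arbitrary cap, keeps only the most recent provenance entries and flags truncation so a downstream service knows to look up the earlier history elsewhere.

```python
# Sketch of a data envelope that carries transformation context
# (hypothetical names). Provenance is capped at MAX_CONTEXT entries so
# the metadata can't snowball; a "truncated" flag tells later services
# that earlier history must be retrieved from a registry instead.

MAX_CONTEXT = 3

def new_envelope(payload) -> dict:
    return {"payload": payload, "provenance": [], "truncated": False}

def apply_transform(envelope: dict, step_name: str, transform) -> dict:
    payload = transform(envelope["payload"])
    provenance = envelope["provenance"] + [step_name]
    truncated = envelope["truncated"] or len(provenance) > MAX_CONTEXT
    return {
        "payload": payload,
        "provenance": provenance[-MAX_CONTEXT:],  # keep only recent steps
        "truncated": truncated,
    }
```

This is the tradeoff Shah describes: later services still see enough context to interpret the data, but the aggregated metadata cannot grow without bound.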
The return of master data
The rise of SOA has given vendors reason to revisit their tools to simplify data management, for both SOA and non-SOA environments. Many are now promoting MDM (master data management) tools to help ensure that applications or services use only correct, current data in the correct context. “Master data” incorporates not only the data itself but attributes, semantics, and context (metadata) needed to understand its meaning for proper use by a variety of systems. (Some vendors call these systems enterprise information integration, or EII, tools.)
Although not new, the concept was largely relegated to after-the-fact data systems such as data warehouses and business intelligence, notes Bill Swanton, research director at AMR Research. Before SOA, enterprises could largely get away without worrying about master data because most information resided in application suites, where the vendors had at least an implicit, internal master data architecture in place. IT could thus focus just on transmitting processed or raw data between application suites -- by creating connectors -- and allowing the applications to handle most of the contextual issues, he notes.
SOA’s many-to-many architecture no longer allows IT to leave the problem to application vendors and to limited integration conduits. Even non-SOA environments, though, benefit from moving from the one-off approach of creating connectors to a more rationalized data architecture that makes integration simpler, Swanton says.
Some providers -- including IBM, Informatica, Oracle, and Siperian -- approach the issue as an operational data warehouse, providing one or more data hubs that services access both from stores of cleansed data and from services that generate validated data from other applications as a trusted broker. These emulate the hub-and-spoke architecture common to traditional enterprise environments. Others -- such as BEA Systems, i2 Technologies, and Xcalia -- approach the issue at a more federated level to better mirror the loosely coupled, abstracted nature of an SOA.
Analysts and consultants warn that today’s technology is very immature and at best can help only specific data management processes. “There is no silver bullet,” says Shawn McIntosh, senior manager at consultancy AGSI. For example, Starwood’s Park notes that his IT group is hopeful that IBM’s planned Service Integration Bus will provide a way to manage the data services in the hotelier’s SOA. “But we can’t wait for the tools to come out,” he says.
Many of the data hubs currently offered are geared to one data subject, such as customer or product information. That’s fine as an initial building block; later, however, IT will have to generalize the hub or work with a federation of specific data hubs, says Satish Krishnaswamy, senior director of MDM business at i2. “We won’t ever get to one single hub, so IT should instead work toward a standard canonical [hierarchical] view” of data across its sources, IDC’s Morris says.
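The canonical view Morris recommends can be pictured as a per-source mapping into one shared schema. In the sketch below, all field names and source schemas are hypothetical: each source system gets a mapping function, and consuming services see only the canonical shape.

```python
# Sketch of a canonical product view across heterogeneous sources
# (hypothetical schemas). Each source gets a mapping function; services
# consume only the canonical shape and never see source-specific fields.

def from_erp(record: dict) -> dict:
    return {
        "sku": record["ITEM_NO"],
        "description": record["DESC_TXT"].strip(),
        "unit_price": float(record["PRICE_CENTS"]) / 100,
    }

def from_legacy(record: dict) -> dict:
    return {
        "sku": record["part"],
        "description": record["name"].strip(),
        "unit_price": float(record["price"]),
    }

CANONICAL_MAPPERS = {"erp": from_erp, "legacy": from_legacy}

def to_canonical(source: str, record: dict) -> dict:
    return CANONICAL_MAPPERS[source](record)
```

Adding a new source means writing one more mapper against the canonical schema, rather than a connector to every existing system, which is the argument for the canonical view over point-to-point mappings.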
To make the scope manageable, IT organizations generally define the rules and context for one subject area and then extend the system out to other subject areas over a period of time. That’s what Carlson Hotels is doing, starting with the customer-oriented hub IBM acquired from DWL. According to Carlson’s Kolodziejczyk, however, the hospitality company is not yet sure whether it will extend that hub to include product data or use the product hub IBM acquired from Trigo.
Deciding whether to start with a subject-specific system -- such as product information within SCM -- or a generalized system depends on how targeted the integration efforts are to specific application suites. It may make more sense to start with a subject-specific hub if your focus is on interactions with an ERP or SCM system, whereas a generic hub makes more sense if your focus is on an SOA in which services interact with a wide variety of applications.
Building a data architecture
MDM tools can help, but they do little good if the enterprise doesn’t understand its data. “I see a fair amount of hype around the concept of master data management,” says Fred Cummins, a fellow at EDS. Because centralized data stores typically deal with after-the-fact results, not with states and transactions, the more an MDM system looks like a traditional data warehouse or master database, the less likely it meets the needs of a transactional system, whether in a traditional or SOA environment, Cummins says. “It’s unrealistic to expect that there is one master database that everything reads or feeds. Some of the data is transactional,” concurs Paul Hernacki, CTO of consultancy Definition 6.
For an SOA, MDM tools that simply repackage EAI tools are not very helpful, Cummins says. That’s because an SOA should be driven by business processes, whereas EAI typically focuses on connecting applications together without worrying about the underlying data context for each. Even for traditional integration efforts, “you can’t just put in middleware and off you go,” adds Brian Babineau, an information integration analyst at Enterprise Strategy Group.
“Primarily, it’s a design issue,” echoes ZapThink’s Schmelzer. “We have great tools for databases, messaging, transformation, etc.,” to implement the design, he adds. Designing the architecture and the specific services correctly requires that developers understand all the data used and generated by services and the applications they interact with -- a labor-intensive process.
That’s why IT needs a commonly accessible set of data services or at least data mappings. “At some point this will have to be formalized as a repository,” Common Sense’s DePalma says. Critical for an SOA, this approach is also very useful in traditional environments, he adds.
With those mappings created, IT can then focus on building the connectors or services that implement them. IT must understand which mappings should be available to multiple services and applications -- and thus implemented as separate processes -- and which are endemic to specific business logic and should be encapsulated with that business logic, consultant Hurwitz says.
Many enterprises have avoided such data architecture efforts because there’s no obvious ROI, notes Common Sense’s DePalma. Some remember earlier-generation efforts such as custom data dictionary creation, which also involved understanding the organization’s data architecture; by the time they were completed, they were already outdated. Fortunately, IT can approach the data understanding incrementally, creating the rules and metadata around the information used for specific applications’ or services’ needs, says Marcia Kaufman, partner at Hurwitz Group. Over time, the enterprise will build up a complete data architecture. “It’s a long-term journey,” says Hurwitz.
That data architecture will typically include multiple data models, each oriented to a specific type of subject or process, notes Paul Patrick, chief architect at BEA Systems. That actually helps IT by allowing the data architecture to be developed in stages, plus it limits the mapping required between data models. (A unified data model must account for all possible mappings, whereas a federated model does not.)
Furthermore, IT should concentrate on dealing with exceptions, says ZapThink’s Schmelzer. For example, IT should develop services that check for data that are out of normal bounds, rather than try to develop an enterprisewide ontology that maps out every possible state or relationship, he says.
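Schmelzer's exception-first approach might look like the following sketch, in which the field names and bounds are hypothetical: rather than modeling every valid state, a checking service only reports values that fall outside agreed limits.

```python
# Sketch of an exception-checking data service (hypothetical rules).
# Instead of an enterprisewide ontology, each field gets a simple bounds
# rule; only violations are reported, and clean records pass untouched.

BOUNDS = {
    "room_rate": (0, 10000),  # dollars per night
    "occupancy": (0, 20),     # guests per room
}

def find_exceptions(record: dict) -> list:
    problems = []
    for field, (low, high) in BOUNDS.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not (low <= value <= high):
            problems.append(f"{field}: {value} outside [{low}, {high}]")
    return problems
```

The rules table stays small because it only encodes what "abnormal" means, not every relationship the data can legally have, which is what makes this approach cheaper than a full ontology.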
Ultimately, the enterprise should build up layers of data services in which master data is distributed, says William McKnight, senior vice president of data warehousing at consultancy Conversion Services International, although the infrastructure and tools to deliver on this goal aren’t yet mature.
Roll up your sleeves
Provisioning data sources as services across an organization is a monster undertaking. For a traditional integration effort, it means understanding the context within each application and how data is transformed for delivery to other apps. For an SOA, it requires understanding the multiple relationships and dependencies data can have with various business processes. “There are so many variables here,” notes Common Sense’s DePalma.
Analysts and consultants agree that this complexity requires both an upfront investment in modeling data architecture and an ongoing effort to systematically think through data dependencies and context. Discovering the data models and relationships among your systems to create the mappings is about 70 percent of the effort in an SOA’s data architecture, says IDC’s Morris. At GGB, IT director Kenngott says the modeling and discovery effort was about 30 percent of the data-integration work within the company’s ERP consolidation project.
That initial push is well worth it, argues Starwood’s Park. “Otherwise, you can get pretty far along with your project and discover that you have 10 fields that you don’t need, 10 that you do but didn’t know when you designed the service, and five that are different than you thought. When you have a complex system with hundreds of services, these interfaces have to be nailed down.”
In most organizations, the tough slog of codifying interfaces and reconciling distributed data models is long overdue. But today, with the majority of large organizations pushing ahead with some sort of SOA initiative, the natural inclination to avoid this ugliest of hairballs can no longer be sustained. “The problem is too big to sweep under the rug any more,” Conversion Services’ McKnight says.