Enterprises have always been concerned with data quality and integration. But the interest in improving data and content management is clearly on the rise, as companies are increasingly focusing on unifying their enterprisewide data and on designing architectures to maximize the usefulness and accessibility of that data.
The reasons are at least twofold. First, the costs of error-ridden, inconsistent, and obsolete data are high, in terms of slowing business processes and hindering automation. Second, business leaders are keen to take more information into account -- structured and unstructured, from transactional and content systems alike -- when making decisions, and too much information remains locked away in silos.
For many large companies, a data-centric architecture starts with rationalizing the “master data” -- the identities and attributes of customers, products, employees, and other core reference data -- at the heart of the business. In a global enterprise, customer or product data is typically spread across dozens, even hundreds, of implementations of CRM, ERP, and other systems, often from different vendors.
Each set of data is typically tailored to a specific business need -- engineering, sales, or marketing -- and location. The result, from the top-down view, is a sea of fragmented data that leads inevitably to faulty BI.
The emerging class of master data management solutions from Oracle, SAP, Siebel, and other enterprise application vendors attempts to bring order to this chaos. Oracle’s Enterprise Data Hubs, for example, combine a publish-and-subscribe mechanism, process automation based on configurable rules, and a knowledge base that helps data managers reconcile differences among source systems. Some solutions, such as Siebel’s, throw in business analytics capabilities. But all master data management solutions aim to create a canonical master data set that gets pushed to all kinds of data repositories -- mainframes, transactional systems, data warehouses -- throughout the organization.
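The publish-and-subscribe pattern at the core of these hubs can be sketched in a few lines. The class and method names below (MasterDataHub, subscribe, publish) are illustrative inventions, not any vendor's actual API; the point is simply that downstream repositories register with the hub, and the hub pushes each canonical record to all of them.

```python
class MasterDataHub:
    """Hypothetical hub that pushes canonical master records to subscribers."""

    def __init__(self):
        self.subscribers = []  # downstream systems: CRM, ERP, data warehouse...

    def subscribe(self, system):
        """Register a downstream repository to receive canonical records."""
        self.subscribers.append(system)

    def publish(self, record):
        """Push one canonical record to every subscribed repository."""
        for system in self.subscribers:
            system.apply(record)


class Repository:
    """Stand-in for a downstream system such as a CRM instance or warehouse."""

    def __init__(self, name):
        self.name = name
        self.records = {}

    def apply(self, record):
        # Upsert, keyed on the hub's canonical identifier
        self.records[record["id"]] = record


hub = MasterDataHub()
crm, warehouse = Repository("CRM"), Repository("warehouse")
hub.subscribe(crm)
hub.subscribe(warehouse)
hub.publish({"id": "CUST-001", "name": "Acme Corp", "segment": "enterprise"})
```

After the publish call, both repositories hold the same canonical customer record, which is the synchronization guarantee the hub model is after.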
The goal is not merely to synchronize data across systems but to improve data quality and to deliver as a service accurate, consistent data to transactional and operational systems. “It isn’t simply a matter of connecting the plumbing between many different data sources,” says Robert Shimp, vice president of technology marketing at Oracle. “There’s a quality function that has to be applied, to clean, dedupe, and reconcile all of this information. You don’t just need data; you need services-based information.”
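The "clean, dedupe, and reconcile" step Shimp describes can be illustrated with a toy example. The matching rule here (equality on a normalized company name) and the field names are assumptions chosen for brevity; real MDM tools apply far richer fuzzy-matching rules and knowledge bases.

```python
def normalize(name):
    """Cheap cleaning: case-fold, strip punctuation and legal suffixes."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch == " ")
    suffixes = {"inc", "corp", "llc", "ltd"}
    return " ".join(w for w in cleaned.split() if w not in suffixes)


def reconcile(records):
    """Collapse records that match on normalized name into one canonical row."""
    canonical = {}
    for rec in records:
        key = normalize(rec["name"])
        merged = canonical.setdefault(key, {})
        for field, value in rec.items():
            merged.setdefault(field, value)  # first non-missing value wins
    return list(canonical.values())


# Two source systems describe the same customer slightly differently:
sources = [
    {"name": "Acme Corp.", "phone": "555-0100"},
    {"name": "ACME, Inc.", "address": "1 Main St"},
]
merged = reconcile(sources)
# merged is a single record carrying both the phone and the address
```

The design point is that deduplication is not deletion: reconciling merges complementary attributes from each source into one record, which is what turns raw synchronized data into the "services-based information" Shimp refers to.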
In addition to mastering the master data, enterprises are also beginning to bridge the gaps between structured and unstructured data sources, as new technologies and techniques -- especially XML, SOAs, and enterprise search -- are making it easier and less expensive to do so. IBM’s WebSphere Information Integrator, for example, can combine SQL-, object-, and content-oriented access methods -- as well as enterprise search techniques -- to perform queries across relational databases, XML stores, mainframes, file servers, content management systems, even e-mail systems.
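A federated query of this kind can be sketched as a fan-out over source adapters. The adapter interface below is invented for illustration and is not WebSphere Information Integrator's API; it only shows the shape of the idea -- each heterogeneous back end sits behind a uniform search method, and a federator merges the hits.

```python
class SourceAdapter:
    """Wraps one back end (relational DB, content store, e-mail system)
    behind a uniform search(term) interface. Documents are stand-ins for
    whatever the real source would return."""

    def __init__(self, name, documents):
        self.name = name
        self.documents = documents

    def search(self, term):
        # Case-insensitive substring match as a toy query predicate
        return [(self.name, doc)
                for doc in self.documents
                if term.lower() in doc.lower()]


def federated_search(adapters, term):
    """Fan one query out to every source and merge the results."""
    hits = []
    for adapter in adapters:
        hits.extend(adapter.search(term))
    return hits


sources = [
    SourceAdapter("crm_db", ["Acme Corp renewal", "Globex quote"]),
    SourceAdapter("email", ["Re: Acme trouble ticket #4431"]),
]
hits = federated_search(sources, "acme")
# hits include matches from both the structured and the unstructured source
```

The same fan-out structure is what lets one query span relational tables, content repositories, and mailboxes without first loading everything into a single store.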
According to Eric Sall, IBM Software Group’s program director of information integration, the benefits go beyond the obvious operational advantages, such as a user of a CRM application being able to view an open trouble ticket in the customer service system. The pervasive, on-the-fly querying capabilities of enterprise search also extend the capabilities of traditional BI to include real-time data not yet loaded into the data warehouse.