Data quality can be thought of as the degree to which data is accurate, timely, valid, and consistent, that is, how faithfully it portrays the real-world situation it describes. Put simply, does a customer named Robert Smith really live at 123 Main Street with telephone number 555-1212, bank account X, and preference Y?
The extent to which data can be trusted has often set apart successful projects from failed ones. Inaccurate data creates missed opportunities, customer satisfaction issues, and a host of other problems, ranging from the merely annoying to the genuinely damaging. But the stakes for data quality have never been higher than they will be in 2018, when the General Data Protection Regulation (GDPR) comes into effect in Europe and its impact likely spills over into the US.
GDPR
Any business with a single customer, employee, or other party living in the EU will be required to explain what data it holds; locate it; correct it; explain where it came from; and, if requested, delete it, or face a potentially huge fine. With that, data quality moves from a “nice to have” marketing component to a “keep us out of jail” business requirement.
Myriad capabilities make up data quality
A number of capabilities around good data management are often lumped together as “data quality.” The overall discipline is referred to as master data management (MDM), which should include governance, profiling, matching, enrichment, integration, and workflows. Tightly related to these are data cataloging, metadata management, and hierarchy management. Software vendors have been carving up this space into niches for years, vying for invested capital and new clients all over the world.
Today’s GDPR challenge collapses all of those disciplines into a central requirement: master the data for any party (a generic term for customer, supplier, patient, organization, etc.) in a single, cohesive package. A complete record, in this case the master record of an individual’s information, needs to have all of those data disciplines applied to it in unison, and be managed only by those with the rights and privileges to correct, update, or delete that information.
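As a rough illustration of what such a consolidated record could look like, here is a minimal sketch in Python. The field names (party_id, consent_basis, stewards, and so on) are hypothetical assumptions, not taken from any particular MDM product.

```python
# A minimal sketch of a consolidated party master record; field names are
# illustrative only, not any vendor's data model.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SourceReference:
    """Where a given attribute value originally came from (data lineage)."""
    system: str          # e.g. "crm", "billing", "mortgage_core"
    record_id: str       # identifier of the record in that source system
    attribute: str       # which attribute this source supplied


@dataclass
class PartyMasterRecord:
    """One golden record per party (customer, supplier, patient, ...)."""
    party_id: str
    full_name: str
    address: Optional[str] = None
    phone: Optional[str] = None
    email: Optional[str] = None
    consent_basis: Optional[str] = None   # why the data is held (lawful basis)
    sources: List[SourceReference] = field(default_factory=list)
    stewards: List[str] = field(default_factory=list)  # who may correct or delete it


# Example: the "Robert Smith" record from the introduction, stitched together
# from two hypothetical source systems.
robert = PartyMasterRecord(
    party_id="P-000123",
    full_name="Robert Smith",
    address="123 Main Street",
    phone="555-1212",
    consent_basis="contract",
    sources=[
        SourceReference("crm", "C-77", "address"),
        SourceReference("billing", "B-9041", "phone"),
    ],
    stewards=["data.steward@bank.example"],
)
```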
Shouldn’t it be easy?
This may sound simple, but have you ever thought deeply about how difficult it would be for an organization with which you interact—your bank, for example—to tell you exactly what it knows about you, where it got it, why it has it, and to correct or delete it at your request? Would you know whom to call to get the data? Would that person know where to look for it? The truth is, data about you sits in myriad systems, some of which may not even be owned by the bank, only accessed by it, with copies and duplicates stashed away everywhere from on-premises databases to any number of cloud environments.
To get this right, the various data disciplines can be grouped under MDM and brought together. Your contact information is integrated from a source system that has been profiled for the type of data it holds. The record is then matched and merged with its duplicates for completeness, and perhaps enriched with additional information, such as address corrections and geocoding. The record is subject to a data governance process, under which it can be modified only through workflows by individuals or systems at the bank with permission to curate that data. Finally, the record is cataloged, and the data describing what that system holds (i.e., metadata) is managed and organized into a meaningful hierarchy so it can be understood: for example, your account is in your state, under your branch, within your account type.
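To make those steps concrete, the following sketch walks a couple of source records through a naive match, merge, and enrich pass. It assumes a simplistic name-plus-postcode match key and a placeholder enrichment step; real MDM tooling would use proper address standardization and probabilistic or ML-based matching.

```python
# A toy match/merge/enrich pipeline over plain dictionaries.
from collections import defaultdict


def match_key(record: dict) -> str:
    """Naive blocking key: normalized name plus postal code."""
    name = record.get("full_name", "").strip().lower()
    postcode = record.get("postcode", "").replace(" ", "").lower()
    return f"{name}|{postcode}"


def merge(records: list) -> dict:
    """Merge duplicates for completeness: keep the first non-empty value per field."""
    golden = {}
    for rec in records:
        for field_name, value in rec.items():
            if value and not golden.get(field_name):
                golden[field_name] = value
    return golden


def enrich(record: dict) -> dict:
    """Placeholder enrichment step, e.g. address fixes or geocoding via a provider."""
    if "address" in record and "geocode" not in record:
        record["geocode"] = None  # a real pipeline would call a geocoding service here
    return record


def consolidate(source_records: list) -> list:
    """Group source records by match key, then merge and enrich each group."""
    groups = defaultdict(list)
    for rec in source_records:
        groups[match_key(rec)].append(rec)
    return [enrich(merge(group)) for group in groups.values()]


# Two records for the same person from different systems collapse into one.
sources = [
    {"full_name": "Robert Smith", "postcode": "12345", "address": "123 Main Street"},
    {"full_name": "robert smith", "postcode": "12345", "phone": "555-1212"},
]
print(consolidate(sources))
```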
Given that consultancies, software vendors, and data analysts have sought to break these disciplines down into chunks they can claim expertise in, it’s small wonder that the components of data quality have devolved into projects that take years and often fail to realize their full value.
Going forward
What is needed today is a return to the basic tenets of MDM, where the myriad other subcapabilities are thought of as parts of the whole. Managing the data from the many source systems that hold it, understanding where it came from (data lineage), and auditing who has changed it are absolute requirements. Curating that information, with enrichment and a collaborative approach to data governance, will empower organizations to treat it as GDPR requires, while at the same time enabling new levels of customer service.
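One way to picture the lineage and audit requirement is an append-only log of change events kept alongside the master data, as in the hypothetical sketch below; the event fields and example values are assumptions for illustration only.

```python
# A minimal sketch of an audit trail that answers "who changed what, when,
# and from which source system" for a master record.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One entry in the audit trail for a master record."""
    party_id: str
    attribute: str
    old_value: Optional[str]
    new_value: Optional[str]
    changed_by: str          # user or system that made the change
    source_system: str       # where the new value originated (lineage)
    changed_at: datetime


audit_log: List[ChangeEvent] = []


def record_change(party_id, attribute, old, new, changed_by, source_system):
    """Append an audit entry rather than overwriting history."""
    audit_log.append(ChangeEvent(
        party_id=party_id,
        attribute=attribute,
        old_value=old,
        new_value=new,
        changed_by=changed_by,
        source_system=source_system,
        changed_at=datetime.now(timezone.utc),
    ))


# Example: a steward corrects a phone number that originated in the billing system.
record_change("P-000123", "phone", "555-1212", "555-9876",
              changed_by="data.steward@bank.example", source_system="billing")
```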
To the extent an organization can get back to the basics of mastering its data, it will not only meet the new requirements imposed by GDPR but also treat that data with the respect it deserves. If “data is the new oil,” an asset that must be managed as such, then data quality comprises the tenets by which it can be handled properly for the benefit of all parties involved.