Proper data governance holds that every data point collected and every record stored should have an identifiable origin and a stated purpose. In the mainframe and client-server eras, enforcing these practices wasn’t always easy, but it was at least a plausible goal for a data governance team. When design documentation for loading interfaces and adequate metadata management were missing, reverse engineering the data movement programs usually yielded reasonably reliable lineage for tracing the origin of data. Purpose, meanwhile, was typically enforced by the hierarchical dependency between applications and data -- the now-outdated view that data belonged to an application.
Knowing where data comes from, and why it is being stored, matters chiefly because it sets the parameters for who should be able to access that data and what can be done with it.
A bank customer, for example, may explicitly agree to receive electronic communications about transactions but opt out of (or never opt in to) promotional emails. His email address should therefore not be available to marketing, and the purpose of that address should be recorded accordingly in the bank’s systems of record. Agreement can also be implicit, however, and this is where ethics comes into play. The same customer would typically not be surprised that his branch manager can view his monthly salary deposit, but he may not appreciate a phone call from a credit card rep at the same bank who explicitly refers to his income. Does this mean the bank should refrain from pitching credit cards to its highest-income customers? This is where we enter ethics land, where nothing is all black or all white...
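To make the idea concrete, here is a minimal sketch of purpose-based access: consent is recorded per purpose alongside the data, and any retrieval must state its purpose. All names and structures here are hypothetical, not drawn from any real bank system.

```python
# Hypothetical sketch: consent flags captured per purpose travel with the data,
# so the stated purpose can be enforced at access time.

customer = {
    "email": "jane@example.com",
    "consents": {
        "transactional_notifications": True,   # explicit opt-in
        "promotional_email": False,            # opted out (or never opted in)
    },
}

def get_email(record, purpose):
    """Return the email address only if consent exists for the stated purpose."""
    if record["consents"].get(purpose, False):
        return record["email"]
    raise PermissionError(f"No consent recorded for purpose: {purpose}")

print(get_email(customer, "transactional_notifications"))  # jane@example.com
# get_email(customer, "promotional_email") would raise PermissionError
```

The point of the sketch is that access is never purpose-free: a marketing job asking for the address under "promotional_email" is refused even though the data is technically present in the record.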
In the era of big data, data lakes, advanced analytics, and cloud, it has become a lot easier to lose sight of the origin and purpose of data one can technically access. Without getting into considerations of security and intrusion, many business analysts today have “legitimate” access to vast pools (lakes?) of data -- far more data than in the “old days”. By “legitimate”, I mean that the access was not gained through malice or fraud -- which does not mean it is proper, or ethical, for these analysts to have it. And because big data technology development cycles initially focused on building a fast, scalable, reliable platform, many of the supporting components were left on the sidelines. Security has drawn a lot of attention lately (and rightfully so), but metadata management, lineage, and permission management -- in short, the foundation for governance -- are nowhere close to where they should be.
As a result, the burden falls on the user of data to make informed use of the new resources placed at his disposal. In the past, either one had access -- and was therefore explicitly permitted to use the data within specified guidelines -- or one did not. Today, we should assume that access may exist by default rather than by design, that guidelines may be vague or absent, and that users are forced to make judgment calls. What’s scary is that some of these judgment calls may land the company in the headlines, or send someone to jail.
This article is published as part of the IDG Contributor Network.