How machine learning can be a pathway to compliance

With insights from data science and machine learning, and with compliance under control, organizations can propel themselves with data-driven decisions

In less than a year, the General Data Protection Regulation (GDPR) will go into effect across Europe. This regulation, adopted by the European Union, will mandate a broad set of data management requirements that dictate how organizations within the EU, as well as those that do business in the EU, handle people's personal information.

Set to take effect May 25, 2018, GDPR dramatically changes how companies must handle and process any data that can be used to identify an individual, and companies that fall short face stiff penalties. For example, failing to notify regulators of a personal data breach within 72 hours of becoming aware of it can draw fines of up to €10 million or 2% of the organization's total worldwide annual turnover, whichever is higher; the most serious violations can draw fines of up to €20 million or 4%.
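The fine caps in GDPR Article 83 come down to a simple maximum: each tier is the higher of a fixed amount and a percentage of total worldwide annual turnover (€10 million or 2% for the lower tier, €20 million or 4% for the upper tier), and Article 33 asks controllers to notify the supervisory authority within 72 hours of becoming aware of a breach. The minimal Python sketch below only illustrates that arithmetic; the tier names and the functions `max_fine_eur` and `notification_deadline` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tier table for GDPR Article 83 fine caps: (fixed cap in EUR, share of turnover).
TIERS = {
    "lower": (10_000_000, 0.02),  # e.g. controller/processor obligations (Art. 83(4))
    "upper": (20_000_000, 0.04),  # e.g. basic principles, data subject rights (Art. 83(5))
}


def max_fine_eur(annual_turnover_eur: float, tier: str) -> float:
    """Return the cap for a tier: the higher of the fixed amount and the turnover share."""
    fixed, share = TIERS[tier]
    return max(fixed, share * annual_turnover_eur)


def notification_deadline(aware_at: datetime) -> datetime:
    """Latest time to notify the supervisory authority: 72 hours after awareness."""
    return aware_at + timedelta(hours=72)


# A company with EUR 100M turnover: 4% is EUR 4M, so the EUR 20M fixed cap applies.
print(max_fine_eur(100_000_000, "upper"))
# A company with EUR 2B turnover: 4% (EUR 80M) exceeds the fixed cap.
print(max_fine_eur(2_000_000_000, "upper"))
print(notification_deadline(datetime(2018, 6, 1, 9, 0, tzinfo=timezone.utc)))
```

The percentage only matters once turnover is large enough to push it past the fixed amount; for smaller companies the fixed figure is the binding cap.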

Yet despite these risks and the looming deadline, many organizations are still not prepared. Ovum Research, for example, recently reported that 52% of business executives surveyed expect their companies to be fined for non-compliance with GDPR.

There is, however, a path through this latest regulatory forest. Robust data governance solutions that combine the analytic power of data science and machine learning can ease compliance and give organizations a deeper understanding of their data. With the insights these solutions produce, and with compliance under control, organizations can begin making data-driven decisions that propel their businesses.

Machine learning algorithms can consume ever-larger amounts of data and cope with greater variability and complexity in that data. Most importantly, they are more forgiving than static, hand-coded rules when parameters and data points change.

Collaboration is a key component of machine learning. As in any major transition, you need smart people working together to ensure a successful process and the right output. In this case, those people will be data scientists, data engineers, IT architects, developers, system administrators, business users, data mining experts, executives and others.

That's where solutions like IBM's Data Science Experience come into play. This online collaborative environment is a place where data scientists inside and outside an organization can share code, ideas and expertise to build advanced analytic models with machine learning capabilities. Those models fuel the development of smarter applications that learn as they are used: the more the applications are used, the faster analytic insights are generated and actions can be taken, with minimal manual input.

Given these attributes, the value of applying such capabilities to governance and data protection compliance becomes clear. Data governance, which means understanding what data is most critical to the business and who has access to it, setting policies and rules, and monitoring it all, is a natural fit for tackling compliance. And though some organizations, such as healthcare providers, may have a good idea of what their critical data is, in other organizations and industries that data may not be as obvious.

By applying machine learning algorithms as the first step of the data governance process, a company can determine which data has the biggest impact and is most valuable. Once that data is identified, data scientists and business analysts can prioritize the classification work so that the most valuable company data is brought into compliance first.
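The article does not say which algorithm would rank data by value. As a plainly simpler stand-in for a learned importance model, the Python sketch below orders data sets by how many distinct users touch them, then by total accesses, to produce a classification backlog. The access-log records, data set names and the function `rank_for_classification` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical access-log records: (data_set, user). Real records would come
# from database or data-lake audit logs.
access_log = [
    ("crm.customers", "alice"),
    ("crm.customers", "bob"),
    ("crm.customers", "alice"),
    ("hr.employees", "carol"),
    ("web.clickstream", "dave"),
    ("crm.orders", "alice"),
    ("crm.orders", "erin"),
]


def rank_for_classification(log):
    """Order data sets by distinct users, then by total accesses.

    A usage heuristic standing in for a learned importance model.
    """
    users = defaultdict(set)
    hits = defaultdict(int)
    for data_set, user in log:
        users[data_set].add(user)
        hits[data_set] += 1
    # sorted() is stable, so ties keep their first-seen order even with reverse=True.
    return sorted(hits, key=lambda d: (len(users[d]), hits[d]), reverse=True)


print(rank_for_classification(access_log))
```

A trained model could replace the sort key without changing the rest of the workflow: whatever produces the score, the output is an ordered list that tells analysts which data to classify first.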

Data classification is the next critical step toward data governance. It helps not only with overall governance and GDPR requirements but also makes it easier for internal stakeholders to find and retrieve data. For example, if the CFO of a large technology enterprise is looking for data on the purchase history of its Fortune 500 customers, proper data classification can cut the search from hours to minutes.
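One way to see why classification speeds up retrieval is an inverted index: once every data set carries labels, a query becomes a set intersection rather than a scan of every table. The Python sketch below assumes a hypothetical catalog and label names (`purchase_history`, `fortune_500` and so on); `build_index` and `find` are illustrative helpers, not part of any particular product.

```python
from collections import defaultdict

# Hypothetical catalog: each data set carries the labels assigned during classification.
catalog = {
    "sales.orders_2017": {"purchase_history", "customer", "fortune_500"},
    "sales.orders_smb": {"purchase_history", "customer"},
    "hr.employees": {"employee", "personal_data"},
    "finance.ledger": {"financial"},
}


def build_index(catalog):
    """Invert the catalog so each label maps to the data sets that carry it."""
    index = defaultdict(set)
    for data_set, labels in catalog.items():
        for label in labels:
            index[label].add(data_set)
    return index


def find(index, *labels):
    """Return the data sets carrying every requested label."""
    matches = [index.get(label, set()) for label in labels]
    return set.intersection(*matches) if matches else set()


index = build_index(catalog)
# The CFO's query: purchase history for Fortune 500 customers.
print(find(index, "purchase_history", "fortune_500"))
```

The index is built once when labels change; each lookup then touches only the label entries it names.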

Applying a machine learning algorithm at this stage of the data governance process makes it easier to identify similar data sets and group them together for faster search and retrieval. Taking this a step further, an algorithm can be built to match data against specific GDPR requirements, making it easier to classify data properly.
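A minimal sketch of both ideas, under stated assumptions: data sets are grouped by single-linkage clustering on the Jaccard similarity of their column names, and each group is then checked for columns that look like personal data. The schemas, the 0.5 threshold and the keyword pattern `PERSONAL_HINTS` are hypothetical, and the keyword rule is a simple stand-in for the trained classifier the article alludes to.

```python
import re

# Hypothetical schemas: data set name -> set of column names.
schemas = {
    "crm.customers": {"customer_id", "name", "email", "phone"},
    "eu.customers": {"customer_id", "name", "email", "country"},
    "web.clickstream": {"session_id", "url", "timestamp"},
    "web.pageviews": {"session_id", "url", "referrer", "timestamp"},
}

# Assumed keyword rule; a trained classifier could replace it.
PERSONAL_HINTS = re.compile(r"name|email|phone|address|birth|ip_addr")


def jaccard(a, b):
    """Shared columns divided by all distinct columns."""
    return len(a & b) / len(a | b)


def cluster(schemas, threshold=0.5):
    """Single-linkage grouping via union-find: data sets whose column
    sets have Jaccard similarity >= threshold end up in the same group."""
    parent = {name: name for name in schemas}

    def root(x):
        while parent[x] != x:
            x = parent[x]
        return x

    names = list(schemas)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(schemas[a], schemas[b]) >= threshold:
                parent[root(a)] = root(b)
    groups = {}
    for name in names:
        groups.setdefault(root(name), []).append(name)
    return list(groups.values())


def personal_columns(columns):
    """Columns whose names match the personal-data keyword rule."""
    return sorted(c for c in columns if PERSONAL_HINTS.search(c))


for group in cluster(schemas):
    columns = set().union(*(schemas[n] for n in group))
    print(sorted(group), "personal data columns:", personal_columns(columns))
```

Raising `threshold` yields smaller, tighter groups; lowering it merges more loosely related data sets. Flagging whole groups lets analysts review related data sets together instead of one table at a time.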

Another step in the data governance process that can cause anguish is matching data sets to the appropriate internal stakeholders. This step in particular will become an important part of governance as GDPR takes effect. For example, a company's human resources department needs access to personal employee information, such as previous work experience, home address and medical information, for insurance purposes. Under the new regulation, however, that data should not be accessible to the CEO or the marketing department, for privacy reasons.

Machine learning systems can be put to work to ensure the right people within an organization have access to the information they need to do their jobs. This not only supports GDPR compliance but also connects the necessary people to the right data, with the proper context, so they can do their jobs better.
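The article does not specify how a machine learning system would decide who gets access. Whatever proposes the role-to-data mappings, the result is ultimately a policy that must be enforced. The Python sketch below shows only that enforcement step, as a deny-by-default, role-based check built on the HR example. The roles, category labels, data set names and the function `can_read` are hypothetical.

```python
# Hypothetical policy: which roles may read which data categories.
POLICY = {
    "hr": {"employee_contact", "employee_history", "employee_medical"},
    "marketing": {"customer_contact"},
    "ceo": {"financial", "customer_contact"},
}

# Hypothetical category labels produced by the classification step.
DATA_CATEGORIES = {
    "hr.employees": "employee_medical",
    "crm.customers": "customer_contact",
    "finance.ledger": "financial",
}


def can_read(role: str, data_set: str) -> bool:
    """Deny by default: unknown roles and unclassified data sets get no access."""
    category = DATA_CATEGORIES.get(data_set)
    return category is not None and category in POLICY.get(role, set())


for role in ("hr", "marketing", "ceo"):
    print(role, can_read(role, "hr.employees"))
```

Keying the policy on data categories rather than individual tables means that a newly classified data set inherits the right access rules as soon as it is labeled.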

For many companies across the world, GDPR will challenge existing data management processes and cultures. However, with a solid approach to data governance fueled by data science and machine learning, this time-consuming, anxiety-filled process can lead not only to more manageable compliance but also to analytic insights that drive strategic decision-making.

This article is published as part of the IDG Contributor Network.