As companies continue to embrace big data, more and more sensitive and regulated data is being collected and stored. And of course, all of this data becomes a high-value target for hackers for several reasons:
- With so many data records from various origins brought together in the data lake, the reward for breaking into it becomes very high: a single breach replaces the need to break into numerous individual systems to collect the same data.
- The data lake contains not only raw data but also enriched and reconciled data, which carries much more potential for malicious users to gain insight into the secrets of individuals or corporations.
- Security technologies applied to big data in general, and Hadoop in particular, have yet to match their counterparts on traditional systems (ERPs, databases, etc.), which makes it relatively easy for hackers to penetrate the data lake.
Securing big data requires a combination of specialized and very traditional security technologies, but it is also a matter of process and policy. One of the top risks is the creation of new silos for application identity -- separating big data security from the rest, and soon creating a divergence between systems. The consequence is often runaway admin privileges, as well as constraints on an organization's ability to meet compliance requirements or mitigate risks.
As, like Vs, go in threes
It is key to integrate identity and access management across the full IT infrastructure -- traditional systems and big data alike. Organizations must look into incorporating big data into what is often referred to as the 3 As of security -- Authentication, Authorization, and Accounting:
- Authentication is the way of identifying a user, for example through a user name and password combination.
- Authorization is the process of enforcing policies that determine what activities, resources and services a user is permitted to perform or use.
- Accounting measures the resources a user consumes and identifies deviations from typical behavior -- often a sign that a user's access has been compromised (stolen credentials, Trojan horse program, etc.).
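To make the three As concrete, here is a minimal sketch in Python of how they fit together around a single access request. Everything in it is an assumption made for illustration: the in-memory `USERS`, `POLICIES` and `USAGE` stores, the function names, and the simple "more than three times the baseline" rule all stand in for what a directory service, a policy engine and an audit system would do in a real deployment.

```python
import hashlib
import hmac
import os
from collections import Counter

# Hypothetical in-memory stores, for illustration only.
USERS = {}         # username -> (salt, password hash)
POLICIES = {}      # username -> set of (action, resource) pairs
USAGE = Counter()  # (username, resource) -> number of permitted accesses


def _hash(password, salt):
    # Salted key derivation, so plain-text passwords are never stored.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def register(username, password, permissions):
    salt = os.urandom(16)
    USERS[username] = (salt, _hash(password, salt))
    POLICIES[username] = set(permissions)


def authenticate(username, password):
    """Authentication: is this user who they claim to be?"""
    record = USERS.get(username)
    if record is None:
        return False
    salt, expected = record
    # Constant-time comparison of the derived hashes.
    return hmac.compare_digest(_hash(password, salt), expected)


def authorize(username, action, resource):
    """Authorization: is this user permitted to perform this action?"""
    return (action, resource) in POLICIES.get(username, set())


def access(username, password, action, resource):
    """Apply authentication, then authorization, then record the access."""
    if not authenticate(username, password):
        return False
    if not authorize(username, action, resource):
        return False
    USAGE[(username, resource)] += 1
    return True


def is_unusual(username, resource, baseline, factor=3):
    """Accounting: flag usage above `factor` times an expected baseline."""
    return USAGE[(username, resource)] > factor * baseline
```

A user registered with `register("alice", "s3cret", [("read", "sales")])` can read `sales` but not write to it, and `is_unusual("alice", "sales", baseline=3)` starts returning `True` once her recorded reads exceed nine. The point of the sketch is that all three checks share one notion of identity, which is exactly what gets lost when big data grows its own security silo.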
NoSQL, Hadoop, HDFS & co.
Things become more complicated because many technologies coexist in the data lake. Historically, Hadoop was simple: an HDFS storage layer with MapReduce processes running over it. Then Spark became the most popular processing engine. Kudu was introduced as an alternative storage layer to HDFS. More NoSQL databases were deployed on top of Hadoop -- or alongside it.
Security standards for Hadoop also started to emerge -- Kerberos for example. But newer big data technologies don't all support the (sometimes self-proclaimed) "standard".
Add to this the fact that many organizations already have security frameworks enforcing the 3 As in their traditional systems. The last thing they want or need is for big data to require a complete overhaul of their security technologies and policies!
There aren't many options: big data must integrate with existing technology frameworks. Or, viewed another way, technology frameworks must evolve to integrate with big data technologies. And keeping up is not going to be a simple endeavor!