Big data security is a big mess

No one questions that the Hadoop/Spark ecosystem can yield business-changing insights. Yet few seem willing to face up to the sorry state of big data security.


Given the pace at which big data software is released, coupled with the sheer volume of data under management, the big data market is ripe for massive security breaches. It’s only a matter of time.

In fact, as a Gartner survey last year uncovered, very few companies have taken security seriously for essential infrastructure like Hadoop. At that time, a mere 2 percent of respondents cited Hadoop security as a significant concern, causing Gartner analyst Merv Adrian to exclaim, “The nearly non-existent response to the security issue is shocking.”

CIOs, in other words, may be willing to close their eyes and pray for big data security, but until they make it a priority, such “prayers” are vain.

What, me worry?

For years, enterprises have taken a somewhat blasé approach to security in big data infrastructure such as Hadoop, even though the sheer size of big data leads to “origins [that] are not consistently monitored and tracked.” In early 2014, Adrian, noting a lack of interest in Hadoop security, queried, “Can it be that people believe Hadoop is secure? Because it certainly is not. At every layer of the stack, vulnerabilities exist, and at the level of the data itself there [are] numerous concerns.”

A year later, Adrian’s colleague, Nick Heudecker, lamented, “Less than 5 percent of Hadoop inquiries covered by the Info Mgmt team in 2014 discussed security. This has to change in 2015.”

It didn’t -- not much, anyway. For example, one security engineer, Ray Burgemeestre, suggested that more and more people are asking, “After enabling all security settings in Hadoop/Spark, how would I know my cluster is actually secure?” The answer, he acknowledged, is “not completely satisfying,” insisting that “more work needs to be done in the Hadoop community to raise its security profile.”

Another interested participant in Hadoop security, Bolke de Bruin, Head of Research & Development for ING bank, indicates that while the Hadoop community is increasingly aware of the need to protect data confidentiality within Hadoop clusters, it continues to give limited attention to data integrity (“maintaining and assuring the accuracy and completeness of data over its entire lifecycle”). He goes on to note that even the security native to Hadoop often doesn’t get implemented due to “perceived complexity,” or is purposefully ignored because tools like Apache Ranger amount to “slapped on security” that is “usable, but barely.”

Worried yet? Hadoop is the godfather of big data infrastructure, with the most time and attention paid to it over the past few years. If it can’t muster sufficient security, despite petabytes of sensitive data pouring into its clusters, then we have a very serious security problem across the board.

Who has time?

The problem is time ... or, rather, the lack thereof.

As MobileIron highlights in a recent report on mobile security, “[W]ith any software, the longer it is in market, the more likely it is that vulnerabilities will be identified.” This should be particularly true of open source software, which offers the ability to dig into source code before or (more likely) after vulnerabilities emerge.

The big data infrastructure market, however, doesn’t sit still long enough for these vulnerabilities to be found. Indeed, in a December 2015 Gartner report, the authors advise enterprise buyers: “Don’t base Hadoop assessment on analysis or trials more than a year old; existing pieces are maturing and new ones are emerging at a rapid pace.”

While that “rapid pace” may sound great (innovation ftw!), it’s also ripe for security problems, as mentioned. As Adrian warns, “We will see major problems as Hadoop goes mainstream.” And not only Hadoop: as enterprises build on Hadoop, Spark, Kafka, and a host of other exceptional, fast-moving data infrastructure, “[W]e are building skyscraper favelas in code -- in earthquake zones,” as Zeynep Tufekci has detailed.

In response, we are already seeing Hadoop vendors like Cloudera and Hortonworks seek to differentiate themselves based on security. I suspect we’ll see this enterprise-grade security come with an enterprise-grade price tag, but it will be worth it.
