Word broke last week that 40,000 instances of MongoDB had been found almost completely unsecured, among them a database for a French telecom holding millions of customer records.
It's easy to point fingers, but it's better to see this as an indicator of the security issues posed by NoSQL systems. As more data moves into and through NoSQL systems, and more of those systems are deployed with a public-facing component, the different facets of security in a NoSQL environment are coming under greater scrutiny. That scrutiny applies even to systems with no public exposure. It isn't a simple picture.
Here are three key ways NoSQL security problems are likely to manifest, along with some of the solutions first- and third-party vendors are pursuing to keep those problems at bay.
Dataflow protection/data governance
One persistent problem with NoSQL systems like Hadoop is the management of data moving through the system. Hadoop didn't have a data governance model when it was first created, but now that it's a major enterprise presence trusted with tons of data, it certainly needs one. A consortium led by Hortonworks, the DGI (Data Governance Initiative), is being assembled to figure out how to create the model (a tough task) and get enterprises to adopt it (a tougher task).
MongoDB already has something in this vein: As of version 2.6, it offers auditing capabilities to track changes. In Hadoop, though, dataflow governance is likely to be implemented via two projects already under way, Apache Ranger and Apache Falcon. Other projects involve allowing data in Hadoop to be controlled by existing enterprise data-governance tools, but so far that effort is still at the prototype stage. In one regard, that's good news: Anyone worried about the future of data governance in their Hadoop environment still has time to look up the DGI and keep abreast of which existing tooling and technology it will support.
Encryption
There are multiple facets to using encryption to protect data at rest inside a NoSQL system as well. With MongoDB, for instance, encryption isn't (yet) built into the product. However, various Ruby gems are available to perform field encryption, such as mongoid-encrypted-fields, and the mongoose-encryption package for npm provides encryption for individual fields or entire documents.
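To make the idea of field encryption concrete, here is a minimal Python sketch of what such packages do in principle: Selected fields are encrypted in the application before the document ever reaches the database. This is not the API of either package named above. The function names and the Cipher alias are hypothetical, and the cipher itself is deliberately left to the caller.

```python
from typing import Callable, Iterable

# Hypothetical alias: any function that turns bytes into bytes,
# such as the encrypt or decrypt method of a vetted cipher library.
Cipher = Callable[[bytes], bytes]


def encrypt_fields(doc: dict, fields: Iterable[str], encrypt: Cipher) -> dict:
    """Return a copy of doc with the named fields replaced by ciphertext."""
    out = dict(doc)  # shallow copy, so the caller's document is untouched
    for name in fields:
        if name in out:
            out[name] = encrypt(str(out[name]).encode("utf-8"))
    return out


def decrypt_fields(doc: dict, fields: Iterable[str], decrypt: Cipher) -> dict:
    """Reverse encrypt_fields; decrypted values come back as strings."""
    out = dict(doc)
    for name in fields:
        if name in out:
            out[name] = decrypt(out[name]).decode("utf-8")
    return out
```

In practice the two callables would come from an established library, for example the encrypt and decrypt methods of a Fernet object from the third-party cryptography package. Fields that are not named stay in plaintext, so they remain queryable, which is the usual trade-off behind encrypting individual fields rather than whole documents.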
Version 2.6 of Hadoop added at-rest encryption by way of encryption zones, which restrict how the data inside them can be moved around. Hadoop's approach is to add encryption at the HDFS level. Thus, encryption is transparent to applications, but fine-grained controls can be added later on, which will become essential as the DGI advances.
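For a sense of what setting up an encryption zone involves, the sketch below assembles the usual command sequence in Python: Create a key, create an empty directory, then turn that directory into a zone. It assumes a Hadoop key management server is already configured and that the commands run with HDFS administrator rights. The function names, and any key name or path passed in, are placeholders.

```python
import subprocess
from typing import List


def encryption_zone_commands(key_name: str, zone_path: str) -> List[List[str]]:
    """Build the commands that create a key and an encryption zone using it."""
    return [
        # Create the encryption key in the configured key provider.
        ["hadoop", "key", "create", key_name],
        # The zone must start out as an empty directory.
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        # Mark the directory as an encryption zone tied to the key.
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling run_all(encryption_zone_commands("mykey", "/secure")) on a cluster node would attempt the full sequence. Building the commands separately from running them makes it possible to review them first.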
Authentication
In some ways, authentication is the most straightforward part of the picture: Using existing authentication systems like Active Directory to control access to a NoSQL system from the outside. With MongoDB, you can use LDAP. With Hadoop, though, the picture is more complex because Hadoop by itself is remarkably trusting and doesn't have security mechanisms turned on by default. You can configure Hadoop to use a Kerberos server, then set up a one-way trust relationship between the Kerberos server and your AD repository. Again, that's not enabled by default.
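One quick way to see whether a given cluster is still running in its trusting default mode is to inspect core-site.xml. The Python sketch below reads the two relevant properties, hadoop.security.authentication and hadoop.security.authorization, and falls back to Hadoop's defaults ("simple" and "false") when they are absent. The function names and the sample configuration are illustrative only.

```python
import xml.etree.ElementTree as ET

# Illustrative core-site.xml content with Kerberos switched on.
SAMPLE_SECURED = """<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
</configuration>"""


def hadoop_security_settings(core_site_xml: str) -> dict:
    """Extract the authentication and authorization settings from the XML."""
    root = ET.fromstring(core_site_xml)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return {
        # Hadoop defaults apply when a property is not set explicitly.
        "authentication": props.get("hadoop.security.authentication", "simple"),
        "authorization": props.get("hadoop.security.authorization", "false"),
    }


def kerberos_enabled(core_site_xml: str) -> bool:
    """True only when authentication is explicitly set to kerberos."""
    settings = hadoop_security_settings(core_site_xml)
    return settings["authentication"].lower() == "kerberos"
```

A check like this covers only the Hadoop side. The trust relationship between the Kerberos server and Active Directory is configured separately and isn't visible in this file.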
As Hadoop grows in popularity, its trusting-by-default setting may prove to be the wrong choice. It's more likely that the third-party vendors of Hadoop, rather than the underlying Hadoop projects, will be the ones who do something about it.
[This article was edited to add notes about MongoDB's auditing functionality.]