But the U.S. intelligence agency needed some security of its own, so it developed a NoSQL data store called Accumulo with built-in policy enforcement mechanisms that strictly limit who can see its data.
At the O'Reilly Strata-Hadoop World conference this week in New York, one of the former National Security Agency developers behind the software, Adam Fuchs, explained how Accumulo works and how it could be used in fields other than intelligence gathering. The agency contributed the software's source code to the Apache Software Foundation in 2011.
"Every single application that we built at the NSA has some concept of multilevel security," said Fuchs, who is now the chief technology officer of Sqrrl, which offers a commercial edition of the software.
The NSA started building Accumulo in 2008. Much as Facebook did with its Cassandra database around the same time, the NSA used Google's Bigtable architecture as a starting point.
In the parlance of NoSQL databases, Accumulo is a simple key/value data store, built on a shared-nothing architecture that allows for easy expansion to thousands of nodes able to hold petabytes' worth of data. It features a flexible schema that allows new columns to be added quickly, and it comes with some advanced data analysis features as well.
Accumulo's killer feature, however, is its "data-centric security," Fuchs said. When data is entered into Accumulo, it must be accompanied by tags specifying who is allowed to see that material. Each key/value entry carries a visibility label listing the roles within an organization that can access it, and those labels can map back to specific organizational security policies.
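To make the tagging idea concrete, here is a minimal Python sketch of a store whose entries each carry a visibility label, with a scan that returns only the entries a reader is entitled to see. It is an illustration, not Accumulo's API: the `Entry` class, the `scan` function, and the health-care sample data are all hypothetical, and the label is simplified to a set of tokens the reader must hold in full, whereas Accumulo's own labels are Boolean expressions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One key/value entry tagged with a visibility label (hypothetical model)."""
    row: str
    column: str
    visibility: frozenset  # simplification: the reader must hold ALL of these tokens
    value: str


def scan(entries, authorizations):
    """Return only the entries whose label is fully covered by the reader's tokens."""
    auths = frozenset(authorizations)
    return [e for e in entries if e.visibility <= auths]


# Hypothetical sample data from a non-intelligence setting.
entries = [
    Entry("patient42", "name", frozenset({"staff"}), "J. Doe"),
    Entry("patient42", "diagnosis", frozenset({"staff", "physician"}), "flu"),
    Entry("patient42", "billing", frozenset({"finance"}), "$120"),
]

visible = scan(entries, {"staff", "physician"})
print([e.column for e in visible])  # ['name', 'diagnosis']
```

Because the check runs on every entry as it is read, a reader without the `finance` token never sees the billing value, even though it sits in the same row as data the reader is allowed to see.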
Accumulo adheres to the RBAC (role-based access control) model. This approach allowed the NSA to categorize data by its multiple levels of classification -- confidential, secret, top secret -- and to specify who in an organization could access the data, based on their official role within the organization. The database is accompanied by a policy engine that decides who can see what data.
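A role-based policy check of this kind can be sketched in a few lines of Python. The sketch below is an assumption-laden illustration rather than Accumulo's policy engine: the role names, the `ROLE_CLEARANCE` table, and the `can_read` function are hypothetical, and it assumes the three classification levels are strictly ordered, so that a higher clearance also grants access to lower levels.

```python
# Classification levels named in the article, ordered lowest to highest.
# The strict ordering is an assumption of this sketch.
LEVELS = ["confidential", "secret", "top secret"]

# Hypothetical mapping from organizational role to highest clearance.
ROLE_CLEARANCE = {
    "contractor": "confidential",
    "analyst": "secret",
    "director": "top secret",
}


def can_read(role, data_level):
    """Decide whether a role may read data at the given classification level."""
    clearance = ROLE_CLEARANCE.get(role)
    if clearance is None:
        return False  # unknown roles are denied by default
    return LEVELS.index(clearance) >= LEVELS.index(data_level)


print(can_read("analyst", "confidential"))  # True
print(can_read("analyst", "top secret"))    # False
```

Keeping the role-to-clearance table separate from the data means a change in someone's job only requires updating the table, not relabeling the stored data.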