Oracle's take on the distributed key-value data store is fast, flexible, and enterprise-grade serious
For the last few years, the world of NoSQL databases has been filled with exciting new projects, ambitious claims, and plenty of chest beating. The hypesters said the new NoSQL software packages offered tremendous performance gains by tossing away all of the structure and paranoid triple-checking that database creators had lovingly added over the years. Reliability? It's overrated, said the new programmers who didn't run serious business applications for Wall Street banks but trafficked in trivial, forgettable data about people's lives. Tabular structure? It's too hidebound and limiting. If we ignore these things, our databases will be free and insanely fast.
Alas, just as the summer of love ended and reality set in, the boundary-free experimentation with NoSQL databases is slowly being brought down to earth. Oracle, the developer of top-notch, bulletproof SQL databases, has arrived at the hippie fest with a solid, practical, and very Oracle-like NoSQL server. While the crazy dreamers can continue to craft NoSQL data stores, serious people will want to take a look at Oracle's version. It offers many of the features that make NoSQL fun but also the solid performance promises that tend to come from big, serious teams of engineers. NoSQL pioneers will want to tell themselves that imitation is the sincerest form of flattery.
The arrival of this product might be a surprise to NoSQL fans who have listened to old-school DBAs talk with pride about Oracle databases, but Oracle has been slowly moving down this path for some time. Five years ago, the company bought Sleepycat Software, the creators of the open source Berkeley DB, a tool with a long and rich tradition of flexible, key-value storage for C and lately Java programmers. This same Berkeley DB technology is said to be at the core of Oracle NoSQL Database, although it seems to be a complete rewrite.
Oracle NoSQL: Practically ACID
The fun part of Oracle NoSQL is the key-value structure. You don't need to define a schema or lock yourself into a big tabular architecture. You just create keys and attach a bag of bits to them. You might link your key to a string or an image file or anything. The database accepts the bytes and doesn't think much about the contents.
Oracle breaks up the key into major and minor parts. You can think of the major part as the object pointer and the minor part as the fields in the record. So you might put a name and Social Security number into the major parts of the key and other strings like the street address and ZIP code into the minor parts. It's comparable to the way that some other NoSQL tools let you think of the value in the pair as being an object with multiple fields. Oracle just uses the term "minor key" for the names of the fields.
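To make the major/minor split concrete, here is a minimal in-memory sketch. The `Key` class, the `put` and `record_for` helpers, and the sample names are all invented for illustration; they are not Oracle's API, and the real server stores data on disk and across nodes.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Key:
    """Hypothetical two-part key: a major path plus a minor path."""
    major: Tuple[str, ...]
    minor: Tuple[str, ...] = ()


# The store maps keys to opaque bytes; it never inspects the values.
store: Dict[Key, bytes] = {}


def put(key: Key, value: bytes) -> None:
    store[key] = value


def record_for(major: Tuple[str, ...]) -> Dict[Tuple[str, ...], bytes]:
    """Collect every minor-key field stored under one major key."""
    return {k.minor: v for k, v in store.items() if k.major == major}


person = ("Jane Example", "000-00-0000")  # name and a placeholder SSN
put(Key(person, ("street",)), b"1 Main St")
put(Key(person, ("zip",)), b"00000")
put(Key(("John Example", "000-00-0001"), ("zip",)), b"11111")

print(record_for(person))
```

The major part picks out the person, and the minor parts behave like field names within that person's record.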
The serious part of Oracle NoSQL is a practical approximation of ACID compliance, the standard that SQL databases like to offer. ACID means "Atomic, Consistent, Isolated, Durable transactions," and there's a robust debate about just what this translates to in excruciating detail. Most NoSQL systems promise a different acronym, BASE, which stands for "Basically Available, Soft State, and Eventually Consistent." In other words, you'll probably get the right answer except when you don't.
There will be plenty of debate about whether Oracle NoSQL offers real ACID compliance. The promises aren't as all-encompassing as they are with SQL databases. You only get an ACID promise when you write data attached to the same major part of the key. For example, you could change the address and ZIP code of the same person and get an ACID guarantee because both parts are stored under the same major key. But you get no guarantee that changes to two separate people will remain consistent. In other words, a bank could use Oracle NoSQL to store personnel records, but not to safely transfer cash between accounts because there's no ACID guarantee that the money won't get lost along the way.
Oracle NoSQL is able to make this promise because it guarantees that one master machine will hold all of the minor keys associated with a major key. Attach any collection of fields to a major key defining a person, and all of this data will end up in the same node in the cluster. But the data from different major keys could end up on different machines, and Oracle NoSQL doesn't have a mechanism to ensure that the data will be written to both simultaneously.
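The scope of that guarantee can be sketched in a few lines. This is a toy model under stated assumptions, not Oracle's implementation: `apply_atomically` is a hypothetical helper that accepts a batch of writes only when every write shares one major key, and it validates the whole batch before changing anything.

```python
from typing import Dict, List, Tuple

Store = Dict[Tuple[str, str], bytes]   # (major, minor) -> value
Operation = Tuple[str, str, bytes]     # (major, minor, new value)


def apply_atomically(store: Store, operations: List[Operation]) -> None:
    """Apply all operations or none; reject batches spanning major keys."""
    majors = {major for major, _minor, _value in operations}
    if len(majors) != 1:
        raise ValueError("all operations must share one major key")
    # Validate everything first so a bad value cannot leave a partial update.
    for _major, _minor, value in operations:
        if not isinstance(value, bytes):
            raise TypeError("values must be bytes")
    for major, minor, value in operations:
        store[(major, minor)] = value


records: Store = {
    ("jane", "street"): b"1 Main St",
    ("jane", "zip"): b"00000",
    ("john", "balance"): b"100",
}

# Address and ZIP code live under the same major key, so this batch is allowed.
apply_atomically(records, [("jane", "street", b"9 Oak Ave"),
                           ("jane", "zip", b"99999")])

# A transfer between two people spans two major keys and is refused.
try:
    apply_atomically(records, [("jane", "balance", b"50"),
                               ("john", "balance", b"150")])
except ValueError as err:
    print("rejected:", err)
```

The rejected batch is the bank-transfer case: two major keys may live on two machines, so the model refuses to pretend it can update both at once.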
You can also add replication and sharding, which Oracle calls "partitioning." In essence, you arrange the nodes in a rectangle where the sharding occurs along one axis and the replication occurs across the other. If you want more reliability and faster reads, you add more machines along the replication axis. If you want less contention, you add more machines along the partitioning axis. Oracle NoSQL handles most of this configuration for you.
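The rectangle can be pictured as a grid of node names, one row per partition and one column per replica. The sketch below assumes a simple hash of the major key picks the row; that hashing scheme and the node names are illustrative guesses, not a description of how Oracle NoSQL actually assigns partitions.

```python
import hashlib
from typing import List


def build_grid(num_shards: int, replication_factor: int) -> List[List[str]]:
    """Rows are partitions (shards); columns are replicas of that partition."""
    return [[f"shard{s}-replica{r}" for r in range(replication_factor)]
            for s in range(num_shards)]


def shard_for(major_key: str, num_shards: int) -> int:
    """Map a major key to a row using a stable hash (an assumed scheme)."""
    digest = hashlib.md5(major_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def nodes_for(major_key: str, grid: List[List[str]]) -> List[str]:
    """Every replica that holds the data for this major key."""
    return grid[shard_for(major_key, len(grid))]


grid = build_grid(num_shards=4, replication_factor=3)
print(nodes_for("jane", grid))
```

Adding columns (a larger `replication_factor`) gives each key more copies to read from; adding rows (more shards) spreads the keys across more machines.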
Again, this structure stores data with Oracle-grade seriousness. If you don't want the slacker-grade promise of eventual consistency offered by so many other NoSQL stores, Oracle NoSQL will deliver absolute consistency across all of the machines replicating a node. You'll pay for this in write performance, of course, but it's your choice.
This is more than a binary decision, by the way. You can tell Oracle NoSQL to sign off on the write after one, all, or a simple majority of the nodes are finished sending the data to disk. The documentation calls this feature a durability policy.
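The arithmetic behind the three choices is simple enough to write down. The names `AckPolicy` and `required_acks` below are hypothetical and only mirror the options described above; they are not taken from Oracle's API.

```python
from enum import Enum


class AckPolicy(Enum):
    ONE = "one"
    SIMPLE_MAJORITY = "simple_majority"
    ALL = "all"


def required_acks(policy: AckPolicy, replication_factor: int) -> int:
    """How many nodes must finish the write before it is signed off."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if policy is AckPolicy.ONE:
        return 1
    if policy is AckPolicy.ALL:
        return replication_factor
    return replication_factor // 2 + 1  # simple majority


def write_is_signed_off(policy: AckPolicy, replication_factor: int,
                        acks_received: int) -> bool:
    return acks_received >= required_acks(policy, replication_factor)


for policy in AckPolicy:
    print(policy.name, required_acks(policy, replication_factor=5))
```

With five replicas, the stricter the policy, the more nodes each write waits on: one, three, or all five.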
Some of this flexibility is available to you, the programmer, if you have the time to worry about it. All of the key-value pairs come with a version number, which you can watch yourself if you want to play your own games with replication. This can be helpful if you're trying to goose performance when modifying records.
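One common use of a per-pair version number is a conditional write that succeeds only if nobody else has changed the record in the meantime. The sketch below is a generic illustration of that pattern; `VersionedStore` and `put_if_version` are made-up names for this example, and integer versions are an assumption.

```python
from typing import Dict, Optional, Tuple


class VersionedStore:
    """Toy store in which every key carries an integer version."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[bytes, int]] = {}

    def get(self, key: str) -> Optional[Tuple[bytes, int]]:
        """Return (value, version), or None if the key is absent."""
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> int:
        """Unconditional write; returns the new version."""
        _old, version = self._data.get(key, (b"", 0))
        self._data[key] = (value, version + 1)
        return version + 1

    def put_if_version(self, key: str, value: bytes,
                       expected_version: int) -> Optional[int]:
        """Write only if the stored version still matches; else return None."""
        current = self._data.get(key)
        if current is None or current[1] != expected_version:
            return None
        self._data[key] = (value, expected_version + 1)
        return expected_version + 1


kv = VersionedStore()
v1 = kv.put("jane/zip", b"00000")
v2 = kv.put_if_version("jane/zip", b"99999", v1)     # succeeds
stale = kv.put_if_version("jane/zip", b"12345", v1)  # fails: v1 is out of date
print(v1, v2, stale)
```

A caller that gets `None` back rereads the record and retries, which avoids taking a lock for the common case where no one else touched the pair.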
Eventual consistency: The great debate
It's worth noting that an intriguing debate about eventual consistency broke out on the blog of Daniel Abadi, a professor of computer science at Yale. He pointed out that in some situations a new pair written to the master could get lost if the master gets cut off from the replicas, which then elect a new master that knows nothing of the pair. A different spin came from Margo Seltzer, a professor of computer science at football rival Harvard, as well as an Oracle employee. She joined Oracle with the acquisition of Sleepycat, which she helped found.
Seltzer argues that it all depends upon what you mean by "eventual consistency." The database owners choose to take their chances with the durability policy. If the owners want to make sure that a pair never gets lost, they need to ask all writes to wait until all replicas get updated. The debate hinges largely on what "eventual consistency" requires, and it won't be as easily decided as the annual football game.
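The scenario both sides are arguing about can be reduced to a toy simulation. Everything here is an assumption made for illustration: the `write` and `lost_after_failover` functions, the two-replica cluster, and the rule that the first replica becomes the new master. It is not a model of Oracle's actual election protocol.

```python
from typing import Dict, List


def write(master: Dict[str, str], replicas: List[Dict[str, str]],
          key: str, value: str, wait_for_all: bool, partitioned: bool) -> bool:
    """Return True if the client receives an acknowledgment for the write."""
    if partitioned:
        if wait_for_all:
            return False      # replicas unreachable: refuse to acknowledge
        master[key] = value   # only the isolated master holds the pair
        return True
    master[key] = value
    for replica in replicas:
        replica[key] = value
    return True


def lost_after_failover(wait_for_all: bool) -> bool:
    """Did an acknowledged write vanish once the replicas elect a new master?"""
    master: Dict[str, str] = {}
    replicas: List[Dict[str, str]] = [{}, {}]
    acknowledged = write(master, replicas, "k", "v",
                         wait_for_all, partitioned=True)
    new_master = replicas[0]  # the replicas carry on without the old master
    return acknowledged and "k" not in new_master


print("relaxed policy loses an acknowledged write:", lost_after_failover(False))
print("wait-for-all loses an acknowledged write:", lost_after_failover(True))
```

Under the relaxed policy the client was told the write succeeded, yet the new master has never seen it. Under the wait-for-all policy the write is never acknowledged during the partition, so nothing the client was promised goes missing.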
To test the speed of Oracle NoSQL, I concocted a low-end test that put more stress on the database engine than on the networking. I started up the single-node NoSQL server, then stuffed in 358,400 keys attached to values that were strings with about 30 characters in them. This ran in about 119 seconds on an old, underpowered Mac. Using an older machine with a small amount of RAM is one way to test performance under limited resources.
As a comparison, I stuffed the same pairs into a new version of Voldemort, an open source Java-based NoSQL server from LinkedIn that doesn't offer ACID promises. It took 180 seconds on the same machine.
I was happy with the results of this admittedly simple test because storing data in Oracle NoSQL seems to involve a bit of overhead. Creating the keys requires building arrays of strings, and object instantiation is often the bottleneck for Java code. That overhead didn't seem to matter in these tests.
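For readers who prefer rates to raw seconds, the timings reported above convert as follows. The snippet uses only the figures from the test (358,400 pairs, 119 seconds, and 180 seconds); the variable names are incidental.

```python
PAIRS = 358_400          # key-value pairs written in each run
ORACLE_SECONDS = 119     # Oracle NoSQL, single node
VOLDEMORT_SECONDS = 180  # Voldemort, same machine


def writes_per_second(pairs: int, seconds: float) -> float:
    return pairs / seconds


oracle_rate = writes_per_second(PAIRS, ORACLE_SECONDS)
voldemort_rate = writes_per_second(PAIRS, VOLDEMORT_SECONDS)
speedup = VOLDEMORT_SECONDS / ORACLE_SECONDS

print(f"Oracle NoSQL: {oracle_rate:,.0f} writes/sec")   # about 3,012
print(f"Voldemort:    {voldemort_rate:,.0f} writes/sec")  # about 1,991
print(f"Oracle NoSQL finished {speedup:.2f}x faster")   # about 1.51x
```

That is a single run on one old machine, so the ratio is a rough indication rather than a benchmark.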
In all, Oracle NoSQL was a pleasure to try because it offered so many serious features developed by a company with a deep history of serious data management. There are dozens of small ways in which the tool is more thorough and sophisticated than the simpler NoSQL projects. You get a number of different options for increasing the durability in the face of a node crash or trading that durability for speed. The documentation is solid and written by working engineers with deep experience in storing data for enterprise customers.
Oracle NoSQL might not offer the heady fun and "just build it" experimentation of many of the pure open source NoSQL projects, but that's not really its role. Oracle borrowed the best ideas from these groups and built something that will deliver good performance to the sweet spot of the enterprise market.
There is one way, though, that Oracle NoSQL Database departs from Oracle's long tradition. I've always found it difficult and occasionally impossible to install Oracle's main database and get it running. The open source community, by contrast, has always done a better job of smoothing this process. Some say the most important thing MySQL did right was testing and retesting the installation until it was bulletproof and simple.
The Oracle NoSQL Database clearly came from a development team with experience in the open source tradition. The only installation headache I had went away when I changed
127.0.0.1. That's quite an improvement. I would trade SQL joins for simpler installation any day.
This article, "First look: Oracle NoSQL Database," was originally published at InfoWorld.com.