NoSQL standouts: The best key-value databases compared

Aerospike, Hazelcast, Memcached, Microsoft Azure Cosmos DB, and Redis put different twists on fast and simple data storage

Most applications need some form of persistence—a way to store the data outside the application for safekeeping. The most basic way is to write data to the file system, but that can quickly become a slow and unwieldy way to solve the problem. A full-blown database provides a powerful way to index and retrieve data, but it may also be overkill. Sometimes, all you need is a quick way to take a freeform piece of information, associate it with a label, stash it somewhere, and pull it back out again in a jiffy.

Enter the key-value store. It’s essentially a NoSQL database, but one with a highly specific purpose and a deliberately constrained design. Its job is to let you take data (a value), apply a label to it (a key), and store it either in-memory or in some storage system that’s optimized for fast retrieval. Applications use key-value databases for everything from caching objects to sharing commonly used data among application nodes.
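
To make the idea concrete, here is a minimal sketch of the key-value contract in Python. The `KeyValueStore` class and its method names are hypothetical, chosen for illustration; none of the products discussed here exposes exactly this interface.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store (illustrative, not a real product API)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Associate a value with a key, overwriting any previous value.
        self._data[key] = value

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        # Return the stored value, or the default if the key is absent.
        return self._data.get(key, default)

    def delete(self, key: str) -> bool:
        # Remove the key; report whether anything was actually removed.
        if key in self._data:
            del self._data[key]
            return True
        return False
```

Everything else a real key-value database offers (persistence, expiration, clustering) is layered on top of these three operations.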

Many relational databases can function as key-value stores, but that’s a little like using a tractor-trailer to go on grocery runs. It works, but it’s dramatically inefficient, and there are far lighter ways to solve the problem. A key-value store, like other NoSQL databases, provides just enough infrastructure for simple value storage and retrieval, integrates more directly with applications that use it, and scales in a more granular way with the application workload.

Key-value NoSQL database features compared

Five widely used products (including one cloud service) are worth your consideration; they are explicitly billed as key-value databases or offer key-value storage as a central feature. Their basic differences:

  • Hazelcast and Memcached tend toward minimalism, and don’t even bother to back up the data on disk.
  • Aerospike, Cosmos DB, and Redis are fuller-featured, but still revolve around the key-value metaphor.

Aerospike key-value NoSQL database in depth

If Redis is Memcached on steroids, Aerospike could be said to be Redis on steroids. Like Redis, Aerospike is a key-value store that can operate as a persistent database or a data cache. Aerospike is designed to be easy to cluster and easy to scale, to better support enterprise workloads.

Features unique to Aerospike

Much in Aerospike echoes both other key-value stores and other NoSQL databases. Data is stored and retrieved via keys, and the data can be kept in a number of fundamental data types, including 64-bit integers, strings, double-precision floats, and raw binary data serialized from a number of common programming languages.

Aerospike also can store data in complex types—lists of values, collections of key-value pairs called maps, and geospatial data in the GeoJSON format. Aerospike can perform native processing on geospatial data—such as to determine which locations stored in the database are closest to each other by just performing a query—making it an attractive option for developers of applications that rely on location.
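
The kind of question such a query answers can be sketched in plain Python. This is not Aerospike's API or its implementation; it only shows a nearest-location lookup over GeoJSON points, using the haversine formula as an assumed distance measure. The function names are hypothetical.

```python
import math
from typing import Dict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(a: Dict, b: Dict) -> float:
    """Great-circle distance in km between two GeoJSON Point objects."""
    # GeoJSON stores coordinates as [longitude, latitude].
    lon1, lat1 = (math.radians(c) for c in a["coordinates"])
    lon2, lat2 = (math.radians(c) for c in b["coordinates"])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest(origin: Dict, places: Dict[str, Dict]) -> str:
    """Return the key of the stored place closest to origin."""
    return min(places, key=lambda k: haversine_km(origin, places[k]))


def point(lon: float, lat: float) -> Dict:
    """Build a GeoJSON Point (helper for this sketch)."""
    return {"type": "Point", "coordinates": [lon, lat]}
```

A database with native geospatial support runs this sort of computation on the server, typically with a spatial index, so the application does not have to fetch every record to compare distances.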

Data stored in Aerospike can be organized into several hierarchical containers. Some NoSQL systems are document-oriented, meaning data is encapsulated in some kind of object, typically JSON. With Aerospike, containers are roughly like documents, but with functions and behaviors specific to Aerospike. Each kind of container lets you set different behavioral properties on the data inside it.

For example, the topmost level of containers, namespaces, determines whether the data is stored on disk, in RAM, or both; whether the data is replicated in the cluster or across clusters; and when or how data is expired or evicted. Through namespaces, Aerospike lets developers keep the most frequently accessed data in memory for the fastest possible response.

How Aerospike handles storage and clustering

Aerospike can keep its data on almost any file system, but it has been written specifically to take advantage of SSDs. That said, don’t count on good results from dropping Aerospike onto any old SSD. Aerospike’s developers maintain a list of approved SSD devices, and they have created a tool called ACT to rate the performance of SSD storage devices under Aerospike workloads.

Aerospike, like most NoSQL systems, uses a shared-nothing architecture for the sake of replication and clustering. Aerospike has no master nodes and no manual sharding. Every node is identical. Data is randomly distributed across the nodes and automatically rebalanced to keep bottlenecks from forming. If you want, you can set rules for how aggressively data is rebalanced. You can configure multiple clusters, running in different network segments or even different datacenters, to synchronize with one another.
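
A rough sketch of how automatic distribution can work without manual sharding: hash each key into a fixed number of partitions, then assign partitions to nodes. The partition count, the SHA-256 hash, and the round-robin assignment are assumptions made for this illustration; they are not a description of Aerospike's actual algorithm.

```python
import hashlib
from typing import Dict, List

N_PARTITIONS = 4096  # assumed partition count for this sketch


def partition_of(key: str) -> int:
    """Map a key to a partition by hashing it (SHA-256 is a stand-in)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % N_PARTITIONS


def assign_partitions(nodes: List[str]) -> Dict[int, str]:
    """Spread partitions evenly over nodes (simple round-robin)."""
    ordered = sorted(nodes)
    return {p: ordered[p % len(ordered)] for p in range(N_PARTITIONS)}


def node_for(key: str, partition_map: Dict[int, str]) -> str:
    """Find the node that owns the partition a key hashes into."""
    return partition_map[partition_of(key)]
```

Because a key's partition never changes, rebalancing after a node joins or leaves only means recomputing the partition map and moving whole partitions, not rehashing individual keys.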

Scripting in Aerospike

Like Redis, Aerospike allows developers to write Lua scripts, or UDFs (user-defined functions), that run inside the Aerospike engine. You can use UDFs to read or alter records, but it’s best to use them to perform high-speed, read-only, map-reduce operations across collections, or “streams,” of records on multiple nodes.
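
The shape of such a read-only map-reduce job can be sketched as follows. Real Aerospike UDFs are written in Lua and run inside the server; this Python version only illustrates the pattern, and the `city` field and function names are hypothetical.

```python
from functools import reduce
from typing import Dict, Iterable, List

Record = Dict[str, object]


def map_node(records: Iterable[Record]) -> Dict[str, int]:
    """Map phase, run per node: count that node's records by 'city'."""
    counts: Dict[str, int] = {}
    for rec in records:
        city = str(rec["city"])
        counts[city] = counts.get(city, 0) + 1
    return counts


def merge(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, int]:
    """Reduce phase: combine two partial counts into one."""
    out = dict(a)
    for city, n in b.items():
        out[city] = out.get(city, 0) + n
    return out


def aggregate(nodes: List[List[Record]]) -> Dict[str, int]:
    """Run the map phase on each node's stream, then reduce the results."""
    return reduce(merge, (map_node(recs) for recs in nodes), {})
```

Only the small per-node summaries cross the network, which is why this style of operation suits large record sets spread over many nodes.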

Where to get Aerospike

Aerospike’s community edition can be downloaded directly from Aerospike’s website. This includes server editions for Linux, desktop versions for Apple’s macOS and Microsoft’s Windows, cloud editions for Amazon EC2, Azure, and Google Compute Engine, and Docker containers. The enterprise edition of Aerospike is available via Aerospike’s Quick Start program, which provides an unlimited 90-day trial version.

The source code is available on GitHub.

Hazelcast IMDG key-value NoSQL database in depth

Hazelcast comes billed as an “in-memory data grid,” essentially a way to pool RAM and CPU resources across multiple machines to allow data sets to be distributed across those machines and manipulated in-memory.

NoSQL databases offer key-value, graph, or document features. Hazelcast concentrates on key-value functionality, emphasizing speedy access to distributed data. According to its makers, it can also be used as an alternative to products like Pivotal Gemfire, Software AG Terracotta, and Oracle Coherence.

Hazelcast can be run as a distributed service or be embedded directly inside a Java application. Clients are available for Java, Scala, .NET, C/C++, Python, and Node.js, and one for Go is in the works.

Features unique to Hazelcast

Hazelcast is built with Java and has a Java-centric ecosystem. Each node in a Hazelcast cluster runs an instance of Hazelcast’s core library, IMDG, on the JVM. How Hazelcast works with data is also closely mapped to Java’s language structures. Java’s Map interface, for example, is used by Hazelcast to provide key-value storage. As with Memcached, nothing is written to disk; everything is kept in-memory at all times.

One benefit Hazelcast can provide in a distributed environment is “near cache,” where commonly requested objects are migrated to the server making the requests. This way, the requests can be performed directly in-memory on the same system, without requiring a round trip across the network.
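
The near-cache idea can be sketched in a few lines of Python. Both classes are hypothetical stand-ins, not Hazelcast's API: `RemoteStore` plays the part of a cluster member reached over the network, and `NearCache` keeps local copies of what has already been fetched.

```python
from typing import Any, Dict, Optional


class RemoteStore:
    """Stand-in for a cluster member reached over the network."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self._data = dict(data)
        self.round_trips = 0  # counts simulated network calls

    def get(self, key: str) -> Optional[Any]:
        self.round_trips += 1
        return self._data.get(key)


class NearCache:
    """Local copy of recently read entries, consulted before the network."""

    def __init__(self, remote: RemoteStore) -> None:
        self._remote = remote
        self._local: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        if key in self._local:
            return self._local[key]  # served from local memory
        value = self._remote.get(key)  # one round trip
        if value is not None:
            self._local[key] = value
        return value

    def invalidate(self, key: str) -> None:
        # Drop the local copy, for example after the entry changes remotely.
        self._local.pop(key, None)
```

The trade-off is staleness: a local copy has to be invalidated or refreshed when the original entry changes, which is why near caches tend to suit read-heavy data.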

Aside from key-value pairs, you can store and distribute many other kinds of data structures through Hazelcast. Some are simple implementations of Java objects, like Map. Others are specific to Hazelcast. MultiMap, for example, is a variant on key-value storage that can store multiple values under the same key. These features make it possible to emulate some behaviors of other NoSQL systems, such as organizing data into documents, but the emphasis is on structures that allow data to be distributed and accessed quickly.
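
As an illustration of the multiple-values-per-key idea, here is a minimal single-process sketch in Python. It is not Hazelcast's MultiMap implementation or API, and the method names are assumptions; the real structure is also distributed across the cluster.

```python
from collections import defaultdict
from typing import Any, DefaultDict, List


class MultiMap:
    """Toy multimap: each key maps to a list of values."""

    def __init__(self) -> None:
        self._data: DefaultDict[str, List[Any]] = defaultdict(list)

    def put(self, key: str, value: Any) -> None:
        # Add a value under the key without replacing earlier values.
        self._data[key].append(value)

    def get(self, key: str) -> List[Any]:
        # Return a copy so callers cannot mutate internal state.
        return list(self._data.get(key, []))

    def remove(self, key: str, value: Any) -> bool:
        # Remove one matching value; drop the key when it becomes empty.
        values = self._data.get(key)
        if values and value in values:
            values.remove(value)
            if not values:
                del self._data[key]
            return True
        return False
```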

How Hazelcast handles clustering

Hazelcast clusters have no master/slave setup; everything is peer-to-peer. Data is automatically sharded and distributed across all members of the cluster. You can also designate certain cluster members as “lite,” which hold no data at first but can later be promoted to full members. This lets some nodes be used strictly for computation, or to distribute data gradually through a cluster while it’s being brought online.

Hazelcast can also ensure that operations proceed only if at least a certain number of nodes are online. However, you have to configure this behavior manually, and it works only for certain data structures. As of Hazelcast Version 3.9, you can reconfigure data structures across a cluster without having to first take it offline.
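
The minimum-member rule can be pictured as a guard placed in front of each operation. The sketch below is a hypothetical Python illustration of that idea, not Hazelcast's configuration or API.

```python
from typing import Callable, Set, TypeVar

T = TypeVar("T")


class QuorumError(RuntimeError):
    """Raised when too few cluster members are online."""


class QuorumGuard:
    """Allow operations only while enough members are online."""

    def __init__(self, min_members: int) -> None:
        self.min_members = min_members
        self.online: Set[str] = set()

    def join(self, member: str) -> None:
        self.online.add(member)

    def leave(self, member: str) -> None:
        self.online.discard(member)

    def has_quorum(self) -> bool:
        return len(self.online) >= self.min_members

    def run(self, operation: Callable[[], T]) -> T:
        # Refuse the operation rather than act on a partial cluster.
        if not self.has_quorum():
            raise QuorumError(
                f"need {self.min_members} members, have {len(self.online)}"
            )
        return operation()
```

Failing fast like this keeps a small, isolated fragment of a cluster from accepting writes that the rest of the cluster never sees.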

Where to get Hazelcast

Hazelcast is available for download directly from the Hazelcast site. It is typically deployed as a collection of Java .JAR files. Docker images are also available at the official Docker registry.

You can download the enterprise edition of Hazelcast directly from Hazelcast. You can also get a 30-day free trial key for Hazelcast.

Memcached key-value NoSQL database in depth

Memcached is about as basic and fast as key-value storage gets. Originally written as an acceleration layer for the blogging platform LiveJournal, Memcached has since become a ubiquitous component of web technology stacks. If you have many small fragments of data that can be associated with a simple key and don’t need to be replicated between cache instances, Memcached is the right tool.

Features unique to Memcached

Memcached is most commonly used for caching queries from a database and keeping the results exclusively in memory. In that respect, it’s unlike many other NoSQL databases, key-value or otherwise, since they store data in some persistent form. 

Memcached does not back its data store to anything. All keys are held only in memory, so they evaporate whenever the Memcached instance or the server hosting it is reset. Thus, Memcached can’t really be used as a substitute for a NoSQL database.

What it can be used for, though, is a high-speed way to stash commonly used data that might take orders of magnitude more time to query from a source.

Any data that can be serialized to a binary stream can be stashed in Memcached. Values can be set to expire after a certain length of time, or they can be removed on demand by referencing their keys from an application. The amount of memory you devote to any given instance of Memcached is entirely up to you, and multiple servers can run Memcached side by side to spread out the load. Furthermore, Memcached scales linearly with the number of cores available in a system because it is a multithreaded application.
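
Expiration after a set time and removal on demand can be sketched as follows. This is an illustrative Python model, not Memcached's implementation or client API; the `ExpiringCache` class and its injectable `clock` parameter are assumptions that make the behavior easy to test.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringCache:
    """Toy cache whose entries can expire after a time-to-live."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (value, expiry timestamp or None for "never expires")
        self._items: Dict[str, Tuple[bytes, Optional[float]]] = {}

    def set(self, key: str, value: bytes, ttl: Optional[float] = None) -> None:
        expires_at = None if ttl is None else self._clock() + ttl
        self._items[key] = (value, expires_at)

    def get(self, key: str) -> Optional[bytes]:
        item = self._items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._items[key]  # lazily drop the expired entry on read
            return None
        return value

    def delete(self, key: str) -> None:
        # On-demand removal by key.
        self._items.pop(key, None)
```

Values are stored as `bytes` here to mirror the point that anything serializable to a binary stream can be cached.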

Most popular programming languages have client libraries for Memcached. For example, libmemcached allows C and C++ programs to work directly with Memcached instances. It also lets Memcached be embedded in C programs.

How Memcached handles clustering

Even though you can run multiple instances of Memcached, whether on the same server or on multiple nodes across a network, there is no automatic federation or synchronization of data among instances. The data inserted into a Memcached instance is available only from that instance, period.
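
In practice, this means the client has to decide which instance holds a given key. A common approach is to hash the key and pick a server from a list. The sketch below assumes CRC32 and a simple modulo scheme, with plain dictionaries standing in for Memcached instances; it is an illustration, not the behavior of any particular client library.

```python
import zlib
from typing import Dict, List, Optional


def pick_server(key: str, servers: List[str]) -> str:
    """Choose the server responsible for a key by hashing the key."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


class ShardedClient:
    """Routes each key to exactly one of several independent caches."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        # Each dict is a stand-in for one isolated cache instance.
        self.instances: Dict[str, Dict[str, bytes]] = {s: {} for s in servers}

    def set(self, key: str, value: bytes) -> None:
        self.instances[pick_server(key, self.servers)][key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self.instances[pick_server(key, self.servers)].get(key)
```

With a plain modulo scheme, changing the server list remaps most keys, which amounts to emptying the cache. Consistent hashing is the usual way to soften that effect.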

Where to get Memcached

Memcached’s source code is available for download from GitHub and from the official Memcached site. Linux binaries are available in the repositories for most Linux distributions. Windows users can build it directly from source; some unofficial binaries have been built in the past but do not appear to be reliably available.

Microsoft Azure Cosmos DB key-value NoSQL database in depth

Most databases have one overarching paradigm: document store, key-value store, wide column store, graph database, and so on. Not so Azure Cosmos DB. Derived from Microsoft’s NoSQL database as a service, DocumentDB, Cosmos DB is Microsoft’s attempt to create a single database that can use multiple paradigms.

Features unique to Azure Cosmos DB

Cosmos DB uses what’s called an atom-record-sequence storage system to support different data models. Atoms are primitive types such as strings, integers, and Boolean values. Records are collections of atoms, like structs in C. Sequences are arrays of either atoms or records.
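
Taking that description literally, the three building blocks can be modeled with Python types. This is only a sketch of the vocabulary; the type aliases and the `classify` function are hypothetical, and Cosmos DB's internal representation is not described in this article.

```python
from typing import Dict, List, Union

# Atoms: primitive values such as strings, numbers, and Booleans.
Atom = Union[str, int, float, bool]
# Records: named collections of atoms, like a C struct.
Record = Dict[str, Atom]
# Sequences: arrays of atoms or records.
Sequence = List[Union[Atom, Record]]


def classify(value: object) -> str:
    """Label a value as 'atom', 'record', or 'sequence'."""
    if isinstance(value, (str, int, float, bool)):
        return "atom"
    if isinstance(value, dict) and all(
        isinstance(k, str) and isinstance(v, (str, int, float, bool))
        for k, v in value.items()
    ):
        return "record"
    if isinstance(value, list) and all(
        classify(item) in ("atom", "record") for item in value
    ):
        return "sequence"
    raise TypeError(f"not an atom, record, or sequence: {value!r}")
```

A table row maps naturally onto a record, and a list of rows onto a sequence, which suggests how a small set of primitives can back several data models.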

Cosmos DB uses these building blocks to replicate the behavior of multiple database types. It can reproduce the behavior of tables found in conventional relational databases. But it can also reproduce the functionality of data types found in NoSQL systems—schemaless JSON documents (DocumentDB and MongoDB) and graphs (Gremlin, Apache TinkerPop).

Table storage is how Cosmos DB provides its key-value functionality. When you query a table, you use a set of keys (a partition key and a row key) to retrieve data. You can think of partition keys as bucket or table references, while row keys are used to retrieve the row with the data. The row can have multiple data values, but there’s nothing that says you can’t create a table with only one type of data stored in any particular row. You can retrieve data via .NET code or a REST API call.
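
The two-level lookup can be modeled with nested dictionaries. The `Table` class below is a hypothetical Python illustration of addressing a row by partition key and row key; it is not the Cosmos DB SDK or its REST API.

```python
from typing import Any, Dict, Optional

Entity = Dict[str, Any]


class Table:
    """Toy table addressed by (partition key, row key)."""

    def __init__(self) -> None:
        # partition key -> row key -> entity
        self._partitions: Dict[str, Dict[str, Entity]] = {}

    def upsert(self, partition_key: str, row_key: str, entity: Entity) -> None:
        # Insert the row, or replace it if the key pair already exists.
        self._partitions.setdefault(partition_key, {})[row_key] = dict(entity)

    def get(self, partition_key: str, row_key: str) -> Optional[Entity]:
        # Point lookup: the partition key selects the bucket,
        # the row key selects the row inside it.
        return self._partitions.get(partition_key, {}).get(row_key)

    def query_partition(self, partition_key: str) -> Dict[str, Entity]:
        # Return every row stored under one partition key.
        return dict(self._partitions.get(partition_key, {}))
```

Used as a pure key-value store, each row would hold a single value, with the key pair acting as one composite key.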

How Azure Cosmos DB handles replication and clustering
