How Aerospike achieves low latency and strong consistency across multiple sites

Aerospike Database 5’s multi-site clustering eliminates the trade-off between data consistency and high performance for large-scale, always-on, globally distributed transaction systems

In today’s global digital economy, organizations need applications that are always on and that perform in real time. Applications such as digital payment systems, real-time inventory tracking, and online gaming rely on resilient systems with fast access to data held in data centers distributed across the world. For applications like these, it is unacceptable to compromise data consistency for any transaction, whether the data is stored in a private cloud, a public cloud, or any combination of both.

But operating a cluster across geographically distributed data centers or cloud regions can introduce high costs, data inconsistencies, and limited resiliency. To overcome these obstacles, Aerospike has developed a multi-site clustering feature in Aerospike Database 5 that enables enterprises to operate a single database cluster across multiple locations without risking data loss or restricting data availability.

Multi-site clustering provides an active-active data architecture

An active-active data architecture spans multiple regions and services application requests at all locations. Each location is “active.” Data records are replicated across regions so that reads may be processed at any location. In some architectures, writes of a given data record are handled only at a single master location; other architectures allow such writes to occur at multiple locations. Each approach has its challenges involving availability, consistency, and performance.

In the past, organizations made trade-offs between data consistency and high performance. Aerospike Database 5 with multi-site clustering eliminates these trade-offs. Multi-site clustering combines strong consistency with support for globally distributed transactional applications that can tolerate higher write latency, which varies with the distance between the sites of a cluster, while still delivering sub-millisecond read latency at high throughput.
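To see why write latency depends on the distance between sites, consider a rough lower bound. The sketch below assumes that a write is acknowledged only after the farthest replica site confirms it, and that a signal in optical fiber covers about 200 km per millisecond. The function name and the sample distances are hypothetical, chosen only for illustration; real network paths are longer than the straight-line distance, and processing time comes on top.

```python
# Assumption: light in optical fiber travels roughly 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def min_write_round_trip_ms(distance_km: float) -> float:
    """Lower bound on one network round trip between two sites.

    Hypothetical helper: it ignores routing detours, queuing, and the
    time the database itself spends on the write.
    """
    return 2 * distance_km / FIBER_KM_PER_MS


# Illustrative distances only, not measurements from any deployment.
for km in (100, 1000, 4000):
    floor_ms = min_write_round_trip_ms(km)
    print(f"{km:>5} km apart: at least {floor_ms:.1f} ms per synchronous write")
```

Reads served from a local replica avoid this round trip entirely, which is why read latency can stay low while write latency grows with the distance between sites.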

How Aerospike multi-site clusters operate

In Figure 1 below, a single Aerospike cluster is arranged in the form of three racks distributed across three sites. Each site could be a data center, a cloud region, or even a region of a different cloud provider such as Amazon Web Services, Google Cloud, or Microsoft Azure. Applications see this geographically distributed environment as a single system, and read/write requests are handled seamlessly. For optimal performance, reads are processed locally, while writes are routed to remote locations when needed.

Figure 1: Aerospike multi-site clustering – three replicas, three sites.

Rack awareness is an important capability that allows Aerospike clusters to deploy across distant data centers or cloud regions. In a multi-site cluster, Aerospike’s rack awareness feature enables replicas of data records grouped in data partitions to be stored on different racks. Through data replication factor settings, each rack can be configured to store a full copy of all data to maximize data availability and local read performance.

In Figure 1, a replication factor of 3 instructs Aerospike to maintain a copy of all data in each rack. Only one node in one rack of the cluster maintains the master copy of a given data partition at any time; the other racks each have a node that stores a replica of this partition. Aerospike synchronizes the master copy with the replicas on the other racks' nodes.
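The placement rule described above can be sketched as a toy model. The code below is not Aerospike's partition-map algorithm; it uses a simple round-robin rule, and the function name, node names, and rack layout are all hypothetical. It only shows the property that matters here: with three racks and a replication factor of 3, every partition has one master and two replicas, each on a different rack.

```python
def place_replicas(partition_id, racks, replication_factor):
    """Pick one node per rack for a partition; the first pick is the master.

    racks maps a rack id to the list of node names in that rack.
    Toy round-robin rule for illustration, not Aerospike's actual algorithm.
    """
    rack_ids = sorted(racks)
    if replication_factor > len(rack_ids):
        raise ValueError("this sketch needs at least one rack per replica")
    placement = []
    for i in range(replication_factor):
        # Rotate the starting rack so masters spread across racks.
        rack_id = rack_ids[(partition_id + i) % len(rack_ids)]
        nodes = racks[rack_id]
        placement.append((rack_id, nodes[partition_id % len(nodes)]))
    return placement


# Hypothetical three-rack, nine-node layout similar to Figure 1.
racks = {
    1: ["r1-n1", "r1-n2", "r1-n3"],
    2: ["r2-n1", "r2-n2", "r2-n3"],
    3: ["r3-n1", "r3-n2", "r3-n3"],
}

for pid in range(4):
    master, *replicas = place_replicas(pid, racks, replication_factor=3)
    print(pid, "master:", master, "replicas:", replicas)
```

Because every rack holds a copy of each partition, an application at any site can read from a node in its own rack.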

Aerospike maintains a roster to keep track of this information. In Figure 1, the roster master copy of the partition is on Node 3 of Rack 2, and its replicas are on Node 1 of Rack 1 and Node 2 of Rack 3. This cluster preserves strong consistency, avoids data loss, and remains available when a single site fails.

How Aerospike multi-site clusters recover from failure

Natural disasters, power outages, hardware failures, and network failures can cause one or more components of a multi-region cluster to become inaccessible. Resiliency is a critical requirement of any multi-region operational database.

In Figure 2 below, a network failure has disconnected Rack 3 from Racks 1 and 2, creating a split-brain scenario, in which some portions of the system cannot communicate with others. Rack 3 is still up, with all three of its nodes forming a sub-cluster. In this case, Racks 1 and 2 easily discover that Rack 3 is unreachable and form a cluster with six nodes. This becomes the majority sub-cluster and has complete availability, since it holds two copies of the data. As the system continues to process transactions, a third copy is automatically created within the majority sub-cluster on every write.

Figure 2: Aerospike multi-site clustering – site disconnected.

Because a transaction goes forward only after it has been committed on all racks, every transaction that was committed in Rack 3 is also committed in Rack 1 and Rack 2. Local apps on Rack 1 and Rack 2 continue to work fine. The local apps on Rack 3 become unavailable. Using Aerospike's strong consistency algorithm, Rack 3 can determine, from a combination of the roster and the fact that it cannot talk to Racks 1 and 2, that it is a minority sub-cluster and is unavailable for application reads and writes. When Rack 3 comes back or is reconnected to the other two racks, the extra copies of data that were created in Racks 1 and 2 for writes made in the meantime are merged back into Rack 3 so that it can start taking over its portion of the load. All of this happens with no operator intervention, preserving strong consistency with no data loss, and the majority sub-cluster stays completely available during the split-brain event.
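The majority and minority decision in this scenario can be illustrated with a deliberately simplified check. The sketch below assumes that a sub-cluster may serve requests only if it can see a strict majority of the nodes on the roster. Aerospike's actual strong consistency rules are evaluated per partition and take more into account than a node count, so this is an illustration of the idea, not the product's algorithm. The function and node names are hypothetical.

```python
def can_serve(roster, reachable):
    """Return True if the visible nodes form a strict majority of the roster.

    Simplified assumption for illustration; not Aerospike's full rule set.
    """
    visible = set(roster) & set(reachable)
    return len(visible) > len(roster) / 2


# Hypothetical roster: three racks with three nodes each, as in Figure 2.
roster = {f"rack{r}-node{n}" for r in (1, 2, 3) for n in (1, 2, 3)}

# After the network failure, Racks 1 and 2 see each other; Rack 3 sees only itself.
majority_side = {node for node in roster if not node.startswith("rack3")}
minority_side = roster - majority_side

print(can_serve(roster, majority_side))  # six of nine nodes: prints True
print(can_serve(roster, minority_side))  # three of nine nodes: prints False
```

Because both sides evaluate the same roster, at most one of them can ever hold a majority, which is what prevents the two sides from accepting conflicting writes.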

Meeting the demands of the always-on global economy

The always-on nature of today’s global digital economy demands database systems that operate without disruption or risk of data loss. Aerospike’s multi-site clustering capability allows organizations to deploy a single cluster across multiple locations with 24/7 availability and strong consistency. This makes it possible to implement new types of applications involving globally distributed transactions.

Srini Srinivasan is founder and chief product officer at Aerospike, a leader in next-generation, real-time NoSQL data solutions. He has two decades of experience designing, developing, and operating high-scale infrastructures. He also has more than 30 patents in database, web, mobile, and distributed systems technologies. He co-founded Aerospike to solve the scaling problems he experienced with internet and mobile systems while he was senior director of engineering at Yahoo.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to

Copyright © 2020 IDG Communications, Inc.