Review: ClustrixDB scales out — way out

Clustered relational database outperforms Amazon Aurora for low-latency, high-transaction-rate scenarios

At a Glance
  • ClustrixDB 7.5

Editor's Choice

As I was working on my review of DeepSQL, I received an inquiry from another company claiming to have a scale-out relational database that can handle large-scale (“Cyber Monday shopping traffic”), write-heavy loads while maintaining ACID compliance and scaling linearly with little to no latency. In other words, this company, Clustrix, had another database more scalable than Amazon Aurora for some scenarios.

My response: “There seems to be a lot of that going around.” I must be getting cynical in my old age. Nevertheless, ClustrixDB turned out to be real, a MySQL-compatible transactional database that can be scaled out “hot” -- that is, without database downtime. ClustrixDB and DeepSQL take different approaches, but both can outperform Aurora for some use cases -- for example, ClustrixDB for e-commerce, and DeepSQL for bioinformatics.

How ClustrixDB works

How does ClustrixDB perform its magic? As you might guess from the name, it’s a clustered database solution. You need a minimum of three peered nodes and a load balancer to cluster ClustrixDB; then you can add nodes as you like.

Each node has a query compiler, a data map, a database engine, and a slice of the data, typically on fast SSDs. The data map is replicated on all nodes, so when a query comes into one node, the query compiler can send out compiled query fragments to where the data resides or, more specifically, to the ranking replica for that data. A rebalancer makes sure that data slices are properly spread out across nodes and every slice exists on at least two nodes. When you add a node, the rebalancer spreads out the slices to make use of the new resource; when you drop a node, the rebalancer adds replicas to any data slices that need them. The diagram below provides an overview of the system.

ClustrixDB queries multiple database slices using a distributed query planner and compiler, along with a distributed shared-nothing execution engine.

You could usefully imagine three or more database nodes behind a load balancer, looking a lot like a web server farm. Because of the need to autoslice and rebalance the data in the background, as well as perform distributed queries, you can imagine double-ended arrows between each pair of nodes representing low-latency data flow paths.

Perhaps surprisingly, the idea here is that the distributed query analyzer sends the code to the data, instead of bringing the data to the code. As we’ll see, it works well.
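To make the "send the code to the data" idea concrete, here is a minimal Python sketch. It is not ClustrixDB's actual planner: the `DataMap` class, the hash-based slice placement, and the node names are all assumptions made for illustration. It only shows how a node holding a full copy of the data map could group the keys a query touches by the ranking replica that should run each fragment.

```python
from dataclasses import dataclass, field
from zlib import crc32


@dataclass
class DataMap:
    """Toy data map: slice number -> nodes holding it (first entry = ranking replica)."""
    replicas: dict[int, list[str]] = field(default_factory=dict)

    def slice_for(self, key: str) -> int:
        # Hypothetical placement rule: hash the key onto a slice number.
        return crc32(key.encode()) % len(self.replicas)

    def ranking_replica(self, key: str) -> str:
        # The fragment for this key runs where the ranking replica lives.
        return self.replicas[self.slice_for(key)][0]


def plan_fragments(data_map: DataMap, keys: list[str]) -> dict[str, list[str]]:
    """Group the keys a query touches by the node that should run each fragment."""
    fragments: dict[str, list[str]] = {}
    for key in keys:
        fragments.setdefault(data_map.ranking_replica(key), []).append(key)
    return fragments


# Three nodes, three slices, each slice stored on two nodes (all names are made up).
data_map = DataMap({0: ["node1", "node2"], 1: ["node2", "node3"], 2: ["node3", "node1"]})
fragments = plan_fragments(data_map, ["order:1001", "order:1002", "customer:42"])
for node, keys in sorted(fragments.items()):
    print(f"send fragment for {keys} to {node}")
```

Because every node carries the same map, any node that receives a query can do this grouping locally before shipping fragments to its peers.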

ClustrixDB advantages and disadvantages

Clustrix claims several benefits for its database. First, the shared-nothing architecture eliminates the potential scaling bottlenecks caused by shared-disk or even shared-cache architectures. That isn’t controversial. Software developers -- especially database developers -- have been struggling with contention issues in highly parallel servers for decades.

Second, Clustrix claims the rebalancer ensures optimal data distribution across all nodes. That’s likely to come out in the wash when you do a serious benchmark. Clustrix also has some patented intellectual property in its data distribution and slicing.

Third, Clustrix claims the query optimizer achieves maximum parallelism on individual queries and maximum concurrency on simultaneous queries. Again, that’ll come out in the wash when you do the transaction processing benchmark. On a related point, the evaluation model parallelizes queries and sends the fragments to the node that holds the data. For consistency and concurrency control, readers get lock-free snapshot isolation and writers get two-phase locking.
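For readers unfamiliar with that combination, the following Python sketch shows the general technique: readers pick a snapshot and read committed versions without taking row locks, while writers acquire all of their row locks before writing and release them afterward. It is a single-process toy under my own assumptions (the `VersionedStore` class and its methods are hypothetical) and says nothing about how ClustrixDB implements either mechanism.

```python
import threading


class VersionedStore:
    """Toy multi-version store: each key keeps (commit_ts, value) versions in commit order."""

    def __init__(self):
        self.versions = {}            # key -> list of (commit_ts, value)
        self.locks = {}               # key -> threading.Lock, used by writers only
        self.clock = 0                # last commit timestamp handed out
        self.meta = threading.Lock()  # protects clock, versions and lock creation

    def snapshot(self):
        """A reader's snapshot is the latest commit timestamp it is allowed to see."""
        with self.meta:
            return self.clock

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts; takes no row lock."""
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

    def write_transaction(self, updates):
        """Two-phase locking: acquire every row lock, apply the writes, then release."""
        keys = sorted(updates)  # fixed lock order avoids deadlock between writers
        with self.meta:
            row_locks = [self.locks.setdefault(k, threading.Lock()) for k in keys]
        for lock in row_locks:      # growing phase
            lock.acquire()
        try:
            with self.meta:
                self.clock += 1
                for key in keys:
                    self.versions.setdefault(key, []).append((self.clock, updates[key]))
        finally:
            for lock in row_locks:  # shrinking phase
                lock.release()


store = VersionedStore()
store.write_transaction({"stock:widget": 10})
snap = store.snapshot()                               # a reader starts here
store.write_transaction({"stock:widget": 9})          # a later writer commits
print(store.read("stock:widget", snap))               # the earlier reader still sees 10
print(store.read("stock:widget", store.snapshot()))   # a new reader sees 9
```

The point of the split is that a long-running read never blocks a writer, and a writer never changes what an in-flight reader sees.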

ClustrixDB does have several incompatibilities with MySQL. The two biggest, in my mind, are the lack of spatial extension types and full-text search. There are many important use cases for each. Because DeepSQL is essentially a distribution of MySQL with the Deep engine added, you can work around its lack of geographic and full-text search logic on a table-by-table basis with the InnoDB engine. ClustrixDB contains no MySQL code at all, so you can’t work around these omissions within a database. Instead, you would have to put the tables that need geographic and full-text searches in a separate database, probably plain MySQL, outside the cluster.

ClustrixDB administration

As we see in the figure below, the Clustrix Health screen shows you a current snapshot of the cluster’s performance along with historical performance charts.

You manage ClustrixDB from a web control panel. Here we are looking at the general overview of a cluster’s health.

Beyond the snapshot, the queries, graphs, and compare screens allow you to dive deeper into the performance. Here we see a deep dive into two points in time on the Compare screen, giving such details as which queries were running and how many rows were read.

What’s the benefit of comparing points in time? Suppose the database was struggling on Monday morning but fine Tuesday morning. You’d want to dive in and figure out what was different on Monday so that you could avoid problems the following Monday. You might guess it was higher utilization, but that’s not the same as knowing from the data that there were a bunch of reports running Monday, along with an above-average number of users running standard queries from the employee web app.

You can compare the state of the Clustrix database at two points in time and drill down into the actual queries running at each point.

The Flex screen shown below is the place where you manage a cluster. You can install the Clustrix software on potential nodes, add nodes to the cluster, and remove nodes from the cluster. None of these operations incurs database downtime.

You can add and remove nodes at will using Flex, as long as you maintain a quorum. If a node dies, the health monitor notices and the rebalancer kicks in. If you add a node, the rebalancer kicks in. Again, all of that happens without database downtime.

Flex is the feature that lets you add and remove nodes from a cluster. Three nodes is the minimum for a quorum. When you add a new node, the rebalancer will start taking advantage of the extra resources.

When you remove a node from the cluster, the software basically marks it as “soft-failed,” and the rebalancer starts to replicate slices onto “healthy” nodes as needed. Once all the database slices have sufficient replicas, the node can actually be dropped.
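The removal sequence can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Clustrix's rebalancer: the `remove_node` function, the "least-loaded node" placement rule, and the treatment of quorum as a simple three-node minimum are all mine.

```python
MIN_NODES = 3      # the minimum cluster size stated in this review
MIN_REPLICAS = 2   # every slice should exist on at least two nodes


def remove_node(placement: dict[str, set[str]], nodes: set[str], leaving: str) -> set[str]:
    """Soft-fail `leaving`, re-replicate its slices elsewhere, then drop it.

    `placement` maps slice name -> set of nodes holding a replica and is updated in place.
    Returns the set of nodes that remain in the cluster.
    """
    healthy = nodes - {leaving}
    if len(healthy) < MIN_NODES:
        raise ValueError("removing this node would leave too few nodes")

    def load(node: str) -> int:
        # Number of slice replicas currently stored on this node.
        return sum(node in holders for holders in placement.values())

    # Rebalancer step: top up every slice that would fall below MIN_REPLICAS.
    for holders in placement.values():
        while len(holders & healthy) < MIN_REPLICAS:
            candidates = sorted(healthy - holders, key=lambda n: (load(n), n))
            holders.add(candidates[0])  # copy the slice to the least-loaded healthy node

    # Only now is it safe to drop the soft-failed node's replicas.
    for holders in placement.values():
        holders.discard(leaving)
    return healthy


# Hypothetical four-node cluster with four slices, two replicas each.
placement = {"s0": {"n1", "n2"}, "s1": {"n2", "n3"}, "s2": {"n3", "n4"}, "s3": {"n4", "n1"}}
nodes = remove_node(placement, {"n1", "n2", "n3", "n4"}, "n4")
print(sorted(nodes), {s: sorted(h) for s, h in placement.items()})
```

The ordering matters: replicas are added on healthy nodes before the departing node's copies are discarded, so no slice is ever left with a single copy.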

ClustrixDB performance benchmarks

As I’ve emphasized every time I’ve reported on benchmarks, they are merely an attempt to simulate and measure what might happen in real life. The benchmarks that Amazon did on Aurora to compare it to RDS for MySQL (and that I reproduced) used Sysbench and emphasized separate read-only and write-only loads. Those were good tests to show the raw transaction rate improvement possible with Aurora, but they did not attempt to model realistic loads for any particular scenario, nor did they impose a limit on the latency.

ClustrixDB has been designed and tuned for the transactional loads of high-volume e-commerce sites. Such sites impose a limit on the latency of each transaction, as the tendency of potential customers to leave a slow site has been well documented. If you restrict the database’s contribution to the latency to 20ms, you come to different conclusions about the maximum available transaction rate than you would if you allowed 30ms or 40ms latency.
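A small Python sketch shows why the latency cap changes the answer. The measurement points below are made up for illustration and are not Clustrix's or Amazon's results; `max_tps_within` is a hypothetical helper that returns the highest throughput among the load-test points that stay within a given latency limit.

```python
def max_tps_within(measurements: list[tuple[float, float]], latency_cap_ms: float) -> float:
    """Highest throughput among (tps, latency_ms) points whose latency meets the cap."""
    eligible = [tps for tps, latency_ms in measurements if latency_ms <= latency_cap_ms]
    return max(eligible, default=0.0)


# Made-up load-test points for illustration only; these are not benchmark results.
points = [(5_000, 8.0), (10_000, 12.0), (15_000, 19.0), (20_000, 27.0), (25_000, 38.0)]

for cap in (20, 30, 40):
    print(f"{cap} ms cap -> {max_tps_within(points, cap):,.0f} tps")
```

With these invented numbers, the same database would be rated at three different "maximum" transaction rates depending on which cap you choose, which is why a benchmark that ignores latency is hard to apply to an e-commerce workload.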

In addition, according to Clustrix, the typical mix of reads to writes in the database of an e-commerce site is about 90:10. Depending on how locking and indexing have been implemented in the database, throwing in those 10 percent writes could create severe bottlenecks and slowdowns in the worst case or have almost no performance impact in the best case.

Accordingly, Clustrix worked with an independent lab to run a series of OLTP benchmarks on Amazon RDS for MySQL, Amazon Aurora, and ClustrixDB running on Amazon, as shown in the figure below. It’s worth noting that the ClustrixDB nodes ran in smaller VMs (c3.2xlarge) than the MySQL and Aurora databases (db.r3.8xlarge). This offsets the need to run ClustrixDB on at least three nodes and reflects the fact that Clustrix nodes don’t need (and can’t use) as much RAM as MySQL and Aurora.

This figure summarizes the Sysbench results for Clustrix, Aurora, and MySQL running on Amazon. Measurements that show lower latency and higher throughput are better, but typically require more nodes.

I reviewed Clustrix’s benchmark numbers, scripts, and methodology. I did not try to reproduce the work of the independent lab, although Clustrix offered to make a cluster available to me for that purpose.

Note that the benchmarks I worked on with DeepSQL primarily explored heavily indexed data ingestion (iiBench) and analytics (TPC-H) performance, while these benchmarks primarily explored transactional (TPC-C) performance. Trying to directly compare those DeepSQL apples to these ClustrixDB oranges wouldn’t be too enlightening.

Comparing costs

It is useful to compare costs between ClustrixDB and Aurora. I was a little surprised by two important factors. First, you don’t need to pay for IOs and RDS storage when you use ClustrixDB on AWS. All storage is local, so all IOs are also local. By contrast, IOs and RDS storage are significant parts of the bill for running an Aurora database.

Second, if you have a seasonal load, you can save a lot of money running ClustrixDB in the cloud and using Flex to scale up only in the “high season,” which for an e-commerce site might be November through January.

In one cost model that Clustrix ran at my behest, the net savings for using Flex in the cloud was about $88,000. That’s nothing to sneeze at. A good purchasing agent could probably get an additional volume discount from Clustrix to reduce the licensing fees, although Clustrix understandably won’t tell me exactly how much of a discount you might expect.
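The shape of such a seasonal calculation is easy to sketch in Python. Every number below is hypothetical (node counts, the per-node monthly cost, and the three-month peak) and none of it comes from Clustrix's model, so the output will not match the $88,000 figure; the `annual_cost` function is only meant to show where the savings come from.

```python
def annual_cost(base_nodes: int, peak_nodes: int, peak_months: int, node_month_cost: float) -> float:
    """Yearly cost when the cluster runs peak_nodes for peak_months and base_nodes otherwise."""
    if not 0 <= peak_months <= 12:
        raise ValueError("peak_months must be between 0 and 12")
    off_peak_months = 12 - peak_months
    return node_month_cost * (base_nodes * off_peak_months + peak_nodes * peak_months)


# Hypothetical inputs: none of these figures come from Clustrix's cost model.
NODE_MONTH_COST = 1_000.0   # assumed all-in cost of one node for one month
BASE, PEAK = 4, 10          # assumed node counts off season and in high season

fixed = annual_cost(PEAK, PEAK, 12, NODE_MONTH_COST)   # sized for peak all year
flexed = annual_cost(BASE, PEAK, 3, NODE_MONTH_COST)   # scale up November through January
print(f"fixed: ${fixed:,.0f}  flexed: ${flexed:,.0f}  saved: ${fixed - flexed:,.0f}")
```

The savings grow with the gap between peak and baseline node counts and shrink as the high season gets longer, so it is worth plugging in your own traffic pattern and pricing.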

Note that Amazon Aurora currently can’t do 40,000 transactions per second (tps) with 15ms latency if you mix reads and writes. Aurora can scale out for reads, but that doesn’t necessarily get you where you want to be for an e-commerce scenario.

Overall, ClustrixDB is another, more scalable alternative to Amazon Aurora for people who need extremely high transactional performance. Of course, as I always say, the benchmarks don’t tell the whole story. You need to try it yourself on your own site or with your own workloads.

Clustrix offers a free supported trial of ClustrixDB to qualified parties. The download is available as a Linux install or an AMI; you can apply for the trial at the Clustrix website.

InfoWorld Scorecard

ClustrixDB 7.5
  • Management (25%): 9
  • Performance (25%): 10
  • Availability (20%): 10
  • Scalability (20%): 9
  • Value (10%): 9
  • Overall Score (100%): 9.5
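
As a quick check on the arithmetic, the Python snippet below computes the weighted average of the five category scores using the percentage weights listed in the scorecard. How InfoWorld rounds the result is an assumption on my part; the calculation yields 9.45, which is consistent with the published overall score of 9.5.

```python
# Category scores and percentage weights as listed in the scorecard.
scorecard = {
    "Management": (9, 25),
    "Performance": (10, 25),
    "Availability": (10, 20),
    "Scalability": (9, 20),
    "Value": (9, 10),
}

total_weight = sum(weight for _, weight in scorecard.values())
weighted_points = sum(score * weight for score, weight in scorecard.values())
weighted_average = weighted_points / total_weight
print(f"weighted average: {weighted_average:.2f}")
```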

At a Glance
  • ClustrixDB is a more scalable alternative to Amazon Aurora for people who need extremely high transactional performance.

Pros
    • Scale-out, high-availability clustered MySQL-compatible transactional database
    • Can add or subtract servers as needed to handle load within desired latency
    • Able to handle a mix of reads and writes
    • Can scale out to very high transactional loads while maintaining low latency

Cons
    • Minor SQL and DDL syntax differences between ClustrixDB and MySQL
    • ClustrixDB lacks spatial extensions and full-text search

Copyright © 2016 IDG Communications, Inc.