Two of the developers behind the KVM and OSv projects have now released and open-sourced a direct replacement for the Apache Cassandra NoSQL database that they say is an order of magnitude faster.
ScyllaDB is meant as a substitute for Cassandra in the same way that MariaDB can be swapped in for MySQL without blinking. ScyllaDB is written in C++ as opposed to Cassandra's Java, and its creators, Avi Kivity and Dor Laor, claim its sharded architecture provides the kind of parallelism and speed-up on a single computer that was previously available only in a cluster.
Shard it up, speed it up
According to the ScyllaDB site, the database was written to take advantage of a modern multicore architecture. Each CPU core has its own dedicated database shard, complete with its own NUMA-friendly memory segment and NIC stack for maximum parallelism. All the changes are transparent to the end user, so Cassandra data and application code work as before with no modification.
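To make the shard-per-core idea concrete, here is a minimal sketch in C++ of how a key might be routed to exactly one shard that owns its data outright. It is an illustration under stated assumptions, not ScyllaDB's or Seastar's actual code: the `ShardedStore` class and its methods are hypothetical names, the hash-modulo routing is an assumed placement scheme, and the sketch runs on a single thread. In a real shard-per-core design, each shard would presumably be served by one thread pinned to one core, with its own memory and network queue.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical illustration only: none of these names are ScyllaDB or Seastar APIs.
// Each shard is a private map; no data is shared between shards, so a thread
// that owns a shard would never need a lock to read or write it.
class ShardedStore {
public:
    explicit ShardedStore(std::size_t shard_count) : shards_(shard_count) {
        assert(shard_count > 0);
    }

    // Route a key to the single shard that owns it (assumed scheme: hash modulo).
    std::size_t shard_for(const std::string& key) const {
        return std::hash<std::string>{}(key) % shards_.size();
    }

    // Write the value into the owning shard only.
    void put(const std::string& key, std::string value) {
        shards_[shard_for(key)][key] = std::move(value);
    }

    // Look the key up in its owning shard; returns nullptr if absent.
    const std::string* find(const std::string& key) const {
        const auto& shard = shards_[shard_for(key)];
        auto it = shard.find(key);
        return it == shard.end() ? nullptr : &it->second;
    }

    std::size_t shard_count() const { return shards_.size(); }
    std::size_t shard_size(std::size_t i) const { return shards_.at(i).size(); }

private:
    std::vector<std::unordered_map<std::string, std::string>> shards_;
};
```

Because every key maps to one shard and only that shard touches it, adding cores adds shards without adding contention, which is the property the sharded design relies on for parallelism.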
Kivity and Laor's work on ScyllaDB stems from their work with OSv, a Linux-compatible OS optimized for cloud workloads. For that project they created Seastar, an application framework that employs the same sharded, shared-nothing architecture, which they claimed achieved similarly linear growth in performance across multiple nodes.
Many of the big data tools from the Hadoop ecosystem are built in Java, and as a result more performance bottlenecks in applications written for the JVM are surfacing. Databricks, key developer of the in-memory processing framework Spark, is trying to address some of those limitations by doing an end run around how the JVM handles things like memory allocation and garbage collection. ScyllaDB's approach is to avoid the JVM altogether and use C++14, which supports many programming constructs also found in Java -- e.g., lambdas.
Cassandra's not sitting still, though
The upcoming version 3.0 of Cassandra will boast improvements to its feature set. But several truly major changes for improving performance will not make it into this release, giving ScyllaDB time to win over early adopters.
To gain a foothold, the company will need to nab a big-name Cassandra user -- like Netflix. ScyllaDB could also gain momentum by replacing Cassandra in application stacks. Mesosphere's Mesosphere Infinity is one example of a turnkey big data stack with Cassandra as a core component.
It's not clear that outfits that have built a business model around Cassandra will take the ScyllaDB bait. DataStax, provider of an enterprise-grade Cassandra distribution that integrates with Spark for further speed improvements, might deign to offer an alternate edition of its product with ScyllaDB at the core. But unless ScyllaDB creates a groundswell of demand, it's unlikely the company will succeed in switching development away from a widely supported and well-known project to a relative upstart.