Light a fire under Cassandra with Apache Ignite

The Apache Ignite in-memory computing platform not only boosts performance, but also adds SQL queries and ACID compliance

Apache Cassandra is a popular database for several reasons. The open source, distributed, NoSQL database has no single point of failure, so it’s well suited for high-availability applications. It supports multi-datacenter replication, allowing organizations to achieve greater resiliency by, for example, storing data across multiple Amazon Web Services availability zones. It also offers massive and linear scalability, so any number of nodes can easily be added to any Cassandra cluster in any datacenter. For these reasons, companies such as Netflix, eBay, Expedia, and several others have been using Cassandra for key parts of their businesses for many years.

Over time, however, as business requirements evolve and Cassandra deployments scale, many organizations find themselves constrained by some of Cassandra’s limitations, which in turn restrict what they can do with their data. Apache Ignite, an in-memory computing platform, provides these organizations with a new way to access and manage their Cassandra infrastructure, allowing them to make Cassandra data available to new OLTP and OLAP use cases while delivering extremely high performance.

Limitations of Cassandra

A fundamental limitation of Cassandra is that it is disk-based, not an in-memory database. This means that read performance is always capped by I/O specifications, ultimately restricting application performance and limiting the ability to attain an acceptable user experience. Consider this comparison: What can be processed on an in-memory system in a single minute would take decades on a disk-based system. Even using flash drives, it would still take months.

While Cassandra offers very fast data write performance, achieving optimal read performance requires that the Cassandra data be written to disk sequentially, so that on reads, the disk head can scan for as long as possible without the latency of the head hopping from location to location. To achieve this, the queries need to be simple, without any JOINs, GROUP BYs, or aggregation, and the data must be modeled for those queries. Hence, Cassandra offers no ad hoc SQL query capability: its query language, CQL, resembles SQL but does not support JOINs.

DataStax, a company that develops and provides support for a commercial edition of Apache Cassandra, added an ability to connect Cassandra to Apache Spark and Apache Solr to support analytics. However, this strategy provides limited benefit because using connectors is a very expensive way to access a subset of the data. The data still has to be laid down sequentially or the performance will be poor because Cassandra would need to do a full table scan, which is a scatter/gather approach involving a great deal of disk latency.

Another potentially important limitation of Cassandra is that it only supports eventual consistency. Its lack of full ACID compliance means it cannot be used for applications that move money or require real-time inventory information.

As a result of these limitations, organizations wanting to use the data they have stored in Cassandra for new business initiatives often struggle with how to do so.

Enter Apache Ignite

Apache Ignite is an in-memory computing platform that can help overcome these limitations in Cassandra while avoiding the overhead costs of the connector approach. Apache Ignite can be inserted between Apache Cassandra and an existing application layer with no changes to the Cassandra data and only minimal changes to the application. The Cassandra data is loaded into the Ignite in-memory cluster, and the application transparently accesses the data from RAM instead of from disk, accelerating performance by at least 1,000x. Data written by the application is written first to the Ignite cluster for immediate, ongoing consumption. It is then written to disk in Cassandra for permanent storage with either synchronous or asynchronous writes.
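
The read and write paths described above can be sketched in a few lines of Python. This is a minimal illustration of the general read-through, write-through, and write-behind caching pattern, not Ignite's actual API: the `DictStore` and `CacheInFront` classes and all of their methods are hypothetical names invented for this example, and a plain dictionary stands in for the Cassandra table.

```python
class DictStore:
    """Hypothetical stand-in for a Cassandra table: a dict that counts reads."""

    def __init__(self, rows=None):
        self.rows = dict(rows or {})
        self.reads = 0

    def load(self, key):
        self.reads += 1
        return self.rows.get(key)

    def write(self, key, value):
        self.rows[key] = value


class CacheInFront:
    """Hypothetical in-memory layer between the application and the store."""

    def __init__(self, store, write_behind=False):
        self.store = store
        self.write_behind = write_behind
        self.memory = {}    # data served from RAM
        self.pending = {}   # writes not yet persisted (write-behind only)

    def get(self, key):
        # Read-through: go to the store only on a cache miss.
        if key not in self.memory:
            self.memory[key] = self.store.load(key)
        return self.memory[key]

    def put(self, key, value):
        # The write lands in memory first, so readers see it immediately.
        self.memory[key] = value
        if self.write_behind:
            self.pending[key] = value       # asynchronous: persist later
        else:
            self.store.write(key, value)    # synchronous: persist now

    def flush(self):
        # Persist all queued writes in one batch.
        for key, value in self.pending.items():
            self.store.write(key, value)
        self.pending.clear()


store = DictStore({"user:1": "Ada"})
cache = CacheInFront(store, write_behind=True)
cache.get("user:1")
cache.get("user:1")              # second read is served from memory
cache.put("user:2", "Grace")     # visible in the cache, not yet in the store
visible_before_flush = "user:2" in store.rows
cache.flush()                    # now the write reaches the store
```

With `write_behind=False`, every `put` reaches the store before returning, which corresponds to the synchronous option. With `write_behind=True`, writes are queued and persisted on `flush`, trading a window of unpersisted data for lower write latency.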

Apache Ignite also has the same write strategy as Apache Cassandra, so it will feel familiar to Cassandra users. Like Cassandra, Ignite is open source and its users benefit from a large and active community, with support available through a number of community websites. As an in-memory computing platform, however, Apache Ignite enables organizations to do much more with their Cassandra data—and do it faster. Here’s how.

  • More data options—ANSI SQL-99 and ACID transaction guarantees

    Powered by an ANSI SQL-99-compliant engine, Apache Ignite offers ACID transaction guarantees for distributed transactions. Its In-Memory SQL Grid provides in-memory database capabilities, and ODBC and JDBC APIs are included. By combining Ignite with Apache Cassandra, any type of OLAP or complex SQL query can be written against Cassandra data that has been loaded into Ignite. Ignite can also be operated in multiple modes from eventual consistency to real-time, full ACID compliance, allowing organizations to use the data stored in Cassandra (but read into Ignite) for a host of new applications and initiatives.
  • No remodeling of Cassandra data

    Apache Ignite reads from Apache Cassandra and other NoSQL databases, so moving Cassandra data into Ignite requires no data modification. The data schema can also be migrated directly into Ignite as is.
  • Greater speed for data-intensive applications

    Moving all of the Apache Cassandra data into RAM offers the fastest possible performance and greatly improves query speed because the data is not constantly being read from and written to disk. It is also possible to use Apache Ignite to cache only the active portion of the Cassandra data to achieve a significant speed boost. Ignite’s indexes also reside in memory, making it possible to perform ultrafast SQL queries on the Cassandra data that has been moved into Ignite.
  • Simple horizontal and vertical scaling

    Like Apache Cassandra, Apache Ignite easily scales horizontally by adding nodes to the Ignite cluster. The new nodes instantly provide additional memory for caching Cassandra data. However, Ignite also easily scales vertically. Ignite can utilize all of the memory on a node, not only the JVM memory, and objects can be defined to live on or off heap and use all the memory on the machines. This way, simply increasing the amount of memory on each node automatically scales the Ignite cluster vertically.
  • Increased availability

    Like Apache Cassandra, the peer-to-peer Apache Ignite computing platform is always available. The failure of a node does not prevent applications from writing to and reading from defined backup nodes. Data redistribution is also automatic as an Ignite cluster grows. Because Ignite offers sophisticated clustering support, such as detecting and remediating split brain conditions, the combined Cassandra/Ignite system is more available than a standalone Cassandra system.
  • Simpler and faster than Hadoop

    Many organizations that would like to run SQL queries against their Apache Cassandra data consider loading the data into Hadoop. The downside of this approach is that, after solving the ETL and data syncing challenges that arise, the queries into Hadoop would still be relatively slow. While combining Cassandra and Ignite will also result in some small performance hit because of the additional system and caching, queries nevertheless execute with blazing speed, making the solution perfect for real-time analytics. And managing the relationship between Ignite and Cassandra data is much simpler.
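
To make the first item in the list above concrete, here is a rough sketch of the kind of ad hoc query that becomes possible once the data sits in an in-memory SQL engine. It does not use Ignite: Python's built-in `sqlite3` module with a `:memory:` database stands in for the engine, and the `customers` and `orders` tables and their rows are hypothetical sample data, shaped as they might come out of two Cassandra tables.

```python
import sqlite3

# Hypothetical rows, shaped as they might come out of two Cassandra tables.
customers = [(1, "Ada", "UK"), (2, "Grace", "US"), (3, "Linus", "US")]
orders = [(10, 1, 25.0), (11, 2, 40.0), (12, 2, 15.0), (13, 3, 30.0)]

# sqlite3's ":memory:" database stands in for the in-memory SQL engine.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
db.executemany("INSERT INTO customers VALUES (?, ?, ?)", customers)
db.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

# An ad hoc query with a JOIN, a GROUP BY, and an aggregate.
revenue_by_country = db.execute(
    """
    SELECT c.country, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.country
    ORDER BY c.country
    """
).fetchall()
db.close()
```

The query was not anticipated when the two tables were modeled, which is the point: no per-query remodeling of the stored data is needed to answer it.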

Challenges to implementing Cassandra and Ignite

As noted above, combining Apache Cassandra and Apache Ignite does involve costs. Running two networks naturally adds performance overhead, along with extra cost and maintenance, as would adding any other solution. There is a hardware cost for new commodity servers and sufficient RAM, and perhaps a subscription cost for an enterprise-grade and supported version of Apache Ignite. Further, implementing and maintaining Ignite may require some organizations to hire additional expertise. As a result, a cost/benefit analysis is warranted to ensure that the strategic benefits of any new use case, along with the performance gains, outweigh the costs.

In making this determination, the following considerations are important. First, unlike the previous generation of in-memory computing solutions, which required cobbling together multiple products, Apache Ignite is a fully integrated, easy-to-deploy solution. Integrating Ignite with Apache Cassandra is typically a very straightforward process. Ignite slides between Cassandra and an application, such as Apache Kafka or another client, that accesses the data. Ignite includes a prebuilt Cassandra connector, which simplifies the process. The application then reads from and writes to Ignite instead of Cassandra, so it is always accessing data from memory instead of from disk. Ignite automatically handles the reads from and writes to Cassandra.

Second, while many still think of in-memory computing as prohibitively expensive, the cost of RAM has dropped approximately 30 percent per year since the 1960s. Although RAM is still more expensive than SSDs gigabyte for gigabyte, the performance benefit of utilizing terabytes of RAM in an in-memory computing cluster, especially for large-scale, mission-critical applications, may make in-memory computing the most cost-effective approach.
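
A quick calculation shows how fast a decline of that size compounds. The sketch below assumes a constant 30 percent drop every year, which simplifies the "approximately 30 percent" figure above. The `relative_cost` function is a hypothetical helper written for this example.

```python
def relative_cost(years, annual_decline=0.30):
    """Fraction of the starting price left after `years` of steady decline."""
    return (1.0 - annual_decline) ** years


after_5 = relative_cost(5)     # 0.7 ** 5
after_10 = relative_cost(10)   # 0.7 ** 10
```

Under that assumption, the same amount of RAM costs roughly 17 percent of its starting price after five years and under 3 percent after ten.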

Finally, Apache Ignite is a safe bet with a mature codebase. It originated as a private project in 2007, was donated to the Apache Software Foundation in 2014, and graduated to a top-level project about a year later—the second-fastest Apache project to graduate after Apache Spark.

Apache Cassandra is a solid, proven solution that can be a vital element of many data strategies. With Apache Ignite, Cassandra data can be made more useful. The Apache Ignite in-memory computing platform is an affordable and effective solution to make Cassandra data available for new OLTP and OLAP use cases while meeting the extreme performance demands of today’s web-scale applications. The combined solution maintains the high availability and horizontal scalability of Cassandra, while adding ANSI SQL-99-compliant query capabilities, vertical scalability, more robust consistency with ACID transaction guarantees, and more, all while delivering performance that is 1,000x faster than disk-based approaches.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.