With Hadoop HBase, Splice Machine breathes new life into old RDBMS

Wait, don't we already have SQL databases for that? Splice Machine mashes up the scalability of Hadoop with the transactionality of an RDBMS

Splice Machine advertises itself as "the only Hadoop RDBMS." The idea is to give you a transactionally correct database that has the underlying scalability features of HBase. According to its creators, Splice Machine behaves like a normal SQL RDBMS.

Splice Machine is constructed as a set of plug-ins to HBase, Hadoop's column-family database, using the coprocessor extension API, a modified version of Apache Derby, and some custom proprietary code. Splice Machine is also distribution-agnostic and can install on pure Apache, Hortonworks, or even Cloudera flavors of Hadoop.


Why bend Hadoop into an RDBMS?

When I interviewed Splice Machine CEO and co-founder Monte Zweben, it became clear that the folks at Splice Machine are likeable, smart people and this is not their first rodeo (or startup, in this case). Nonetheless, I wasn't able to nail down a super-clear use case.

Monte is a polished interviewee, which means it's difficult to get him off his talking points. When I asked about use cases, he gave me a few (mainly generic ones from Splice Machine's whitepaper), but I never experienced an aha moment that left me thinking "oh, a better RDBMS" or "I see, Hadoop for people who don't want Hadoop." Then again, Splice Machine is very early stage, as is much of the NewSQL space where Splice Machine resides.

At the moment, Splice Machine touts itself as both operational and analytical, which may repeat the RDBMS mistakes of the past. Few technologies have defined the industry or sold as well as the almighty RDBMS, but few have been so widely misused either.

Supposedly, the RDBMS is both operational and analytical, but it failed to scale (affordably, simply, reasonably) in the age of everything being on the Web. It also failed to scale (affordably, simply, reasonably) to big data size for analytics. It also failed to ever make anyone happy with so-called real-time capabilities (when businesspeople say "real time," they mean faster than overnight, weekly, or monthly batches).

A new implementation of the basic underlying principles of an RDBMS can solve one of these really well, but I'm doubtful it will be a great solution on all three fronts. It's possible it doesn't have to be. It's possible it can be OK and familiar and win the business, even if it loses in a head-to-head benchmark.

In the long run, Splice Machine's alternative architecture could hit Teradata and MPP systems harder than, say, operational datastores like MongoDB or Cassandra. As tools like Spark and Kafka mature, it is hard to see the real-time niche belonging to a database (Splice Machine) built on a database (HBase) built on a distributed filesystem (HDFS).

Inside the machine

The basic idea of a better RDBMS built on HBase is that it will use HBase and the underlying HDFS filesystem to scale while providing you with the transactional correctness you love. There is some math and magic here; "joining" data efficiently in a distributed system is not simple.

Doing transactions without imposing too heavy a locking burden is hard at scale. You can say "shared nothing," but you can't tell the truth the whole time. Doing both at once is a big challenge. You may then need to scale up (more memory) at the node level to do this well, which is a bit counterintuitive if you're running a pure HBase/Hadoop system. Regardless, I have seen 1,000-node systems with 8GB of memory per node perform well and, obviously, scale brilliantly.
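The standard way databases sidestep heavy locking is multiversion concurrency control (MVCC): readers work against a stable snapshot while writers append new versions, so neither blocks the other. The sketch below illustrates the general technique only; the class and method names are hypothetical, and this is not Splice Machine's actual implementation.

```python
# Minimal MVCC snapshot-read sketch -- the general lock-avoidance technique,
# NOT Splice Machine's code. Names here are illustrative only.
class VersionedStore:
    def __init__(self):
        self.data = {}   # key -> list of (commit_ts, value) versions
        self.clock = 0   # logical timestamp counter

    def begin(self):
        """Start a transaction; it sees only versions committed before now."""
        self.clock += 1
        return self.clock

    def write(self, key, value):
        """Commit a new version with a fresh timestamp; readers are never blocked."""
        self.clock += 1
        self.data.setdefault(key, []).append((self.clock, value))

    def read(self, key, snapshot_ts):
        """Return the newest version visible to the snapshot, ignoring later writes."""
        visible = [(ts, v) for ts, v in self.data.get(key, []) if ts <= snapshot_ts]
        return max(visible)[1] if visible else None

store = VersionedStore()
store.write("balance", 100)
snap = store.begin()                 # reader takes a snapshot
store.write("balance", 250)          # a concurrent writer commits afterward
print(store.read("balance", snap))   # -> 100: the reader's view stays stable
```

The appeal of layering this on HBase is that HBase already versions cells by timestamp, so the storage engine does much of the bookkeeping for you.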

What Monte described to me was a combination of memory use and compression of the stored data in the underlying HBase structure, along with a parallelized, distributed join algorithm. This is also how Splice Machine coaxes a column-family database, HBase, into storing the contents of a relational database. The catch is that you cannot query it with your HBase toolset as is -- but Splice Machine provides plug-ins into Hadoop that let you work with its data using the Hadoop toolset.
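To see why distributing a join is tractable at all, consider the classic partitioned hash join: shard both tables by a hash of the join key, and rows that can match always land in the same shard, so each shard pair can be joined independently (and, in a real system, on a separate node). This is a conceptual sketch of that general algorithm, not Splice Machine's proprietary implementation.

```python
# Sketch of a partitioned (distributed) hash join -- the general idea behind
# parallelizing joins across regions, not Splice Machine's actual algorithm.
from collections import defaultdict

def partition(rows, key, n):
    """Shard rows by hash of the join key, as a distributed planner would."""
    shards = [[] for _ in range(n)]
    for row in rows:
        shards[hash(row[key]) % n].append(row)
    return shards

def local_hash_join(left, right, key):
    """Join one pair of co-located shards with an in-memory hash table."""
    index = defaultdict(list)
    for row in left:
        index[row[key]].append(row)
    return [{**l, **r} for r in right for l in index[r[key]]]

def distributed_join(left, right, key, n=4):
    """Matching keys hash to the same shard, so shard pairs join independently."""
    left_shards, right_shards = partition(left, key, n), partition(right, key, n)
    out = []
    for ls, rs in zip(left_shards, right_shards):  # each pair could run on its own node
        out.extend(local_hash_join(ls, rs, key))
    return out

users = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Alan"}]
orders = [{"id": 1, "total": 30}, {"id": 1, "total": 12}]
print(distributed_join(users, orders, "id"))
```

The per-shard hash table is where the node-level memory pressure mentioned above comes from: the bigger the build side of the join, the more RAM each node needs to hold its slice.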

Does a better RDBMS have a place in the future?

The main question I have about Splice Machine is whether it's a stop-gap solution or one with staying power. Just because people are comfortable with and invested in the RDBMS paradigm for now, will they continue to interact with data in this way in the future? Even if they don't, when Monte says the RDBMS isn't going away anytime soon, you have to agree with him.

Splice Machine could be your add-on engine, albeit a proprietary one. A reluctant boss deciding whether he can handle Hadoop might bite if he can have a one-to-one translation from your RDBMS. No matter how well Hadoop and its ecosystem may fit your needs, a migration project away from RDBMS technologies is expensive.

To use a strained car analogy, the future may or may not belong to Tesla, but the present belongs to the Prius. Splice Machine may have hit on a viable hybrid.

This article, "With Hadoop HBase, Splice Machine breathes new life into old RDBMS," was originally published at InfoWorld.com. Keep up on the latest news in application development and read more of Andrew Oliver's Strategic Developer blog at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.

Copyright © 2014 IDG Communications, Inc.