A better mousetrap: A JSON data warehouse takes on Hadoop

Sure, a NoSQL or JSON data warehouse sounds faddish, but SonarW is a better solution for many

Using Teradata sucks -- there, I said it. It's painful to convene the data committee just to add a column to the data warehouse, then do a cost/benefit analysis for every meager gigabyte of data you want to store because the thing costs more than my house.

On the other hand, what are your alternatives? I don’t mean Netezza or Exadata, which are different flavors of the same pain. I mean something different: In the brave new world of Hadoop, you have Hive, which is slow, and Impala, which doesn't scale well. Much of the time, neither is a viable data warehouse replacement.

However, SonarW could be. It promises to keep the “shall we add this column” committee at bay. It comes from JSonar, the company that MongoDB users know for JSONStudio. It is built from the ground up on JSON and is compatible with MongoDB; anything that talks to MongoDB talks to SonarW.

But there's even more to SonarW. Like Hive or Impala, SonarW can use HDFS, the Hadoop distributed file system, to scale. And SonarW should perform far better than Hive and Impala.

In architecture and speed, SonarW resembles massively parallel processing (MPP) data warehouses. In the demo I attended, it ran fast on one machine, and the company claims it runs even better on many more. That matters because of how Hadoop schedules work: If you’ve had any experience with Hadoop, you know the overhead of scheduling a job across the cluster is a pain point. Fetching 200 rows from one table takes far too long, and many workloads, even in a big data project, are not that large, especially while you're setting up the major part of the job.

In other words, Hadoop always tries to maximize resource utilization. But sometimes you need to go grab something real quick and you don’t need 100 nodes to do it.

SonarW can of course connect to standard business-intelligence tools, but you’ll lose some of the advantages of MongoDB’s aggregation framework and pipelining. At the same time, people who use data warehouses are typically not familiar with JSON tools and MongoDB’s aggregation framework.
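
For warehouse people who have never seen MongoDB's aggregation framework, here is a minimal sketch of what a pipeline looks like, written in Python with the pymongo driver. The database, collection, and field names (warehouse, orders, year, region, amount) are invented for illustration, and the idea that you could point this at SonarW rests on the company's claim of MongoDB compatibility; I have not run it against the product.

```python
# Illustrative sketch only: the database, collection, and field names
# below are hypothetical, not taken from SonarW or jSonar documentation.

def build_revenue_pipeline(min_year):
    """Return an aggregation pipeline: filter, then group, then sort."""
    return [
        # Stage 1: keep only orders from min_year onward (like WHERE).
        {"$match": {"year": {"$gte": min_year}}},
        # Stage 2: total the amount per region (like GROUP BY + SUM).
        {"$group": {"_id": "$region", "revenue": {"$sum": "$amount"}}},
        # Stage 3: biggest regions first (like ORDER BY ... DESC).
        {"$sort": {"revenue": -1}},
    ]


def run_pipeline(uri, min_year):
    """Run the pipeline against a MongoDB-compatible endpoint at `uri`."""
    # pymongo is a third-party driver; it is imported here so the
    # pipeline builder above can be used without it installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    orders = client["warehouse"]["orders"]  # assumed names
    return list(orders.aggregate(build_revenue_pipeline(min_year)))
```

Each stage feeds its output to the next, which is the "pipelining" part. The rough SQL equivalent would be SELECT region, SUM(amount) AS revenue FROM orders WHERE year >= 2014 GROUP BY region ORDER BY revenue DESC, which is the kind of query a BI tool would send instead.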

That gap between the data warehouse world and the MongoDB/JSON world is the key challenge for SonarW. The company's answer to this challenge is SQL compatibility via a plug-in to MariaDB’s MaxScale. That lets you connect all your favorite SQL tools that work with MySQL or MariaDB (which includes anything ODBC or JDBC) to SonarW.

SonarW is hardly the only provider looking to bridge data warehousing and Hadoop via SQL or OLAP; AtScale is another. My inbox is full of such announcements, with many claiming to be the first to do so (they are not).

Even the Mongo analytics field is starting to be a thing. Which raises the question: Are there enough paying MongoDB customers who will use it for analytics to support SonarW’s offering?

What could work to SonarW's advantage is its simplicity and lower cost (starting at $15,000 per terabyte) compared to traditional data warehouses and MPP systems. That might motivate even non-MongoDB-oriented companies to at least kick the tires.

However, I suspect that those who are on Teradata are stuck on Teradata. Moving from an entrenched technology means retraining staff and paying for migration -- it's usually easier to keep paying your dealer than go to rehab.

Even so, maybe there is room for a rebel base in these organizations, an alliance between the NoSQL team and analysts who are willing to learn something new and managers who don’t want to blow their budget on a few more bytes in Teradata, Netezza, or Exadata.