Most companies realize they need to become more data driven in order to make better decisions and identify new opportunities. Many also recognize the need for new tools to analyze their data, much of it stored in operational systems.
At the same time, for their operational systems, a growing number of companies have adopted NoSQL databases, the most popular of which is the document database MongoDB. Unfortunately, document databases are nobody’s first choice for analytics, so people end up using ETL to move data from MongoDB to an RDBMS or Hadoop for analysis. ETL processing adds latency, however -- perhaps too much latency if you want your business to be "data driven."
Now a new open source analytics tool, SlamData, has arrived to operate directly on MongoDB data. I spoke with Jeff Carr, the CEO of SlamData, about his new offering. He contrasts his open source solution with that of Pentaho, a traditional BI tool that supports MongoDB -- but does so by transforming the document database into an RDBMS. According to Carr, "Pentaho is built for relational data to make the data look like tables. That is not an easy thing to do."
I asked Carr about his target market. At the moment, because the tool is still in its early stages, SlamData's user base mostly consists of developers. As the tool matures, he hopes for adoption by business analysts and/or nondevelopers who at least know SQL.
The latest version of SlamData allows SQL-fluent users to gather results based on queries of MongoDB collections of documents, which you manage through a GUI that uses a simple notebook metaphor. The front end is browser-based, so there's no annoying client-side install. Already, in the unreleased GitHub version, SlamData has added charting features to the mix.
In order to deal with the difference between documents and tables, SlamData extends SQL with an XPath-like notation. Rather than querying from a table name (or collection name), you might query FROM person[*].address[*].city. This should represent a short learning curve for SQL-loving data analysts or power business users, while being inconsequential for developers.
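To make the path notation concrete, here is a minimal Python sketch of how a path such as person[*].address[*].city could be read against nested documents. It is an illustration only, not SlamData's implementation: the eval_path helper, the sample data, and the assumed semantics ([*] fans out over every array element, a dotted name descends into a field, and documents missing a field are skipped) are all hypothetical.

```python
import re


def eval_path(root, path):
    """Evaluate a path like 'person[*].address[*].city' against nested dicts and lists."""
    # Break the path into steps: field names and [*] wildcards (dots are separators).
    steps = re.findall(r"\[\*\]|[^.\[\]]+", path)
    current = [root]
    for step in steps:
        following = []
        for value in current:
            if step == "[*]":
                # Wildcard: fan out over every element of an array.
                if isinstance(value, list):
                    following.extend(value)
            elif isinstance(value, dict) and step in value:
                # Field access: descend into the key; values without it are dropped.
                following.append(value[step])
        current = following
    return current


# Hypothetical sample data: one "person" collection with nested address arrays.
database = {
    "person": [
        {"name": "Ada", "address": [{"city": "London"}, {"city": "Turin"}]},
        {"name": "Grace", "address": [{"city": "Arlington"}]},
        {"name": "Linus"},  # no address field at all
    ]
}

cities = eval_path(database, "person[*].address[*].city")
print(cities)  # ['London', 'Turin', 'Arlington']
```

The point of the sketch is that the nested arrays are flattened at query time, so no separate address table has to exist before an analyst can ask for cities.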
The power of SlamData resides in its back-end SlamEngine, which implements a multidimensional relational algorithm and deals with the data without reformatting the infrastructure. The JVM (Scala) back end supplies a REST interface, which allows developers to access SlamData’s algorithm for their own uses.
There’s overlap between the back end of the project and Apache Drill. According to Carr, Drill is Hadoop-based and has only fledgling support for MongoDB. He also stated that it has scant commercial support (only MapR) and is “not very active if you look at the commit logs.” (I looked at the commit logs and Drill seems pretty active to me: 17 contributors made more than 100 commits last month.)
Both the SlamData front end and SlamEngine are on GitHub and offered under the GNU Affero GPL. For now, this is all free. The company plans to pursue a hybrid model, along the lines of free Wi-Fi at a coffee shop, and sell “enterprise grade features” like LDAP integration. The company will also sell support for both the open source and proprietary versions. New releases of the open source version will add charting and support for other JSON-speaking NoSQL databases such as Cassandra.
Clearly, there’s room for NoSQL-specific analytics tools. Consider the effort of getting MongoDB data into Hadoop with the Mongo connector or into Hive in order to query it with a JDBC/SQL-speaking BI tool. There’s all that ETL involved, mapping from documents to tables. With the likes of SlamData, you could turn your analysts loose on the production database (bad idea) or create a replica -- which, provided the infrastructure is available, is nearly a push-button affair in MongoDB’s management tool.
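For a sense of what "mapping from documents to tables" means in practice, here is a small Python sketch of the transform step in such an ETL job. It is a simplified illustration under stated assumptions: the sample documents, the flatten helper, and the two-table person/address schema are all hypothetical, and an in-memory SQLite database stands in for the target RDBMS.

```python
import sqlite3

# Hypothetical documents as they might come out of a document database.
documents = [
    {"_id": 1, "name": "Ada", "address": [{"city": "London"}, {"city": "Turin"}]},
    {"_id": 2, "name": "Grace", "address": [{"city": "Arlington"}]},
]


def flatten(docs):
    """Split each nested document into parent rows and child rows."""
    person_rows, address_rows = [], []
    for doc in docs:
        person_rows.append((doc["_id"], doc["name"]))
        # Each element of the nested array becomes a row that points at its parent.
        for addr in doc.get("address", []):
            address_rows.append((doc["_id"], addr["city"]))
    return person_rows, address_rows


person_rows, address_rows = flatten(documents)

# Load step: an in-memory SQLite database stands in for the analytics RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE address (person_id INTEGER REFERENCES person(id), city TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)", person_rows)
conn.executemany("INSERT INTO address VALUES (?, ?)", address_rows)

# Only now can a SQL-speaking tool ask the question, and it needs a join to do it.
result = conn.execute(
    "SELECT p.name, a.city FROM person p "
    "JOIN address a ON a.person_id = p.id ORDER BY p.id, a.city"
).fetchall()
print(result)  # [('Ada', 'London'), ('Ada', 'Turin'), ('Grace', 'Arlington')]
```

Even in this toy case, the schema has to be designed up front and the job rerun whenever the documents change, which is where the extra latency comes from.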
SlamData is developing rapidly. It’s probably not ready for analysts to use, but it might be an interesting tool for developers who want to show what can be done with NoSQL data. It’s also further evidence of the new maturity of the NoSQL space, particularly MongoDB. As Carr puts it: “Data guys have done a great job of innovating, but the analytics people are behind.” Maybe SlamData will evolve enough to be a first step in catching up.