Clementine 8.1 melds BA with BI foundation

Data mining platform uses workflow diagrams, graphical interfaces to streamline analysis

Innovation, such as that required to create and deploy BA (business analytics) solutions, is generally an easier process for smaller, focused development groups. So I’m seriously impressed by what SPSS has been able to accomplish in the BA tool area with the newest version of its data mining workbench, Clementine 8.1.

Given SPSS’ role in the market, I expected a more pro-forma approach — the two behemoths of statistical analysis, SPSS and the SAS Institute, dominate the user base for sophisticated BI and data mining applications. I was pleasantly surprised by the attention SPSS paid to both usability and breadth of features, aspects that big companies with large installed bases tend to cut corners on.

Clementine 8.1 has a sensible design and an eminently practical user interface. The BA features neither degrade what’s already there nor disappear into the massive capabilities that anchor the Clementine data mining product family.

The underlying workbench design uses a graphical representation of the analyst’s own process workflow. The data mining workflow requires formulating the right cluster of questions to ask, identifying a subset of data from the warehouse or mart that addresses those questions, cleaning and restructuring the data, loading it, running it iteratively until you have a predictive model, and then saving the work for reuse.

Clementine supports all of this work except the purely human-expertise task of creating the right set of questions. That makes the goal of the data mining client — attacking large stores of collected data and pulling out meaningful relationships that hint at or even sometimes scream out actions to take — easier to achieve. For shops already committed to SPSS infrastructure, choosing Clementine is a no-brainer; for those with mixed platforms, Clementine’s virtues make it a very strong choice.

Going graphical

Clementine’s tabbed tool palettes group related steps of the workflow, in sequence, as “nodes.” An analyst drags these nodes to the work window and connects them in a structured, graphical sequence to create workflows that SPSS calls “streams”; multiple related streams form a project. Clementine keeps these work products organized in tabbed areas that store and display them. Users may also draw from previously created work modules.

In its tersest expression, a stream need consist only of a data source node, a process node, and some deliverable, either a model or a graphical output. In reality, analysts will export the models and procedures to one of the many output formats Clementine supports, including SPSS, SAS, and SQL. And they’ll use the tools to put a significant slab of the data preparation back into the database so the work needn’t be re-executed in future data mining.
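To make the minimal stream concrete, here is a rough sketch in Python of a source node feeding a process node feeding an output. It is an illustration of the idea only, not Clementine’s interface: the `Stream` class, the node functions, and the sample records are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Stream:
    """Hypothetical stand-in for a stream: source -> process nodes -> output."""
    source: Callable[[], List[Record]]
    processes: List[Callable[[List[Record]], List[Record]]] = field(default_factory=list)

    def run(self, output: Callable[[List[Record]], object]) -> object:
        records = self.source()           # data source node
        for process in self.processes:    # process nodes, applied in order
            records = process(records)
        return output(records)            # deliverable: a table, graph, or model


def customer_source() -> List[Record]:
    # Made-up sample data standing in for a warehouse or mart extract.
    return [
        {"region": "east", "spend": 120.0},
        {"region": "east", "spend": 80.0},
        {"region": "west", "spend": 150.0},
        {"region": "east", "spend": 300.0},
    ]


def select_big_spenders(records: List[Record]) -> List[Record]:
    # Process node: keep only customers who spent more than 100.
    return [r for r in records if r["spend"] > 100]


def count_by_region(records: List[Record]) -> Dict[str, int]:
    # Output node: a simple tabulation of the surviving records.
    counts: Dict[str, int] = {}
    for r in records:
        counts[r["region"]] = counts.get(r["region"], 0) + 1
    return counts


result = Stream(customer_source, [select_big_spenders]).run(count_by_region)
print(result)
```

Keeping each node a plain function is what makes the pieces easy to reuse in other streams, which is the property the workbench’s palettes are built around.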

This workflow diagram model is eminently practical because it follows the standard professional analyst’s structure, and because the analysts trained for these positions tend to have mastered this form of structured thinking. This makes Clementine’s face to the user a gloriously productive one. The tabbed palettes of nodes are organized in a way that dedicated analytical pros will “get” instantly, and those who do a range of work, including analysis, will pick it up quickly.

The tabbed organization of streams, outputs, and trained models also makes it very convenient to reuse them in other projects or export them to C code or to PMML (Predictive Model Markup Language), an XML-based language for defining and sharing predictive models between compliant vendors’ applications.

Clementine’s work structure is supported, albeit unevenly, by real-time error messages. When you lay down nodes on the work area, the client won’t let you connect nodes that can’t logically follow one another in a sequence, and it displays an error message to alert you.
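The connect-time check can be pictured as a lookup against a table of permitted links. The sketch below is a guess at the general shape of such a rule, not a description of how Clementine implements it; the node kinds, the `ALLOWED_LINKS` table, and the `InvalidConnection` error are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


class InvalidConnection(Exception):
    """Raised when two nodes cannot logically follow one another."""


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # assumed kinds: "source", "process", or "output"


# Hypothetical rule set: data flows away from sources and ends at outputs.
ALLOWED_LINKS = {
    ("source", "process"),
    ("source", "output"),
    ("process", "process"),
    ("process", "output"),
}


def connect(upstream: Node, downstream: Node, links: List[Tuple[str, str]]) -> None:
    """Record the link, or refuse it immediately with a specific message."""
    if (upstream.kind, downstream.kind) not in ALLOWED_LINKS:
        raise InvalidConnection(
            f"cannot connect {upstream.kind} node '{upstream.name}' "
            f"to {downstream.kind} node '{downstream.name}'"
        )
    links.append((upstream.name, downstream.name))


warehouse = Node("warehouse", "source")
filter_node = Node("filter", "process")
table = Node("table", "output")

links: List[Tuple[str, str]] = []
connect(warehouse, filter_node, links)
connect(filter_node, table, links)

rejected = None
try:
    connect(table, warehouse, links)  # an output cannot feed a source
except InvalidConnection as err:
    rejected = str(err)

print(links)
print(rejected)
```

Rejecting the link at the moment it is drawn, with a message that names both nodes, is the kind of feedback that works well here.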

On the other hand, some of the error messages you get at run time in the thorough event log entries will alert you that there was a failure, but not specify it closely enough to remind you of what you did incorrectly. For that, you have to go to Clementine’s documentation, a beautifully executed manual and deep, linked, on-line help with a search function and indexing.

For all its elegance, I’m relieved SPSS hasn’t claimed in its marketing or positioning documents that this software can be an equally powerful tool for non-dedicated staff. It won’t be: The documentation is comprehensive and factual but doesn’t presume to teach more than the minimum about the craft and statistical tests and models of this platform. The ideal user for this software is still the staffer whose job is dedicated to analytics and statistics.

No small commitment

On the BA side, SPSS made it easier to trigger iterative efforts by providing more visual muscle to models with graphical cross-tabs and better visualization of cluster graphics. A data audit node and reclassification capabilities support quicker data retuning, which in turn supports more exhaustive, iterative engagement with the analysis. A new utility, Cleo, also deploys models to the Web for viewing and interaction.

The breadth of the Clementine platform offering makes it a big commitment. The product’s solid integration with external data sources and its $75,000 entry price make it most appropriate for dedicated analysis groups that will make use of and master the full platform.

Clementine is a mature platform, but it is still expanding its capabilities and moving more surely into newer techniques such as BA. Its user base is drawing third-party products, such as Kxen’s Analytic Framework, that add even more tools to the kit. Clementine’s connections to enterprise data sources and development tools make it a leading platform for supporting smart decisions in an economy that offers no additional margin for hiring or slack.

InfoWorld Scorecard: Clementine 8.1

Criterion          Weight    Score
Value              10.0%     7.0
Reporting          20.0%     6.0
Suitability        20.0%     10.0
Scalability        10.0%     8.0
Ease of use        20.0%     9.0
Interoperability   20.0%     9.0
Overall Score      100%      8.3
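
The overall score appears to be a weighted average of the six category scores, using the percentages listed as weights. That is an assumption about how the scorecard is tallied, but the arithmetic bears it out, as this short Python check shows (the dictionary layout is just a convenience for the example):

```python
# (weight, score) pairs taken from the scorecard above.
scorecard = {
    "Value": (0.10, 7.0),
    "Reporting": (0.20, 6.0),
    "Suitability": (0.20, 10.0),
    "Scalability": (0.10, 8.0),
    "Ease of use": (0.20, 9.0),
    "Interoperability": (0.20, 9.0),
}

total_weight = sum(weight for weight, _ in scorecard.values())
overall = sum(weight * score for weight, score in scorecard.values())

print(round(total_weight, 2))  # weights should cover 100% of the score
print(round(overall, 1))       # 0.7 + 1.2 + 2.0 + 0.8 + 1.8 + 1.8
```

Rounded to one decimal place, the result matches the published 8.3.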