May 07, 2004

Clementine 8.1 melds BA with BI foundation

Data mining platform uses workflow diagrams, graphical interfaces to streamline analysis

Innovation, such as that required to create and deploy BA (business analytics) solutions, is generally an easier process for smaller, focused development groups. So I’m seriously impressed by what SPSS has been able to accomplish in the BA tool area with the newest version of its data mining workbench, Clementine 8.1.

Given SPSS’ role in the market, I expected a more pro-forma approach — the two behemoths of statistical analysis, SPSS and the SAS Institute, dominate the user base for sophisticated BI and data mining applications. I was pleasantly surprised by the attention SPSS paid to both usability and breadth of features, aspects that big companies with large installed bases tend to cut corners on.

Clementine 8.1 has a sensible design and an eminently practical user interface. The BA features neither degrade what’s already there nor disappear into the massive capabilities that anchor the Clementine data mining product family.

The underlying workbench design uses a graphical representation of the analyst’s own process workflow. The data mining workflow requires formulating the right cluster of questions to ask, identifying a subset of data from the warehouse or mart that addresses the questions, cleaning and restructuring the data, loading it, running the analysis iteratively until you have a predictive model, and then saving the work for reuse.

Clementine supports all of this work except the purely human-expertise task of creating the right set of questions. That makes the goal of the data mining client — attacking large stores of collected data and pulling out meaningful relationships that hint at or even sometimes scream out actions to take — easier to achieve. For shops already committed to SPSS infrastructure, choosing Clementine is a no-brainer; for those with mixed platforms, Clementine’s virtues make it a very strong choice.

Going graphical

Clementine’s tabbed tool palettes collect related steps of the workflow process in sequence, grouping them into “nodes.” An analyst-user drags these nodes to the work window, connecting them in a structured, graphical sequence to create workflows that SPSS calls “streams”; multiple, related streams form a project. Clementine maintains a logical structure to manage these work products, with tabbed areas to store and display them. Users may also draw from previously created work modules.

In its tersest expression, a stream need consist only of a data source node, a process node, and some deliverable, either a model or a graphical output. In reality, analysts will export the models and procedures to one of the many output formats Clementine supports, including SPSS, SAS, and SQL. And they’ll use the tools to put a significant slab of the data preparation back into the database so the work needn’t be re-executed in future data mining.
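To make the minimal stream concrete, here is a rough sketch in Python of the same shape: one source node, one process node, and one deliverable. It is an illustration only, not Clementine's interface or scripting language; the `Stream` class, the node functions, and the sample customer records are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


@dataclass
class Stream:
    """Toy stand-in for a stream: source node -> process nodes -> output node."""
    source: Callable[[], Iterable[Record]]
    output: Callable[[List[Record]], Any]
    processes: List[Callable[[Iterable[Record]], Iterable[Record]]] = field(
        default_factory=list
    )

    def run(self) -> Any:
        # Pull records from the source, pass them through each process node
        # in order, then hand the result to the output node.
        records = self.source()
        for node in self.processes:
            records = node(records)
        return self.output(list(records))


def customer_source() -> List[Record]:
    # Hypothetical data source node: a tiny in-memory table.
    return [
        {"customer": "a", "spend": 120.0},
        {"customer": "b", "spend": None},
        {"customer": "c", "spend": 80.0},
    ]


def drop_missing_spend(records: Iterable[Record]) -> Iterable[Record]:
    # Hypothetical process node: basic cleaning, discard incomplete rows.
    return (r for r in records if r["spend"] is not None)


def mean_spend(records: List[Record]) -> float:
    # Hypothetical output node: a simple summary instead of a real model.
    return sum(r["spend"] for r in records) / len(records)


stream = Stream(
    source=customer_source,
    output=mean_spend,
    processes=[drop_missing_spend],
)
result = stream.run()
print(result)
```

A real stream would end in a trained model or a chart rather than a single average, but the chaining is the same idea: each node consumes what the previous one produced, which is why the pieces can be rearranged and reused.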

This workflow diagram model is eminently practical because it follows the standard professional analyst’s structure, and because the analysts trained for these positions tend to have mastered this form of structured thinking. This makes Clementine’s face to the user a gloriously productive one. The tabbed palettes of nodes are organized in a way that dedicated analytical pros will “get” instantly, and those who do a range of work, including analysis, will pick it up quickly.

Test Center Scorecard
Criterion weights: 20%, 20%, 20%, 20%, 10%, 10%
Clementine 8.1 scores: 9, 9, 6, 10, 8, 7
Overall score: 8.3 (Very Good)
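
As a quick check on the scorecard arithmetic, the short Python sketch below recomputes the overall score as a weighted average. It assumes the scorecard holds six criterion scores (9, 9, 6, 10, 8, 7) paired in order with six weights (20%, 20%, 20%, 20%, 10%, 10%); that pairing is an assumption, and the variable names are illustrative.

```python
# Assumed pairing: weights and scores taken in the order the scorecard lists them.
weights_percent = [20, 20, 20, 20, 10, 10]
scores = [9, 9, 6, 10, 8, 7]

# Weighted average: sum of (weight * score), divided by the total weight.
weighted_total = sum(w * s for w, s in zip(weights_percent, scores))
overall = weighted_total / sum(weights_percent)

print(round(overall, 1))
```

Under that assumption the weighted average works out to the published 8.3.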

©1994-2009 Infoworld, Inc.