Clementine 10 reinforces defensive analytics

Upgrades to SPSS's BA app power anomaly detection, faster answers in graphical modeling system

When I last reviewed SPSS’s Clementine BA (business analytics) application in 2004, I found great virtue in its effective graphical interface and its capability to systemize the building and storage of analytical routines for later reuse or reassembly.

Since then, the demand for BA muscle has shifted radically from CRM toward the two U.S. economic sectors that are in steep growth: health care and national security. Clementine 10 aims squarely at meeting those sectors’ needs, but in doing so, it also opens up plenty of applications for shops in other markets.

Health care and national security need BA for targeting exceptions. Health care is looking to root out fraud or doctors who spend higher-than-average time with their patients, as well as to get ahead of the spread of epidemics or biological attacks. National security customers are looking to target threats from malicious groups accurately, among other tasks.

Clementine 10 sports new anomaly-detection analytics to address these types of issues. SPSS brought together routines so the analyst has intrinsics with which to work, greatly simplifying what would have previously required multinode construction. That functionality saves the analyst time and frees up design effort for analysis.

Adding anomaly-detection analytics amplifies productivity, as well. Analysts can now focus on anomalies when necessary, such as for fraud-detection efforts. The process also filters out the anomalous cases, which makes it useful for work such as market segmentation, where the anomalies tend to muddy and slow the identification of segments.
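To make the two uses concrete, here is a minimal sketch in Python, assuming a simple z-score rule on a single numeric field. It is not Clementine's algorithm, which the review does not describe; the function name, the threshold, and the sample claim amounts are all hypothetical.

```python
from statistics import mean, pstdev


def split_anomalies(values, threshold=2.0):
    """Split values into (normal, anomalous) using a z-score cutoff.

    Assumption: a value is anomalous when it lies more than `threshold`
    population standard deviations from the mean. This is an illustrative
    rule, not the method Clementine 10 uses.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return list(values), []
    normal, anomalous = [], []
    for v in values:
        if abs(v - mu) / sigma > threshold:
            anomalous.append(v)
        else:
            normal.append(v)
    return normal, anomalous


# Hypothetical claim amounts with one obvious outlier.
claims = [120, 135, 128, 140, 131, 125, 990]
normal, anomalous = split_anomalies(claims)
print("investigate:", anomalous)       # the fraud-detection view
print("use for segmentation:", normal)  # the filtered view
```

The same split serves both purposes: a fraud analyst works from the anomalous list, while a segmentation job works from what remains once those cases are set aside.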

The new version also improves analyst productivity with another modeling node, Feature Selection. This node quickly delivers a list of the most useful fields to include in predictive models and the ones to filter out as chaff. It acts as an analyst on a small scale, ranking and rating each field you’ve identified as input. (You can save a Feature Selection for future reuse.)

Because of these preratings, an analyst gets to the subset of critical fields more quickly, and SPSS says that by working with fewer fields Clementine will squeeze out faster hardware processing times when answering questions. I found that Feature Selection got me to answers a lot faster, but this was mostly through saving user time rather than faster system processing.
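The idea of rating each input field before modeling can be sketched roughly as follows. This is only an illustration, assuming fields are ranked by the absolute Pearson correlation with the target; the review does not say how the Feature Selection node scores fields, and the field names and data are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        # A constant field carries no information about the target.
        return 0.0
    return cov / sqrt(vx * vy)


def rank_fields(fields, target):
    """Return (field name, score) pairs, most useful first.

    Assumption: usefulness is |correlation with target|, a stand-in
    for whatever rating the real node computes.
    """
    scores = {name: abs(pearson(col, target)) for name, col in fields.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical input fields and target.
fields = {
    "visits": [1, 2, 3, 4, 5],
    "region_code": [3, 1, 3, 1, 3],
    "constant_flag": [1, 1, 1, 1, 1],
}
target = [10, 20, 30, 40, 50]

ranking = rank_fields(fields, target)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Fields at the bottom of such a ranking are the candidates to filter out, which is where the savings in analyst time come from.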

I especially appreciate Clementine 10’s upgrades to time-series analysis, something that required external tools to run smoothly in prior versions. The new features will require a healthy training investment, but analysis of data over time increasingly will become the central focus of predictive analytics, which justifies the investment.

Some of 10’s additions are merely “nice to haves.” Although previous versions imported a wide range of data sources, many of them had to come piped through ODBC connections, a mechanism that requires IT to set up extra security and permissions and to support the end-users. Clementine 10, however, supports direct connection of Excel files (ranges and worksheets), saving IT as well as user time.

There was a small glitch in two of the Excel files I attached this way: Empty columns at the end of worksheets that had been cleared of their data left behind extraneous column delimiters. Not “knowing” where the worksheet ends is a common Excel brain cramp, and it’s not a giant issue, but I’d like to see SPSS clean up this irregularity within the input routine.
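The cleanup I have in mind is simple enough to sketch. The following Python is a hypothetical illustration of trimming columns that are empty in every row at the right-hand edge of a sheet; it assumes the worksheet has already been read into a list of rows and says nothing about how Clementine's own input routine works.

```python
def trim_trailing_empty_columns(rows):
    """Drop columns at the right edge that are empty in every row.

    `rows` is a list of lists of cell values. A cell counts as empty
    when it is None or blank after stripping whitespace.
    """
    def used_width(row):
        # Index just past the last non-empty cell in this row.
        width = len(row)
        while width > 0 and (row[width - 1] is None
                             or str(row[width - 1]).strip() == ""):
            width -= 1
        return width

    # Keep as many columns as the widest row actually uses.
    keep = max((used_width(r) for r in rows), default=0)
    return [list(r[:keep]) for r in rows]


# Hypothetical sheet: two real columns followed by two cleared ones.
sheet = [
    ["id", "amount", "", ""],
    ["1", "9.50", "", ""],
    ["2", "", "", None],
]
cleaned = trim_trailing_empty_columns(sheet)
print(cleaned)
```

Note that an empty cell inside a used column (the blank amount in the last row) is preserved; only the cleared columns past the real data are removed.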

Clementine has grown two versions since I last ran it through its paces, and SPSS has successfully maintained the usability of the interface. The UI’s foundation is a set of tool palettes. The top tab contains the tools needed most frequently, and after that they are organized into tabs of related workflow steps. The analyst drags nodes from palettes to the work area and connects them in a structured, graphical sequence to create workflows that SPSS calls “streams.”

You can create a useful stream and answer questions inside Clementine with as little as a data source node, a single process node, and a deliverable (a model or graphical output). Some users may occasionally choose to do ad hoc work this way, but they’ll do the bulk of their serious work by creating models inside this client and exporting the models and procedures to one of the supported output formats, including SPSS, SAS, and SQL. You can also save the data preparation work and models back into the database so they won’t need to re-execute in future data mining efforts.

SPSS has made some architectural and platform advances, too. Clementine 10 catches up with SAS Enterprise Miner’s capability to take advantage of multiprocessor systems: The software delivers multithreaded processes for complex modeling and for parallel processing of other routines. SPSS also upgraded Clementine’s capability to do in-database data mining, adding specific functions and extending in-database support for IBM DB2 and Oracle databases -- and it now runs on 32-bit Red Hat Linux. Reporting was fine overall, but it doesn’t have much in the way of printed reports, relying instead on other SPSS models.

Nevertheless, Clementine 10 hasn’t lost its usability strengths for its core professional analyst audience, and the new features -- including anomaly detection and the Feature Selection node -- continue to serve existing BA customers while adding strengths for new ones.

InfoWorld Scorecard: SPSS Clementine 10

Criterion (weight)          Score
Reporting (20.0%)             6.0
Suitability (20.0%)          10.0
Value (10.0%)                 7.0
Scalability (10.0%)           9.0
Ease of use (20.0%)           9.0
Interoperability (20.0%)      9.0
Overall Score (100%)          8.4
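
The overall score is consistent with a weighted average of the six criterion scores. The short Python check below assumes the scores apply to the criteria in the order they are listed; the variable names are illustrative only.

```python
# Criterion weights in percent and scores, in the scorecard's listed order.
# Assumption: the i-th score belongs to the i-th criterion.
scorecard = [
    ("Reporting",        20, 6.0),
    ("Suitability",      20, 10.0),
    ("Value",            10, 7.0),
    ("Scalability",      10, 9.0),
    ("Ease of use",      20, 9.0),
    ("Interoperability", 20, 9.0),
]

total_weight = sum(weight for _, weight, _ in scorecard)

# Weighted average: sum of (weight x score) divided by total weight.
overall = sum(weight * score for _, weight, score in scorecard) / total_weight

print(f"total weight: {total_weight}%")
print(f"overall score: {overall:.1f}")
```

The low Reporting score costs the product 0.8 of a point on its own: with a 20 percent weight, each point lost on that criterion removes 0.2 from the overall score.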