
InfoWorld

Clementine 8.1 melds BA with BI foundation

Data mining platform uses workflow diagrams, graphical interfaces to streamline analysis

By Jeff Angus
May 07, 2004
 

Innovation, such as that required to create and deploy BA (business analytics) solutions, is generally an easier process for smaller, focused development groups. So I’m seriously impressed by what SPSS has been able to accomplish in the BA tool area with the newest version of its data mining workbench, Clementine 8.1.

Clementine 8.1

SPSS, spss.com

Overall score: Very Good, 8.3

Criteria            Score   Weight
Ease-of-use         9       20%
Interoperability    9       20%
Reporting           6       20%
Suitability         10      20%
Scalability         8       10%
Value               7       10%

Cost:
Solutions start at $75,000

Platforms:
Client: Windows (Me, XP Home, Professional, 2000, 2003, or NT 4.0 with Service Pack 6)
Server: Windows 2000 (Professional, Advanced Server, NT 4.0 with Service Pack 6 or later), Sun Solaris, HP-UX 11i, IBM AIX 4.3.3 or 5.2, OS/400 (iSeries) V5R2 with OS/400

Bottom Line:
Clementine's client, server, and add-on products offer a deep set of BI, BA, and classic data-mining capabilities with an elegant, productive interface for a user team with an analytical background. Excellent import and export power makes not only data but routines available to other applications.
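The overall 8.3 is consistent with a plain weighted average of the six criteria scores. As a quick check, the sketch below assumes that is how the total is derived (the scoring methodology itself is not spelled out here); the `SCORECARD` dictionary and `weighted_score` function are illustrative names only.

```python
# Criteria scores and weights from the scorecard above.
# Weights are kept as integer percentages so the arithmetic stays exact.
SCORECARD = {
    "Ease-of-use": (9, 20),
    "Interoperability": (9, 20),
    "Reporting": (6, 20),
    "Suitability": (10, 20),
    "Scalability": (8, 10),
    "Value": (7, 10),
}


def weighted_score(scorecard):
    """Weighted average of criteria scores (assumed scoring rule, for illustration)."""
    total_weight = sum(weight for _, weight in scorecard.values())
    if total_weight != 100:
        raise ValueError(f"weights sum to {total_weight}%, expected 100%")
    return sum(score * weight for score, weight in scorecard.values()) / 100


overall = weighted_score(SCORECARD)
print(f"Overall: {overall:.1f}")  # Overall: 8.3
```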


Given SPSS’ role in the market, I expected a more pro-forma approach — the two behemoths of statistical analysis, SPSS and the SAS Institute, dominate the user base for sophisticated BI and data mining applications. I was pleasantly surprised by the attention SPSS paid to both usability and breadth of features, aspects that big companies with large installed bases tend to cut corners on.

Clementine 8.1 has a sensible design and eminently practical user interface. The BA features neither degrade what’s already there nor disappear into the massive capabilities that anchor the Clementine data mining product family.

The underlying workbench design uses a graphical representation of the analyst's own process workflow. The data mining workflow requires formulating the right cluster of questions to ask, identifying a subset of data from the warehouse or mart that addresses those questions, cleaning and restructuring the data, loading it, running it iteratively until you have a predictive model, and then saving the work for reuse.

Clementine supports all of this work except the purely human-expertise task of creating the right set of questions. That makes the goal of the data mining client — attacking large stores of collected data and pulling out meaningful relationships that hint at or even sometimes scream out actions to take — easier to achieve. For shops already committed to SPSS infrastructure, choosing Clementine is a no-brainer; for those with mixed platforms, Clementine’s virtues make it a very strong choice.

Going graphical

Clementine’s tabbed tools palettes sequentially collect related steps in the workflow process, grouping them into “nodes.” An analyst-user drags these nodes to the work window, connecting them in a structured, graphical sequence to create workflows that SPSS calls “streams”; multiple, related streams form a project. Clementine maintains a logical structure to manage these work products, with tabbed storage areas to store and display them. Users may also draw from previously created work modules.
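The node-and-stream structure described above maps naturally onto a small pipeline data structure. The following is a minimal sketch, not Clementine's API: the `Node`, `Stream`, and `Project` classes and the sample customer data are hypothetical, and each node is reduced to a function whose output feeds the next node.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Node:
    """Hypothetical workflow step: a data source, a process step, or an output."""
    name: str
    func: Callable[[Any], Any]


@dataclass
class Stream:
    """Hypothetical stream: an ordered chain where each node feeds the next."""
    name: str
    nodes: list[Node] = field(default_factory=list)

    def connect(self, node: Node) -> "Stream":
        self.nodes.append(node)
        return self  # returning self allows chained .connect() calls

    def run(self, data: Any = None) -> Any:
        for node in self.nodes:
            data = node.func(data)
        return data


@dataclass
class Project:
    """Hypothetical project: a named collection of related streams."""
    name: str
    streams: dict[str, Stream] = field(default_factory=dict)

    def add(self, stream: Stream) -> None:
        self.streams[stream.name] = stream


# Invented sample data standing in for a warehouse extract.
rows = [{"customer": "a", "spend": 120}, {"customer": "b", "spend": 40}]

stream = (
    Stream("high-spenders")
    .connect(Node("source", lambda _: rows))
    .connect(Node("filter", lambda data: [r for r in data if r["spend"] > 100]))
    .connect(Node("output", lambda data: len(data)))
)
project = Project("retention")
project.add(stream)
print(project.streams["high-spenders"].run())  # 1
```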

In its tersest expression, a stream need consist only of a data source node, a process node, and some deliverable, either a model or a graphical output. In reality, analysts will export the models and procedures to one of the many output formats Clementine supports, including SPSS, SAS, and SQL. And they’ll use the tools to put a significant slab of the data preparation back into the database so the work needn’t be re-executed in future data mining.
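Pushing data preparation back into the database typically means expressing cleaning steps as SQL that the database can run and reuse. The sketch below is a generic illustration, not SQL that Clementine generates: it uses an in-memory SQLite database as a stand-in for a warehouse, and the `raw_orders` table, its columns, and the `clean_orders` view are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_orders (customer TEXT, amount TEXT, region TEXT);
    INSERT INTO raw_orders VALUES
        (' Alice ', '120.50', 'west'),
        ('Bob', NULL, 'EAST'),
        ('carol', '75', 'East');

    -- Cleaning and restructuring defined once, inside the database.
    CREATE VIEW clean_orders AS
    SELECT TRIM(customer)       AS customer,
           CAST(amount AS REAL) AS amount,
           UPPER(region)        AS region
    FROM raw_orders
    WHERE amount IS NOT NULL;
""")

rows = conn.execute(
    "SELECT customer, amount, region FROM clean_orders ORDER BY customer"
).fetchall()
print(rows)  # [('Alice', 120.5, 'WEST'), ('carol', 75.0, 'EAST')]
```

Because the cleaning logic lives in the view, a later mining run can query `clean_orders` directly instead of repeating the trim, cast, and filter steps.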

This workflow diagram model is eminently practical because it follows the standard professional analyst’s structure, and because the analysts trained for these positions tend to have mastered this form of structured thinking. This makes Clementine’s face to the user a gloriously productive one. The tabbed palettes of nodes are organized in a way that dedicated analytical pros will “get” instantly, and those who do a range of work, including analysis, will pick it up quickly.

The tabbed organization of streams, outputs, and trained models also makes it very convenient to reuse them in other projects or export them to C code or to PMML (Predictive Model Markup Language), an XML-based language for defining and sharing predictive models between compliant vendors’ applications.
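To show what sharing a model through PMML involves on the consuming side, here is a minimal sketch that reads a linear regression model from a PMML document and scores one record. The PMML text is hand-written for illustration (not Clementine output), the field names and coefficients are made up, and the hypothetical `load_linear_model` parser handles only a single `RegressionTable` with `NumericPredictor` entries, ignoring exponents, categorical predictors, and other model types.

```python
import xml.etree.ElementTree as ET

# Hand-written, illustrative PMML 3.1 document (not produced by Clementine).
PMML_DOC = """<PMML version="3.1" xmlns="http://www.dmg.org/PMML-3_1">
  <Header description="Illustrative model"/>
  <DataDictionary numberOfFields="2">
    <DataField name="tenure" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="tenure"/>
      <MiningField name="spend" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="10.0">
      <NumericPredictor name="tenure" coefficient="2.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-3_1"}


def load_linear_model(pmml_text):
    """Hypothetical helper: return (intercept, {field: coefficient})."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    if table is None:
        raise ValueError("no RegressionModel/RegressionTable found")
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(record, intercept, coefficients):
    """Linear regression: intercept plus sum of coefficient * field value."""
    return intercept + sum(c * record[name] for name, c in coefficients.items())


intercept, coefficients = load_linear_model(PMML_DOC)
prediction = score({"tenure": 4.0}, intercept, coefficients)
print(prediction)  # 20.0
```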

Clementine's work structure is supported, albeit unevenly, by real-time error messages. When you lay down nodes in the work area, the client won't let you connect nodes that can't logically follow one another in sequence, and it displays an error message to alert you.
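Refusing to connect nodes that cannot logically follow one another amounts to validating each connection as it is made. The node kinds, the `ALLOWED` rule table, and the `connect` helper below are hypothetical simplifications, not Clementine's actual rules; the point is that an invalid link is rejected before the stream ever runs.

```python
from enum import Enum


class Kind(Enum):
    SOURCE = "source"
    PROCESS = "process"
    OUTPUT = "output"  # a model or a graph/table; treated as terminal here


# Hypothetical rule table: which node kinds may follow each kind.
ALLOWED = {
    Kind.SOURCE: {Kind.PROCESS, Kind.OUTPUT},
    Kind.PROCESS: {Kind.PROCESS, Kind.OUTPUT},
    Kind.OUTPUT: set(),  # nothing may follow a terminal node
}


class InvalidConnection(Exception):
    """Raised when two nodes cannot logically be connected."""


def connect(chain, kind):
    """Append `kind` to `chain`, rejecting illogical connections immediately."""
    if not chain:
        if kind is not Kind.SOURCE:
            raise InvalidConnection(f"a stream must start with a source, not {kind.value}")
    elif kind not in ALLOWED[chain[-1]]:
        raise InvalidConnection(f"cannot connect {chain[-1].value} -> {kind.value}")
    chain.append(kind)  # only reached when the connection is valid
    return chain


stream = []
for k in (Kind.SOURCE, Kind.PROCESS, Kind.OUTPUT):
    connect(stream, k)

message = None
try:
    connect(stream, Kind.PROCESS)
except InvalidConnection as err:
    message = str(err)
print(message)  # cannot connect output -> process
```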

On the other hand, some of the run-time error messages recorded in the thorough event log tell you that a failure occurred but don't describe it closely enough to show you what you did wrong. For that, you have to turn to Clementine's documentation: a beautifully executed manual plus deep, linked online help with search and indexing.

For all its elegance, I’m relieved SPSS hasn’t claimed in its marketing or positioning documents that this software can be an equally powerful tool for non-dedicated staff. It won’t be: The documentation is comprehensive and factual but doesn’t presume to teach more than the minimum about the craft and statistical tests and models of this platform. The ideal user for this software is still the staffer whose job is dedicated to analytics and statistics.

No small commitment

On the BA side, SPSS made it easier to trigger iterative efforts by providing more visual muscle to models with graphical cross-tabs and better visualization of cluster graphics. A data audit node and reclassification capabilities support quicker data retuning, which in turn supports more exhaustive, iterative engagement with the analysis. A new utility, Cleo, also deploys models to the Web for viewing and interaction.
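A data audit and a reclassification are simple operations at heart: one summarizes each field, the other maps existing values onto new categories. The sketch below illustrates the idea in plain Python; `audit` and `reclassify` are hypothetical helpers, not the interface of Clementine's data audit node or reclassification feature, and the sample rows are invented.

```python
from statistics import mean


def audit(rows, fields):
    """Hypothetical audit: present/missing counts, plus min/max/mean for numeric fields."""
    report = {}
    for f in fields:
        values = [r.get(f) for r in rows]
        present = [v for v in values if v is not None]
        summary = {"present": len(present), "missing": len(values) - len(present)}
        numeric = [v for v in present if isinstance(v, (int, float))]
        # Only summarize numerically when every present value is a number.
        if numeric and len(numeric) == len(present):
            summary.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
        report[f] = summary
    return report


def reclassify(rows, field, mapping):
    """Hypothetical reclassification: remap `field`; unmapped values are kept as-is."""
    return [{**r, field: mapping.get(r.get(field), r.get(field))} for r in rows]


rows = [
    {"plan": "gold", "spend": 120},
    {"plan": "platinum", "spend": 300},
    {"plan": "basic", "spend": None},
]
report = audit(rows, ["plan", "spend"])
tiers = reclassify(rows, "plan", {"gold": "premium", "platinum": "premium"})
print([r["plan"] for r in tiers])  # ['premium', 'premium', 'basic']
```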

The breadth of the Clementine platform offering makes it a big commitment. The product’s solid integration with external data sources and its $75,000 entry price make it most appropriate for dedicated analysis groups that will make use of and master the full platform.

Clementine is a mature platform, but it is expanding its capabilities and moving more surely into newer techniques such as BA. Its user base is drawing third-party products, such as Kxen's Analytic Framework, that add even more tools to the kit. Clementine's connections to enterprise data sources and development tools make it a leading platform for supporting smart decisions in an economy that offers no additional margin for hiring or slack.

Jeff Angus is an InfoWorld contributing editor. Contact him at jeff_angus@infoworld.com.