But if you start with that sort of architectural model, you’re likely to fail, says Scott Sognefest, a partner in Deloitte Consulting’s BI practice. “There’s a growing realization that you can’t put BI technology on top of a big pile of data. It’s expensive and inefficient,” he says. “You wouldn’t build a factory and then decide what products you want to produce after it’s built, but that’s what people do in the BI space.”
So understand the business case first. Then you can begin the messy work IT organizations have struggled with for years: building and refining a common data model and ensuring the data you need from multiple systems is consistent. “Data quality and data integrity are not going away. There’s no easy way to solve them,” says Betsy Burton, a Gartner vice president.
Forrester’s Evelson agrees. Before launching a BI initiative, he says, “I would have a data governance effort — and drop everything else.”
BI vendors have tried to address data quality and integration issues with MDM (master data management) solutions, but efforts to govern, cleanse, and reconcile data go beyond BI to affect every corner of the organization. In many instances, BI stakeholders have lacked the clout to drive enterprisewide MDM, yielding frustration when business execs want to scale BI beyond the original requirements that drove adoption.
Until a company cleans up its data act globally — a long-running project if there ever was one — the best strategy is to reduce the data sources to those that serve well-defined business objectives. “You’ve got no business putting in BI unless you’ve whittled down those core systems,” Martens says. That eliminates conflicting sources and makes data integration and cleansing manageable. Keeping data close to home also keeps it closer to its context and metadata, something that can get lost when data is transformed for storage in a data warehouse. “ETL [extract, transform, load] will cost you hugely,” Martens adds, referring to the common method of pulling huge chunks of static data from legacy systems.
Reducing the number of data sources helps avoid grunt work, but data quality must still be up to par. Some data will always be dirty, perhaps because it comes from outside sources or perhaps because you’re seeking something difficult to extract. One common example is getting birth dates of customers, who see no reason to share their age, notes Anne Milley, director of technology product marketing at SAS Institute, so you get false data, such as the easy-to-enter 11/11/11, or no information at all.
In such cases, consider whether you really need that information for your analysis and, if so, how the analysis will account for the missing or false data so the results remain meaningful, she says. This kind of thinking should be done before you deploy data collection, transformation, mining, analysis, or reporting systems, she adds.
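The idea Milley describes — flag implausible values, then make the analysis report how much data it excluded — can be sketched in a few lines. The placeholder dates and the `mean_age` helper below are illustrative assumptions, not anything from SAS; the point is that suspect values are screened out explicitly and the exclusion count travels with the result.

```python
from datetime import date

# Hypothetical placeholder values: dates that are easy to key in at a
# form (e.g. 11/11/11) and therefore likely to be false, per the
# birth-date example in the article.
PLACEHOLDERS = {date(1911, 11, 11), date(2011, 11, 11), date(1900, 1, 1)}

def is_suspect(birth_date):
    """True if the value is missing or a known data-entry placeholder."""
    return birth_date is None or birth_date in PLACEHOLDERS

def mean_age(birth_dates, as_of=date(2024, 1, 1)):
    """Average age over records with plausible birth dates only.

    Returns (mean, n_used, n_excluded) so the analyst can see how much
    data was dropped and judge whether the result is still meaningful.
    """
    clean = [b for b in birth_dates if not is_suspect(b)]
    excluded = len(birth_dates) - len(clean)
    if not clean:
        return None, 0, excluded
    ages = [(as_of - b).days / 365.25 for b in clean]
    return sum(ages) / len(ages), len(clean), excluded
```

Carrying `n_excluded` alongside the mean is the design choice that matters: it forces the "can we still trust this?" question to be asked at analysis time rather than after the report ships.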
Fleet management services provider PHH Arval provides a simple example of how such compromises can be reached. The company tracks odometer readings when truckers refuel to aid customer analyses of vehicle efficiency, delivery costs, and conformance to safety standards. But many drivers don’t take the time to transcribe odometer readings and instead enter guesstimates at the fuel terminals where this data is collected. To adjust analyses appropriately, PHH Arval created a statistical processing model that took this data weakness into account, says Greg Corrigan, the company’s vice president of BI.
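The article doesn’t describe PHH Arval’s actual model, but the general technique — screen out implausible intervals and use a robust statistic so guesstimated readings can’t dominate — can be sketched as below. The function name, thresholds, and the choice of a median are all assumptions for illustration.

```python
import statistics

def fleet_mpg(fillups, min_miles=10, max_miles=1500):
    """Robust miles-per-gallon from fuel-terminal records.

    fillups: list of (odometer_reading, gallons) tuples in time order.
    Intervals with non-increasing or implausible odometer deltas
    (likely driver guesstimates) are excluded, and the median caps the
    influence of any bad values that slip through.

    NOTE: illustrative sketch only -- not PHH Arval's actual
    statistical processing model, which the article doesn't detail.
    """
    mpgs = []
    for (prev_odo, _), (odo, gallons) in zip(fillups, fillups[1:]):
        miles = odo - prev_odo
        if min_miles <= miles <= max_miles and gallons > 0:
            mpgs.append(miles / gallons)
    return statistics.median(mpgs) if mpgs else None
```

For example, a sequence containing a backwards odometer entry — `[(10000, 0), (10300, 50), (10250, 40), (10600, 50)]` — yields intervals of 6.0 and 7.0 mpg (the negative-mileage interval is discarded), for a median of 6.5.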