5. Use data from the future to predict the future
The problem with data warehouses is that they're not static: Information is constantly changed and updated. But predictive analytics is an inductive learning process that relies on analysis of historical data, or "training data," to create models. So you need to re-create the state the data was in at an earlier point in the customer lifecycle. If data is not date-stamped and time-stamped, it's easy to include data from the future, which generates misleading results.
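One way to enforce that discipline is to timestamp every fact and filter the training set to what was known as of a cutoff date. The sketch below is a minimal illustration of that idea; the field names and dates are invented for the example, not taken from any real warehouse schema.

```python
from datetime import datetime

# Hypothetical member records: each fact carries the timestamp at which
# it became known. Field names here are illustrative only.
records = [
    {"member_id": 1, "field": "mailings_sent", "value": 3,
     "recorded_at": datetime(2020, 2, 1)},
    {"member_id": 1, "field": "contact_channel", "value": "email",
     "recorded_at": datetime(2021, 6, 1)},
]

def snapshot(records, cutoff):
    """Keep only facts already recorded at the cutoff date, re-creating
    what the data looked like at that point in the customer lifecycle."""
    return [r for r in records if r["recorded_at"] <= cutoff]

train_view = snapshot(records, cutoff=datetime(2020, 12, 31))
# Only the mailings_sent fact survives; the contact_channel entry is
# "from the future" relative to the modeling cutoff and is excluded.
```

Without the `recorded_at` column, there is no way to perform this filtering at all, which is why un-timestamped warehouse data is so prone to leakage.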
That's what happened to a regional auto club when it set about the task of building a model it could use to predict which of its members would be most likely to buy its insurance product.
For modeling purposes, the club needed to re-create what the data set was like early on, prior to when members had bought or declined to buy insurance, and exclude subsequent data. The organization had created a decision tree that included a text variable containing phone, fax, or email data. When the variable contained any text, there was 100 percent certainty that those members would later buy the insurance.
"We were assured that the indicator was known at the time" -- before the members had purchased the insurance -- but auto-club staffers "couldn't tell us what it meant," says Elder, who worked on the project. Knowing this was too good to be true, he continued to ask questions until he found someone in the organization who knew the truth: The variable represented how members had cancelled their insurance -- by phone, fax, or email. "You don't cancel insurance before you buy it," Elder says. So when you do modeling, you have to lock up some of your data.
6. Don't just proceed, but rush the process because you know your data is perfect
Between 60 percent and 80 percent of the time spent on a new predictive analytics project is consumed by preparing the data, according to Elder Research. Analysts have to pull data from various sources, combine tables, and roll up and aggregate records, and getting everything right can take as much as a year. Some organizations are absolutely confident that their data is pristine, but Abbott says he's never seen an organization with perfect data. Unexpected issues always crop up.
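The "combine tables, roll up, aggregate" step is routine but easy to underestimate. Here is a minimal sketch of one such step in plain Python; the table and field names are invented for the illustration.

```python
from collections import defaultdict

# Two toy source tables that must be joined before modeling.
members = [{"member_id": 1, "region": "East"},
           {"member_id": 2, "region": "West"}]
transactions = [{"member_id": 1, "amount": 40.0},
                {"member_id": 1, "amount": 60.0},
                {"member_id": 2, "amount": 25.0}]

# Join transactions to members, then roll up to one row per member.
region_by_member = {m["member_id"]: m["region"] for m in members}
totals = defaultdict(float)
for t in transactions:
    totals[(t["member_id"], region_by_member[t["member_id"]])] += t["amount"]

rollup = [{"member_id": mid, "region": reg, "total_spend": amt}
          for (mid, reg), amt in sorted(totals.items())]
```

In a real project each of these steps multiplies: dozens of sources, mismatched keys, and conflicting definitions of the same field are where the 60 to 80 percent goes.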
Consider the case of the pharmaceutical business that hired Abbott Analytics for a project, but balked at the time allocated for data work and insisted on speeding up the schedule. Abbott relented, and the project moved forward with a shortened schedule and smaller budget. But soon after the project started, Abbott discovered a problem: The ship dates for some orders preceded the dates when the orders had been called in. "Those weren't problems we couldn't overcome, but they took time to fix," he says -- time that was no longer in the budget.
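Catching this class of problem is cheap if you look for it up front. The sketch below is a minimal sanity check of the kind that would have flagged Abbott's issue; the field names and dates are hypothetical.

```python
from datetime import date

# Toy order records: one is internally consistent, one ships before
# it was ordered, which is physically impossible and signals bad data.
orders = [
    {"order_id": "A1", "ordered": date(2022, 3, 1), "shipped": date(2022, 3, 4)},
    {"order_id": "A2", "ordered": date(2022, 3, 5), "shipped": date(2022, 3, 2)},
]

# Flag any record whose ship date precedes its order date.
bad_orders = [o["order_id"] for o in orders if o["shipped"] < o["ordered"]]
# bad_orders now lists the records to investigate before modeling begins.
```

A short battery of such checks, run before the schedule is committed, is exactly the time the client in this story thought could be skipped.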
Once Abbott pointed out the issue, the client executive who had pushed the faster schedule realized there was a problem and had to go back to the management team to explain why the project was going to take longer. "It became a credibility issue for him at that point," Deal says. Lesson learned: No matter how good you think your data is, expect problems: It's better to set expectations conservatively and then exceed them.
7. Start big with a high-profile project that will rock their world
A large pharmaceutical company had grandiose plans that it thought were too big to fail. As it began to build an internal predictive analytics service, the team decided to do something that would "revolutionize the health care industry," Deal recalls them proclaiming in an initial meeting.
But the project's goals were just too big and required too large an investment to pull off -- especially for a new team. "If you don't see results quickly, you don't have anything to encourage you to maintain that level of investment," he says.