In this case, the characteristics might include historical patterns of incurring debt, days to pay past debts, income, ZIP code of residence, and so on. "Based on the predictive models, the collections agency would be able to use the best, most cost-effective strategy for collecting debts rather than using the same strategy for everyone," he says. But you need to run experiments to get started. "Predictive analytics can't create information from nothing," he says.
3. Don't proceed until your data is the best it can be
People often operate under the misconception that they must have their data perfectly organized, without any holes, disorder or missing values, before they can start a predictive analytics project.
One global petrochemical company, an Elder Research client, had just begun a predictive analytics project with a great potential return on investment when data scientists discovered that the state of the operations data was much worse than they had initially thought.
In this case, a key target value was missing. Had the business waited to gather new data, the project would have been delayed for at least a year. "A lot of companies would have stopped right there. I see this kill more projects than any other mistake," says Deal.
But data scientists are used to dealing with messy and incomplete data, and they have methodologies that, in many cases, allow them to work around the problem. This time, the business moved forward, and eventually the data scientists found a way to derive the missing target values from other data, according to John Ainsworth, data scientist at Elder Research.
The project is now on track to deliver major cost savings by accurately predicting failures, avoiding costly shutdowns, and identifying exactly where to apply expensive preventive maintenance procedures. Had they waited for perfect data, however, it never would have happened, Deal says, "because priorities change, and the data never gets fixed."
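The article does not say how the data scientists derived the missing target values. As a purely hypothetical illustration of the general idea, the sketch below assumes two data sources that do exist: timestamps of sensor readings and timestamps of unplanned repairs. It labels a reading as preceding a failure if a repair began within a chosen window after it. The function name, the inputs and the 24-hour window are all assumptions made for this example, not details from the project.

```python
from datetime import datetime, timedelta

def derive_failure_labels(reading_times, repair_times, horizon_hours=24):
    """Return one boolean per reading: True if an unplanned repair
    started within `horizon_hours` after that reading.

    Hypothetical helper: the inputs and the window are assumptions.
    """
    horizon = timedelta(hours=horizon_hours)
    labels = []
    for t in reading_times:
        # A reading is a positive example if any repair falls in [t, t + horizon].
        labels.append(any(t <= r <= t + horizon for r in repair_times))
    return labels

# Example: one repair at noon on Jan. 1.
readings = [datetime(2020, 1, 1, 0), datetime(2020, 1, 3, 0)]
repairs = [datetime(2020, 1, 1, 12)]
labels = derive_failure_labels(readings, repairs)
```

A derived label like this is only a proxy for the real target, so it would still need to be checked against whatever ground truth is available before a model is trained on it.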
4. When reviewing data quality, don't bother to take out the garbage
Eric Siegel, president of the consultancy Prediction Impact and author of "Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die," once worked with a Fortune 1000 financial services company that wanted to predict which call-center staff hires would stay on the job longest.
At first blush, the historical data appeared to show that employees without a high-school diploma were 2.6 times more likely to stay on the job for at least nine months than were employees with other educational backgrounds. "We were on the verge of recommending that the client begin to prioritize hiring high-school dropouts," Siegel says.
But there were two problems. First, the data, which had been manually keyed in from job applicant resumes, had been labeled inconsistently. One data entry person checked off all educational levels that applied, while another checked only the highest degree completed.
Second, and compounding the problem, the person who checked only the highest degree completed had, for some reason, keyed in a disproportionate share of the resumes of people who stayed the longest. Both issues could have been avoided by assigning each labeler a random group of resumes to key in and by making sure each person used the same labeling methodology.
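The two safeguards described above can be sketched in a few lines. This is a minimal illustration, not the process Siegel's team used: the education levels, their ordering and both function names are assumptions made for the example.

```python
import random
from collections import defaultdict

# Hypothetical education levels, ordered lowest to highest.
EDUCATION_ORDER = ["high_school", "associate", "bachelors", "masters", "doctorate"]

def assign_randomly(resume_ids, labelers, seed=0):
    """Shuffle the resumes, then deal them out round-robin so that no
    labeler's batch depends on how the original pile was sorted."""
    rng = random.Random(seed)
    shuffled = list(resume_ids)
    rng.shuffle(shuffled)
    batches = defaultdict(list)
    for i, resume_id in enumerate(shuffled):
        batches[labelers[i % len(labelers)]].append(resume_id)
    return dict(batches)

def highest_level(checked_levels):
    """Collapse an 'all levels that apply' entry to the single highest
    level, so both labeling styles end up in the same form.
    Returns None when no recognized level was checked."""
    known = [lvl for lvl in checked_levels if lvl in EDUCATION_ORDER]
    if not known:
        return None
    return max(known, key=EDUCATION_ORDER.index)

batches = assign_randomly(range(10), ["labeler_a", "labeler_b"], seed=1)
```

Random assignment keeps a labeler's habits from lining up with any one group of employees, and normalizing to the highest level removes the difference between the two checking styles before any retention rates are compared.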
But the bigger message is this, says Siegel: "Garbage in, garbage out. Be sure to carefully QA your data to ensure its integrity."