The universe is an immensely tangled knot of correlations. Science is humanity's tool for identifying which correlations reveal the deep, dark, dense knots of causation at the heart of it all.
Scientific models -- also known as laws, theories, and hypotheses -- are highly simplified tools for untangling threads from the correlation knot and testing their causal plausibility. Data science is the art of using statistical models to identify and validate the correlative factors at work. However, statistical models may lull data scientists into a false sense of validation, insofar as the models may fit the observational data closely but still miss the larger causative factors at work. When that happens, the model gives the illusion of insight but lacks predictive validity. It becomes a hindsight optimization tool.
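The gap between a close fit and a causal explanation is easy to reproduce. The following Python sketch is purely illustrative. The variables `income`, `exposure`, and `spend` and all coefficients are hypothetical, chosen only for the demonstration. A hidden confounder drives both exposure and spend, while exposure has no causal effect at all. Even so, a least-squares line fitted to the observational data shows a strong slope and a high R².

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (slope, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    r_squared = sxy * sxy / (sxx * syy)  # squared correlation for simple regression
    return slope, r_squared

def simulate_observational(n=10_000, seed=0):
    """Hypothetical data: an unobserved confounder drives exposure and spend.

    Exposure has NO causal effect on spend in this simulation.
    """
    rng = random.Random(seed)
    exposure, spend = [], []
    for _ in range(n):
        income = rng.gauss(0.0, 1.0)             # hidden confounder (hypothetical)
        x = income + rng.gauss(0.0, 0.5)         # exposure merely tracks income
        y = 2.0 * income + rng.gauss(0.0, 0.5)   # spend depends only on income
        exposure.append(x)
        spend.append(y)
    return exposure, spend

if __name__ == "__main__":
    slope, r2 = fit_line(*simulate_observational())
    print(f"fitted slope: {slope:.2f}, R^2: {r2:.2f}, true causal effect: 0.00")
```

For these parameters the fitted slope should be close to 2 / 1.25 = 1.6 and R² close to 0.75 in expectation, even though intervening on `exposure` would change nothing in this simulated world.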
In a recent blog post, Michael Walker captures this paradox well when he writes, "While models can be useful in seeking to understand complex phenomena, all models are flawed and present an illusion of reality. This is especially true in high causal density environments (e.g., human behavior, finance, climate, health, public policy)." Actually, you could expand Walker's "e.g." list to include physics, chemistry, biology, economics, and pretty much every other subject-matter domain. Anyone who has majored in a scientific subject knows that even the foremost experts grasp only a tiny piece of those fields' "high causal density."
In business, the applications of statistical models are practical, but the need to vet the underlying causative factors remains. If you're unsure whether the historical correlations you've built into your statistical models will hold in the future, you treat that uncertainty as a risk factor. For example, if you have low confidence in your predictive models of demand and response rates in a given customer segment, you're probably not going to wager millions of dollars on a new product launch that targets that segment.
Statistical modeling isn't dead, but in order to drill down to causative factors more rapidly, it needs to be grounded in real-world experimentation, as I argued in a blog post a year ago. Essentially, real-world experiments put the data science "laboratory" at the heart of the big data economy. Under this approach, business model fine-tuning becomes a never-ending series of practical experiments. Data science evolves into an operational function, with data scientists running their experiments 24-7 with the full support and encouragement of senior business executives.
The big data revolution is spawning the inexpensive compute power and in-database model-execution platforms needed to sustain continuous real-world experimentation across all domains, both scientific and business. Walker calls for data scientists everywhere to shift their operational focus toward spending "more time and brainpower conducting low-risk experiments and less time building models." According to Walker, "true randomized experiments are most reliable. The randomized experiment is the scientific gold standard of certainty of predictive accuracy in policy and business."
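Why randomization earns that gold-standard label can be sketched with a small simulation. In this hypothetical Python example, the `income` confounder, the `TRUE_EFFECT` lift of 0.3, and the opt-in rule are all assumptions made for illustration. Customers who choose the treatment themselves skew toward higher income, so a naive treated-versus-control comparison is badly inflated. Assigning treatment by coin flip breaks the link between income and treatment, so the simple difference in means lands near the true effect.

```python
import random
import statistics

TRUE_EFFECT = 0.3  # hypothetical causal lift of the treatment on spend

def spend(income, treated, rng):
    """Hypothetical outcome model: income drives spend, treatment adds TRUE_EFFECT."""
    return 2.0 * income + TRUE_EFFECT * treated + rng.gauss(0.0, 0.5)

def diff_in_means(n, rng, randomized):
    """Mean spend of treated minus mean spend of control customers.

    randomized=False: customers self-select, and higher income raises the
    odds of opting in (confounded, like observational data).
    randomized=True: a fair coin assigns treatment, independent of income.
    """
    treated, control = [], []
    for _ in range(n):
        income = rng.gauss(0.0, 1.0)  # unobserved confounder (hypothetical)
        if randomized:
            t = 1 if rng.random() < 0.5 else 0
        else:
            t = 1 if income + rng.gauss(0.0, 1.0) > 0 else 0
        (treated if t else control).append(spend(income, t, rng))
    return statistics.fmean(treated) - statistics.fmean(control)

if __name__ == "__main__":
    n = 40_000
    print(f"self-selected estimate: {diff_in_means(n, random.Random(1), False):.2f}")
    print(f"randomized estimate:    {diff_in_means(n, random.Random(2), True):.2f}")
    print(f"true effect:            {TRUE_EFFECT:.2f}")
```

Under these assumptions the self-selected comparison should come out around 2.56 in expectation, several times the true lift, because it mostly measures the income gap between the groups rather than the treatment itself.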
Of course, that can't happen in a business vacuum. Any shift toward real-world experimentation requires the active support of the senior stakeholders -- such as the chief marketing officer -- whose business operations will be affected. As Walker states: "Limits to the use of experiments are established by the need for leadership, strategy and long-term vision. Business and public policy leaders need to support and adequately fund experimentation by the data science and business analytics teams."
That may be a tough sell in the most conservative, tradition-bound business environments. If you're one of the Googles or Facebooks of this world, continuous real-world experimentation is already at the heart of your operating model. But if you're an old-school business executive who has no clue what big data, data science, or real-world experimentation are all about, all of this will feel far too radical and disruptive for your tastes.
Sure, it's all essential to the new order of online business in the 21st century, but many people haven't yet absorbed the new reality beyond the buzzword level.
This story, "Big data demands nonstop experimentation," was originally published at InfoWorld.com.