You know how political polls always mention a “margin of error” of so many percentage points? So Candidate X may be 3 points up in the polls, but that’s actually “a statistical dead heat.” Well, the technology research world needs some discipline like this, because too many research firms have gotten lazy and are playing fast and loose with small sample sizes.
Of course I realize we all snoozed through statistics and don’t want to revisit that heavy math stuff (chi-square, null hypotheses, Type I errors, and so on). But I keep getting research reports that make highly specific claims, with an air of learned authority, that turn out to be based on such a small sample size that the results are very likely skewed by chance (or “fooled by randomness,” as in the title of a current math-geek bestseller).
Take one report I received last week from … let’s just call them Nameless Research. I’d call it Shameless, but it’s not really any worse than many other firms. Using page after page of fancy pie charts, Nameless’ report addressed VoIP, outsourcing, server consolidation, IPv6, Sarbanes-Oxley, open source — pretty much everything and the kitchen sink.
The study presented very granular findings such as: “59 percent of organizations using virtualization are doing so for application testing and development; 20 percent of organizations are planning to backsource work, and of those, 44 percent are doing so because of ‘cost savings not realized’; 8 percent of organizations are looking at a business unit rollout of desktop Linux solutions; and 29 percent of organizations say ILM (information lifecycle management) will be very important to them in the next year.”
Very illuminating. But when you look at the study’s “methods” description, it turns out the survey included only 132 organizations. And when I e-mailed Nameless Research’s PR guy asking for a statistical clarification, he e-mailed back that only “88 of the 132 gave a company name with no duplicates.” And he attached a boilerplate statement that said that “our objective here is to provide guidance while not attempting to estimate the size of markets or to project adoption rates of various technologies.”
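For the statistics-averse, here’s the back-of-the-envelope arithmetic that makes those sample sizes damning. This is a minimal sketch using the standard simple-random-sample approximation for a proportion at 95 percent confidence (assuming a worst-case 50/50 split and a truly random sample, which this survey almost certainly wasn’t — selection bias would only make things worse):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a 95% confidence interval for a proportion,
    using the standard simple-random-sample approximation:
    z * sqrt(p * (1 - p) / n). p=0.5 is the worst case."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (132, 88):
    print(f"n = {n}: margin of error is about "
          f"+/- {margin_of_error(n) * 100:.1f} points")
# n = 132: margin of error is about +/- 8.5 points
# n = 88:  margin of error is about +/- 10.4 points
```

In other words, with 132 respondents a reported “59 percent” could plausibly be anywhere from about 51 to 67 percent — and the finer-grained slices (8 percent of organizations, 44 percent of a subgroup) rest on even fewer respondents and even wider error bars.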
OK, fine. … Nameless Research was basically admitting that the study was SWAG (qualitative is the polite phrase), and not statistically significant. I could have forgiven Nameless right then, given it a pass on not mentioning this in the “methods” description, had the boilerplate e-mail not continued with the following (I kid you not):
“We think the results of this exercise are neither the cryptic omens of foretellers who don’t really tell us much nor the wild (and therefore unreliable) predictions of analysts who have little grounds for making such sure-handed forecasts. While perhaps a little lower in surprise value, our data is deeper in analysis value and more reliable and applicable.”
What? Huh? What were Nameless’ research directors smoking when they wrote this? Obviously they skipped out of statistics in college to attend an extra literature seminar or two. So now they’re saying that despite having only 132 (or 88) respondents, the survey is “deep in analysis value” and “reliable” — without any of the basic math to back it up? Moral of the story: Before you base any major decisions on industry research reports, be very, very skeptical, and maybe even a little afraid.