In particular, this passage jumped out at me:
Have you ever wondered what you look like to Amazon? Here is the cold, hard truth: You are a very long row of numbers in a very, very large table. This row describes everything you've looked at, everything you've clicked on, and everything you've purchased on the site; the rest of the table represents the millions of other Amazon shoppers. Your row changes every time you enter the site, and it changes again with every action you take while you're there. That information in turn affects what you see on each page you visit and what e-mail and special offers you receive from the company.
As did this one, describing how data scientists use SVD in modeling the dimensions of personal preference/taste from these tables of numbers:
The technique involves factoring the original giant matrix into two "taste matrices" -- one that includes all the users and the 100 taste dimensions and another that includes all the foods and the 100 taste dimensions -- plus a third matrix that, when multiplied by either of the other two, re-creates the original matrix. ... The dimensions that get computed are neither describable nor intuitive; they are pure abstract values, and try as you might, you'll never identify one that represents, say, "salty." And that's okay, as long as those values ultimately yield accurate recommendations.
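The factorization the passage describes is a truncated singular value decomposition (SVD). Here is a minimal sketch in Python with NumPy, using a tiny hypothetical ratings matrix (the variable names and values are illustrative, not from the original):

```python
import numpy as np

# Hypothetical ratings matrix: rows are users, columns are foods.
# Each value is a preference score (0.0 = no signal).
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 5.0],
])

k = 2  # number of latent "taste" dimensions to keep (the article's example uses 100)

# Full SVD: ratings = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)

user_tastes = U[:, :k]      # each user as k abstract taste values
food_tastes = Vt[:k, :].T   # each food as k abstract taste values
weights = np.diag(s[:k])    # the third matrix that ties the other two together

# Multiplying the factors back together approximates the original table;
# that approximation is what drives the recommendations.
approx = user_tastes @ weights @ food_tastes.T
print(np.round(approx, 1))

# Note: column 0 of user_tastes has no human-readable meaning -- it is not
# "salty" or "sweet", just an abstract number, exactly as the passage says.
```

The two small matrices compress millions of rows into a handful of abstract dimensions per user and per food, which is why the individual dimensions resist human interpretation.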
Let me net that out for the layperson: the more dimensions of individual preference or taste your data scientists attempt to capture in a decision-automation model, the more complex the model grows. As it grows more complex, the model becomes more opaque: no human (including the data scientists and subject-matter experts who built it) can attribute any particular recommendation, decision, or action it drives to any particular variable. So even though those "values" (that is, numbers) ultimately yield accurate recommendations, the number-driven outcomes become harder to understand or explain.
Hence, an increasingly complex subject-domain relationship may make the associated data model less comprehensible, which, in turn, makes the decision-automation process less transparent. This can occur even in business scenarios where all the numbers and all the recommendation-engine logic that leverages them are on the table, available for anybody to inspect.
If nobody, not even your data scientists, can explain or justify the numbers, then you have a problem. That's a potential Achilles' heel in any complex data-driven application, not just recommendation engines.
This story, "Sometimes it's OK to treat people like numbers," was originally published at InfoWorld.com.