In other words, although the Factual service offers a relatively modest listing of 67 million local businesses and points of interest around the world, it needs nearly a petabyte of HDFS storage to maintain the source data and cleanse it recursively.
"I don't think there's been enough emphasis on thinking deeply about what's the best possible workflow for good data," says Elbaz. "Data in itself is not factual until you've processed it with some sort of workflow that improves its clarity and provides more metadata."
But the problem is that the effort spent on ensuring data quality is not immediately apparent to the customer. "The unfortunate reality is that it's really hard to build a brand in data. It would be nice to live in a world where the data would speak for itself and somebody could apply a seal of approval, but we really don't live in that world today," says Elbaz.
Today the obsession is with big data analysis of semi-structured data -- which is highly useful for spotting trends, but has nothing to do with accuracy at a granular level. Meanwhile, in the broader sphere of the Internet, made-up "facts" sit side by side at a peer level with the real thing. Even the quality of data exposed by worthy initiatives like Data.gov has been called into question.
There's lots of talk about making business and government data available on the Internet, but not nearly as much conversation around the much more difficult problem of validating that data. Data provided as a service in the cloud needs to aspire to be as valuable as core data maintained by customers. But unfortunately, no independent agency exists to give a stamp of approval to the good stuff.
Perhaps we simply need to wait for trusted brands to prove themselves in practice. It could be that Elbaz is right when he says, "Everything is an opinion about a fact unless there's some company behind it saying, 'We have a strong feeling about this.'"
This article, "Why we need data we can believe in," originally appeared at InfoWorld.com. Read more of Eric Knorr's Modernizing IT blog.