Yet for those IT leaders who manage to convert decades-old county records, public housing specs and precipitation patterns into a viable business plan, "the sky's the limit," says Gurin.
Gurin should know. As part of his work at the GovLab, he's compiling a who's who list of U.S. companies that are using government data to generate new business. Known as the Open Data 500, the list features a wide array of businesses -- from scrappy startups like Calcbench, which is turning SEC filings into financial insights for investors, to proven successes like The Climate Corporation. Recently purchased by Monsanto for about $1 billion, The Climate Corporation uses government weather data to revamp agricultural production practices.
Although their business models vary wildly, there's one thing the Open Data 500 companies have in common: IT departments that have figured out how to collect, cleanse, integrate and package reams of messy data for public consumption.
Take Zillow, for example. The online real estate database crunches housing data to provide homeowners and real estate professionals with estimated home values, foreclosure rates and the projected cost of renting vs. buying. Founded in 2005 by two former Microsoft executives, the Seattle-based outfit is now valued at more than $1 billion. But success didn't arrive overnight.
"Anyone who has worked with public record data knows that real estate data is among the noisiest you can get. It's a train wreck," says Stan Humphries, chief economist at Zillow, referring to the industry's lack of standard formatting. "It's our job to take that massive hairball and pull relevant facts out of it."
Today, Zillow has a team of 16 Ph.D.-wielding data analysts and engineers who use proprietary advanced analytics tools to synthesize everything from sales listings to census data into easy-to-digest reports. One of the biggest challenges for Zillow's IT department has been creating a system that integrates government data from more than 3,000 counties.
"There is no standard format, which is very frustrating," says Humphries. "We've tried and tried to push the government to come up with standard formats, but from [each individual] county's perspective, there's no reason to do it. So it's up to us to figure out 3,000 different ways to ingest data and make sense of it."
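To make the "3,000 different ways to ingest data" problem concrete, here is a minimal Python sketch of one common approach: a small adapter per county that maps its raw layout onto a single shared record. This is not Zillow's system, which the article describes only as proprietary. The county names, field names (`APN`, `Price`, `SaleDate`, `parcel`, `amt_cents`, `dt`) and the `SaleRecord` schema are all hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SaleRecord:
    """Hypothetical common schema that every county feed is mapped onto."""
    county: str
    parcel_id: str
    sale_price: float  # U.S. dollars
    sale_date: str     # ISO 8601, YYYY-MM-DD


def parse_county_a(row: Dict[str, str]) -> SaleRecord:
    # Assumed layout: price like "$350,000", date like "3/7/2013" (M/D/YYYY).
    month, day, year = row["SaleDate"].split("/")
    return SaleRecord(
        county="county_a",
        parcel_id=row["APN"].strip(),
        sale_price=float(row["Price"].replace("$", "").replace(",", "")),
        sale_date=f"{year}-{int(month):02d}-{int(day):02d}",
    )


def parse_county_b(row: Dict[str, str]) -> SaleRecord:
    # Assumed layout: price in whole cents, date like "20130412" (YYYYMMDD).
    raw_date = row["dt"]
    return SaleRecord(
        county="county_b",
        parcel_id=row["parcel"].strip(),
        sale_price=int(row["amt_cents"]) / 100,
        sale_date=f"{raw_date[:4]}-{raw_date[4:6]}-{raw_date[6:8]}",
    )


# One adapter per county; a real registry would hold thousands of entries.
ADAPTERS: Dict[str, Callable[[Dict[str, str]], SaleRecord]] = {
    "county_a": parse_county_a,
    "county_b": parse_county_b,
}


def ingest(county: str, row: Dict[str, str]) -> SaleRecord:
    """Route a raw row to its county's adapter and return a normalized record."""
    try:
        adapter = ADAPTERS[county]
    except KeyError:
        raise ValueError(f"no adapter registered for {county!r}") from None
    return adapter(row)
```

The point of the design is that everything downstream sees only `SaleRecord`, so adding a new county means writing one more adapter rather than touching the analytics.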
Complaints about the lack of standard protocols and formats are common among users of open data.
"Data quality has always been and will continue to be a very important consideration," says Chui. "It's clear that one needs to understand the provenance of the data, the accuracy of the data, how often it's updated, its reliability -- these will continue to be important problems to tackle."
So far, Zillow's plan of attack using "sophisticated big data engineering" is working. In fact, a new set of algorithms has helped cut Zillow's median margin of error from 14 percent to 8 percent. And for 60 percent of all sales, Zillow's estimated price falls within 10 percent of the actual sale price.
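For readers who want to see how accuracy figures of this kind could be computed, the Python sketch below takes paired estimates and actual sale prices and returns a median error and the share of estimates within a tolerance. The article does not define "median margin of error," so the sketch assumes it means the median absolute percentage error. The function name `accuracy_summary` and the sample prices in the usage note are invented for illustration and are not Zillow data.

```python
from statistics import median
from typing import Sequence, Tuple


def accuracy_summary(
    estimates: Sequence[float],
    actuals: Sequence[float],
    tolerance: float = 0.10,
) -> Tuple[float, float]:
    """Return (median absolute percentage error, share of estimates within tolerance).

    Assumption: error is measured as |estimate - actual| / actual.
    """
    if len(estimates) != len(actuals) or not estimates:
        raise ValueError("estimates and actuals must be non-empty and equal in length")

    # Absolute percentage error of each estimate, as a fraction of the sale price.
    errors = [abs(est - act) / act for est, act in zip(estimates, actuals)]

    # Fraction of sales where the estimate landed within the tolerance band.
    share_within = sum(1 for err in errors if err <= tolerance) / len(errors)

    return median(errors), share_within
```

As a usage example with made-up numbers, five homes that each sold for 100 (in any unit) with estimates of 108, 95, 130, 100 and 88 give errors of 8, 5, 30, 0 and 12 percent. The median of those is 8 percent, and three of the five estimates (60 percent) fall within 10 percent of the sale price.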