Developed by MIT Media Lab and Deloitte, Data USA delivers interactive JavaScript visualizations of thousands of public data sets on jobs, income, education, and health.

Visualizing data that matters
If you hadn't noticed, this is a presidential election year. I can recall many previous election years when I've groused over outrageous distortions of fact, but this year has been especially tough to suffer through.

The truth is that statistics collected by reliable, objective sources often give the lie to such distortions, and almost all of those sources and numbers can be found on the Internet. But guess what? That data isn't always as easy to discover as you might think. Much of it is buried in PDFs and comma-delimited files. Neither voters nor, sadly, many news organizations have the time or inclination to dig that stuff up.

For years I've seen various attempts to aggregate public data relevant to key issues of the day and make it available on the Internet in interactive graphical form. The latest effort, launching today, looks promising: Data USA is a free and open platform created collaboratively by Deloitte, MIT Media Lab, and Datawheel, a Media Lab spinoff.

Data USA is an impressive piece of work. Pick a U.S. location, industry, or occupation, and you get dozens of interactive charts that you can drill into for greater detail. Eight types of visualizations are offered, including clickable maps of the United States that serve up census data and treemaps that show relative proportions at a glance. For business analysts, policymakers, or students, the site has obvious value, but the visualizations are so well executed that anyone can have fun with them.

According to Matt Gentile, principal at Deloitte and federal government analytics lead, the intent of Data USA is to "tell a story" using multiple, related data sets. Much of the effort, he said, went into getting domain experts and data scientists to provide contextual frameworks for various subject areas. At launch, Data USA is focusing on jobs, skills, education, and health, with much of the underlying data gathered from the Census Bureau, the Department of Labor, the Department of Commerce, and the Department of Education.

Data USA features a number of innovations, including an extension to the open source JavaScript data visualization library D3 called D3plus, developed and maintained by Datawheel cofounders César Hidalgo, Alex Simoes, and Dave Landry. Also, Data USA auto-generates textual descriptions alongside visualizations, providing context and SEO benefit with no manual effort.

All the data on Data USA is API-accessible and the code is open source. The idea is to launch a platform that will sustain itself over time, as developers build on it and integrate their own data sets.

Although Data USA excels at providing context, it's hardly the first platform for delivering visualization of open data to the general public. The World Bank, for example, offers a gargantuan quantity of free data on 213 countries, 20 topics each, updated on a continual basis. Although a commercial venture, Statista also serves up a ton of free visualizations; it has a particularly rich array of global business information. Google launched its Public Data Explorer in 2010 using technology it acquired from Hans Rosling's Gapminder Foundation, but as far as I can tell Google has lost interest in this project.

Data USA has the sharpest visualization technology I've seen yet for this sort of venture, and the contextual framework is well executed -- but is it sustainable? Partly that depends on whether a vibrant open source community forms around it. Gentile admitted that he does not yet have a plan for how all those data sets will be updated on an ongoing basis.

From my point of view, this sort of service seems absolutely essential to democracy. Although a few interesting media ventures have appeared in the past few years, in general, Web journalism still seems stuck in the print era, posting HTML versions of magazine or newspaper pages.

If you ask me, in the Internet era, at a time when the world is more complicated than ever, the news should be a lot more like business intelligence. The data is there and it should be instantly accessible in graphical form to anyone who wants it. If we can't agree on facts, what can we ever agree on? 

