Dems score with better data
DNC's Linux warehousing project delivered on '50-state strategy'
Behind every big success these days, there's probably some darned good IT making it happen. That appears to be the case in the surprising electoral victory by the Democratic Party last week.
New data warehouse solutions commissioned by the Democratic National Committee (DNC) and also by Catalist, a for-profit group backed by a faction of leading Democratic players, are being credited for their part in the Party's strong performance in nationwide midterm elections. Those solutions may have helped Democrats close the gap with tech-savvy Republicans, according to people involved with the projects and with the party's countrywide get-out-the-vote operation.
The DNC solution, which was commissioned one year ago by DNC Chairman Howard Dean, tapped a new generation of low-cost, Linux-based data warehouse technology to improve the quantity, quality, and availability of voter information used by state Democratic parties during the election turn-out effort. Those close to the project say the new system, part of Dean's so-called 50-state strategy, helped tip close races in the House and Senate in favor of the Democrats.
The solution was developed by Intelligent Integration Systems (IISi) of Boston, a company that develops datacenter solutions and uses a Netezza Performance Server data warehouse appliance to integrate information provided by 45 state-level Democratic parties on about 200 million voters, according to Paul Davis, IISi's CEO.
In addition to the Netezza back end and IISi code, the system uses data quality and cleansing tools from FirstLogic and enterprise integration software vendor Sunopsis, as well as data modeling tools from SPSS, according to a Netezza statement.
The new solution was hosted at a datacenter in Virginia and allowed the DNC to rapidly update so-called "voter files" as state-level party workers supplied new information. The data was then cleaned up by comparing it to lists of known phone numbers and addresses. The DNC was also able to "overlay" the information, matching it to data on the same individuals culled from various consumer data stores, Davis said.
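The clean-then-overlay step Davis describes can be pictured with a small Python sketch. It is an illustration only, not the DNC or IISi code: the function names, the record fields, and the choice of a phone number as the match key are all assumptions made for the example.

```python
def normalize_phone(raw):
    """Keep digits only, so '(617) 555-0100' and '617.555.0100' compare equal."""
    return "".join(ch for ch in raw if ch.isdigit())


def clean_and_overlay(voter_records, known_phones, consumer_data):
    """Hypothetical helper: verify each record, then attach consumer attributes.

    voter_records: list of dicts with an optional 'phone' field (assumed layout)
    known_phones:  set of normalized phone numbers treated as the reference list
    consumer_data: dict mapping a normalized phone number to consumer attributes
    """
    cleaned = []
    for record in voter_records:
        phone = normalize_phone(record.get("phone", ""))
        merged = dict(record)  # copy, so the input records are left untouched
        merged["phone"] = phone
        # Cleaning: flag whether the number appears in the reference list.
        merged["phone_verified"] = phone in known_phones
        # Overlay: attach consumer attributes under their own key, so they
        # cannot overwrite fields that came from the voter file.
        merged["consumer"] = dict(consumer_data.get(phone, {}))
        cleaned.append(merged)
    return cleaned
```

A real system would match on more than one field and tolerate misspellings; exact matching on a single key is used here only to keep the sketch short.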
Netezza, which makes the technology used by the DNC, is part of a new generation of data warehousing companies that are using commodity hardware such as Seagate hard drives, Intel processors, and hardened Linux operating systems to create low-cost, fast data warehouse appliances, according to Donald Feinberg, of Gartner.
Like incumbent data warehouse players such as Teradata (part of NCR), Netezza uses distributed database intelligence, in which data filtering, processing, and analysis are done on the same device that stores the data.
"They have code running on the hard drive, so you can parallelize the queries and do them as fast as you can lift the data off the hard drive. Fundamentally it results in a two order of magnitude improvement in speed," said Rich Zimmerman, IISi's CTO.
Parallelizing queries to databases is nothing new. However, running parallel queries on inexpensive hardware and software, like Linux and PostgreSQL, and being able to match what high-end vendors like Teradata can offer is new, said Feinberg. Appliance-based products like Netezza's Performance Server are also easier to maintain, requiring less staff and keeping the cost to implement and run the data warehouse low, he said.
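The distributed approach described above, in which each storage unit filters its own data and passes along only a small result, can be sketched in a few lines of Python. This is a toy model, not Netezza's implementation: the function names are hypothetical, in-memory lists stand in for disks, and a thread pool stands in for the processors attached to each drive, so it shows the shape of the technique rather than its speed.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_slice(rows, predicate):
    """Filter and count on one slice, returning only the small partial result."""
    return sum(1 for row in rows if predicate(row))


def parallel_count(slices, predicate):
    """Run the same filter on every slice concurrently, then combine the counts.

    slices:    list of row lists, one per (simulated) storage unit
    predicate: function applied to each row where the row is stored
    """
    with ThreadPoolExecutor(max_workers=max(1, len(slices))) as pool:
        partials = pool.map(scan_slice, slices, [predicate] * len(slices))
        # Only the per-slice counts travel back to the coordinator,
        # never the raw rows.
        return sum(partials)
```

Calling `parallel_count(slices, lambda row: row["state"] == "OH")` on rows shaped like `{"state": "OH"}` would count matching rows across all slices. The design point is that the expensive scan happens next to the data, and the central step only adds up a handful of numbers.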








