The campaign set out to create, as Wegrzyn described it, "an analyst-driven organization by providing an environment for smart people to freely pursue their ideas." The emphasis was on accommodating smart analysts, rather than hard-core engineers. A SQL-based environment was deemed friendly enough for analyst needs, rather than, say, an environment that required knowledge of Java or statistical analytics. In addition, the platform needed enough horsepower to enable analysis "at the speed of thought." But the organizational objective may have been the most important factor of all: lowering the barriers between disparate data sets -- as well as between analysts -- so everyone could work together effectively. In a nutshell, the campaign sought a friction-free analytic environment.
With these goals in mind, the team considered a number of approaches. They realized that while Hadoop was an important complementary technology, it required highly technical skills and was not designed for the real-time queries the team needed. They also realized that a large analytic appliance, which they had used in previous campaigns, would not scale out sufficiently.
Ultimately, the team settled on HP Vertica. It was SQL-based, affordable, and scalable, as well as a strong performer in proof-of-concept tests. On the statistical analytics side, the team used R and Stata.
A cornerstone of the environment was its ability to grow. The environment was built with a feedback loop that became increasingly powerful the more it was used and tweaked. While the initial raw data was modest in big data terms -- around 10 terabytes -- the analysts generated dozens more terabytes beyond that through aggregation and experimentation.
Analytics in action
Two important initiatives during the campaign illustrate the power of the environment to gain greater efficiencies: AirWolf and Media Optimizer.
AirWolf was built to integrate the field and digital teams' efforts. A common problem in prior campaigns was that the field teams' actions, such as recording a person's particular interest in voting issues, could not be easily followed up by the digital team (for example, with email correspondence). With AirWolf, when a voter was contacted by the field team in a door-to-door campaign, that voter's particular interests were recorded and fed back to Vertica. Then the digital team ran email blasts from the local organizer to voters, each corresponding to a voter's favorite campaign issues. This greatly enhanced the ability to pinpoint messaging and made it more feasible to sway voters.
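The AirWolf flow described above can be illustrated with a minimal Python sketch. It is not the campaign's actual code: the `FieldContact` record, its field names, and the sample data are all hypothetical, and the real system stored this data in Vertica rather than in memory. The sketch only shows the core idea of grouping door-to-door contacts by local organizer and favorite issue so each group can receive a tailored email.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldContact:
    """One door-to-door contact (hypothetical schema, for illustration only)."""
    voter_id: str
    email: str
    organizer: str   # local organizer who will "send" the email
    top_issue: str   # the issue the voter cared about most


def build_email_batches(contacts):
    """Group voter emails by (organizer, issue) so each batch gets one tailored message."""
    batches = defaultdict(list)
    for contact in contacts:
        batches[(contact.organizer, contact.top_issue)].append(contact.email)
    return dict(batches)


# Made-up sample data standing in for records fed back from the field team.
contacts = [
    FieldContact("v1", "ann@example.com", "org_a", "education"),
    FieldContact("v2", "bob@example.com", "org_a", "healthcare"),
    FieldContact("v3", "cy@example.com", "org_a", "education"),
    FieldContact("v4", "dee@example.com", "org_b", "education"),
]

batches = build_email_batches(contacts)
for (organizer, issue), emails in sorted(batches.items()):
    print(f"{organizer} -> {issue}: {emails}")
```

Keying each batch on both the organizer and the issue mirrors the article's description: the email comes from the local organizer and speaks to the voter's favorite issue.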
The intent of Media Optimizer was to enable much more targeted ad purchases. Prior to Media Optimizer, TV ad buys were based on broad demographics, which was both costly and inefficient. With Media Optimizer in place, the campaign could use statistical analysis to identify the target voters in the DNC database. Next, the voter data was enriched with both demographics data from TV ratings and advertisement pricing data. Finally, the results were fed back into Vertica and reanalyzed for further tuning.
With the overall picture combining likely voters for Obama, the shows they watch, and the prices of the ads -- as well as the analysis feedback loop -- it was much easier to determine the most efficient ad buys. One result was that the Obama campaign purchased twice the number of cable TV advertisements as the Romney campaign, many during niche programs, aimed at the precise demographic slices the Obama campaign was trying to reach.
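To make the ad-buy reasoning concrete, here is a small Python sketch of one way to rank ad slots by cost per targeted viewer. It is an illustration under stated assumptions, not the campaign's actual model: the `AdSlot` fields, the show names, the audience shares, the prices, and the greedy budget rule are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdSlot:
    """One purchasable TV ad slot (hypothetical fields and numbers)."""
    show: str
    price: float          # cost of one ad, in dollars
    total_viewers: int    # audience size from ratings data
    target_share: float   # estimated fraction of viewers who are target voters

    @property
    def target_viewers(self) -> float:
        return self.total_viewers * self.target_share

    @property
    def cost_per_target_viewer(self) -> float:
        # Slots that reach no target voters are treated as infinitely expensive.
        if self.target_viewers == 0:
            return float("inf")
        return self.price / self.target_viewers


def rank_slots(slots):
    """Order slots from cheapest to most expensive per targeted viewer."""
    return sorted(slots, key=lambda slot: slot.cost_per_target_viewer)


def pick_within_budget(slots, budget):
    """Greedily buy the most efficient slots that still fit in the budget."""
    chosen, remaining = [], budget
    for slot in rank_slots(slots):
        if slot.price <= remaining:
            chosen.append(slot)
            remaining -= slot.price
    return chosen


# Made-up data: a broad prime-time show versus two niche programs.
slots = [
    AdSlot("prime_time_drama", price=50_000, total_viewers=2_000_000, target_share=0.05),
    AdSlot("niche_cable_show", price=4_000, total_viewers=100_000, target_share=0.30),
    AdSlot("late_night_rerun", price=1_500, total_viewers=40_000, target_share=0.25),
]

ranked = [slot.show for slot in rank_slots(slots)]
bought = [slot.show for slot in pick_within_budget(slots, budget=10_000)]
print(ranked)
print(bought)
```

With these invented numbers, the niche programs rank ahead of the prime-time show because each dollar reaches more target voters, which is the kind of result that would favor many small cable buys over a few broad ones.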
All the analytic solutions shared a number of attributes: they were a combined effort of analysts and engineers; they were time-sensitive, implemented in weeks rather than months; and they were built around an unconstrained yet centralized environment with Vertica.
The analyst-driven organization empowered the team to achieve a number of key objectives. First, all the data from the disparate teams was brought together within Vertica, enabling a 360-degree view of the data. Second, analysts could answer nearly any question quickly and easily, no matter where the data originally came from. Finally, the platform was continually improved thanks to its built-in feedback loop.
With the success of this initiative, a unified big data analytics environment is sure to take its place as a standard requirement for campaigns to come.
This article, "The real story of how big data analytics helped Obama win," was originally published at InfoWorld.com. Read more of Andrew Lampitt's Think Big Data blog.