The real story of how big data analytics helped Obama win

HP Vertica played a major role, as did an org structure that centralized analytics and lowered barriers between teams

You may have heard how statistical wizard Nate Silver predicted the electoral votes for each state in the 2012 presidential election, showing that raw data crunching of polls is much more reliable than traditional punditry. What you probably haven't heard is how the Obama campaign built a 100-strong analytics staff to churn through dozens of terabytes of data with a combination of the HP Vertica MPP (massively parallel processing) analytic database and predictive models with R and Stata to gain a competitive edge.

Credit for the big data approach goes to Obama campaign manager Jim Messina, who decided to dive headfirst into an analytics-driven campaign. Messina commented, "We were going to demand data on everything, we were going to measure everything...we were going to put an analytics team inside of us to study us the entire time to make sure we were being smart about things." To ensure everything was measured, staff were evaluated on whether they entered data. The mantra became: "If you didn't enter the data, you didn't do the work."


Boots on the ground

Of the 100 analytics staffers, 50 worked in a dedicated analytics department, 20 analysts were spread throughout the campaign's various headquarters, and another 30 were in the field interpreting the data.

Chris Wegrzyn, director of data architecture for the Democratic National Committee, described the challenges, opportunities, and path to build the analytics-driven campaign. Wegrzyn noted that the key measurements centered on the data itself, modeling, and experimentation. The core data contained the facts about the electorate and the campaign operation. Modeling was used to understand the electorate at the individual voter level. Finally, evaluating the results of experiments helped the campaign learn how its actions actually influenced people.

Of course, the key performance indicator for the campaign was the number of people who planned to vote for Obama, divided by the number who planned to vote at all. The campaign understood there were three levers to maximize that ratio: registration, persuasion, and turnout. It had to encourage its target audience of voters to register, persuade the undecided to vote for Obama, then do all it could to ensure that Obama voters would show up to vote on Election Day.
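To make that ratio concrete, here is a minimal sketch in Python. The function name and every number in it are hypothetical illustrations, not campaign figures or campaign code; it only shows how persuasion moves the numerator alone, while registration and turnout add supporters to both the numerator and the denominator.

```python
def support_share(obama_voters: int, total_voters: int) -> float:
    """Share of expected voters who plan to vote for Obama (hypothetical helper)."""
    if total_voters <= 0:
        raise ValueError("total_voters must be positive")
    return obama_voters / total_voters


# Hypothetical baseline: 480 of 1,000 expected voters plan to vote for Obama.
baseline = support_share(480, 1000)

# Persuasion: 20 expected voters switch to Obama; the pool of voters is unchanged.
after_persuasion = support_share(480 + 20, 1000)

# Registration and turnout: 40 new supporters join the pool of expected voters,
# so both the numerator and the denominator grow.
after_turnout = support_share(480 + 40, 1000 + 40)

print(f"baseline:         {baseline:.3f}")
print(f"after persuasion: {after_persuasion:.3f}")
print(f"after turnout:    {after_turnout:.3f}")
```

In this made-up example, persuading 20 voters and turning out 40 new supporters lift the share by the same amount, which is one way to see why a campaign would want to measure the cost and effect of each lever separately.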

Marshaling the troops

To appreciate the challenges, it's important to understand how the campaign was organized into different teams. The field team was the personal face of the campaign: the people on the ground organizing volunteers, handling registrations, encouraging turnout, and so on. The digital team was responsible for online presence, email campaigns, online fundraising, social media, and more. The communications and media teams were responsible for Obama's personal messaging with interviews, ad buying, and so on. Finance focused on the overall campaign fundraising strategy.

In the past, all these departments had used sophisticated analytic technologies -- but had implemented their individual analytic approaches independently. The 2012 campaign changed all that.

The right people and mandates were important to make a unified analytics environment a reality. Executive buy-in from campaign manager Messina was essential; without that authority, any ambitious initiative might have been sidestepped or dropped altogether. In addition, the core team had strong analytic experience from previous campaigns, and the campaign was able to hire highly talented analytics staff at well below the market rate.
