- How granular should the data be? Most analytics tools today sample data over time intervals, then average out the results. If this data is being used to drive real-time network behavior, what's the right granularity for measurements? If the window is too wide, the network reacts to conditions that may already have passed, so changes are no longer real time. If the window is too narrow, short-lived spikes can trigger changes, and behavior may shift back and forth without ever reaching equilibrium.
- Where do you collect the data? If the source of data is a distributed set of IT infrastructure entities (some physical, some virtual), what collects the data? And where is that data stored? Reaching out to many devices in real time is technically challenging; bringing all of that data together is downright frightening. How do you design data collection to be resilient in case of failure? What kind of scale must be considered? What about performance?
- Real-time or batch processing? Collecting big data is hard, but processing it can be even harder. Is the data processed in large batch jobs? If so, how do you ensure the processing time is short enough to make near-real-time adjustments possible? Should processing instead be split into many smaller tasks that run in parallel, as Hadoop's MapReduce model does? How does that implementation integrate with the network infrastructure?
- How much data do you keep? In a state-driven system, when something goes wrong, you cannot just look at the configuration to find out what was driving device behavior. Troubleshooting will need to expand to include an analysis of the state at the time of the issue. How much history must be stored? How is state correlated with events that might be happening elsewhere in or around the network?
- What about security? And perhaps the biggest challenge: Do people really want a network that changes dynamically? That implies a level of trust that simply doesn't exist today. What does the change approval process look like? What form does auditing take? If things are fully automated, how is a large distributed system meaningfully tested?
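The window-size tradeoff in the first question can be illustrated with a small sketch. It assumes hypothetical link-utilization samples (in percent), a trailing mean over the last `window` samples, and a hypothetical 80% threshold that triggers a network change; it is not the method of any particular analytics tool. It counts how often the "act now" decision flips and when it first fires.

```python
from collections import deque

def windowed_means(samples, window):
    """Trailing mean of the last `window` samples at each step (fewer samples at the start)."""
    buf = deque(maxlen=window)  # the oldest sample drops out automatically once full
    means = []
    for s in samples:
        buf.append(s)
        means.append(sum(buf) / len(buf))
    return means

def decisions(means, threshold):
    """True wherever the averaged utilization exceeds the threshold (i.e., 'act now')."""
    return [m > threshold for m in means]

def count_flips(flags):
    """Number of times the decision changes between consecutive steps."""
    return sum(1 for a, b in zip(flags, flags[1:]) if a != b)

def first_alarm(flags):
    """Index of the first step at which the decision is True, or None."""
    return next((i for i, f in enumerate(flags) if f), None)

if __name__ == "__main__":
    # Hypothetical samples: noisy around 80%, then a sustained rise starting at step 8.
    samples = [60, 85, 55, 90, 58, 88, 62, 86, 95, 96, 94, 97]
    for window in (1, 4, 8):
        flags = decisions(windowed_means(samples, window), threshold=80)
        print(window, count_flips(flags), first_alarm(flags))
    # 1 7 1   -> narrow window: 7 flips, and the first alarm is a spurious spike
    # 4 1 8   -> a single change, at the start of the sustained rise
    # 8 1 10  -> just as stable, but reacts two steps late
```

With these made-up numbers, a one-sample window oscillates, while the widest window is stable but lags; the "right" width depends on how noisy the measurements are and how quickly the network must respond.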
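One common answer to the resilience question in the data-collection bullet is to poll devices concurrently, each with its own timeout, so that a single slow or unreachable device cannot stall the whole collection cycle. The sketch below assumes that approach; `poll_all` and the three switch functions are hypothetical stand-ins, not a real collector or device API. Failures are reported rather than retried.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

Fetch = Callable[[], Awaitable[dict]]

async def poll_one(name: str, fetch: Fetch, timeout: float):
    """Poll one device; return (name, data, error) instead of raising."""
    try:
        return name, await asyncio.wait_for(fetch(), timeout), None
    except Exception as exc:  # timeouts, unreachable devices, bad responses, ...
        return name, None, exc

async def poll_all(devices: Dict[str, Fetch], timeout: float) -> Tuple[dict, dict]:
    """Poll every device concurrently and split the results into successes and failures."""
    results = await asyncio.gather(*(poll_one(n, f, timeout) for n, f in devices.items()))
    ok = {n: data for n, data, err in results if err is None}
    failed = {n: type(err).__name__ for n, _, err in results if err is not None}
    return ok, failed

# Hypothetical stand-ins for real device agents.
async def healthy_switch() -> dict:
    return {"cpu": 12, "if_util": 0.41}

async def slow_switch() -> dict:
    await asyncio.sleep(1.0)  # longer than the timeout used below
    return {"cpu": 97}

async def unreachable_switch() -> dict:
    raise ConnectionError("no route to host")

if __name__ == "__main__":
    devices = {"sw1": healthy_switch, "sw2": slow_switch, "sw3": unreachable_switch}
    ok, failed = asyncio.run(poll_all(devices, timeout=0.1))
    print(ok)      # {'sw1': {'cpu': 12, 'if_util': 0.41}}
    print(failed)  # {'sw2': 'TimeoutError', 'sw3': 'ConnectionError'}
```

A production collector would likely add retries with backoff and local buffering, and would itself need to be replicated; the sketch only shows how to keep one failure from blocking everything else.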
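The "many smaller tasks" option in the processing bullet is the idea behind MapReduce: split the input into chunks, compute a partial result per chunk (map), then merge the partials (reduce). The sketch below applies that pattern to hypothetical flow records of `(destination, bytes)`; it is an illustration of the model, not Hadoop's API, and it runs the map tasks sequentially where a real framework would spread them across workers.

```python
from collections import Counter
from functools import reduce

def split(records, size):
    """Split the input into fixed-size chunks; each chunk becomes one map task."""
    return [records[i:i + size] for i in range(0, len(records), size)]

def map_task(chunk):
    """Partial result for one chunk: total bytes seen per destination."""
    partial = Counter()
    for dst, nbytes in chunk:
        partial[dst] += nbytes
    return partial

def reduce_task(a, b):
    """Merge two partial results. Counter addition sums matching keys
    (and drops non-positive totals, which cannot occur for byte counts)."""
    return a + b

def bytes_per_destination(records, chunk_size):
    # Map tasks run one after another here to keep the sketch self-contained.
    partials = [map_task(c) for c in split(records, chunk_size)]
    return reduce(reduce_task, partials, Counter())

if __name__ == "__main__":
    # Hypothetical flow records: (destination, bytes).
    flows = [("10.0.0.1", 1500), ("10.0.0.2", 400), ("10.0.0.1", 900),
             ("10.0.0.3", 64), ("10.0.0.2", 600)]
    totals = bytes_per_destination(flows, chunk_size=2)
    print(totals == map_task(flows))  # True: same answer as one big batch pass
    print(totals.most_common(1))      # [('10.0.0.1', 2400)]
```

Splitting the work this way shortens wall-clock time only when the chunks really run in parallel, and the result is still produced in batches; that is why the latency question in the bullet remains open.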
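The retention and correlation questions in the history bullet can be made concrete with a per-device log of time-stamped state snapshots. The sketch below assumes a hypothetical `StateHistory` class with a fixed retention period in seconds and snapshots recorded in time order; `state_at` answers "what was this device doing at time t?", which is the lookup needed to line state up with an event logged elsewhere.

```python
import bisect
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class StateHistory:
    """Time-ordered state snapshots for one device, kept for `retention` seconds (hypothetical)."""
    retention: float
    times: list = field(default_factory=list)
    states: list = field(default_factory=list)

    def record(self, t: float, state: Any) -> None:
        # Assumes snapshots arrive in non-decreasing time order.
        if self.times and t < self.times[-1]:
            raise ValueError("snapshots must be recorded in time order")
        self.times.append(t)
        self.states.append(state)

    def prune(self, now: float) -> None:
        # Drop snapshots older than the retention window, but keep the newest one
        # before the cutoff so the state in effect at the cutoff is still known.
        cutoff = now - self.retention
        i = bisect.bisect_left(self.times, cutoff)  # first index with time >= cutoff
        keep_from = max(i - 1, 0)
        del self.times[:keep_from]
        del self.states[:keep_from]

    def state_at(self, t: float) -> Optional[Any]:
        # Most recent snapshot at or before t, or None if history doesn't reach back that far.
        i = bisect.bisect_right(self.times, t)
        return self.states[i - 1] if i > 0 else None

def correlate(event_time: float, histories: Dict[str, StateHistory]) -> dict:
    """Map each device name to the state it was in when an event occurred."""
    return {name: h.state_at(event_time) for name, h in histories.items()}

if __name__ == "__main__":
    sw1 = StateHistory(retention=60)
    sw1.record(0, "path-A")
    sw1.record(30, "path-B")
    sw1.record(90, "path-A")
    sw1.prune(now=100)
    # What was sw1 doing when a (hypothetical) link-flap event was logged at t=45?
    print(correlate(45, {"sw1": sw1}))  # {'sw1': 'path-B'}
```

The retention period is the knob the bullet asks about: a longer window makes older incidents debuggable but multiplies storage by the number of devices and the snapshot rate.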
The next phase of networking
The technical challenges of combining SDN with big data are significant but not insurmountable. They do, however, need to be addressed during SDN's formative years. The worst possible outcome would be for the industry to settle on an SDN architecture without having fully considered the impact of big data.