SDN, big data, and the self-optimizing network

SDN will eventually relieve the burden of manual reconfiguration, but only when we collect and process enough data to enable the network to optimize itself

SDN (software-defined networking) gains more mind share every day. The concept of reinventing networking to better match today's applications and infrastructures is a tantalizing goal, but clearly not without its challenges.

Michael Bushong has been around the SDN space for some time, having spent years leading SDN efforts at Juniper Networks. Bushong is currently working at SDN vendor Plexxi, and in this week's New Tech Forum, he takes a close look at what SDN promises -- and what questions need to be answered in order to truly reinvent the network as we know it. For that goal to be realized, says Bushong, SDN and big data must go hand in hand. -- Paul Venezia

Big data and SDN: Closing the loop
The networking industry has reached an inflection point. With cloud as the backdrop, SDN and big data are poised to converge in a way that will redefine how data centers function. As with all changes of this magnitude, the reality lies not in the big concepts but in the details of how these two technology forces come together. Those who understand the nuances will be in the best position to exploit the new technology -- and provide new points of leverage for data center architects.

Why we need SDN
To understand how SDN and big data will come together, you need to get to the heart of why SDN is hot right now. While much of the emphasis has been on the supporting protocols like OpenFlow, the reality is that SDN is larger than the technologies it comprises. SDN is really an industry reaction to an ongoing pain point in networking.

Today, provisioning and managing a network is a needlessly manual chore. So long as the surrounding infrastructure and applications using that infrastructure are stable and relatively unchanging, that pain is noticeable, but not crippling. But the rise of virtualization in compute and storage arenas has fostered enough workload portability to expose networking's contribution to IT pain.

The energy behind SDN exists because of the potential to alleviate that pain. But how does that work?

The most basic tenet behind SDN is the separation of control and forwarding. By centralizing control, the network can be treated as a unified resource. With a global view, the SDN controller can use the entire network to service application workloads. Conceptually, this is not unlike global traffic-monitoring solutions in cities today. With a citywide understanding of traffic patterns, control centers can use tools like metering lights and adjustable tolls to control the flow of traffic.

These capabilities are enacted through one or more SDN controllers, which also serve as platforms on which controller applications can run. The applications themselves are the ultimate vehicles for SDN value. With the entire network as their resource, these applications can do things like streamline provisioning by making intelligent, top-down decisions based on controller input. For example, users can steer traffic to network monitoring points. What would have been a distributed configuration problem can now be solved from a single administrative touch point, reducing effort and the risk of downtime due to misconfiguration.
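The single-touch-point idea above can be sketched in a few lines. The `Controller` class and its rule-install API here are hypothetical, invented purely to illustrate the concept; real controllers expose far richer (and protocol-specific) interfaces.

```python
# Minimal sketch of a controller application steering traffic to a
# monitoring point from one administrative touch point. The Controller
# API is a hypothetical illustration, not any real controller's interface.

class Controller:
    def __init__(self, switches):
        # Global view: the controller knows every switch it manages.
        self.switches = switches  # {switch_id: list of (match, action) rules}

    def install_rule(self, switch_id, match, action):
        self.switches[switch_id].append((match, action))

def steer_to_monitor(controller, src_subnet, monitor_port):
    """One call fans out to every switch: what used to be a per-device
    configuration task becomes a single top-down decision."""
    for switch_id in controller.switches:
        controller.install_rule(
            switch_id,
            match={"src": src_subnet},
            action={"mirror_to": monitor_port},
        )

ctrl = Controller({"s1": [], "s2": [], "s3": []})
steer_to_monitor(ctrl, "10.0.1.0/24", monitor_port=48)
# Every switch now carries a rule mirroring matching traffic to port 48.
```

The point is not the code but the shape of it: the loop over switches lives in the controller application, not in an administrator's head.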

The role of big data
To be truly dynamic, SDN applications must be responsive to the world in which they exist. Minimally, there will need to be feedback loops to ensure the desired behavior is actually occurring after a change is made. Beyond that, it seems inevitable that the triggers for network changes will evolve from manual intervention to state-driven changes.

The natural progression from manual to automated will pass first through network analytics. For example, can current traffic conditions be used to drive path optimizations in the network? Can locality be used to intelligently pair users with content that is cached in close proximity?
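A traffic-driven path optimization like the one just described can be reduced to a small selection problem. The topology, utilization figures, and candidate paths below are invented for illustration; a real system would feed these from its analytics pipeline.

```python
# Sketch: using observed link utilization to choose among candidate paths.
# All names and numbers are illustrative assumptions.

# Candidate paths between two endpoints, each a list of links.
paths = [
    ["A-B", "B-D"],
    ["A-C", "C-D"],
]

# Current per-link utilization (fraction of capacity), as an analytics
# pipeline might report it.
utilization = {"A-B": 0.85, "B-D": 0.30, "A-C": 0.40, "C-D": 0.45}

def best_path(paths, utilization):
    # A path's headroom is set by its bottleneck link, so rank paths by
    # their most-utilized link and take the least congested one.
    return min(paths, key=lambda p: max(utilization[link] for link in p))

print(best_path(paths, utilization))  # ['A-C', 'C-D']
```

Even this toy version makes the dependency clear: the quality of the decision is only as good as the freshness and accuracy of the utilization data feeding it.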

Once you accept that analytics can play a meaningful role, you need to consider where that role should begin and end. What are the sources of information that can potentially be used? How many endpoints should be considered? What is the impact of virtualization on the number of endpoints? How much state information is distributed across these network and non-network entities? How is that state information collected, stored, and correlated?

The natural conclusion of this line of thinking is that analytics as we know it today will just scratch the surface. Ultimately, the promise of SDN will be inherently tied to the information that surrounds the network and drives the decisions that make SDN applications interesting. With more and more endpoints driving increasing traffic to a growing number of users, that adds up to big data.

When SDN and big data collide
There are a number of practical implications of moving to a dynamic IT infrastructure with hooks into multiple data sources. Here is a handful of the questions we'll need to consider:

  • How granular should the data be? Most analytics tools today sample data over time intervals, then average out the results. If this data is being used to drive real-time network behavior, what's the right granularity for measurements? If the window is too wide, changes will not be real time. If the window is too narrow, there's a risk that behavior shifts back and forth, never reaching equilibrium.

  • Where do you collect the data? If the source of data is a distributed set of IT infrastructure entities (some physical, some virtual), what collects the data? And where is that data stored? The act of reaching out to many devices in real time is technically challenging, but bringing that data together is downright frightening. How do you design data collection to be resilient in case of failure? What kind of scale must be considered? What about performance?
  • Real-time or batch processing? Collecting big data is hard, but processing it can be even harder. Is the data processed in large batch jobs? If so, how do you ensure the processing time is short enough to make near-real-time adjustments possible? Should processing instead be split into many smaller jobs, as with Hadoop? How does that implementation integrate with the network infrastructure?
  • How much data do you keep? In a state-driven system, when something goes wrong, you cannot just look at the configuration to find out what was driving device behavior. Troubleshooting will need to expand to include an analysis of the state at the time of the issue. How much history must be stored? How is state correlated with events that might be happening elsewhere in or around the network?
  • What about security? And perhaps the biggest challenge: Do people really want a network that changes dynamically? That implies a level of trust that simply doesn't exist today. What does the change approval process look like? What form does auditing take? If things are fully automated, how is a large distributed system meaningfully tested?
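The granularity question in the first bullet has a classic control-loop answer: smooth the raw samples, and add hysteresis so behavior near a threshold cannot flap back and forth. The sketch below uses an exponentially weighted moving average (EWMA) with separate "act" and "revert" thresholds; all constants are illustrative assumptions, not recommendations.

```python
# Sketch of the measurement-granularity trade-off: EWMA smoothing plus
# hysteresis. A decision triggers when the smoothed value crosses `high`
# and reverts only when it falls below the lower `low`, so a signal
# hovering near one threshold cannot oscillate. Constants are invented.

def make_balancer(alpha=0.5, high=0.7, low=0.5):
    state = {"ewma": 0.0, "rerouted": False}

    def observe(sample):
        # Smooth the raw measurement; alpha sets the effective window.
        state["ewma"] = alpha * sample + (1 - alpha) * state["ewma"]
        # Hysteresis: the gap between `high` and `low` absorbs jitter.
        if not state["rerouted"] and state["ewma"] > high:
            state["rerouted"] = True
        elif state["rerouted"] and state["ewma"] < low:
            state["rerouted"] = False
        return state["rerouted"]

    return observe

observe = make_balancer()
for load in [0.9, 0.9, 0.9, 0.2]:
    print(observe(load))
```

Widening the smoothing window (smaller alpha) trades responsiveness for stability, which is exactly the "too wide versus too narrow" tension the question describes.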

The next phase of networking
The technical challenges of combining SDN with big data are difficult but not insurmountable, and they need to be addressed during SDN's formative years. The worst possible outcome would be for the industry to solidify an SDN architecture without having fully considered the impacts of big data.

Controller architectures must consider how state information is going to be collected, stored, and accessed. SDN applications need to be designed with state consumption in mind. Which decisions does the application need to make, and more important, what are all the data sources required to inform those decisions? Even the devices themselves will be part of the solution. How should they publish data related to the current conditions on the device to be used by these applications? Given the uncertain architectural landscape, what can be done on devices to make the eventual integration with other infrastructure easier?

Obviously, there are more questions than answers at this point. But in our industry's rush to get point protocols and solutions to market, we are in danger of solidifying architectural principles without fully considering the endgame. We would all do well to pause now and ensure we're planning for long-term success.


This article, "SDN, big data, and the self-optimizing network," was originally published at InfoWorld.

Copyright © 2013 IDG Communications, Inc.
