For almost 20 years now, the primary data center design learned by most networking professionals has been the three-tier approach of core (L3), aggregation (L2/L3), and access (L2). While this baseline design (and its many minor variants) has been hugely successful, making reliable and scalable networks far easier and cheaper to build, its usefulness is being outstripped by a new generation of data center networking requirements.
The three-tier network design dates back to a time when devices capable of L3 routing cost significantly more than L2 switches, when traffic was primarily north-south rather than east-west, and when the speed at the core and aggregation layers was at least 10 times the speed at the edge. These considerations no longer hold true.
Worse, three-tier designs fail to support the innovation so desperately needed in today's data centers. What little modularity can be found in three-tier designs is rigid, making fast iterations and experimentation virtually impossible. That rigidity also makes it a nonstarter to keep up with price/performance improvements in the industry. Practically speaking, the three-tier design mentality locks you into a "least common denominator" feature set, a single vendor, and a small band of product generations.
The time has come for a new approach to data center network design. We believe there is a lot the industry can learn from the way that hyperscale data center operators build their networks. Despite the massive scale, hyperscale data centers have a network design that lets them start small and innovate quickly.
The key insight of the core and pod approach is that hyperscale data centers aren't built in their entirety from day one, but rather grow organically in incremental blocks of capacity, starting small each time.
Starting small: Core and pod explained
Core and pod leverages individually designed pods (not necessarily similar in structure) that hang off a routed core layer. The routed core is intended to span many generations of pods and provide fast and simple interconnect, treating each pod as an atomic unit. Within the pod, you may have only a single access layer or, more often, a "leaf and spine" network for the pod. When "leaf and spine" topology is used within a pod, the core layer is often called the "spine of spines," and the network as a whole represents a "fat tree" or "Clos" topology.
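As a rough illustration of the structure described above, the following Python sketch models pods as atomic units that hang off a routed core. The names `Pod`, `Fabric`, and `attach` are hypothetical, as are the switch counts. The sketch also assumes that every spine in a pod connects to every core switch, which is only one of several possible ways to wire a pod to the core.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Pod:
    name: str
    spines: int  # spine switches inside this pod (illustrative count)
    leaves: int  # leaf (access) switches inside this pod (illustrative count)

    def internal_links(self) -> list[tuple[str, str]]:
        # Leaf and spine: every leaf connects to every spine in the same pod.
        return [
            (f"{self.name}-leaf{l}", f"{self.name}-spine{s}")
            for l, s in product(range(self.leaves), range(self.spines))
        ]


@dataclass
class Fabric:
    core_switches: int
    pods: list[Pod] = field(default_factory=list)

    def attach(self, pod: Pod) -> list[tuple[str, str]]:
        # Assumption: each pod spine links to every core switch.
        # Attaching a pod adds only pod-to-core links; existing pods are untouched.
        self.pods.append(pod)
        return [
            (f"{pod.name}-spine{s}", f"core{c}")
            for s, c in product(range(pod.spines), range(self.core_switches))
        ]


fabric = Fabric(core_switches=4)
v1 = Pod("pod-v1", spines=2, leaves=8)
v2 = Pod("pod-v2", spines=4, leaves=16)  # a newer, larger pod generation
uplinks_v1 = fabric.attach(v1)
uplinks_v2 = fabric.attach(v2)

print(len(v1.internal_links()), len(uplinks_v1))  # 16 8
print(len(v2.internal_links()), len(uplinks_v2))  # 64 16
```

The two pods differ in size, yet the core sees each one only through its uplinks, which is the sense in which a pod is treated as an atomic unit.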
Matching the incremental demand, new pods are designed as a unit, engineered as a unit, and installed and retired as a unit. In a single data center, there could be several generations of pods; pod design v1, v2, and v3 may all sit next to each other, hanging off the shared core. As in many iterative approaches, each pod design improves on the previous one -- for example, building on newer hardware platforms at better points in a price/performance curve.
The beauty of a core and pod design is that networking, compute, and storage for each generation of pod are bundled together as a unit, making them vastly simpler to operate and automate. This approach allows for diversity of designs within a data center, while keeping each unit fully uniform within pod boundaries. While the use of diverse pods violates the uniformity of the network as a whole, it adds flexibility to experiment and grow iteratively, with enough uniformity to keep it all manageable.
On the downside, this variety of pod designs increases operational complexity, because staff must maintain knowledge of multiple designs and possibly use different tool sets to manage and operate different pod iterations. However, if you maintain a simple and uniform pod structure, automating most tasks will be much easier than automating traditional three-tier designs that span an entire data center, allowing teams to embrace the continuous innovation model.
The core and pod design doesn't suffer from the "least common denominator" or "single vendor lock-in" problems inherent in designing a three-tier network for an entire data center. Each pod is, within reason, a fresh start.
Networks designed for change
Core and pod designs were pioneered by architects in hyperscale data centers as a horizontal scale-out approach, in contrast to the expensive scale-up model in a classical tree topology. Traditional core/aggregation/edge designs typically require a replacement/redesign/rebuild of all three tiers when upgrading capacity -- a normal event in hyperscale data centers (and in most enterprise data centers). In a core and pod design, you add capacity by attaching new pods to the core and possibly by changing the newest pod design (not the entire data center design) to adapt to new needs.
Imagine that an application requires double the number of uplinks on your access switches. In a traditional three-tier design, once you exceed the port density on the pair of aggregation devices, your only choice is a redesign that involves bigger, denser boxes for aggregation across the entire data center. In a core and pod design, this becomes a new pod generation that is integrated with no impact on the overall data center design.
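The port-density limit in this scenario comes down to simple arithmetic, sketched below in Python. The function names and all of the numbers (40 access switches, 48-port devices, 8 ports reserved for core uplinks and the peer link) are hypothetical. The sketch also assumes that uplinks are split evenly across the aggregation (distribution) pair.

```python
def aggregation_ports_needed(
    access_switches: int, uplinks_per_switch: int, agg_devices: int = 2
) -> int:
    """Ports each aggregation device must dedicate to access uplinks,
    assuming uplinks are spread evenly across the aggregation devices."""
    total_uplinks = access_switches * uplinks_per_switch
    return -(-total_uplinks // agg_devices)  # ceiling division


def fits(
    access_switches: int,
    uplinks_per_switch: int,
    ports_per_agg_device: int,
    reserved_ports: int = 0,
) -> bool:
    """True if the aggregation pair has enough free ports for all uplinks."""
    needed = aggregation_ports_needed(access_switches, uplinks_per_switch)
    return needed <= ports_per_agg_device - reserved_ports


# Hypothetical numbers: 40 access switches, 48-port aggregation devices,
# 8 ports per device reserved for core uplinks and the peer link.
print(fits(40, 2, 48, reserved_ports=8))  # True: 40 ports needed, 40 free
print(fits(40, 4, 48, reserved_ports=8))  # False: 80 ports needed, 40 free
```

Under these assumed numbers, doubling the uplinks moves the design from exactly fitting to needing twice the available ports. In a three-tier design that forces denser aggregation hardware, while in a core and pod design the same arithmetic only shapes the next pod generation.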
Continuous innovation inevitably requires software upgrades. A key operational win for core and pod designs is that upgrades are much easier. Pods are upgraded independently, and because the Clos or fat-tree network designs used provide N+1 redundancy (most commonly 3+1) across all layers, upgrades carry low operational risk. With traditional 1+1 three-tier designs, most upgrades become high-risk or burdensome operations due to capacity constraints.
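A back-of-the-envelope comparison shows why the redundancy model matters during an upgrade. The Python sketch below uses a hypothetical helper, `remaining_capacity`, and assumes that all devices in a layer have equal capacity and that traffic is spread evenly across them.

```python
from fractions import Fraction


def remaining_capacity(total_devices: int, out_of_service: int) -> Fraction:
    """Fraction of a layer's capacity left when some devices are offline.

    Assumes equal-capacity devices with traffic spread evenly across them.
    """
    if not 0 <= out_of_service <= total_devices:
        raise ValueError("out_of_service must be between 0 and total_devices")
    return Fraction(total_devices - out_of_service, total_devices)


for label, total in (("3+1", 4), ("1+1", 2)):
    upgrading = remaining_capacity(total, 1)      # one device being upgraded
    plus_failure = remaining_capacity(total, 2)   # plus one unplanned failure
    print(
        f"{label}: {float(upgrading):.0%} during upgrade, "
        f"{float(plus_failure):.0%} if another device fails"
    )
# 3+1: 75% during upgrade, 50% if another device fails
# 1+1: 50% during upgrade, 0% if another device fails
```

Under these assumptions, taking one device out of a 3+1 layer removes a quarter of its capacity, while the same step in a 1+1 layer removes half and leaves no margin for a second failure.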
For public, documented examples that present an evolutionary and modular approach to data center design, check out the pod (in the form of a "container") deployments at Microsoft or eBay.
The core and pod mindset
When we look at the data center architects who are leading the key trends in networking -- SDN software and bare-metal hardware -- they tend to be those who have embraced the core and pod mindset. They haven't rolled out SDN across an entire data center, but rather have used it for a generation of their pod design. These shops get the automation that comes from a centralized controller, without the baggage of a "least common denominator" design, and they can much more easily start small, experiment, and adapt to future requirements without forklift upgrades.
The ironic thing is that many enterprise architects describe their data centers, at first glance, as a three-tier design but with a little something different in the details. "What's going on in those three racks over there?" "Oh, that's just an exception to the design." "How about there?" "Another exception." For many data center architects, the shift to a core and pod design is as much about a shift in mentality as a shift in the current network.
As data center architects accept this new core and pod mindset, more data center networks (regardless of their size) will be designed to start small. Data center network design has always been a science of making smart trade-offs across many priorities. Over the coming decade, it will be good to see "innovation" via core and pod design on that priority list.
Kyle Forster is cofounder of Big Switch Networks. Petr Lapukhov, 4xCCIE 16379/CCDE 2010::7, is a network engineer at Facebook. Opinions in this post are his own and do not necessarily reflect the views of his current and past employers.
New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to firstname.lastname@example.org.