Review: Cisco ACI shakes up SDN
Hands-on with Cisco’s highly scalable data center network fabric driven by -- surprise -- a completely open API

Cisco ACI 1.1(3f)
The promise of software-defined networking -- namely simpler, more flexible network operation through centralized, software-driven control -- has been tangible for a few years now, though like many new concepts, it has suffered from misunderstandings and confusion due to amorphous definitions thrust upon it by eager marketing teams. On top of the many definitions, we’ve also seen a number of different approaches, with the OpenFlow model leading the way from the beginning.
Leave it to Cisco to come up with yet another way of doing SDN. Cisco’s Application Centric Infrastructure (ACI) is an SDN solution at odds with the OpenFlow approach, and as such, it diverges from the original direction of the OpenDaylight SDN project, of which Cisco is a founding member. Cisco continues to be a large presence in the OpenDaylight initiative, but is clearly favoring its own technology with ACI.
ACI communicates with network devices using OpFlex, Cisco’s declarative policy protocol, rather than OpenFlow, the protocol at the heart of the OpenDaylight project. The critical distinction is that OpFlex places the actual network configuration decisions in the network itself, not in the controller; the controller deals only in higher-level policy abstractions. You might imagine this as the controller telling the fabric what needs to be done, not how to do it. The fabric is responsible for implementing the controller’s instructions and reporting back on success or failure.
Cisco claims this approach scales better than OpenFlow, which relies on the controllers to perform network configuration tasks. It also allows users to configure the network in terms of application requirements, through a higher-level policy model, rather than worrying about the underlying configuration details. Cisco has OpFlex agent support commitments from Microsoft, IBM, F5, Citrix, Red Hat, and Canonical, among others, and it has proposed OpFlex as an IETF standard and as part of OpenDaylight.
While ACI’s SDN internals may run on OpFlex rather than OpenFlow, this is certainly not traditional networking. It’s hopefully also a sign of Cisco moving toward more open integration: ACI is built around a complete RESTful API, leans heavily on the Python programming language, and comes with an open source SDK and tools publicly available on GitHub.
Nuts and bolts
ACI is designed as networking for data centers -- very large data centers. It’s based on the Cisco Nexus switching fabric, with 10G, 40G, and 100G connectivity in a spine/leaf layout. A fabric could be as small as a few nodes or as large as five spines, 200 leaves, and 180,000 endpoints -- a configuration Cisco currently has operating in its lab. ACI is meant to carry heavy loads.
A topology overview of the large-scale ACI infrastructure in Cisco's lab with five spines and more than 200 leaves.
ACI is also meant to reinvent the concept of the data center network, introducing a model that is essentially L2 and L3 independent, instead using the concept of tenants to compartmentalize traffic intrinsic to a logical grouping of applications and services, while using VXLAN outside of those groupings. This means you can use the same IP subnets, VLANs, even MAC addresses within multiple different tenants without any conflicts. Each tenant exists on its own logical island within the infrastructure, and each tenant can house applications that are segmented into tiers, or logical Endpoint Groups (EPGs). Communications between EPGs are governed by hierarchical policies within ACI.
EPGs are not servers or VMs, but essentially subnetworks that contain those resources. These subnetworks might be assigned to VLANs configured on physical switchports for bare-metal servers, or to virtual switches within hypervisors. They are designed to contain traffic for specific server instance types, such as Web servers or database servers.
As an example, we might have a tenant that’s a business group, perhaps a finance group. We could use ACI to define an application and create EPGs that will contain the various tiers of that application stack -- Web, app, and database. We define a subnet and VLAN for the EPG and specify which resources it should attach to, such as which VMware vSphere cluster or domain. ACI does all the work to create that network object where needed, such as within the virtual distributed switch on the vSphere infrastructure and within the Nexus fabric itself. Once our EPGs are created under our tenant, VMs or bare-metal resources can be turned up and assigned to those networks.
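To make that concrete, here is a minimal sketch, assuming a reachable APIC and placeholder credentials, of what the same request might look like against the REST API using Python’s requests library. The object class names (fvTenant, fvAp, fvAEPg) come from the ACI object model; the hostname and payload details are illustrative, not a tested recipe.

```python
# A hedged sketch: create a "Finance" tenant with a three-tier application
# profile through the APIC REST API. Hostname and credentials are placeholders.
import requests

APIC = "https://apic.example.com"   # placeholder APIC address
session = requests.Session()
session.verify = False              # lab convenience; use proper certificates in production

# Authenticate; the APIC returns a session cookie that requests keeps for us.
login = {"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}}
session.post(f"{APIC}/api/aaaLogin.json", json=login).raise_for_status()

# A tenant containing an application profile with Web, App, and DB EPGs.
tenant = {
    "fvTenant": {
        "attributes": {"name": "Finance"},
        "children": [
            {"fvAp": {
                "attributes": {"name": "ThreeTierApp"},
                "children": [
                    {"fvAEPg": {"attributes": {"name": "Web"}}},
                    {"fvAEPg": {"attributes": {"name": "App"}}},
                    {"fvAEPg": {"attributes": {"name": "DB"}}},
                ],
            }},
        ],
    }
}

# Post the whole object tree in one request; the fabric does the rest.
session.post(f"{APIC}/api/mo/uni.json", json=tenant).raise_for_status()
```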
However, we also need to explicitly enable communications between those EPGs, and even between hosts within the same EPG. In our example of a three-tier app, we might need to allow traffic on a specific TCP port from the Web server EPG to the app EPG, and database traffic from the app EPG to the database EPG. We do this using a concept called a contract -- simply a set of rules applied between EPGs that dictates what traffic should be allowed to pass between hosts residing on those EPGs. We can also bring in rules here that will apply to outside devices, such as firewalls and load balancers. ACI can integrate with several third-party L4 through L7 devices, from vendors such as F5, Citrix, and Palo Alto Networks, in addition to Cisco’s own solutions.
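The contract itself is just more objects in the same tree. Below is a hedged sketch of how such a contract might be expressed: a filter matching TCP port 8080, a contract subject that references it, and provider/consumer bindings on the App and Web EPGs. The class names (vzFilter, vzBrCP, vzSubj, fvRsProv, fvRsCons) follow the ACI object model, but the attribute details are illustrative assumptions and should be checked against the API reference built into the APIC. Posting this body to the same /api/mo/uni.json endpoint with the session from the previous sketch should merge it into the existing Finance tenant.

```python
# A hedged sketch of a contract allowing TCP/8080 from the Web EPG to the App EPG.
# This only builds and prints the payload; attribute names should be verified
# against the APIC's built-in API reference before posting.
import json

contract = {
    "fvTenant": {
        "attributes": {"name": "Finance"},
        "children": [
            # Filter: match TCP traffic to port 8080.
            {"vzFilter": {
                "attributes": {"name": "web-to-app"},
                "children": [
                    {"vzEntry": {"attributes": {
                        "name": "tcp-8080", "etherT": "ip", "prot": "tcp",
                        "dFromPort": "8080", "dToPort": "8080"}}},
                ],
            }},
            # Contract whose subject carries that filter.
            {"vzBrCP": {
                "attributes": {"name": "web-to-app"},
                "children": [
                    {"vzSubj": {
                        "attributes": {"name": "web-to-app"},
                        "children": [
                            {"vzRsSubjFiltAtt": {"attributes": {"tnVzFilterName": "web-to-app"}}},
                        ],
                    }},
                ],
            }},
            # The App EPG provides the contract; the Web EPG consumes it.
            {"fvAp": {
                "attributes": {"name": "ThreeTierApp"},
                "children": [
                    {"fvAEPg": {"attributes": {"name": "App"},
                                "children": [{"fvRsProv": {"attributes": {"tnVzBrCPName": "web-to-app"}}}]}},
                    {"fvAEPg": {"attributes": {"name": "Web"},
                                "children": [{"fvRsCons": {"attributes": {"tnVzBrCPName": "web-to-app"}}}]}},
                ],
            }},
        ],
    }
}

print(json.dumps(contract, indent=2))
```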
A detail view of a three-tier app contained in Endpoint Groups, with traffic contracts applied between them.
We can then rinse and repeat for every application that the tenant requires. Everything we define for that tenant will be contained under that tenant umbrella. We can have duplicate subnets, duplicate VLANs, and even duplicate MAC addresses under other tenants without a conflict. Further, we can have EPGs that share the same subnet or supernet and still enforce traffic rules between hosts. Thus, our three-tier application could have Web, app, and database servers with sequential IP addresses, and our traffic management contracts would still apply.
Tenants might be business groups, as in our example, or customers in a hosting or service environment, or simply a collection of logical groupings best suited to the deployment. Since each tenant exists within its own silo, there is no overlap with other tenants at the network level. That said, there are special areas that can be used to distribute common services to multiple tenants. These services might be DNS, NTP, and directory services that are used by all tenants, for instance.
All of this back-end configuration is handled by ACI controllers, called Application Policy Infrastructure Controllers (APICs). These are physical servers that operate in a cluster for load balancing and redundancy purposes. Generally, you will have at least three APICs per ACI fabric.
The APICs sit outside the data path, and they are not required for the fabric to function. If no APICs are available, an ACI fabric will continue to operate normally, but no changes can be made. The APICs serve up the configuration of the fabric, provide an administration Web UI, and host the RESTful API that ACI is built around. Any one of the APICs can serve Web UI or API requests, working in concert with the other controllers. ACI configuration and state data are stored in SQLite on the controllers and sharded across the controller cluster.
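As a quick illustration of that point, the hedged sketch below sends the same read-only query to each controller in a hypothetical three-node APIC cluster; any of them can answer it. The hostnames and credentials are placeholders, and the class query endpoint follows the documented APIC REST conventions.

```python
# A hedged sketch: any controller in the APIC cluster answers the same API query.
# Hostnames and credentials are placeholders for illustration only.
import requests

APICS = ["https://apic1.example.com", "https://apic2.example.com", "https://apic3.example.com"]
LOGIN = {"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}}

for apic in APICS:
    session = requests.Session()
    session.verify = False   # lab convenience; use proper certificates in production
    session.post(f"{apic}/api/aaaLogin.json", json=LOGIN).raise_for_status()
    # Class query: ask the fabric for all tenant objects it knows about.
    tenants = session.get(f"{apic}/api/class/fvTenant.json").json()
    print(apic, "reports", tenants["totalCount"], "tenants")
```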
A view of the capacity dashboard for a large-scale ACI infrastructure running in Cisco's labs. Note that the operators have eclipsed their stated limits in three categories.
The ACI fabric makes all traffic flow decisions based on lookup tables maintained in the fabric itself, both in the leaves for local endpoints and in the spine for the remainder of the fabric. For every packet present on the wire, the fabric makes a decision on where to send that packet based on those rules, and it does so at wire speed. This is how ACI evades the boundaries of traditional IP subnetting and VLANs, and how east-west traffic can be controlled via contract configuration, even between hosts on the same subnet.
Further, this design eliminates the need for Address Resolution Protocol (ARP) and broadcast flooding; that traffic is quashed by default because the fabric is already aware of the location of every endpoint. There are provisions to allow ARP and broadcast flooding at the bridge domain level if it’s required for a particular application.
At a high level, this is what ACI is -- it is a method of building and maintaining a networking fabric that dispenses with the concepts of traditional networking, and offers significant software control, automation, and wire-speed switching on a very large scale.
Building the fabric
Implementing ACI is surprisingly simple, even in the case of large-scale buildouts. The most tedious part is cabling, but that’s typical of any fabric build. The Nexus 9000-series switches used in an ACI environment run a modified OS dubbed iNXOS that has the required hooks for ACI management.
A topology overview of an ACI fabric with three APICs, four nodes, and two spines.
Once you have the Nexus spines and leaves cabled up properly, with each leaf connected to multiple spines (in a spine/leaf fabric, the spines are not cabled to one another), you connect and boot the APIC servers, which are built on Cisco UCS 1U server hardware. The APIC boots to an extremely brief CLI configuration script that asks for basic IP subnet and name information, and whether the current server is the first member of an APIC cluster.
The first APIC controller begins autodiscovery of the rest of the fabric. This happens quite quickly, even on large fabrics. Once autodiscovery is complete, the ACI Web UI displays the logical layout of the entire fabric, and the solution is ready to be configured. Meanwhile, the other APIC controllers can be booted and assigned IP addresses via the initial setup script. They will then join the APIC cluster automatically.
Assuming the cabling is complete, the entire process of standing up an ACI fabric might take only a few minutes from start to finish. The addition of third-party elements such as load balancers and firewalls requires a functional fabric and is done via the Web UI or API.
Configuration and management
This is where ACI brings some true surprises. Every element of ACI can be controlled via the RESTful API. In fact, Cisco has ACI customers that do not use the CLI or Web UI administration tools, but instead have completely scripted ACI using only the API. Further, Cisco has released a full Python SDK that makes scripting ACI straightforward.
This should be made abundantly clear: This isn’t an API bolted onto the supplied administration tools, or running alongside the solution. The API is the administration tool. The CLI and Web UI both use the API to perform every task. In fact, the ACI CLI will look very familiar to Cisco admins, with the usual IOS privilege and configuration modes, but it’s all a Python script that uses the API.
If you’re involved in open source and modern development communities, this may not seem like a big deal, but for Cisco this is a very significant step. Not only is ACI an extremely open architecture, but Cisco is actively contributing and maintaining code in conjunction with its customers and others interested in ACI. Cisco’s GitHub repository contains commits from a number of developers who aren’t employed by Cisco. Cisco is actively supporting a community gathering around ACI, and the community is already reaping the rewards of Cisco’s open stance. This is a big step for Cisco and a very positive position for ACI.
Using the SDK, the meat of a Python script to stand up a new tenant with basic functionality might be only a few lines. The Python methods are well laid out and easy to grok. Plus, Cisco has built the entire API and SDK reference into the APIC Web UI, so they are easily found. Cisco has built very handy development tools into the ACI Web UI as well. For instance, there’s an object browser that allows developers to search through the ACI infrastructure and view all elements of any object for use in scripting.
This sample Python code will build a tenant in ACI via the Python SDK. The code was generated programmatically by passing a JSON object through an open source Python script that Cisco maintains.
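As a rough illustration (a hand-rolled sketch, not Cisco’s generated output), standing up the same Finance tenant with the open source Cobra SDK might look like the following. The module and class names reflect the SDK as published on GitHub, but they should be verified against the SDK reference built into the APIC Web UI.

```python
# A hedged sketch: build a tenant with the ACI Python SDK ("Cobra").
# Hostname and credentials are placeholders; verify class paths against the
# SDK reference in the APIC Web UI.
from cobra.mit.access import MoDirectory
from cobra.mit.session import LoginSession
from cobra.mit.request import ConfigRequest
from cobra.model.fv import Tenant, Ap, AEPg

# Log in to the APIC and look up the policy universe, the root of the object tree.
session = LoginSession('https://apic.example.com', 'admin', 'password')
moDir = MoDirectory(session)
moDir.login()
uniMo = moDir.lookupByDn('uni')

# Build the tenant, application profile, and EPGs as child objects.
tenantMo = Tenant(uniMo, 'Finance')
appMo = Ap(tenantMo, 'ThreeTierApp')
for tier in ('Web', 'App', 'DB'):
    AEPg(appMo, tier)

# Commit the whole subtree in a single configuration request.
cfg = ConfigRequest()
cfg.addMo(tenantMo)
moDir.commit(cfg)
moDir.logout()
```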
Another tool, called the ACI Inspector, is essentially a live debug of all requests coming into the ACI API. Thus, you can open this tool, see exactly what requests are being made to the API, and easily replicate them in code elsewhere. Further, you can peel out the POSTs to the API and grab the JSON passed through. Then, using a tool called arya, which is available in the ACI toolkit on GitHub, you can incorporate that JSON data into functional Python code to re-create that event using the Python SDK. Thus, you can perform an action in the UI and have a functional script to re-create that action in minutes.
This is only one example of ACI's openness and easy scriptability. The upshot is it will be straightforward to integrate ACI into custom automation and management solutions, such as centralized admin tools and self-service portals.
Troubleshooting and maintenance
The policy-driven nature of ACI may seem a bit too hands-off for some network admins. With so much of the actual network configuration abstracted away and hidden within the fabric, problem detection and troubleshooting tools become critically important.
InfoWorld Scorecard | Management (20%) | Performance (20%) | Reliability (20%) | Scalability (20%) | Interoperability (10%) | Value (10%) | Overall Score (100%)
---|---|---|---|---|---|---|---
Cisco ACI 1.1(3f) | 10 | 9 | 10 | 10 | 9 | 9 | 9.6