At VMworld 2011, VMware made a big deal about a new networking protocol it has developed in cooperation with server and networking heavyweights like Arista, Broadcom, Cisco, Citrix, and Red Hat. That protocol, Virtual Extensible Local Area Network or VXLAN, promises to ease many of the networking challenges experienced by enterprises and providers attempting to build very large, multitenant networks, such as big private clouds and public IaaS (infrastructure-as-a-service) clouds.
While the introduction of VXLAN was greeted with a great deal of initial enthusiasm, I fear that many have started to imagine it to be a solution to many problems for which it was never designed. Worse, I've come to the realization that the networking community as a whole isn't adapting quickly enough to the rapid innovation taking place in the virtualization and cloud sector -- and VXLAN is a symptom of this failure to deliver open, well-architected innovation.
What is VXLAN?
In the simplest sense, VXLAN is a means for a virtualized environment to dynamically provision a very large number of isolated L2 (layer two) networks without requiring any changes to the physical network that serves it. This is accomplished by tunneling L2 frames inside UDP packets exchanged between virtual hosts, with IP multicast used to carry broadcast and unknown-destination traffic -- essentially creating virtual LANs that are visible only to the virtual hosts and the virtual machines that run on them.
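To make the encapsulation concrete, here is a minimal sketch of the 8-byte VXLAN header described in the IETF submission (later published as RFC 7348). The function names and sample values are my own illustration; a real VXLAN endpoint would also build the outer Ethernet, IP, and UDP headers around this payload.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned destination port (early Linux builds used 8472)

def vxlan_encapsulate(vni, inner_frame):
    """Prefix a raw Ethernet frame with the 8-byte VXLAN header.

    Header layout per the VXLAN draft:
      byte 0    : flags (0x08 = "VNI present" bit)
      bytes 1-3 : reserved
      bytes 4-6 : 24-bit VXLAN Network Identifier (VNI)
      byte 7    : reserved
    """
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    header = bytes([0x08, 0, 0, 0]) + struct.pack("!I", vni << 8)
    return header + inner_frame

def vxlan_decapsulate(payload):
    """Return (vni, inner_frame) from a VXLAN UDP payload."""
    if not payload[0] & 0x08:
        raise ValueError("VNI-present flag not set")
    vni = struct.unpack("!I", payload[4:8])[0] >> 8
    return vni, payload[8:]
```

The outer packet is ordinary UDP, which is exactly why the physical network needs no changes: switches and routers forward it like any other IP traffic.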
At first glance, you might ask why this is even necessary -- after all, isn't L2 network isolation exactly what 802.1q VLANs were designed for? You'd be absolutely right. However, 802.1q has one serious limitation that large private cloud and public IaaS infrastructures are quickly running afoul of: The field that defines VLAN IDs in the 802.1q standard is only 12 bits wide, meaning that a maximum of 4,096 VLANs can be supported. This might sound like a lot until you consider that a reasonably sized IaaS provider might host thousands of customers, each of which generally requires many individual isolated networks to securely separate workloads.
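The arithmetic behind that limitation is worth spelling out. The 802.1q VLAN ID is 12 bits, while VXLAN's segment identifier (the VNI) is 24 bits; the tenant counts below are made-up but plausible numbers for a midsize provider:

```python
vlan_ids = 2 ** 12    # 4,096 possible 802.1q VLAN IDs (a few are reserved)
vxlan_vnis = 2 ** 24  # 16,777,216 possible VXLAN segments

# A modest multitenant IaaS provider (hypothetical numbers):
tenants = 1000
networks_per_tenant = 10
needed = tenants * networks_per_tenant

print(needed > vlan_ids)     # True -- 10,000 segments already exhaust VLAN IDs
print(needed <= vxlan_vnis)  # True -- VXLAN has room to spare
```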
Back in the VMware Lab Manager days, VMware needed a very quick way to dynamically generate isolated networks that could reach between multiple virtual hosts to support often short-lived lab environments. Since asking network admins to create new VLANs every time an isolated lab environment was created or trying to interface with a myriad of different networking vendors' hardware to automatically create them would be impractical, they came up with their own proprietary L2-in-L2 tunneling protocol, which later came to be known as vCDNI.
At the small scale at which it was generally used, in combination with nonproduction Lab Manager environments, vCDNI wasn't perfect, but it generally fit the bill. However, when vCloud Director hit the stage, VMware tried to leverage this same protocol to solve the same problem, but for production-grade workloads. Here, the limitations of L2-in-L2 tunneling became extremely apparent. Not only did all of the hosts have to be located on the same L2 broadcast domain, but because all traffic exchanged between hosts uses the same L2 MAC addresses, it is very difficult if not impossible to effectively leverage multilink Ethernet load balancing -- a serious problem in any busy production environment.
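Why constant outer MAC addresses defeat link aggregation can be shown with a toy hash. The sketch below uses CRC32 as a stand-in for a switch's vendor-specific link-aggregation hash and made-up MAC addresses; the contrast with VXLAN holds because the VXLAN draft recommends deriving the outer UDP source port from a hash of the inner frame, giving L3/L4-aware switches per-flow entropy.

```python
import zlib

NUM_LINKS = 4

def lag_link(header_fields: bytes) -> int:
    """Pick a member link of a 4-link aggregate by hashing header fields.
    (CRC32 is a simplification; real switches use their own hash functions.)"""
    return zlib.crc32(header_fields) % NUM_LINKS

# Hypothetical outer MAC pair shared by two hypervisor hosts.
outer_macs = bytes.fromhex("005056000001") + bytes.fromhex("005056000002")

# L2-in-L2 (vCDNI-style): 100 distinct inner flows all present the switch
# with identical outer MACs, so every flow hashes to the same link.
l2_links = {lag_link(outer_macs) for _ in range(100)}

# VXLAN: the varying outer UDP source port (2 bytes here) is visible to an
# L3/L4-capable hash, so flows spread across the aggregate's member links.
vxlan_links = {lag_link(outer_macs + port.to_bytes(2, "big"))
               for port in range(49152, 49252)}

print(len(l2_links))    # 1 -- all traffic is pinned to one link
print(len(vxlan_links)) # more than 1 -- traffic is distributed
```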
VXLAN neatly avoids those problems while also having the potential to become an industry standard, since VMware and its partners have submitted it to the IETF. However, it still tunnels through the physical network, which means that any tools you use on your physical network to inspect traffic won't have visibility into it. That could have a range of effects, from complicating troubleshooting with tools that aren't yet VXLAN-aware to rendering physical security devices like IPSes ineffective without serious upgrades.
The fact that VXLAN is an L3 protocol -- meaning that it can be routed between multiple IP subnets -- has led some to believe that VXLAN may also be the solution to another problem that has started to creep onto the virtualization stage: cross-site workload migration. You could theoretically stretch a VXLAN implementation across two sites, thereby allowing you to vMotion VMs from one site to another while retaining L2 connectivity. However, VXLAN simply isn't a good solution to this problem: it has the potential to result in very poor WAN bandwidth usage due to cross-site routing issues (so-called trombone routing), and it would not react gracefully to the loss of an intersite link. Efficient bandwidth use and graceful failure handling are both real requirements for any stretched-cluster implementation.
What's the real solution?