Making do with a single switch
One way to do this is to make the best of a bad situation with a single 10G switch. A small-to-medium data center's switching infrastructure may have few 10G ports or none at all, and often the only systems that really need 10G are the virtualization hosts. This leads many shops to pick up a 24-port 10G switch to run just those links.
If you do this, it's wise to spec your hosts with not only four 10G ports but also at least two, and ideally four, 1G copper ports. It's unlikely that you'll need more than 10G each for the front-end and storage networks, but you want those links to be redundant. You can set up active/passive failover across both 10G ports linked to the storage network, say, or create an aggregate, though you probably won't get much use out of the extra bandwidth. All of these links will terminate at a single 10G switch. This means that you are protected only against isolated problems such as a 10G port frying on the switch or the host, or a cable going bad or coming unplugged. The redundancy will be of no use if the switch itself fails.
While you want to protect against those problems, you must also protect against a switch failure, though with only the one 10G switch, that becomes a challenge. This is where those 1G links come into play. Though there's no need from a bandwidth or performance standpoint to use 1G links for production traffic on a virtualization host with 10G interfaces, you can configure most hypervisors to use 1G interfaces as standby interfaces. These should be connected to the main data center network directly, or through a dedicated 1G switch. If possible, they should be bonded to aggregate both 1G paths together.
When configured properly, all traffic during normal operation will flow through the 10G links and the 10G switch, but a failure of that switch will cause the traffic to move to the 1G links that are otherwise dormant. Naturally, it's imperative that the storage have its own set of links to the 1G network and be configured to accept connections across them, because we need to maintain the storage paths in the event of that failure.
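The failover behavior described above can be modeled in a few lines of Python. This is only an illustrative sketch, not any hypervisor's actual API: the `Uplink` class, the `active_uplinks` function, and the switch and interface names are all hypothetical. It shows the selection rule only: use healthy primary (10G) uplinks when any exist, and fall back to the standby (1G) uplinks when none do.

```python
from dataclasses import dataclass


@dataclass
class Uplink:
    """One physical link from a host to a switch (hypothetical model)."""
    name: str
    speed_gbps: int
    switch: str
    standby: bool = False
    link_up: bool = True


def active_uplinks(uplinks, failed_switches=frozenset()):
    """Return the uplinks eligible to carry traffic.

    Healthy primary uplinks win; standby uplinks are used only
    when no primary uplink is left.
    """
    healthy = [
        u for u in uplinks
        if u.link_up and u.switch not in failed_switches
    ]
    primary = [u for u in healthy if not u.standby]
    return primary if primary else healthy


# Assumed layout: two 10G ports on the lone 10G switch,
# two 1G standby ports on an existing 1G switch.
storage_team = [
    Uplink("10g-0", 10, "sw10g"),
    Uplink("10g-1", 10, "sw10g"),
    Uplink("1g-0", 1, "sw1g", standby=True),
    Uplink("1g-1", 1, "sw1g", standby=True),
]

normal = [u.name for u in active_uplinks(storage_team)]
degraded = [u.name for u in active_uplinks(storage_team, {"sw10g"})]
print(normal)    # ['10g-0', '10g-1']
print(degraded)  # ['1g-0', '1g-1']
```

Note that if only the 1G switch were listed in `failed_switches`, the result would be unchanged from normal operation, which matches the idea that the 1G links sit dormant until they are needed.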
Should that one 10G switch fail, the performance of the cluster will necessarily drop substantially, but the bits will still pass, and the virtualized infrastructure will still be accessible. This is the goal, and it shouldn't add much to the budget, especially if existing 1G switching can be utilized.
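To put a rough number on that drop, here is a back-of-the-envelope calculation in Python. It assumes a network that normally runs over one active 10G link and fails over to two bonded 1G links; the function name and the figures are illustrative, and real throughput will depend on the bonding mode and traffic mix.

```python
def failover_capacity(primary_gbps, standby_links, standby_gbps_each):
    """Return (aggregate standby Gbps, fraction of the primary path)."""
    standby_total = standby_links * standby_gbps_each
    return standby_total, standby_total / primary_gbps


# Assumed: one active 10G link, two bonded 1G standby links.
total, fraction = failover_capacity(10, 2, 1)
print(f"{total} Gbps aggregate, {fraction:.0%} of the 10G path")
# 2 Gbps aggregate, 20% of the 10G path
```

Even in the best case, the bonded pair offers about a fifth of the normal capacity, and with most bonding modes a single flow will still be limited to one 1G link.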
This may seem basic, but there are many who think that the raw bandwidth provided by 10G switching obviates the need for 1G connections. In a fully redundant 10G implementation they may be right, although I would still tend to err on the side of caution and configure backup management networks on a separate 1G network spanning all cluster members.
Switching failures are generally rare, and I have my share of switches with five-plus years of uptime. However, that's never a guarantee. When we're putting all of our eggs in the virtualization basket, we really do need to ensure that the basket is as strong as we can make it.
This story, "Virtualization roulette: One 10G switch is never enough," was originally published at InfoWorld.com. Read more of Paul Venezia's The Deep End blog at InfoWorld.com.