Last week I wrote about some of the basics of designing a network for use with IP storage. While building in an appropriate level of redundancy and properly configuring VLANs and Spanning Tree are critical, implementing those design fundamentals barely begins to scratch the surface of the work necessary to build an exemplary IP storage infrastructure. After you've set the foundation, the next step is to configure your servers and storage to make use of it.
That typically involves determining how your servers and storage will leverage the redundant network you've deployed -- in terms of both redundancy and additional throughput. While you certainly can use a single NIC on each server and a single gigabit interface on the storage device, doing so will dramatically limit throughput potential and won't leverage the redundancy offered by a dual-switch architecture.
It's clear you want to use at least two dedicated storage network interfaces on each server that needs to attach to the storage and at least two -- if not more -- interfaces on the storage itself. That seems simple enough, but there are a lot of details to consider when configuring storage path redundancy and multipath throughput. To make matters worse, the best approach will vary wildly depending on the storage protocol, the specific storage hardware, and even the virtualization stack or server OS you're using.
To start, it's important to contrast the two most popular IP-based storage protocols: iSCSI and NFS. Though both protocols allow you to access shared storage across a standards-based IP network, they are dramatically different -- and require completely divergent approaches to offering network redundancy and optimizing throughput.
The catch is that a standard 802.3ad port channel must terminate on a single physical switch, so link aggregation and switch redundancy are normally at odds. Fortunately, there are exceptions, but they generally require switches that can be "stacked" into a single logical switch with a single active control plane or switches designed to support building port channels across multiple distinct switches (check out Cisco's vPC and VSS for examples). The bottom line is that if you want to load-balance NFS traffic across multiple NICs on your servers and storage array while also offering switching redundancy, you need switches that can present themselves as a single logical switch. (Cisco 2960S and 3750-series switches are common in small-business applications, but many other networking vendors make switches that fit the bill.) Otherwise, you can have only one or the other.
Assuming you've built your network using a pair of stacked switches, you can have the best of both worlds. However, simply constructing port channels (teams) on your switch ports and configuring the servers and storage to attach to them isn't the end of the process. As stated previously, even dynamic teaming (based on the 802.3ad/LACP standard) can't do more than load-balance individual connections onto different team members. You need to make sure you've configured your storage hardware and your switch's load-balancing algorithm to distribute traffic in the most advantageous way.
Though the exact approach will vary depending on what kind of storage hardware you use, this typically involves selecting the "Source and Destination IP Address" team load-balancing algorithm combined with some IP address aliases on the storage hardware to ensure the best distribution of traffic across network links. Using this algorithm means that each NIC team -- whether on the servers, storage, or switches -- will hash the source and destination IP addresses of each packet it sees and use that hash to determine which link to send the traffic down. This is an oversimplification, but in a two-member NIC team, you can imagine that source/destination IP combinations that "add up to" an odd number are pushed down link No. 1 while those that result in an even number go down link No. 2.
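To make that hashing behavior concrete, here's a minimal sketch in Python. The XOR-and-modulo hash and the example IP addresses are illustrative assumptions on my part -- real switch and NIC-team ASICs use vendor-specific hash functions -- but the principle of deterministically pinning each source/destination pair to one team member is the same:

```python
import ipaddress

def select_link(src_ip: str, dst_ip: str, num_links: int) -> int:
    """Toy model of a "Source and Destination IP Address" load-balancing
    policy: XOR the two addresses and take the result modulo the number
    of links in the team. The same IP pair always maps to the same link."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % num_links

# A given server/storage IP pair is always pinned to one team member,
# so a single flow never exceeds the speed of a single link.
print(select_link("10.0.0.11", "10.0.0.50", 2))

# A second IP alias on the storage array produces a different hash,
# which is how aliases spread traffic across the team.
print(select_link("10.0.0.11", "10.0.0.51", 2))
```

Note the corollary: because the hash is deterministic, any one flow is limited to a single link's bandwidth; only multiple flows to distinct address pairs spread across the team.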
If you engineer your NFS traffic so that connections from a single server are distributed to two or more IP addresses on the storage hardware (typically on a per-volume basis), you can ensure that the traffic flows over different NICs between the server and the switches and between the switches and the storage array. And if one of the two switches fails, the teams on the server and storage array simply remove the failed links from the team and send all the traffic down the remaining links.
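The failover side of that behavior can be sketched the same way. This is again a toy model under the same assumptions as above (illustrative XOR hash, made-up addresses): when a switch failure shrinks the team, existing flows simply rehash onto whatever links remain.

```python
import ipaddress

def select_link(src_ip: str, dst_ip: str, active_links: list) -> str:
    """Pick a team member from the currently active links by hashing
    the source and destination IPs. When links drop out of the team,
    flows rehash onto the survivors with no path left unserved."""
    h = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    return active_links[h % len(active_links)]

# Two volumes mounted against two storage IP aliases from one server.
flows = [("10.0.0.11", "10.0.0.50"), ("10.0.0.11", "10.0.0.51")]

# Both switches healthy: the aliases spread the flows across both NICs.
print([select_link(src, dst, ["nic0", "nic1"]) for src, dst in flows])

# One switch fails, its link leaves the team: everything moves to nic0.
print([select_link(src, dst, ["nic0"]) for src, dst in flows])
```

The key point is that no reconfiguration is needed during a failure: the team membership changes, and the hash redistributes traffic automatically.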
Putting it all together