Ten years ago, the WAN was the exclusive domain of frame-relay communication and leased lines. Today, a WAN may use anything
from IPSec connections and cable modems to MPLS (multiprotocol label switching) tunneled over multimegabit networks. The methods
may have changed, but the challenge remains the same: How do you make a WAN seem like one big LAN?
Simply throwing more bandwidth at the problem won’t solve it. MPLS, as described last January in Paul Venezia’s “Supercharge Your WAN,” can go a long way toward improving WAN performance, but the root cause of the problem lies well below the MPLS level.
Other forces conspire to rob your WAN of performance and response time: latency, congestion, chatty applications,
and traffic contention all affect how the WAN responds at any given time. These are the dirty secrets of WAN performance
that are usually swept under the rug — if they’re even detected at all. Most of the time, the focus is on the size of the
pipe, not on how the pipe is being used.
Size doesn’t matter
In the world of the WAN, the size (that is, bandwidth) of the link often makes little difference in overall performance, particularly
when the link is a long one (“long” being more than a few hundred miles). Part of the problem is that TCP and other protocols
weren’t intended to function beyond the local-network edge. “The reason why long-distance networks don’t work is that the
protocols weren’t designed to do that,” explains Dick Pierce, CEO of Orbital Data, which sells WAN-optimization appliances.
“They work pretty well on a local basis, and in some cases even short distances. But wide-area networks don’t. The whole history
of how this market segment [WAN optimization] developed was on that basis.”
The problem is that the protocols’ efficiency suffers as latency increases. The floor on latency is set by the speed of light and the
overall length of the WAN link, something we have little control over. Don’t think speed of light is a factor? Just experience
the latency in a satellite link. (A few years back, one could have argued that routers and switches added significant latency
to WAN links, but most backbone equipment today works in the sub-millisecond range.)
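As a rough illustration of the speed-of-light floor, the sketch below computes the minimum possible round-trip time for a path of a given length. The constants are physical approximations and the 4,000 km route is a hypothetical example, not a figure from the vendors quoted here. Measured round trips run higher because real paths are indirect and add queuing and processing time.

```python
# Approximate physical constants (assumptions for illustration).
C_VACUUM_KM_S = 299_792    # speed of light in a vacuum
C_FIBER_KM_S = 200_000     # light in optical fiber travels at roughly 2/3 of c
GEO_ALTITUDE_KM = 35_786   # altitude of a geostationary satellite


def min_rtt_ms(one_way_km: float, speed_km_s: float) -> float:
    """Lower bound on round-trip time, in ms, for a one-way path of one_way_km."""
    return 2 * one_way_km / speed_km_s * 1000


# Hypothetical 4,000 km fiber route, roughly coast to coast.
fiber_rtt = min_rtt_ms(4_000, C_FIBER_KM_S)

# One direction over a satellite link is ground -> satellite -> ground,
# so the one-way path is at least twice the orbital altitude.
sat_rtt = min_rtt_ms(2 * GEO_ALTITUDE_KM, C_VACUUM_KM_S)

print(f"fiber floor: {fiber_rtt:.0f} ms, satellite floor: {sat_rtt:.0f} ms")
```

Under these assumptions the fiber route can never do better than about 40 ms, and the satellite hop can never do better than roughly 480 ms, no matter how much bandwidth is purchased.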
Latency affects network protocols in various ways. TCP, for example, uses ACK (acknowledgement) packets to help provide reliability.
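By receiving an ACK from the receiving endpoint, the sending system knows the packet made it without any errors. But the sender can keep only a
limited window of unacknowledged data in flight, so on high-latency links it spends much of each round trip idle, waiting for ACKs, and throughput chokes.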
Thus, latency is one of the biggest killers of WAN performance, if not the biggest, both in response time and overall throughput.
Long fat networks (LFNs) run at T1 speeds and higher, but suffer greatly from the inherent latency of the link. For most U.S.
terrestrial links, the average round-trip time (RTT) is approximately 150 ms, with satellite links averaging approximately 800 ms.
Global links vary greatly, but it isn’t uncommon to see RTTs of 200 ms to 400 ms or higher. And increasing
the bandwidth doesn’t help.
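To see why more bandwidth doesn’t help, consider a simplified model in which a single TCP connection can send at most one window of data per round trip. The sketch below assumes a 65,535-byte window (the largest possible without the TCP window-scaling option), no packet loss, and the round-trip times cited above. The function names are illustrative only.

```python
MAX_UNSCALED_WINDOW_BYTES = 65_535   # largest TCP window without window scaling
T3_BPS = 44_736_000                  # T3 line rate, used as an example fat pipe


def max_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Ceiling for one connection: one full window delivered per round trip."""
    return window_bytes * 8 / rtt_s


def window_needed_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_bps * rtt_s / 8


for label, rtt_ms in [("U.S. terrestrial", 150), ("global", 400), ("satellite", 800)]:
    ceiling_mbps = max_throughput_bps(MAX_UNSCALED_WINDOW_BYTES, rtt_ms / 1000) / 1e6
    print(f"{label:>16} ({rtt_ms} ms): at most {ceiling_mbps:.2f} Mbps per connection")

needed = window_needed_bytes(T3_BPS, 0.150)
print(f"window needed to fill a T3 at 150 ms: {needed:,.0f} bytes")
```

In this model a single connection at 150 ms tops out at about 3.5 Mbps whether the link is a T3 or something far larger, and filling a T3 would take a window more than ten times the unscaled maximum.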
In fact, due to latency, LFNs are largely underutilized. “The reason people built long-distance pipes that turned out to be
empty was they were trying to get predictable application performance by overprovisioning,” Orbital’s Pierce says. “Yet the
inherent design of the networks — that they weren’t designed for long distance — was the problem.”
Rush hour
Congestion also affects WAN performance, of course. Congestion occurs when traffic demand exceeds what the link can carry, and it does the most damage when no bandwidth-allocation policy has been applied
to traffic on the WAN. Traffic flows can be bursty, such as when one user tries to retrieve a large
e-mail attachment while another user logs in to a CRM portal. With no bandwidth management, the download can bring the smaller,
interactive session to a grinding halt.
P.G. Narayanan, CEO of Allot Communications, believes that much of the congestion problem can be solved by applying QoS to
the traffic. “The problem most of these networks have, though, is temporary … that second, or that minute it’s congested,
you can get away with just prioritizing applications. So what you can do is put a gigabit box at the central site to prioritize
those applications, the critical applications, on a temporary basis, and you can avoid the congestion, and all other times
you’re OK anyway,” says Narayanan.
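The idea of prioritizing critical applications during a congested moment can be sketched with a toy strict-priority scheduler. This is a minimal model for illustration, not a description of how Allot’s or any other vendor’s product works. The class name, the per-tick byte budget, and the packet sizes are all hypothetical, and the model assumes no packet is larger than the per-tick budget.

```python
import heapq
from itertools import count


class PriorityLink:
    """Toy strict-priority scheduler for a single congested link."""

    def __init__(self, bytes_per_tick: int):
        self.bytes_per_tick = bytes_per_tick
        self._queue = []       # heap of (priority, seq, app, size); 0 = most urgent
        self._seq = count()    # tie-breaker keeps FIFO order within a priority

    def enqueue(self, app: str, size: int, priority: int) -> None:
        heapq.heappush(self._queue, (priority, next(self._seq), app, size))

    def tick(self) -> list[str]:
        """Send queued packets, most urgent first, until the next one won't fit."""
        budget, sent = self.bytes_per_tick, []
        while self._queue and self._queue[0][3] <= budget:
            _, _, app, size = heapq.heappop(self._queue)
            budget -= size
            sent.append(app)
        return sent


link = PriorityLink(bytes_per_tick=3000)
for _ in range(4):                       # bulk download arrives first
    link.enqueue("email", size=1500, priority=1)
for _ in range(2):                       # interactive CRM traffic arrives later
    link.enqueue("crm", size=500, priority=0)

first_tick = link.tick()
print(first_tick)
```

Even though the e-mail packets were queued first, the CRM packets leave in the first tick and the download simply takes a little longer to drain. Production QoS systems typically use weighted rather than strict priorities so that low-priority traffic is never starved entirely.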