Following Hurricane Sandy, let's say you've been asked to set up replication to a disaster recovery site. Your company has chosen to back up its core operations located in Boston with space in a colocation center in Chicago -- about a thousand miles away. You've done the math and determined that you'll need a 500Mbps circuit to handle the amount of data necessary to replicate and maintain recovery-point SLAs.
As you get your Chicago site and connectivity lit up, you decide to test out your connection. First, a ping shows that you're getting a round-trip time of 25ms -- not horrible for such a long link (at least 11ms of which is simple light-lag). Next, you decide to make sure you're getting the bandwidth you're paying for. You fire up your laptop and FTP a large file to a Windows Server 2003 management server on the other side of the link. As soon as the transfer finishes, you know something's wrong -- your massive 500Mbps link is pushing about 21Mbps.
Do you know what's wrong with this picture? If not, keep reading because this problem has probably affected you before without your realizing it. If you decide to move to the cloud or implement this kind of replication, it's likely to strike again.
First, understand that the answer is related to Transmission Control Protocol (TCP), one of the two main transport protocols that most applications use to communicate over the Internet. (The other is User Datagram Protocol, or UDP.) What matters here is that TCP has built-in congestion and packet-loss detection capabilities whereas UDP does not.
TCP detects loss by having the receiver acknowledge the data it gets. The TCP window is the amount of data a sender may have in flight before it must stop and wait for an acknowledgement. The size of the TCP window is variable, which is the key to TCP's ability to deal with congestion on the open Internet. When two network stations start a conversation, the window starts very small (perhaps the size of a single packet, though usually larger). Each time a window's worth of data is sent successfully, the window size is doubled. This process continues until either the maximum window size is reached or packets are lost. If just a couple of packets are lost, the window size is halved, then increased linearly until loss is detected again. This is called congestion-avoidance mode. If a lot of packets are lost in a row, the whole process restarts from the small initial window.
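The growth pattern described above can be sketched as a toy simulation. This is a deliberately simplified model, not how any real TCP stack is implemented: the window is counted in whole packets, one loop iteration stands for one round trip, and the function name and the loss_rounds and restart_rounds parameters are made up for illustration.

```python
def simulate_window(rounds, max_window, loss_rounds=(), restart_rounds=(), initial=1):
    """Return the window size (in packets) used in each round trip.

    Simplified model: real TCP stacks track bytes and use more
    elaborate loss-recovery rules than this sketch does.
    """
    window = initial
    threshold = max_window  # doubling continues until this size is reached
    history = []
    for rnd in range(rounds):
        history.append(window)
        if rnd in restart_rounds:
            # Many packets lost in a row: start over from the initial window.
            threshold = max(window // 2, 2)
            window = initial
        elif rnd in loss_rounds:
            # A couple of packets lost: halve the window.
            threshold = max(window // 2, 2)
            window = threshold
        elif window < threshold:
            # No loss yet: double the window each round trip.
            window = min(window * 2, threshold)
        else:
            # Congestion avoidance: grow linearly, one packet per round trip.
            window = min(window + 1, max_window)
    return history


print(simulate_window(10, 64, loss_rounds={5}))
# [1, 2, 4, 8, 16, 32, 16, 17, 18, 19]
print(simulate_window(10, 64, restart_rounds={5}))
# [1, 2, 4, 8, 16, 32, 1, 2, 4, 8]
```

In the first run, a light loss in round 5 halves the window and growth turns linear. In the second, a heavy loss sends the window back to its starting size.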
In the Boston-Chicago example, what limited the throughput to 21Mbps? It was the fact that most Windows systems have a default maximum TCP window size of 64KB. If the sending station (your laptop, in the example) has to spend 25ms waiting for an acknowledgement after sending every 65,535 bytes worth of data, 21Mbps is the maximum throughput it can achieve -- regardless of how large the circuit is or whether it's congested. Here's the formula:
[ TCP Window in Bytes ] * 8 / [ Latency in Seconds ] = [ Maximum Throughput in Bits per Second ]
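As a quick check of the formula, here is a minimal sketch that plugs in the numbers from the Boston-Chicago example. The function name is an illustrative choice, and "latency" is assumed to mean the round-trip time that ping reported.

```python
def max_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on single-session TCP throughput, in bits per second."""
    return window_bytes * 8 / rtt_seconds


# 65,535-byte window, 25ms round trip
mbps = max_throughput_bps(65_535, 0.025) / 1_000_000
print(f"{mbps:.2f} Mbps")  # 20.97 Mbps
```

The circuit's bandwidth does not appear anywhere in the calculation, which is why a larger pipe alone does not help.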
To combat this, the RFC 1323 standard defines a method of providing a power-of-two multiplier (a shift count of up to 14) for the originally 16-bit TCP window size so that TCP windows can be scaled up to 1GB -- providing a large-enough window to easily saturate a 10Gbps Ethernet connection with a single TCP session.
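The arithmetic behind that 1GB ceiling can be sketched in a few lines. The names below are illustrative, and the 25ms round trip is borrowed from the example link. The sketch only models the multiplication, not the option negotiation that happens when a connection is set up.

```python
MAX_SHIFT = 14  # largest shift count RFC 1323 allows


def scaled_window(window_field, shift):
    """Effective window in bytes for a 16-bit window field and a shift count."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift  # same as window_field * 2**shift


largest = scaled_window(0xFFFF, MAX_SHIFT)
print(largest)  # 1073725440 bytes, just under 1GB

# Throughput ceiling for that window over a 25ms round trip
gbps = largest * 8 / 0.025 / 1_000_000_000
print(f"{gbps:.1f} Gbps")  # 343.6 Gbps
```

With no scaling (a shift count of 0), the same function returns the familiar 65,535-byte limit.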
However, this feature, called TCP window scaling, is not always turned on -- though it is usually supported on modern operating systems and networking gear. In the case of Windows Server 2003, you need to manually modify a registry key to raise the maximum TCP window size and thus get any benefit from window scaling. To fully use a 500Mbps link with 25ms of round-trip latency, you need a maximum TCP window size of around 1,562,500 bytes (about 24 times the Windows Server 2003 default). Here's the formula: