The lost art of reading packet traces

When network problems seem to defy the laws of physics, you'd better know your way around the dark region below layer 3

Long, long ago, before I could tell the difference between a /20 and a /30, a mentor sat me down and asked me what I knew about Ethernet and networking in general. Back then, I wasn't familiar with much beyond configuring an IP address, subnet mask, and gateway on a server or desktop. The rest was still magic to me.

He then spent the next 30 minutes revolutionizing my life with a succinct and accurate description of exactly what happens when an Ethernet frame is constructed, dropped on the wire, transmitted, and received. He discussed the TCP three-way handshake (SYN, SYN-ACK, ACK), RST packets, collisions (this was in the dark days of 10Base-2), and duplex settings. We moved on to IP, VLSM, ports, and sockets -- the whole shebang. I retained maybe 20 percent of what he said, but immediately developed a thirst to learn the rest. I didn't know it at the time, but my future skills at designing and building large networks started right there, that day.
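The handshake is the first thing worth recognizing in any trace: the client sends SYN, the server answers SYN-ACK and acknowledges the client's initial sequence number plus one, and the client finishes with an ACK of the server's initial sequence number plus one. The sketch below is a minimal illustration of that check, assuming segments have already been pulled out of a capture; `Segment`, `flag_names`, and `is_three_way_handshake` are hypothetical names, not part of any packet-capture library, and the sequence numbers are made up.

```python
from typing import NamedTuple

# TCP flag bits as they appear in the TCP header's flags field.
FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10
HANDSHAKE_BITS = SYN | ACK | RST | FIN  # ignore PSH, URG, ECE, CWR when matching


class Segment(NamedTuple):
    """Hypothetical, simplified view of one TCP segment from a trace."""
    direction: str  # "c2s" (client to server) or "s2c" (server to client)
    flags: int
    seq: int
    ack: int


def flag_names(flags: int) -> str:
    """Render the flags byte roughly the way trace tools abbreviate it."""
    names = [("SYN", SYN), ("ACK", ACK), ("FIN", FIN), ("RST", RST), ("PSH", PSH)]
    return "-".join(name for name, bit in names if flags & bit) or "none"


def is_three_way_handshake(segs: list[Segment]) -> bool:
    """True if the first three segments are SYN, SYN-ACK, ACK with matching numbers."""
    if len(segs) < 3:
        return False
    syn, synack, ack = segs[:3]
    return (
        syn.direction == "c2s" and (syn.flags & HANDSHAKE_BITS) == SYN
        and synack.direction == "s2c" and (synack.flags & HANDSHAKE_BITS) == (SYN | ACK)
        # The server acknowledges the client's initial sequence number plus one...
        and synack.ack == (syn.seq + 1) % 2**32
        # ...and the client acknowledges the server's initial sequence number plus one.
        and ack.direction == "c2s" and (ack.flags & HANDSHAKE_BITS) == ACK
        and ack.ack == (synack.seq + 1) % 2**32
    )


trace = [
    Segment("c2s", SYN, seq=1000, ack=0),
    Segment("s2c", SYN | ACK, seq=5000, ack=1001),
    Segment("c2s", ACK, seq=1001, ack=5001),
]
print([flag_names(s.flags) for s in trace])  # ['SYN', 'SYN-ACK', 'ACK']
print(is_three_way_handshake(trace))          # True
```

Sequence numbers wrap at 2^32, which is why the comparisons are taken modulo 2**32 rather than as plain addition.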


Over the intervening years, I've done the same with a few younger folks who showed an interest in and aptitude for networking. But it seems to me that more and more network people skip the lower levels of networking knowledge and rely solely on their understanding of layer 3. The ability to accurately calculate IPv4 subnet masks might be the limit of their abilities; what actually happens on the wire is a big gray area. In many cases, that gray area also includes IP's supporting players, such as ARP. When presented with packet trace output in Wireshark, they're lost.

The truth of the matter is that you can be very successful and build functional networks without ever knowing what ARP is or why gratuitous ARP (GARP) even exists. An understanding of basic TCP ports, NAT, and IP subnetting goes a long way in the IT world these days. Those skills are generally enough for you to construct viable firewall rules, spot an invalid subnet mask setting that's causing problems, and so forth. But when a problem falls outside that scope, you don't have the tools to dig deeper.
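For the layer 3 skills mentioned above, spotting a bad subnet mask largely comes down to a few checks that Python's standard `ipaddress` module can do for you. This is a minimal sketch, not a complete validator: `check_host_config` is a hypothetical helper, and the addresses are made up. It shows how the same host address behaves under a /20 mask versus a /30 mask.

```python
import ipaddress


def check_host_config(address: str, mask: str, gateway: str) -> list[str]:
    """Hypothetical helper: list problems with an IPv4 address/mask/gateway triple."""
    try:
        # Accepts dotted masks; non-contiguous masks such as 255.0.255.0 raise ValueError.
        iface = ipaddress.IPv4Interface(f"{address}/{mask}")
        gw = ipaddress.IPv4Address(gateway)
    except ValueError as exc:
        return [f"invalid setting: {exc}"]

    net = iface.network
    problems = []
    if gw not in net:
        problems.append(f"gateway {gw} is outside {net}")
    # /31 point-to-point links (RFC 3021) and /32 host routes have no
    # separate network/broadcast addresses, so only check shorter prefixes.
    if net.prefixlen < 31 and iface.ip in (net.network_address, net.broadcast_address):
        problems.append(f"{iface.ip} is the network or broadcast address of {net}")
    return problems


# Same host, two masks: the /20 works, the /30 strands it.
print(check_host_config("10.1.2.3", "255.255.240.0", "10.1.0.1"))
# []
print(check_host_config("10.1.2.3", "255.255.255.252", "10.1.0.1"))
# ['gateway 10.1.0.1 is outside 10.1.2.0/30', '10.1.2.3 is the network or broadcast address of 10.1.2.0/30']
print(check_host_config("10.1.2.3", "255.0.255.0", "10.1.0.1"))
```

Letting `ipaddress` parse the mask, rather than splitting octets by hand, means non-contiguous masks are rejected for free.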

The bottom line is that you need to be able to read and dissect packet traces if you want to consider yourself a bona fide network troubleshooter.

Consider this bizarre networking problem related to VMware that I dealt with recently: When a Windows or Linux host was placed on a VLAN, communication with other hosts was fine. When an ESXi host was placed on the same network, it had trouble communicating at Layer 2. Even a Linux VM running on the same ESXi host had no problems and displayed an accurate ARP table, yet the ARP table on the ESXi box itself had the wrong MAC addresses for certain (but not all) hosts on the same segment. It was quite the head scratcher, and it specifically prevented the ESXi host from using port binding. When port binding was disabled, communication with other hosts worked, but the ARP table was still wrong, which was perplexing.

After making doubly sure that other hosts had no L2 communication issues and had valid ARP tables, it appeared to be a problem with ESXi, since I could not reproduce it on Windows or Linux. I shot a few packet traces and sent them off to VMware, but I didn't have time to look at them right away. On a subsequent call with VMware networking gurus, we pored over the traces and found that for each ARP who-has from a host on that subnet, there was an ARP is-at reply from the host that actually owned the address. However, close on its heels was a second is-at reply from the VRRP virtual MAC address, claiming that the IP address in question was actually at the MAC address assigned to the VRRP virtual IP itself.
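The pattern found in those traces (two different MACs answering is-at for the same IP) is easy to hunt for once you know the layout of an ARP packet. The sketch below decodes the 28-byte Ethernet/IPv4 ARP body defined in RFC 826, collects every is-at claim, and flags any IP claimed by more than one MAC. It also marks MACs in the IPv4 VRRP virtual MAC range, 00-00-5E-00-01-{VRID}. This is a minimal illustration, not VMware's or Wireshark's analysis. It assumes the ARP bodies have already been extracted from a capture with the Ethernet header stripped. `build_is_at` and `find_conflicting_replies` are hypothetical helpers, and all addresses are made up.

```python
import socket
import struct
from collections import defaultdict

# Ethernet/IPv4 ARP body (RFC 826): htype, ptype, hlen, plen, oper, sha, spa, tha, tpa.
ARP_FMT = "!HHBBH6s4s6s4s"  # 28 bytes
ARP_IS_AT = 2               # opcode 2 = reply ("is-at")
VRRP_MAC_PREFIX = bytes.fromhex("00005e0001")  # IPv4 VRRP virtual MAC: 00-00-5E-00-01-{VRID}


def mac_str(mac: bytes) -> str:
    return ":".join(f"{b:02x}" for b in mac)


def build_is_at(sender_mac: str, sender_ip: str, target_mac: str, target_ip: str) -> bytes:
    """Hypothetical test helper: pack an ARP reply body (no Ethernet header)."""
    return struct.pack(
        ARP_FMT, 1, 0x0800, 6, 4, ARP_IS_AT,
        bytes.fromhex(sender_mac.replace(":", "")), socket.inet_aton(sender_ip),
        bytes.fromhex(target_mac.replace(":", "")), socket.inet_aton(target_ip),
    )


def find_conflicting_replies(arp_bodies):
    """Return {ip: [macs]} for every IP that more than one MAC claimed in an is-at reply."""
    claims = defaultdict(set)
    for body in arp_bodies:
        htype, ptype, _hlen, _plen, oper, sha, spa, _tha, _tpa = struct.unpack(ARP_FMT, body[:28])
        if htype == 1 and ptype == 0x0800 and oper == ARP_IS_AT:
            # In a reply, the sender fields say "IP spa is at MAC sha".
            claims[socket.inet_ntoa(spa)].add(sha)
    return {
        ip: sorted(mac_str(m) + (" (VRRP virtual MAC)" if m.startswith(VRRP_MAC_PREFIX) else "")
                   for m in macs)
        for ip, macs in claims.items()
        if len(macs) > 1
    }


# The pattern from the trace: the real host answers, then the VRRP MAC answers too.
bodies = [
    build_is_at("00:50:56:aa:bb:cc", "10.0.0.20", "00:50:56:11:22:33", "10.0.0.5"),
    build_is_at("00:00:5e:00:01:0a", "10.0.0.20", "00:50:56:11:22:33", "10.0.0.5"),
]
print(find_conflicting_replies(bodies))
# {'10.0.0.20': ['00:00:5e:00:01:0a (VRRP virtual MAC)', '00:50:56:aa:bb:cc']}
```

Keeping a set of MACs per IP, rather than only the latest answer, matters here. A host that simply trusts whichever reply arrives last would end up with exactly the kind of wrong ARP entry described above.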
