It could be called the “ignomoment”: the split second following a definitive action when you realize you’ve just made a tragic mistake. For network administrators, it can mean the difference between going home at 5 p.m. or 5 a.m. The truth is, despite incidents of careless backhoe drivers pulling up fiber bundles or hurricanes bringing down power lines, administrator error is the most common reason that a network fails.
“There’s always a reason why something doesn’t work,” says Richard Willmott, market manager at IBM Tivoli. “Finding that reason is the hard part.” Too often network administrators are up against a wall, lacking the budget, lab, and time necessary to fully determine the ramifications of a change to an internal routing protocol configuration, or to accurately gauge the impact of a large-scale access-list modification on live traffic. Although there is no way to eliminate human error, there are certainly ways to mitigate its effects.
A solid change-management process, along with proper training and sufficient IT resources, can turn that sinking feeling brought on by disparate systems and outdated tools into guarded confidence. Then there’s ITIL (IT Infrastructure Library), which is a collection of best practices for IT management. It describes in detail the steps necessary to institute various management practices to reduce problems and gain visibility into network infrastructures. Lastly, there are plenty of vendors whose products aim to streamline and automate the change-management process. Nothing is fail-safe, but that’s no excuse for not trying.
Software developers have the edge when it comes to testing and implementing changes. It’s all but unheard of to find developers writing and distributing code without any form of testing. A lab for a developer can be a laptop, and a full-scale software development lab infrastructure can be had for the cost of a few servers.
Yet for the network devices delivering those signals, changes of any scale are typically undertaken without the benefit of prior testing. Why? Because it’s nearly impossible to test every aspect of a proposed network configuration change thoroughly. Rather than the few servers a development environment requires, a network lab demands a wide variety of expensive network hardware to truly mimic the production environment: simulated TDM circuits and frame-relay networks, ISDN lines, and any other link types in use on the production network. Simple tests can be accomplished with a subset of the production gear, but the costs are high, and confidence that the proposed change will function as expected can waver.
For many infrastructures, there are two paths available to deal with this problem. One is a lab environment that can simulate portions of the network; the other is strong change-management policies backed by change-management software. It’s one thing to inadvertently cause a network disruption; it’s quite another to realize that you have no backup of the functioning configuration and must replicate detailed parameters from human memory or outdated files.
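That last scenario is avoidable with even a modest amount of automation. As a minimal sketch (not any vendor's tool; the device name, directory layout, and sample configs here are hypothetical), the idea is simply to snapshot the running configuration before every change and to diff the proposed configuration against that baseline, so a known-good copy always exists:

```python
# Hypothetical sketch: keep timestamped config backups and preview changes
# as a diff before they are pushed to a device.
import difflib
from datetime import datetime, timezone
from pathlib import Path


def snapshot_config(device: str, config: str, backup_dir: str = "config_backups") -> Path:
    """Save a timestamped copy of the running config before any change."""
    root = Path(backup_dir) / device
    root.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = root / f"{stamp}.cfg"
    path.write_text(config)
    return path


def diff_against_baseline(baseline: str, proposed: str) -> list[str]:
    """Return unified-diff lines showing exactly what the change would alter."""
    return list(difflib.unified_diff(
        baseline.splitlines(), proposed.splitlines(),
        fromfile="running", tofile="proposed", lineterm=""))


if __name__ == "__main__":
    # Example configs for a hypothetical router "core-rtr1".
    running = "hostname core-rtr1\ninterface Gi0/1\n ip address 10.0.0.1 255.255.255.0\n"
    proposed = "hostname core-rtr1\ninterface Gi0/1\n ip address 10.0.1.1 255.255.255.0\n"
    snapshot_config("core-rtr1", running)
    for line in diff_against_baseline(running, proposed):
        print(line)
```

A real deployment would pull the running config over SSH or SNMP rather than passing it in as a string, but the review step is the point: an administrator who sees the diff before committing, and who knows a dated backup exists, is far less likely to meet the ignomoment.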