If there's a mantra for deep IT, it might go like this: When something is working right, leave it be, lest it change its mind. That pretty much encompasses everything from the switchport on up, for every layer of the stack, and then some. However, there's a time and a place to start messing around with an otherwise perfectly functional system. In fact, you really should break stuff on purpose when the opportunity presents itself.
The primary driver for this bizarro IT behavior is obviously new builds. If you're standing up a new architecture or framework -- be it server, network, storage, database, whatever -- and it works perfectly the first time, you shouldn't stop there and call it good. Once the production flag flips on that build, you'll never have the opportunity to torch the system again until it's replaced or it breaks horribly.
Now is the only time you can run roughshod over your setup and configs, yank pieces out and put them back in, and see what happens when you twiddle knob X and press button B. Sure, you could do that in a dev environment, but depending on your task, you may not have one -- and even a faithful replica of the production system is still not the production system.
A case in point might be a new storage array. It's big and burly, with lots of bandwidth and disks, as well as redundant everything. You cable it up, flip on the power, log in, assign addresses, create a few volumes, and whatnot. You might have to dig a little deeper to make sure it's configured the way you planned, but you're probably going to find that, ultimately, storage is a simple creature, and everything just works the first time, almost by accident.
But rather than call it good and start migrating production data, spend a day or two doing nasty things to it. Misconfigure a route and see what happens. Does it actually try to route data over the management interface? Kick one of the controllers offline and make sure the failover works -- in exactly the way you expect it should. Put load on it and kick it again. Fail it forward and back a few times to get a sense of how this works; if and when it happens in production, seeing familiar log messages and status information can be of significant benefit.
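A drill like this benefits from a written checklist of expected outcomes, so that "failover worked" means something concrete. Here's a minimal sketch of such a checklist runner in Python -- everything in it is hypothetical scaffolding, not any vendor's API; the `probe_controller` function is a stand-in stub you'd replace with real CLI, SNMP, or REST health checks against your actual gear.

```python
# Hypothetical pre-production failover drill runner.
# The probe function is a stub -- swap in real health checks for your array.

def probe_controller(name):
    """Stub: replace with a real health check (CLI, SNMP, REST)."""
    return "online"  # pretend the probe succeeded

def run_drill(steps):
    """Run each (description, probe, expected) step; log pass/fail."""
    results = []
    for desc, probe, expected in steps:
        observed = probe()
        ok = observed == expected
        results.append(ok)
        print(f"{'PASS' if ok else 'FAIL'}: {desc} "
              f"(expected {expected!r}, saw {observed!r})")
    return all(results)

drill = [
    ("controller A healthy before test", lambda: probe_controller("A"), "online"),
    ("controller B healthy before test", lambda: probe_controller("B"), "online"),
    # ... here you would pull controller A, or down its ports, then verify:
    ("volumes still served after failover", lambda: probe_controller("B"), "online"),
]

if __name__ == "__main__":
    if run_drill(drill):
        print("drill passed")
    else:
        print("drill FAILED -- investigate before go-live")
```

The point isn't the code itself but the habit: each destructive test gets an expected result written down in advance, and the log of what you actually saw becomes your reference when the same failure happens in production.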
If you're really interested in ensuring you know this gear inside and out, especially with hardware and software you've never used before, you should grind it all the way down to nothing and configure it all over again, starting from a reload of the firmware or operating system. I find that when I'm in a rush to get something into production, I fly through the configuration, hunting down parameters I know I need to configure, though in unfamiliar waters they may be located in seemingly odd places. Once I've found one, I move on to the next, and so on and so forth.
If I do this only once, when I need to modify or fix something weeks or months down the road, I have to go through the same treasure hunt all over again, because I never internalized the structure. This is especially true of more complex systems that have myriad contextual layers and tend to bury configuration elements deep within a UI.
By breaking it all down and configuring it all over again, I will solidify certain details in my mind that will speed up future work immeasurably. And if I break it in interesting ways before starting over, then I've developed a better idea of how it reacts during specific events. That can be a lifesaver in a time-sensitive outage.
In IT, we tend to work on many varied systems on a daily basis. We encounter numerous vendors, a multitude of different equipment and software, and consequently a wide range of ideas about how interfaces, configurations, and normal operation should behave. While the basic tenets of storage or networking are the same across vendors, the terminology may differ, and the configuration flow will almost certainly vary.
Couple this with the fact that in many cases, we can configure a piece of gear one day and never touch it again until months or even years later. If it's suitably different in configuration or operation from our more frequently used gear, we may need to learn it all over again. This is where that preproduction experimentation and repetition really helps.
Of course, petition for lab gear whenever you see an opportunity. Virtualization has made lab testing extremely simple for software, but hardware is a different beast altogether. The ability to pretest a major change to a production system is priceless, though many who hold the purse strings might disagree. Lacking the "luxury" of lab gear, take the time to mess with new stuff before production. It's the next best thing.
This story, "Build up your production system by first tearing it down," was originally published at InfoWorld.com. Read more of Paul Venezia's The Deep End blog at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.