The true grit of IT troubleshooting

Tackling big problems as they emerge takes nerves, luck, and a gambler's instinct. Armchair quarterbacks need not apply.

Troubleshooting brings out the armchair quarterback in most IT pros, especially during emergent and highly visible outages. After all, we love a tough problem, and it's easy to gloss over a situation from a disassociated position and say you could have solved it better, faster, or with less fallout.

We even do it to ourselves: How many times have we said that if we only knew then what we know now, we would have done things differently? But there is no disassociated position when troubleshooting. We only know what we know in that moment, so we forge ahead with our best guess, balancing the risks of our actions against the promise of a resolution. For the most part, our gambles succeed; otherwise we wouldn't be in a position to do much troubleshooting again.

Dealing with emergent problems means making a devil's choice between a fix and forensics. Speed up the fix for a particularly gnarly problem, and you might end up destroying data that could reveal the problem's actual cause during a postmortem. Seasoned IT pros try to have their cake and eat it too. But in the end, returning to normal operation trumps collecting data for forensics, so it's not always possible to retain as much data as we would like.

Unfortunately, this often means reliving the issue.

The true test of IT troubleshooting

Those who have never dealt with a true IT emergency can't understand what it is like, largely because it's almost impossible to describe the situation accurately. Phrases like "firefighting" or "life-saving" sound bombastic to the layperson, who generally thinks of computer problems as being solved by unplugging a DSL router.

But when a blocking problem appears out of nowhere, especially during what might otherwise have been a routine procedure or when no work was under way at all, the same basic response takes hold of those of us who have to fix it. Every second that passes is a further indictment of the IT operation as a whole. There are no breaks, no timeouts; there's just a problem looming over all else and a brain or two spinning like centrifuges, looking for a way over, under, around, or through the problem while causing as little collateral damage as possible.

A better analogy might be that dealing with a major IT problem is like being locked in a room without food or water -- all other thoughts besides getting out of that room evaporate. There is and must be a singular focus on finding a key and working yourself free of the confinement.
