It's true in IT, and it's true in any profession: To be good at the job, one must master troubleshooting. But as one experience I had demonstrates, it's critical to gather all of the pertinent information about the problem before running off half-cocked trying to fix it. Knowing the fundamentals of IT troubleshooting helps, too.
Our company was a large enterprise in the biotech industry, and in a merger we gobbled up a smaller company. They had lots of lab equipment that no longer had vendor support, but they had a very capable team of IT support techs who knew the equipment inside out.
After the merger completed, though, these techs were the first to get axed, and none of their knowledge on said equipment was captured. So we inherited from our fallen comrades a bunch of devices and software no one knew anything about. For a couple of months everything was fine -- but then, suddenly, it wasn't.
Try and try again
One piece of equipment was designed to monitor on a molecular level how two substances interacted over time. Then the software would produce a colorful graph with multiple lines that a biochemist would analyze and make notes on.
Problem: One day, the special software started displaying an empty white box where the graph should have been. The biochemist rechecked the sample and ran the test again. Same result. He checked all of the cable connections and did the test again. Same result. So he called one of the lab support techs to come take a look at it.
This particular tech had a background primarily in lab work, but had transferred over to the IS support side not long before this. She spent a frustrating afternoon rechecking data connections, rerunning the test, and checking connections again. She had seen a similar issue with another system, and in that case, the problem was a security rights issue. To troubleshoot, she granted the login account running the software full-blown, god-level, domain admin rights to everything on the network. No dice.
Next, a more senior analyst from our department looked into the issue. He had been with the company since the beginning and had picked up his technical knowledge as he went. Based on his experience, he suspected that there was a problem with the data cable between the machine and the workstation.
Of course, it was a proprietary cable, so he spent hours looking for a spare. But when he finally found one, it did nothing to fix the problem.
Then, he began to suspect there was an issue with the DB-9 serial port on the workstation. (Yes, this piece of lab equipment manufactured in the 21st century used a 9600-baud DB-9 serial port for its data connection.) So, he plugged the lab equipment into a USB serial adapter. No luck. Next, he special-ordered a PCI serial port add-in card and had it shipped in overnight. Nope. With none of the fixes he could think of resolving the problem, he was stumped.
Leave no stone unturned
A few other people from various parts of the IT department looked into the issue with no success either. Management even considered calling up one of the former IT analysts to see if they would come in as a contractor to look at it. However, that bridge had been thoroughly burned by HR's sloppy handling of their termination. All the same, a major project was being delayed because of this problem, and no one liked the idea of spending $600,000 on a replacement piece of lab equipment for this issue.
I was a newly hired analyst with an A+ certification, so the basics of IT troubleshooting were still ingrained in my memory. I had an idea about the problem and asked my boss if I could take a look. My boss kind of laughed, "Sure, why not? We've tried everything else we can think of, so if you have some magic you can work, then go for it." OK.
I went to the lab computer, opened the C:\Program Files folder for this software, and went to (wait for it) the log files directory. The latest log entry indicated an issue with the display subsystem this application depended on. That made perfect sense because, about the time this broke, there had been an update of that system throughout the enterprise. I reinstalled the older version, and BAM! The issue was resolved. Total time to fix: 5 minutes.
My boss was blown away and beside himself with joy that the issue was fixed. From then on, everyone thought I was some sort of IT guru, but I kept telling them that all I did was check the stupid log file!
The takeaway: Log files are your best friend. You will find no better source of information about the cause of problems than the specific file where software keeps information about its problems. And it's helpful to start at the beginning of the problem with basic troubleshooting techniques.
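For anyone who wants to make "check the log first" a habit, here is a minimal sketch in Python of that first step: find the most recently written log file in an application's log directory and pull out the lines that look like trouble. Everything specific in it is an assumption for illustration, not a detail of the software in this story: the `*.log` file extension, the keyword list, and the function names `newest_log` and `suspicious_lines` are all hypothetical, and real applications name and format their logs however they please.

```python
from pathlib import Path

# Hypothetical keywords; real applications use their own wording.
KEYWORDS = ("error", "fail", "exception")


def newest_log(log_dir):
    """Return the most recently modified *.log file in log_dir, or None."""
    logs = [p for p in Path(log_dir).glob("*.log") if p.is_file()]
    return max(logs, key=lambda p: p.stat().st_mtime, default=None)


def suspicious_lines(log_path, keywords=KEYWORDS, tail=200):
    """Return those of the last `tail` lines that mention any keyword."""
    text = Path(log_path).read_text(encoding="utf-8", errors="replace")
    recent = text.splitlines()[-tail:]
    return [line for line in recent if any(k in line.lower() for k in keywords)]
```

Point `newest_log` at the application's log folder, pass the result to `suspicious_lines`, and read what comes back before touching a single cable. If the log uses a different extension or encoding, adjust the glob pattern and the `encoding` argument to match.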