You can never get too comfortable in IT. Even the best-implemented plans and top-of-the-line tech can fizzle in the face of shockingly mundane details, and all eyes are on IT until the problem is resolved. Here's a recent story of a problem we dealt with at our company that had an almost comical ending for all the grief it caused.
Our company is spread out across five locations. The first is a headquarters office. The other four are scattered throughout a remote city an hour south of headquarters. About a year and a half ago, we set up a microwave point-to-multipoint system linking the sites and significantly boosting bandwidth over the discrete T1 circuits that had been the sites' only connection to the core.
[ More from InfoWorld about the IT profession: The 9 most endangered species in IT. | Follow InfoWorld's Off the Record on Twitter for tech's war stories, career takes, and off-the-wall news. | Subscribe to the Off the Record newsletter for your weekly dose of workplace shenanigans. ]
We resolved several issues right after installation by upgrading firmware and replacing a bad run of equipment. Otherwise, it worked fine.
Soon after we set up the microwave system, we moved on to convergence. We had to migrate all of our internal telephone signaling traffic to our data network in order to keep our phone system current with the manufacturer's latest offerings.
A year later, both systems had been quite stable -- until two days ago.
It was just after lunch when the call came. The entire networking system in the remote city had gone berserk. One site was completely cut off, and the three others were having major connectivity issues.
It fell to me to troubleshoot because I was the principal on the microwave installation; also, I was the only one around who was part of the convergence project. I tried this, tried that, dialed into the system remotely after hours to make changes that couldn't be done during business hours. Nothing worked.
I forced everything to reconverge over the T1 circuits, which fortunately hadn't been disconnected and made plans to drive down to the city the next day. During this entire time, I was also dealing with the many people clamoring for results.
I got together supplies to deal with the most likely scenarios. At the top of the list was the site where the base station was located because that was the common point between all the failing locations. Also, as of the second morning, we couldn't even log into the base station remotely. The second most likely culprit was the site that was completely cut off.