We've all been there: an IT problem so ridiculous that only a ridiculous solution can solve it.
It could be a server on the brink of shutting down all operations, a hard drive holding vital data that won't spin up, or a disgruntled ex-employee who's hidden vital system passwords on the network. Just when all seems lost, it's time to get creative and don your IT daredevil cap, then fire up the oven, shove the end of a pencil into the motherboard, or route the whole city network through your laptop to get the job done.
Whether you're keeping vital systems humming, extending the life of faulty hardware in dire situations, or simply hunting for sport, knowing when to throw out the manual and do something borderline irresponsible is essential to day-to-day IT work.
Here are a few poignant examples of stunts and solutions that required a touch of inspired insanity to pull off. Add yours in the comments below, or send them to InfoWorld's Off the Record blog to keep a lid on the true identities of the people involved.
Ever wonder whether you could route an entire city network through a laptop running Fedora? Take a seat. Or better yet, leave the chair for your laptop. You'll need to balance it somewhere to keep city services up and operational through a two-day snowstorm.
It started out innocently, with a sizable city network core Layer 3 switch showing signs of failure and causing network instability. Errors logged to the console pinpointed the supervisor engine. Cisco was called around 10 a.m., and a replacement was slated for delivery by 2 p.m. As long as the current supervisor could keep the network more or less functional for four hours, help was on the way.
Cue Mother Nature and her blizzard machine. The call came back at noon. There was no way to get the proper part to the site before the following day due to the storm. As luck would have it, the production supervisor finally gave out minutes later, and the city's fiber network went dark. With no backup Layer 3 supervisor engine and several hundred ports in that switch, including all the servers and edge switch trunk links, there didn't seem to be anything that could be done until the next day, leaving city services bereft, including the police and fire departments.
A compatible Layer 2 supervisor for the core switch was located on site, and a plan was hatched. While this new supervisor wouldn't bring the network up by itself, it could run the switch ports on the core -- and a laptop could do the rest.
After a feverish half-hour configuring the switch and setting up 802.1q trunking and routing on a Dell Latitude running Fedora Linux, the city network was back up and running, with all traffic routing through a single interface on the laptop balanced on a chair in the data center.
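For the curious, here is a rough sketch of what that kind of "router on a stick" setup involves on a Linux box. The story doesn't say which tools or commands were actually used, so this assumes the modern iproute2 syntax, and the interface name, VLAN IDs, and gateway addresses are all hypothetical. The helper only builds the list of commands; it doesn't run anything.

```python
def router_on_a_stick_commands(parent, vlans):
    """Build shell commands that turn one trunked NIC into an inter-VLAN router.

    parent: name of the physical interface carrying the 802.1q trunk (assumed).
    vlans:  mapping of VLAN ID -> gateway address in CIDR form (assumed).
    """
    cmds = [
        # Let the kernel forward packets between interfaces.
        "sysctl -w net.ipv4.ip_forward=1",
        # Bring up the physical interface that carries the trunk.
        f"ip link set dev {parent} up",
    ]
    for vlan_id, gateway in sorted(vlans.items()):
        sub = f"{parent}.{vlan_id}"
        # One tagged subinterface per VLAN, each holding that VLAN's gateway IP.
        cmds.append(f"ip link add link {parent} name {sub} type vlan id {vlan_id}")
        cmds.append(f"ip addr add {gateway} dev {sub}")
        cmds.append(f"ip link set dev {sub} up")
    return cmds


# Hypothetical example: two VLANs trunked to eth0.
for cmd in router_on_a_stick_commands("eth0", {10: "10.0.10.1/24", 20: "10.0.20.1/24"}):
    print(cmd)
```

Every routed packet enters and leaves through the same physical port, which is why the whole city ended up riding on a single laptop interface.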
As it turned out, the replacement supervisor took another day to arrive due to the inclement weather, leaving the city at the mercy of this bubblegum and duct-tape fix for almost 48 hours. It ran without a hiccup. Suffice it to say, a cold spare supervisor was procured following this event.
Set your time machine for 1995 -- when 5.25-inch hard drives contained as much as 9GB of data, if they were really expensive -- and preheat your oven to 350 degrees Fahrenheit.
Of course, back then you could fit an awful lot of information in 9GB, including the entire mail spool of a 5,000-user dial-up ISP. But when that disk decides to stop spinning up after a power outage, you might have a problem or two -- especially when it's discovered that the 2GB DDSII DAT drive has been stretching tapes for the past several months and nobody knew until now.
The problem wasn't access -- the disk presented to the SCSI controller just fine -- but it also didn't seem to spin up at all. It would whine and the motors would click, but the spindle didn't appear to be spindling. Lacking any other options, a trick from an even older era of MFM and RLL disks was put into practice: bake the drive.
An oven was set to 350, and the full-height 5.25-inch disk was placed on a cookie sheet in the middle rack. Bake for 5 minutes, remove, do not let cool, plug immediately into power and a controller, and turn on the computer. Voilà: the grease that had hardened around the spindle had loosened enough to permit the platters to spin, and the data was recovered -- and immediately copied to two spare drives.
Although it was speculated that the disk might be best served with a chilled Chianti and rice, we'll leave it to you to whip up and wolf down what is sure to be a culinary delight.
Here's one where a gifted but seriously antisocial network administrator put the "jackass" in "jackass IT."
After being abruptly let go, the network admin took with him the enable password to a wide array of production network gear. As soon as the problem surfaced a few days after his departure, he was contacted by email and asked for the password.
This prompted an expletive-filled missive that lambasted everyone in the department and other parts of the company, but alas, no password. The last sentence of the email, however, read, "If you really want the [redacted] password, it's on the network already, you [redacted] [redacted]."
He proved unreachable after that email, and his phone was disconnected. Desperate to find the password, admins searched all of his files on the network storage arrays, but came up empty. They looked in his cube, on his dev systems, everywhere they could think of, but the password continued to elude them.
At that point, one of the admins who had logged into the departed admin's Linux development system noticed a process called "pping" in operation. It was a compiled binary that had been running for quite some time and was apparently pinging one of the core switches every 5 seconds. He presumed that it was some form of connectivity testing that the admin had been running and moved on to other things.
The revelation didn't happen immediately, but a day or two later he thought to run a packet capture on the network traffic departing that dev system and collected several packets of this odd ICMP ping traffic. Peeling the packets apart in Wireshark, he noted that there appeared to be a custom payload pad in the packet. A few minutes later, he'd peeled out the 16 bytes in that pad and translated them from hex to ASCII. A minute after that, he successfully logged into the core switches.
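The hex-to-ASCII step is the easy part once the pad bytes are in hand. A minimal sketch in Python, assuming the 16 bytes have been copied out of the capture as a hex string; the sample below is made up for illustration and is not the password from the story.

```python
def decode_padding(hex_payload: str) -> str:
    """Convert a hex dump of payload bytes into ASCII text."""
    raw = bytes.fromhex(hex_payload)  # whitespace between byte pairs is ignored
    # Drop any trailing NUL bytes used to fill out the pad.
    return raw.rstrip(b"\x00").decode("ascii")


# Made-up 16-byte sample, standing in for the bytes seen in the capture.
sample = "6e 6f 74 2d 74 68 65 2d 72 65 61 6c 2d 6f 6e 65"
print(decode_padding(sample))  # prints "not-the-real-one"
```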
If nothing else, the fired network admin was telling the truth -- the password was definitely on the network, but only where a jackass might put it.
Note to all jackass IT practitioners: Document your hacks. After all, you never know when you might need to repeat your feat of asinine brilliance -- or prevent it from being undone.
It was a real head-scratcher: a simple RAM upgrade to a production server that left the server unable to power up. There was no POST and no beeps from the mainboard -- just the silence surrounding a very big problem.
Troubleshooting step No. 1: Put the original RAM back in -- no effect. Pull and reseat every interface card -- all cards snug in their slots, and no change in behavior.
There are certainly instances when computers give up the ghost for no apparent reason, but this was a relatively new server that had never shown issues in the past, and a process as benign as RAM replacement shouldn't have caused such a major problem.
Anyone who's been in this kind of situation knows the sort of questions that were asked: Was there anything you noticed about the server when you opened it up? Did you take anything else out of it or put anything else back in? Were you drunk?
The answers to all the questions were no -- except one. Upon reflection, the admin who performed the RAM upgrade suddenly remembered having seen a rubber eraser on the mainboard when he opened the cover, and that he'd removed it before putting the server back in the rack. This caused a stir as everyone contemplated how a rubber eraser could have contributed to the well-being of the server or its downfall.
Then another admin asked where the eraser was. It was found on a desk and inspected. It was discovered that it was indeed a rubber eraser, with a crease down the middle. The same admin walked over to the server, opened the cover, and set the eraser on top of the SCSI adapter, where the crease seemed to fit perfectly, and replaced the cover. He hit the power switch and the server booted immediately.
It seems that an unknown admin (at least, no one would cop to it) had solved the problem of a popping SCSI adapter by using the eraser as a shim between the case and the card and hadn't mentioned that fact to anyone else. Naturally, the long-term fix for this problem was to tape the eraser to the underside of the server cover, next to a note saying, "DO NOT REMOVE ERASER."
Customized safeguards that keep critical systems running are fertile playgrounds for daredevil IT hacks. Built into systems to prevent major problems, all too often these safeguards become major problems themselves, and sometimes it takes the IT equivalent of open-heart surgery to keep vital systems running -- with little more than a laptop and a few lines of Perl.
It was a peculiar situation. The automation system at a major manufacturing plant required constant communication with a server that controlled plant operations. Unfortunately, the server responsible for the heartbeat activity of the manufacturing system was ailing, throwing I/O errors willy-nilly, and showing all the signs of a rapid and violent death. Dozens of workers on the shop floor were left to the mercy of this server, trying to meet a critical deadline for product delivery, with no way to shut down the manufacturing process for even a moment, as it took hours to get everything back up to speed.
The server sent no instructions to the shop gear, but if the shop gear sent a heartbeat that wasn't acknowledged, the system would go into safety mode, shutting down all operations. The heartbeat process was still running in RAM on the server, but given that the rest of the server was well on its way toward leaving the building, it was only a matter of time before the entire shop would shut down. It was time to roll up the sleeves and get into the guts of the system.
Quickly, a monitoring port was configured on the switch connected to the server, and the heartbeat traffic was surveilled for content. It turned out that the heartbeat was a simple TCP connection with a static challenge and response that occurred every 60 seconds.
Within a matter of an hour, start to finish, a Perl script was written to answer TCP connections on that particular port and emulate the heartbeat activity of the server. Then, a second after a heartbeat transaction was seen, a laptop running the Perl script was set to the IP address of the server, and the dying server was pulled out of service.
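The original was written in Perl and isn't reproduced here, but the idea fits in a few lines of any language. Below is a minimal Python sketch of a static challenge-and-response responder. The challenge and response strings and the port number are hypothetical placeholders, since the story doesn't give the real ones, and the sketch assumes the challenge arrives as a single newline-terminated line.

```python
import socketserver

# Hypothetical values: the real challenge, response, and port are not given.
CHALLENGE = b"PING\n"
RESPONSE = b"PONG\n"
DEFAULT_PORT = 9000


class HeartbeatHandler(socketserver.StreamRequestHandler):
    """Answer one static challenge per connection, then hang up."""

    def handle(self):
        line = self.rfile.readline()
        # Reply only to the exact challenge; anything else gets no answer.
        if line == CHALLENGE:
            self.wfile.write(RESPONSE)


class HeartbeatServer(socketserver.TCPServer):
    # Allow a quick restart on the same port if the script is relaunched.
    allow_reuse_address = True


def make_server(host="0.0.0.0", port=DEFAULT_PORT):
    """Create (but do not start) the stand-in heartbeat server."""
    return HeartbeatServer((host, port), HeartbeatHandler)


# To run it for real: make_server().serve_forever()
```

Because the shop gear opened a fresh connection every 60 seconds, a stateless responder like this had a full minute between transactions to take over the server's IP address.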
For the next day and a half, production continued without slowing and the production server was rebuilt on new hardware. All the while, that Perl script dutifully sent the A-OK to the manufacturing system, which was none the wiser.
A sizable medical records company, with offices dispersed across the United States, performed a data center upgrade that brought state-of-the-art Citrix servers to handle all remote-office desktop sessions. All was well until it came time to roll out new thin clients to every office in the company -- all 48 of them.
The new clients were built on embedded Windows NT, as was the style at the time, and weeks were spent perfecting a gold image to push out to all the clients when they were deployed. The first dozen offices went like clockwork, with the new clients performing exactly as advertised. There was much rejoicing.
But the next office, which was located in the Central time zone, proved to be a problem: all the clients were configured for Eastern time, so every client built from the gold image ran one hour ahead of local time. You'd think a simple solution would be to set a DHCP option for the proper time zone in those offices. Sadly, embedded NT didn't pay attention to that option. The rollout was halted, and the wheels started turning on a solution to this problem.
Compiling a gold image for every time zone would result in four identical gold images save for a single, minor setting, which would in turn prevent the clients from being shipped site-to-site without reimaging -- not an option.