The other day, I was wading through my unreasonably active email and decided to look at a subfolder I hadn't checked in maybe a year. It's populated with inbound email matching certain parameters I generally don't care much about -- not spam, but system messages back to postmaster from a busy mail server. I suppose I should have been more conscientious about checking this folder, but generally speaking, most of the stuff coming into this mailbox is spurious.
Lo and behold, I discovered more than 20,000 emails, the vast majority of which were returns from a cronjob that someone else had implemented years ago. This cronjob was now failing: the report it created couldn't be delivered because the recipient domain no longer existed, so the mailer error came back to me, the postmaster.
In any infrastructure of reasonable size and age, there are many examples of code like this: written to fulfill a need, then abandoned or forgotten. These zombie scripts will continue to run, performing tasks that now make little or no sense. In some cases, they can be extremely problematic when circumstances change. Though they're simple to fix (just ax the cronjob), they can be difficult to uncover.
Obviously, the best way to control these things is to provide full and careful documentation, but even if complete documentation is available, that doesn't mean it's checked when infrastructure changes are made. While a cronjob that synchronizes data from one server to another might well be documented somewhere, when the source server is replaced, that script may still run in cron, trying in vain to connect to a mothballed server and do its job, for all eternity. In most cases, nobody will ever notice unless and until the script becomes a problem.
There are many cases where zombies can become big problems. One I discovered during a particularly interesting troubleshooting session was a cronjob that caused a server to assign a secondary IP address to an interface, conduct business with that source IP, then remove the secondary. The script was ostensibly written for security purposes: it relied on a mapping through the firewall to the secondary address, so removing the secondary when the job was done took out a valid target IP and prevented traffic coming through the firewall from reaching an actual server.
Naturally, even if that cronjob had been documented, nobody would have pored through the documentation to make sure the secondary IP used by this server for 30 minutes a day was still in use. Thus, after the framework requiring this component was disassembled, the script had no more purpose, but continued to assign that IP address every night. Of course, that IP address was later assigned to a production server, causing intermittent outages that couldn't easily be explained -- at least, not until we wrote a small script to capture the MAC address of that IP throughout the day. Then we were able to identify the server that was magically assuming the IP address at certain intervals.
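For illustration, a watcher of that kind might look something like the sketch below. It is not the original script. It assumes a Linux host with the iproute2 `ip` command and the iputils `ping` command, and the function names (`mac_for_ip`, `record_change`, `watch`) and the sample address are made up for the example.

```python
import re
import subprocess
import time
from datetime import datetime

# Matches the "lladdr aa:bb:cc:dd:ee:ff" portion of `ip neigh show` output.
MAC_RE = re.compile(r"lladdr\s+([0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5})")


def mac_for_ip(neigh_output, ip):
    """Return the MAC address listed for `ip` in neighbor-table text, or None."""
    for line in neigh_output.splitlines():
        fields = line.split()
        if fields and fields[0] == ip:
            match = MAC_RE.search(line)
            if match:
                return match.group(1).lower()
    return None


def record_change(history, timestamp, mac):
    """Append (timestamp, mac) only when the MAC differs from the last sample."""
    if not history or history[-1][1] != mac:
        history.append((timestamp, mac))
        return True
    return False


def watch(ip, interval=60):
    """Poll the neighbor table forever, printing a line whenever the MAC changes."""
    history = []
    while True:
        # One ping so the kernel refreshes its neighbor entry for this address.
        subprocess.run(["ping", "-c", "1", "-W", "1", ip],
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        result = subprocess.run(["ip", "neigh", "show"],
                                capture_output=True, text=True)
        mac = mac_for_ip(result.stdout, ip)
        stamp = datetime.now().isoformat(timespec="seconds")
        if record_change(history, stamp, mac):
            print(f"{stamp} {ip} is now at {mac}")
        time.sleep(interval)
```

Calling `watch("192.0.2.10")` from a machine on the same network segment and leaving it running for a day would produce a short log of when the address moved from one MAC to another, which can then be matched against the interfaces of known servers.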
Zombie scripts and procedures are part of life in IT, no matter how much we try to minimize them. They'll pop up from time to time, surfacing when a partition on a disk mysteriously fills to the brim with logging output or with files created hourly or daily that have no reaper process to constrain them. They'll cause spurious network traffic inside or outside a network segment, causing blips on monitoring graphs that can't easily be explained. If they were poorly written (not unusual at all), they will have little or no error checking and can cause huge problems when a server is upgraded or when the behavior of binaries changes enough for them to do damage, for example by running now-deprecated commands in ways that can choke a server.
But IT zombies are in many ways the opposite of "real" zombies. Those zombies are easy to find and relatively hard to dispatch (well, depending on the zombie movie you're watching). IT zombies are generally hard to find, but relatively easy to disable. The only protection we have against IT zombies is to remember that they exist, lurking in cronjobs and scheduled tasks anywhere and everywhere, ready to cause mayhem when just the right set of bits is flipped.
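One way to start looking is simply to list every scheduled job on a server in one place. The sketch below is a minimal example of that idea, not a complete audit tool. It assumes a Unix-like system with standard crontab syntax; the file locations in `SOURCES` vary by distribution and are assumptions, reading the spool directories usually requires root, and the names `parse_crontab` and `inventory` are invented for the example.

```python
import re
from pathlib import Path

# Lines such as MAILTO=root set environment variables and are not jobs.
ENV_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*\s*=")

# (glob pattern, whether entries carry a user field). Assumed locations.
SOURCES = [
    ("/etc/crontab", True),
    ("/etc/cron.d/*", True),
    ("/var/spool/cron/*", False),           # common on Red Hat-style systems
    ("/var/spool/cron/crontabs/*", False),  # common on Debian-style systems
]


def parse_crontab(text, has_user_field=False):
    """Return a list of (schedule, user, command) tuples from crontab text."""
    jobs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ENV_RE.match(line):
            continue
        # "@daily"-style shortcuts use one schedule field; otherwise five.
        n_sched = 1 if line.startswith("@") else 5
        n_lead = n_sched + (1 if has_user_field else 0)
        parts = line.split(None, n_lead)
        if len(parts) <= n_lead:
            continue  # malformed or incomplete entry
        schedule = " ".join(parts[:n_sched])
        user = parts[n_sched] if has_user_field else None
        jobs.append((schedule, user, parts[-1]))
    return jobs


def inventory(sources=SOURCES):
    """Yield (path, schedule, user, command) for every readable crontab file."""
    for pattern, has_user in sources:
        anchor, _, tail = pattern.rpartition("/")
        for path in sorted(Path(anchor or "/").glob(tail)):
            if not path.is_file():
                continue
            try:
                text = path.read_text(errors="replace")
            except OSError:
                continue  # unreadable without sufficient privileges
            for schedule, user, command in parse_crontab(text, has_user):
                yield str(path), schedule, user, command
```

Printing the results of `inventory()` on each server gives a flat list of commands that can be checked against what is supposed to be running. It will not find jobs scheduled by other means, such as systemd timers or `at`, so it is a starting point rather than a guarantee.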
When an IT problem quickly turns from inexplicable toward impossible, it behooves the troubleshooter to remember that these creatures exist, and that it might be time to grab a crossbow and go on a zombie hunt.
This story, "Zombie scripts can attack at any time," was originally published at InfoWorld.com. Read more of Paul Venezia's The Deep End blog at InfoWorld.com.