A big part of IT management is handling upgrades and updates of all systems, critical and otherwise. Just about every aspect of IT, every piece of hardware and software, will need to be updated at some point along the line. It may be applying a firewall firmware upgrade or installing patches to an operating system or application stack. In every single one of those cases, there's some chance the update will blow up everything.
From a purely logistical point of view, there are only three possible outcomes to a firmware or software update:
- Everything goes as planned. Bugs are fixed or new functionality is added, and everything proceeds normally.
- There's no noticeable difference in operation or administration aside from a version number ticking upward.
- You've just turned a working system into a brick.
The odd thing is that there's really no good way to guard against that last possibility. Generally speaking, I update software and firmware only if there's a clear-cut reason to do so, such as a significant feature addition, a performance increase, a security fix, or a major bug that's causing problems. I do not upgrade just because there's a new version out. I've been bitten too many times.
Due diligence is the name of the game in every upgrade plan. Researching the new version is imperative, especially regarding how well it plays with other elements of the device or with software that may be running on the same system. If you run across forum posts or blog entries describing problems with the new version, it pays huge dividends to inspect them carefully and make sure you're not about to fall into the same trap. That said, there's absolutely no way to guarantee you won't get stung when you poke at the hornet's nest.
I've had firmware updates go south because of something as simple as bad timing. In one case, the vendor's mechanism for updating required that the device download the update itself after rebooting to a special upgrade mode. Naturally, the vendor's firmware repository went down halfway through the download and left me with a device in an extremely unstable state, with no path forward or back. After hours of digging into a problem that had apparently never occurred before, I was able to trick the update code into thinking it hadn't yet started the download and was able to recover the device.
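One way to take the vendor's repository out of the critical path is to stage the image yourself and verify it before the device ever reboots into upgrade mode. The sketch below is a minimal example, assuming a vendor that publishes a SHA-256 checksum alongside each release; the file paths and function name are hypothetical, and the stand-in "image" at the end exists only to demonstrate the check.

```shell
#!/bin/sh
# Pre-flight check (hypothetical): verify a locally staged firmware image
# against the vendor-published SHA-256 before flashing, so the upgrade never
# depends on the vendor's repository staying up mid-update.

verify_firmware() {
    image="$1"      # path to the firmware image staged locally
    expected="$2"   # SHA-256 published on the vendor's release page

    actual=$(sha256sum "$image" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $image matches published checksum"
        return 0
    else
        echo "ABORT: checksum mismatch for $image" >&2
        return 1
    fi
}

# Demonstration with a stand-in image file
printf 'firmware-bytes' > /tmp/fw.bin
verify_firmware /tmp/fw.bin "$(sha256sum /tmp/fw.bin | awk '{print $1}')"
```

Only after the check passes would you hand the image to the device's updater; a mismatch means a corrupt or truncated download, and flashing it is exactly how a device ends up in no-man's-land.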
In other cases, the star-crossed devices turned into paperweights that had to be returned. This is especially common with vendors whose update tools are spotty about verifying that bootloader and firmware versions match, leading to situations where the firmware gets updated but the bootloader doesn't, and the device (typically a switch) simply won't boot again.
I've done mass BIOS upgrades on bunches of identical blade servers, only to have one out of a dozen fail to reboot, completely hosed, with no POST, no service processor communication, nada. There was no rhyme or reason to it. The blades were procured at the same time, and all BIOS versions were the same, but one blade went from perfectly normal operation to completely dead on one reboot.