Keeping up with tech changes can be daunting. But it's baffling that some IT pros refuse to take active steps to stay current.
Such obstinacy caused an upgrade to a medical corporation's most critical system to turn catastrophic. This company, which I'll call "ABC Medical," has roots that go back decades to when it was a single hospital in a small U.S. town. A couple of people who started with the company straight out of school eventually progressed to become the senior managers of the IT department.
However, as the company aggressively bought up competitors' hospitals and grew into a large enterprise with facilities across multiple states, the IT managers did not keep their technical knowledge current. They maintained the attitude that their department and skills were fine with only minimal, intermittent upgrades, and that attitude kept them from implementing even the most basic industry-standard practices.
Take HIPAA, ARRA, and Obamacare, then add the exponential growth of technology in business over the last 20 years. Mix that with a hefty dose of stubbornness, and you have a recipe for disaster.
The most critical IT system in a hospital is the patient record system. ABC Medical's vendor had been warning that it was going to drop all support for the product and strongly recommended an upgrade to the current version, but the IT managers limped along with the outdated solution.
The database for the patient record system had grown well beyond what the solution was designed to handle. To make matters worse, the database server was a badly underprovisioned virtual machine running on an old version of a common virtualization platform. And forget updates to the OS or to the database software itself -- "Updates break stuff!" the server team manager would often quip. Database indexing? What's that? So the patient record system already ran at glacial speeds.
However, changes in the law finally forced the server team to upgrade the patient record system. The manager scheduled the change for the coming weekend -- news to the team, for sure.
Question: Any RFCs put in for this major change? Reply: What's an RFC? Question: Any notification sent out to the user community? Reply: Well, yeah -- on Saturday morning, 10 minutes before the upgrade we'll tell everyone to get out of the system for "a few minutes."
This did not bode well.
Here goes nothing
The day came and there were no major issues with the process itself, aside from the fact that it took six hours to upgrade the schema on the database. The application servers were updated in about an hour, then it was all brought back up. The requested "few minutes" stretched on and on, until finally an email went out to the users telling them they could resume using the patient record system.
But there was a problem: The new version of the software read and wrote a lot more data. As a result, the database transaction queue went from "barely keeping up" to "thousands of transactions behind" in less than an hour. The fix? Taking the system down for another hour to "let the database server catch up," but making no functional changes at all to the server configuration.
When it was brought back up, the same thing happened. The next step was to start a conference call with the software vendor to ask for assistance. The vendor asked about the database server specs and politely suggested that it may not be fast enough for the extra workload. "No, it has been working fine for years! All we did was install this (expletive) upgrade, and now our server is crashing! It's not the database server! You broke this (expletive expletive) and you are going to (expletive) fix it!" the server team's manager yelled red-faced at the speakerphone.
The vendor proceeded with whatever other troubleshooting it could. In the meantime, none of ABC Medical's multiple hospitals could pull up any patient record data, and the medical personnel had to chart by hand on paper. As the patients stacked up, patience with the IT department grew very thin.
Days of pointless troubleshooting followed, all of it on systems other than the database server, because the manager refused to believe anything could be wrong with it. At one point, he floated the idea of rolling the whole thing back to its pre-upgrade state. But it was a moot point -- no backup of the system had been taken before the upgrade.
A victory of sorts
Finally, after the system had been down for days and after every other conceivable solution had been tried, the IT manager agreed to give a new database server a try. After it was racked, set up, tested, and ready for action, the system was brought back up -- and it worked! Problem solved.
I would like to be able to end this story with a happy note that the server team's manager was fired and sent to flip burgers somewhere, but that's not how it worked out. When the manager had to report to the board of directors on the matter, he threw the vendor under the bus -- and got a round of congratulations for his handling of the crisis.
The sad thing is that this catastrophe was completely preventable if basic disaster recovery practices had been followed. Back up, back up, back up! And don't make any major change unless you have a foolproof plan for backing out.
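That backout plan can be sketched in a few lines. This is a minimal illustration, not ABC Medical's actual setup: it assumes a single-file SQLite database (a real patient record system would need its vendor's backup tooling), and `safe_upgrade`, `upgrade_fn`, and the `.pre-upgrade` suffix are hypothetical names. The point is the order of operations: snapshot first, attempt the change, and restore the snapshot if anything fails.

```python
import shutil
import sqlite3

def safe_upgrade(db_path, upgrade_fn):
    """Snapshot the database file, attempt the upgrade, restore on failure.

    Hypothetical sketch: assumes a single-file SQLite database that no one
    else has open while the upgrade runs.
    """
    backup_path = db_path + ".pre-upgrade"
    shutil.copy2(db_path, backup_path)      # 1. back up before touching anything
    conn = sqlite3.connect(db_path)
    try:
        upgrade_fn(conn)                    # 2. attempt the schema upgrade
        conn.commit()
        conn.close()
        return True
    except Exception:
        conn.close()                        # closing discards the failed transaction
        shutil.copy2(backup_path, db_path)  # 3. back out: restore the snapshot
        return False
```

Had anything like this been in place, the failed upgrade would have meant restoring a snapshot and trying again later, not days of downtime and paper charts.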