Murphy's Law can hit even the best IT shops, despite all your efforts to avoid it. You may have carefully instructed your crew on how to properly execute a task and gone back to verify your directions were being followed, but it pays to revisit the process on occasion and confirm that the system is still working. One experience early in my career hammered home this lesson and made me grateful for today's much more streamlined backup process.
The company I worked for almost 20 years ago manufactured and distributed consumer goods, and it relied on tape backups at its 15 remote locations to protect against catastrophic server failures. If a failure happened, we'd be able to restore the previous night's image and only need to re-create the work done so far that day.
This was not an ideal setup. But it was the best we could offer due to the slowness of the tape process and the limited staff we had. Requests to management to hire additional IT employees or improve the backup process had fallen on deaf ears.
It takes a lot of steps
The process worked well overall. Our backups were done at night when they wouldn't interfere with other office duties. Each location had a two-week set of 10 tapes, labeled for each workday (such as Week 1: Monday). Each location also had an employee assigned to take out the previous day's ejected tape, place it in its rotation slot, and insert the tape for that day's backup.
That person was also responsible for sending the backup tape from the last working day of the month to us at HQ, then adding a blank tape to the rotation to replace it. We received tapes from every location each month, giving us seven years of offsite data storage in case of audits, and letting us reproduce data on our spare servers more quickly than on a remote location's servers if the need arose.
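The rotation scheme above is simple enough to sketch in a few lines. This is purely an illustration in modern Python, not anything we ran at the time; the `tape_label` function and the alternate-week rule (using ISO week parity) are my assumptions about how the ten slots mapped to dates:

```python
from datetime import date

# The five workday tapes in each of the two weeks of the rotation.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

def tape_label(d: date) -> str:
    """Return the rotation-slot label (e.g. 'Week 1: Monday') for a
    workday's backup tape, alternating weeks by ISO week parity.

    Hypothetical helper for illustration only.
    """
    weekday = d.weekday()  # Monday == 0
    if weekday > 4:
        raise ValueError("backups ran on workdays only")
    week = 1 if d.isocalendar()[1] % 2 == 1 else 2
    return f"Week {week}: {WEEKDAYS[weekday]}"
```

With labels like these, the on-site employee only had to match the ejected tape to its slot and load the tape whose label matched the current day.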
For the daily backups, we set it up so that at 4 p.m. our remote servers would run a tape inventory and notify us if the drive was empty. This would give us time to contact someone at the location and get the correct tape inserted into the drive. (Half days before holidays were a real pain, but that's another story.)
The next step occurred at 10 p.m., when the tape drive would erase the tape in the drive, reformat it, and email us a report of the step's success or failure. At 11 p.m. the backup would start, taking 2 to 4 hours to complete depending on the location's database size. Once it finished, the software would eject the tape and email us the backup's success or failure.
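The nightly sequence can be summarized as a simple pipeline: check the drive, format the tape, run the backup, eject, and report at each stage. Here is a minimal Python sketch of that flow, with a simulated `TapeDrive` class standing in for the real vendor software and an injected `notify` callback standing in for the email reports (all names here are hypothetical, for illustration only):

```python
class TapeDrive:
    """Simulated tape drive; the real process used vendor backup
    software on the remote servers. Hypothetical interface."""

    def __init__(self, tape_loaded: bool = True):
        self.tape_loaded = tape_loaded
        self.ejected = False

    def has_tape(self) -> bool:
        return self.tape_loaded

    def erase_and_format(self) -> bool:
        return self.tape_loaded

    def run_backup(self) -> bool:
        return self.tape_loaded

    def eject(self) -> None:
        self.ejected = True

def run_nightly_backup(drive: TapeDrive, notify) -> bool:
    """Sketch of the nightly sequence (times are the scheduled jobs
    described above; here they run back to back for illustration)."""
    # 4 p.m. inventory check: warn early so someone can load a tape.
    if not drive.has_tape():
        notify("WARNING: no tape in drive")
        return False
    # 10 p.m. erase and reformat the tape, reporting any failure.
    if not drive.erase_and_format():
        notify("FAILURE: could not format tape")
        return False
    # 11 p.m. run the backup, then eject and report the result.
    ok = drive.run_backup()
    drive.eject()
    notify("SUCCESS" if ok else "FAILURE")
    return ok
```

The 4 p.m. check is the important design choice: by verifying the drive hours before the backup ran, there was still time to reach someone on site and fix an empty drive before the nightly window.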
I would get up early and check the reports from home (no smartphones then). If there was a failure at a location, I would contact a yard foreman and walk them through either replacing the tape or reinserting it so that I could clear the error and still have most backups completed before office staff would arrive.
It was a decent plan given our circumstances, and years went by with few problems. Then I uncovered a major issue.
How long has this glitch been going on?
The end of the month was a busy time not only for our accounting office, but for me as well. Our AR/AP/GL/INV software required a lengthy process of compacting data and reallocating space for the month's closing. This whole process was administered and run from HQ after hours, so I would put the daily backups on hold until the closing was completed.