He found a small problem at this point: There was no data. The database binaries were all there, newly installed, but all the data file systems were empty.
They figured the problem was a file system corruption, which had been a problem with other recent projects. They tried to unmount and mount the file systems again. Still no data. Then they ran some file system checks, which ran cleanly. Yet again, no data. They called SAN support to verify the disks. There was nothing wrong with them.
After several of these futile attempts, they went to the fallback plan of moving the disks back to the old server, restoring the backup, and starting the database.
So they did. After mounting the file systems on the old server, they noticed they were still empty (in their minds, corrupted), including the file system where the backup was stored. Again they made several troubleshooting attempts to recover these "corrupt" file systems.
At that point it was almost 8 a.m. on Monday, end-users needed to start working, and everybody was getting nervous. They finally realized the data was gone for good and there was no choice but to restore from the daily tape backups.
First problem: They hadn't checked the backup tape before starting the change, and the backup for Sunday had not completed by the time they began. So they had to use Saturday's backup, and one day's worth of data was lost. Second problem: Restoring close to 1TB from tape takes time. It was only on Monday afternoon that the data was restored.
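The first problem suggests a simple gate before any change begins: confirm that the most recent backup actually finished, and that it is recent enough. The sketch below is only an illustration of that check; the function name, the `max_age` threshold, and the idea that a completion timestamp is available from the backup tool are all assumptions, not details from the story.

```python
from datetime import datetime, timedelta
from typing import Optional


def backup_is_usable(
    completed_at: Optional[datetime],
    change_start: datetime,
    max_age: timedelta = timedelta(hours=24),  # assumed threshold, adjust to policy
) -> bool:
    """Return True only if the backup finished before the change and is recent.

    completed_at is None when the backup is still running or has failed
    (hypothetical convention for this sketch).
    """
    if completed_at is None:
        return False
    if completed_at > change_start:
        # A "completion" time after the change started cannot protect the change.
        return False
    return change_start - completed_at <= max_age
```

With a gate like this, a backup that is still running would block the change rather than being discovered missing afterward.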
Then a whole bunch of different issues occurred. Consistency checks failed. Logs were missing. They had to downgrade the database binaries. Users' access and privileges were lost. Finally, on Tuesday night, the database was back to business as usual.
The problem had occurred because the database administrator's script removed all the files, including the backup. And the server administrator hadn't distinguished an empty file system from a corrupted one.
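A rough first check for telling the two cases apart might look like the sketch below. It assumes that a mount point which lists cleanly with zero entries is more likely empty than corrupt, while one that raises an error on listing deserves a closer look. The function name and the choice to ignore `lost+found` are illustrative assumptions, and a check like this does not replace a proper file system check.

```python
import os


def classify_mount(path: str, ignore=("lost+found",)) -> str:
    """Classify a mounted directory as 'populated', 'empty', or 'unreadable'.

    'unreadable' means listing the directory raised an OS-level error
    (missing path, permission problem, I/O error), which warrants
    further investigation rather than proving corruption.
    """
    try:
        with os.scandir(path) as entries:
            names = [entry.name for entry in entries if entry.name not in ignore]
    except OSError:
        return "unreadable"
    return "populated" if names else "empty"
```

In the story, the file system checks ran cleanly and the directories listed without errors, which under these assumptions points to "empty" rather than "corrupt".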
To say the least, a thorough review was conducted and action plans were implemented to (hopefully) prevent a repeat of such an incident, including ways to communicate more effectively with each other across different countries, time zones, and first languages. But no matter how sophisticated technology becomes or how far a company reaches globally, there's still a human factor involved -- for better or for worse.
This story, "Where have all the files gone?," was originally published at InfoWorld.com.