October 27, 2009

An icy hard drive and the pending payroll

After the client's hard drives and backups failed, unusual measures had to be taken to make sure 800 employees would get paid

We are a vendor of time and attendance systems, and our customers use the data to process payroll. Our help desk takes many calls from customers, mainly about training issues, rule changes, or errors that occur.

The worst call we can get is from a client who "cannot process payroll." It means the customer cannot produce a file to feed the payroll data, and unless it is fixed quickly, the employees will not be paid on time. Needless to say, tensions run high while troubleshooting such calls.

We had a new person join us on the help desk. He had been on board for a couple of weeks when he got the dreaded call from one of our long-term clients that had about 800 employees on the system. The new tech followed protocol and notified the help desk lead, who notified the manager and the VP. They had all come over and were getting up to speed with the situation when they heard, "The customer lost a second drive in their RAID array, the server is down." It turns out the customer had lost the first hard drive a few days earlier and had not noticed. With the loss of the second drive, the database server had lost the database and could not run our application.

The tech continued troubleshooting with the client's IT staff when we overheard, "The backup from the last two nights is no good, they can't get a valid database."

All of us were getting more and more tense at the thought of 800 employees not getting paid on time and accounting staff redoing all the data.

Then we heard the new tech tell the customer, "Remove both drives and put them in the freezer for an hour." We could not believe what he had told them. The new tech continued, "At my last position, we used this method to recover data when a hard drive failed. It doesn't always work, but it's worth a shot." We thought it was a strange idea, but figured there was nothing to lose at this point, so why not?

The incredulous customer removed the two hard drives and placed them in the freezer. After an hour, he put them back into the server. They worked!

The customer was able to recover the data, obtain a good copy of the database, and get payroll out. The hard drives worked for a few hours, then failed completely.

We have used this cooling technique a number of times since that incident -- most recently when our CEO lost his hard drive on his laptop computer.

I guess you could say there are two morals to the story: One, don't be afraid to think outside the box -- and listen to those who do so. And two, make sure your data is backed up because the Freezer Fairy does not always grant a second chance to recover lost work.

This story, "An icy hard drive and the pending payroll," was originally published at InfoWorld.com.

knucklebusted 27-Oct-09 11:27am
1 reply
The freezer trick is pretty old. I have used it a number of times. Although, I usually put them in a ziplock bag before introducing them to the freezer to avoid moisture. Back in the days of less-than-gigabyte drives, there was a growing problem known as head stiction, where the head adheres to the media and the motor is too weak to break it free at power on. The only viable solution was to hold the drive between both hands, oriented so that the platters were parallel to the palms, then give a quick rotation at the wrists to dislodge the heads. This usually would resolve the problem until the drive was powered down the next time.
fushigi 30-Oct-09 6:14am
1 reply
+1.

A "gravity reset" would also sometimes free a drive with stiction issues. For those that don't know the term, that means you drop the drive with the platters parallel to the ground. Typically from about 6" above a hard surface like a desk top.

knucklebusted 30-Oct-09 6:28am
I have heard that called percussive maintenance. You have to be careful though. I had a guy see me do this. The next time I worked on one of his systems he had dropped the whole machine repeatedly at ever increasing heights. Cards had popped out of slots and cables had come loose. Not to mention it looked like it had been dragged along the highway!
Accounting IT Guy 27-Oct-09 1:20pm
To the IT guys at the client company..... The last two nights of backups were bad? ....But the data on the disk was ok? Either the backups were never good, or you didn't know how to restore them. Either way, fail. :)
timwillin 28-Oct-09 5:03am
Just performed this again on a laptop drive this past weekend. The drive was exhibiting the click of death and was unreadable. Double bagged the drive and froze it overnight. Pulled it out the next day and after warming back up, I was able to read and copy the contents. Now the drive doesn't even click and is still readable. Usually this only works when the drive controller is going bad and overheating.
jeffmendillo 10-Nov-09 10:14am
So how does freezing the drive fix the stiction problem? I would think this would just make things worse. I've heard it works, just curious how.
itbuilders 10-Nov-09 9:53pm
The freezer trick typically works if there is a problem with platter rotation (e.g., damage or wear to the bearing). The cooling causes the metal parts to contract, thus reducing friction and allowing a stalled motor to spin up. The issue with moisture doesn't come into play until you remove the drive from the freezer. The cool surfaces condense ambient moisture in the air, similar to a chilled glass or beer bottle. In addition to cooling the afflicted drive, you can wrap it with frozen gel packs, similar to those used in picnic coolers. This extends the working time of the failing drive. Another trick is to prioritize data. You have a very finite amount of recovery time, so focus on copying critical data first. Do NOT run repair utilities (scandisk, chkdsk, fsck/e2fsck, reiserfsck, iostat -E, etc.) as this just wastes valuable time. No sense rearranging deck chairs on the Titanic when you need to be loading life boats!
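The "critical data first" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a recovery tool: the function name `copy_by_priority` and the folder names in the usage note are hypothetical, the failing drive is assumed to be mounted and readable as an ordinary directory, and a file that throws a read error is simply skipped so the remaining time goes to files that still copy.

```python
import shutil
from pathlib import Path


def copy_by_priority(source_root, dest_root, priority_dirs):
    """Copy the folders in priority_dirs first (in the given order),
    then everything else under source_root.

    Returns (copied, failed): lists of paths relative to source_root.
    """
    source_root = Path(source_root)
    dest_root = Path(dest_root)
    copied, failed = [], []
    seen = set()  # relative paths already attempted, so nothing is read twice

    def copy_tree(start):
        if not start.exists():
            return
        if start.is_file():
            files = [start]
        else:
            files = sorted(p for p in start.rglob("*") if p.is_file())
        for src in files:
            rel = src.relative_to(source_root)
            if rel in seen:
                continue
            seen.add(rel)
            dst = dest_root / rel
            try:
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dst)
                copied.append(rel)
            except OSError:
                # Do not retry a bad file; move on while the drive still spins.
                failed.append(rel)

    # Pass 1: the critical folders, in the order the caller listed them.
    for name in priority_dirs:
        copy_tree(source_root / name)
    # Pass 2: whatever is left.
    copy_tree(source_root)
    return copied, failed
```

For example, `copy_by_priority("/mnt/failing", "/mnt/rescue", ["payroll_db", "documents"])` (made-up mount points and folder names) would attempt the payroll database before anything else and report which files could not be read.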
©1994-2009 Infoworld, Inc.