How to deal with a storage capacity crisis

Sometimes, despite best-laid plans, storage capacity hits the wall and creates an emergency situation. Don't panic -- fix it right

When faced with a storage capacity emergency, you almost always find yourself under enormous pressure to deploy a quick fix. "That's OK, I'll fix it later," you tell yourself, only to live with the unfortunate consequences of your actions for years to come. I know -- it's happened to me.

Running out of storage unexpectedly in the midst of production is all too common in the era of the enterprise data explosion. Sometimes it happens because proper capacity monitoring isn't in place. Sometimes it's because of an unforeseeable growth spurt in storage consumption. Whatever the cause, the onus is on IT to make more storage available as fast as possible.

The trick to avoiding quick fixes is discipline. Management may clamor for a quick solution, but be steadfast, rearrange your work schedule, and dedicate yourself to fixing it the right way. That's the only way to avoid anguish in the long run.

The "right way" doesn't always need to take much longer than the "wrong way," either. Here are two real-life examples of capacity emergencies and an expedient ways to fix them that won't haunt you forever.

Storage capacity crisis case No. 1: The full file server

It's 9:30 a.m. on a Monday. The help desk phone line has just started to explode with frantic calls from users -- apparently nobody can save documents to the network. A quick check reveals that the data volume on the corporate file server is packed to the gills. The server in question doesn't have any free drive slots left, so ordering additional disk for overnight delivery is a no-go. The whole server will need to be upgraded and who knows when that'll happen.

Given that all of the critical file shares for the company reside on this volume, the problem must be fixed -- and fast.

The most obvious action to take is to find data on that volume that shouldn't be there and delete it to buy time. Perhaps there's a copy of a software installation CD or a service pack that you can be absolutely sure nobody will need right away. However, that might free up only a few hundred megabytes, which will be gone in the blink of an eye.
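If you'd rather not hunt for deletion candidates by hand, a few lines of script can rank the biggest offenders on the volume. Here's a minimal Python sketch; the D:\Shares path is a placeholder for your data volume:

import os

def largest_files(root, top_n=20):
    """Walk the tree under root and return the top_n biggest files."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # skip files we can't stat (locked, vanished, etc.)
    return sorted(sizes, reverse=True)[:top_n]

if __name__ == "__main__":
    for size, path in largest_files(r"D:\Shares"):  # hypothetical volume path
        print(f"{size / 2**30:8.2f} GB  {path}")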

In my experience, the most common way to solve this problem is to pick one large, less-used file share and move it to a different server, one that happens to have some space -- usually an application server. The best candidates are usually file shares for smaller departments, like marketing or finance, that have few users but lots of data. With a low user count, you need only work with a few employees and redirect their drive mappings.
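Repointing those drive mappings is quick work with the standard Windows net use command. Here's a rough Python sketch of the idea -- the drive letter and share name are hypothetical, and you'd run it (or an equivalent login script) on each affected user's machine:

import subprocess

DRIVE = "M:"
NEW_SHARE = r"\\appserver1\marketing$"  # the share's temporary new home

# Drop the stale mapping; ignore failure if the drive isn't mapped.
subprocess.run(["net", "use", DRIVE, "/delete"], check=False)

# Map the drive letter to the share's new location.
subprocess.run(["net", "use", DRIVE, NEW_SHARE, "/persistent:yes"], check=True)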

That's still an ugly fix, of course. Fortunately, one fairly easy step can make it painless to unwind once your new file server arrives in a few weeks: Set up DFS (distributed file system).

If you're not using DFS already, it can be a real lifesaver. Essentially, DFS allows you to create a virtual file-sharing namespace (a DFS root) that contains transparent file share mappings (DFS links). From a user's perspective, the DFS root appears to be a normal file share with normal folders. However, those folders are links to the file shares where the data is actually stored. So, while a user might browse to \\example.com\network\departments\marketing, they're actually being redirected to \\appserver1\marketing$ without being aware of it.

This can be very valuable to admins because they can easily change where the DFS links point without users ever noticing. Thus, when the new file server shows up and you're ready to move the data back to it, all you need to do is move the data during off hours and update the DFS link to point to the new share. Having DFS in place will also make the rest of the file server migration easier to accomplish with minimal user disruption.
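To make the indirection concrete, here's a toy model in Python. This isn't the real DFS API -- in practice you'd repoint the link through the DFS management tools -- and any server names beyond those above are hypothetical, but it shows why the switch is invisible to users:

# The DFS root maps user-visible folders (DFS links) to real file shares.
dfs_links = {
    r"\\example.com\network\departments\marketing": r"\\appserver1\marketing$",
}

def resolve(user_path):
    """Return the real share a user lands on when opening a DFS path."""
    return dfs_links.get(user_path, user_path)

print(resolve(r"\\example.com\network\departments\marketing"))
# -> \\appserver1\marketing$

# New file server arrives: move the data off hours, then repoint the link.
# The user-visible path never changes; only the target behind it does.
dfs_links[r"\\example.com\network\departments\marketing"] = r"\\newfs1\marketing$"
print(resolve(r"\\example.com\network\departments\marketing"))
# -> \\newfs1\marketing$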

Storage capacity crisis case No. 2: The overallocated SAN

Earlier this morning (also at 9:30 a.m. on a Monday, because that's when everything bad happens), you were confronted by a department director and an engineer from the software vendor that is upgrading a mission-critical application your organization relies on. Apparently, the new version uses twice as much storage as the old one -- a net increase of 500GB. Also, they need the space this afternoon. The fact that nobody bothered to warn you doesn't matter -- the director in question plays golf with your boss's boss, and the vendor's engineer flew in from San Diego and is getting paid a ridiculous hourly rate. Crunch time!

Two years ago, when you got your fancy new SAN, you were pretty sure that there was no way your company would ever use 8TB of storage. What could possibly fill that up? Now, however, you have your answer: lots of stuff. The SAN was originally purchased to store a new document imaging system, but since then, you've built a server virtualization environment on it and migrated a large chunk of your physical servers into it. Now there's barely enough free space left to take a few snapshots, much less the additional 500GB you'll need. A new shelf of storage is in the budget, but this problem needs to be solved right now.

The best option you can muster is moving a utility server off the SAN and onto a retired server with adequate direct-attached storage -- or simply giving the vendor that old server instead of SAN space. Neither of these options is palatable.

In the first instance, you'll be moving a server, only to move it back again later once you have available SAN space. In the second, you'll put yourself in a situation where a mission-critical server will be built on obsolete hardware.

There may be another way. When initially presented with a large centralized storage device, there's often a strong tendency to overallocate the first few volumes that are created. After all, two years ago when it was installed, the new document management system was going to be "huge," according to every indicator, so giving it a 2TB volume seemed like a no-brainer.

To date, though, it has used only 600GB of that space, leaving more than 1TB unused. Unfortunately, shrinking a production NTFS volume is a risky proposition at best. Worse, you don't have enough space to create a replacement volume into which you can move that application's data and free up the space, either.

Or do you? If your SAN is worth its salt, it probably supports thin provisioning. A thin-provisioned volume presents its full logical size to the host but consumes physical capacity only as data is actually written, which means you can reclaim the unwritten space in an existing, overallocated volume and use it for other purposes.

It can also be extremely dangerous: If applications eventually write into the space you've reclaimed, the SAN will really run out of room and you'll be in serious trouble. However, thin provisioning a large, overallocated volume can give you the flexibility to shuffle items around and dig yourself out of the hole. So long as you add storage resources soon after thin provisioning -- or migrate the data on the overallocated volume to a correctly sized volume and discontinue the thin provisioning -- this cheat can be invaluable in a pinch.
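To see why it's both useful and dangerous, here's a toy model of a thin-provisioned pool in Python. Every number in it is made up to mirror the scenario above, but the arithmetic is the point: logical commitments can exceed physical capacity, and the headroom is only real until someone writes into it:

POOL_GB = 8192  # 8TB of physical capacity on the SAN (illustrative)

# (logical size, space actually written) per volume, in GB -- all figures
# invented for illustration, not read from a real array.
volumes = {
    "imaging": (2048, 600),   # the overallocated document-imaging volume
    "virtual": (6000, 6000),  # the server virtualization environment
    "new_app": (500, 500),    # the upgrade that started the fire drill
}

logical = sum(size for size, _ in volumes.values())
written = sum(used for _, used in volumes.values())

print(f"Logical commitments: {logical} GB")            # 8548 GB -- more than the pool
print(f"Physically written:  {written} GB")            # 7100 GB
print(f"Real headroom:       {POOL_GB - written} GB")  # 1092 GB, for now

# The danger: the headroom exists only until applications write into their
# unused logical space. Once written approaches POOL_GB, every thin volume
# on the pool is at risk at once.
if logical > POOL_GB:
    print("Pool is overcommitted -- add capacity before the headroom is consumed.")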

Putting it all together

Clearly, the best way to avoid both of these problems is to have good capacity monitoring in place and plan to add storage resources well before needing them becomes an emergency. But even the best planning can't foresee all incidents.
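Even rudimentary monitoring beats none. As a starting point, here's a minimal Python sketch that flags any volume crossing a fullness threshold; the volume list and the 80 percent threshold are assumptions you'd adapt to your environment:

import shutil

VOLUMES = ["C:\\", "D:\\"]  # volumes to watch (hypothetical)
WARN_AT = 0.80              # flag anything more than 80 percent full

for vol in VOLUMES:
    usage = shutil.disk_usage(vol)  # named tuple: total, used, free (bytes)
    pct = usage.used / usage.total
    status = "WARNING" if pct >= WARN_AT else "ok"
    print(f"{vol}  {pct:5.1%} full, {usage.free / 2**30:.1f} GB free  [{status}]")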

When you're presented with a storage capacity crisis, take the time to consider all your options and resist the temptation to run with the first idea you come up with. You may be able to find a solution that delivers the results your users need without chaining yourself to an ugly fix forever.

This article, "How to deal with a storage capacity crisis," originally appeared at InfoWorld.com. Read more of Matt Prigge's Information Overload blog and follow the latest developments in network storage and information management at InfoWorld.com.
