The data explosion gains momentum with each passing day, and the storage solutions that help us wrangle our increasing mountains of information grow more and more complex. As I mentioned in my blog a few weeks ago, that complexity can only decrease overall reliability as more and more lines of hastily written code drive the most critical portions of our infrastructure.
All of this puts a much more critical focus on the support that your storage vendor provides. Instead of seeing a broken drive or controller that just needs replacing, I often run across problems in the field that turn out to be software bugs requiring engineering input to diagnose and resolve.
While we're becoming more and more dependent on vendor support to keep the infrastructure humming, the quality and reliability of that support seem less and less likely to live up to our expectations.
The way support should work
Though I had to go outside IT to find a good example -- which says something in itself -- I recently had an experience that reminded me of what enterprise support should be.
A few nights ago, I got home from a long day on the road. As I was heading inside, I noticed an enormous puddle of transmission fluid underneath the front of my truck -- bad news. A quick check showed that there was literally no fluid in the transmission; I was lucky to have made it home at all. One thing was clear: That truck wasn't going anywhere on its own.
Given that it was late at night, there wasn't really anyone I could call, so I dropped an email to my mechanic, let him know what had happened, and asked for advice. First thing in the morning, he called me on my cell phone and said that, based on my description of the problem, he knew what was broken and what the solution would be. He had already dispatched a tow truck to pick it up that morning and would probably have it ready to return to me by the end of the day. He was true to his word.
That kind of above-and-beyond service is rare in any situation, but it is exactly what we should expect from the vendors supporting the mission-critical storage infrastructure on which our entire businesses are built. A direct, knowledgeable response from a known contact, a clear action plan, and a set time for problem resolution are the key components of superb support.
The way support really works
So how can you guarantee you'll get this kind of service when your organization's electronic livelihood depends upon it?
Honestly, you can't.
Just because you paid for an expensive four-hour onsite response warranty with the SAN you just bought doesn't mean you'll get it. Or rather, that four-hour warranty probably doesn't mean what you think it means. Though terms vary from vendor to vendor, that four-hour response probably means a low-level technician will call you within four hours.
Either that or you're likely to spend at least a few hours running through a support script with someone who has very little understanding of the technology. If the cause of the problem you're experiencing isn't cut and dried (such as obviously broken hardware), you can get stuck in an interminable maze of nonsensical troubleshooting steps before anyone will dispatch a tech on site and really get to work on solving the problem.
Once that's finally done, if you aren't located in or very close to a large city, you can count on a few more hours for a tech to be dispatched, track down parts that he or she needs, and actually arrive. Then the problem still has to be identified and fixed.
Premium vs. "normal" support
Most large storage vendors have a special grade of mission-critical support for situations in which a four- to six-hour fix (at any cost) is required, but such plans tend to be very expensive. When you're shopping for a SAN, the cost of these warranties, often more than half of the cost of the device itself, may seem ludicrous. Maybe so -- but make sure you weigh that opinion against what it costs your company to have the device sit unavailable while clueless first-tier support folks ask you inane questions.
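One rough way to do that weighing is to ask how many hours of avoided downtime it would take for the premium contract to pay for itself. The sketch below is only an illustration: the device price, the support ratio, and the hourly cost of downtime are made-up placeholders, not figures from any vendor, and you would substitute your own.

```python
def break_even_hours(premium_support_cost: float, downtime_cost_per_hour: float) -> float:
    """Hours of avoided downtime at which the premium contract pays for itself."""
    if downtime_cost_per_hour <= 0:
        raise ValueError("downtime cost per hour must be positive")
    return premium_support_cost / downtime_cost_per_hour


# Hypothetical inputs -- replace with your own quotes and estimates.
device_price = 100_000.0                    # assumed purchase price of the SAN
premium_support_cost = 0.5 * device_price   # "more than half" of the device cost; 0.5 is a floor
downtime_cost_per_hour = 10_000.0           # assumed cost to the business per hour offline

hours = break_even_hours(premium_support_cost, downtime_cost_per_hour)
print(f"Premium support breaks even after {hours:.1f} hours of avoided downtime")
```

If the break-even figure is smaller than the outage you would realistically face with first-tier support alone, the ludicrous-looking price starts to look less so.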
The reasons why "normal" support is so poor vary. Many times it's a result of the vendor deciding to outsource its support. This usually opens a huge chasm between salespeople who make promises and the staff who are actually responsible for supporting what has been sold.
Unlike my mechanic, the person on the phone isn't actively trying to ensure you come back for more service or to buy a car. In most cases, the company has no skin in the game whatsoever -- it already has your money. Unless you're a huge customer, even the most sternly written complaint will usually be met with indifference and very little action.
To be sure, much of this depends on who you are dealing with. There are certainly good and bad players out there. When you're buying, you need unbiased opinions on the vendor's reputation. Sadly, that will only get you so far. In the past two years, two major storage vendors I deal with regularly have traded places as the best and the worst support provider. Just because a vendor's support is well regarded now doesn't mean it will be in the future.
The lesser of two evils
Given this sorry state of affairs, you basically have two options: Either purchase exorbitantly priced mission-critical support with a service-level agreement that stipulates a time to resolution -- not response -- and hope it is honored, or don't count on the support at all and build in your own redundancy.
That second option has become increasingly attractive to many of the clients I work with. Why buy one fallible device and spend half as much again trying to guarantee it will be properly supported when you can buy two for slightly more -- and perhaps implement some geographic redundancy at the same time? With devices lower down the food chain from primary storage, it's common to see a raft of hardware (such as network switches) covered by the lowest-level warranty and protected by a spare device sitting on a shelf.
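To make that trade-off concrete, here is a minimal sketch that totals the two approaches side by side. Everything in it is an assumption for illustration: the `Option` class, the device price, and the premium and basic warranty ratios are hypothetical and should be replaced with real quotes.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    hardware_cost: float
    support_cost: float

    @property
    def total(self) -> float:
        return self.hardware_cost + self.support_cost


def build_options(device_price: float, premium_ratio: float, basic_ratio: float):
    """Return (single device + premium support, two devices + basic warranty)."""
    single = Option("one device, premium support",
                    device_price, device_price * premium_ratio)
    pair = Option("two devices, basic warranty",
                  2 * device_price, 2 * device_price * basic_ratio)
    return single, pair


# Hypothetical inputs: premium support at half the device price,
# basic warranty at 5 percent of the device price per unit.
single, pair = build_options(device_price=100_000.0,
                             premium_ratio=0.5, basic_ratio=0.05)

for option in (single, pair):
    print(f"{option.name}: {option.total:,.0f}")
print(f"extra cost of redundancy: {pair.total - single.total:,.0f}")
```

How small the gap is depends entirely on the ratios you plug in. The point of the comparison is that the second total buys a device you control, while the first buys a promise.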
Whatever your approach to covering yourself, you'd better have one. Unless you're absolutely confident in the SLA that's being offered, don't take a vendor's promise to support you within a certain timeframe as a real solution for maintaining uptime. If the vendor doesn't follow through, you're the one who has to deal with the consequences. Having your own do-it-yourself plan to resolve things quickly -- and follow up with support afterward -- is often the best protection money can buy.