As senior director of enterprise technology operations at Corrections Corporation of America (CCA), a prison management firm that handles more than 60 facilities, Brad Wood faces several challenges. His group manages approximately 100TB of data -- including inmate medical records, operational records, e-mail, and so forth -- across four Hitachi Data Systems (HDS) storage arrays in two datacenters. Because of federal and state rules, much of the company’s data is mirrored three or four times to keep it accessible in case of failure. Adding to the complexity, Wood buys his hardware based on current price and performance, so he has a mix of suppliers.
With costs escalating, Wood needed a way to slow deployment of new storage hardware and make better use of the disparate hardware he already had. The solution he chose was to implement storage virtualization.
The idea behind virtualization sounds deceptively simple. It aggregates storage systems (such as arrays) from multiple providers into a networked environment that can be managed as a single pool. In essence, “the storage controller is being disaggregated and spread around the network,” says Brian Garrett, lab technical director at Enterprise Strategy Group (ESG), a market research firm.
In Wood’s case, his storage engineer can now manage the company’s various hardware from a single console, using Symantec Veritas Storage Foundation and HDS HiCommand software.
Storage virtualization has several key IT benefits in addition to improved resource utilization. For one, it allows data to be moved to any storage device in the pool when subsystems fail or are replaced. Easier data migration also makes it possible to implement a tiered storage architecture cost-effectively, one in which data is moved to less expensive, lower performance systems as it becomes less business-critical over time.
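To make the tiering idea concrete, here is a minimal sketch of an age-based placement policy. Everything in it is an assumption for illustration: the tier names, the 30-day and 365-day thresholds, and the `Volume` record are hypothetical and do not come from any vendor's product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Volume:
    name: str
    last_accessed: date
    tier: str = "tier1_fast"  # hypothetical tier label


def target_tier(volume, today):
    """Choose a tier from idle time; the thresholds are illustrative assumptions."""
    idle_days = (today - volume.last_accessed).days
    if idle_days < 30:
        return "tier1_fast"
    if idle_days < 365:
        return "tier2_midrange"
    return "tier3_archive"


def plan_migrations(volumes, today):
    """Return (name, from_tier, to_tier) for each volume that should move."""
    plan = []
    for v in volumes:
        wanted = target_tier(v, today)
        if wanted != v.tier:
            plan.append((v.name, v.tier, wanted))
    return plan
```

A real policy would weigh business criticality and regulatory retention rules as well as idle time, but the shape is the same: the policy decides where data belongs, and the virtualization layer carries out the move.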
Another benefit is easier replication, because virtualization removes the need for full redundancy. Without storage virtualization, IT typically must copy entire volumes from one storage type to another in order to ensure all related data is in one environment. With virtualization, it’s easier to copy partial data -- such as snapshots or delta files -- and keep it linked to the entire data set across physical devices.
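The difference between copying a whole volume and copying a delta can be sketched in a few lines. This is a simplified illustration, not how any product named in this article works: it assumes a volume is a fixed set of numbered blocks held in a dictionary, and the function names `take_delta` and `apply_delta` are hypothetical.

```python
def take_delta(base, current):
    """Return only the blocks whose contents differ from the base copy."""
    return {blk: data for blk, data in current.items() if base.get(blk) != data}


def apply_delta(base, delta):
    """Rebuild the current volume from the base copy plus the changed blocks."""
    restored = dict(base)
    restored.update(delta)
    return restored


# Example: only one of three blocks changed, so only one block is shipped.
base = {0: b"aaaa", 1: b"bbbb", 2: b"cccc"}
current = {0: b"aaaa", 1: b"BBBB", 2: b"cccc"}
delta = take_delta(base, current)
```

The delta is only useful as long as it stays linked to its base copy, which is the bookkeeping the virtualization layer takes on.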
And, of course, maintenance costs are always a factor. “[Virtualization] also reduces the storage administration burden over time by getting storage folks out of the server business,” notes Rick Villars, IDC vice president of storage systems research.
If you listen to storage vendors, every product offers some form of virtualization. Since 2000, the term has been hyped consistently. Although skepticism is warranted, storage virtualization is now real enough that enterprises can begin at least exploring its use as an enabling technology to simplify storage management in heterogeneous environments. Many enterprises, like CCA, have found they can get real benefits today.
Three Approaches to Virtualization
Vendors offer three approaches to true storage virtualization: host-client (via software), in-fabric (mainly through appliances but soon also through switches), and in-array (embedded functionality). While the vendors tout specific pros and cons of each approach (see the infographic on page 39), analysts agree that they all deliver in the end. The determining factor is usually how well a particular approach fits into your existing storage infrastructure, says Gartner research director Stan Zaffos.
The in-fabric approach is the most common method, offered in products such as DataCore Software’s SANsymphony, EMC’s InVista, FalconStor’s IPStor, IBM’s SVC (SAN Volume Controller), NetApp’s V-Series, and StoreAge’s SVM. These products, which have been on the market for just a few years, use dedicated appliances or software running on a server to discover storage resources and then build the metadata that lets IT manage them as a virtual pool. Of these, IBM and NetApp have the largest installed bases (about 1,000 each), Zaffos notes.
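The discover-then-map step can be pictured as a lookup table from virtual volumes to physical extents. The sketch below is a toy model under stated assumptions: the `VirtualPool` class, its method names, and the notion of fixed-size extents are hypothetical and are not drawn from any of the products listed above.

```python
class VirtualPool:
    """Toy metadata map: virtual volumes are lists of (array, extent) pairs."""

    def __init__(self):
        self.free = []      # unallocated (array_name, extent_index) pairs
        self.volumes = {}   # volume name -> allocated (array_name, extent_index) pairs

    def discover(self, array_name, extent_count):
        """Register a physical array's extents as free capacity in the pool."""
        self.free.extend((array_name, i) for i in range(extent_count))

    def provision(self, volume_name, extents_needed):
        """Carve a virtual volume out of free extents, possibly spanning arrays."""
        if extents_needed > len(self.free):
            raise ValueError("pool exhausted")
        self.volumes[volume_name] = self.free[:extents_needed]
        self.free = self.free[extents_needed:]
        return self.volumes[volume_name]

    def resolve(self, volume_name, virtual_extent):
        """Translate a virtual extent number to its physical location."""
        return self.volumes[volume_name][virtual_extent]
```

Because hosts address only the virtual volume, an administrator can change the physical entry behind an extent, for example during a migration, without the host noticing.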
Coming soon are switch-based products -- often deployed as blades within fabric switches -- that essentially do the same thing as a separate appliance. These will come from companies like Brocade, Cisco Systems, MaXXan Systems, McData, and QLogic. The theory is that putting the virtualization functionality in the switch makes operations more efficient, because data travels through one fewer device than if it also went through an appliance, notes ESG’s Garrett. He expects most of these ultimately to run a version of Symantec’s Veritas Storage Foundation host-client software, although Symantec says it has no immediate product plans to port its software to run on switches.
Storage Foundation has been around in various versions for a decade, running on file and application servers to detect storage resources and maintain the metadata used to manage them. Until recently, Veritas (now a division of Symantec) did not release its Unix and Windows versions in sync, so it was hard to use Storage Foundation in heterogeneous environments, Gartner’s Zaffos says. Still, he says, the technology is easy to use for many purposes, including data migration, load balancing, and flexible provisioning.
A third type of storage virtualization is exemplified by the TagmaStore network controller, from Hitachi Data Systems, which lets HDS’s management software work with multiple vendors’ storage systems as if they were one pool. Approximately 45 percent of the roughly 1,700 current TagmaStore customers implement its virtualization technology, says Claus Mikkelsen, HDS’s chief scientist. Its key benefit, according to Zaffos, is that “you’re not adding another element in the I/O path, so you’re not buying another asset.” Because it’s usually cheaper to replace storage arrays than to pay for their annual maintenance, Zaffos expects TagmaStores to be used mainly to ease migration from old arrays.
Pricing for all three strategies is fairly equivalent, though that’s not immediately evident when comparing, say, a TagmaStore controller with a NetApp appliance or a Symantec software license. “The pricing variables are driven more by scale,” says IDC’s Villars. “For example, a TagmaStore is more expensive than IBM’s SVC, but you’re buying more.”
One potential gotcha is that while virtualization promotes the idea of cross-vendor storage utilization, all three strategies also enforce vendor lock-in. In-array products obviously lock you into a specific vendor’s array hardware, says Mark Lewis, EMC’s chief development officer, but in-fabric and host-client products lock you into the virtualization software or the appliance that embeds that software.
Where Virtualization Works Today
ESG’s Garrett says his research shows that storage virtualization applied to storage environments with at least six storage fabrics reduces costs in several areas: Hardware costs drop 23.8 percent, on average; software costs drop 16.2 percent; and administration costs drop 19.3 percent.
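To see what those average reductions would mean for a budget, the short calculation below applies them to a made-up baseline. The percentages are the ones Garrett cites; the dollar figures are purely hypothetical, and actual results would vary by environment.

```python
# Average reductions cited by ESG for environments with six or more fabrics.
REDUCTIONS = {"hardware": 0.238, "software": 0.162, "administration": 0.193}


def projected_costs(baseline):
    """Apply each area's average reduction to a baseline annual cost."""
    return {
        area: round(cost * (1 - REDUCTIONS[area]), 2)
        for area, cost in baseline.items()
    }


# Hypothetical annual storage budget, in dollars.
baseline = {"hardware": 1_000_000, "software": 400_000, "administration": 600_000}
after = projected_costs(baseline)
savings = sum(baseline.values()) - sum(after.values())
```

On this invented $2 million budget, the combined savings come to roughly $418,600, or about 21 percent.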
Once an enterprise has deployed storage virtualization, the technology is “relatively easy to use,” Garrett says. The real effort lies in getting up and running. So Garrett recommends that IT focus on a specific tactical issue, such as getting nondisruptive data migration in place. If you apply storage virtualization to that specific issue, he says, “then you can extend into the other stuff as you get more experienced.”
That’s exactly the approach taken at the Baylor College of Medicine, in Houston, Texas. Two years ago, the college decided to integrate dozens of file servers and ERP stores attached to Unix and Windows servers in order to reduce unused storage capacity and lower administration costs. Despite the initial expense, Baylor decided to replace its storage devices with a single FC (Fibre Channel) storage fabric and a set of HDS arrays, recalls Mike Layton, director of IT for enterprise services and mainframe systems. Not having a heterogeneous environment to support -- “a luxury,” he says -- made the decision to deploy storage virtualization fairly safe.
Today, the Baylor system manages 200TB of data, including patient records and university operations data. HDS hadn’t yet released its TagmaStore array, so Layton deployed NetApp V-Series appliances instead. Baylor’s use of storage virtualization is mainly to pool storage resources, although the college is also considering how to use the technology to implement data lifecycle management, where patient data can be highly available during treatment but later moved to lesser systems for analysis, auditing, or other needs.
Dallas-Fort Worth International Airport had a different problem. It stored flight data (such as passenger lists, arrival times, baggage tracking, and gate information) in two SANs using Oracle RAC (Real Application Clusters). Oracle RAC could treat one storage target as the primary target and then replicate to secondary systems, but this process simply took too long, recalls John Parrish, associate vice president of terminal technology. If one terminal’s SAN goes down, the other SAN has to step in immediately so flight boarding and baggage handling aren’t delayed. DataCore’s SANsymphony appliance made Oracle RAC think it was working with just one SAN, and Parrish has seen no latency issues crop up in this deployment.
Replication issues were also a problem for Freeze.com. The online retailer needed to keep its 400GB Microsoft SQL Server transaction databases in sync with its reporting databases, but SQL Server’s resource requirements prevented the reporting tools from working on the same database as the transaction management, recalls Freeze.com IT director Kyle Ohme. He would mirror the database periodically, but replication took so long that the reporting database was hours behind, preventing the kind of analysis needed to manage supplies properly. Ohme deployed tools from FalconStor to pool the storage into a virtual volume so both sets of applications could access it in real time. That way, he could send snapshots of the transaction database to the reporting tools, rather than replicate the entire thing.
A Long-Term Effort
Although the storage virtualization promises touted in 2000 turned out to be premature, today’s technology does deliver at least the first step toward an automated, self-managing storage infrastructure that functions more as an IT utility, notes Gartner’s Zaffos. But that version of storage virtualization is many years away.
One reason is economics. The storage vendors can’t afford suddenly to lose the profits from their hardware businesses if storage hardware becomes a commodity managed by software tools. As a result, they’re only likely to take measured steps to support the independent standards needed by third-party management tools, Zaffos says.
Operating system vendors may help force the issue if they adopt some of the Storage Networking Industry Association standards, such as Volume Shadow Copy Service, and start moving storage virtualization from network/storage middleware to the OS, says George Teixeira, CEO of DataCore. In fact, IDC’s Villars says that in five years Microsoft Windows could implement storage virtualization for mid-tier enterprise deployments.
Another impediment to the grander storage-virtualization vision is that the large enterprises that benefit most from storage virtualization are the most conservative in deploying new technology, since their risk of failure is greater. “We’re not seeing the deployments pushing to the technical extreme,” Villars says.
The third reason is that the tools are still immature. Current tools focus on giving IT a common console for managing storage. Over time, vendors will begin to add automation and policy-based intelligence for provisioning storage and managing data migration and replication. But because of the complexity of the infrastructure that such tools must manage, “I don’t see it for years and years,” says Tom Clark, director of solutions and technologies at McData and author of Storage Virtualization (Addison-Wesley, 2005). In the interim, IT should expect to see point solutions such as heterogeneous replication and snapshots, he says.
But none of these factors prevent enterprises from benefiting from storage virtualization today. In the immediate term, companies should apply virtualization to solve specific problems, such as easing data migration or pooling data across SANs. As time goes on, IT can incrementally broaden its use of the technology, taking advantage of the continued improvements vendors will make over the coming years.