For enterprises seeking to escape the challenges of managing and maintaining tape backup architectures, disk-to-disk backup has been nothing short of a godsend. By replacing tape with disk for nightly backups and relegating tape to a long-term archival role, organizations of all sizes can shrink backup windows and provide near-instantaneous restores. While simple direct-attached storage may fit the bill for smaller organizations, larger enterprises wrestling with the task of protecting terabytes of data find themselves looking for functionality that plain old disk can't provide.
That's where deduplicating backup appliances really shine. While there are a number of well-known vendors with very strong product offerings in this space (EMC Data Domain and Quantum, to name two), ExaGrid's unique scale-out grid architecture and truly refreshing support model set it apart from the pack and place it in a class of its own.
To say that deduplication technology is "hot" is something of an understatement. With rapidly growing mountains of data, leveraging dedupe in backup (if not primary storage) has almost become a necessity. However, as sexy as deduplication tech may be, it's reached a point where the major dedupe vendors are, by and large, getting the same data reduction results from their deduplication engines. Today the differences reside mainly in the impact the deduplication engine has on backup and restore performance and how well the solution scales as backup data inexorably grows. This is where ExaGrid has chosen to invest the bulk of its R&D.
Scale-out vs. scale-up
First, the ExaGrid EX series uses a scale-out grid architecture versus the scale-up architectures adopted by many of its rivals. That architecture allows you to combine multiple EX-series appliances -- each equipped with dedupe and network capacity matched to its storage capacity -- into a linearly scalable grid. This is important because it handily deals with the one true constant of any storage architecture today: rampant growth.
Second, ExaGrid deduplicates post-process, after the backup completes, rather than inline -- and the upsides go beyond keeping dedupe work out of the backup window. Most inline deduplication solutions store the first full backup as the "reference" copy. As subsequent backups are performed, the data shared between those backups and the original reference copy doesn't need to be stored again; it is simply referenced back to the original copy (a technique called backward-referencing). This is perfectly efficient from a backup perspective and actually decreases the number of disk writes required. But what's good for backups is bad for restores. During a restore of the most recent backup (generally the one people want when performing a restore), the appliance is forced to "rehydrate" that backup from its most deduplicated form -- placing a very heavy load on the controller resources.
By delaying the deduplication pass until after the backup is complete, ExaGrid can use an even more computationally costly dedupe methodology referred to as forward-referencing deduplication. Instead of storing the first backup as the reference copy and all subsequent backups as deduplicated differentials, ExaGrid ensures that the most recent backup is always kept in its least deduplicated form. Thus, restoring data from the most recent backup, which is typically exactly what you want, ends up being far faster than restoring an old backup.
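To make the difference concrete, here is a toy Python sketch of the forward-referencing scheme described above. Everything here is an illustrative invention, not ExaGrid's implementation: real systems use content-defined variable-size chunking and far richer metadata, while this sketch only shows why restoring the newest backup is a direct read and restoring an older one requires reassembly.

```python
def chunks(data, size=4):
    """Split a backup stream into fixed-size chunks (real dedupe engines
    use variable-size, content-defined chunking)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

class ForwardReferencingStore:
    """Keeps the newest backup whole; older backups are thinned out into
    lists of chunk references against a shared chunk pool."""
    def __init__(self):
        self.pool = {}        # chunk hash -> chunk contents
        self.latest = None    # newest backup, stored un-deduplicated
        self.history = []     # older backups as lists of chunk hashes

    def ingest(self, data):
        if self.latest is not None:
            # "Thin out" the previous newest backup into references.
            refs = []
            for c in chunks(self.latest):
                h = hash(c)
                self.pool[h] = c   # shared chunks are stored only once
                refs.append(h)
            self.history.append(refs)
        self.latest = data         # newest copy stays fully hydrated

    def restore_latest(self):
        return self.latest         # direct read, no rehydration

    def restore_old(self, index):
        # Older restores must be rebuilt chunk by chunk (the slow path).
        return "".join(self.pool[h] for h in self.history[index])
```

A backward-referencing store inverts this: the *first* backup would stay whole and every later backup, including the newest, would need the chunk-by-chunk rebuild on restore.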
While it's true that the backward-referencing approach has very little impact on the performance of small restores, it can have a substantial impact on very large restores. As server virtualization grows almost ubiquitous, restore jobs involving multi-hundred-gigabyte virtual machine images are becoming much more common.
Additionally, some backup software platforms are able to leverage the backup appliance's storage to start a virtual machine directly from the appliance and use the virtualization hypervisor's storage migration functionality to copy the virtual machine back onto primary storage (Veeam's Instant Recovery coupled with VMware Storage vMotion is a great example of this). While a great deal of attention has always been placed on shortening backup windows, accelerating restore windows is more important today than ever before. ExaGrid's post-process approach to deduplication meshes perfectly with these heavy-duty use cases.
ExaGrid in the lab
ExaGrid EX appliances are remarkably simple from a hardware perspective. Essentially commodity servers with an ExaGrid badge on the front, they are currently available in eight different models that range in total usable capacity from 2TB in the EX1000 to 26TB in the EX13000E (the top three of which are also available in versions that include support for at-rest encryption). Beyond that, the only question you have to answer is whether to add 10Gbps Ethernet connectivity -- an option available on the top five models in the range. Because every model ships with a matched balance of CPU, memory, network connectivity, and disk resources, no further options are really needed.
In order to effectively interpret and deduplicate incoming backup streams, the ExaGrid has to know what format the data will take and which IP storage protocol to present it over. Fortunately, the list of currently supported applications is comprehensive and growing all the time.
In my case, I wanted to emulate an environment using a typical combination of different backup software: Veeam's Backup & Replication for backing up virtual machines, Symantec's Backup Exec 2010 for backing up physical machines, a direct Microsoft SQL Server maintenance plan backup share, and a TAR backup share that might be used to back up a typical Linux server. Creating those four shares only required that I give each a name, specify which kind of software I'd be using, and specify which source IPs were allowed to access the share -- about as simple as you can get. After that, I followed the directions in the application-specific best practice manuals provided for each piece of software to get the backup servers attached to the ExaGrid.
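The share model boils down to three settings per share. A minimal Python sketch of that configuration model, using the four shares from my test environment (the field names, IP addresses, and check function are hypothetical illustrations, not ExaGrid's actual interface):

```python
from ipaddress import ip_address

# Each share: a name, the backup application format, and an
# allowed-source-IP list -- the three things the wizard asked for.
shares = {
    "veeam-vms":   {"format": "Veeam Backup & Replication",
                    "allowed_ips": {"10.0.1.10"}},
    "be-physical": {"format": "Symantec Backup Exec 2010",
                    "allowed_ips": {"10.0.1.11"}},
    "sql-maint":   {"format": "SQL Server maintenance plan",
                    "allowed_ips": {"10.0.1.12"}},
    "linux-tar":   {"format": "TAR",
                    "allowed_ips": {"10.0.1.13"}},
}

def may_access(share_name, source_ip):
    """A client may use a share only from a whitelisted source IP
    (which, as noted later, is the extent of the security model)."""
    share = shares[share_name]
    return str(ip_address(source_ip)) in share["allowed_ips"]
```

For example, `may_access("veeam-vms", "10.0.1.10")` returns `True`, while any other source address is refused.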
In the end, I was slinging backups about an hour after opening the box.
Managing data. After configuring some backups, I got to see how the appliance handles backup data. Notably, when I created the shares, I was never asked to specify how much space each share was allowed to use. Instead, backup capacity is allocated dynamically, removing some management overhead.
By default, each ExaGrid appliance dedicates half of its usable storage to a so-called landing zone and the other half to deduplicated retention. The landing zone is where the initial raw application backups are directed, without any deduplication taking place. By default, 10 minutes after the last backup file has been closed, the appliance starts its post-process deduplication sweep, which effectively turns the most recent backup in the landing zone into the new, least deduplicated reference copy. Any previous deduplicated backups are thinned out into deltas against that most recent backup.
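The capacity split and the idle-timer trigger can be sketched in a few lines of Python. This is a behavioral illustration only, with invented names; the 50/50 split and the 10-minute delay are the defaults described above:

```python
DEDUPE_DELAY = 10 * 60  # seconds of quiet before the sweep starts (default)

class LandingZone:
    """Models the default capacity split and post-process sweep trigger."""
    def __init__(self, usable_capacity_tb):
        # Default split: half landing zone, half deduplicated retention.
        self.landing_tb = usable_capacity_tb / 2
        self.retention_tb = usable_capacity_tb / 2
        self.last_close = None  # timestamp of the last backup file close

    def close_backup_file(self, now):
        # Every file close resets the 10-minute idle timer.
        self.last_close = now

    def sweep_due(self, now):
        """True once no backup file has closed for DEDUPE_DELAY seconds."""
        return (self.last_close is not None
                and now - self.last_close >= DEDUPE_DELAY)
```

On an EX13000E, for instance, `LandingZone(26)` models 13TB of landing zone and 13TB of retention, and `sweep_due` stays false until 10 minutes after the last file close.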
Where ExaGrid misses
The ExaGrid appliances deserve a great deal of praise for what they're able to do and the economy with which they get it done. However, nothing is perfect. I hit on a few shortcomings when working with the solution.
First, forget about link aggregation. On devices with more than one host-facing network interface, there is currently no support for building a link aggregation group across the available NICs. That means you must manually target your backup jobs at individual NICs on the appliance in order to load balance the traffic across them. (The one exception is Symantec OST, which does appear to be capable of dynamically streaming data to different NICs.) Of course, you can avoid this issue entirely if you opt for 10Gb Ethernet; a single 10Gb pipe does away with any need to load balance 1Gb interfaces.
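In practice, the workaround amounts to pinning each backup job to one of the appliance's per-NIC IP addresses by hand. A round-robin sketch of that manual load balancing (the IP addresses and job names are hypothetical):

```python
from itertools import cycle

# One target IP per host-facing 1Gb NIC on the appliance (illustrative).
APPLIANCE_NICS = ["10.0.2.21", "10.0.2.22", "10.0.2.23"]

def assign_jobs_to_nics(jobs, nics=APPLIANCE_NICS):
    """Pin each backup job to a NIC in round-robin order so traffic
    spreads across the links -- the by-hand stand-in for a LAG."""
    rotation = cycle(nics)
    return {job: next(rotation) for job in jobs}
```

The catch, of course, is that this balance is static: if one job moves far more data than the others, its link saturates while the rest sit idle, which is exactly what real link aggregation would avoid.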
Second, the security model is simplistic. The only control you have over who can read from or write to a given share is the source IP address you designate when you configure the share. Even if you're using CIFS, which (unlike NFS v3) supports user-level authentication, the ExaGrid does not implement it. This makes it especially important to protect the network segment the ExaGrid lives on from potential threats elsewhere on the network.
Finally, backup speeds are limited by ExaGrid's isolated landing zones. One real drawback of ExaGrid's grid architecture is that, while the retention stores are spread across all members of a grid, the landing zones are not. Your aggregate backups may be able to leverage the raw performance of the entire grid, but a single backup will always be limited to the throughput of the appliance it targets (and, due to the lack of NIC teaming, even a single network interface). With most backup applications, it is relatively easy to create multiple jobs targeting different shares, but that adds administrative overhead to manage them all and keep some degree of parity among them.
Transitioning the existing grid model into a true scale-out NAS, in which an entire grid of appliances appears as a single appliance to a backup application, is no easy task. (Even primary storage vendors with scale-out architectures have yet to artfully solve that problem with protocols like CIFS and NFS.) That this is a limitation of ExaGrid's model is perfectly understandable. Nevertheless, it's worth noting that it is typically not a problem with monolithic scale-up implementations.