InfoWorld review: Data deduplication appliances
Data deduplication appliances from FalconStor, NetApp, and Spectra Logic provide excellent data reduction for production storage, disk-based backups, and virtual tape
NetApp FAS2040
Another appliance geared toward disk-based storage and deduplication is NetApp's FAS2040. This appliance offers multiple installation options for the data center: it can serve as a SAN or NAS target, or attach directly via Fibre Channel. Like the FalconStor appliance, the NetApp can serve as production storage, as a backup device, or as both simultaneously.
The FAS2040 comes with up to two independent storage controllers and scales well, far beyond the FalconStor and Spectra Logic appliances. In addition to supporting the CIFS and NFS protocols, the FAS2040 can automatically export an NFS datastore to a VMware ESX server, a nice time saver for adding online disk space to an existing VMware environment. NetApp's deduplication policy didn't have the same level of flexibility as FalconStor's, but it did a good job of reducing disk usage on volumes with a standard file/folder structure. However, on backup sets created by Symantec Backup Exec 2010, it didn't fare as well.
My NetApp-provided FAS2040 2U chassis was populated with a dozen 300GB SATA drives, two hot-swap storage controllers, each with four Gigabit Ethernet interfaces and two 4Gb Fibre Channel ports, and dual power supplies. My chassis was configured with two aggregates (RAID arrays) -- one for each controller -- in a dual-parity RAID configuration. To fit most any need, there are a variety of hard drives -- Fibre Channel, SAS, or SATA -- available for the FAS2040. By way of additional external drive chassis, the FAS2040 can access a maximum of 136TB of raw space, far more than the other chassis reviewed here.
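As a rough illustration of what dual parity costs in capacity, the sketch below estimates usable space for the test configuration. It assumes the dozen drives were split evenly, six per aggregate, with no hot spares and no file system overhead; the review does not state any of these, and the function name is made up for the example.

```python
def usable_capacity_gb(drives: int, drive_gb: int,
                       parity_drives: int = 2, spares: int = 0) -> int:
    """Estimate usable space in one RAID group, ignoring file system overhead.

    Dual parity reserves two drives' worth of space for parity data.
    """
    data_drives = drives - parity_drives - spares
    if data_drives <= 0:
        raise ValueError("not enough drives for the requested parity and spares")
    return data_drives * drive_gb


# Assumption: 12 x 300GB drives split evenly across two aggregates.
per_aggregate_gb = usable_capacity_gb(drives=6, drive_gb=300)
total_gb = 2 * per_aggregate_gb
print(per_aggregate_gb, total_gb)  # 1200 2400
```

Under those assumptions, roughly 2.4TB of the 3.6TB raw capacity is left for data. Real-world figures would be lower once spares and system overhead are counted.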
I installed the FAS2040 on my test network via Gigabit Ethernet, connecting independently to both controllers in the chassis. I carved both aggregates into multiple volumes and shares, defining some as CIFS file shares while setting others up as iSCSI targets. (Like the other systems reviewed, the NetApp also allows you to create NFS shares for Linux/Unix clients.) As with the FalconStor and Spectra Logic appliances, I used the NetApp's various CIFS shares as NAS file storage and as a backup destination for my physical and virtual Windows Server 2008 machines. I had no trouble using both mapped drives and UNC (Universal Naming Convention) connections to the NetApp from all of my servers, physical and virtual. I also had no trouble mounting the iSCSI targets as local storage using Microsoft's iSCSI initiator in Windows Server 2008. Each mounted volume behaved exactly like local storage.
One feature I really liked in the FAS2040 was the dual storage controllers. Depending on your needs and the configuration of the appliance, one chassis can serve as its own Active/Passive failover device. In case one controller should suffer a catastrophic failure, the other controller can take over transparently. Or, as in my case, you can use both controllers in an Active/Active configuration, if you want both controllers online and providing independent storage to your network.
Part of my testing involved simple file copies to the shares on the NetApp, while the rest used the NetApp as a destination for multiple Backup Exec jobs. The NetApp's deduplication of files and folders was impressive, showing excellent detection and elimination of duplicate or partially duplicate data. As with the FalconStor and Spectra Logic appliances, data reduction on highly duplicative file shares easily passed 90 percent. However, I was surprised at the trouble the NetApp had with the Backup Exec backup files.
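To put the 90 percent figure in perspective, here is a minimal sketch of how a data reduction percentage relates to the more commonly quoted deduplication ratio. The sizes are invented for illustration and are not measurements from this review; the function names are likewise hypothetical.

```python
def data_reduction_percent(logical_gb: float, stored_gb: float) -> float:
    """Share of the original data that no longer occupies disk space."""
    return 100.0 * (logical_gb - stored_gb) / logical_gb


def dedup_ratio(logical_gb: float, stored_gb: float) -> float:
    """Logical data written divided by physical space consumed."""
    return logical_gb / stored_gb


# Illustrative numbers only: 1,000GB written, 100GB left on disk after dedupe.
print(data_reduction_percent(1000, 100))  # 90.0
print(dedup_ratio(1000, 100))             # 10.0
```

In other words, a 90 percent reduction corresponds to a 10:1 ratio, and each additional percentage point above that raises the ratio sharply.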