But there are so many different ways to implement a deduplication product, and each vendor and product may take a different approach. Deduplication can take place in-line (meaning the data is deduplicated before being written to disk) or postprocess (meaning the data is analyzed after it has been stored to disk). It can be done at either the source or the target (the storage appliance or virtual tape library). It can be handled through the software (the OS) or the hardware. It may send your head spinning to think about the many options, but you may be happy to know that Microsoft uses some aspects of deduplication directly in some of its products.
Microsoft Exchange has used SIS for years, using pointers to direct requests for a message to a single copy of the message. Microsoft introduced SIS at the file level in Windows Storage Server 2003 R2. At the block level, Microsoft delivered more space-efficient backup via Windows Home Server.
Windows Storage Server 2008 enhances the deduplication capabilities of its predecessor, using SIS-based data deduplication for the Windows File Services, which eliminates identical files on volumes. The duplicates are replaced by pointers that link to files placed in the SIS Common Store. Obviously, for this to work on the backup side, you need to have a SIS-aware backup product. And that's where Microsoft's System Center Data Protection Manager comes into play.
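To make the pointer idea concrete, here is a minimal sketch of file-level single instancing. It is not how Windows Storage Server implements SIS. The class name, the in-memory dictionaries, and the use of a SHA-256 hash to detect identical files are all assumptions made for illustration.

```python
import hashlib


class SingleInstanceStore:
    """Toy file-level single-instance store (hypothetical, illustration only)."""

    def __init__(self):
        self.common_store = {}  # content hash -> file bytes, one copy per unique file
        self.links = {}         # file path -> content hash (stands in for the pointer)

    def write(self, path, data):
        # Assumption: identical files are detected by hashing their contents.
        digest = hashlib.sha256(data).hexdigest()
        # Keep the bytes only the first time this content is seen.
        self.common_store.setdefault(digest, data)
        self.links[path] = digest

    def read(self, path):
        # Follow the pointer to the single stored copy.
        return self.common_store[self.links[path]]

    def logical_bytes(self):
        # Size the files would occupy without single instancing.
        return sum(len(self.common_store[d]) for d in self.links.values())

    def physical_bytes(self):
        # Size actually held in the common store.
        return sum(len(b) for b in self.common_store.values())


store = SingleInstanceStore()
report = b"Q3 sales report" * 1000
store.write("alice/report.doc", report)
store.write("bob/report.doc", report)       # identical file, stored once
store.write("bob/notes.txt", b"unique notes")
print(store.logical_bytes(), store.physical_bytes())
```

Two users keep the same report, but the common store holds it once. A backup product that is not SIS-aware would read each path separately and copy the report twice, which is why the backup side matters.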
Some people mistakenly believe that System Center Data Protection Manager (DPM) has deduplication capabilities and may feel they have no need for a hardware product to assist with deduplication. That's not true. DPM may use components that are dedupe-like (for example, block-level change tracking), and DPM certainly does an excellent job of fitting a large amount of data or a large number of recovery points into a small amount of storage (giving the impression that traditional SIS or deduplication must be involved). But DPM does not use the traditional compression, SIS, or deduplication features that you will find in a hardware storage platform. The best scenario is to use both DPM and a hardware deduplication product.
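The following sketch shows why block-level change tracking can look like deduplication without being the same thing. It is not DPM's implementation. The 4 KB block size and the function names are assumptions, and comparing two full copies stands in for whatever mechanism a real product uses to record which blocks changed.

```python
BLOCK_SIZE = 4096  # assumed block size, for illustration only


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def changed_blocks(previous, current, block_size=BLOCK_SIZE):
    """Return {block_index: block_bytes} for blocks that differ from previous."""
    old = split_blocks(previous, block_size)
    new = split_blocks(current, block_size)
    delta = {}
    for i, block in enumerate(new):
        if i >= len(old) or old[i] != block:
            delta[i] = block
    return delta


def apply_delta(previous, delta, new_length, block_size=BLOCK_SIZE):
    """Rebuild the newer version from the older copy plus the changed blocks."""
    blocks = split_blocks(previous, block_size)
    for i in sorted(delta):
        if i < len(blocks):
            blocks[i] = delta[i]
        else:
            blocks.append(delta[i])
    return b"".join(blocks)[:new_length]


base = bytes(BLOCK_SIZE * 10)   # first full copy: ten blocks
edited = bytearray(base)
edited[5000] = 1                # a one-byte change inside block 1
edited = bytes(edited)

delta = changed_blocks(base, edited)
print(sorted(delta), sum(len(b) for b in delta.values()))
```

Each recovery point costs only the blocks that changed since the previous one, so many recovery points fit in little space. Identical data held in two different files or servers is still stored twice, which is the gap a SIS or deduplication product fills.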
Given that several Microsoft server products have some form of SIS or deduplication, you may think you don't need to acquire a software- or hardware-based deduplication product. You might be right. But be sure to analyze your circumstances to see if you need to go beyond the deduplication capabilities that Microsoft offers. Think about whether you need an in-line or postprocess approach, a source- or target-based approach, and a software-based approach, a hardware-based approach, or both. Do the research, determine the pricing, consider the savings (including energy savings that some hardware vendors may offer through MAID [massive array of idle disks] products), and make your decisions.
I'm curious to hear from readers as to what form of deduplication product they have in place. Are you using software- or hardware-based products? What kind of data reduction or cost savings have you noticed (if any)? Or do you feel locked into a software-only product because the economy, combined with high prices attached to hardware deduplication, makes it impossible for you to do otherwise?