In my post last week, I described some of the basic challenges in ensuring that data you delete actually stays deleted. On personal computers and removable drives, these concepts can confuse users but are usually well understood by IT pros. Storage virtualization in the data center is another matter: there, even IT pros are often unsure what deleting data actually accomplishes.
Virtualizing storage has been enormously popular for several years. It's no wonder, either: By abstracting the underlying storage medium from how it's presented to storage users, you can pull off really cool tricks. Thin provisioning, snapshots, SSD wear-leveling, and automated storage tiering are all possible thanks to storage virtualization.
However, all this progress has come at a cost to data security. On your PC, you can overwrite a disk with random garbage and assume that anything previously on it has been effectively obscured. With virtualized storage, you no longer can. Because the virtualization layer decides where each write physically lands, bits and pieces of that data are almost certainly still floating around on your storage device.
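To make that concrete, here is a toy model in Python. The class and method names are hypothetical, and real snapshot, thin-provisioning, and SSD wear-leveling layers are far more complex; the sketch only assumes the one behavior they share, namely that a write to a logical block is redirected to a newly allocated physical block instead of overwriting the old one in place.

```python
"""Toy model of a remapping block device (hypothetical, for illustration only)."""
from typing import Dict, List


class RemappingDevice:
    def __init__(self, physical_blocks: int) -> None:
        self.physical: List[bytes] = [b""] * physical_blocks  # raw media contents
        self.free: List[int] = list(range(physical_blocks))   # unallocated physical blocks
        self.mapping: Dict[int, int] = {}                     # logical block -> physical block

    def write(self, logical: int, data: bytes) -> None:
        # Allocate a fresh physical block instead of overwriting in place.
        new_phys = self.free.pop(0)
        self.physical[new_phys] = data
        old_phys = self.mapping.get(logical)
        self.mapping[logical] = new_phys
        if old_phys is not None:
            # The old block is only marked free; its bytes are NOT erased.
            self.free.append(old_phys)

    def read(self, logical: int) -> bytes:
        # What the guest (or the PC user) sees at this logical address.
        return self.physical[self.mapping[logical]]

    def raw_scan(self) -> List[bytes]:
        # What someone reading the underlying media directly could see.
        return [blk for blk in self.physical if blk]


dev = RemappingDevice(physical_blocks=8)
dev.write(0, b"client secret")
dev.write(0, b"\x00" * 13)                 # "secure overwrite" issued from above
print(dev.read(0) == b"\x00" * 13)         # True: the logical block looks wiped
print(b"client secret" in dev.raw_scan())  # True: the old physical block survives
```

The overwrite succeeds from the point of view of whoever issued it, yet the original bytes remain on the media until the freed block happens to be reused.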
If you only need reasonable assurance that no one will stumble across sensitive data by accident, you can get there without much difficulty. But if you need an iron-clad guarantee that sensitive data will never see the light of day, things get substantially more complicated; in fact, that guarantee is nearly impossible without committing to a mammoth undertaking.
Imagine you're in IT at a medium-size accounting firm. Your data center infrastructure consists of a few VMware vSphere virtualization hosts coupled with a Dell EqualLogic SAN. You use Veeam's Backup and Replication to back all of that up daily to an ExaGrid NAS and monthly to tape for archiving. Maybe you use products from Citrix Systems, Hewlett-Packard, Microsoft, and/or NetApp instead. It doesn't matter: in this common storage scenario, the issues are the same no matter whose products you use.
A particularly security-sensitive client has asked the partner he works with to provide assurance that all the work product associated with a project has been completely and securely erased from the firm's systems. Without fully realizing what he's promising, the partner agrees. Minutes later, an email lands in your inbox asking you to completely delete the files in a given folder on the network and to guarantee, with 100 percent certainty, that every copy has been destroyed.
Why 100 percent deletion is impossible in virtualized storage
Let's look at what this commitment really means to IT.
The file server. Following my advice from last week, the simplest first step is to log on to the virtualized file server where the files are stored and securely delete them with a tool like Eraser. You could also use Eraser to securely wipe all the unused space on the disk. Together, those two steps ensure that, as far as the file server's own file system is concerned, the disk blocks where the files used to reside have been overwritten with garbage and any other blocks that might have held older versions of the work product are unrecoverable.
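The basic "overwrite, then delete" idea behind a secure-delete tool can be sketched in a few lines of Python. This is not Eraser's implementation; it is a minimal single-pass random overwrite (the function names are hypothetical) that assumes a conventional file system rewriting the file's blocks in place. As the rest of this column explains, that assumption is exactly what virtualized storage breaks.

```python
import os
import secrets
import tempfile


def overwrite_in_place(path: str, passes: int = 1, chunk_size: int = 1 << 20) -> None:
    """Overwrite every byte of an existing file with random data, in place."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(secrets.token_bytes(n))
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the new bytes to storage


def overwrite_and_delete(path: str, passes: int = 1) -> None:
    """Overwrite a file's contents, then remove its directory entry."""
    overwrite_in_place(path, passes)
    os.remove(path)


# Demo on a throwaway file.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"work product " * 100)
os.close(fd)
overwrite_and_delete(demo_path)
print(os.path.exists(demo_path))  # False
```

Note that fsync only guarantees the data has been handed to the layer below; it says nothing about whether a journal, snapshot, or SSD controller kept the old copy.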
The hypervisor. However, you can't stop there. Because this file server is virtualized, you also need to consider what might exist on the VMFS volume where the file server's virtual disks are stored. If you've ever taken a VMware snapshot of the VM (and you have, because Veeam takes one every time it performs a backup), some of the data in question might have been written into a snapshot "delta," a file containing the changes written to the disk while the snapshot was active. When the snapshot is later deleted, the data in the delta file is copied back into the main disk file, and the delta file itself is deleted. As with deleting ordinary files, however, the data isn't securely scrubbed: it still exists on the volume, even though no file points to it anymore.
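The snapshot lifecycle just described can be modeled the same way. The following is a hypothetical toy in Python, not VMware's VMDK or VMFS on-disk format: while a snapshot is active, guest writes go to delta blocks; deleting the snapshot copies those blocks into the base disk and frees the delta blocks on the datastore without erasing them.

```python
"""Toy model of a VM snapshot delta (hypothetical; not VMware's format)."""
from typing import Dict, List, Optional, Set


class Datastore:
    """Flat list of datastore blocks; freeing a block does not erase it."""

    def __init__(self, size: int) -> None:
        self.blocks: List[bytes] = [b""] * size
        self.allocated: Set[int] = set()

    def alloc(self) -> int:
        for i in range(len(self.blocks)):
            if i not in self.allocated:
                self.allocated.add(i)
                return i
        raise OSError("datastore full")

    def free(self, i: int) -> None:
        self.allocated.discard(i)  # contents are left in place


class VirtualDisk:
    def __init__(self, ds: Datastore, guest_blocks: int) -> None:
        self.ds = ds
        # guest block -> datastore block holding the base disk's copy
        self.base: List[int] = [ds.alloc() for _ in range(guest_blocks)]
        self.delta: Optional[Dict[int, int]] = None  # active snapshot delta, if any

    def snapshot(self) -> None:
        self.delta = {}

    def write(self, guest_block: int, data: bytes) -> None:
        if self.delta is not None:
            # Snapshot active: redirect the write into the delta.
            if guest_block not in self.delta:
                self.delta[guest_block] = self.ds.alloc()
            self.ds.blocks[self.delta[guest_block]] = data
        else:
            self.ds.blocks[self.base[guest_block]] = data

    def delete_snapshot(self) -> None:
        if self.delta is None:
            return
        # Consolidate: copy delta blocks into the base disk, then free the delta.
        for guest_block, ds_block in self.delta.items():
            self.ds.blocks[self.base[guest_block]] = self.ds.blocks[ds_block]
            self.ds.free(ds_block)
        self.delta = None


ds = Datastore(size=8)
disk = VirtualDisk(ds, guest_blocks=2)
disk.snapshot()                       # e.g. taken by the nightly backup job
disk.write(0, b"client secret")       # lands in the delta
disk.delete_snapshot()                # consolidate and remove the delta
disk.write(0, b"\x00" * 13)           # the guest later wipes its own block
print(b"client secret" in ds.blocks)  # True: the freed delta block still holds it
```

Wiping inside the guest reaches only the base disk's block; the copy left behind in the freed delta block is invisible to the guest.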
To get rid of that leftover delta data, you have to securely wipe the free space on the entire VMFS volume as well. That's largely a DIY procedure, since few tools are designed to do it. For example, you might use the dd utility from a vSphere host's console to write zeros into a file on the VMFS volume until the volume is full, then delete the file. That's risky because it temporarily fills the volume, which could disrupt production workloads, but it does the job.
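The "fill with zeros, then delete" technique is shown below in Python so the logic is easy to read and test. It is not a VMware tool and assumes the volume is reachable as an ordinary mounted directory; the file name and the max_bytes safety cap are hypothetical additions. With max_bytes left as None, it behaves like the dd approach: it keeps writing until the file system reports that it is out of space (ENOSPC).

```python
import errno
import os
import tempfile
from typing import Optional


def wipe_free_space(directory: str, max_bytes: Optional[int] = None,
                    chunk_size: int = 1 << 20) -> int:
    """Fill free space with zeros via a temporary file, then delete the file.

    Returns the number of bytes written. max_bytes is a hypothetical cap for
    safe testing; None means "write until the volume is full".
    """
    path = os.path.join(directory, "zerofill.tmp")  # hypothetical file name
    zeros = bytes(chunk_size)
    written = 0
    try:
        with open(path, "wb", buffering=0) as f:  # unbuffered: each write goes to the OS
            try:
                while max_bytes is None or written < max_bytes:
                    n = chunk_size if max_bytes is None else min(chunk_size, max_bytes - written)
                    written += f.write(zeros[:n])
            except OSError as e:
                if e.errno != errno.ENOSPC:  # "disk full" is the expected stop condition
                    raise
            try:
                os.fsync(f.fileno())  # make sure the zeros reach the storage layer
            except OSError as e:
                if e.errno != errno.ENOSPC:
                    raise
    finally:
        # Always release the space again, even if something went wrong.
        if os.path.exists(path):
            os.remove(path)
    return written


# Demo with a small cap so it doesn't actually fill the disk.
demo_dir = tempfile.mkdtemp()
print(wipe_free_space(demo_dir, max_bytes=3 * 1024 + 5, chunk_size=1024))  # 3077
```

The finally clause is the important design choice here: whatever happens, the filler file is removed so the volume doesn't stay full, which is precisely the production risk described above.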