Last week I wrote about rsync and how it can be used for a wide variety of tasks. One of the main uses, of course, is for backups -- and not just poor-man's backups. In many cases, using a hard-link rsync backup scheme from one storage array to another can be extremely useful. It can even be better than "standard" backup schemes.
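For readers who haven't seen a hard-link scheme in action, here is a minimal sketch of how one dated snapshot might be assembled. It assumes rsync's `--link-dest` option, which hard-links files that are unchanged since an earlier snapshot instead of copying them again. The function name, the dated-directory layout, and the paths in the usage note are illustrative assumptions, not a prescribed setup.

```python
from datetime import date
from pathlib import Path
from typing import List, Optional


def build_snapshot_command(source: str, backup_root: str, today: date,
                           previous: Optional[str] = None) -> List[str]:
    """Return an rsync argument list for one dated snapshot.

    Hypothetical layout: each snapshot lives in backup_root/YYYY-MM-DD.
    backup_root should be an absolute path, because rsync resolves a
    relative --link-dest against the destination directory.
    """
    dest = Path(backup_root) / today.isoformat()
    cmd = ["rsync", "-a"]  # -a: archive mode (recursive, preserves metadata)
    if previous is not None:
        # Files unchanged since the previous snapshot become hard links
        # to it, so they take up almost no additional space.
        cmd.append(f"--link-dest={Path(backup_root) / previous}")
    # Trailing slash on the source: copy the directory's contents,
    # not the directory itself.
    cmd += [source.rstrip("/") + "/", str(dest) + "/"]
    return cmd
```

To actually run it, you would hand the list to `subprocess.run(cmd, check=True)`, for example with `build_snapshot_command("/mnt/nas1/projects", "/mnt/nas2/snapshots", date.today(), previous="2024-01-01")`. Building the command separately from running it makes it easy to print and inspect before anything touches the target array.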
There's a downside to the relative ease of this type of backup, however, and that's data spread. Take, for instance, a situation where you need to move lots of data off one NAS for one reason or another. This might not be a rolling backup, but more of a spot backup because you need to do something relatively worrying on the source NAS, while making sure you can quickly restore the data or run off the target system for a time, in case it goes pear-shaped.
This is a very common situation. A spot backup might even be a requirement when you need to change certain core functions of the source array, such as to enable dedupe on a volume. In that instance, you'll need to move all the data off the volume, wipe it, enable dedupe, then move all the data back in order to take advantage of the deduplication.
You fire up rsync and let it run, confident you have a solid backup of the data. You go through all the steps necessary, and you sync the data back -- done and done. However, you should always treat any significant modification to core services with some skepticism and operate them in a probationary period to make sure no gotchas pop up in the following days or weeks.
So you keep the backup files in place on another NAS, just in case. But unless you revisit that backup weeks later, there's a good chance it will be forgotten, lost under a pile of new work and new problems, and this massive backup won't be rediscovered until someone needs more space on the other NAS. When someone goes picking through the files to see what it is, they'll rightfully be concerned about deleting that data, since it looks important. Unless everyone knows what went on and why that data is there, it will likely be left alone, taking up space and potentially even becoming a security risk, because it may contain sensitive data.
This situation is all too common. It happens in large infrastructures and small. With the prevalence of huge disks and cheap NAS devices that are essentially used as enormous USB drives when performing certain tasks, it's not unlikely you'll find copies of data in places you didn't even know you had. Heck, I recently discovered a few terabytes of data that had wound up on a 24TB NAS, on an NFS export below the main export path, essentially invisible unless you knew where to look. I'm still not sure how it wound up there, but looking at the dates, I figure it's been hanging around for two years.
The bright side to having multiple backups of certain file stores: They can save your bacon if something goes wrong. The downside is the accumulation of wayward piles of data, needlessly consuming resources and presenting a potential source of confusion or even a security risk -- all because of the general feeling that you don't quite want to delete that backup just yet, because you might need it, particularly in a hurry.
All it takes is a little organization and perhaps some scheduled reminders -- when it comes down to it, all we're talking about is doing a little cleanup from time to time. But when you're in the thick of it and moving a mile a minute to rectify a blocking problem, it's very easy to forget to stop and take care of the little things.
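One way to make that cleanup harder to forget is to leave a note inside the backup itself and have a scheduled job look for notes that are past due. The sketch below is only one possible convention: the marker file name `BACKUP_INFO.json`, its fields, and both function names are assumptions made up for illustration, not an established standard.

```python
import json
from datetime import date
from pathlib import Path
from typing import List

MARKER_NAME = "BACKUP_INFO.json"  # hypothetical marker file name


def write_marker(backup_dir: Path, purpose: str, owner: str,
                 review_by: date) -> Path:
    """Record why a spot backup exists and when to revisit it."""
    marker = backup_dir / MARKER_NAME
    marker.write_text(json.dumps({
        "purpose": purpose,
        "owner": owner,
        "created": date.today().isoformat(),
        "review_by": review_by.isoformat(),
    }, indent=2))
    return marker


def find_overdue(root: Path, today: date) -> List[Path]:
    """Return backup directories under root whose review date has passed."""
    overdue = []
    for marker in root.rglob(MARKER_NAME):
        info = json.loads(marker.read_text())
        if date.fromisoformat(info["review_by"]) <= today:
            overdue.append(marker.parent)
    return sorted(overdue)
```

Call `write_marker` right after the rsync finishes, then run `find_overdue` from cron or whatever scheduler you already use and mail the result to the team. The note also answers the question the next admin will ask when they stumble over the data: what is this, and can I delete it?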
Face it: Many IT pros are packrats when it comes to data. We'd rather add disk and save our backups than delete them to save space. We've all been burned by data loss at least once before, and it's a painful memory. Maybe that's why I have my laptops, desktops, and servers backed up to multiple destinations on a recurring basis. It's a good idea, but I guarantee that I have file stores squirreled away in places that even I'd be shocked to find. Once bitten, twice shy.
This story, "Beware of backups that come back to bite you," was originally published at InfoWorld.com. Read more of Paul Venezia's The Deep End blog at InfoWorld.com.