In all my years in IT, I can't think of an everyday task that is more universally loathed than maintaining good backups. Depending on the size of your environment, setting them up in the first place can require a massive investment of capital and manpower, but that's just the tip of the iceberg. Daily monitoring and troubleshooting often end up making the initial deployment look like a walk in the park.
But it doesn't need to be that way. Backups can be almost a set-it-and-forget-it affair, although reaching that utopia requires careful planning and a solid understanding of what you want to accomplish. Here are some design tips to help you avoid the most common backup pitfalls.
Step No. 1: Set expectations
As is often the case, the first step is the most important. Before you even begin to think about what kind of hardware and software you'll use to back up your environment, sit down with your business stakeholders and come to a consensus on what RTO (recovery time objective) and RPO (recovery point objective) you're trying to achieve. The RTO is the time it will take you to recover a given resource; the RPO is the maximum age of the data that you'll be able to recover at any given time.
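To make those two objectives concrete, it helps to run the numbers before the meeting. The sketch below is illustrative only -- the schedule and targets are hypothetical figures, not recommendations -- but it shows why a nightly backup job can never satisfy, say, a four-hour RPO: in the worst case, a failure just before the next run loses nearly a full day of data.

```python
from datetime import datetime, timedelta

# Hypothetical targets -- substitute whatever your stakeholders agree to.
RPO_TARGET = timedelta(hours=4)   # max acceptable age of recovered data

# Hypothetical schedule: a nightly job that finished at 2:00 a.m.,
# followed by an outage at 2:30 p.m. the same day.
last_backup = datetime(2024, 1, 15, 2, 0)
failure_time = datetime(2024, 1, 15, 14, 30)

# Worst case, you lose everything written since the last completed backup.
data_loss = failure_time - last_backup
print(f"Worst-case data loss: {data_loss}")           # 12:30:00
print(f"Meets RPO target? {data_loss <= RPO_TARGET}") # False
```

Twelve and a half hours of loss against a four-hour target means the schedule, not the hardware, is the problem -- exactly the kind of mismatch this conversation is meant to surface.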
In my experience, IT pros who take the time to have this discussion with management discover that what management finds important bears little resemblance to their preconceptions. Either management has nearly impossible RTO/RPO objectives or -- surprisingly enough -- cares far less than you might think about how quickly you can get certain services back into production.
Step No. 2: Keep it simple
The best backup solution is the simplest. I have seen and, sadly, participated in designing some of the most complicated backup mechanisms you can imagine. Between building in various flavors of disk to disk, replicated disk, disk to tape, direct to tape, and offsite/cloud backups, you can meet nearly any set of requirements you'll ever be faced with. You can also design a system so complex that it collapses under its own weight.
The general rule is to try to use the bare minimum of hardware and software to satisfy the requirements you've defined. Ideally, that boils down to a single software package and a single layer of hardware -- perhaps augmented by a second layer of hardware that might handle offsite archiving if the first doesn't.
In a decent-sized, virtualization-heavy environment, this might boil down to a piece of software like Veeam and a backup-optimized NAS device like an ExaGrid appliance. For offsite, you might either get a second replicated NAS to park at a different site or toss in a tape drive and some stripped-down software just to shuffle your Veeam images onto tape. With that combination, you can get a very wide range of RTO/RPO, retention, and archival capabilities without introducing unnecessary complexity.
Step No. 3: Remove the human element