Confidentiality, integrity, and availability (CIA) are the standard triad of security concepts. I often talk about the first two, but rarely the last. This column will be different: I'll cover a few topics surrounding fault tolerance and disaster recovery. After all, if the service or server isn't available for legitimate use, you don't really have to worry about all that other security stuff.
For any high-availability service, make sure you have two or more servers in two or more locations that cannot be taken down by the same disaster. Many companies I know locate their fault-tolerant servers on different sides of the same city. Refer to any recent flood, hurricane, or tornado news to see how futile that idea can be. A fair-sized rolling power outage might take down an entire city. Better to place the redundant servers in different states, on different sides of the country, or in entirely different countries, if possible.
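The routing logic behind that kind of geographic redundancy can be surprisingly simple. Here's a minimal sketch in Python; the site names and regions are hypothetical, and I'm assuming some external probe has already gathered a health flag per site:

```python
# Cross-site failover selection (hypothetical site names and regions).
# An external monitor is assumed to have probed each site already; this
# function just prefers the primary and fails over to any healthy standby.

def pick_site(sites):
    """Return the first healthy site, preferring earlier (primary) entries.

    `sites` is an ordered list of (name, region, healthy) tuples.
    Returns None when every site is down.
    """
    for name, region, healthy in sites:
        if healthy:
            return name
    return None

sites = [
    ("primary-us-east", "us-east", False),   # taken out by the disaster
    ("standby-us-west", "us-west", True),    # different side of the country
    ("standby-eu", "eu-central", True),      # different country entirely
]
print(pick_site(sites))  # -> standby-us-west
```

The point of the ordering is that a regional disaster knocks out only the entries sharing that region; as long as one site in the list sits far enough away, the service stays reachable.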
Servers should have redundant drives, and for performance reasons, logs and databases should reside on separate physical drives whenever possible. If you're using RAID, go with the RAID level that gives you the best performance bang for the available buck; for write-heavy workloads such as databases, RAID 10 typically outperforms parity-based levels like RAID 5.
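The performance gap comes down to write penalties: a mirrored-stripe write costs roughly 2 disk I/Os, while a RAID 5 write costs roughly 4 (read data, read parity, write data, write parity). A back-of-the-envelope comparison, using the textbook penalties and a hypothetical 8-disk array (real controllers and write caches will change these numbers):

```python
# Rough effective write IOPS for an array, using textbook RAID write
# penalties. Disk count and per-disk IOPS here are illustrative only.

def effective_write_iops(disks, iops_per_disk, write_penalty):
    """Raw array IOPS divided by the per-write I/O penalty of the RAID level."""
    return disks * iops_per_disk / write_penalty

disks, per_disk = 8, 150  # hypothetical: 8 drives at 150 IOPS each

print(effective_write_iops(disks, per_disk, 2))  # RAID 10 (penalty 2) -> 600.0
print(effective_write_iops(disks, per_disk, 4))  # RAID 5  (penalty 4) -> 300.0
```

Same spindles, half the write throughput for RAID 5, which is why the parity levels tend to be a better fit for read-heavy or capacity-driven roles than for busy databases.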
Two servers are better than one
Should you cluster your servers or use two or more independent computers to serve up the same service? Clustering implies that two or more computers share the same database, configuration, and service name. The upside of clusters: When one node goes down, the others have access to the original's data and continue to process requests without interruption. The downside: Because every node shares the same database and configuration, a single corruption or bad change can take down all of them at once.
I remember the first time I spent $150,000 to cluster two servers. I made everything high-performance and high-availability, including a separate backplane channel for the clustering failover. I promised my CEO that we would be up 100 percent of the time. Boy, I was innocent back then. The next day, some random piece of data got corrupted, each of the participating cluster nodes dutifully duplicated the corruption, and the entire $150,000 solution went down hard. The CEO wasn't happy. There's something to be said for skipping clusters and using regular load balancing between separate servers instead.
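The load-balancing alternative can be sketched in a few lines. This is a toy round-robin balancer, with made-up server names, whose key property is the one the cluster lacked: because the servers are independent, a bad node can simply be pulled from rotation without dragging the others down with it:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Round-robin over independent servers, skipping any marked down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = {s: True for s in self.servers}
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        self.healthy[server] = False

    def mark_up(self, server):
        self.healthy[server] = True

    def next_server(self):
        # Walk at most one full lap of the ring looking for a healthy node.
        for _ in range(len(self.servers)):
            candidate = next(self._ring)
            if self.healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy servers available")

lb = RoundRobinBalancer(["app1.example.com", "app2.example.com"])
lb.mark_down("app1.example.com")   # app1's data went bad; pull it from rotation
print(lb.next_server())            # -> app2.example.com
```

Since each server keeps its own copy of the data, corruption on one box stays on that box, while in my cluster the shared database faithfully replicated the damage to every node.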