Just before 9/11, I was traveling quite a bit and suffering from a fear of flying that sprang out of nowhere. I hadn't had a harrowing experience in an aircraft, but to my mind the combination of a lack of control and the increased odds that come with frequent flying put me at greater risk of catastrophe. To overcome this fear, I spent hours learning more about aircraft and even spoke with pilots in the cockpit before flights to help ease my nerves. They impressed me greatly with the redundant systems that allow a plane to keep flying even when things are breaking down. They explained that the primary cause of failure is, oddly enough, human error. In his book "Outliers," Malcolm Gladwell substantiated that claim by explaining further how human error plays a major part in modern-day aircraft disasters.
Your Exchange messaging environment, obviously, doesn't have human error as the primary cause of crashes (there are many reasons for failure, including hardware and software issues). However, human error may play a part. The reason: We've reached a point with Exchange where redundancy and resilient systems can provide such high levels of availability that human error or a lack of understanding may be what causes a data loss that doesn't have to happen.
The core enabler of that high-availability redundancy is Exchange's database availability group (DAG) capability. Exchange 2010 records changes in 1MB transaction log files, created as data goes into the production database. With DAGs, you can create a replica of the database, which is kept up to date automatically by shipping those logs to it and replaying them. Unlike a normal active/passive-type cluster, DAG replicas are not limited to one location. Instead, you can have up to 16 replicas, in the same location or distributed throughout your data center and/or in other locations (such as company offices and failover sites) around the globe.
Exchange 2010 SP1 and SP2 have made the DAG capability even better, thanks to new features such as block mode, which was introduced to reduce the latency between the time a change is made on the active copy and the time that change is replicated to passive copies. As a result, the current log file is no longer a single point of failure.
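To see why block mode matters, here is a toy model in Python. It is only a sketch under simplifying assumptions, not how Exchange is implemented: the function name `unreplicated_bytes` and the write sizes are made up for illustration, and the model assumes that without block mode a log reaches the passive copy only once it is full and closed.

```python
LOG_SIZE = 1024 * 1024  # 1MB transaction log, as described above


def unreplicated_bytes(write_sizes, block_mode):
    """Return how many bytes exist on the active copy but not on a passive copy.

    Toy model (assumption): without block mode, a log is shipped only after it
    fills up and closes; with block mode, each write is sent as it is made.
    """
    active_total = 0   # bytes written on the active copy
    passive_total = 0  # bytes received by the passive copy
    open_log = 0       # bytes sitting in the current (open) log file

    for size in write_sizes:
        active_total += size
        open_log += size
        if block_mode:
            passive_total += size  # change is replicated right away
        while open_log >= LOG_SIZE:
            open_log -= LOG_SIZE   # the log is full, so it closes
            if not block_mode:
                passive_total += LOG_SIZE  # closed log is shipped whole

    return active_total - passive_total


# Hypothetical workload: five writes of 300,000 bytes each.
writes = [300_000] * 5
print(unreplicated_bytes(writes, block_mode=False))  # 451424
print(unreplicated_bytes(writes, block_mode=True))   # 0
```

In this model, whatever sits in the open log when the active server fails is the data at risk. Block mode shrinks that window to nothing in the sketch; in a real deployment, network latency still applies.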
Another important feature is Datacenter Activation Coordination (DAC) mode, which helps prevent what's called split-brain syndrome from occurring should a power outage (or some other issue) take out a primary data center that holds the majority of the DAG members, the ones that give the DAG quorum. It does this by preventing databases from mounting in the primary data center site when that site recovers. Learn more about DAC mode through TechNet.
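The split-brain risk is easier to see with numbers. The Python sketch below is a simplified illustration, not Exchange's actual logic: `has_majority` and `may_mount` are hypothetical helpers, and the `peer_allows_mount` flag is an assumed stand-in for whatever check DAC mode performs with other DAG members before a recovered server may mount databases.

```python
def has_majority(reachable_voters, total_voters):
    """Quorum in this sketch means reaching more than half of all voters."""
    return reachable_voters > total_voters // 2


def may_mount(reachable_voters, total_voters, dac_enabled, peer_allows_mount):
    """Decide whether a site may mount databases (simplified assumption).

    peer_allows_mount is a hypothetical flag: True only if a DAG member that
    knows the current state has confirmed that mounting is safe.
    """
    if not has_majority(reachable_voters, total_voters):
        return False
    if dac_enabled and not peer_allows_mount:
        return False  # DAC mode holds databases back until a peer confirms
    return True


# Hypothetical DAG: 5 members, 3 in the primary site and 2 in the failover site.
# The primary site loses power, the failover site is activated, and then the
# primary comes back while still cut off from the failover site.
print(may_mount(3, 5, dac_enabled=False, peer_allows_mount=False))  # True
print(may_mount(3, 5, dac_enabled=True, peer_allows_mount=False))   # False
```

Without DAC mode, the recovered primary site sees three of five members, believes it has quorum, and mounts databases that are already active in the failover site. With DAC mode on, the same three members stay dismounted until they can reach the rest of the DAG.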