Fail-over friends keep Exchange chugging

Five disaster-proofers take different tacks to mail server protection

An e-mail server can stop delivering e-mail for several reasons: a loss of Internet connectivity, a hardware failure, an operating system crash, an e-mail server software crash, or a corruption of the database that stores the messages. The traditional backup-and-restore process can take hours to resurrect a server, and any mail that comes in while the server is down may be delayed or lost. Not surprisingly, many organizations demand CDP (continuous data protection) for e-mail.

The options start with Microsoft’s own Windows Server 2003 Clustering Services and extend to a range of third-party fail-over and high-availability solutions. Windows clustering allows Exchange Server 2003 to be set up in either an active/passive cluster or a cluster of multiple servers with one standby server. This is highly effective in ensuring uptime, but it is complex to set up, requires extra hardware and licenses, and does not protect against data loss or database corruption (see “Windows clustering a costly option for Exchange fail-over”).

The solutions reviewed here can cope with almost any Exchange-related mishap, except Internet failures, and they do so more simply, at lower cost, and with additional flexibility or protection compared with the native Exchange cluster. Two solutions, Neverfail for Exchange and SteelEye LifeKeeper, bring true fail-over to an entire Exchange server. Two others, Cemaphore Systems MailShadow and Quest Availability Manager, protect individual mailboxes on one or more Exchange servers. And one, Lucid8 DigiVault, provides backup of data stores that can be restored to a secondary Exchange server. For maximum protection, administrators might choose to implement a fail-over system plus the CDP that DigiVault provides. (Yet another alternative is a high-availability Exchange appliance from Azaleos or Teneros. These solutions are installed on your premises but managed and monitored off-site. See our review "High-availability Exchange made easy.")

Each product takes a different approach to protecting Exchange and offers different advantages. The differentiators include whether an Exchange server license is required for the backup server, whether more than one server can be protected by a single backup server, whether an agent is required on each Exchange server, and whether replication over WAN links is supported.

The test setup for each product consisted of a domain controller (Active Directory), two Exchange servers (the primary and secondary), and any additional servers as required by the individual product. I set up replication of the primary Exchange server to the secondary and then simulated failures by unplugging the network cable from the primary, stopping the Exchange Information Store service, and dismounting the drive the information store was running on, while monitoring incoming messages and simulating traffic using LoadSim. I observed the Outlook client experience when the primary server failed, as well as the time required to fail over to the secondary server.
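For readers who want to time fail-over in their own lab, the measurement can be approximated with a simple polling script. The sketch below is not the harness used in this review. It assumes that a successful TCP connection to port 25 is an adequate proxy for "the server is up," which says nothing about the Outlook client experience. The function names and the host name in the usage comment are hypothetical.

```python
import socket
import time
from typing import Callable, Optional


def smtp_port_open(host: str, port: int = 25, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def measure_outage(
    probe: Callable[[], bool],
    interval: float = 1.0,
    max_wait: float = 600.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Wait for the probe to fail, then return the seconds until it succeeds again.

    Returns None if no complete outage (down, then up) is seen within max_wait.
    """
    start = clock()
    down_at = None
    while clock() - start < max_wait:
        up = probe()
        now = clock()
        if down_at is None and not up:
            down_at = now            # first failed probe marks the start of the outage
        elif down_at is not None and up:
            return now - down_at     # first successful probe after the outage ends it
        sleep(interval)
    return None


# Hypothetical usage: start this, then pull the network cable on the primary.
# outage = measure_outage(lambda: smtp_port_open("mail.example.test"))
```

The result is only as precise as the polling interval, so a one-second interval is adequate for fail-overs measured in minutes.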

Neverfail for Exchange

Neverfail is a true, automatic, active/passive fail-over solution. It uses primary and secondary Exchange servers linked via crossover cable to maintain a heartbeat connection and perform data synchronization. If the primary server experiences a hardware or software failure, the secondary server assumes its IP address and hostname and resumes operation. I tested Neverfail for Exchange 5.0. Neverfail Group offers a variety of application modules other than Exchange, including IBM Lotus Domino, Microsoft File Server, Oracle Database, SharePoint, and SQL Server.
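The active/passive pattern described above can be sketched in a few lines. This is a generic illustration of heartbeat-based fail-over, not Neverfail's implementation. The class name, the method name, and the threshold of three missed heartbeats are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PassiveNode:
    """Standby server that promotes itself after too many missed heartbeats."""

    missed_limit: int = 3   # assumed threshold; real products make this configurable
    missed: int = 0
    active: bool = False

    def on_heartbeat_interval(self, heartbeat_received: bool) -> bool:
        """Call once per heartbeat interval.

        Returns True only on the call that triggers promotion.
        """
        if self.active:
            return False                 # already promoted, nothing more to do
        if heartbeat_received:
            self.missed = 0              # primary is alive, reset the counter
            return False
        self.missed += 1
        if self.missed >= self.missed_limit:
            # A real product would take over the primary's IP address and
            # hostname and start the mail services at this point.
            self.active = True
            return True
        return False
```

Requiring several consecutive misses, rather than one, is a common way to avoid promoting the standby because of a single dropped packet.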

Neverfail provides functionality comparable with that of Windows Clustering, and because it doesn’t require Windows Server 2003 Enterprise or DataCenter and Exchange Enterprise Edition, the overall cost is comparable. Neverfail goes beyond Windows Clustering in providing easier setup, great management, and an intelligent analysis and monitoring tool that can find and resolve problems on the Exchange server before they cause failures. Further, as opposed to Windows Clustering, Neverfail doesn’t require the hardware of the primary and secondary systems to be identical.

With the Neverfail system, LAN users don’t need to restart Outlook. The interval between failure of the primary server and starting the secondary server is short, about two minutes in my testing. Users connecting via MAPI or the Outlook Web Access client may need to restart the client to connect to the backup server.

The Neverfail system requires an additional NIC in the primary server, and a backup server running the same server OS and the same version of Exchange. Neverfail runs on Windows 2000 Server or Windows Server 2003, and it supports Exchange 2000 and Exchange 2003.

Setup is simple and straightforward. The Neverfail SCOPE (Server Check Optimization Performance Evaluation) utility identifies any performance or configuration issues with the Exchange server and recommends solutions before installing the fail-over software. It takes snapshots of server performance and performs trend analysis to identify areas that may become problems in the future. It also generates a system ID that Neverfail uses to create a license key. After the key has been received from Neverfail, the system clones the Exchange server to the backup system.

The installation copies all application files, registry settings, services, and data stores associated with Exchange, so the backup server is a perfect duplicate, including any software updates, service packs, and so on. The system monitors all the key services, as well as the main Exchange server process, so any problems -- even with associated software or performance degradation -- can trigger the fail-over.

Pricing begins at $7,600, which includes Heartbeat (the core engine), the Exchange module, and four SCOPE analysis cycles (the initial analysis of the server prior to installation, and three follow-up checks), as well as maintenance for one year. Pricing is per pair of servers, based on the server in the pair with the greater number of CPUs. A low-bandwidth module is available that enables compression and encryption over a WAN link, as well as asynchronous replication. This would normally be used for additional data backups rather than fail-over.

Neverfail is relatively expensive, especially if you have multiple Exchange servers. It is probably less expensive than using Microsoft Exchange clustering, and it’s much easier to set up. If you need 24/7 uptime for all e-mail users, Neverfail is a good way to go, although SteelEye’s LifeKeeper offers more functionality at a lower price.

SteelEye LifeKeeper

SteelEye LifeKeeper is a server fail-over product similar to Neverfail, but it offers additional flexibility, including scheduling of replication for off-peak hours (or with a 24-hour delay to ensure that store corruption isn’t passed on), compression for replication over WAN links, and one-to-many replication to create multiple copies of a single server.

LifeKeeper can run on any version of Windows 2000 or Windows Server 2003. It supports Exchange 2000 and Exchange 2003, and it doesn’t require identical hardware for primary and secondary servers. The cost is less than Windows Clustering, at $3,280 per pair of servers, and one standby Exchange server can protect multiple active Exchange servers, although capacity planning will be essential in case all the active Exchange servers fail at once. In addition, LifeKeeper supports shared storage between the primary and secondary servers, which can speed up the fail-over process.

For this review, I tested LifeKeeper 5.3. Setting up LifeKeeper is straightforward. You will need to create service accounts, as with the other solutions, but the documentation steps you through the process. Clients get an error message during fail-over, but clients on the LAN will only need to retry the operation -- restarting Outlook is not necessary. As with Neverfail, users connecting via MAPI or Outlook Web Access may need to restart the client to connect to the backup server.

LifeKeeper provides data compression and encryption over a WAN connection, and it can replicate to a local server for fail-over, as well as to a remote server for business continuity. The LifeKeeper GUI can administer all LifeKeeper clusters in an enterprise via a straightforward interface.

LifeKeeper offers features that Neverfail doesn’t, and at a lower price. LifeKeeper’s setup is a little more complex than Neverfail’s, but this is partly because of the additional features. One interesting extra is the ability to fail over from a physical to a virtual server, or vice versa, although most admins will not be comfortable running mail servers in a virtual environment just yet. Unless you already have an investment in other Neverfail clustering technologies, LifeKeeper is a better deal.

Cemaphore Systems MailShadow

MailShadow is not, strictly speaking, an Exchange fail-over product; more accurately, it’s an Exchange mailbox fail-over product. MailShadow uses the Exchange transaction log to mirror each transaction for designated e-mail accounts on one or more Exchange 2003 servers to a backup Exchange server. If a primary Exchange server fails, or its database is corrupted, the designated accounts can access the backup server instead. Because the replication is based on transactions, no corruption of the Exchange database is passed on to the backup. I tested Version 2.0.

In addition to the primary Exchange 2003 servers that host the mailboxes to be protected, MailShadow requires three physical systems: the Source MailShadow Gateway, the Recovery MailShadow Gateway, and the Recovery Exchange Server. In a corporate environment, the Source MailShadow Gateway would be hosted in the main Exchange datacenter, while the Recovery MailShadow Gateway and Recovery Exchange Server would be in a remote DR (disaster recovery) site. Only one gateway is needed at each end, and one Recovery server can support multiple Source servers. All of the servers should be in the same Windows domain.

In addition to setting up the three additional servers, you will need to set up a service account, give it the proper permissions and delegation rights for each Exchange server to be protected, and then add the account to a group created during the MailShadow install. These procedures are well-documented in the manuals.

When e-mail accounts have been designated as protected, there is an initial interval required for creating the backup accounts with the messages already existing in the protected accounts. The time necessary for this process will depend on the amount of e-mail stored in the inbox. In my tests, replicating an inbox of about 200KB took just a couple of minutes. But with an inbox of 1.1GB, initial replication took several hours. If you have a lot of users with fat inboxes, you might want to start replication over a weekend.

Administration via the MMC (Microsoft Management Console) snap-in is easy and follows the usual MMC conventions. Administrators can control replication by storage group, by Exchange server, or by individual accounts.

After the initial synchronization, any further transactions -- receipt of new mail, deletions of messages, moves from one folder to another, and edits of messages -- are captured and replicated to the backup server, in chronological order. This is done asynchronously, but in the same sequence as on the primary server. No agent is required on the primary Exchange server because MailShadow uses the Exchange transaction engine APIs via MAPI to identify transactions to replicate.
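The idea of replaying transactions asynchronously but in their original order can be illustrated with a small sketch. It does not use the Exchange or MAPI interfaces and is not MailShadow's code. The sequence numbers, the operation names, and the `ReplicaMailbox` class are assumptions made for the example.

```python
from typing import Dict, Set, Tuple

# (sequence number, operation, message id); all three fields are illustrative
Transaction = Tuple[int, str, str]


class ReplicaMailbox:
    """Applies transactions strictly in sequence order, even if they arrive out of order."""

    def __init__(self) -> None:
        self.messages: Set[str] = set()
        self.next_seq = 1
        self.pending: Dict[int, Transaction] = {}

    def receive(self, txn: Transaction) -> None:
        """Buffer the transaction, then apply every transaction that is now in order."""
        self.pending[txn[0]] = txn
        while self.next_seq in self.pending:
            _, op, msg_id = self.pending.pop(self.next_seq)
            if op == "deliver":
                self.messages.add(msg_id)
            elif op == "delete":
                self.messages.discard(msg_id)
            self.next_seq += 1
```

Because a delete is never applied before the delivery it refers to, the replica passes through the same mailbox states as the primary, only later.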

MailShadow identifies duplicate attachments, sending each attachment only once across a WAN link to reduce traffic loads.
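How MailShadow recognizes a duplicate attachment is not documented here. One common technique is to compare content hashes, sketched below under that assumption. The `AttachmentSender` class and its byte counter are hypothetical and exist only to show the saving.

```python
import hashlib
from typing import Set


class AttachmentSender:
    """Sends each distinct attachment payload across the link only once."""

    def __init__(self) -> None:
        self.sent_digests: Set[str] = set()
        self.bytes_sent = 0   # payload bytes that actually crossed the link

    def send(self, payload: bytes) -> str:
        """Return the payload's digest, transferring the payload only if it is new."""
        digest = hashlib.sha256(payload).hexdigest()
        if digest not in self.sent_digests:
            self.sent_digests.add(digest)
            self.bytes_sent += len(payload)   # first sighting: full payload is sent
        # On repeat sightings only the short digest needs to be sent as a reference.
        return digest
```

A 5MB presentation mailed to 200 protected mailboxes would cross the WAN once under this scheme instead of 200 times.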

As opposed to Quest, Cemaphore has chosen to use a manual fail-over process to avoid spurious fail-overs that could result in a conflict between the primary and backup mailboxes. With MailShadow, when a mailbox becomes unavailable the administrator must switch users over to the backup mailbox. This can be done on an individual basis or for all users on a given Exchange server. Users must restart Outlook to reconnect to the backup mailboxes.

When the primary Exchange server is brought back online, users can be switched back to the primary account. Any changes to the mailboxes that occur during the fail-over are incrementally updated on the primary server. If the primary Exchange server is completely wiped out, a full replication operation will take place. The switch-over process after the restore is manual as well.

Cemaphore MailShadow is an effective product that allows you to take a granular approach to protecting e-mail accounts. High-priority users can be replicated while others are protected only by backups, resulting in lower replication costs. Although some admins may take issue with the absence of automated fail-over, the product is easy to set up and administer and offers a reasonable value.

Quest Availability Manager for Exchange

Quest offers QAM (Quest Availability Manager) alone or as part of the Quest Availability Suite. The suite also includes a management application that proactively monitors Exchange, looking for data errors or configuration errors that might cause problems. I tested the management product, Quest Spotlight on Exchange, earlier this year.

QAM -- Version 2.0 in this review -- works in a fashion similar to MailShadow, allowing the administrator to designate specific mailboxes to be protected and moving users to a reserve server if any of the protected Exchange servers becomes unavailable. As does MailShadow, QAM replicates transactions rather than duplicating the message store, so it is not vulnerable to replicating database problems, and one backup server can protect multiple primary Exchange servers.

As opposed to MailShadow, QAM does use an agent on the server, but it offers additional benefits, including automatic fail-over from the primary to the secondary mailbox store. Another advantage is lower pricing per mailbox than MailShadow.

Installation of QAM involves the agent and an extension to Microsoft’s Exchange System Manager so that it can manage the QAM functions as well. In addition, you must give the Administrator account the proper permissions and set up a service account. Finally, you will need a second mailbox store, which may either be on the same Exchange server as the primary or on a second Exchange server. QAM allows you to set up secondary stores on two Exchange servers so that each fails over to the other, or to set up a backup Exchange server with stores for each primary Exchange server in the enterprise.

If QAM detects a failure on the primary Exchange server, either in the availability of the store or in Exchange services, it will initiate fail-over to the secondary store. Users will have to restart Outlook to switch to the secondary store, and QAM provides an alert notifying them of this. It takes about five minutes for the fail-over process to take place, during which users cannot access their accounts, although incoming messages are still received by the backup Exchange server.

When the administrator shifts mailboxes back to the primary Exchange server, all changes made to the secondary server are replicated back to the primary.

Quest offers a variety of applications to make managing Exchange easier, including a monitoring/troubleshooting application, migration managers to ease the transition from older versions of Exchange to newer ones, and more. Tight integration among these applications makes overall management easier and can also help prevent problems such as mailbox store corruption from happening in the first place. Given the low cost of $8 per mailbox ($10 per mailbox for the suite), Quest Availability Manager is easy to recommend to any administrator who can live with a five-minute delay in availability of e-mail for end-users.

Lucid8 DigiVault

DigiVault is a backup product, not a fail-over system, but because it provides continuous data protection, it has a place in the high-availability scenario. It uses a small agent on each Exchange 2000 or Exchange 2003 server to track all changes to the database. All transactions are recorded as they occur, ensuring no loss of e-mails that have arrived since the last backup. I tested Version 1.4.

Multiple Exchange servers can be protected by one DigiVault repository and be managed from a single console. Backups can be scheduled, but restores are performed manually.

DigiVault requires domain admin privileges to install. The console and the DigiStore vault can share the same system. The DigiStore repository requires enough storage to hold the mail stores for all servers you want to protect. Although the repository is compressed, storage requirements will quickly mount if you keep each version of messages in the store. Archiving can be driven by fine-grain policies, but policy is applied on a per-store basis. You can track every change to some accounts, while keeping only the latest copy of other mailboxes, but only by spreading the accounts across separate stores.

One console is used to manage backups and restores for all servers in the domain. The data store is encrypted using a public/private key. Transmissions to the data store can be encrypted as well, if the store is at an off-site location connected via WAN.

Restores must be performed on unmounted stores, meaning users’ mailboxes must be offline until the recovery is complete. Restores went quite rapidly in my testing, however, with a 1.1GB mailbox restored in less than a minute. Data can be restored to a different server or mailbox, if necessary for auditing purposes. Transactions received since the last backup are backed up before the restore commences.

DigiVault does not provide high availability in the same sense as the other products covered here, but it should be considered as an adjunct for the others, and it may be enough for organizations that can tolerate short periods of unavailability. DigiVault works with Lucid8’s GOexchange product to find and correct errors in the store and rebuild indexes to provide a more reliable environment in Exchange.

Staying Alive

Each of these solutions offers benefits and disadvantages, and pricing varies widely. Which is least expensive will depend on how many users you have per server. If yours is one of those organizations that actually manages to support thousands of users per Exchange server, Neverfail’s $7,600 per server pair might be cheaper than Quest’s $8 per user, per year. You’ll also need to look at how many additional servers are required. All but Neverfail can protect multiple Exchange servers with one backup server; Neverfail requires one backup server for each Exchange server protected.
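The break-even point can be worked out from the list prices quoted in this review. The sketch below compares license costs only. It ignores hardware, Exchange licenses, and the difference between Quest's annual price and the per-pair prices for Neverfail and LifeKeeper, and the function names are illustrative.

```python
NEVERFAIL_PER_PAIR = 7600    # USD, starting price per server pair (from the review)
LIFEKEEPER_PER_PAIR = 3280   # USD per server pair (from the review)
QUEST_PER_MAILBOX = 8        # USD per mailbox, per year (from the review)


def per_pair_cost(price_per_pair: int, protected_servers: int) -> int:
    """License cost when every protected server needs its own licensed pair."""
    return price_per_pair * protected_servers


def per_mailbox_cost(price_per_mailbox: int, mailboxes: int) -> int:
    """License cost when pricing is per protected mailbox."""
    return price_per_mailbox * mailboxes


def break_even_mailboxes(price_per_pair: int, price_per_mailbox: int) -> float:
    """Mailboxes per server at which per-mailbox pricing equals one server pair."""
    return price_per_pair / price_per_mailbox


print(break_even_mailboxes(NEVERFAIL_PER_PAIR, QUEST_PER_MAILBOX))    # 950.0
print(break_even_mailboxes(LIFEKEEPER_PER_PAIR, QUEST_PER_MAILBOX))   # 410.0
```

On these figures, per-mailbox licensing from Quest costs less than a Neverfail pair below 950 protected mailboxes per server, and less than a LifeKeeper pair below 410.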

The degree of automatic protection and the impact on the end-user also vary widely. DigiVault requires taking the mail store offline to restore it, and in the event of hardware failure, it requires a complete re-install. It’s intended to protect against data loss, not outages, and it does this well. MailShadow requires admins to initiate a manual switch-over for affected users and requires users to restart Outlook. Quest has automated switch-over for affected users, although Outlook will have to be restarted. Both Neverfail and SteelEye provide automatic fail-over for a whole server, and SteelEye provides additional many-to-one functionality at a lower price.

The best fit for you will depend on the percentage of your user accounts that need to be protected, how many Exchange servers you have to protect, how much you care about having to install an agent on the Exchange server, and whether you regard database corruption as a major threat. Although database corruption is not likely, the process to recover from it can be so terrible that it’s worth additional protective measures, either through a CDP product such as DigiVault or via the transaction-based replication MailShadow and Quest provide.

InfoWorld Scorecard

Product                                       Ease of administration  Management  Value   Setup   Features  User impact  Overall
                                              (15%)                   (20%)       (10%)   (20%)   (15%)     (20%)        (100%)
Neverfail for Exchange 5.0                    8.0                     8.0         8.0     7.0     8.0       9.0          8.0
SteelEye LifeKeeper for Exchange 5.2          7.0                     8.0         8.0     7.0     9.0       9.0          8.0
Cemaphore Systems MailShadow 2.0              7.0                     7.0         7.0     7.0     8.0       7.0          7.2
Quest Availability Manager for Exchange 2.0   8.0                     8.0         9.0     8.0     9.0       8.0          8.3
Lucid8 DigiVault 1.4.2                        8.0                     8.0         8.0     8.0     8.0       7.0          7.8
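
The overall scores appear to be weighted averages of the six category scores. The sketch below recomputes them for three of the products, using the weights shown in the scorecard. Rounding half up to one decimal place is an assumption; it happens to reproduce the published figures.

```python
from decimal import Decimal, ROUND_HALF_UP

# Category weights from the scorecard, in the order the scores are listed.
WEIGHTS = [Decimal(w) for w in ("0.15", "0.20", "0.10", "0.20", "0.15", "0.20")]


def overall(scores):
    """Weighted average of six category scores, rounded half up to one decimal."""
    total = sum(Decimal(str(s)) * w for s, w in zip(scores, WEIGHTS))
    return total.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Scores in order: ease of administration, management, value, setup,
# features, user impact.
print(overall([8, 8, 8, 7, 8, 9]))   # Neverfail: 8.0
print(overall([7, 7, 7, 7, 8, 7]))   # MailShadow: 7.2
print(overall([8, 8, 9, 8, 9, 8]))   # Quest: 8.3
```

Decimal arithmetic is used so that a value such as 7.15 rounds predictably instead of depending on binary floating-point representation.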