One item on my to-do list last week was migrating an aging Fedora Core 3 server to new hardware running CentOS 5.2. At first glance, it seemed to be a pretty straightforward task. If the old server had been running just a single app or service, it would have been simple, but in reality this server was running eight FLEXlm license servers, several small Web applications, and a smattering of network telemetry tools, and it also functioned as an NIS slave, served as a loghost, and hosted a local CentOS yum repository.
I had an edge here because I built this server four years ago, but that was far from making this a slam dunk. Many, many things had changed at the OS level between Fedora Core 3 and CentOS 5.2 (which is basically RHEL 5), the number of services provided by the server had quintupled, and I had only a five-minute window to make the swap.
However, these are Linux boxes. That makes all the difference.
The first order of business was building the new box. Using rpm, I built a list of installed packages from the old box, formatted so that I could pass it directly to yum on the new box, pulling from the yum repository on the old server. Within five minutes (with only a few tweaks for packages that had changed names or been eliminated), I had all the packages I'd need installed on the new server. They ranged from compat libs to MySQL to ypserv.
I then rsync'd the /var/www/ tree, the /usr/local/licenses tree, and the /var/yp tree, and pulled over the ntp, snmpd, nrpe, yp.conf and ypserv.conf files, among others. All of those services fired up without complaint. I then rsync'd the custom tools from /usr/local/bin and /usr/local/sbin, in addition to all the custom /etc/init.d startup scripts for the FLEXlm licenses, brought over the required /etc/httpd/conf.d/ includes, and added the necessary crontab entries to the new box. I copied over the various NFS entries from /etc/fstab and wrote a quick script to make all the directories needed to mount those shares.
Since the licenses were bound to the specific MAC address of the original server, I added a MACADDR=xx:xx:xx:xx:xx:xx line to /etc/sysconfig/network-scripts/ifcfg-eth0 to spoof that MAC, and arranged for both interfaces to assume the IP addresses of the old server on reboot. A few modifications to the startup scripts with chkconfig, pulling over the original SSH keys, and an edit of /etc/sysconfig/network, and I was basically all set.
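For anyone who wants to see the shape of two of those prep steps, here is a minimal Python sketch. It is not the script I actually used. It assumes the package names come from something like rpm -qa --qf '%{NAME}\n' run on the old box, and that the NFS shares are ordinary fstab lines; the function names and the sample data are made up for illustration.

```python
import shlex


def yum_install_command(rpm_query_output, skip=()):
    # Input is assumed to be one package name per line, e.g. the output of
    #   rpm -qa --qf '%{NAME}\n'
    # Duplicates (multiple versions or arches of one package) are collapsed.
    names = sorted({line.strip() for line in rpm_query_output.splitlines() if line.strip()})
    skipped = set(skip)  # packages renamed or dropped between releases
    wanted = [name for name in names if name not in skipped]
    return "yum -y install " + " ".join(shlex.quote(name) for name in wanted)


def nfs_mount_points(fstab_text):
    # Return the mount point (second field) of every NFS entry in fstab text.
    points = []
    for line in fstab_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            points.append(fields[1])
    return points


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the real server.
    packages = "httpd\nmysql-server\nypserv\nhttpd\n"
    print(yum_install_command(packages, skip=["mysql-server"]))

    fstab = "filer:/export/home /home nfs rw,hard,intr 0 0\n/dev/sda1 / ext3 defaults 1 1\n"
    print(nfs_mount_points(fstab))
```

The skip list is where the "few tweaks" go: any package that changed names or disappeared between the two releases gets dropped there and handled by hand. Each path returned by nfs_mount_points can be passed to os.makedirs to create the directories before mounting.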
I wrote a quick script on the old box to turn down the physical interfaces, stop all the license servers, and re-IP the old server. When the cutover window arrived, I rebooted the new server, causing it to come up with the same name and IP addresses as the original, and simultaneously ran the turndown script on the old box. When the new box finished booting, all required services were running, all license servers (save one) were active, and all the Web apps worked. The old box was sitting at a different IP (with a MAC address of DE:CA:FF:C0:FF:EE), and the new box had successfully assumed all the responsibilities of the old box. The NIS maps pushed without issue, YP clients functioned normally, the various Perl and PHP telemetry tools were happy, and all was well except for one license server that was old enough to have been compiled against glibc 2.2.5. Lacking the source, I poked around Google for a few minutes and found an open directory containing the required server daemons compiled against a much more recent glibc, and a few minutes later, that license server was up and running.
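The turndown script on the old box is simple enough to sketch. The version below is a rough Python approximation, not the original script. It assumes the license servers are wrapped in init scripts that /sbin/service can stop, and that ifdown and ifconfig are available as on a stock Fedora Core 3 install. The service names, interface names, and addresses are placeholders, and changing the old box's MAC address is left out.

```python
import subprocess


def turndown_commands(license_services, interfaces, new_address, netmask):
    # Build the ordered command list: interfaces down first, then the
    # license servers, then bring the first interface back on a new address.
    commands = [["/sbin/ifdown", iface] for iface in interfaces]
    commands += [["/sbin/service", name, "stop"] for name in license_services]
    commands.append(["/sbin/ifconfig", interfaces[0], new_address, "netmask", netmask, "up"])
    return commands


def run(commands, dry_run=True):
    # Print each command; only execute when dry_run is explicitly turned off.
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.check_call(command)


if __name__ == "__main__":
    # Hypothetical names and addresses, for illustration only.
    plan = turndown_commands(
        license_services=["flexlm-vendor1", "flexlm-vendor2"],
        interfaces=["eth0", "eth1"],
        new_address="192.0.2.50",
        netmask="255.255.255.0",
    )
    run(plan)  # dry run: prints the plan without touching the system
```

Defaulting to a dry run is deliberate. With a five-minute window, I want to read the exact command list before anything touches a live interface.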
The downtime for this cutover was less than one minute. Nobody using the bevy of NIS-bound servers and FLEXlm-licensed applications even noticed. In short, this was a major cutover that happened during prime time, and flew completely under the radar -- exactly as it should be.
I spent more time double-checking my work than I did actually preparing the server for this transition. Had I gone for broke and not checked anything, I might have put two hours into this whole procedure from start to finish. As it was, it took around five hours of prep time (including a considerable amount of navel-gazing to run through mental checklists) to complete the whole transition. Nagios never even noticed.
This is how it should be. This is how these projects should go. This is why I'm a fan of Unix-based operating systems. To some, they make easy things hard, but to those of us who know how to bend them to our will, they make hard things easy.