Everything seemed good; we had plans to upgrade the power, cooling, and fire suppression, and to build out the room. We were on track to move in July, but in February we got notice that the colocation facility was kicking us out with 45 days' notice.
Luckily, the team I have is top notch, and rather than panic, we planned aggressively. We came up with contingencies for power (we didn't have enough): we could retire some systems, replace some older 21-inch CRT monitors, and virtualize a lot of systems in the short time we had. We didn't have cooling, but we were able to secure temporary cooling units in time for the move. We needed backups, but luckily we had a really good relationship with our vendor, who agreed to expedite new hardware for us.
We met with our executives and explained the risks: limited redundancy on power and cooling, none on our Internet connection. With three weeks left to move, we got sign-off.
During week one, we retired or consolidated 10 racks of equipment down to one. This freed up 70 amps of power and the floor space to bring our existing infrastructure into.
Week two, we got executive approval and moved 35 servers that were slated to go live but hadn't yet. Mostly this was a new iSCSI SAN. Rather than take it all down, we moved three-quarters of it and built a VMware ESX cluster on the rest. This let us virtualize approximately 40 boxes over the remaining three weeks, further reducing our power consumption.
During week three, we ordered new backup hardware, and retired and virtualized more boxes. We spent a lot of time figuring out which servers had to move together and which could go in move 1 or move 2, and we worked with our applications team and user community to start developing test plans.
With two weeks left before we would be shut down, we moved our first group of servers. It was a good test. We only had about 40 servers to move and started at 3:00 p.m. By 9:00 p.m., everything was powered down and in the new location. Some of the team wanted to work through the night and get it done, but we knew this was as much a training exercise as something to get through quickly, so we went home instead. Saturday morning the setup team started and had everything up and running by 5:00 p.m.
During the next week we tweaked our plan for the big move and finalized our schedule. Scheduling is tricky. With 3 feet between cabinets, only one person fits comfortably, so when scheduling the server racking you can't have two teams in the same area or they get in each other's way; cabling doesn't require as much moving back and forth, so those teams can work side by side with no issues. We took each cabinet and the number of servers in it, laid it out on a map, calculated how much time each step would take (10 minutes to rack a server, 15 to run its power cables, 15 to run its network cables), and built the plan from that.
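The per-cabinet arithmetic above can be sketched roughly as follows. The cabinet names and server counts here are hypothetical, as are the per-server times being exactly the ones quoted; this is only an illustration of the estimating approach, not our actual move plan.

```python
# Rough per-cabinet time estimate, in the spirit of the plan described above.
# Per-server step times (minutes) match the figures quoted in the text.
RACK_MIN = 10      # rack one server
POWER_MIN = 15     # run power cables for one server
NETWORK_MIN = 15   # run network cables for one server

def cabinet_minutes(server_count: int) -> int:
    """Total hands-on minutes for one cabinet: rack, power, then network."""
    return server_count * (RACK_MIN + POWER_MIN + NETWORK_MIN)

# Hypothetical map of cabinets to server counts
cabinets = {"A1": 8, "A2": 6, "B1": 10}

for name, servers in cabinets.items():
    hours, minutes = divmod(cabinet_minutes(servers), 60)
    print(f"{name}: {servers} servers -> {hours}h {minutes:02d}m")
```

Laying these totals against the map is what tells you which cabinets a single racking team can finish in a shift, and where a cabling team can safely work alongside.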
Luckily, our timing was pretty close and the install went off without major incident. Oh, we had the occasional "Where are the rails for this server?" and the odd broken cable, but no major issues.
Sunday was our test day and we started at 8:00 a.m. Things were going well until one of the administrators came running up the stairs. "The power company just called. There is a power problem in the park and they need to take power down to resolve it." Thirty seconds came and went as we waited for him to start laughing. He didn't.
Luckily, the power problem was not related to anything we did, but rather to the high winds of the night before. They took power down, our UPS and generators kicked in, and we kept on testing, but I'm sure we all lost at least a month of our lives when we thought we had done something to cause the power to fail.
At the end of a stressful project like this it is always good to relax over dinner and drinks and congratulate yourselves on a job well done. We did, and were having a good time until I said "Next time we should do it in 20 days." I thought I was going to get hung.