What are the three most important ingredients of a successful project? Planning, planning, planning. For our datacenter makeover at the University of Hawaii's School of Ocean and Earth Science and Technology, we planned early and often, and still got bitten by last-minute surprises and devilish details that cost us time and money. We'll do it a little differently next time. You too can learn from our mistakes.
Our little room in the Hawaii Institute of Geophysics, HIG 319, was no stranger to servers, though it only had a casual acquaintance with them. When we started the project, the room had six racks installed, one with an 80kW APC InfraStruXure UPS being used at 40kW capacity, and most of the rest of the racks only partially populated with servers for the various SOEST departments.
SOEST needed the new datacenter to house a number of new server clusters for use by the research labs. The initial estimate called for three clusters composed of a mix of traditional servers and blade servers housed in new racks. Managing this upgrade would require doubling HIG 319’s square footage, adding 250 amps of electrical power on a new breaker panel, and completely revamping the cooling system, which at the beginning of the project consisted of three wall-mounted window-style air conditioners that were already giving their all, to little effect.
Although HIG 319 had some drawbacks in terms of location, the tight deadline precluded any more political wrangling for a more favorable position on the building’s ground floor, which was occupied by several research labs. Besides, the maintenance corridor directly behind the room was a welcome advantage, and the room directly next to HIG 319 was a little-used storage room exactly the same size. Combining the rooms would give us the square footage we needed. We drew a deep breath and took the plunge.
Lesson 1: Give your physical space a good physical
A basic task list was fleshed out in February of 2007 and work began immediately: temporarily moving HIG 319’s existing servers, removing whatever artifacts were being stored in HIG 319a, knocking down the wall separating the two rooms, and gutting everything else. A sexy new tile floor had been installed, the walls painted, and new lighting wired up when the campus facilities management department threw us the first curveball.
Because the SOEST building is almost 50 years old, it’s standard UH practice to have a structural engineer vet the room before anything as heavy as a new datacenter is installed — we just didn’t find out about that little detail until it was too late to go anywhere else. Further, because the building's original structural records had long since disappeared into Hawaii’s tropical ether, the engineer had to start from scratch with his calculations.
This effectively paralyzed the project for a solid month, since nothing could happen until the engineer rendered his verdict. Four weeks later the engineer announced the floor stable…barely. While the two rooms could house a datacenter, it would have to be a lightweight datacenter because most of the racks would be limited to an 800-pound maximum load, the few exceptions being certain areas over the support beams. That was a nasty kick in the nethers, given that a fully loaded cluster-running rack can weigh as much as 2000 pounds and we had planned on using six of the 12 racks in the new datacenter for Beowulf clusters. Strike one — back to the drawing board.
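For readers who would rather run this arithmetic before the structural engineer does, here is a minimal Python sketch of the check. The 800-pound limit and the 2000-pound cluster rack come from our project; the rack names, the 650-pound departmental rack, and the function name are hypothetical, and the sketch ignores the stronger spots over the support beams.

```python
# Illustrative sketch only. The 800 lb limit and the 2000 lb cluster rack
# are figures from the article; the rack names and the 650 lb departmental
# rack are hypothetical placeholders.
FLOOR_LIMIT_LB = 800


def overloaded_racks(rack_weights_lb, limit_lb=FLOOR_LIMIT_LB):
    """Return a dict of rack name -> pounds over the limit, for racks that exceed it."""
    return {
        name: weight - limit_lb
        for name, weight in rack_weights_lb.items()
        if weight > limit_lb
    }


if __name__ == "__main__":
    planned = {"cluster-1": 2000, "dept-1": 650}
    for rack, excess in sorted(overloaded_racks(planned).items()):
        print(f"{rack}: {excess} lb over the {FLOOR_LIMIT_LB} lb limit")
```

Feed it the loaded weight of every rack in your floor plan; anything it reports needs to move over a beam, go on a diet, or find another room.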
A flurry of tropical meetings later and we had what looked like an effective workaround. The four server clusters would move to another location, while the HIG datacenter would now house departmental servers from the various SOEST departments in 12 APC InfraStruXure racks. This would effectively make HIG 319 the central datacenter for all these departments while freeing up space for the clusters at the other locations. Not an optimal solution, but a necessary move if the college intended to install the new server clusters it wanted.
Lesson 2: Don't skimp on professional services
Work on gutting and remodeling HIG 319 resumed and we made our first official contacts with APC for power and cooling solutions and rack requirements. The information we received back took into account our square footage, the current electrical and cooling specs of the two rooms, and our intended server and rack load. APC ran all these figures through its datacenter planning tool and sent back a series of PDFs that gave us an initial floor plan, the names and model numbers of the power and cooling solutions they recommended, and a basic blueprint of every rack in the new datacenter. Initially this looked great, but later we found we’d made a critical mistake.
APC was kind enough to volunteer not only the equipment, but also manpower for the project. Understandably, the company wanted to save as much money as it could here, so our project was run using the cost-savings model rather than the full-on professional service model of APC datacenter design. The deluxe model would have required more manpower in the form of a project manager on APC’s side.
For readers embarking on their own datacenter project, we can’t over-recommend spending the money on full professional services consulting with a core vendor such as APC. Had we had the good sense to solicit the service, UH reps say they would have tried to come up with the money somewhere, because trying to save cash by running without such help is very risky -- as we were about to find out.
Even at this early planning stage, an APC project manager would have gone over every detail in a conference call, whereas we simply received PDF-laden e-mails; he also would have given recommendations for installing the wiring, piping, and other prerequisites. Opting for the unroyal treatment, we were simply referred to a reference page on APC’s Web site that showed piping specs for a variety of different cooling solutions. Left to our own devices -- and the recommendation of a UH air-conditioning engineer who misunderstood some specifications -- we made the wrong choice.
In short, there's no substitute for expert guidance. An APC project manager would have made this selection for us and simply told us what to install. The right piping would have been a no-brainer from the start, instead of a last-minute correction, and nearly a costly rip-out-and-replace exercise.
Lesson 3: Saddle a project team member with detail duty
We did get some good advice from APC on our cooling solution, though even here a consultant would have helped. APC consultant or no, it would have been a good move to assign one of our project team to detail duty. We had a project lead coordinating activity and making sure the work was getting done. But we had no one tracking those critical little details -- product specifications, order status, supporting documentation -- that set us back time and again.
The order for our cooling solution was a case in point. Originally, we’d hoped to use the building’s chilled-water cooling, because that’s typically the most cost-effective choice for small datacenters like this one. However, the chilled-water capacity was already taken up cooling existing labs. We’d have to use something else. APC’s product engineers put their heads together and recommended the InRow RP, a solution that uses two roof-mounted condensers matched to two APC SX rack-mounted compressors and evaporator assemblies. The InRow RP was the next-best thing to chilled water from a cost standpoint, and installation promised to be straightforward. Install the appropriate mounting brackets on the roof and run the right piping to HIG 319-319a through the pipe chase behind the room, and we'd be good to go. The best part is that the InRow solution is significantly more efficient than traditional datacenter cooling units, so we’re banking on substantial energy savings as well.
After the adventure with the just-in-time structural inspection, SOEST's lead facilities manager, Phil Rapoza, insisted on proceeding with extreme care. Phil flat refused to begin construction on the condenser roof mounts until the condensers actually arrived. A good thing, too, because the two multi-ton condenser units we received were somewhat different from the units described in APC's submittal drawings, different enough that the mounting brackets originally spec'd would have been useless.
One last condenser problem came to light only shortly before the units were ready to ship. Our project team assumed that APC’s sales team would know to coat the condensers with outdoor sealant for Hawaii’s highly salty, rust-inducing atmosphere. But without an APC project manager on the job, or any of us minding the order, the APC sales people weren’t even consulted. As a result, immediately upon arrival the condensers had to be moved from the shipping company’s truck onto a university truck and taken to a weather-coating professional elsewhere on the island. This came at considerable additional expense to SOEST, and it cost us a five-day delay during the construction phase of the project.
Putting a project team member in charge of tracking order details and other minutiae likely would have avoided these difficulties. Weather-proofing would have been included in the original condenser order. Changes to the condenser models would have been noted long before they arrived, eliminating confusion over the mounting brackets. If you're diving into a datacenter project, you'll want to make sure that a dedicated, detail-oriented project manager is at the top of the budget list. Trust us when we say that the position will more than pay for itself as the project moves along.
Lesson 4: Hold your team close, and your vendors closer
One area for a project manager's special attention is the vendors. There are plenty of places a vendor can trip you up. Watch them like a hawk.
One example was our shipping experience with APC. The sheer volume of gear that APC shipped us for this project was staggering. We wound up using an entire 40-foot container truck to deliver our goods, and it was stuffed. You don’t overnight something like that. That gets shipped via ground and sea by a contractor other than APC. And that’s where the trouble started.
Naturally, we simply took APC’s word that the shipment was en route as ordered — and they, in turn, were taking the shipping company’s word for it. It turned out, however, that our stuff was ordered and consequently shipped later than we'd thought. That became an issue right around the time we realized the cooling condensers had to be weather coated. Because the project deadline loomed just two weeks away, adding a week or so of weather coating into the schedule was a big problem.
But when APC tried to find our goods in the shipping company’s records to see if we could either halt the condenser delivery so APC could coat them, or speed it up so we wouldn’t be so crunched for time, the shipping company couldn’t give us an exact location. By the time it could, the condensers were bobbing across the Pacific. We couldn’t even get the shipping company to prioritize our container so it would get dropped on the dock early Monday morning. We ended up having to shift project deadlines and travel schedules. Staying on top of your vendor’s shipping process may be a pain, but it will serve up golden dividends of efficiency on project day.
Another important part of vendor watching is staying on top of equipment orders. We weren’t nearly careful enough here. Don’t just place the order, glance at the P.O., and assume they’re shipping what you want. We did and it hurt. Even the best vendors with the best intentions can make critical mistakes when filling orders. Only the caution of Phil Rapoza, our facilities manager, saved us from APC's condenser spec-and-switch. We also had a full cable management system spec’d out and ordered, but suddenly the vendor (who shall remain nameless) backed out, claiming resource problems. Here again, Phil Rapoza and his band of merry men saved the day, fabricating cable ladders customized for the room when an alternative supplier couldn't be found in time.
Your project's problems might have different root causes, but in an industry that moves as fast as ours, companies can go out of business, shift direction, or be acquired over the course of a weekend, leaving customers holding the bag when orders disappear into the ether. Count on orders and shipments to go wrong. Plan for the unexpected by getting an early jump when you can and building time for unexpected delays into your project schedule.
Lesson 5: Make a migration checklist and check it twice