Luckily, those attempting to bring HPC in-house will not be alone. “The HPC community itself is quite small and very open and willing to share valuable information,” Appa says.
Appa points out that Fluent has an excellent benchmarking site showing how performance varies across hardware and software combinations. In his case, the Microsoft Institute for High Performance Computing at the University of Southampton provided sound advice on what hardware worked and what didn’t, particularly during the beta phase.
Virginia Tech starts from scratch
At Virginia Tech’s Advanced Research Institute (ARI), constructing an HPC cluster for cancer research has been an educational experience for the electrical and computer engineering grad students involved.
With little prior HPC experience, the students spent several months building a 16-node cluster and parallelizing applications they had written in MATLAB, a numerical computing environment. The project taps huge amounts of data acquired from biologists and physicians to perform molecular profiling of cancer patients. The students are also working on vehicle-related data for transportation projects.
Rather than make every aspect a learning experience, when it came to choosing an HPC platform, the students and professors decided to stick with what they already knew: Microsoft Windows.
“Our students had already been running MATLAB and all their other programs on Windows,” says Dr. Saifur Rahman, director of ARI. “We didn’t want to have to retrain them on Linux.” As was the case at BAE Systems, there were also obvious advantages to a cluster that could integrate easily with the rest of ARI’s Windows infrastructure, including Active Directory.
Microsoft had already approached Virginia Tech to be an early adopter of Windows Compute Cluster Server 2003, so Dr. Rahman and his team said yes and started looking for the right hardware. They vetted several vendors, but when they found out Microsoft was performing its own testing on Hewlett-Packard servers, they decided to go with HP. “We knew we’d need help from Microsoft to fix various bugs,” says Dr. Rahman, “and since all their experience was on HP servers, we felt we’d have the most success with HP.”
So with help from Microsoft and HP, ARI installed 16 HP ProLiant DL145 servers with dual-core 2.0GHz AMD Opteron 270 processors and 1GB of RAM each. On the same rack, ARI installed 1TB of HP Fibre Channel storage. The rack also includes a head node, as well as an HP ProLiant DL385 G1 server with two dual-core 2.4GHz AMD64 processors and 4GB of RAM.
As did BAE Systems, ARI decided to stick with Gigabit Ethernet for its cluster interconnect, mainly because it was what the team knew. “There are other interconnects that are faster, but we’ve found that Gigabit Ethernet is pretty robust and works fine for our purposes,” Dr. Rahman says. And after some servers overheated, ARI placed the entire cluster in a 55-degree Fahrenheit chilled server room.
ARI found parallelizing MATLAB apps to be a significant challenge requiring a number of iterations. “The students would work on parallelizing the algorithms, then run case studies to verify that the results they were getting with the clustered applications were similar to the results they got when they ran one machine,” Dr. Rahman says.
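The verify-against-serial loop Dr. Rahman describes can be sketched roughly as follows. This is an illustrative Python example rather than the students’ actual MATLAB code: `profile_sample` is a hypothetical stand-in for a per-sample profiling computation, and a thread pool stands in for the cluster’s worker nodes. The comparison uses a floating-point tolerance rather than exact equality, since changing evaluation order in parallel runs can shift low-order bits.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def profile_sample(x):
    # Hypothetical per-sample computation, standing in for the real
    # molecular-profiling algorithm.
    return math.sqrt(x) * math.log(x + 1)

def run_serial(samples):
    # Reference run on "one machine": process every sample in order.
    return [profile_sample(x) for x in samples]

def run_parallel(samples, workers=4):
    # Parallel run; on the real cluster this work would be distributed
    # across nodes rather than local threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(profile_sample, samples))

def results_agree(serial, parallel, rel_tol=1e-9):
    # Case-study check: the parallelized results should match the
    # single-machine results to within floating-point tolerance.
    return len(serial) == len(parallel) and all(
        math.isclose(a, b, rel_tol=rel_tol)
        for a, b in zip(serial, parallel)
    )
```

A run of the check might look like `results_agree(run_serial(data), run_parallel(data))`; a `False` result would send the students back to the parallelization step, which is the iteration Dr. Rahman describes.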