To do this, Merlin built its own highly parallel analysis tools, which it runs on a high-performance Oracle RAC (Real Application Clusters) installed on a rack of Dell PowerEdge 1850 and 2850 dual-core Xeon servers. Data storage is provided by EMC CLARiiON 2Gbps and 4Gbps FC storage towers. Sitting on top of Oracle are Merlin’s HPC task-scheduling software, also created in-house, and an Oracle data mart that serves as a temporary holding ground for frequently used data subsets, much like a cache. Most of the high-speed calculations run directly on the Oracle RAC, which is fronted by a series of BEA WebLogic app servers that take in requests from a set of redundant load balancers sitting behind the company’s customer-facing Apache Web servers. Each of the three layers sits behind its own set of redundant firewalls.
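Merlin hasn’t published the internals of that data mart, but its role -- a fast holding area consulted first, with the full warehouse queried only on a miss -- is essentially the classic cache-aside pattern. The minimal C++ sketch below illustrates the idea; queryWarehouse() is a hypothetical stand-in for an expensive query against the RAC, not Merlin’s actual code.

```cpp
// Illustrative cache-aside sketch (not Merlin's code): the "data mart"
// holds frequently used subsets, and lookups fall back to the main
// warehouse only on a miss.
#include <iostream>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an expensive query against the full warehouse.
std::string queryWarehouse(const std::string& key) {
    return "result-for-" + key;
}

class DataMart {
public:
    // Return the cached subset if present; otherwise fetch, store, and return it.
    const std::string& get(const std::string& key) {
        auto it = cache_.find(key);
        if (it == cache_.end())
            it = cache_.emplace(key, queryWarehouse(key)).first;
        return it->second;
    }
private:
    std::unordered_map<std::string, std::string> cache_;
};

int main() {
    DataMart mart;
    std::cout << mart.get("intraday-volatility") << "\n"; // miss: hits the warehouse
    std::cout << mart.get("intraday-volatility") << "\n"; // hit: served from the mart
    return 0;
}
```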
First, tightly coupled parallel processing via message passing was simply out of the question. Instead, Merlin’s architects and programmers put tremendous effort into dividing processes in an “embarrassingly parallel” fashion, with no interdependencies at all. That choice pays off in both scalability and reliability: the high-speed, low-latency links that interprocess communication demands become scalability bottlenecks, and they require cutting-edge interconnects such as Myrinet and InfiniBand, which don’t have the reliability track record of Gigabit Ethernet.
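Merlin’s code isn’t public, but the decomposition style is easy to illustrate. In the minimal C++ sketch below, each work unit runs as a fully independent task with no message passing and no shared state; analyzeSymbol() is a hypothetical stand-in for the real analytics.

```cpp
// Minimal sketch of an "embarrassingly parallel" decomposition:
// every work unit is fully independent, so there is no interprocess
// communication and no shared state between tasks.
#include <future>
#include <iostream>
#include <string>
#include <vector>

// Placeholder for a self-contained calculation on one data subset.
double analyzeSymbol(const std::string& symbol) {
    double score = 0.0;
    for (char c : symbol) score += c * 0.001;
    return score;
}

int main() {
    std::vector<std::string> symbols = {"AAPL", "MSFT", "ORCL", "DELL"};
    std::vector<std::future<double>> results;

    // Launch one independent task per symbol; tasks never talk to each other.
    for (const auto& s : symbols)
        results.push_back(std::async(std::launch::async, analyzeSymbol, s));

    // The only synchronization point is gathering the answers at the end.
    for (size_t i = 0; i < symbols.size(); ++i)
        std::cout << symbols[i] << ": " << results[i].get() << "\n";
    return 0;
}
```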
“We didn’t want some new interconnect driver crashing the system,” Mohamed says, adding that straight Gigabit has also helped Merlin achieve considerable cost savings.
Reliability and enterprise-grade support fueled Merlin’s decision to stick with proven gear: an Oracle RAC, with its high-quality, fault-tolerant failover features; dual-processor Dell PowerEdge servers; high-end EMC CLARiiON FC storage; and F5 load balancers.
“There are lots of funky platforms for HPC out there and high-bandwidth data storage solutions that can pump data at amazing rates,” Mettke says. “The problem is that you end up dealing with lots of different vendors, some of whom can’t deliver the 24/7 enterprise-level support you need. That adds another element of risk.”
Finally, all code was written in Java, C++, and SQL.
“I’ve been on the other end running code written in Assembler on thousands of nodes,” Mettke says. “We want the speed, but not at the expense of system crashes in the middle of a trading day. You can claim you have the best cluster out there, but it doesn’t matter if there’s no show when it’s showtime.”
Mettke adds that the architecture of Merlin’s HPC infrastructure is constantly evolving to accommodate new data and applications.
Aerion gets HPC help
For organizations looking to get a cluster up and running quickly, enlisting the help of specialized Linux HPC hardware vendors such as Linux Networx and Verari Systems can cut down development time significantly. Not only do these companies sell and configure standard hardware, but they often have the expertise to deliver turnkey configurations with apps installed, tuned, and tested. Such was the case for Aerion, a small aeronautical engineering company that tapped Linux Networx to bring the upside of in-house HPC to its business of developing business jets.
Aerion, which handles the preliminary jet design process, relies on larger aerospace partners for design completion, manufacturing, and service. One of the company’s projects, an early-stage design for a supersonic business jet, required particularly demanding CFD (computational fluid dynamics) analysis.
“In many commercial subsonic transport projects, you can develop different parts of the jet independently, then put all the pieces together and refine the design,” says Aerion research engineer Andres Garzon. “But with supersonic jets, everything is so integrated and interactive that it’s really impractical to develop each element apart from the others.”
Of course, small organizations such as Aerion don’t always have the resources on hand to fly solo on HPC -- especially given that Aerion was also in the midst of switching from Fluent to a series of powerful, free tools developed by NASA. So when Garzon stumbled on a Linux Networx booth at an American Institute of Aeronautics and Astronautics meeting three years ago, and the reps he spoke with offered to provide the hardware and much of the integration and testing work for the NASA apps Aerion wanted to use, he took them up on the opportunity to get HPC up and running quickly.
Working with Linux Networx, Aerion configured an 8-node Linux Networx LS-P cluster of dual-processor AMD Opteron 246-based servers with 4GB of memory per node, plus a ninth server to act as the master node. The NASA code requires a significant amount of complex message passing among parallel processes using MPI (Message Passing Interface), which usually calls for a very high-speed, low-latency interconnect such as InfiniBand or Myrinet. Because Aerion’s budget was limited, Linux Networx offered to benchmark the apps with Myrinet, InfiniBand, and Gigabit Ethernet. Although performance under Myrinet and InfiniBand was superior (and roughly equivalent between the two), the overall difference was not dramatic enough to justify the expense. So Linux Networx delivered a Gigabit Ethernet configuration, saving around $10,000, Garzon estimates.
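The NASA solvers themselves aren’t reproduced here, but the stripped-down C++/MPI sketch below shows the kind of nearest-neighbor “halo exchange” a parallel CFD code performs on every iteration -- the communication step whose latency Myrinet or InfiniBand would hide better than Gigabit Ethernet. The relaxation loop is a placeholder, not the actual flow solver.

```cpp
// Stripped-down halo-exchange sketch (not the NASA code). Each rank owns
// a slab of the domain and must swap boundary values with its neighbors
// before it can advance the solution on each iteration.
// Typical build/run: mpicxx halo.cpp -o halo && mpirun -np 8 ./halo
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 1000;                    // interior cells per rank
    std::vector<double> u(N + 2, rank);    // one ghost cell at each end

    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    for (int iter = 0; iter < 100; ++iter) {
        // Exchange one boundary value with each neighbor (the latency-bound step).
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left,  0,
                     &u[N + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[N], 1, MPI_DOUBLE, right, 1,
                     &u[0], 1, MPI_DOUBLE, left,  1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        // Simple relaxation step standing in for the real flow solver.
        std::vector<double> next(u);
        for (int i = 1; i <= N; ++i)
            next[i] = 0.5 * (u[i - 1] + u[i + 1]);
        u.swap(next);
    }

    if (rank == 0) std::printf("done after 100 halo exchanges\n");
    MPI_Finalize();
    return 0;
}
```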
As for storage, it is all local -- rather than SAN-based -- and is managed by the master node, which mirrors the OS and file system to the compute nodes. Thus, data is stored both on the compute nodes’ local drives and on the master node.
Linux Networx recompiled the NASA code -- which was originally developed to run on SGI machines -- for the Linux cluster. It also set the appropriate build flags for the system and fine-tuned the cluster, so Aerion was operational within a few days. Management is provided by Linux Networx Clusterworx, which monitors node availability, creates the image and payload for each node, and reprovisions nodes as necessary.
In all, Garzon found the process of bringing HPC in-house with the aid of Linux Networx to be relatively trouble-free. He plans to expand the system so it can run more cases simultaneously and cut compute time on time-sensitive calculations.