The National Aeronautics and Space Administration (NASA) has given the green light to a project that will build the largest-ever supercomputer based on Silicon Graphics Inc.'s (SGI) 512-processor Altix computers.
Called Project Columbia, the 10,240-processor system will be used by researchers at the Advanced Supercomputing Facility at NASA's Ames Research Center in Moffett Field, California.
Scientists will use Columbia to design equipment, simulate future space missions and model weather patterns. A portion of the $160 million system will also be made available to other government agencies and educational institutions, said Bill Thigpen, manager of Project Columbia. "We need to look at working with other agencies to provide them with access to this system because it is a unique system," he said.
What makes Project Columbia unique is the size of the multiprocessor Linux systems, or nodes, that it clusters together. Supercomputers are commonly built from thousands of two-processor nodes, but the Ames system uses SGI's NUMAlink switching technology and ProPack Linux operating system enhancements to connect 512-processor nodes, each of which will have more than 1,000GB of memory.
"We use a very large single-system image," said Jeff Greenwald, senior director of server product marketing with SGI. "The other guys come with a very thin node cluster, and try to screw them all together."
The Altix nodes will use Intel Corp.'s Itanium 2 microprocessors, and the entire 20-node system is expected to be fully assembled by year's end, he said.
SGI has used this large-node technology to build a number of smaller Altix systems with between 3,000 and 6,000 processors, but Project Columbia will be the largest to date, Greenwald said.
Columbia's large-node, shared-memory architecture works well for NASA's "tightly coupled" weather and space simulation applications, which require heavy interprocessor communication, Thigpen said. "These codes scale very well on this type of architecture," he said.
The downside to the large-node architecture is that if a single processor fails, the entire 512-processor node goes out of service, he said.
The first node of Project Columbia, named Kalpana, after Columbia astronaut Kalpana Chawla, was built by Ames researchers last fall. Since then, two more nodes have been added, and NASA and SGI will spend the next five months assembling the next 17 nodes.
With the next version of SGI's NUMAlink technology, expected in the fall, Project Columbia will be able to share memory across 2,048 processors, Thigpen said.
Linux creator Linus Torvalds applauded the team's success at using Linux in such large nodes. The operating system typically is used in much smaller nodes of 2 to 8 processors.
"Scaling up to ... 512 CPUs is pretty damn studly," Torvalds said in an e-mail interview. "Putting twenty of them in a cluster and making them be programmable as a single machine is pretty hot."