July 28, 2004

NASA to build 10,000-processor Linux computer

Agency will build largest ever supercomputer based on SGI's 512-processor Altix computers

The National Aeronautics and Space Administration (NASA) has given the green light to a project that will build the largest ever supercomputer based on Silicon Graphics Inc.'s (SGI) 512-processor Altix computers.

Called Project Columbia, the 10,240-processor system will be used by researchers at the Advanced Supercomputing Facility at NASA's Ames Research Center in Moffett Field, California.

Scientists will use Columbia to design equipment, simulate future space missions and model weather patterns. A portion of the $160 million system will also be made available to other government agencies and educational facilities, said Bill Thigpen, manager of Project Columbia. "We need to look at working with other agencies to provide them with access to this system because it is a unique system," he said.

What makes Project Columbia unique is the size of the multiprocessor Linux systems, or nodes, that it clusters together. It is common for supercomputers to be built of thousands of two-processor nodes, but the Ames system uses SGI's NUMAlink switching technology and ProPack Linux operating system enhancements to connect 512-processor nodes, each of which will have more than 1,000GB of memory.

"We use a very large single-system image," said Jeff Greenwald, senior director of server product marketing with SGI. "The other guys come with a very thin node cluster, and try to screw them all together."

The Altix nodes will use Intel Corp.'s Itanium 2 microprocessors, and the entire 20-node system is expected to be fully assembled by year's end, he said.

SGI has used this large-node technology to build a number of smaller Altix systems with between 3,000 and 6,000 processors, but Project Columbia will be the largest to date, Greenwald said.

Columbia's large-node, shared-memory architecture works well for NASA's "tightly coupled" weather and space simulation applications, where a lot of inter-processor communication is required, Thigpen said. "These codes scale very well on this type of architecture," he said.

The downside to the large-node architecture is that if a single processor fails, the entire 512-processor node goes out of service, he said.

The first node of Project Columbia, named Kalpana, after Columbia astronaut Kalpana Chawla, was built by Ames researchers last fall. Since then, two more nodes have been added, and NASA and SGI will spend the next five months assembling the next 17 nodes.

With the next version of SGI's NUMAlink technology, expected in the fall, Project Columbia will be able to share memory across 2,048 processors, Thigpen said.
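The figures quoted in this article fit together with some simple arithmetic. The short Python sketch below works through them; the function and constant names are illustrative only and are not from NASA or SGI, and the memory figure is a lower bound because the article says only "more than 1,000GB" per node.

```python
# Figures quoted in the article; all names here are illustrative.
PROCESSORS_PER_NODE = 512        # one Altix node
NODE_COUNT = 20                  # full Project Columbia system
MIN_MEMORY_PER_NODE_GB = 1000    # "more than 1,000GB", so a lower bound
SHARED_MEMORY_PROCESSORS = 2048  # reach expected with the next NUMAlink version


def columbia_totals():
    """Derive system-wide totals from the per-node figures."""
    return {
        # 20 nodes of 512 processors each
        "total_processors": PROCESSORS_PER_NODE * NODE_COUNT,
        # lower bound on memory across all nodes, in GB
        "min_total_memory_gb": MIN_MEMORY_PER_NODE_GB * NODE_COUNT,
        # how many 512-processor nodes a 2,048-processor
        # shared-memory group would span
        "nodes_per_shared_group": SHARED_MEMORY_PROCESSORS // PROCESSORS_PER_NODE,
    }


if __name__ == "__main__":
    for name, value in columbia_totals().items():
        print(f"{name}: {value}")
```

Under these assumptions, the 20 nodes account for the 10,240 processors cited above, and sharing memory across 2,048 processors would correspond to four 512-processor nodes.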

Linux creator Linus Torvalds applauded the team's success at using Linux in such large nodes. The operating system typically is used in much smaller nodes of 2 to 8 processors.

"Scaling up to ... 512 CPU's is pretty damn studly," Torvalds said in an e-mail interview. "Putting twenty of them in a cluster and making them be programmable as a single machine is pretty hot."

©1994-2009 Infoworld, Inc.