To guide the design, the team used a rule of thumb devised by computer scientist Gene Amdahl. Ideally, Amdahl posited, a computer should have one bit of I/O ready for each instruction it executes.
Most supercomputer architects have disregarded this rule, claiming the processor caches can bank data and have it ready for use when needed. Now that datasets have grown so large, Amdahl's rule should be reconsidered, Szalay argued.
A typical supercomputer has an Amdahl number of about 0.001, or a thousandth of the optimal balance, whereas Data-Scope should have an Amdahl number of about 0.6 or 0.7.
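To make the ratio concrete, the sketch below computes an Amdahl number as I/O bits per second divided by instructions per second, so that a value of 1.0 means one I/O bit per instruction. The function name and the machine figures are hypothetical, chosen only to reproduce the orders of magnitude mentioned above; they are not Data-Scope's actual specifications.

```python
def amdahl_number(io_bytes_per_sec: float, instructions_per_sec: float) -> float:
    """Return I/O bits per second divided by instructions per second.

    A value of 1.0 corresponds to Amdahl's ideal of one I/O bit
    available for every instruction executed.
    """
    if instructions_per_sec <= 0:
        raise ValueError("instructions_per_sec must be positive")
    return (io_bytes_per_sec * 8) / instructions_per_sec


# Hypothetical machines (illustrative figures, not real hardware specs).
# Both move 1 GB/s of data; they differ only in compute rate.
balanced = amdahl_number(io_bytes_per_sec=1e9, instructions_per_sec=8e9)
compute_heavy = amdahl_number(io_bytes_per_sec=1e9, instructions_per_sec=8e12)

print(f"balanced machine:      {balanced:.3f}")
print(f"compute-heavy machine: {compute_heavy:.3f}")
```

In this toy comparison, the second machine executes a thousand times more instructions per second with the same I/O bandwidth, which is what pushes its Amdahl number down to roughly a thousandth of the ideal.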
The designers also plan to make some changes in the way databases are used. "We don't use the database just as dump storage but as an active computing environment," Szalay said. Instead of moving data from a database across a network to a cluster of servers, researchers can write user-defined functions that can run against the database itself.
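The article does not say which database engine hosts these user-defined functions, so the following is only a minimal sketch of the general pattern, using Python's built-in sqlite3 module as a stand-in. The table, the `log_scaled` function and the sample values are all hypothetical. The point is that the function is evaluated inside the query, so only the matching rows come back to the caller rather than the whole table.

```python
import math
import sqlite3


def log_scaled(value: float) -> float:
    """Hypothetical analysis step: base-10 logarithm of a measurement."""
    return math.log10(value)


def ids_above_threshold(conn: sqlite3.Connection, threshold: float) -> list:
    """Run the filter inside the database and return only matching ids."""
    rows = conn.execute(
        "SELECT id FROM samples WHERE log_scaled(value) > ? ORDER BY id",
        (threshold,),
    ).fetchall()
    return [row[0] for row in rows]


# In-memory database standing in for a large scientific data store.
conn = sqlite3.connect(":memory:")

# Register the Python function so SQL queries can call it by name.
conn.create_function("log_scaled", 1, log_scaled)

conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO samples (id, value) VALUES (?, ?)",
    [(1, 5.0), (2, 150.0), (3, 2500.0), (4, 90.0)],  # made-up sample data
)

matching_ids = ids_above_threshold(conn, 2.0)
print(matching_ids)
```

SQLite runs in the same process as the calling program, so this example saves no network traffic by itself; on a networked database server the same pattern keeps the raw rows on the server and ships only the results.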
Researchers can boot one of three images on the system: Windows Server 2008, a combination of Linux and MySQL, or a third instance running Hadoop.
Data-Scope will be housed in a new green data center on campus, which is being built with $1.3 million in funding from the NSF.