On Feb. 25, 1991, during the first Gulf War, a Scud missile hit a U.S. Army barracks in Dhahran, Saudi Arabia, killing 28 U.S. soldiers. The barracks was defended by a Patriot missile defense system, which failed to track and intercept the incoming Scud. A year later, a U.S. General Accounting Office (GAO) investigation into the Patriot's failure concluded that the battery's weapon control system suffered from a fatal flaw: It was bad at math.
On Tuesday, researchers at Sun Microsystems Inc. discussed work they are doing, as part of a three-year, $50 million Defense Advanced Research Projects Agency (DARPA) grant, that aims to avoid the kind of errors that caused the Patriot failure.
Mathematical errors are far more common in the computer industry than most people realize, said Greg Papadopoulos, Sun's executive vice president and chief technology officer. While his company is normally the first to accuse Microsoft Corp. of shoddy operating system design, bad math and not Windows is sometimes behind those unexplained PC crashes, he admits.
"There are a lot of errors that happen in machines that go undetected," Papadopoulos said. "Sometimes a machine just goes away and freezes. You always blame it on Microsoft. We do, too. It's convenient. It's convenient for Intel, too."
"It's a dirty secret. Floating-point arithmetic is wrong," said John Gustafson, a principal investigator with Sun, based in Santa Clara, California. "It only takes two operations to see that computers make mistakes with fractions."
The problem that Gustafson and Papadopoulos referred to stems from the fact that the binary mathematics employed by computers has a hard time accurately representing certain numbers. Fractions, for example, are particularly tough, because they often involve non-terminating numbers that are impossible to accurately express in binary format.
Dividing two by three on a calculator illustrates the problem. The fraction 2/3 never terminates as a decimal, so it must be rounded to fit the available digits; a typical calculator rounds up, making the last digit a seven. Computers face the same limit in binary.
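The effect is easy to reproduce on any machine. The following is a minimal Python sketch, not code from Sun's researchers: it uses the standard `fractions` module to compare what the machine stores against the exact rational value. The helper name `float_sum_is_exact` is made up for illustration.

```python
from fractions import Fraction

def float_sum_is_exact(a_num, a_den, b_num, b_den):
    """Add two fractions as binary floats and report whether the
    result equals the exact rational sum."""
    exact = Fraction(a_num, a_den) + Fraction(b_num, b_den)
    approx = a_num / a_den + b_num / b_den  # two divisions, one addition
    # Fraction(float) recovers the exact value the machine stored.
    return Fraction(approx) == exact

# The stored value of 2/3 is close to, but not equal to, two thirds.
stored_two_thirds = Fraction(2 / 3)
rounding_error = stored_two_thirds - Fraction(2, 3)

print(float_sum_is_exact(1, 10, 2, 10))  # one tenth plus two tenths
print(float_sum_is_exact(1, 2, 1, 4))    # one half plus one quarter
print(float(rounding_error))
```

Halves and quarters terminate in binary, so that sum comes out exact; tenths do not, so the first sum carries a small error.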
In the case of the Gulf War incident, the Patriot battery's computer rounded a similar, non-terminating number in order to calculate time. But by shaving off a few digits during every calculation, the battery also shaved off a bit of time. After one hour, the Patriot's clock was off by 0.0034 seconds. On Feb. 25, the computer had been in operation for 100 hours straight, and its clock was off by over one third of a second, enough to cause it to miss the incoming Scud.
Programmers who write software that requires these types of calculations are "very much aware of these problems," and use a variety of techniques to work around the inaccuracy, said Nathan Brookwood, a principal analyst with the firm Insight64 in Saratoga, California.
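The arithmetic behind those figures can be sketched in a few lines of Python. The details here are assumptions drawn from commonly cited analyses of the incident, not from this article: that the clock counted tenths of a second, and that the value one tenth was chopped to 23 binary digits after the point. The function name `clock_drift_seconds` is likewise illustrative.

```python
from fractions import Fraction

FRACTION_BITS = 23     # assumption: binary digits kept after the point
TICKS_PER_SECOND = 10  # assumption: the clock counted tenths of a second

def clock_drift_seconds(hours, fraction_bits=FRACTION_BITS):
    """Estimate accumulated clock error after `hours` of uptime when
    one tenth is stored chopped to `fraction_bits` binary digits."""
    scale = 2 ** fraction_bits
    exact_tick = Fraction(1, TICKS_PER_SECOND)
    # Integer division chops (rounds toward zero), dropping the tail of 0.1.
    stored_tick = Fraction(scale // TICKS_PER_SECOND, scale)
    ticks = hours * 3600 * TICKS_PER_SECOND
    return float((exact_tick - stored_tick) * ticks)

print(clock_drift_seconds(1))    # drift after one hour
print(clock_drift_seconds(100))  # drift after 100 hours
```

Under these assumptions the error per tick is under a ten-millionth of a second, but it is added 36,000 times an hour, which is how it grows to roughly a third of a second over 100 hours.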
But with supercomputers that calculate billions of sums per second, some of these workarounds can slow down performance, and the risk that some unanticipated mathematical error may occur remains a niggling doubt.
Sun researchers are looking to solve both problems using a technique called interval arithmetic, which essentially traps a mathematically incorrect number between two other numbers that are known to be correct, and prevents mathematical inaccuracy from ballooning out of control over time. "If you can prove mathematically that the right answer is between this answer and that answer, you can restore mathematical rigor to computing," Gustafson said.
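The idea can be illustrated with a toy example. This is a minimal Python sketch of interval addition, not Sun's implementation: the `Interval` class and `sum_tenths` function are made up for illustration, and the sketch assumes Python 3.9 or later for `math.nextafter`. Each bound is pushed outward by one representable step after every operation, so the true answer can never escape the interval.

```python
import math
from fractions import Fraction

class Interval:
    """A pair of floats [lo, hi] meant to bracket the true value."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    @classmethod
    def enclosing(cls, value):
        """Bracket an exact value by stepping one float below and one
        float above the nearest representable float."""
        x = float(value)
        return cls(math.nextafter(x, -math.inf), math.nextafter(x, math.inf))

    def __add__(self, other):
        # Round the lower bound down and the upper bound up, so the
        # rounding error of each addition stays inside the interval.
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def contains(self, exact):
        return Fraction(self.lo) <= exact <= Fraction(self.hi)

def sum_tenths(n):
    """Add one tenth to itself n times using interval arithmetic."""
    tenth = Interval.enclosing(Fraction(1, 10))
    total = Interval(0.0, 0.0)
    for _ in range(n):
        total = total + tenth
    return total

result = sum_tenths(10)
print(result.lo, result.hi, result.contains(Fraction(1)))
```

The interval widens slightly with every operation, which is the price of the guarantee: the program no longer reports a single possibly wrong number, but a range that provably holds the right one.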
