Large-scale software systems are staggeringly complex works of engineering. Bugs inevitably come with the territory, and for decades the software profession has looked for ways to fight them. We may not see perfect source code in our lifetime, but we are seeing much better analysis tools and promising new approaches to remedying the problem.
TDD (test-driven development) is one increasingly popular approach to finding bugs. The overhead can be substantial, however, because the test framework that ensures a program’s correctness may require as many lines of code as the program itself. Run-time checking is another popular approach. By injecting special instrumentation into programs or by intercepting API calls, tools such as IBM’s Rational Purify and Compuware’s BoundsChecker can find problems such as memory corruption, resource leakage, and incorrect use of operating system services. TDD and run-time checking are both useful techniques and are complementary. But ultimately, all errors reside in the program’s source code. Although it’s always important for programmers to review their own code (and one another’s), comprehensive analysis demands automation.
One compelling demonstration of the power of automated source code analysis is Coverity’s Linux bugs database. Viewable online, this April 2004 snapshot pinpointed hundreds of bugs in the Linux 2.6 source code. Coverity’s analyzer, called SWAT (Software Analysis Toolset), grew out of research by Stanford professor Dawson Engler, now on leave as Coverity’s chief scientist.
In the Windows world, a static source code analyzer called PREfast, which has been used internally at Microsoft for years, will be included in Microsoft Visual Studio 2005 Team System. PREfast is a streamlined version of a powerful analyzer called PREfix, a commercial product sold in the late 1990s by a company called Intrinsa. Microsoft acquired Intrinsa in 1999 and brought the technology into its Programmer Productivity Research Center.
Source code analyzers themselves are not new. Their lineage traces back to lint, a tool that since the late 1970s has enabled programmers to find common errors in C programs. Why the renewed interest in this venerable art? Experts offer several possible explanations.
Brian Chess, chief scientist at Fortify Software, whose analyzer specializes in detection of security vulnerabilities in C, C++, Java, JSP, and PL/SQL source code, thinks security concerns make analysis even more imperative. Many of the errors in C and C++ that are hard for humans to spot, but relatively easy for automated analyzers to find, involve memory management. Before computers were pervasively interconnected, a buffer overflow was an inconvenience but not necessarily a disaster. Now, such errors are routinely exploited by attackers.
However, Engler thinks the security explanation should be taken with a grain of salt. His research in the late 1990s aimed to improve the reliability of software. Security analysis was part of the story, he says, but “basically, we just didn’t want stuff to crash.”
Moore’s Law is also driving source code analysis forward. Exhaustive analysis of code can chew up vast quantities of computing resources. At Microsoft, for example, PREfix performs deep analysis of millions of lines of C and C++ code, but it can run only infrequently, as part of a centralized build process. Developers typically use PREfast, PREfix’s less resource-intensive cousin, for routine daily checking.
As available computing horsepower grows, we can devote more of it to program verification. Also, Engler notes, faster CPUs tend to marginalize the optimization work that was the traditional focus of compiler professionals. “As the range of applications that benefits from optimization gets smaller,” he says, “there’s been a push to find something else interesting.”
Fortify’s Chess adds that there has been a fundamental philosophical shift in how we approach the issue of source code analysis. Early researchers were interested in program correctness, he says. The goal was to prove that “my program will, under all circumstances, compute what I intend it to compute.” Now, he says, the emphasis has switched to a more tractable form of proof: that “there are specific properties my program does not have.” Buffer overflows and deadlocks are examples of such properties.
Microsoft’s Chris Lucas, group program manager for Visual Studio Team Developer Edition, thinks that better rules, more than better techniques, account for the growing efficacy of source code analysis. As did Coverity’s analysis of Linux code, Microsoft’s analysis of Windows code proved to be an effective way to flush out bugs. Within Microsoft, the rule set evolved in an iterative way. “First the PPRC [Programmer Productivity Research Center] identified some interesting rules,” Lucas says, “and then they were applied to the Windows source base.” The rules that yielded important defects without creating too much “noise” were codified and then the cycle repeated. “It’s all about tuning the rule set,” Lucas explains.
Benjamin Chelf, who studied under Engler and is now Coverity’s chief analyst, agrees that today’s analyzer is not your grandfather’s lint. “This isn’t just another tool that’s going to spew a ton of useless warnings,” he says. Finding the right level of analysis has always been the challenge for source code analysis tools. With too little precision, a tool can flood the developer with false positives. With too much precision, it could take years to complete an analysis. Fortunately, the state of the art has improved in recent years. Solutions are now striking the right balance, Chelf says, and he invites developers to revisit their old assumptions.
Everyone agrees that modern source-code analysis improves software quality. However, techniques differ with respect to the scope of analysis and the amount of specific knowledge of operating systems and application frameworks needed to do the job.
The most powerful form of analysis is global — that is, interprocedural — in scope. Patterns detected in one function or method are correlated with patterns found elsewhere in a program. The analyzer watches the flow of data throughout the entire program, creates a model of the program, and simulates execution paths. The Coverity and Fortify analyzers fall into this category, as does PREfix, Microsoft’s centralized tool. Microsoft’s desktop tool, PREfast, is instead an intraprocedural analyzer mainly focused on local pattern matching. Programmers can, however, annotate functions using SAL (standard annotation language), a notation that enables PREfast to undertake more powerful interprocedural analysis. Another pattern-matching intraprocedural analyzer is the one included with Compuware’s DevPartner Studio. Its domain of analysis is VB.Net and C#.
Analysis that comprehends the complete scope of a program still benefits from knowledge of the context in which that program runs. Fortify’s analyzer, for example, traces from Java methods into database-stored procedures and back again. “Conventional wisdom says that if you use parameter binding, you won’t be subject to SQL injection attacks,” Chess says. But since stored procedures can also construct SQL code, they too are included in the analysis. Similarly, Fortify’s tool looks at how programs interact with their XML configuration files. “If you don’t stitch configuration files together with code,” Chess says, “you don’t really know how the program will behave.”
Coverity’s approach, though, places less emphasis on specific platform knowledge. Using statistical analysis of the patterns seen in a program, it infers that deviations from the norm are probably errors. To illustrate, Engler describes the kinds of rules implicit in programs: “If you do A, then you must do B. In context X, you can never do Y.” How does automatic inferencing work? “Count how often A is followed by B, versus how often A appears by itself,” Engler says. “If you see that A and B are paired 1,000 times, and not paired once, you can be pretty sure that one time is an error.”
Microsoft’s Lucas embraces both strategies. PREfast, and a companion tool called FxCop that works with the .Net languages, grew out of Microsoft’s study of its own programming practices. Some of the rules that emerged are heavily domain-specific, describing correct use of APIs and frameworks. In the .Net realm, for example, FxCop finds “problems with COM interop, common security violations, and common errors that lead to poor-performing code,” Lucas says. Compuware’s analyzer performs similar kinds of checks. A rule called “open to file path hacking,” for example, ensures that if a file system path is protected, the corresponding UNC (Universal Naming Convention) path is also protected.
In general, Java and .Net analyzers are more likely to focus on domain-specific issues. That’s because the memory-related errors that preoccupy C and C++ analyzers, making programs unstable or vulnerable to attack, are mostly absent in managed-code environments. That doesn’t mean there are no memory-management issues to worry about. Managed programs can, for example, still “leak objects,” but the emphasis tends to shift to a higher level of analysis.
Making the Rules
At the heart of every source code analyzer are the rules that describe patterns of error. Analyzers provide a general set of rules and typically enable customers to add new rules codifying knowledge of their own systems and programming practices. The analyzer included with Compuware’s DevPartner Studio, for example, can be extended with rules that match patterns in the text of source code. According to product manager Peter Varhol, this technique is used often to enforce rules about coding style.
Coverity’s rule-writing language, MetaL (Meta Language), can be used for the same purpose. Customers have also used it to propagate bug fixes. Chelf cites one case where a software product failed in the field. After days of debugging, programmers found the erroneous function call that caused the failure. “They wrote a MetaL check to comb the code for other instances,” Chelf says, “and found several that would have caused the same problem.”
Fortify also supports user-written rules. It’s crucial, Chess says, that customers who don’t want to reveal proprietary details about their systems be able to work independently to expand the analysis coverage. PREfast, bound tightly to the Microsoft compiler, does not support user-written rules, but FxCop does, using .Net itself as the rule language.
Could there be a standard way to represent these rules, enabling direct comparison of analyzers and pooling of knowledge about common patterns? In principle, that’s possible; in practice, it seems unlikely anytime soon. Extensibility is important, but vendors of source code analyzers must first convince programmers to take another look at a class of tool that many have long dismissed as irrelevant. Just give it a try, they say, and see if a scan of your source code pinpoints important bugs that you wouldn’t otherwise have found.