Coverage Analysis
written by Walter Bright
June 8, 2008
You’ve finished writing the code, and have put together a nice test suite to check it out and verify it is functioning properly. But it’s hard to tell if the test suite is complete or not just by looking at it. Even worse, how is someone not intimately familiar with the code (such as the QA staff) going to tell how well the code is tested?
The answer is a coverage analyzer. For gcc, the gcov tool works well. For the D programming language, the compiler has a -cov command line switch to generate coverage information. Here’s what the report looks like for the sieve program:
       |/* Eratosthenes Sieve prime number calculation. */
       |
       |import std.stdio;
       |
       |bool flags[8191];
       |
       |int main()
      5|{   int i, prime, k, count, iter;
       |
      1|    writefln("10 iterations");
      1|    for (iter = 1;
     11|         iter <= 10;
     10|         iter++)
     10|    {   count = 0;
     10|        flags[] = true;
 163840|        for (i = 0; i < flags.length; i++)
  81910|        {   if (flags[i])
  18990|            {   prime = i + i + 3;
  18990|                k = i + prime;
 168980|                while (k < flags.length)
       |                {
 149990|                    flags[k] = false;
 149990|                    k += prime;
       |                }
  18990|                count += 1;
       |            }
       |        }
       |    }
      1|    writefln("%d primes", count);
      1|    return 0;
       |}
sieve.d is 100% covered
The numbers to the left of the | indicate how many times that line was executed. (If there is more than one statement on a line, the number is the sum of the execution counts of each of those statements.) The last line gives the percentage of statements that were executed; in this case, every executable statement was executed at least once.
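To see how several statements on one line add up, the inner for line is a convenient case: over 10 passes its initializer runs 10 times, its test 10 × 8192 = 81920 times, and its increment 10 × 8191 = 81910 times, which sums to the reported 163840. The following is a minimal sketch that hand-instruments the sieve to tally those parts separately. The Counts struct and the runSieve function are hypothetical names introduced for illustration, and the one-counter-per-statement model is an assumption about how the report attributes counts, not a description of the compiler's internals.

```d
import std.stdio;

// Hypothetical tally: one counter per statement of interest. This models
// the "sum of the statements on a line" rule; it is not how the compiler
// itself is implemented.
struct Counts
{
    long forInit, forTest, forIncr; // the three parts of the inner for line
    long whileTest, whileBody;      // while condition, and one body statement
    long primes;                    // executions of "count += 1"
}

Counts runSieve(int iterations)
{
    Counts c;
    bool[8191] flags;

    foreach (iter; 0 .. iterations)
    {
        flags[] = true;

        size_t i = 0;
        c.forInit++;                        // i = 0
        while (true)
        {
            c.forTest++;                    // i < flags.length
            if (!(i < flags.length))
                break;

            if (flags[i])
            {
                size_t prime = i + i + 3;
                size_t k = i + prime;
                while (true)
                {
                    c.whileTest++;          // k < flags.length
                    if (!(k < flags.length))
                        break;
                    flags[k] = false;
                    c.whileBody++;
                    k += prime;
                }
                c.primes++;                 // count += 1
            }

            i++;
            c.forIncr++;                    // i++
        }
    }
    return c;
}

void main()
{
    auto c = runSieve(10);
    writeln("for line:   ", c.forInit + c.forTest + c.forIncr);
    writeln("while line: ", c.whileTest);
    writeln("while body: ", c.whileBody);
    writeln("count += 1: ", c.primes);
}
```

The same reasoning accounts for the while line: its condition is evaluated once for every pass through the body plus once more for each exit, so its count is the body's count plus that of count += 1 (149990 + 18990 = 168980).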
What can this information be used for?
- to improve the program’s performance, by finding ways to reduce the execution count of expensive operations.
- to find the most likely path of execution through a function, so the layout of the function can be optimized for that.
- to find dead code, which is code that can never be executed. Dead code tends to accumulate in older projects that have been successively maintained. Once dead code is identified, it can be removed or commented out, to streamline the source code.
- most importantly, it identifies code that isn’t executed by the test suite. Test cases can then be crafted specifically to fill in those gaps.
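The last point can be sketched with a small example. parsePercent below is a hypothetical helper, not taken from any real project. Its first unittest block exercises only the success path, so a coverage run with just that test would show the throw line as never executed (the D listing marks such lines with a count of 0000000). The second unittest block is the kind of test one might add after reading the report.

```d
import std.conv : to;
import std.exception : assertThrown, enforce;

// Hypothetical helper for illustration: parses a percentage such as "40%".
int parsePercent(string s)
{
    enforce(s.length >= 2 && s[$ - 1] == '%', "expected a value like 40%");
    int value = s[0 .. $ - 1].to!int;
    if (value < 0 || value > 100)
        throw new Exception("percentage out of range"); // the gap
    return value;
}

unittest
{
    // Original test: success path only, so the throw above never runs.
    assert(parsePercent("40%") == 40);
}

unittest
{
    // Tests written specifically to fill the gaps the report exposed.
    assertThrown(parsePercent("140%")); // reaches the range-check throw
    assertThrown(parsePercent("40"));   // missing '%' is rejected by enforce
}

// Assumed build line (the file name is arbitrary):
//   dmd -unittest -cov parse.d
// Running the resulting program executes the unittest blocks and then
// writes the coverage listing to parse.lst.
void main() {}
```

Temporarily removing the second unittest block and comparing the two listings shows the gap opening and closing.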
How well does this work in practice?
It’s pretty common for programs that have a test suite, but have never undergone coverage analysis, to have only 40% coverage. This tends to shock the programmers who thought they had a thorough test suite! It’s like running a profiler the first time and discovering that where the program actually spends its time has no relation to where the programmer thought it was.
This means that code often ships with over half of it never having been run even once! What are the chances of all that code working correctly the first time? Much of this never-tested code tends to be error reporting and recovery code, which could explain why programs are so poor at it.
Once the coverage analyzer is used to raise the test suite's coverage as high as possible (it’s hard to get to 100% because some of the code might be there to handle error conditions that are hard to simulate, or might be for configurations not present on the test machines), it’s almost startling how much better the code works once in the customers’ hands. In every project I know about where a commitment was made to coverage analysis, there was a dramatic improvement in reliability. Sure, simply executing code offers no guarantee that the code is producing the correct result, or that the design is correct, or that there won’t be buffer overflows, etc. But the experience is that bugs of all sorts are reduced dramatically when the coverage is near 100%.
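Some hard-to-simulate error conditions can be brought within reach of a test by letting the test supply the failing operation. The following is only a sketch of that idea under stated assumptions: loadConfig, the Reader alias, and the "mode=..." strings are all hypothetical, and real recovery code will usually be more involved.

```d
// Hypothetical example: the read operation is passed in as a delegate so
// that a test can substitute one that fails on demand.
alias Reader = string delegate(string path);

string loadConfig(string path, Reader read)
{
    try
    {
        return read(path);
    }
    catch (Exception e)
    {
        // Recovery path: fall back to built-in defaults. Without a failing
        // reader, no test would ever execute this line.
        return "mode=default";
    }
}

unittest
{
    // Normal path.
    auto ok = delegate string(string p) { return "mode=fast"; };
    assert(loadConfig("app.cfg", ok) == "mode=fast");

    // Simulated failure: drives execution into the catch block.
    auto broken = delegate string(string p) { throw new Exception("disk error"); };
    assert(loadConfig("app.cfg", broken) == "mode=default");
}

void main() {}
```

Executing the recovery path this way says nothing about whether the fallback is the right behavior; it only ensures the code has been run at least once before it ships.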
Of all the schemes, tools and methods that purport to improve code quality, there are few that are as effective (and as easy to use) as a coverage analyzer. They’ve been around for at least 20 years and are widely available, often built into the tool set you’re already using.