After my oh-so-recent review of Best Kept Secrets of Peer Code Review, I followed up with the author to inform him of the review. I also took the opportunity to tell him that I disagree with him on a few points, not the least of which is the concept that finding a larger number of defects is good. His reply was gracious, and I hope to have an opportunity to further discuss some of these points with him. Meanwhile, I was inspired to carry on my rant here on the home front.

The book says “Authors and reviewers should be proud of any stretch of where many defects were found and corrected”. Personally, I think that reviewers should be proud, but authors should be thoroughly embarrassed for letting the reviewers find the issues before he or she did.

In all fairness to Mr. Cohen, the ideas that I take issue with did not originate with him. They have been put forth by Watts Humphrey and other peer code review experts for more than a decade now. Furthermore Humphrey’s PSP and it’s successor TSP, which rely heavily on this philosophy have been shown to be remarkably effective in some organizations. So the stand I will take here is probably in direct opposition with the prevailing empirical evidence; nonetheless, I’m sticking to it.

My real pain regarding the “more defects == better” philosophy comes from a project I worked on, where PSP was a big push. It epitomized the worst of all review worlds. The PSP training was abbreviated to a single 6 hour class, but really emphasized the “more==better” point, so it seemed that some developers just shrugged their shoulders about quality, apparently figuring it was better for the team if the bugs were found in review. Most reviews were desk-reviews and author prep (or even participation) was practically nil. Reviews were painful and time-consuming because of all the trivial garbage; and few real bugs were found because time was not given to do a thorough review. (I actually had a review closed out from under me when I was one of only two reviewers, because time was up… I had taken a long time because I was finding some very significant issues.) As schedule pressures rose, reviews were deferred until the end of the project; so that review-inspired learning never occurred. When reviews were finally done, management fought hard against changing anything that could be left alone. The result was that practically every code-quality attribute suffered.

From the above experience, I came to the realization that “higher-percentage==better”. We need to know that the review was effective, and a raw quantity can’t tell you that. I would prefer to use something like capture-recapture built into the metrics to determine success; as opposed to quantity. If capture-recapture had been looked at on the project, I have an awful suspicion that our find rate would have been on the order of 70%.

The “more==better” philosophy seems to anticipate that quality is always poor at first review; and it promotes an attitude that poor quality is okay, or even good.

To give you an idea of the other extreme: On another recent project, I wanted to get a feel for what I was in for with review; so I turned over a group of files for “preliminary review” to a man who was known to be the toughest reviewer in the organization; A Ph.D. who was the SQA lead. He had the reputation of finding about 80 – 90% of the errors in any review. The code I gave him constituted about 6000 LOC (2000 PSLOC); and included a communication driver and a command interpreter (fairly complex stuff). Along with it I provided a Doxygen generated design document (200 pages) for reference; and a guideline for the best order to analyze it in. About 2 weeks later he gave me the results: He had found 3 defects; all within comments. Two punctuation errors, and one missing document reference ID. Because he also had a question about why I had done something the way I did, I added another defect: I would document my rationale in the code. Now, I never expected to get off quite that easy, especially in an organization where I had never had code reviewed before. While this was a preliminary review, it was a pretty thorough one. Comparing the results to Mr. Cohen’s average of 32 defects per 1000 LOC, I apparently fared about 50x better.

Of course, while I’d like to believe I’m just that good, in truth I’m not; at least not without some help. My secret was, that in addition to thorough prep; I was using lightweight formal methods (aka extended static checking with annotations) and every file had been run (until “clean”) through 3 separate analyzers prior to each build; and of course, prior to the review. Note that I integrated these tools with my editor, so each was available with a two-keystroke operation; with a mouse-click to instantly navigate to the point of any issue.

My work on that project was over before full-reviews were scheduled (that darned schedule pressure again); but I’m pretty confident that defects-found for my full 15,000 lines would have numbered in the low 10’s.

So my view as a developer is that “less==better”. I view it as a challenge, like a game-of-skill against my opponent, the review team; or when I’m the reviewer, visa-versa. If both parties take that view, then the developer will always try for zero-defects, and the review team will always try to find something significant. This leads to a continuous evolution of code-quality.

Of course, the social aspects are a bit tougher… hurt feelings and animosity certainly can and do arise. So you pit people against their own averages, rather than against those of other on the teams… management can provide a reward system when the averages continually improve.

So, for that second project; what was the cost of all that rigor in terms of productivity? I assert that my productivity was higher than it might have otherwise been; a little more time spent in tool integration, a lot less time spent in debug and refactoring. The old rule-of-thumb is that 10 debugged LOC per day is average for the life of a project, but since I wasn’t around for the verification / release phases, that number does us no good; and I’ve never believed it to be a very reasonable number anyway. On the other hand, Mr. Cohen’s book provides a real-world example, where a 10,000 line application was written with 30 man-months of effort. Now I can very comfortably assert that my 15,000 line application was well-beyond 50% complete in its life-cycle even without verification, after only 8-man-months of effort; but if we assume that it was only at the 50% mark, then we can extrapolate almost 3 times the productivity of the book’s example. I consider myself a reasonably, but not extraordinarily, fast coder; and all the design skill in the world can’t make 15,000 lines take less time to enter; so where did that extra productivity come from? I attribute it to the tools and methods that were applied. The extra rigor cost nothing, in fact it was profitable, as is often the case for quality-enhancing practices; and that will be the topic of another post.

About Max H:
Max is a father, a husband, and a man of many interests. He is also a consulting software architect with over 3 decades experience in the design and implementation of complex software. View his Linked-In profile at http://www.linkedin.com/pro/swarchitect