Static Code Analysis HOWTO

Last week I promised to write about different static source code analysis tools and methods we’re using. Here you go:

Finding bugs

The rationale for using this kind of analysis is obvious. Of course we want to catch bugs as early as possible.

For C++ code, we’re using Cppcheck together with the Cppcheck Plugin for Jenkins for finding bugs that a C/C++ compiler doesn’t see. The Cppcheck Jenkins plugin has nice configuration options.

We’ve configured our jobs so that if there are any new errors, the build is marked as a failure. We have also unchecked ‘Style’ issues in the severity evaluation, as we’re mostly concerned with real errors. With this configuration we haven’t had any false positives reported so far.

Needless to say, all problems reported by Cppcheck need to be fixed immediately. Developers can also run the Cppcheck analysis in their own development environments.
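As an illustration (not our exact Jenkins setup), a gating script like the following could run Cppcheck and fail the build on any error-level finding. The flags and the flat attribute layout assumed here correspond to Cppcheck’s version 1 XML format; check the format your Cppcheck version emits.

```python
# Sketch: run Cppcheck, parse its XML report, exit non-zero on errors.
import subprocess
import sys
import xml.etree.ElementTree as ET

def parse_errors(xml_text):
    """Return error-severity findings from a Cppcheck XML report."""
    root = ET.fromstring(xml_text)
    return [e.attrib for e in root.iter("error")
            if e.get("severity") == "error"]

def run_cppcheck(src_dir):
    """Run Cppcheck; note that it writes its XML report to stderr."""
    result = subprocess.run(
        ["cppcheck", "--enable=warning", "--xml", src_dir],
        capture_output=True, text=True)
    return parse_errors(result.stderr)

if __name__ == "__main__":
    errors = run_cppcheck(sys.argv[1])
    for err in errors:
        print(f"{err.get('file')}:{err.get('line')}: {err.get('msg')}")
    sys.exit(1 if errors else 0)  # non-zero exit fails the Jenkins build
```
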

When the build fails, everyone who committed code to that build receives an e-mail stating that it might have been their commit that broke the build. The mail is also Cc’d to all other developers working on the project.

Finding duplicate code

Duplicate code is bad. Not only can it be considered waste, but it also means that when code is changed, all its duplicates probably need to be changed as well. And as one might not even know that a piece of code is duplicated, it’s likely that e.g. a bug fix is applied to only one occurrence of the same code. Therefore we want to minimize the amount of duplicate code.

For finding duplicate code we use a tool called PMD’s Copy/Paste Detector (CPD) together with the Jenkins DRY Plugin. Here too we have taken a similar approach as with the other analyses: if the amount of duplicate code grows, we mark the build as unstable or failed. CPD works with Java, JSP, C, C++, Fortran and PHP code.
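The “fail if duplication grows” check can be sketched like this, assuming CPD’s XML report format with `<duplication lines="...">` elements; the attribute names may vary between CPD versions.

```python
# Sketch: sum duplicated lines from a CPD report and gate on a baseline.
import xml.etree.ElementTree as ET

def duplicated_lines(xml_text):
    """Sum the duplicated line counts reported by CPD."""
    root = ET.fromstring(xml_text)
    return sum(int(d.get("lines", 0)) for d in root.iter("duplication"))

def check_against_baseline(xml_text, baseline):
    """Return False (fail) if duplication has grown past the baseline."""
    return duplicated_lines(xml_text) <= baseline
```
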

Finding Complex Code

For C/C++ code, we use a tool called CCM to find code which is potentially hard to change. CCM measures the cyclomatic complexity of code. Even though Jenkins supports running CCM as part of the build, it is missing features for setting thresholds based on CCM results. Therefore we wrote a small Python script to parse CCM’s XML report and count the high-risk methods (cyclomatic complexity 20 or more) and the medium-risk methods (cyclomatic complexity 10 or more). The script takes threshold values for high- and medium-risk methods and fails the build if the high-risk threshold is exceeded. If the medium-risk threshold is exceeded, the script outputs a warning text, and we use the Jenkins Text Finder plugin to mark the build unstable.
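A minimal sketch of such a threshold script is below. The XML structure (`<metric><complexity>` elements) is an assumption about CCM’s report format, so adjust the tag names to what your CCM version emits; here the medium-risk count covers complexities from 10 up to (but not including) the high-risk limit.

```python
# Sketch: parse a CCM-style XML report and enforce risk thresholds.
import sys
import xml.etree.ElementTree as ET

HIGH_RISK = 20    # cyclomatic complexity >= 20
MEDIUM_RISK = 10  # cyclomatic complexity >= 10

def count_risks(xml_text):
    """Return (high, medium) counts of risky methods in a CCM report."""
    root = ET.fromstring(xml_text)
    complexities = [int(m.findtext("complexity", "0"))
                    for m in root.iter("metric")]
    high = sum(1 for c in complexities if c >= HIGH_RISK)
    medium = sum(1 for c in complexities if MEDIUM_RISK <= c < HIGH_RISK)
    return high, medium

if __name__ == "__main__":
    report, high_limit, medium_limit = sys.argv[1:4]
    high, medium = count_risks(open(report).read())
    if high > int(high_limit):
        sys.exit(1)  # fail the build outright
    if medium > int(medium_limit):
        # The Jenkins Text Finder plugin watches for this text
        # and marks the build unstable.
        print("WARNING: medium risk threshold exceeded")
```
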

For Java code, we get the complexity figures from the Clover report. There, too, we need a custom Python script for parsing the results.
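The Clover parsing can be sketched similarly. This assumes Clover’s XML report marks method lines as `<line type="method" complexity="...">` entries; verify the structure against your Clover version’s report before relying on it.

```python
# Sketch: collect per-method cyclomatic complexity from a Clover report.
import xml.etree.ElementTree as ET

def method_complexities(xml_text):
    """Return the complexity of each method line in a Clover XML report."""
    root = ET.fromstring(xml_text)
    return [int(l.get("complexity", 0))
            for l in root.iter("line") if l.get("type") == "method"]
```

The resulting list can then be fed to the same thresholding logic we use for the CCM figures.
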


Get the Most Out of Static Code Analysis

Static code analysis is a cheap way to improve code quality. One might argue against this by pointing out that tools like Klocwork and Coverity are actually fairly expensive. And here comes a common misconception related to static analysis: some people think it’s about tools, while it’s not. Sure, good tools help, but there are more important issues than tooling. There are several problems related to doing static code analysis, and only some of them can be solved with proper tools.

Problems / Solutions

Problem: Too many problems reported

The number of reported problems is devastating and developers feel overwhelmed. As a result, the problems aren’t fixed. Sometimes a large share of the reported problems aren’t real problems at all but e.g. style issues or even false positives. Good tools help with this, but the truth is that all of them require manual work to tune the results.

Solution: Define baseline and focus only on new problems

Measure the number of problems reported and define that as a baseline. Then make sure that no new problems are introduced. This ensures that the codebase at least doesn’t get worse. Later on it’s possible to tighten the thresholds to get rid of the older reported problems as well.
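The baseline idea fits in a few lines of script. The file name and storage format below are illustrative, not any particular tool’s convention: record the problem count on the first run, then fail whenever the count grows.

```python
# Sketch: record a baseline problem count, fail on any increase.
import json
import os

BASELINE_FILE = "analysis_baseline.json"

def check_baseline(current_count):
    """First run records the baseline; later runs fail on an increase."""
    if not os.path.exists(BASELINE_FILE):
        with open(BASELINE_FILE, "w") as f:
            json.dump({"problems": current_count}, f)
        return True
    with open(BASELINE_FILE) as f:
        baseline = json.load(f)["problems"]
    return current_count <= baseline
```

Tightening the threshold later is just a matter of lowering the stored number once old problems have been cleaned up.
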

Problem: Tools are not used

It doesn’t really help if you have the best tools money can buy but nobody’s using them.

Solution: Run tools automatically

If you’re using a continuous integration system such as Jenkins, you can run static analysis for every commit. If the analysis takes a long time, run it at least once a day (or night). Make sure all committers are informed should the analysis reveal new problems. In my opinion, new problems from static analysis should be considered a stop-the-line situation: stop everything, fix the problems, and only then continue.

You can also take this to the extreme by introducing a pre-receive (or pre-commit) hook in your VCS, where a commit is rejected if static analysis doesn’t pass. In this case the analysis needs to be really fast.
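A Git pre-receive hook reads one `old-sha new-sha ref` line per pushed ref on stdin and rejects the push by exiting non-zero. Here `fast_analysis` is a placeholder for whatever quick check you can afford at this point (it must complete in seconds):

```python
#!/usr/bin/env python3
# Sketch of a pre-receive hook that rejects a push on failed analysis.
import subprocess
import sys

def fast_analysis(new_sha):
    """Placeholder for a quick static check of the pushed revision,
    e.g. checking it out to a temp dir and running the tool there."""
    result = subprocess.run(["true"])  # stand-in for the real check
    return result.returncode == 0

if __name__ == "__main__":
    for line in sys.stdin:
        old_sha, new_sha, ref = line.split()
        if not fast_analysis(new_sha):
            print(f"rejected {ref}: static analysis failed",
                  file=sys.stderr)
            sys.exit(1)  # non-zero exit rejects the whole push
```
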

Problem: Results are not understood

It might well be that the tools produce fine reports pointing out the problems, but if developers don’t understand why something should be fixed, it doesn’t get fixed. A commonly heard argument is “but the code works, why should we fix it?”

Solution: Educate developers, pre-digest the results automatically

In the worst case you need to do as suggested in the previous solution: declare stop-the-line when the analysis reveals problems, or even reject commits to the VCS. I’ve noticed one of the most effective ways is simply show and tell: show developers a recently found problem, point out the source code location and explain what’s wrong with the code. For example, complex code usually doesn’t look “wrong”, but if you try refactoring it, it turns out to be difficult. Or if code is not covered by unit tests, retrofitting them is not easy.

It makes sense to do some pre-analysis of the results: don’t show raw results to developers, but try to filter out non-problems and possibly also old problems. The simpler the better.
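Filtering out already-known problems can be as simple as a set difference. In this sketch a finding is identified by its (file, check id) pair, which tolerates line-number drift between runs; that keying scheme is an assumption, not any tool’s built-in behaviour.

```python
# Sketch: show developers only findings not in the known/suppressed set.
def new_findings(current, known):
    """Return findings from `current` that are absent from `known`."""
    known_keys = {(f["file"], f["id"]) for f in known}
    return [f for f in current if (f["file"], f["id"]) not in known_keys]
```
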

Problem: Tools are used by wrong people

Some companies have solved some of the previously presented problems by devoting a separate team to code analysis tasks. Unfortunately, such a team may only be capable of finding the problems, not fixing them. Sure, it might be more effective to have a number of people pointing out which problems should be fixed, but it’s still likely the problems won’t get fixed.

Solution: Automatic analysis, everyone responsible for fixing problems

A basic rule of thumb which works for many different issues is “if you break it, you fix it”. Making problems visible through automatic analysis and rapid feedback makes it more likely that the problems get fixed. With the tools available, it is fairly trivial to point out who caused the codebase to become worse. If nothing else helps, public humiliation is the last option to try. (That was a joke; being constructive is of course always really important.)


So what kinds of static source code analysis tools and methods are out there? I’ll tell you about the ones we are using in the next post. Stay tuned; meanwhile you can, for example, run some static analysis yourself.


Farewell SVN, part II

We have one team that has already been using Git for several years. When they heard we were looking for a replacement for Subversion, and especially when they heard that one of the alternatives was VisualSVN, they said that Git is pretty much the only sensible choice. They did not want to give up using Git. So I started looking for more information about this wonderful VCS. The more I read about Git, the more convinced I became that this was our future choice.

There were still a few challenges to be resolved. How to manage access rights? How to manage storing SSH keys on the server? How to deal with subcontractors? All the challenges we had with SVN wouldn’t go away on their own with Git.

That’s when I was introduced to GitHub. A colleague of mine told me that GitHub pretty much has everything we’ve been longing for (see part I): it comes out of the box with features supporting the use of subcontractors, and it has some really nice “social” features. The crappy thing was that our company policies don’t allow storing source code on any servers other than our own. So GitHub was out of the question, it seemed.

Luckily enough, I learned that there’s a concept called GitHub:Firewall. And it’s exactly what we needed and wanted: a private installation of GitHub on our own server, behind our own firewall. Excellent!

As of this writing, we’re in the process of taking our own GitHub:FI into use. We’ve got a dozen developers using it at the moment and will hopefully expand to corporation-wide use during Q2. I really hope that before the end of the year we no longer have a single SVN installation running (not to mention the few CVS repositories we still have!).

So farewell SVN, I won’t be missing you.