Writing and maintaining complicated software is a difficult task, and every programmer inadvertently makes mistakes. Usually these are minor typographical errors that will be caught at compile time, but others may remain undetected until the system is in production. In the most extreme cases, the bug will cause system failure, as was the case in the massive blackout in the Northeast last summer.
In addition to the consequences of system malfunction, software defects have a substantial financial cost. A 2003 report from the National Institute of Standards and Technology (NIST) calculated the total annual cost of bugs at nearly $60 billion. The report also confirmed what most developers already know: the difficulty and cost of fixing a problem grows dramatically throughout the development cycle. Though the report concluded that it is not practical to locate and remove all defects from an application, it stated that nearly 40% could be eliminated by an improved inspection and testing process.
Two common methods for improving software quality are code reviews and automated testing. The goal of both is to detect mistakes, but the distinction between them is how this is achieved. Automated testing attempts to expose problems by executing the code, while reviews rely on "another set of eyes" to verify that it's correct.
Problems can arise when we get too involved with the code to be objective or to consider unusual paths of execution. Certainly every programmer has at one time spent hours trying to track down a problem only to have someone else spot it immediately. Code reviews can be an effective technique for finding errors, but in my experience, useful code reviews are surprisingly uncommon. In some places they're deemed an unwarranted impediment to finishing a project, while in others they degenerate into arguments about trivial issues such as whitespace and brace placement. My explanation for this is that the best candidates for leading the reviews — senior development talent — are constantly in short supply and high demand. Perhaps more importantly, reading and completely understanding someone else's code can be very tedious and time consuming.
Given that code reviews can be effective at locating software defects during development when they are most easily and inexpensively fixed, it makes sense to automate this process as much as possible. This allows for earlier and more frequent inspections, while letting the programmer eliminate the more obvious problems before meeting with others to review the code. A report released by The Standish Group confirms this, finding that automated code inspection reduced the number of people needed for manual code reviews by 50%.
There are several available code analysis tools, but one of the better ones is PMD.
PMD is a static source code analysis tool, meaning that it analyzes the source code itself rather than executing it, as a unit test would. It was originally developed to improve the Cougaar project, a DARPA initiative that developed a framework for distributed, agent-based applications. PMD is both free and open source; it is released under a BSD-style license that allows you to use, modify or distribute it, so long as the copyright notice is left intact.
Though PMD is only about two years old, it is under active development with more than 100 programmers working on the project. The SourceForge project page shows that it has been downloaded about 70,000 times.
PMD can be executed at the command line, but I think more benefit is derived from integrating it into an IDE or Ant build, so my article will focus on these techniques.
The latest version at the time of this writing, 1.7, comes with more than 80 built-in rules. The tool allows you to pick the rules you wish to use, and also to define new rules using either Java classes or XPath statements. The PMD Web site explains the predefined rules in detail, so I won't try to duplicate that information here. PMD separates the rules into 14 categories, such as naming, import statements and unused code. To use the tool effectively, I think it's helpful to also think of each rule as belonging to one of four broad groups:
This group locates code that deviates from generally accepted coding standards, such as Sun's Code Conventions for the Java Programming Language. Since the objective of PMD is to locate potential defects rather than issues of preference like whitespace and brace placement, the style rules essentially check for more serious style violations that have the potential to confuse other developers or allow for future bugs. Examples of these rules include IfStmtsMustUseBraces, MethodNamingConventions and AvoidDollarSigns.
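As a rough illustration of why a rule such as IfStmtsMustUseBraces is more than a matter of taste, the sketch below shows an if statement whose indentation suggests one grouping while the compiler sees another. The class and method names are hypothetical and exist only for this example.

```java
// Hypothetical example; these names are illustrative and not part of PMD.
class BraceStyleExample {

    // Without braces, only the first statement belongs to the if.
    // The indentation of the second statement is misleading.
    static int discountWithoutBraces(int price, boolean member) {
        int discount = 0;
        if (member)
            discount = 10;
            discount += 5; // always runs, despite the indentation
        return price - discount;
    }

    // With braces, the intended grouping is explicit.
    static int discountWithBraces(int price, boolean member) {
        int discount = 0;
        if (member) {
            discount = 10;
            discount += 5;
        }
        return price - discount;
    }
}
```

For a non-member, the first method still subtracts 5 while the second subtracts nothing, which is the sort of silent divergence the style rules are meant to surface.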
These rules identify code that serves no effective purpose. Such code might simply be an artifact of refactoring or it might signal a mistake. Since production code should be concise, these violations should definitely be examined, and in most cases, removed altogether. Examples of these rules include EmptyIfStmt, EmptyTryBlock and UnusedLocalVariable.
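The following sketch shows the kind of leftovers that rules such as UnusedLocalVariable and EmptyIfStmt are named for, next to a cleaned-up version. The class and method names are hypothetical, and the comments describe what I would expect those rules to report rather than verified PMD output.

```java
// Hypothetical example; these names are illustrative and not part of PMD.
class IneffectiveCodeExample {

    static int total(int[] values) {
        // Assigned but never read: a candidate for UnusedLocalVariable.
        int count = values.length;
        int sum = 0;
        for (int i = 0; i < values.length; i++) {
            if (values[i] < 0) {
                // Empty body: a candidate for EmptyIfStmt. Were negative
                // values supposed to be skipped or rejected here?
            }
            sum += values[i];
        }
        return sum;
    }

    // The same behavior with the dead code removed.
    static int totalCleaned(int[] values) {
        int sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum;
    }
}
```

Both methods return the same result, which is exactly why the extra code deserves a second look: either it can go, or it marks a check that was never finished.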
This group can help to uncover code that violates basic design and coding strategies. While these do not always indicate a logic flaw, they generally show areas which will be unnecessarily inefficient or difficult to maintain. Examples of these rules include UseSingletonRule, StringInstantiation and SignatureDeclareThrowsException.
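To make two of these design rules concrete, here is a small sketch with a wasteful and a preferred form side by side. The class and method names are hypothetical; the comments state my understanding of what StringInstantiation and SignatureDeclareThrowsException look for.

```java
// Hypothetical example; these names are illustrative and not part of PMD.
class DesignExample {

    // StringInstantiation: new String("...") allocates a second object
    // that holds the same characters as the literal.
    static String greetingWasteful() {
        return new String("hello");
    }

    // The literal can simply be returned directly.
    static String greetingDirect() {
        return "hello";
    }

    // SignatureDeclareThrowsException: "throws Exception" does not tell
    // callers which failures they should actually expect.
    static int parsePortVague(String text) throws Exception {
        return Integer.parseInt(text);
    }

    // Naming the specific exception documents the real failure mode.
    static int parsePort(String text) throws NumberFormatException {
        return Integer.parseInt(text);
    }
}
```

Neither wasteful form is a logic error, which matches the description above: the code works, but it costs more to run or to maintain than it should.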
These rules identify code that is almost certainly in error, or at least very likely to cause undesired behavior. You should examine such violations very carefully and either fix the problems or ensure that the code follows your intentions. Examples of such rules include EmptyCatchBlock, OverrideBothEqualsAndHashcodeRule and ProperCloneImplementationRule.
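The sketch below shows corrected forms of two problems in this group: a class that overrides equals() together with hashCode(), and a catch block that makes a visible decision instead of being left empty. All names are hypothetical and chosen for illustration only.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical example; these names are illustrative and not part of PMD.
class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public boolean equals(Object other) {
        if (!(other instanceof GridPoint)) {
            return false;
        }
        GridPoint p = (GridPoint) other;
        return x == p.x && y == p.y;
    }

    // Leaving this method out while overriding equals() is the situation
    // OverrideBothEqualsAndHashcodeRule is named for: hash-based
    // collections could then fail to find an equal object.
    public int hashCode() {
        return 31 * x + y;
    }
}

class ProbableBugExample {

    // An empty catch block would hide the failure entirely. Returning a
    // fallback makes the decision deliberate and visible.
    static int parseOrDefault(String text, int fallback) {
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    // Works because equal points also produce equal hash codes.
    static boolean setFindsEqualPoint() {
        Set<GridPoint> points = new HashSet<GridPoint>();
        points.add(new GridPoint(1, 2));
        return points.contains(new GridPoint(1, 2));
    }
}
```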
PMD requires JDK version 1.3 or higher. I tested the instructions with Ant 1.6.1, though presumably it should work with any previous (yet reasonably modern) version of Ant. Follow these steps to set up PMD as part of your build process:
<taskdef name="pmd" classname="net.sourceforge.pmd.ant.PMDTask"/>

<target name="pmd">
    <!-- NOTE: include a comma-delimited list of ruleset file paths here.
         The paths (assuming you are using the pre-defined PMD rulesets)
         should all begin with rulesets/ since this is where they are
         stored in the PMD JAR file. -->
    <pmd rulesetfiles="rulesets/strings.xml,rulesets/basic.xml">
        <formatter type="html" toFile="pmd_report.html"/>
        <!-- NOTE: change src.dir to the property name you have defined
             for your source code directory. -->
        <fileset dir="${src.dir}">
            <include name="**/*.java"/>
        </fileset>
    </pmd>
</target>
The Ant task will create a file named pmd_report.html in the base directory of your project. Naturally, you can change the path and filename to suit your preferences. The file paths in this report are all absolute, and consequently they take up a lot of space in the report, making them difficult to read or print. You can make the report show relative file paths by adding the attribute shortFilenames="yes" to the <pmd> element.
In addition to the basic HTML report shown above, you can create reports in several formats, including XML, ASCII text and CSV. You can produce any number of these formats in the same build by specifying multiple <formatter> elements. There is also an alternate HTML format (summaryhtml) that includes a table with rule violation counts.
While you could define a custom ruleset by copying and pasting relevant sections from pre-defined ruleset files, there is a better way. You just need to create a new ruleset file with the basic structure:
<?xml version="1.0"?>
<ruleset name="customruleset">
    <description>
        A description of your custom ruleset goes here.
    </description>
</ruleset>
Next, add references to rules from a pre-defined ruleset file (rule references are children of the <ruleset> element).
You can include entire rulesets with the following syntax:
<rule ref="rulesets/design.xml"/>
You can import them one at a time by specifying the path to the ruleset file followed by a slash and then the name of the rule, for example:
<!-- This is how you include individual rules -->
<rule ref="rulesets/basic.xml/EmptyTryBlock"/>
<rule ref="rulesets/design.xml/UseSingletonRule"/>
Finally, you can exclude specific rules from imported rulesets:
<rule ref="rulesets/naming.xml">
    <exclude name="ShortVariable"/>
    <exclude name="LongVariable"/>
</rule>
PMD integrates with several Java IDEs and editors, including Eclipse, IDEA, NetBeans, JBuilder, jEdit and Emacs. Unfortunately there is no common plugin architecture across these applications, and the instructions for installing and configuring PMD are different for each. In this article, I will describe how to get PMD working with one of the most popular IDEs: Eclipse. I have verified these instructions with Eclipse 3.0M7 on RedHat Linux 9, Mac OS X (Panther) and Windows 2000.
The first step to installing the PMD plugin is to download the plugin from SourceForge. It doesn't matter where you save the ZIP file.
You will see the PMD welcome page once Eclipse has restarted, but beyond that, it is not immediately obvious how to use the plugin. I think the easiest way to begin is by checking code manually. To do this, right-click (or ctrl-click on Mac OS X) on an icon in the package navigator view and choose PMD->Check code with PMD, as shown in the following screenshot.
You can expand or limit the set of code that PMD examines by selecting different icons when you right-click; you can choose an entire project, a package or a single class (and also groups of projects, packages or classes).
After running PMD on your source the first time, you won't see the list of violations. Earlier versions of the Eclipse plugin displayed the output directly in the tasks list, but since it was clumsy to navigate a list that combined tasks and violations, the output has since been moved to its own view. To enable the view, click the Window menu and then choose Show View->Other.... A dialog box will appear, and you must expand the PMD group, select PMD Violations and then click the OK button.
The PMD view sorts the rules in priority order, so that the most likely problems are displayed first. There are also five color-coded buttons in the upper-right of the view; each has a number of one through five and can be used to filter rule violations by priority. The PMD violations view is visible at the bottom of this screenshot.
When you check code manually, the items in the PMD rule violation view don't go away as soon as you fix each problem. You should keep the list up-to-date by periodically repeating the process for checking your code with PMD.
You can also enable PMD to automatically check your source code as you make changes. To do this, select the Project->Properties menu item, choose the PMD item in the dialog and then check the Enable PMD box. After you close the dialog box, click OK when prompted to rebuild the project. Note that enabling PMD to review code automatically can really slow down Eclipse, especially for larger projects.
Sometimes PMD will flag some code as a violation even though you intended to write it that way. I find this to be the case most often with naming rules like ShortVariable — there are times when a short name is best. I don't want to disable the rule, but I do want to tell PMD that a specific violation was intentional so that it doesn't warn me about it again. You can do this by right-clicking on a rule violation and choosing Mark review. This will add a specially-formatted comment to the code that notifies subsequent PMD reviews to ignore that specific instance of the violation.
Finally, you can change the active rules, priorities and parameters by selecting the Window menu, choosing the Preferences item and then selecting PMD->Rules Configuration. The following screenshot shows the Preferences window.
You can create custom PMD rules in two ways: Java classes and XPath. Both methods are described on the PMD Web site, but I will give an example of how to create a rule using a Java class.
In order to focus on how to write the rule rather than the complexity of the rule itself, my example will simply identify classes that have no package. Classes in the null package can create import problems and are usually the result of either a novice developer or a quick-and-dirty hack that should be refactored or eliminated.
The easiest way to get started with creating a new class is to download the source package for PMD. Once you unpack it to your development workspace, you will see a number of familiar directories, such as src/ and lib/. You can build PMD using Maven, but since Ant is more prevalent, I will use it to build my examples.
There is an Ant build.xml file beneath the etc/ directory. You may need to modify the includes attribute in the jar target so that it will include the package for your rule class. The default configuration will only put classes in the net.sourceforge.pmd package into the JAR file.
Next, you need to create the directory structure beneath src/ to hold your class. My example will be a class named AllClassesMustHaveAPackage in the com.ociweb.example.pmd package.
The rules are constructed using the Visitor pattern. When PMD runs, it uses a parser created by JavaCC to break the source code down into a parse tree called the Abstract Syntax Tree (AST). Your rule class will contain methods that are called as the relevant nodes of that tree are visited.
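For readers who have not used the Visitor pattern, the following self-contained sketch shows the general mechanism with a toy tree. Every class here is a hypothetical stand-in written for this illustration; none of them are PMD's own node or visitor classes, and PMD's real interfaces differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-ins for parser nodes; these are NOT PMD's classes.
interface ToyVisitor {
    void visit(ToyPackageNode node);
    void visit(ToyClassNode node);
}

abstract class ToyNode {
    final List<ToyNode> children = new ArrayList<ToyNode>();

    // Each node type calls the visit() overload that matches itself.
    abstract void accept(ToyVisitor visitor);

    void acceptChildren(ToyVisitor visitor) {
        for (ToyNode child : children) {
            child.accept(visitor);
        }
    }
}

// The root of the tree; it has no visit() overload of its own.
class ToyFileNode extends ToyNode {
    void accept(ToyVisitor visitor) {
        acceptChildren(visitor);
    }
}

class ToyPackageNode extends ToyNode {
    void accept(ToyVisitor visitor) {
        visitor.visit(this);
        acceptChildren(visitor);
    }
}

class ToyClassNode extends ToyNode {
    void accept(ToyVisitor visitor) {
        visitor.visit(this);
        acceptChildren(visitor);
    }
}

// A "rule" that only cares about class nodes.
class ClassCounter implements ToyVisitor {
    int classes;

    public void visit(ToyPackageNode node) {
        // Not interested in package nodes.
    }

    public void visit(ToyClassNode node) {
        classes++;
    }
}

class ToyVisitorDemo {
    static int countClasses() {
        ToyFileNode file = new ToyFileNode();
        file.children.add(new ToyPackageNode());
        file.children.add(new ToyClassNode());
        file.children.add(new ToyClassNode());
        ClassCounter counter = new ClassCounter();
        file.accept(counter);
        return counter.classes;
    }
}
```

The traversal logic lives in the nodes, so a new check only has to supply the visit() methods for the node types it cares about.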
My example class is shown below, followed by an explanation of the code.
package com.ociweb.example.pmd;

import net.sourceforge.pmd.AbstractRule;
import net.sourceforge.pmd.RuleContext;
import net.sourceforge.pmd.Report;
import net.sourceforge.pmd.ast.ASTName;
import net.sourceforge.pmd.ast.ASTPackageDeclaration;
import net.sourceforge.pmd.ast.ASTClassDeclaration;

/**
 * This is a basic PMD rule that will detect classes that are
 * not defined to exist within a named package.
 *
 * Since this class will check for empty or missing package
 * statements, it was necessary to import ASTPackageDeclaration,
 * though different rules will need to import different classes.
 * Also note that the visit() method is defined in the
 * JavaParserVisitor interface. It is implemented by the superclass
 * of AbstractRule, which is named JavaParserVisitorAdapter.
 */
public class AllClassesMustHaveAPackage extends AbstractRule {

    String packageName;

    public Object visit(ASTPackageDeclaration node, Object data) {
        // this is only called if there is a package, so we need
        // to check downstream that it was set.
        packageName = ((ASTName) node.jjtGetChild(0)).getImage();
        return data;
    }

    // I chose to listen for class declarations because we are
    // guaranteed to have it, and it will always be called after
    // the package declaration.
    public Object visit(ASTClassDeclaration node, Object data) {
        if (packageName == null) {
            // Messages printed to standard output will be shown when
            // PMD is run from the command line or as an Ant target.
            System.out.println("Found a class with no package declaration.");

            RuleContext ctx = (RuleContext) data;

            // now add it to the report
            ctx.getReport().addRuleViolation(
                createRuleViolation(ctx, node.getBeginLine()));
        }
        return data;
    }
}
First, I define the package for my rule class and then import the classes I'll need to use. As noted in the comments, the specific classes you import depend on which AST nodes you need to examine. All rules must import the first four classes.
Since you will likely be interested in just a few nodes, most rules will extend the AbstractRule class rather than implementing the JavaParserVisitor interface directly.
Next you need to examine the JavaDoc for the JavaParserVisitor interface and determine which nodes you're interested in examining. The AST Viewer is a utility distributed with PMD that will display the AST for the source code you provide and help you to visualize which nodes to inspect. (You can run it with the etc/astviewer.bat batch file on Windows, or with the etc/astviewer.sh shell script on UNIX.) My rule must listen for when a package declaration is found, but in cases where no such declaration exists, that code is never called. The solution is to listen for another node, one that is guaranteed to exist and that always follows the package declaration, and check to see if the String that holds the package name is still null. This will indicate that the package declaration code hadn't been called (or was called with the value null). In order to provide a simple example that's easily understood, my rule only checks Java classes that lack a package declaration. You could create a more complete implementation by also checking to see if interfaces are defined within a named package.
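The "remember the package, then check it at the class" idea can be isolated from PMD entirely. The sketch below is a hypothetical, simplified checker that receives plain method calls instead of AST nodes; the class and method names are invented for this illustration. It also clears the remembered name at the start of each file. Whether a real PMD rule needs the same precaution depends on how PMD reuses rule instances between files, which is an assumption not verified here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, PMD-independent sketch of the null-check technique.
class MissingPackageChecker {
    // Stays null until a package declaration is seen in the current file.
    private String packageName;
    private final List<String> violations = new ArrayList<String>();

    // Clear state so one file's package cannot mask the next file's omission.
    void startFile() {
        packageName = null;
    }

    // Only called when the file actually has a package declaration.
    void onPackageDeclaration(String name) {
        packageName = name;
    }

    // Called for every class; by now the package would have been seen.
    void onClassDeclaration(String className) {
        if (packageName == null) {
            violations.add(className);
        }
    }

    List<String> violations() {
        return violations;
    }
}

class MissingPackageCheckerDemo {
    static List<String> run() {
        MissingPackageChecker checker = new MissingPackageChecker();

        // First file: declares a package before its class.
        checker.startFile();
        checker.onPackageDeclaration("com.example");
        checker.onClassDeclaration("Good");

        // Second file: has a class but no package declaration.
        checker.startFile();
        checker.onClassDeclaration("Bad");

        return checker.violations();
    }
}
```

Without the call to startFile() before the second file, the package name from the first file would still be set and the class named Bad would slip through unreported.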
At this point, I just add a rule violation to the report. The PMD report subsystem will take care of writing it in the desired format.
That's all the Java code you need to write for the rule, but there are a few more steps. You must add information about your new rule to a ruleset file. You could simply modify an existing ruleset file (such as basic.xml), but to avoid polluting the default distribution, I suggest that you create a new one to house your custom rules. Start by creating a new file in the rulesets/ directory using the basic structure of a custom ruleset file (as I explained in the Customizing Rulesets section of this document). Inside the <ruleset> element (below <description>), add a <rule> element. This will define the rule name, class and the message that appears in the report when the rule is violated. You will also give a description of the rule, specify the priority (a number between one and five, with five being the lowest) and show one or more examples of code that would trigger a violation. Here is my complete ruleset file, which I named tomwheeler-custom.xml:
<?xml version="1.0"?>
<ruleset name="Basic Rules">
    <description>
        This ruleset defines the custom rules that I have created.
    </description>

    <rule name="AllClassesMustHaveAPackage"
          message="All classes must belong to a named package"
          class="com.ociweb.example.pmd.AllClassesMustHaveAPackage">
        <description>
            AllClassesMustHaveAPackage rule catches instances in which a class
            does not have an explicit package declaration.
        </description>
        <priority>3</priority>
        <example>
<![CDATA[
// note that there is no package statement here.
public class NotAPackageMember {
}
]]>
        </example>
    </rule>
</ruleset>
The final step is to rebuild the PMD source by changing to the etc/ directory and running the Ant jar target (i.e. ant jar). This will produce a file in the lib/ directory named pmd-1.7.jar (naturally the filename will differ slightly depending on your version of PMD). You will then need to copy that JAR file to your Ant installation's lib/ directory. If you're using the PMD Eclipse plugin, you must put the JAR file in the plugin classpath by copying it to the plugins/ directory beneath your Eclipse installation, and then restart Eclipse.
There are dozens of Java source analysis tools on the market today, including both free and commercial products. Based on feature lists from vendor Web sites, most of the commercial products offer little more — and in some cases, less — than what you'd find in free/open source implementations. One commercial product that seems to stand far above the rest is ParaSoft's JTest. I have not yet had a chance to use JTest, but have read many positive reviews about it, including its ability to test code automatically through both inspection and execution. However, with prices starting at nearly $3500 per developer, JTest may be beyond the reach of many development budgets.
In contrast, open source tools generally cost nothing and have licenses that encourage sharing the tool among developers, so there is often less of a barrier to implementing them in your projects.
CPD is a tool distributed with PMD that helps to identify duplicate code that was likely copied and pasted between different classes. Covering it in further detail is beyond the scope of this article, but you can find out more from the PMD Web site or the Detecting Duplicate Code with PMD's CPD article at onjava.com.
PMD is similar in concept to Checkstyle, another open source code analysis tool that Mark Volkmann reviewed in the November 2002 Java News Brief. While both tools are free and open source (Checkstyle is released under the GNU Lesser General Public License), there are major architectural differences between them. I also find that though there is a lot of overlap between the rulesets of the two programs, Checkstyle seems to be geared more towards adherence to a specific coding style (in terms of style preferences) than towards identifying latent defects. Checkstyle comes with many more rules than PMD, but this can be overwhelming when run against a project containing hundreds or thousands of classes. Still, Checkstyle can be a useful tool, and like PMD, it can be integrated into an Ant build or any of several popular IDEs and editors.
FindBugs is a newer project, but looks very promising. It is almost entirely focused on finding performance, logic and security problems, though it has fewer rules than either PMD or Checkstyle. FindBugs is also free software, and like Checkstyle, is released under the GNU LGPL. It is not, in the strictest sense, a source analysis tool because it examines the bytecode of compiled class files. While both PMD and Checkstyle run under JDK 1.3, FindBugs requires JDK 1.4.
To get the most out of PMD, I recommend that you:
I have explained how automated source analysis can help augment a quality improvement strategy for development projects. In addition to having better code with fewer latent defects, this approach can ease the burden of manual code reviews while reducing overall development cost.
I have also presented a detailed look at one of the more prominent static analysis tools — PMD — along with descriptions of CPD, Checkstyle and JTest. You should now be prepared to effectively use source analysis to improve the quality of your projects.