Improving Project Quality with PMD

by
Tom Wheeler, Software Engineer
Object Computing, Inc. (OCI)

Introduction

Writing and maintaining complicated software is a difficult task, and every programmer inadvertently makes mistakes. Usually these are minor typographical errors that are caught at compile time, but others may remain undetected until the system is in production. In the most extreme cases, a bug can cause a system failure, as happened in the massive blackout in the Northeast last summer.

In addition to the consequences of system malfunction, software defects have a substantial financial cost. A 2002 report from the National Institute of Standards and Technology (NIST) estimated the total annual cost of software bugs to the U.S. economy at nearly $60 billion. The report also confirmed what most developers already know: the difficulty and cost of fixing a problem grow dramatically as development progresses. Though the report concluded that it is not practical to locate and remove all defects from an application, it estimated that nearly 40% of that cost could be eliminated by an improved testing infrastructure that finds and removes defects earlier.

Two Methods for Improving Quality

Two common methods for improving software quality are code reviews and automated testing. The goal of both is to detect mistakes, but the distinction between them is how this is achieved. Automated testing attempts to expose problems by executing the code, while reviews rely on "another set of eyes" to verify that it's correct.

Problems can arise when we are too involved with the code to be objective or to consider unusual paths of execution. Certainly every programmer has at some point spent hours trying to track down a problem, only to have someone else spot it immediately. Code reviews can be an effective technique for finding errors, but in my experience, useful code reviews are surprisingly uncommon. In some places they're deemed an unwarranted impediment to finishing a project, while in others they degenerate into arguments about trivial issues such as whitespace and brace placement. My explanation for this is that the best candidates for leading the reviews (senior developers) are constantly in short supply and high demand. Perhaps more importantly, reading and completely understanding someone else's code can be very tedious and time-consuming.

Given that code reviews can be effective at locating software defects during development, when defects are easiest and least expensive to fix, it makes sense to automate this process as much as possible. This allows for earlier and more frequent inspections, while letting the programmer eliminate the more obvious problems before meeting with others to review the code. A report released by The Standish Group confirms this, finding that automated code inspection reduced the number of people needed for manual code reviews by 50%.

There are several available code analysis tools, but one of the better ones is PMD.

About PMD

PMD is a static source code analysis tool, meaning that it examines the source code itself (by parsing it into a syntax tree) rather than executing it as a unit test would. It was originally developed to improve the Cougaar project, a DARPA initiative that developed a framework for distributed, agent-based applications. PMD is both free and open source; it is released under a BSD-style license that allows you to use, modify or distribute it, so long as the copyright notice is left intact.

Though PMD is only about two years old, it is under active development with more than 100 programmers working on the project. The SourceForge project page shows that it has been downloaded about 70,000 times.

PMD can be executed at the command line, but I think more benefit is derived from integrating it into an IDE or Ant build, so my article will focus on these techniques.

Rule Types

The latest version at the time of this writing, 1.7, comes with more than 80 built-in rules. The tool allows you to pick the rules you wish to use, and also to define new rules using either Java classes or XPath statements. The PMD Web site explains the predefined rules in detail, so I won't try to duplicate that information here. PMD separates the rules into 14 categories, such as naming, import statements and unused code. To use the tool effectively, I think it's also helpful to think of each rule as belonging to one of four broad groups:

Style Rules

This group locates code that deviates from generally accepted coding standards, such as Sun's Code Conventions for the Java Programming Language. Since the objective of PMD is to locate potential defects rather than issues of preference like whitespace and brace placement, the style rules essentially check for more serious style violations that have the potential to confuse other developers or allow for future bugs. Examples of these rules include IfStmtsMustUseBraces, MethodNamingConventions and AvoidDollarSigns.
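To make the "potential to confuse" point concrete, here is a small hypothetical Java class (the class and method names are mine) of the kind IfStmtsMustUseBraces is meant to catch. The indentation in withoutBraces() suggests that both increments are conditional, but only the first one is.

```java
// Hypothetical example class illustrating why brace-related style rules exist.
final class BraceExample {
    // IfStmtsMustUseBraces targets this form: the second increment looks
    // conditional because of its indentation, but it always executes.
    static int withoutBraces(boolean verbose) {
        int count = 0;
        if (verbose)
            count++;
            count++;   // runs even when verbose is false
        return count;
    }

    // With braces, the intent is explicit and a later edit cannot silently
    // move a statement outside the condition.
    static int withBraces(boolean verbose) {
        int count = 0;
        if (verbose) {
            count++;
            count++;
        }
        return count;
    }
}
```

Both methods return 2 when verbose is true; only when it is false does the hidden difference (1 versus 0) show up, which is exactly the kind of latent bug that a style rule can prevent.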

Useless Code

These rules identify code that serves no effective purpose. Such code might simply be an artifact of refactoring or it might signal a mistake. Since production code should be concise, these violations should definitely be examined, and in most cases, removed altogether. Examples of these rules include EmptyIfStmt, EmptyTryBlock and UnusedLocalVariable.
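Here is a hypothetical illustration (the names are mine) of useless code that signals a mistake. In lengthWithClutter(), the unused local is the kind of code UnusedLocalVariable reports, and the empty if block is the kind EmptyIfStmt reports. In this sketch, the empty block is where an intended early return went missing.

```java
// Hypothetical example class; not taken from PMD's rule documentation.
final class UselessCodeExample {
    // Clutter left behind by refactoring: the local is never read and the
    // if block does nothing, so a null argument falls through to s.length().
    static int lengthWithClutter(String s) {
        int unused = 42;
        if (s == null) {
        }
        return s.length();   // throws NullPointerException when s is null
    }

    // After removing the clutter and restoring the intended early return.
    static int length(String s) {
        if (s == null) {
            return 0;
        }
        return s.length();
    }
}
```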

Best Practices

This group can help to uncover code that violates basic design and coding strategies. While these do not always indicate a logic flaw, they generally point to code that will be unnecessarily inefficient or difficult to maintain. Examples of these rules include UseSingletonRule, StringInstantiation and SignatureDeclareThrowsException.
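As a small illustration of StringInstantiation, the hypothetical class below contrasts the flagged form with the preferred one. A `new String("pmd")` expression always allocates a fresh object holding a copy of a string the JVM already has as a literal, so it costs an allocation and gains nothing.

```java
// Hypothetical example class contrasting the two forms.
final class BestPracticeExample {
    // Flagged by StringInstantiation: creates a new, redundant String object
    // on every call.
    static String wasteful() {
        return new String("pmd");
    }

    // Preferred: returns the shared literal; no extra allocation.
    static String preferred() {
        return "pmd";
    }
}
```

The two methods return strings that are equal() but are different objects, and wasteful() produces yet another object each time it is called.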

Likely Defects

These rules identify code that is almost certainly in error, or at least very likely to cause undesired behavior. You should examine such violations very carefully and either fix the problems or ensure that the code follows your intentions. Examples of such rules include EmptyCatchBlock, OverrideBothEqualsAndHashcodeRule and ProperCloneImplementationRule.
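The equals/hashCode case shows why these violations deserve careful attention. In the hypothetical classes below (the names are mine), BrokenPoint overrides equals() but inherits hashCode() from Object, which is the situation OverrideBothEqualsAndHashcodeRule is meant to catch. Two equal BrokenPoint objects will almost always have different hash codes, so a HashSet lookup fails; Point overrides both methods and behaves as expected.

```java
import java.util.HashSet;
import java.util.Set;

// equals() without hashCode(): equal objects can hash differently.
final class BrokenPoint {
    final int x, y;
    BrokenPoint(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof BrokenPoint)) return false;
        BrokenPoint p = (BrokenPoint) o;
        return x == p.x && y == p.y;
    }
    // hashCode() is inherited from Object (identity-based).
}

// Fixed version: equal points always produce equal hash codes.
final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}

final class HashLookupDemo {
    // Almost always false: the lookup key has a different identity hash code
    // from the stored element, so the set never even calls equals().
    static boolean brokenLookup() {
        Set<BrokenPoint> set = new HashSet<>();
        set.add(new BrokenPoint(1, 2));
        return set.contains(new BrokenPoint(1, 2));
    }

    // True: equal points hash alike, so contains() finds the element.
    static boolean fixedLookup() {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        return set.contains(new Point(1, 2));
    }
}
```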

How to Use PMD

How to Use PMD with Ant

PMD requires JDK version 1.3 or higher. I tested these instructions with Ant 1.6.1, though they should presumably work with any reasonably modern earlier version of Ant. Follow these steps to set up PMD as part of your build process:

First, download PMD from SourceForge. There are many packages listed, but you will only need the latest binary version of the pmd package.
Unzip the package to the location of your choice. The lib/ subdirectory will contain several JAR files. Copy these files:
- lib/jaxen-core-1.0-fcs.jar
- lib/saxpath-1.0-fcs.jar
- lib/pmd-1.7.jar
to the lib/ directory beneath your Ant installation (i.e. %ANT_HOME%\lib on Windows or $ANT_HOME/lib on UNIX).

Modify the Ant build file for your project to add this new taskdef and target:

<taskdef name="pmd" classname="net.sourceforge.pmd.ant.PMDTask"/>				
<target name="pmd">				    
  <!-- 
    NOTE: include a comma-delimited list of ruleset file paths 
    here.  The paths (assuming you are using the pre-defined PMD 
    rulesets) should all begin with rulesets/ since this is where 
    they are stored in the PMD JAR file.
  -->
  <pmd rulesetfiles="rulesets/strings.xml,rulesets/basic.xml">
    <formatter type="html" toFile="pmd_report.html"/>
    <!-- 
      NOTE: change src.dir to the property name you have defined for
      your source code directory.
    -->
    <fileset dir="${src.dir}">
        <include name="**/*.java"/>
    </fileset>
  </pmd>
</target>

Change to the directory containing your Ant buildfile and run the pmd target (i.e. ant pmd).

The Ant task will create a file named pmd_report.html in the base directory of your project. Naturally, you can change the path and filename to suit your preferences. The file paths in this report are all absolute, so they take up a lot of space and make the report difficult to read or print. You can make the report show relative file paths by adding the attribute shortFilenames="yes" to the <pmd> element.

In addition to the basic HTML report shown above, you can create reports in several formats, including XML, ASCII text and CSV. You can produce any number of these formats in the same build by specifying multiple <formatter> elements. There is also an alternate HTML format (summaryhtml) that includes a table with rule violation counts.
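A single run can produce several formats because collecting violations and writing them out are separate concerns. The following PMD-independent Java sketch illustrates that separation under my own assumptions: Finding, ReportFormatter, TextFormatter, CsvFormatter and SimpleReport are hypothetical names, not PMD's own report or renderer classes, whose APIs differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value class for one rule violation.
final class Finding {
    final String file;
    final int line;
    final String message;

    Finding(String file, int line, String message) {
        this.file = file;
        this.line = line;
        this.message = message;
    }
}

// One implementation per output format.
interface ReportFormatter {
    String format(List<Finding> findings);
}

// Plain text: one "file:line: message" entry per line.
final class TextFormatter implements ReportFormatter {
    @Override
    public String format(List<Finding> findings) {
        StringBuilder out = new StringBuilder();
        for (Finding f : findings) {
            out.append(f.file).append(':').append(f.line)
               .append(": ").append(f.message).append('\n');
        }
        return out.toString();
    }
}

// CSV with a header row; the message is quoted because it may contain commas.
final class CsvFormatter implements ReportFormatter {
    @Override
    public String format(List<Finding> findings) {
        StringBuilder out = new StringBuilder("File,Line,Message\n");
        for (Finding f : findings) {
            out.append(f.file).append(',').append(f.line).append(",\"")
               .append(f.message.replace("\"", "\"\"")).append("\"\n");
        }
        return out.toString();
    }
}

// Violations are collected once and can be rendered by any number of formatters.
final class SimpleReport {
    private final List<Finding> findings = new ArrayList<>();

    void add(Finding f) { findings.add(f); }

    String render(ReportFormatter formatter) { return formatter.format(findings); }
}
```

Adding a new format then means writing one more formatter class, without touching the code that finds the violations.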

Customizing Rulesets

While you could define a custom ruleset by copying and pasting relevant sections from pre-defined ruleset files, there is a better way. You just need to create a new ruleset file with the basic structure:

<?xml version="1.0"?>

<ruleset name="customruleset">
  <description>
    A description of your custom ruleset goes here.
  </description>
</ruleset>

Next, add references to rules from a pre-defined ruleset file (rule references are children of the <ruleset> element). You can include entire rulesets with the following syntax:

<rule ref="rulesets/design.xml"/>

You can import them one at a time by specifying the path to the ruleset file followed by a slash and then the name of the rule, for example:

<!--  This is how you include individual rules -->
<rule ref="rulesets/basic.xml/EmptyTryBlock"/>
<rule ref="rulesets/design.xml/UseSingletonRule"/>

Finally, you can exclude specific rules from imported rulesets:

<rule ref="rulesets/naming.xml">
    <exclude name="ShortVariable"/>
    <exclude name="LongVariable"/>
</rule>

IDE Integration

PMD integrates with several Java IDEs and editors, including Eclipse, IDEA, NetBeans, JBuilder, jEdit and Emacs. Unfortunately there is no common plugin architecture across these applications, and the instructions for installing and configuring PMD are different for each. In this article, I will describe how to get PMD working with one of the most popular IDEs: Eclipse. I have verified these instructions with Eclipse 3.0M7 on Red Hat Linux 9, Mac OS X (Panther) and Windows 2000.

The first step to installing the PMD plugin is to download the plugin from SourceForge. It doesn't matter where you save the ZIP file.

Start Eclipse, click the Help menu and then choose the Software Updates->Find and Install... item.
Select the Search for new features to install radio button and then click the Next button.
Click the Add Archived Site... button and locate the Eclipse plugin ZIP file you downloaded earlier.
Select the checkbox for the Eclipse PMD plugin ZIP file in the Sites to include in search list and then click the Next button.
Select the checkbox for PMD for Eclipse 3 in the Select features to install list and then click the Next button.
Read and agree to the license terms and then click the Next button.
Choose the directory in which you wish to install the plugin and then click the Next button.
Ignore the warning message about the plugin being unsigned and then click the Next button. Unfortunately PMD doesn't provide a way (such as a PGP signature or MD5 checksum) to verify the integrity of the plugin.
Verify your Eclipse installation directory and then click the Finish button. You must restart Eclipse before the plugin can be activated.

You will see the PMD welcome page once Eclipse has restarted, but beyond that, it is not immediately obvious how to use the plugin. I think the easiest way to begin is by checking code manually. To do this, right-click (or ctrl-click on Mac OS X) on an icon in the Package Explorer view and choose PMD->Check code with PMD, as shown in the following screenshot.

Screenshot showing how to check code manually

You can expand or limit the set of code that PMD examines by selecting different icons when you right-click; you can choose an entire project, a package or a single class (and also groups of projects, packages or classes).

After running PMD on your source for the first time, you won't see the list of violations right away. Earlier versions of the Eclipse plugin displayed the output directly in the Tasks list, but since it was clumsy to navigate a list that combined tasks and rule violations, the output has since been moved to its own view. To enable the view, click the Window menu and then choose Show View->Other.... In the dialog box that appears, expand the PMD group, select PMD Violations and then click the OK button.

The PMD view sorts the violations by priority, so that the most likely problems are displayed first. There are also five color-coded buttons in the upper-right of the view; each is numbered one through five and can be used to filter rule violations by priority. The PMD violations view is visible at the bottom of this screenshot.

Screenshot showing the PMD violations view in Eclipse
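The sort-and-filter behavior of the view is easy to picture in code. This is a hypothetical Java sketch, not the plugin's implementation: I'm assuming that each priority button simply toggles whether violations of that priority are shown, and Violation and ViolationView are names of my own.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for one row in the violations view.
final class Violation {
    final String rule;
    final int priority; // 1 = most important, 5 = least important

    Violation(String rule, int priority) {
        this.rule = rule;
        this.priority = priority;
    }
}

final class ViolationView {
    // Keeps only the violations whose priority button is switched on,
    // then orders them so priority 1 (the most likely problems) comes first.
    static List<Violation> visible(List<Violation> all, Set<Integer> enabledPriorities) {
        List<Violation> shown = new ArrayList<>();
        for (Violation v : all) {
            if (enabledPriorities.contains(v.priority)) {
                shown.add(v);
            }
        }
        shown.sort(Comparator.comparingInt((Violation v) -> v.priority));
        return shown;
    }
}
```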

When you check code manually, the items in the PMD rule violation view don't go away as soon as you fix each problem. You should keep the list up-to-date by periodically repeating the process for checking your code with PMD.

You can also enable PMD to automatically check your source code as you make changes. To do this, select the Project->Properties menu item, choose the PMD item in the dialog and then check the Enable PMD box. After you close the dialog box, click OK when prompted to rebuild the project. Note that enabling PMD to review code automatically can really slow down Eclipse, especially for larger projects.

Sometimes PMD will flag some code as a violation even though you intended to write it that way. I find this to be the case most often with naming rules like ShortVariable — there are times when a short name is best. I don't want to disable the rule, but I do want to tell PMD that a specific violation was intentional so that it doesn't warn me about it again. You can do this by right-clicking on a rule violation and choosing Mark review. This will add a specially-formatted comment to the code that notifies subsequent PMD reviews to ignore that specific instance of the violation.

Finally, you can change the active rules, priorities and parameters by selecting the Window menu, choosing the Preferences item and then selecting PMD->Rules Configuration. The following screenshot shows the Preferences window.

Screenshot showing the PMD preferences window in Eclipse

How to Create a Custom Rule

You can create custom PMD rules in two ways: Java classes and XPath. Both methods are described on the PMD Web site, but I will give an example of how to create a rule using a Java class.

In order to focus on how to write the rule rather than the complexity of the rule itself, my example will simply identify classes that have no package. Classes in the unnamed (default) package cannot be imported by classes in named packages, and they are usually the result of either a novice developer or a quick-and-dirty hack that should be refactored or eliminated.

The easiest way to get started with creating a new rule class is to download the source package for PMD. Once you unpack it to your development workspace, you will see a number of familiar directories, such as src/ and lib/. You can build PMD using Maven, but since Ant is more prevalent, I will use it to build my examples.

There is an Ant build.xml file beneath the etc/ directory. You may need to modify the includes attribute in the jar target so that it will include the package for your rule class. The default configuration will only put classes in the net.sourceforge.pmd package into the JAR file.

Next, you need to create the directory structure beneath src/ to hold your class. My example will be a class named AllClassesMustHaveAPackage in the com.ociweb.example.pmd package.

The rules are constructed using the Visitor pattern. When PMD runs, it uses a parser generated by JavaCC to break the source code down into a tree called the Abstract Syntax Tree (AST). PMD then walks that tree, and your rule class contains methods that are called when the walk reaches the relevant nodes.
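To make the pattern concrete, here is a PMD-independent Java sketch. The node classes, Visitor, VisitorAdapter and ClassLister are hypothetical, simplified stand-ins for PMD's AST classes, JavaParserVisitor and JavaParserVisitorAdapter, not their real definitions. Each node's accept() calls the visit() overload for its own type (double dispatch), and the adapter's default behavior is to walk the children, so a rule overrides only the visit() methods for the nodes it cares about.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified AST nodes.
abstract class Node {
    final List<Node> children = new ArrayList<>();
    final int line;
    Node(int line) { this.line = line; }
    Node add(Node child) { children.add(child); return this; }
    // Double dispatch: each subclass calls the visit() overload for its own type.
    abstract Object accept(Visitor v, Object data);
}

final class CompilationUnit extends Node {
    CompilationUnit() { super(1); }
    Object accept(Visitor v, Object data) { return v.visit(this, data); }
}

final class PackageDecl extends Node {
    final String name;
    PackageDecl(int line, String name) { super(line); this.name = name; }
    Object accept(Visitor v, Object data) { return v.visit(this, data); }
}

final class ClassDecl extends Node {
    final String name;
    ClassDecl(int line, String name) { super(line); this.name = name; }
    Object accept(Visitor v, Object data) { return v.visit(this, data); }
}

// One visit() method per node type.
interface Visitor {
    Object visit(CompilationUnit n, Object data);
    Object visit(PackageDecl n, Object data);
    Object visit(ClassDecl n, Object data);
}

// Default behavior: just descend into the children.
class VisitorAdapter implements Visitor {
    Object visitChildren(Node n, Object data) {
        for (Node child : n.children) {
            child.accept(this, data);
        }
        return data;
    }
    public Object visit(CompilationUnit n, Object data) { return visitChildren(n, data); }
    public Object visit(PackageDecl n, Object data) { return visitChildren(n, data); }
    public Object visit(ClassDecl n, Object data) { return visitChildren(n, data); }
}

// A toy "rule" that overrides only the node type it cares about.
final class ClassLister extends VisitorAdapter {
    final List<String> found = new ArrayList<>();

    @Override
    public Object visit(ClassDecl n, Object data) {
        found.add(n.name + "@" + n.line);
        return visitChildren(n, data);   // keep walking into nested classes
    }
}
```

For a compilation unit containing a class Outer with a nested class Inner, calling accept(rule, null) on the root leaves both classes in rule.found, because ClassLister keeps descending after it records each class.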

My example class is shown below, followed by an explanation of the code.

package com.ociweb.example.pmd;

import net.sourceforge.pmd.AbstractRule;
import net.sourceforge.pmd.RuleContext;
import net.sourceforge.pmd.ast.ASTName;
import net.sourceforge.pmd.ast.ASTPackageDeclaration;
import net.sourceforge.pmd.ast.ASTClassDeclaration;
import net.sourceforge.pmd.ast.ASTCompilationUnit;

/**
 * This is a basic PMD rule that will detect classes that are
 * not defined to exist within a named package.
 *
 * Since this class will check for empty or missing package
 * statements, it was necessary to import ASTPackageDeclaration,
 * though different rules will need to import different classes.
 * Also note that the visit() method is defined in the
 * JavaParserVisitor interface.  It is implemented by the superclass
 * of AbstractRule, which is named JavaParserVisitorAdapter.
 */
public class AllClassesMustHaveAPackage extends AbstractRule {

   // Name from the current file's package declaration; stays null
   // if the file has none.
   private String packageName;

   // Called once per source file, before any of its other nodes.
   // The same rule object may check many files in a single run, so
   // forget the previous file's package before visiting this one.
   public Object visit(ASTCompilationUnit node, Object data) {
      packageName = null;
      return super.visit(node, data);   // continue into the child nodes
   }

   public Object visit(ASTPackageDeclaration node, Object data) {
      // This is only called if there is a package, so the class
      // declaration visitor below checks whether it was set.
      packageName = ((ASTName) node.jjtGetChild(0)).getImage();

      return data;
   }

   // I chose to listen for class declarations because a class file
   // will have one, and it is always visited after the package
   // declaration.
   public Object visit(ASTClassDeclaration node, Object data) {
      if (packageName == null) {
         // Messages printed to standard output will be shown when
         // PMD is run from the command line or as an Ant target.
         System.out.println("Found a class with no package declaration.");
         RuleContext ctx = (RuleContext) data;

         // now add it to the report
         ctx.getReport().addRuleViolation(
              createRuleViolation(ctx, node.getBeginLine()));
      }

      return data;
   }
}

First, I define the package for my rule class and then import the classes I'll need to use. Most rules will import AbstractRule and RuleContext; as noted in the comments, the other classes you import depend on which AST nodes you need to examine.

Since you will likely be interested in just a few nodes, most rules will extend the AbstractRule class rather than implementing the JavaParserVisitor interface directly.

Next you need to examine the Javadoc for the JavaParserVisitor interface and determine which nodes you're interested in examining. The AST Viewer is a utility distributed with PMD that displays the AST for the source code you provide and helps you visualize which nodes to inspect. You can run it with the etc/astviewer.bat batch file on Windows, or with the etc/astviewer.sh shell script on UNIX.

My rule must listen for when a package declaration is found, but when a file has no such declaration, that code is never called. The solution is to listen for another node (one that is guaranteed to exist and always follows the package declaration) and check whether the field holding the package name is still null. A null value indicates that the package declaration code was never called (or was called with the value null). In order to provide a simple example that's easily understood, my rule only checks Java classes that lack a package declaration. You could create a more complete implementation by also checking whether interfaces are defined within a named package.

At this point, I just add a rule violation to the report. The PMD report subsystem will take care of writing it in the desired format.
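The packageName field in a rule like this is per-file state. If one rule object checks several files in a run, a package seen in one file would hide a missing package in the next unless the field is cleared at the start of each file. Here is a small PMD-independent Java sketch of the "check a later node" technique with that reset in place; the MissingPackageCheck class and its callback methods are hypothetical, standing in for the visit() methods PMD would call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch. The three callbacks play the role of visit() methods,
// in the order a walk over one file would reach them: startFile(), then
// packageDeclaration() if the file has one, then classDeclaration().
final class MissingPackageCheck {
    private String packageName;                  // state for the current file only
    private final List<String> violations = new ArrayList<>();

    void startFile() {
        packageName = null;                      // forget the previous file's package
    }

    void packageDeclaration(String name) {
        packageName = name;                      // never called for a file without a package
    }

    void classDeclaration(String file, int line) {
        if (packageName == null) {               // no package callback since startFile()
            violations.add(file + ":" + line);
        }
    }

    List<String> violations() {
        return Collections.unmodifiableList(violations);
    }
}
```

For example, after startFile(), packageDeclaration("com.example") and classDeclaration("A.java", 3) for one file, followed by startFile() and classDeclaration("B.java", 1) for a second, violations() holds only "B.java:1". Without the reset in startFile(), B.java would inherit the first file's package and go unreported.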

That's all the Java code you need to write for the rule, but there are a few more steps. You must add information about your new rule to a ruleset file. You could simply modify an existing ruleset file (such as basic.xml), but to avoid polluting the default distribution, I suggest that you create a new one to house your custom rules. Start by creating a new file in the rulesets/ directory using the basic structure of a custom ruleset file (as I explained in the Customizing Rulesets section of this document). Inside the <ruleset> element (below <description>), add a <rule> element. This will define the rule name, class and the message that appears in the report when the rule is violated. You will also give a description of the rule, specify the priority (a number between one and five, with five being the lowest) and show one or more examples of code that would trigger a violation. Here is my complete ruleset file, which I named tomwheeler-custom.xml:

<?xml version="1.0"?>

<ruleset name="Custom Rules">
  <description>
      This ruleset defines the custom rules that I have created.
  </description>

  <rule name="AllClassesMustHaveAPackage"
        message="All classes must belong to a named package"
        class="com.ociweb.example.pmd.AllClassesMustHaveAPackage">
  
    <description>
      AllClassesMustHaveAPackage rule catches instances in which a 
      class does not have an explicit package declaration.
    </description>
  
    <priority>3</priority>

    <example>
      <![CDATA[
        // note that there is no package statement here.
        public class NotAPackageMember {
        }
      ]]>
    </example>
  
  </rule>

</ruleset>

The final step is to rebuild PMD from source by changing to the etc/ directory and running the Ant jar target (i.e. ant jar). This will produce a file in the lib/ directory named pmd-1.7.jar (naturally the filename will differ slightly depending on your version of PMD). You will then need to copy that JAR file to your Ant installation's lib/ directory. If you're using the PMD Eclipse plugin, you must put the JAR file in the plugin classpath by copying it to the plugins/ directory beneath your Eclipse installation, and then restart Eclipse.

Related Tools

There are dozens of Java source analysis tools on the market today, including both free and commercial products. Based on feature lists from vendor Web sites, most of the commercial products offer little more — and in some cases, less — than what you'd find in free/open source implementations. One commercial product that seems to stand far above the rest is ParaSoft's JTest. I have not yet had a chance to use JTest, but have read many positive reviews about it, including its ability to test code automatically through both inspection and execution. However, with prices starting at nearly $3500 per developer, JTest may be beyond the reach of many development budgets.

In contrast, open source tools generally cost nothing and have licenses that encourage sharing the tool among developers, so there is often less of a barrier to implementing them in your projects.

CPD

CPD is a tool distributed with PMD that helps to identify duplicate code that was likely copied and pasted between different classes. Covering it in further detail is beyond the scope of this article, but you can find out more from the PMD Web site or the Detecting Duplicate Code with PMD's CPD article at onjava.com.

Checkstyle

PMD is similar in concept to Checkstyle, another open source code analysis tool that Mark Volkmann reviewed in the November 2002 Java News Brief. While both tools are free and open source (Checkstyle is released under the GNU Lesser General Public License), there are major architectural differences between them. I also find that though there is a lot of overlap between the rulesets of the two programs, Checkstyle seems to be geared more towards adherence to a specific coding style than towards identifying latent defects. Checkstyle comes with many more rules than PMD, but that many rules can be overwhelming when run against a project containing hundreds or thousands of classes. Still, Checkstyle can be a useful tool, and like PMD, it can be integrated into an Ant build or any of several popular IDEs and editors.

FindBugs

FindBugs is a newer project, but looks very promising. It is almost entirely focused on finding performance, logic and security problems, though it has fewer rules than either PMD or Checkstyle. FindBugs is also free software, and like Checkstyle, is released under the GNU LGPL. It is not, in the strictest sense, a source analysis tool because it examines the bytecode of compiled class files. While both PMD and Checkstyle run under JDK 1.3, FindBugs requires JDK 1.4.

My Advice for Using PMD Effectively

To get the most out of PMD, I recommend that you:

Use PMD with both your IDE and automated builds, but use a different strategy for each. The configuration of the PMD plugin in your IDE should be very strict in order to flag potential mistakes as soon as they are introduced. Remember that the Eclipse plugin lets you mark intentional violations so that they're ignored in the future.
Create and use a custom ruleset for your automated builds that flags only the more important errors. With a large project, you are otherwise liable to lose the real problems among all the minor infractions. It might help to use a scheduler program, such as cron, at or CruiseControl, to e-mail the reports to you so that reading them becomes routine.
Consider using the Ant task parameters to make major rule violations (such as the Likely Defects group of rules that I described earlier) cause the build to fail. Since developers generally take failed builds seriously, this may help to convince them that there are problems that need to be addressed. Be careful not to do this unnecessarily, or you may actually encourage a lax attitude about broken builds.
Create your own custom rules to enforce best practices for your project. The default rules that come with PMD are very general. Your organization may have its own best practices that would benefit from more specific rules.
Use other analysis tools, like Checkstyle or FindBugs, in addition to PMD to increase the likelihood that you'll catch mistakes.
Recognize that automated code analysis should improve — not replace — manual code reviews. The focus should be on fixing the more obvious and mundane problems so that the reviewers can concentrate on finding problems (such as incorrect business logic) that a machine can't.

Summary

I have explained how automated source analysis can help augment a quality improvement strategy for development projects. In addition to having better code with fewer latent defects, this approach can ease the burden of manual code reviews while reducing overall development cost.

I have also presented a detailed look at one of the more prominent static analysis tools, PMD, along with descriptions of CPD, Checkstyle, FindBugs and JTest. You should now be prepared to use source analysis effectively to improve the quality of your projects.

References

Software Bug Linked to Blackout, CNN: http://www.cnn.com/2004/US/Northeast/02/13/blackout.ap/
The Economic Impacts of Inadequate Infrastructure for Software Testing, NIST: http://www.nist.gov/director/prog-ofc/report02-3.pdf
Show Me the Money: Return on Inspection, The Standish Group: http://www.standishgroup.com/sample_research/PDFpages/show_me.pdf
PMD: http://pmd.sourceforge.net/
Cognitive Agent Architecture (Cougaar): http://cougaar.org/
PMD SourceForge Project Page: http://sourceforge.net/projects/pmd/
Apache Ant: http://ant.apache.org/
Code Conventions for the Java Programming Language, Sun Microsystems: http://java.sun.com/docs/codeconv/
Eclipse: http://www.eclipse.org/
Maven: http://maven.apache.org/
JavaCC: http://javacc.dev.java.net/
Checkstyle: http://checkstyle.sourceforge.net/
Parasoft JTest: http://www.parasoft.com/jsp/products/home.jsp?product=Jtest&itemId=12
FindBugs: http://findbugs.sourceforge.net/
Static Analysis with PMD, OnJava.com: http://www.onjava.com/pub/a/onjava/2003/02/12/static_analysis.html
Custom PMD Rules, OnJava.com: http://www.onjava.com/pub/a/onjava/2003/04/09/pmd_rules.html
Detecting Duplicate Code with PMD's CPD, OnJava.com: http://www.onjava.com/pub/a/onjava/2003/03/12/pmd_cpd.html