[Title Page] [TOC] [Prev] [Next] [End]

4 Simplified Complexity Calculation


Applying the e - n + 2 formula by hand is tedious and error-prone. Fortunately, there are several easier ways to calculate complexity in practice, ranging from counting decision predicates to using an automated tool.

4.1 Counting predicates

If all decisions are binary and there are p binary decision predicates, v(G) = p + 1. A binary decision predicate appears on the control flow graph as a node with exactly two edges flowing out of it. Starting with one and adding the number of such nodes yields the complexity. This formula is a simple consequence of the complexity definition. A straight-line control flow graph, which has exactly one edge flowing out of each node except the module exit node, has complexity one. Each node with two edges out of it adds one to complexity, since the "e" in the e - n + 2 formula is increased by one while the "n" is unchanged. In Figure 4-1, there are three binary decision nodes (1, 2, and 6), so complexity is 4 by direct application of the p + 1 formula. The original e - n + 2 gives the same answer, albeit with a bit more counting, 12 edges - 10 nodes + 2 = 4. Figure 4-2 has two binary decision predicate nodes (1 and 3), so complexity is 3. Since the decisions in Figure 4-2 come from loops, they are represented differently on the control flow graph, but the same counting technique applies.

Figure 4-1. Module with complexity four.

Figure 4-2. Module with complexity three.

Multiway decisions can be handled by the same reasoning as binary decisions, although there is not such a neat formula. As in the p + 1 formula for binary predicates, start with a complexity value of one and add something to it for each decision node in the control flow graph. The number added is one less than the number of edges out of the decision node. Note that for binary decisions, this number is one, which is consistent with the p + 1 formula. For a three-way decision, add two, and so on. In Figure 4-3, there is a four-way decision, which adds three to complexity, as well as three binary decisions, each of which adds one. Since the module started with one unit of complexity, the calculated complexity becomes 1 + 3 + 3, for a total of 7.

Figure 4-3. Module with complexity 7.

In addition to counting predicates from the flow graph, it is possible to count them directly from the source code. This often provides the easiest way to measure and control complexity during development, since complexity can be measured even before the module is complete. For most programming language constructs, the construct has a direct mapping to the control flow graph, and hence contributes a fixed amount to complexity. However, constructs that appear similar in different languages may not have identical control flow semantics, so caution is advised. For most constructs, the impact is easy to assess. An "if" statement, "while" statement, and so on are binary decisions, and therefore add one to complexity. Boolean operators add either one or nothing to complexity, depending on whether they have short-circuit evaluation semantics that can lead to conditional execution of side-effects. For example, the C "&&" operator adds one to complexity, as does the Ada "and then" operator, because both are defined to use short-circuit evaluation. The Ada "and" operator, on the other hand, does not add to complexity, because it is defined to use the full-evaluation strategy, and therefore does not generate a decision node in the control flow graph.

Figure 4-4 shows a C code module with complexity 6. Starting with 1, each of the two "if" statements add 1, the "while" statement adds 1, and each of the two "&&" operators adds 1, for a total of 6. For reference, Figure 4-5 shows the control flow graph corresponding to Figure 4-4.

Figure 4-4. Code for module with complexity 6.

Figure 4-5. Graph for module with complexity 6.

It is possible, under special circumstances, to generate control flow graphs that do not model the execution semantics of boolean operators in a language. This is known as "suppressing" short-circuit operators or "expanding" full-evaluating operators. When this is done, it is important to understand the implications on the meaning of the metric. For example, flow graphs based on suppressed boolean operators in C can give a good high-level view of control flow for reverse engineering purposes by hiding the details of the encapsulated and often unstructured expression-level control flow. However, this representation should not be used for testing (unless perhaps it is first verified that the short-circuit boolean expressions do not contain any side effects). In any case, the important thing when calculating complexity from source code is to be consistent with the interpretation of language constructs in the flow graph.

Multiway decision constructs present a couple of issues when calculating complexity from source code. As with boolean operators, consideration of the control flow graph representation of the relevant language constructs is the key to successful complexity measurement of source code. An implicit default or fall-through branch, if specified by the language, must be taken into account when calculating complexity. For example, in C there is an implicit default if no default outcome is specified. In that case, the complexity contribution of the "switch" statement is exactly the number of case-labeled statements, which is one less than the total number of edges out of the multiway decision node in the control flow graph. A less frequently occurring issue that has greater impact on complexity is the distinction between "case-labeled statements" and "case labels." When several case labels apply to the same program statement, this is modeled as a single decision outcome edge in the control flow graph, adding one to complexity. It is certainly possible to make a consistent flow graph model in which each individual case label contributes a different decision outcome edge and hence also adds one to complexity, but that is not the typical usage.

Figure 4-6 shows a C code module with complexity 5. Starting with one unit of complexity, the switch statement has three case-labeled statements (although having five total case labels), which, considering the implicit default fall-through, is a four-way decision that contributes three units of complexity. The "if" statement contributes one unit, for a total complexity of five. For reference, Figure 4-7 shows the control flow graph corresponding to Figure 4-6.

Figure 4-6. Code for module with complexity 5.

Figure 4-7. Graph for module with complexity 5.

4.2 Counting flow graph regions

When the flow graph is planar (no edges cross) and divides the plane into R regions (including the infinite region "outside" the graph), the simplified formula for cyclomatic complexity is just R. This follows from Euler's formula, which states that for planar graphs n - e + R = 2. Re-arranging the terms, R = e - n + 2, which is the definition of cyclomatic complexity. Thus, for a planar flow graph, counting the regions gives a quick visual method for determining complexity. Figure 4-8 shows a planar flow graph with complexity 7, with the regions numbered from 1 to 7 to illustrate this counting technique. Region number 1 is the infinite region, and otherwise the regions are labeled in no particular order.

Figure 4-8. Planar graph with complexity 7, regions numbered.

4.3 Use of automated tools

The most efficient and reliable way to determine complexity is through use of an automated tool. Even calculating by hand the complexity of a single module such as that of Figure 4-9 is a daunting prospect, and such modules often come in large groups. With an automated tool, complexity can be determined for hundreds of modules spanning thousands of lines of code in a matter of minutes. When dealing with existing code, automated complexity calculation and control flow graph generation is an enabling technology. However, automation is not a panacea.

Figure 4-9. Module with complexity 77.

The feedback from an automated tool may come too late for effective development of new software. Once the code for a software module (or file, or subsystem) is finished and processed by automated tools, reworking it to reduce complexity is more costly and error-prone than developing the module with complexity in mind from the outset. Awareness of manual techniques for complexity analysis helps design and build good software, rather than deferring complexity-related issues to a later phase of the life cycle. Automated tools are an effective way to confirm and control complexity, but they work best where at least rough estimates of complexity are used during development. In some cases, such as Ada development, designs can be represented and analyzed in the target programming language.



[Title Page] [TOC] [Prev] [Next] [End]