
SENG 621 Software Process Management

Web Document Group Exercise on
Software Testing

Prepared by Amy Law, Adam Geras, and Joseph Man


Table of Contents
  1. Abstract
  2. Introduction
  3. Basic Concepts and Definitions
  4. Test Levels
  5. Test Techniques
  6. Test-Related Measures
  7. Managing the Test Process
  8. Conclusion
  9. Group Presentation
  10. References  


Software testing is a critical aspect of software process management, a point emphasized by the fact that testing is one of the ten "knowledge areas" in the Software Engineering Body of Knowledge (SWEBOK).  In this paper we describe software testing using the structure of the SWEBOK chapter on testing as the backdrop for our discussion. We begin by outlining the basic concepts and definitions of testing, and proceed into the details of software testing levels, test techniques, test-related measures, and test process management. Organizations that want to improve their testing capability may turn to the Capability Maturity Model (CMM) as the means for managing their improvement. Testing falls into the "Software Product Engineering" Key Process Area (KPA) at CMM Level 3. As an exercise, we take the CMM assessment questions related to this KPA and present them from a testing perspective, describing the conditions that organizations would have to meet in order to answer "yes" to each question as it relates to testing. We conclude the discussion by reviewing the means of managing the testing process, including the advantages and disadvantages of having a separate testing group in the software organization.



Software testing is an essential part of improving software quality and reducing the cost of owning the software.  The reason it has such a central role lies in its basic purpose: to detect faults in the application prior to deployment. It reduces the total cost of ownership of the software because, if the faults were left undetected, the organization would not only suffer through the resulting failures, but also pay more for fixing those same faults. Faults are less expensive to fix earlier in the software development life cycle. Testing is also central to having customers "accept" the application and approve it for use in production. This means that testing plays a role in the endgame, that is, determining exactly when the project to build the application finishes and the work of maintaining the application begins. "Playing the endgame," or finishing a project, is a topic near and dear to every software engineer's heart, since it is a primary source of job satisfaction and permits them to move on to the next exciting project. At the same time, testing receives a certain amount of skepticism from the developer community at large, being perceived as drab, tedious work that carries little glory. Testing is one of the ten "knowledge areas" in the Software Engineering Body of Knowledge (SWEBOK), further emphasizing its criticality.

Testing isn't an easy process and doing it well requires discipline, experience, and patience. There are a number of testing management issues. First, note that testing can only show the presence of bugs and not their absence [Humphrey, 1989]. This means that no matter how much testing we do, we can never really guarantee that a specified application is free of defects. The possible combinations of the input space and the execution paths are simply too numerous to allow us to perform exhaustive testing, even on relatively simple programs. So the trick is to know when to stop testing, while at the same time keeping the likelihood of the application failing after deployment within the target reliability objective. Software inspections provide a complementary mechanism for finding defects prior to application deployment, and most importantly prior to testing. This fundamental concept is important for software engineers to realize: they can further strengthen their ability to deliver quality software using a combination of reviews/inspections and effective testing.

The structure of this document mirrors the structure of the SWEBOK chapter on testing. The first section describes basic testing concepts and definitions, important given that many software organizations that do attempt to formalize their own testing efforts also choose to invent their own terms, or use nuances that other organizations might not. The second section describes the primary testing levels: unit testing, integration testing, and system testing. These levels are strongly related to one another, each applied at different stages of the software development life cycle. Teams may also choose to use the three test levels in conjunction with one another, under the guise of "continuous integration," a concept that is gaining industry attention since the rise of extreme programming (XP).

The third section describes test techniques as representative of the way that the tester or testing team comes up with test cases. The SWEBOK provides two perspectives on this: a taxonomy of the source of test cases, and a separate taxonomy on the degree of implementation knowledge. The two perspectives overlap, meaning that some techniques such as equivalence partitioning and boundary value testing show up in both decompositions. To supplement this discussion, we introduce testing methods in the context of the Capability Maturity Model (CMM) in this section, identifying testing as falling under the "Software Product Engineering" Key Process Area (KPA) at CMM Level 3.  Six questions associated with this KPA are discussed with the intention of clarifying the significance of test techniques to software process improvement.

The fourth section describes test-related measures. These metrics are slightly more accessible than other metrics, given that many teams use them to determine when their project is finished, that is, when they can release. Some of these measures are used for evaluating the product under test, while others are used for evaluating the tests performed. Our contribution to this section is a third category of measurements: those used for managing and improving the test process. Our fifth and final section is all about managing the test process, including the advantages and disadvantages of having a separate test group in the software organization.

Better approaches to testing provide us with better chances of delivering systems successfully. With the advent of the test-first practices in XP, testing is also gaining a higher profile in the minds of developers, an impressive consideration given the competition from new programming languages and environments. Gaining consensus on software testing concepts through SWEBOK is another important step towards a discipline for software engineering.


I. Basic Concepts and Definitions

What is Software Testing?

Different software organizations interpret "software testing" differently.   Some people relate software testing to software quality while others associate it with software inspection.  Based on our textbook [Humphrey, 1989] and our research, software testing is fundamental to software process management.  It is one of the ten "knowledge areas" in the Software Engineering Body of Knowledge [SWEBOK, 2001].  The purpose of testing is to find faults during the execution of the system under test.  Testing consists of a finite set of test cases in controlled conditions that include both normal and abnormal situations, documented into a test plan.  The test plan describes testing steps, commands, actions, input data, and expected behaviors.  Table 1 illustrates a typical test plan.

Table 1:  Sample Test Plan [Crispin, 2002, page 43]

Step  Command / URL            Action                   Input Data   Expected Result
1     localhost:80/login.jsp   Login to secured system  readAccount  System displays confirmation page.
2     localhost:80/search.jsp  Enter search data        XP book      System displays search result page.
3     localhost:80/search.jsp  Repeat step 2            CMM book     System refreshes and displays the search result page.
4     ...                      ...                      ...          ...

Using normal conditions as the test input, we ask "What will happen to the software if we enter correct input?"  Using abnormal conditions as the test input, we ask "What will happen to the software when we enter invalid input?"  Testing should purposely make things go wrong.  It determines whether things happen when they shouldn't and whether things don't happen when they should. Testing is done to find information, and the team or its management makes critical decisions about the project or the product based on that information [Kaner et al, 2002].
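The two questions above can be illustrated with a small table of test cases, one row per input, much like the rows of a test plan. This is only a minimal sketch in Python; the `parse_quantity` function and its accepted range of 1 to 999 are hypothetical and chosen purely for illustration.

```python
def parse_quantity(text):
    """Hypothetical function under test: parse an order quantity from 1 to 999."""
    value = int(text)  # raises ValueError for non-numeric input
    if not 1 <= value <= 999:
        raise ValueError(f"quantity out of range: {value}")
    return value


# Each row pairs an input with the expected behavior.
# An expected result of ValueError means the input must be rejected.
TEST_CASES = [
    ("1", 1),              # normal input
    ("42", 42),            # normal input
    ("999", 999),          # normal input
    ("0", ValueError),     # abnormal: below the range
    ("1000", ValueError),  # abnormal: above the range
    ("abc", ValueError),   # abnormal: not a number
    ("", ValueError),      # abnormal: empty input
]


def run_test_cases(cases):
    """Return the inputs whose actual behavior differs from the expected one."""
    failures = []
    for text, expected in cases:
        try:
            actual = parse_quantity(text)
        except ValueError:
            actual = ValueError
        if actual != expected:
            failures.append(text)
    return failures


failed_inputs = run_test_cases(TEST_CASES)
print("failed inputs:", failed_inputs)
```

More than half of the rows are abnormal inputs, which reflects the idea that testing should purposely try to make things go wrong.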


Why do we need software testing?

By identifying defects and problems, the testing process helps the team improve the quality of the product under test.  The problem that testing attempts to address is fundamentally an economic one: it is more expensive to fix defects after deploying the product than it is to fix them before deployment.  Test planning should start as early as the requirements gathering stage, since you can tell a lot about a product by thinking about how you will test it [Beizer, 1983].  Teams should refine the test plan as requirements change and execute it continuously as development proceeds.  As Ron Jeffries put it in describing agile testing:

"Every minute between when the programmer thinks the story is done, and when she has run the acceptance test and PROVEN that the story is done, is a minute the project is running out of control." [Crispin, 2002, page 50]

[Kaner et al, 2002] describe testers as "providing the headlights" for the project team, giving the team information about where they are and what pitfalls they might face in the future. This is a useful characterization, since projects that suffer from a lack of effective testing often seem "lost."


How can a fault escape testing?

"Program testing can be used to show the presence of bugs, but never their absence!" [Humphrey, 1989, page 191]

The product under development will still contain faults after software testing.  A number of common reasons that faults escape testing are listed below.

  • System objectives are not specific or measurable, so the team is unable to build an effective test plan.  For example, suppose an objective of a system is to accept a numeric value from the user, but the requirement does not specify the maximum size of that value.  If the system can handle only three digits and a business client enters 12345.6789, then the system might crash, or at least produce an unexpected result (also a failure).
  • Actual use of the program has changed from the original software requirements specification, but the developers are not informed about the changes.  For example, a program was initially required to prompt for a string, and the requirement was later changed so that it prompts for a digit.  This change must be communicated to the programmer.  Otherwise, the system will react incorrectly.
  • The operating environment has been upgraded in a way that was never considered during testing.  For example, a technician installing a new operating system might inadvertently include an upgraded version of an important component such as an Oracle database driver.  The new driver may not be backward-compatible, so it breaks the software.
  • The testers are inexperienced.  A tester's experience and intuition are important enough to be considered one of the test techniques in the SWEBOK [Bertolino, 2001].
  • Due to time constraints, testers can only execute a finite number of test cases.  Therefore, there is a chance that they did not test all possible branches, and the clients then discover a problem in the untested cases.  Usually testing stops when the time assigned to the testers is used up.  This leads to additional time and effort spent fixing the problems after deployment.
  • Initial assumptions or design constraints are misunderstood.  For example, a system requirement may originally state that only one client will bulk load a file at a time.  However, the users misinterpret the requirement and assume that two different clients can upload files simultaneously.  This may lead to database corruption and a system failure.


Stopping Software Testing

There are too many possible execution paths and input variations to permit testers to perform exhaustive tests [Bertolino, 2001].  As a result, testers have to determine the most appropriate time to stop testing.  Figure 1 illustrates the four key considerations that testers balance.

Figure 1:  Software Testing Trade Off

Before a project team decides to stop testing, they should ask themselves four questions:  "What is the probability of finding more problems in the system?", "What is the marginal cost of doing more testing to detect these problems?", "What is the probability of users encountering these problems?", and "What is the resulting impact of these problems on the users?"  Reliability growth modeling is one way to predict when you can stop testing [Eberlein, 2002].  This involves determining the most appropriate growth model, then using it to predict when the product might achieve a specified reliability objective.
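To make the prediction step concrete, the sketch below assumes one particular growth model, the Goel-Okumoto model, in which the failure intensity after t units of testing is a * b * exp(-b * t). The source does not name a model, so this choice is an assumption made for illustration, and the parameter values are hypothetical; in practice a and b would be estimated from the team's own failure data.

```python
import math


def failure_intensity(a, b, t):
    """Expected failures per unit time at test time t under the assumed model."""
    return a * b * math.exp(-b * t)


def time_to_reach(a, b, target_intensity):
    """Test time at which the failure intensity falls to the target value."""
    initial = a * b
    if target_intensity >= initial:
        return 0.0  # the objective is already met before testing starts
    return math.log(initial / target_intensity) / b


# Hypothetical parameters: 100 expected total faults, detection rate 0.05 per day.
a, b = 100.0, 0.05
target = 0.5  # hypothetical reliability objective: at most 0.5 failures per day

stop_time = time_to_reach(a, b, target)
print(f"predicted stopping point: {stop_time:.1f} days of testing")
```

A stricter objective (a smaller target) pushes the stopping point further out, which is the trade-off between the cost of more testing and the probability of users encountering the remaining problems.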

The team may also express a reliability objective in less formal terms.  A simple technique that many teams use is to assign a severity level to failures.  For example, an organization may use the following severity levels:

Severity 1:  A primary component failure has had a negative impact on business operations in all departments.

Severity 2:  A secondary component failure has had a negative impact on business operations in internal departments.

Severity 3:  System performance has deteriorated although there is no specific failure.

This may lead to differing reliability objectives for different components.  The primary components that may cause a Severity 1 failure would have to achieve a more stringent reliability objective before going into production. Some organizations overlook software faults if the resulting failure does not impact their business.  For example, an organization may have a score card tracking how well a system performs.  If the system is functioning, and it provides services for the business, they have a "Perfect Day".  If the system suffers a failure and interrupts business operations, then they don't achieve the "Perfect Day".  If the system suffers a failure but none of the system users detect the failure, then the organization still considers it a "Perfect Day" since the software failure didn't impact the business.
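The "Perfect Day" rule described above can be written down as a small predicate. This is a sketch only; the `Failure` record and its field names are hypothetical and are not taken from any real scorecard.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    severity: int               # 1, 2, or 3, as in the levels above
    interrupted_business: bool  # did the failure affect business operations?


def is_perfect_day(failures):
    """A day stays 'perfect' unless some failure interrupted business operations."""
    return not any(f.interrupted_business for f in failures)


no_failures = []
unnoticed = [Failure(severity=3, interrupted_business=False)]
noticed = [Failure(severity=1, interrupted_business=True)]

print(is_perfect_day(no_failures))  # True
print(is_perfect_day(unnoticed))    # True: the failure occurred but goes uncounted
print(is_perfect_day(noticed))      # False
```

The second case is the one to watch: the scorecard reports success even though a fault remains in the product.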

Knowing when to stop testing is highly context-sensitive, meaning that different organizations will have different guidelines, published or unpublished.  It depends on the organizational culture, the nature of the potential problem, and the impact of any potential problems on the business operation.


Software Testing and Software Inspection

Software inspection is a static analysis of the program under test: it involves examining the source code rather than running it.  There are a number of types of inspections, from formal inspections using multi-part forms to informal peer reviews.  Humphrey's Personal Software Process (PSP) also favors individual developers using software inspection [Humphrey, 1995].  Software testing, by contrast, is a dynamic verification of program behavior.  Because one is static and the other dynamic, inspection and testing are complementary.

For example, the Netscape Navigator web browser has problems with "nested tables" in HTML documents and with "switch statements" in JavaScript functions.  Microsoft's Internet Explorer web browser does not have these problems.  The only way to identify these problems is to execute the program under both browsers; no inspection would uncover this behavior (without advance knowledge of the defects in Netscape Navigator).


II. Test Levels

The SWEBOK decomposes test levels into two primary groups, one identified by the target of the test, and the other identified by the objectives of testing [Bertolino, 2001].  The "target of the test" category decomposes into the three familiar levels of unit testing, integration testing, and system testing. Software organizations may choose to refer to these three levels by some other name; however, the levels themselves are required for any non-trivial software project. The second decomposition contains test levels that are more customizable, meaning that any given project may not have each of them. Teams will usually determine which of these levels they require based on the product and project characteristics: acceptance/qualification testing, certification testing, installation testing, Alpha/Beta testing, usability testing, etc. One valuable contribution of the SWEBOK is articulating the distinction between "test levels" and "test techniques," since the ambiguity in other test publications may cause confusion.


Unit Testing

Unit testing is used to verify a single program or a small section of a program.  It is typically done in an isolated test environment.  Unit testing can be considered “white box” testing since the tester knows the details of the implementation and uses that knowledge to come up with test cases.  Unit testing checks for the successful execution of all internal execution paths. For example, if a program has a specified loop or branch control structure, the tester should execute test cases that cause the program to run all possible paths that result from the control structure.

Another example of unit testing involves the use of JUnit.  JUnit is a regression testing framework for the Java™ programming language. The idea is that Java developers first implement a program to satisfy a system requirement.  Then, they implement a separate JUnit test class that tests the production program.  Running the tests within the framework automates test execution, so developers can incrementally build up test cases and use them to measure progress.  In addition, writing test programs is much more fun than performing manual testing.
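The pattern described above, production code plus a separate test class run by a framework, can be sketched as follows. The example uses Python's standard `unittest` module, which follows the same xUnit design as JUnit, rather than JUnit itself. The `classify_triangle` function is a hypothetical unit chosen because its branches make the execution paths easy to see; each test method drives one path.

```python
import unittest


def classify_triangle(a, b, c):
    """Production code: classify a triangle by its side lengths."""
    if a <= 0 or b <= 0 or c <= 0:
        raise ValueError("side lengths must be positive")
    if a + b <= c or a + c <= b or b + c <= a:
        raise ValueError("side lengths violate the triangle inequality")
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"


class ClassifyTriangleTest(unittest.TestCase):
    """Separate test class; each method exercises one path through the function."""

    def test_equilateral(self):
        self.assertEqual(classify_triangle(2, 2, 2), "equilateral")

    def test_isosceles(self):
        self.assertEqual(classify_triangle(2, 2, 3), "isosceles")

    def test_scalene(self):
        self.assertEqual(classify_triangle(3, 4, 5), "scalene")

    def test_non_positive_side_is_rejected(self):
        with self.assertRaises(ValueError):
            classify_triangle(0, 1, 1)

    def test_triangle_inequality_is_enforced(self):
        with self.assertRaises(ValueError):
            classify_triangle(1, 2, 3)


# Run the suite programmatically so the framework, not a person, checks the results.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClassifyTriangleTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the whole suite can be rerun after every change, the number of passing tests gives a running measure of progress.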

However, the cost of automation should be considered before committing to test automation completely.  The considerations include the following:

  • Developers require time to create, test, and maintain the automated test scripts
  • Requirement changes can lead to the total rewrite of the automated test scripts
  • Developers may need to download and maintain the JUnit tools
  • Automated scripts may not catch bugs that can be caught by manual testing

There is no single right answer to the question of whether to use manual or automated testing.  Instead, a balance between manual testing and automated testing is probably optimal, although this is being challenged on several fronts in the trade magazines. Ron Jeffries, for example, states that any test that is not automated is useless.


Integration Testing

Integration testing is used to verify the interaction between system components.  In a large development project, it is common to break the system into many components.  If this is the design strategy, then developers ensure that the individual program units pass their tests and then combine them into larger components. Then they or the testing team may conduct more tests to ensure that once the individual units are put together, they still operate as expected. In addition, the testing team can take several components that work together and test their interaction against expected behavior. For example, if the team uses a standard layered architecture, then the database access component might be tested first independently, then together with the business logic layer, then lastly together with both the presentation layer and the business logic layer.

Two major types of integration testing are top-down and bottom-up.  Top-down testing is a prototyping approach: it starts with a skeleton of the system built from the top, and each new component is added to the skeleton as development proceeds.  It can be compared to a tree structure in which testing begins at the root and moves incrementally out to the branches.  Bottom-up testing, on the other hand, evaluates components individually using drivers specific to each component, and the testing is repeated as more components are integrated.  It can be compared to a tree structure in which testing begins at the branches and moves incrementally up to the root.  Both are continuous, evolving activities.
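The difference between the two directions can be sketched with a two-layer example: a business logic layer on top of a data access layer. All class names here are hypothetical. In the top-down case the lower layer is replaced by a stand-in (commonly called a stub) that returns a canned answer; in the bottom-up case a small driver exercises the lower layer first, and the real layers are then tested together.

```python
class InMemoryAccountStore:
    """Lower layer: stands in for the database access component."""

    def __init__(self):
        self._balances = {}

    def save(self, account_id, balance):
        self._balances[account_id] = balance

    def load(self, account_id):
        return self._balances[account_id]


class AccountStoreStub:
    """Stub for top-down testing: returns a canned balance and stores nothing."""

    def save(self, account_id, balance):
        pass

    def load(self, account_id):
        return 100


class AccountService:
    """Business logic layer; works with whichever store it is given."""

    def __init__(self, store):
        self._store = store

    def withdraw(self, account_id, amount):
        balance = self._store.load(account_id)
        if amount > balance:
            raise ValueError("insufficient funds")
        self._store.save(account_id, balance - amount)
        return balance - amount


# Top-down: test the upper layer first, with a stub in place of the layer below.
top_down_result = AccountService(AccountStoreStub()).withdraw("a1", 30)

# Bottom-up: a small driver exercises the lower layer on its own...
store = InMemoryAccountStore()
store.save("a1", 100)
driver_result = store.load("a1")

# ...and then the real components are integrated and tested together.
integrated_result = AccountService(store).withdraw("a1", 30)
stored_after_withdrawal = store.load("a1")

print(top_down_result, driver_result, integrated_result, stored_after_withdrawal)
```

Only the integrated run can reveal whether the withdrawal is actually persisted, which is the kind of interaction fault that integration testing is meant to find.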

Another approach is big bang testing.  The components are individually tested, then assembled into one system and tested all at once.  This is generally the least effective approach: when everything is integrated at the same time, a failure is harder to trace to a specific component, which is why an incremental integration strategy is preferred.


System Testing

System testing is used to verify and validate the behaviors of the whole system against the original system objectives. Verification is an attempt to uncover problems by executing a program in a simulated setting.  Validation is an attempt to uncover problems by executing a program in a real setting.  This is also the time to test the system with any external interfaces, such as hardware devices and the operating environment.  Furthermore, system testing focuses on non-functional requirements, including:

  • Compatibility:  To determine the compatibility level of system hardware and software
  • Configuration:  To determine the behavior of the system when incorrect configuration is made
  • Performance:  To determine the behavior of the system at the peak utilization
  • Recovery:  To determine the behavior of the system after the occurrence of an error
  • Reliability:  To measure the reliability while the system is operating with a typical work load
  • Security:  To determine how the system would behave when unauthorized users break in
  • Volume:  To determine the level of continuous heavy load at which the system will fail


Relationship Between Test Levels

The relationship between the key test levels is illustrated in Figure 2.  Initially, developers perform separate unit testing on each individual program.  The figure shows nine separate units, each of which is tested by its developer.  The developers examine all potential execution paths and the design of each unit at this stage.  Unit testing can potentially be accomplished using automated testing tools such as JUnit.

Then, individual units are integrated to form a functional component.  The figure shows three units combined to form a functional component.  Each component is tested by the developers and designated testers.  This stage is incremental and continuous, so that testers can take advantage of testing in a simpler environment.  The functional components are then combined into a larger component.  Integration testing is then performed on the larger component.

Finally, all integrated components are combined together to form a whole system.  In this stage, testers would test the system against the system objectives listed in the software requirement specifications.  Moreover, testers would test non-functional requirements and any external interfaces in the system.

Figure 2:  Relationship Between Unit Testing, Integration Testing, and System Testing



It is essential to understand the basis of these relationships in order to increase the efficiency of a team.  The following is a scenario in which unit testing is mistakenly left to integration testing, leading to additional testing effort at the end of the project.

Two developers are working in a team environment.  Both should implement individual programs and perform unit testing.  Then they should integrate these small programs into components and test the integrated components as a team.  If one of the developers disregards unit testing and expects the other developer to catch the problems in integration testing, then the team spends twice as much time and effort as it would have if the developer had discovered and fixed the problems in the unit testing phase.


III. Test Techniques

"Test techniques" are a means to achieving a "test objective."  A test objective is always an evaluation of the product under test.  For example, a development organization might have a test objective regarding the reliability of their software.  A measurement of that reliability might come as a result of executing numerous tests.  The way the testing team determines the test cases to run is a test technique.  For example, the development organization may use boundary value analysis as a means for finding test cases.  In turn, the tests that result from boundary value analysis contribute to the team's understanding of their product's reliability.  Running tests is not in itself evidence of quality, however.  Saying "It must be of good quality since we ran all of our tests" is rather like saying, "What do you mean I don't have any money in my account? I still have cheques left!"
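As an illustration of boundary value analysis, the technique named above, the sketch below derives test inputs from the edges of a valid range. It shows one common form of the technique for an integer range. The `is_valid_percentage` function and its range of 0 to 100 are hypothetical.

```python
def boundary_values(low, high):
    """Return test inputs at and just around the edges of the valid range [low, high]."""
    return [low - 1, low, low + 1, high - 1, high, high + 1]


def is_valid_percentage(value):
    """Hypothetical function under test: accepts whole percentages from 0 to 100."""
    return 0 <= value <= 100


for value in boundary_values(0, 100):
    print(value, is_valid_percentage(value))
```

The technique only supplies the inputs. Whether the resulting six tests say anything about reliability still depends on comparing each result with the behavior the specification requires.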

SWEBOK identifies two alternative decompositions of test techniques, one identified by the basis that the tester uses to come up with test cases, and the other identified by how much the tester knows about the implementation as they come up with test cases [Bertolino, 2001].  A "decomposition" is a means for classifying the test techniques.  Many of the test techniques, therefore, show up in both decompositions.  In this section we review the alternative decompositions and then describe how they help teams to improve their software process.


The "Source of Test Cases" Perspective

There are several high-level bases that a development organization could use:
  • Tester's Intuition and Experience
  • Specifications
  • Source Code
  • Faults
  • Usage
  • Nature of the Application

"Exploratory testing" [Bach, 2002] fits into the first category, where the strength of the test cases is a function of the tester's level of experience and their intuition about what constitutes a good test for the product under test. There are other extremes, such as the tests that Cleanroom Software Engineering (CSE) requires, based exclusively on the formal specifications of the requirements. If you review the source code as you come up with test cases, then you are using the source code as the basis. Similarly, you may choose to use known or expected faults, usage, and the nature of the application as the means of coming up with test cases.

The "usage" category is worth special consideration, since this is a fundamental testing technique that appears to be increasingly popular. Many software organizations now use object-oriented development methods and rely on use cases to help them manage their requirements, since they relate directly to the use of the product under development. Use cases are also a tester's best friend, since the tester can use them as the basis for the test plan. With almost no effort, the organization gains a test plan that is integrated with the development plan, merely by basing the high-level test plan on the use cases. Testers can also use the use case description documents, if the team writes them, as the basis for the test cases themselves.


The "Implementation Knowledge" Perspective

The extent to which the tester knows the details of the implementation of the product under test is the second way of classifying the test techniques. Here, the SWEBOK identifies two broad categories:
  • Black box techniques
  • White box techniques

This corresponds to the traditional training that developers might receive over the course of their software engineering education. Hybrids of the two approaches are also prevalent. In that case, the development team uses techniques from both categories in their overall testing process. The advantage of looking at the world from the "black box" testing perspective is that it opens the door to improvements in the implementation. All the team has to do is re-run the black box test cases to confirm that the improved implementation didn't break anything.

Increasingly, developers are intertwining the test and development process. "Test-first" development is creeping into the mainstream. At a high level, what this means is that developers write a test case, watch it fail, write the production code that satisfies the test, and then watch the test pass. The program is finished when the developer runs out of test cases. The developer's primary source of test cases in this type of process alternates between black box and white box techniques. First, the developer identifies a black box test case. Then they write the code to satisfy it. That usually means that they add more test cases to the list of test cases to run, this time based on the code that they wrote. That would be white box testing. Then they go back to a black box test case, and the process repeats from there.


Test Techniques and Software Process Improvement

The CMM key process area that testing falls into is Software Product Engineering [Paulk, 1993]. According to the SEI Maturity Questionnaire, there are six questions that fall under this KPA [Zubrow, 1994], although none of the questions address testing exclusively, since the emphasis in Software Product Engineering is on the set of engineering activities required to deliver the software product. Here are the questions, paraphrased and specialized to the testing conversation:

1. Are the software work products produced according to the project's defined software process? 

To answer "yes" to this question with regard to testing, the software work products that relate to testing must be produced according to a defined testing process. The test artifacts include test plans, test cases, test results, etc. The testing process may be tailored to suit the project requirements. The question also implies that there is a standard for what these artifacts look like. The CMM does not dictate this standard. An XP team, for example, that consistently writes test cases in its implementation language, runs the tests in an automated fashion on a regular basis, and generates standard reports that describe the results of those tests, could answer "yes" to this question. A team that uses a more traditional approach to testing that includes documenting test plans, test cases, etc. in text documents could also answer "yes" to this question, provided the testing process was also documented and the content of the documents standardized.


2. Is consistency maintained across software work products? 

To answer "yes" to this question with regard to testing, the team should be able to trace the purpose of each test back to its origin. In particular, this applies to functional tests based on the product usage. If a use case says something, then the test case should test for that something and there should be a record of where that test comes from. An XP team that uses in-line comments to record the user story that spawned the test could answer "yes" to this question. Without that convention, however, the XP team would have to answer "no." A team that uses the Rational Unified Process™ ties test cases to use cases (either with or without the assistance of a tool). If the team kept a record of this association, then they could answer "yes."
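One lightweight way to keep such a record is to attach the originating story or use case to each test and then check that nothing is left uncovered. The sketch below is an assumption about how a team might do this, not a convention prescribed by XP, RUP, or the CMM; the `traces_to` decorator and the "UC-..." identifiers are hypothetical.

```python
def traces_to(story_id):
    """Decorator that records which user story or use case a test comes from."""
    def mark(test_function):
        test_function.story_id = story_id
        return test_function
    return mark


@traces_to("UC-01")
def test_login_shows_confirmation_page():
    pass  # test body omitted in this sketch


@traces_to("UC-02")
def test_search_shows_results():
    pass  # test body omitted in this sketch


def untested_stories(story_ids, tests):
    """Return the stories that no test traces back to."""
    covered = {t.story_id for t in tests}
    return sorted(set(story_ids) - covered)


ALL_TESTS = [test_login_shows_confirmation_page, test_search_shows_results]
missing = untested_stories(["UC-01", "UC-02", "UC-03"], ALL_TESTS)
print("stories without tests:", missing)
```

A check like this gives the team a concrete record to point to when asked whether consistency is maintained between requirements and tests.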


3. Does the project follow a written organizational policy for performing the software engineering activities? 

A "yes" to this question as it pertains to testing would require that there is an organizational policy regarding the methods, tools, and techniques for testing. Note that this doesn't mean that the process can't be tailored for the specified project. It just means that the root of the tailored process is the common organizational policy. Part of the policy may require, for example, that the project customize its testing method to match the type of product. For example, there may be specialized testing required for customer-facing web applications versus internally-focused desktop applications.


4. Are adequate resources provided for performing the software engineering tasks? 

To answer "yes" to this question, the team would have to agree that they have enough funding for the test activities (i.e., the testing effort didn't lose budget in order to accommodate a more expensive development phase), that there are enough individuals with adequate testing skills on the team, and that those individuals are using tools that fit the intent. Note that the tools criterion does NOT require that a specific tool or class of tools be used; it only means that the testing tools the team does use are appropriate.


5. Are measurements used to determine the functionality and quality of the software products? 

Note that the question doesn't demand a specific measurement, only the intent of the measurement. Recording defects and classifying them, for example, is a type of measurement. More advanced teams will use those measurements in order to improve the process but that's a different key practice area and not unique to testing measurements. If a team has a defect recording standard, a procedure for recording defects, and everyone follows the procedure consistently, then the team could answer yes to this question. We discuss other testing measures in the next section of this document.


6. Are the activities and work products for engineering software subjected to SQA reviews and audits? 

With respect to testing, an SQA team would review the tests, the test results, and the testing process to confirm that they match the requirements. The requirements may come from either the organizational policy or the project policy; however, the project policies cannot override the organizational policy without satisfying the control objectives related to tailoring the software process for a specified project. This is the step that is hard for most organizations to implement, for a variety of reasons.

This represents a relatively tall order for most software projects. The questions illustrate the CMM's organizational focus (as opposed to the project focus of agile approaches such as XP). So while an XP team can answer "yes" to some of these questions, they can't answer "yes" to all of them. Process improvement based solely on satisfying the answers to these questions is also a potential problem, since it lacks the general awareness that teams should have regarding their test activities. Note that an XP team could answer "yes" to some other questions that the CMM assessment doesn't ask. The best example of this is that XP requires that developers automate unit tests, 100% of the time, and that all of the unit tests pass before releasing code. This sounds like a good policy, but in the eyes of the assessment, it doesn't score any more points than does an MS Word document that lists the test cases.


IV. Test Related Measures

This section describes some of the test-related measures that a team could use as the basis for software process improvement. Using some measurements would permit the team to answer "yes" to the measurement question in the CMM assessment under Software Product Engineering. The SWEBOK divides test-related measures into two categories: those measurements that evaluate the product under test, and those that evaluate the tests that the team has performed [Bertolino, 2001]. This distinction is important. The first category forms the basis for test objectives, although they might be worded in some other way. Reliability objectives, for example, are a type of test objective. The second category forms an evaluation of the tests that you have performed, so the numbers you collect in this category only tell you something about your tests and your test process, not whether or not you are meeting your test objectives.


Evaluating the Product Under Test

There are several classes of measurements that evaluate the product under test [Bertolino, 2001]:
  • Program measurements that assist in test planning
  • Types and classification of faults (may be more than one required here)
  • Fault intensity
  • Reliability evaluation
  • Reliability growth models

The various program measurements that assist in test planning are measurements such as source lines of code (LOC), function points, complexity number, etc.  A team may use these numbers as the means of identifying the most appropriate number of tests, or to determine the expected duration of the test activities. The number of unit test cases you can choose from per method, for example, is a function of the method's McCabe complexity number, and of the number of input parameters to the method. Similarly, as mentioned above, the number of test groups or test scenarios is a function of the number of use cases in a system.
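To make the complexity number concrete, here is a small sketch that computes McCabe's cyclomatic complexity, V(G) = E - N + 2P, from a control-flow graph. The graph itself is a hypothetical method (a while loop containing an if/else), and the node names are invented for illustration. V(G) gives the size of a basis set of independent paths, which is one way a team might bound the number of unit test cases per method.

```python
def cyclomatic_complexity(edges, components=1):
    """Return McCabe's V(G) = E - N + 2P for a control-flow graph.

    edges: iterable of (from_node, to_node) pairs.
    components: number of connected components P (1 for a single method).
    """
    edges = list(edges)
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * components


# Hypothetical method: a while loop whose body contains an if/else.
LOOP_WITH_BRANCH = [
    ("entry", "loop_test"),
    ("loop_test", "if_test"),   # loop condition true
    ("loop_test", "exit"),      # loop condition false
    ("if_test", "then_part"),
    ("if_test", "else_part"),
    ("then_part", "join"),
    ("else_part", "join"),
    ("join", "loop_test"),      # back edge to the loop condition
]

print(cyclomatic_complexity(LOOP_WITH_BRANCH))
```

For this graph there are 8 edges, 7 nodes, and 1 component, so V(G) is 3, which agrees with the common shortcut of counting the two decision points and adding one.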

Recording faults and classifying them is a fundamental measurement, and required for any process improvement effort. For this reason, recording defects and defect prevention activities are central to Humphrey's Personal Software Process (PSP) [Humphrey, 1995]. The weakness in the PSP measurements is that they don't require that practitioners trace the defect back to its root cause module, hampering process improvement slightly. For example, if you have a tendency to make more errors in "logic" modules, then process improvements aimed at improving your ability to develop logic modules become possible to measure over time.

The last three measures are common measures in reliability engineering, and represent the fuzzy boundary between reliability engineering and software testing. The two disciplines are highly related.  Reliability is one of the software quality attributes. Reliability growth models permit engineers to predict the reliability at a certain point in time, enabling the team to know either a) how many faults they have to suffer through before they can release, or b) the length of time that has to go by, given a known rate of debugging, before they can release [Eberlein, 2002]. They simply represent mathematical models of the improvement (growth) of reliability over time.


Evaluating the Tests Performed

The SWEBOK lists the following measures of the tests performed [Bertolino, 2001]:
  • Coverage/thoroughness measures
  • Fault seeding
  • Mutation score

A "mutation" is a variation of the product under test that contains a bug. A good test set will detect (or "kill") the mutant [Zhu, 1997]. The higher the number of mutants that a software test set can detect as compared to the total number of mutants, the better the test set is. Fault seeding is simply pulling a fast one on the testing team by introducing a known defect and then seeing if they can detect it. If a number of faults are seeded, then the number of injected faults that the team detects as compared to the number of non-injected faults they detect can also approximately predict the number of remaining non-injected defects. Both mutation scoring and fault seeding are expensive undertakings.
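Both measures reduce to simple ratios, sketched below under stated assumptions. The mutation score here follows the text (mutants killed over total mutants) and ignores refinements such as excluding equivalent mutants. The seeding estimate assumes that the team finds real faults at the same rate as seeded ones, which is an approximation at best. The function names and the sample numbers are illustrative only.

```python
def mutation_score(killed, total_mutants):
    """Fraction of mutants that the test set detects ("kills")."""
    if total_mutants <= 0:
        raise ValueError("total_mutants must be positive")
    return killed / total_mutants


def estimate_remaining_faults(seeded_total, seeded_found, real_found):
    """Estimate the undetected real (non-injected) faults from seeding results.

    Assumes real faults are detected at the same rate as seeded faults, so
    estimated real total = real_found * seeded_total / seeded_found.
    """
    if seeded_found <= 0:
        raise ValueError("at least one seeded fault must be detected")
    estimated_real_total = real_found * seeded_total / seeded_found
    return estimated_real_total - real_found


# Illustrative numbers only: 45 of 60 mutants killed; 16 of 20 seeded faults
# found while 40 real faults were found in the same test effort.
print(mutation_score(45, 60))
print(estimate_remaining_faults(20, 16, 40))
```

With these sample numbers the test set kills 75% of the mutants, and the seeding estimate suggests about 10 real faults remain (an estimated 50 in total, of which 40 were found).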

A more common approach, and comparatively non-invasive, is to measure the extent to which the test set exercises the product under test. This is called coverage since it indicates how much of the product the tests will "cover." This can be as simple as counting the number of test cases per use case, the number of test cases per program unit, or the number of test cases per method. There are more technical measurements of coverage as well that would require more precise program measurements [Jorgensen, 1995].


Choosing Measurements for Software Process Improvement in Testing

One way to select measurements to use as the basis for improving testing is the "Goal-Question-Metric" paradigm [SEL, 2002]. This method is used to define measurement on the software project or process so that the resulting measurements retain relevance and meaning for all levels of the organization. The method centers on the "goals" of the organization, the "questions" that the organization has that would confirm they are meeting or will meet their goal, and the "metrics" that will provide the answers to the questions.


An organization might have a stated goal of improving the timeliness of their ability to conduct tests, from the viewpoint of the project manager. You can deconstruct this goal into several pieces: its purpose (to improve), its quality concern (timeliness), its process (conducting tests) and its assessor (the project manager).


In order to assess the organization's ability to know whether or not they have met this goal of "improving the timeliness of their tests," the organization may discover a series of questions something like the following examples:

Q1: What is the current rate of conducting tests?

Q2: As an organization, are we improving the current rate of conducting tests?


The metrics that answer the questions are as follows:

Q1: Average testing time, normalized by product size, standard deviation, and percentage of cases that fall outside the limit

Q2: Current average testing time, expressed as a percentage of the baseline average testing time.

The measurements permit the team to answer the questions and, as you move up the hierarchy, ultimately help the organization monitor its progress in meeting the stated goal(s).
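The two metrics above can be computed with a few lines of code. This is a sketch under assumptions that the text does not fix: testing time is measured in hours, product size in KLOC, and "the limit" is a maximum acceptable number of testing hours per KLOC chosen by the organization. The sample data is invented.

```python
from statistics import mean, stdev


def q1_metrics(test_runs, limit):
    """Metrics for Q1: the current rate of conducting tests.

    test_runs: list of (testing_hours, size_kloc) pairs, one per project.
    limit: assumed maximum acceptable testing hours per KLOC.
    """
    rates = [hours / kloc for hours, kloc in test_runs]
    outside = sum(1 for rate in rates if rate > limit)
    return {
        "average": mean(rates),
        "std_dev": stdev(rates),  # sample standard deviation
        "percent_outside_limit": 100.0 * outside / len(rates),
    }


def q2_metric(current_average, baseline_average):
    """Metric for Q2: current average as a percentage of the baseline."""
    return 100.0 * current_average / baseline_average


# Invented data: four projects of 10 KLOC each.
RUNS = [(40, 10), (30, 10), (50, 10), (40, 10)]
current = q1_metrics(RUNS, limit=4.5)
print(current)
print(q2_metric(current["average"], baseline_average=5.0))
```

Here the average is 4.0 hours per KLOC, one project in four exceeds the assumed limit, and the current average is 80% of an assumed baseline of 5.0 hours per KLOC. A value below 100% for Q2 indicates that testing is getting faster relative to the baseline.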

Improving Testing

An organization could base a program for improving software testing on any number of things; the test levels, however, seem to be a good starting point. Goals to improve the timeliness or the coverage of unit testing might not be the same as the goals the organization establishes for integration or system tests. This is part of the complexity of improving testing capability.

Unit Tests - Goals to improve the coverage and timeliness make sense

Integration Tests - Goals to improve coverage and eliminate obstacles to testing make sense

System Tests - Goals related to agreements that the development organization makes with customers, goals based on risk assessments, and goals based on obstacle reports make sense.

The GQM approach gives us the means to improve the testing practices by helping us to establish the right measurements to start collecting. There is also an organizational management aspect of improving testing and of running a successful testing project. This is the focus of the next section, where we discuss the various aspects of managing the test process.


V. Managing the Test Process

Differences Between Software Developers and Testers

Before discussing the virtues and necessity of having a separate test group, we first examine the differences between software developers and testers.

Focus of Developers

Since the developers have designed and implemented the software product, they have a thorough understanding of the source code. They possess knowledge of the internals of the software, such as the algorithms used, the decision paths, and so on. They are design experts who excel in creating software. Given the complexity and size of modern software products, they are usually specialists in particular areas. Some may be experts in communication protocols, some may be well educated in mathematical algorithms, and some may be experienced in object-oriented design. They are focused on the details of how to construct the software product, i.e. the design. They may respond to a problem reported by the users by saying, “Why do they do that in the first place?” because they know how the software should work according to the design and the implementation. On the other hand, when developers acknowledge a problem, by nature they will be interested in solving it, no matter how unimportant the problem may be to the users/customers.


Focus of Testers

As compared to the developers, testers usually need to have broader overall domain knowledge in order for them to verify the software product. In fact, they usually think like the users of the software products. This helps them to “use the product the way a user would, instead of the way the developer might like them to” [Pettichord, 2000, p. 42]. They are generally ignorant of the design and the implementation of the product. Instead of building software, testers are interested in detecting faults. After all, they are testers and they are encouraged to dig out potential flaws in areas like usability and reliability of the software product. Since they have the overall domain knowledge, they are also more effective at defining the severity of the problems.


Separate Test Group

What is a Separate Test Group?

After looking at the differences between software developers and testers, a separate test group seems to be a good complement to the software development group. One reason to set up a separate test group is that we “want a test organization to provide management with independent quality information and to help the organization achieve measurable improvements in software quality” [Hetzel, 1988, p. 191]. The testers in this separate test group should keep detailed reports of the problems found, so that the developers can review the reports to resolve the problems, and the reports can be used as historical data for future quality improvements. The other reason why organizations might want a separate test group is that developers may not be able to test their own software products effectively. They are likely to be biased towards their own products; after all, they know the implementation details, and who is not proud of what they have accomplished? A separate test group provides an “unbiased, independent perspective” on the software product [Bertolino, 2001, p. 5-12]. As mentioned before, developers may not really know how the users use the software. Moreover, they have vertical knowledge of the areas they are familiar with from their development experience and may not understand the application in the larger context. Members of the separate test group must have a “reasonably direct link to the end users” [Humphrey, 1989, p. 201]; in this way they can validate the software more effectively than the developers themselves, and they have a better idea of the severity of the problems than the developers do. With early involvement in the software development cycle, the separate test group can design verification test cases to compare the functions and behaviors of the software against the requirements. Since this separate group is essential to product quality control, it is not a place to assign inexperienced new hires or low performers from the software development department.


Separate Test Group Responsibilities

The separate test group should be responsible for testing the software product after the developers have done their unit tests and code module level integration tests. This doesn’t mean that the separate test group participates only after the developers have finished their unit testing. To gain broad domain knowledge about the software products, members of the separate test group should participate early in the development cycle, starting from the definition of the requirements. They should form an independent view of how the software system as a whole should work so that they can design their verification test cases. If possible, delegates from the separate test group should work with the customers/users of the software products through techniques like usability studies and validation tests to understand how they expect the software to behave. They should also participate in reviews of high-level design documents and unit/design integration test plans so that they can get a view of all the features in the software products and of what the developers have planned to test, and so avoid redundancy. With their domain knowledge, the separate test group should be able to effectively design and execute system level integration test cases, performance test cases, stress test cases, usability test cases, validation test cases, and verification test cases. Since the responsibility of the separate test group is to uncover as many problems in the software product as possible before the organization delivers it to the customers/users, they should be persistent in urging that the problems be resolved as soon as possible. They should not yield to pressure from the management of the development department. This is more achievable when the separate test group is not under the direct control of the development department. When a problem is found, the details of the problem and its severity should be conveyed to the appropriate development groups. The testers or management of the separate test group should also negotiate a resolution deadline with the developers, and accommodate the re-testing of the problematic feature in their future regression-testing schedule. The separate test group should scrutinize not only the software product but also auxiliary products such as the user manual and requirements documents.


Developers' Responsibilities

Having a separate test group in the organization does not mean that the developers no longer need to test their own code. Although they might be biased, their intimate knowledge of their own products makes the developers the best candidates to design and execute white box unit test cases, or even code module level integration test cases. When members of the separate test group uncover problems, the developers should work with them to understand and ultimately resolve the problems. They must not see these problem reports as nuisances, nor should they take these problems personally. They should understand that, given the complexity of modern software products, mistakes are inevitable. Moreover, on large projects individual developers are assigned to such specific areas that they might miss the overall picture of the system as a whole. The last thing they should do is turn every encounter into a confrontation. The management of the development department should also work with the separate test group to determine the best time to resolve the problems, so that the fixes have minimal impact on the quality of the products and the software development schedule.



Advantages of a Separate Test Group

One of the advantages of having a separate test group is that they did not develop the software product in the first place. When a tester in the separate test group finds a problem, this individual will not take it personally. The separate test group is ego-less with regard to the development of the software product. This also gives the separate test group an unbiased view of the product, since they do not know the implementation. They can design their test cases with their broad domain knowledge about the product and its users’ requirements. The separate test group will use the software with a wide range of inputs and exercise execution paths that simulate the real-world users’ environment. This way, we can ensure that the verification and validation tests are not influenced by how the software is implemented. The product will be tested according to how the users would use it, not how the developers would like it to be used. With an independent, separate test group, the testers do not need to bow to pressure from the management of the development department. They do not have to fear the consequences of delaying the delivery of the project by uncovering more problems. They are there to find problems, not to hide them, so there is no conflict of interest. Also, with the separate test group the developers need not worry about system level testing like verification tests and validation tests. Generally, on a large project each developer is assigned to such a specific area that they lack an overall view of the product. They may not be well trained or well informed enough to test the software effectively. The separate test group can potentially relieve the burden on the developers by taking care of this kind of system level testing activity. Simply put, “having different people take different approaches to analyzing the software increases the likelihood that serious problems will be detected and removed” [Pettichord, 2000, page 44].



Disadvantages of a Separate Test Group

Running a separate test group does cost more. Introducing an independent test group “may easily add 20 percent or more to the total cost of a project” [Hetzel, 1988, page 190]. For many organizations this may not be feasible, given that most companies operate in competitive markets. Although the lack of knowledge of the implementation gives the separate test group an unbiased view of the software, in some situations it can also hinder the testers' ability to test the software effectively. Also, since the separate test group is independent from the software development department, tension can arise and interfere with cooperation between the two groups, which are supposed to work together to ensure that the product they deliver to the customers is of high quality.


Teamwork is Important

The two groups, the developers and the testers, must appreciate the differences between them and recognize that they “complement one another, each providing perspectives and skills that the other may lack” [Pettichord, 2000, page 42]. They are in it together to ensure that the software product delivered to the customers/users is of high quality. They should work closely together throughout the software project. Developers should realize that the members of the separate test group are not their enemies. The testers do not break the code; rather, they discover problems in the software on behalf of the developers. They are doing the whole organization a service. On the other hand, testers in the separate test group should also work with the developers to determine whether the anomalies they found are real problems and, if they are, whether they can be resolved without a huge impact on the overall quality, schedule, or architecture of the software products. After all, the developers are the experts on the design and implementation of the software products. Testers should be persistent but not unreasonable. Developers should also inform the separate test group of potential “danger zones” in the software so that the testers can put more effort into the particular functions and features that are deemed high risk. Testers should also document the problems they discover in full detail and gather as much data as possible for the developers who are working on the cases. Avoid problem reports that just say, “It does not work”.


Is this Feasible for Everyone?

As stated above, many organizations/projects might not have the resources to support a separate (independent) test group. However, there are benefits to having a separate test team to ensure that the quality of the software product is acceptable to the customers before delivery, and clearly the software developers alone will not be able to do the job. If it is not feasible to have a truly independent separate test group, the organization should explore alternatives. For instance, "separate" does not mean "dedicated"; internal team members who have not created any of the source code can write and execute test cases. Although this sub-group does not have independence from the development department and may face pressure from its own management, at least it can have an unbiased view of the software product. When considering alternatives, any organization should look at its own “organization culture”, “the nature of the business application” and the cost [Hetzel, 1988, page 190].



Conclusion

The criticality of testing cannot be overstated. In this report, we have summarized the basic concepts and definitions of testing, test levels, test techniques, test-related measures, and test management concerns. We suggested that organizations wanting to improve their testing capability could use the CMM as the framework for that improvement, or they could use the Goal-Question-Metric paradigm to establish a set of measures that could guide their improvement efforts. The SWEBOK, should it gain continued acceptance and evolve along with the industry, is an important part of standardizing and reusing the testing knowledge that exists in the software engineering community. The fact that successful testing is highly related to product acceptance and adoption (the "endgame") contributes to its criticality. Every software engineer should excel at identifying test cases based on requirements, and learn to play this "endgame." When the chips are down, and the project seems stuck at "95% complete," these skills will shine and the project will finish. Without those skills, the project may continue in a diseased state for an undetermined length of time, leaving both developers and customers exasperated. This is the motivation for testing effectively.


Group Presentation

Presentation in Microsoft Power Point Format
Click here to review group presentation in MS Power Point format.

Presentation in HTML Format
Click here to review group presentation in HTML format.



References

[Beck] Kent Beck, Erich Gamma, et al. “TomcatBook: Chapter on Testing – Junit general Java testing”.
[Beizer, 1990] Beizer, Boris. (1990) "Software Testing Techniques, 2nd Edition." Boston: International Thomson Computer Press.
[Bertolino, 2001] Antonia Bertolino, Chapter 5 “Software Testing”, Guide to the Software Engineering Body of Knowledge, Software Engineering Coordinating Committee, IEEE. IEEE Trial (Version 0.95), May 2001.
[Crispin, 2002] Lisa Crispin, "Testing for Extreme Programming" Draft in Progress, January, 2002.
[Eberlein, 2002] Eberlein, Armin. SENG609.11 Lecture Notes Winter 2002, Calgary: University of Calgary.
[Hetzel, 1988] Hetzel, W. (1988). The Complete Guide to Software Testing, 2nd Edition, John Wiley and Sons, Inc.
[Hower, 2001] Rick Hower. “Software QA / Test Resource Center”, 2001.
[Humphrey, 1989] Watts S. Humphrey, “Managing the Software Process”, Software Engineering Institute, Toronto: Addison-Wesley, 1989.
[Humphrey, 1995] Humphrey, Watts S. A Discipline for Software Engineering. Toronto: Addison-Wesley, 1995.
[Jorgensen, 1995] Paul Jorgensen. "Software Testing: a Craftsman's Approach". Boca Raton: CRC Press, 1995.
[Junit, 2001] Object Mentor Incorporation, “Junit Home Page”, 2001.
[Kaner et al, 2002] Kaner, Cem, James Bach, and Bret Pettichord. "Lessons Learned in Software Testing: A Context-Driven Approach" New York: John Wiley & Sons, Inc., 2002.
[Paulk, 1993] Paulk, Mark C., Charles V. Weber, Suzanne M. Garcia, Mary Beth Chrissis, Marilyn Bush. Key Practices of the Capability Maturity Model, Version 1.1. Technical Report CMU/SEI-93-TR-025.
[Pettichord, 2000] Pettichord, B. (2000). Testers and Developers Think Differently, Software Testing and Quality Engineering, STQE Magazine.
[SEL, 2002] NASA Software Engineering Laboratory Experience Factory.
[Whittaker, 2000] James A. Whittaker, “What is Software Testing? And Why Is It So Hard?” IEEE Software, January / February 2000.
[Zhu, 1997] Zhu, H., P.A.V. Hall, and J.H.R. May, Software Unit Test Coverage and Adequacy. ACM Computing Surveys, 29, 4 (December, 1997) 366-427.
[Zubrow, 1994] Maturity Questionnaire. Special Report CMU/SEI-94-SR-7.

The University of Calgary

Software Engineering Research Network
Last Modified 17-Feb-2002