Unit and Regression Testing
The single most important rule of testing is to do it.
--- Brian Kernighan and Rob Pike, The Practice of Programming
Most programmers don't do enough testing today because:
- They aren't required to.
- It's tedious.
- Existing tools are obscure, hard to use, expensive, or don't
actually provide much help (or all of the above).
- They don't know where to start, when to stop, or
how to tell whether the tests they've written are meaningful.
The Software Carpentry
project's testing category will try to address the second issue
by fixing the third and fourth. Its goal is to do this by building a tool that will:
- encourage developers and projects who don't currently do any testing
to test their software, and
- encourage those who are testing to do more.
The tool's long-term success depends on it being adopted by many
software projects in the Open Source and scientific communities.
The tool must therefore:
- Have a gentle learning curve, especially for developers
without software engineering training. This implies that:
- the tool must be very simple to install and configure;
- once the tool has been used to create an empty skeletal test
suite, it must be very easy to add the first real test; and
- if N tests have been written, it must be very easy to add the N+1st.
- Provide a very simple workflow, so that tests can be added,
modified, inspected, and summarized by developers, managers, and other
stakeholders easily and systematically.
- Provide feedback regarding the quality and thoroughness of
testing (e.g. by summarizing code coverage of test suites) so that
developers will be able to tell how much they have done, how much
remains to be done, and how well third-party modules have been tested.
This may include tracking and reporting degradation of the test suite
("rust") due to changes in the code being tested, so that programmers
have an incentive to keep tests up-to-date.
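As an illustration of the kind of coverage feedback described above, the sketch below uses Python's trace hook to record which lines of a function actually run while a test executes. Everything here is invented for the example (`is_positive` stands in for real code under test); a real tool would map line numbers back to source and report percentages.

```python
import sys

def is_positive(x):
    # Hypothetical code under test.
    if x > 0:
        return True
    return False

def make_tracer(covered):
    # Trace hook: record (function name, line number) for each line executed.
    def tracer(frame, event, arg):
        if event == "line":
            covered.add((frame.f_code.co_name, frame.f_lineno))
        return tracer
    return tracer

covered = set()
sys.settrace(make_tracer(covered))
is_positive(5)          # exercises only the x > 0 branch
sys.settrace(None)

lines_hit = sorted(line for name, line in covered if name == "is_positive")
print("lines of is_positive executed:", lines_hit)
```

Because only one branch is exercised, the `return False` line never appears in the record, which is exactly the gap a coverage report would flag.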
How to Participate
The first Software
Carpentry design competition produced good designs, and good
discussion, but did not lead to the pooling of ideas that is one of open
software's great strengths. The second running of the SC Test
competition will therefore experiment with another model for open design.
The aim of the competition is not to produce a design per se, but to:
- gather information on requirements, priorities, existing systems,
design options, etc., and
- select the team that will actually design the tool.
Anyone who wishes to participate may join the discussion list by
sending mail to the subscription address. Messages can then be posted
to the list between now and Friday, October 27, 2000.
(Please note that the mailing list devoted to testing alone has been retired,
in order to facilitate cross-talk between various sub-projects.)
The competition judges will
participate in the discussion, both directly (by posting their own ideas)
and indirectly (by commenting on the contributions of others). In
particular, the judges will identify and summarize those messages they
have found most helpful on a weekly basis. At the same time, the
competition organizers will actively solicit input from several existing
Open Source projects, such as the GNU Compiler Suite and Python.
At the end of October, participants will be asked to put themselves
forward for the final design team, either singly or in groups, and to
indicate how much time they are willing to commit toward producing a final
design. The judges will then select up to four
people to produce a final design. Each of these people will receive $2500
(equivalent to the first-round prizes in the first design competition);
the judges will also award a total of $5000 to other participants for their contributions to the discussion.
The design team as a whole will receive an additional $10,000 upon
submission of a completed design in December. This design will be modeled
on the final entries in the first design competition, and on the
Subversion design document.
We are experimenting with this new model in order to see whether
it will allow non-developers (such as scientists and engineers) to
help the Open Source community set priorities and directions. We also
hope that it will leave a clearer record of what alternatives were
considered, and how they were evaluated, than the two-round model used
in the first competition.
Participants are urged to familiarize themselves with the xUnit collection of
testing software. These programs are widely used, satisfy many important
requirements for testing tools, and are a good starting point for this design effort.
The tool must be designed to support regression testing (also known as
"batch testing") of software systems ranging in size from a handful of
functions to the GNU Compiler Suite. To achieve this, the tool must
support the following (in order of importance):
- Static unit testing of functions, classes, and modules in C, Python,
and Java. (These three languages are chosen because they span the range
from low-level to high-level, and loose to strict.)
- Customizable reporting of test results, ranging from a single-line
command-line summary of the number of tests that passed and failed,
through to automatic web publication of charts of test statistics over time.
- Storing and inspecting test suites, along with current and past test
results, in a structured way. This should include support for sharing
tests (or groups of tests) between test suites.
- Scriptable control of test suite execution, so that portions can be
executed selectively, executed repeatedly under different load
conditions, only executed at certain times of day, and so on.
- Parallel execution of test suites.
- Integration with other tools, such as source control and bug
tracking. This should also include integration with other testing tools,
i.e. it should be possible to incorporate the results of a sub-suite of
tests run (or otherwise managed) by a third-party testing tool.
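Two of these requirements (static unit testing, plus a concise pass/fail summary) are already visible in the xUnit style mentioned earlier. As a minimal sketch using Python's standard unittest (PyUnit) module, with `mean` as a made-up function standing in for real code under test:

```python
import unittest

def mean(values):
    # Hypothetical code under test.
    if not values:
        raise ValueError("mean of empty sequence")
    return sum(values) / len(values)

class MeanTest(unittest.TestCase):
    def test_simple_average(self):
        self.assertEqual(mean([2, 4, 6]), 4)

    def test_empty_input_raises(self):
        self.assertRaises(ValueError, mean, [])

# The runner prints a short summary of how many tests ran and whether
# any failed, the simplest form of the reporting requirement above.
suite = unittest.TestLoader().loadTestsFromTestCase(MeanTest)
result = unittest.TextTestRunner(verbosity=1).run(suite)
```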
Given the time and resources available in this project, not every
feature in the design will necessarily be implemented, and some
requirements may not initially be met. The final design must therefore be
very modular, and explicitly describe which features are considered core
and which can be added later (or layered on top of the base tool).
The design may include other features to meet specific testing
requirements of particular classes of software, but these will only be
implemented in version 1.0 if a strong case can be made that they are more
important than those described in the previous section.
- Scripted testing of interactive applications, including those with
graphical user interfaces.
- Testing parallel and/or distributed programs.
- Testing specific to particular application domains, such as
numerical simulations or CGI scripts.
- Support for features or aspects of particular programming languages.
- Automated generation of tests.
Highlights from the Discussion List
October 8, 2000
Most of the last two weeks was taken up with a long thread about
test output, natural language, XML, etc. It started under the
headings "SC-Test Output" and "RE: XML, NL, TestTalk, and
transformation libraries..." Mike Donat posted a synthesis of sorts
It turned into a discussion of how tests are input:
Brian Marick pointed at work by Hans Buwalda on using spreadsheets for input:
and Patrick Campbell-Preston posted on the subject of how to handle millions of tests:
David Alex Lamb provided a nice summary of the pros and cons:
Greg Wilson posted a longer analysis of the pros and cons in:
- Declarative test specification is good.
- Purely declarative specification is not powerful enough.
- Simple languages grow; growth reduces their simplicity.
- Designing a special-purpose language for testing would be a dangerous road to take.
- Using an existing language is also dangerous.
Dave Thomas's reply pointed out that shutting out options is premature,
since we don't yet have a firm list of requirements:
Greg replied to say that we've got as much input from end users as we're
going to get, and pleaded for commentary on the testing profiles at:
Mike Donat's reply said, "So what you're saying is that you want it
simple, yet complex enough to do everything you could ever want - no
In this message, he pointed out that the mechanisms used to specify what
unit test to run, and what arguments to give it, could also be used to call
back into the testing harness itself, in order to run a program or script
that itself generated more test cases; Greg elaborated on this in:
and Garrett Goebel clarified some of the issues further in:
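The callback idea above (a test specification that invokes a program to produce further test cases) can be sketched with unittest as follows; the nested loop stands in for an external generator program, and all names are invented for illustration:

```python
import unittest

def add(a, b):
    # Hypothetical code under test.
    return a + b

def make_case(a, b, expected):
    # Wrap one generated example in a TestCase instance.
    class Generated(unittest.TestCase):
        def runTest(self):
            self.assertEqual(add(a, b), expected)
    return Generated()

# The loop below stands in for an external program or script that,
# when called back by the harness, emits further test cases.
suite = unittest.TestSuite()
for a in range(3):
    for b in range(3):
        suite.addTest(make_case(a, b, a + b))

result = unittest.TextTestRunner(verbosity=0).run(suite)
```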
In other news, there is a Python wrapper for the "Expect" library:
Greg Wilson posted a description of a problem he'd like to use in the tutorial:
Alex Samuel posted some notes on testing GCC:
Brian Marick put up links to his catalogs of common tests:
and to James Bach's paper on Useful Features of a Test Automation System:
David Bennett talking about test output and TET:
September 21, 2000
Bill Nichols (a potential end user from the US Navy's Bettis Lab) joined
the fray with a discussion of basic requirements in
More content from Brian Marick:
a short case study of jUnit in
and James Bach's paper Useful Features of a Test Automation System in
Michael Donat posted an example of the kind of XML that tests could display in
This is a very promising direction,
and was followed up by Bill Nichols in
See also David Bennett's posting about test results:
Michael also commented on running tests, reporting output, and libraries
to support specific kinds of testing (e.g. resource starvation) in
David Bennett replied in
The project coordinator posted two notes regarding the use of an SQL back-end
for storing test information:
one on installation and accessibility
and one on the capabilities of a pure-Python database called Gadfly
David Alex Lamb pointed out that the test tool should share some modules
(for re-executing code) with the build tool in
Steven Knight (winner of the build category) posted a couple of replies:
The project coordinator followed up in
Garrett Goebel posted a note on package management for testing in
David also posted a link to an object model for unit testing:
Some references to academic papers have been posted:
David Bennett discussed some of the benefits of testing in
The project coordinator posted a note on languages, and the use of Python:
David Bennett's reply is
Patrick Campbell-Preston pointed readers at ideas in his first-round
proposal regarding control of the test execution environment in
He also started a thread on the (ab)use of random tests in
Garrett Goebel posted some thoughts on a database schema for storing test information in
Patrick Campbell-Preston replied in
See also Piotr Sawuk's message:
another from Patrick:
and Garrett's reply:
September 11, 2000
Tom Van Vleck (a competition judge) posted a three-layer "vision" of
what the testing framework could be:
In particular, this differentiated between things that are "a simple matter of programming"
(e.g. designing a database schema for storing test results),
and things that require plug-in originality.
In an earlier posting, Tom pointed at a testing technique called "hellandizing":
The project coordinator brought up the question of how invasive testing could be:
and asked for specific examples of problems that could only be solved in certain ways.
Patrick Campbell-Preston replied in
The project coordinator posted some profiles of typical users and their testing needs:
Garrett Goebel asked, "What is the testing equivalent of 'scribble'?"
(the standard non-trivial introductory tutorial for Windows MFC programming) in:
In followup, some people suggested using the testing of SC Test itself in the SC Test tutorial,
but others felt that this could be confusing.
Ken Martin posted a summary of testing done in the open source
Visualization Toolkit project,
and a link to a paper that describes their process:
The project coordinator replied in:
Ken then replied to one point (about validating output) in:
Stephen Lee described the tools his group developed to support testing
at Los Alamos National Laboratory in
Tripp Lilley brought up the idea of test markup in
The project coordinator pointed to an earlier discussion of traceability
The project coordinator pointed out the VTK Quality Dashboard in
David Alex Lamb laid out some meta-requirements related to
roles and features in
The winners in the first
run of this competition all included excellent ideas; their
submissions are listed below.
Other material that participants should examine includes:
- This paper on the testing process used in the Visualization
Toolkit is a good summary of a state-of-the-art Open Source testing system.
- The comp.software-testing FAQ.
- The Software Testing Hotlist,
edited by Bret Pettichord.
- The EG3's testing page.
- The Computer Information Center's testing page.
- The Craft of Software Testing, by Brian Marick (who has helped
re-formulate this category's requirements). Many of his other writings are collected online.
- Testing Computer Software, by Cem Kaner et al. is a good "from the trenches"
look at testing; pages 27-58 are an excellent overview.
- The Pragmatic Programmer, by Andrew Hunt and Dave Thomas (both
Software Carpentry judges). The book's web site is http://www.pragmaticprogrammer.com/.
- The Extreme Programming
site keeps a list
of implementations of the XUnit unit testing framework in various languages.
- Chang Liu has developed a test suite for Apache, which may be a useful
example for entrants in this category. The project is at
- Several existing Open Source tools may serve as inspiration in this
category. In particular, entrants may also wish to
look at Expect and DejaGnu,
which can be used to run tests on interactive and non-interactive applications.
- Sun's JavaStar
is a GUI testing tool for Java.
- Entrants are also strongly encouraged to check out the
Testing Craft web site.
- A quasi-free testing tool that was popular some years ago is
It was originally designed for Posix compatibility testing,
and some of that shows. But it's gotten a fair amount of use.
- Some data-driven approaches are documented on
Bret Pettichord's page.
The paper by Ed Kit describes Hans Buwalda's TestFrame,
which is a mature approach and toolkit. Hans wrote a fairly good paper
describing the approach, but it's not online.
The paper by Bret (Success with Test Automation) also describes
data-driven tests. These are mainly used for GUI tests,
but nothing inherent limits them to that.
- Fewster and Graham's
has a reasonable description of organizing test suites.
It isn't fancy, but most people don't need anything too fancy.
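The data-driven style described above can be sketched in a few lines: inputs and expected outputs live in a table (here an inline CSV standing in for a spreadsheet export), and one generic test walks the rows. Everything here, including `classify`, is invented for illustration:

```python
import csv
import io
import unittest

def classify(n):
    # Hypothetical code under test: the sign of an integer.
    if n > 0:
        return "positive"
    if n < 0:
        return "negative"
    return "zero"

# Rows as they might arrive from a spreadsheet export.
TABLE = """input,expected
5,positive
-3,negative
0,zero
"""

class TableDrivenTest(unittest.TestCase):
    def test_rows(self):
        # One generic test walks every row of the table.
        for row in csv.DictReader(io.StringIO(TABLE)):
            self.assertEqual(classify(int(row["input"])), row["expected"],
                             msg="row %r failed" % (row,))

suite = unittest.TestLoader().loadTestsFromTestCase(TableDrivenTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Non-programmers can then add cases by editing the table, without touching the harness code, which is the main selling point of the approach.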
Some questions and suggestions from judges, entrants, and bystanders are
listed below. Please note that these are not mandatory requirements,
but are instead intended to spark ideas and discussion.
- How do users put tests for statically-compiled languages (such
as Fortran-77 and C++), dynamically-loaded languages (such as Java),
and fully-dynamic languages (such as Scheme and Python) in a single test suite?
- How do developers keep tests up to date with the code that is
being tested? In particular, how does the testing framework interact
with the developers' version control system?
- How do users specify what counts as a "correct" answer (e.g. within
numerical roundoff tolerances, or within a certain degree of pixel
accuracy for images)?
- How are summary statistics generated and reported? How is
responsibility for fixes and breakages reported?
- How do test suites manage special features of particular
languages, such as inheritance in object-oriented languages, or
templates in languages such as C++?
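On the "correct answer" question, one common approach for numerical output is a tolerance-aware comparison rather than exact equality. The helper below is a sketch; its name and default tolerances are invented, not part of any proposed design:

```python
def approx_equal(actual, expected, rel_tol=1e-9, abs_tol=1e-12):
    # Accept values that agree within a relative OR absolute tolerance.
    diff = abs(actual - expected)
    return diff <= max(rel_tol * max(abs(actual), abs(expected)), abs_tol)

# Repeated addition of 0.1 accumulates roundoff, so exact equality
# can fail even when the answer is "right".
total = sum([0.1] * 10)
print("exact match:", total == 1.0)        # roundoff may make this False
print("tolerant match:", approx_equal(total, 1.0))
```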
Please send other suggestions and ideas to