
SC Test

Unit and Regression Testing

Judges: Frank Alexander
Paul Dubois
Brian Marick
Dave Thomas
Tom Van Vleck

The single most important rule of testing is to do it.
     --- Brian Kernighan and Rob Pike, The Practice of Programming


How to Participate
Optional Considerations
Highlights from the Discussion List
     October 8, 2000
     September 21, 2000
     September 11, 2000


Most programmers don't do enough testing today because:

  1. They aren't required to.
  2. It's tedious.
  3. Existing tools are obscure, hard to use, expensive, don't actually provide much help, or all three.
  4. They don't know where to start, when to stop, or how to tell whether the tests they've written are meaningful.

The Software Carpentry project's testing category will try to address the second issue by fixing the third and fourth. Its goal is to do this by building a tool that will:

  • encourage developers and projects who don't currently do any testing to test their software, and
  • encourage those who are testing to do more.

The tool's long-term success depends on it being adopted by many software projects in the Open Source and scientific communities. The tool must therefore:

  • Have a gentle learning curve, especially for developers without software engineering training. This implies that:
    • the tool must be very simple to install and configure;
    • once the tool has been used to create an empty skeletal test suite, it must be very easy to add the first real test; and
    • if N tests have been written, it must be very easy to add an N+1st.
  • Provide a very simple workflow, so that tests can be added, modified, inspected, and summarized by developers, managers, and other stakeholders easily and systematically.
  • Provide feedback regarding the quality and thoroughness of testing (e.g. by summarizing code coverage of test suites) so that developers will be able to tell how much they have done, how much remains to be done, and how well third party modules have been tested. This may include tracking and reporting degradation of the test suite ("rust") due to changes in the code being tested, so that programmers have an incentive to keep tests up-to-date.
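For a sense of what "very easy to add an N+1st test" can mean in practice, the xUnit family already achieves it: in a PyUnit-style suite, a new test is just a new method. The example below is purely illustrative (the class and method names are invented), not part of any proposed design.

```python
import unittest

# Hypothetical sketch: in an xUnit-style framework, adding another
# test is adding another method whose name starts with "test".
class TestWordCount(unittest.TestCase):
    def test_empty_string(self):
        self.assertEqual(len("".split()), 0)

    def test_single_word(self):
        self.assertEqual(len("hello".split()), 1)

    # The N+1st test is one more method -- no registration step,
    # no configuration changes.
    def test_two_words(self):
        self.assertEqual(len("hello world".split()), 2)

if __name__ == "__main__":
    unittest.main(exit=False, verbosity=2)
```

The framework discovers the new method by name, which is what keeps the marginal cost of each additional test close to zero.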

How to Participate

The first Software Carpentry design competition produced good designs and good discussion, but did not lead to the pooling of ideas that is one of open software's great strengths. The second running of the SC Test competition will therefore experiment with another model for open design. The aim of the competition is not to produce a design per se, but to:

  • gather information on requirements, priorities, existing systems, design options, etc., and
  • select the team that will actually design the tool.

Anyone who wishes to participate may join the discussion list by sending mail to sc-discuss-subscribe@software-carpentry.com. Messages to the list can then be sent to sc-discuss@software-carpentry.com between now and Friday, October 27, 2000. (Please note that the mailing list devoted to testing alone has been retired, in order to facilitate cross-talk between various sub-projects.)

The competition judges will participate in the discussion, both directly (by posting their own ideas) and indirectly (by commenting on the contributions of others). In particular, the judges will identify and summarize those messages they have found most helpful on a weekly basis. At the same time, the competition organizers will actively solicit input from several existing Open Source projects, such as the GNU Compiler Suite and Python.

At the end of October, participants will be asked to put themselves forward for the final design team, either singly or in groups, and to indicate how much time they are willing to commit toward producing a final design. The judges will then select up to four people to produce a final design. Each of these people will receive $2500 (equivalent to the first-round prizes in the first design competition); the judges will also award a total of $5000 to other participants for noteworthy contributions.

The design team as a whole will receive an additional $10,000 upon submission of a completed design in December. This design will be modeled on the final entries in the first design competition, and on the Subversion design document (http://subversion.tigris.org/project_docs.html).

We are experimenting with this new model in order to see whether it will allow non-developers (such as scientists and engineers) to help the Open Source community set priorities and directions. We also hope that it will leave a clearer record of what alternatives were considered, and how they were evaluated, than the two-round model used in the first competition.


Participants are urged to familiarize themselves with the xUnit collection of testing software. These programs are widely used, satisfy many important requirements for testing tools, and are a good starting point for discussion.

The tool must be designed to support regression testing (also known as "batch testing") of software systems ranging in size from a handful of functions to the GNU Compiler Suite. To achieve this, the tool must support the following (in order of importance):

  1. Static unit testing of functions, classes, and modules in C, Python, and Java. (These three languages are chosen because they span the range from low-level to high-level, and loose to strict.)
  2. Customizable reporting of test results, ranging from a single-line command-line summary of the number of tests that passed and failed, through to automatic web publication of charts of test statistics over time.
  3. Storing and inspecting test suites, along with current and past test results, in a structured way. This should include support for sharing tests (or groups of tests) between test suites.
  4. Scriptable control of test suite execution, so that portions can be executed selectively, executed repeatedly under different load conditions, only executed at certain times of day, and so on.
  5. Parallel execution of test suites.
  6. Integration with other tools, such as source control and bug tracking. This should also include integration with other testing tools, i.e. it should be possible to incorporate the results of a sub-suite of tests run (or otherwise managed) by a third-party testing tool.

Given the time and resources available in this project, not every feature in the design will necessarily be implemented, and some requirements may not initially be met. The final design must therefore be very modular, and explicitly describe which features are considered core and which can be added later (or layered on top of the base tool).

Optional Considerations

The design may include other features to meet specific testing requirements of particular classes of software, but these will only be implemented in version 1.0 if a strong case can be made that they are more important than those described in the previous section.

  • Scripted testing of interactive applications, including those with graphical user interfaces.
  • Testing parallel and/or distributed programs.
  • Testing specific to particular application domains, such as numerical simulations or CGI scripts.
  • Support for features or aspects of particular programming languages.
  • Automated generation of tests.

Highlights from the Discussion List

October 8, 2000

Most of the last two weeks was taken up with a long thread about test output, natural language, XML, and related topics. It started under the headings "SC-Test Output" and "RE: XML, NL, TestTalk, and transformation libraries...". Mike Donat posted a synthesis of sorts.

The thread turned into a discussion of how tests are input: Brian Marick pointed at work by Hans Buwalda on using spreadsheets for input, and Patrick Campbell-Preston posted on the subject of how to handle millions of tests.

David Alex Lamb provided a nice summary of the pros and cons, and Greg Wilson posted a longer analysis of the pros and cons:

  1. Declarative test specification is good.
  2. Purely declarative specification is not powerful enough.
  3. Simple languages grow; growth reduces their simplicity.
  4. Designing a special-purpose language for testing would be a dangerous road to take.
  5. Using an existing language is also dangerous.
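The tension between points 1 and 2 can be made concrete with a small sketch. The table format and runner below are hypothetical, invented only to illustrate the trade-off, not a proposal.

```python
# Points 1 and 2 illustrated: purely declarative rows work well for
# simple input/expected-output checks, but an escape hatch into a
# real language is needed as soon as a test requires setup or a
# computed expectation.

# Declarative part: (function, input, expected) rows.
CASES = [
    (abs, -3, 3),
    (abs, 0, 0),
    (len, "spam", 4),
]

def run_declarative(cases):
    """Return a list of (name, input, expected) for rows that fail."""
    failures = []
    for func, arg, expected in cases:
        if func(arg) != expected:
            failures.append((func.__name__, arg, expected))
    return failures

# Escape hatch: a check too awkward to express as a table row is
# written as plain code.
def test_sorted_is_stable():
    pairs = [(1, "b"), (1, "a")]
    result = sorted(pairs, key=lambda p: p[0])
    assert result == [(1, "b"), (1, "a")]  # original order preserved

if __name__ == "__main__":
    assert run_declarative(CASES) == []
    test_sorted_is_stable()
```

The declarative rows are easy to write, review, and generate by the thousand; the escape hatch is what keeps the notation from having to grow into a full language of its own (point 3).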

Dave Thomas's reply pointed out that shutting out options is premature, since we don't yet have a firm list of requirements. Greg replied to say that we've got as much input from end users as we're going to get, and pleaded for commentary on the testing profiles.

Mike Donat's reply said, "So what you're saying is that you want it simple, yet complex enough to do everything you could ever want - no sweat! :)" In the same message, he pointed out that the mechanisms used to specify what unit test to run, and what arguments to give it, could also be used to call back into the testing harness itself, in order to run a program or script that itself generated more test cases. Greg elaborated on this, and Garrett Goebel clarified some of the issues further.
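The callback idea described above might look something like the following sketch. The Harness class and its methods are invented for illustration; a real design would need to settle naming, discovery, and reporting.

```python
# Hypothetical sketch: a test "case" that is really a generator,
# calling back into the harness to register more cases at run time.
class Harness:
    def __init__(self):
        self.cases = []

    def add_case(self, name, func):
        self.cases.append((name, func))

    def run(self):
        """Run all registered cases; return {name: "pass"|"fail"}."""
        results = {}
        for name, func in self.cases:
            try:
                func()
                results[name] = "pass"
            except AssertionError:
                results[name] = "fail"
        return results

def generate_roundtrip_cases(harness):
    # Instead of listing cases by hand, generate one per value.
    for value in [0, 1, -1, 2 ** 16]:
        def case(v=value):
            assert int(str(v)) == v
        harness.add_case("roundtrip_%d" % value, case)

if __name__ == "__main__":
    h = Harness()
    generate_roundtrip_cases(h)
    print(h.run())  # all four generated cases pass
```

Because generated cases are registered through the same interface as hand-written ones, the reporting and selection machinery does not need to know which kind it is running.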

In other news, there is a Python wrapper for the "Expect" library.

Greg Wilson posted a description of a problem he'd like to use in the tutorial.

Alex Samuel posted some notes on testing GCC.

Brian Marick put up links to his catalogs of common tests, and to James Bach's paper on Useful Features of a Test Automation System.

David Bennett posted about test output and TET.

September 21, 2000

Bill Nichols (a potential end user from the US Navy's Bettis Lab) joined the fray with a discussion of basic requirements.

More content from Brian Marick: a short case study of jUnit, and James Bach's paper Useful Features of a Test Automation System.

Michael Donat posted an example of the kind of XML that tests could display. This is a very promising direction, and was followed up by Bill Nichols; see also David Bennett's posting about test results.

Michael also commented on running tests, reporting output, and libraries to support specific kinds of testing (e.g. resource starvation), and David Bennett replied.

The project coordinator posted two notes regarding the use of an SQL back-end for storing test information: one on installation and accessibility, and one on the capabilities of a pure-Python database called Gadfly.

David Alex Lamb pointed out that the test tool should share some modules (for re-executing code) with the build tool. Steven Knight (winner of the build category) posted a couple of replies, and the project coordinator followed up. Garrett Goebel posted a note on package management for testing.

David also posted a link to an object model for unit testing.

Some references to academic papers have also been posted.

David Bennett discussed some of the benefits of testing.

The project coordinator posted a note on languages and the use of Python; David Bennett replied.

Patrick Campbell-Preston pointed readers at ideas in his first-round proposal regarding control of the test execution environment. He also started a thread on the (ab)use of random tests.

Garrett Goebel posted some thoughts on a database schema for storing test information, and Patrick Campbell-Preston replied. See also Piotr Sawuk's message, another from Patrick, and Garrett's reply.

September 11, 2000

Tom Van Vleck (a competition judge) posted a three-layer "vision" of what the testing framework could be. In particular, this differentiated between things that are "a simple matter of programming" (e.g. designing a database schema for storing test results) and things that require plug-in originality. In an earlier posting, Tom pointed at a testing technique called "hellandizing".

The project coordinator brought up the question of how invasive testing could be, and asked for specific examples of problems that could only be solved in certain ways. Patrick Campbell-Preston replied.

The project coordinator posted some profiles of typical users and their testing needs.

Garrett Goebel asked, "What is the testing equivalent of 'scribble'?" (the standard non-trivial introductory tutorial for Windows MFC programming). In follow-up, some people suggested using the testing of SC Test itself in the SC Test tutorial, but others felt that this could be confusing.

Ken Martin posted a summary of testing done in the open source Visualization Toolkit project, and a link to a paper that describes their process. The project coordinator replied in http://www.software-carpentry.com/lists/sc-discuss/msg00613.html, and Ken then replied to one point (about validating output).

Stephen Lee described the tools his group developed to support testing at Los Alamos National Laboratory.

Tripp Lilley brought up the idea of test markup, and the project coordinator pointed to an earlier discussion of traceability.

The project coordinator pointed out the VTK Quality Dashboard.

David Alex Lamb laid out some meta-requirements related to roles and features.


The winners in the first run of this competition all included excellent ideas; their submissions are listed below.

Name           Contact(s)
Apptest        Linda Timberlake
TestTalk       Chang Liu
Thomas         Patrick Campbell-Preston
TotalQuality   Alex Samuel, Mark Mitchell

Other material that participants should examine includes:

  • This paper on the testing process used in the Visualization Toolkit is a good summary of a state-of-the-art Open Source testing system.

  • The comp.software-testing FAQ.

  • The Software Testing Hotlist, edited by Bret Pettichord.

  • The EG3's testing page.

  • The Computer Information Center's testing page.

  • The Craft of Software Testing, by Brian Marick (who has helped re-formulate this category's requirements). Many of his other writings are collected at http://www.testing.com/.

  • Testing Computer Software, by Cem Kaner et al. is a good "from the trenches" look at testing; pages 27-58 are an excellent overview.

  • The Pragmatic Programmer, by Andrew Hunt and Dave Thomas (both Software Carpentry judges). The book's web site is http://www.pragmaticprogrammer.com/.

  • The Extreme Programming site keeps a list of implementations of the xUnit unit testing framework in various languages.

  • Existing tools include Expect, DejaGnu, and TET.

  • Chang Liu has developed a test suite for Apache, which may be a useful example to entrants in this category. The project is at http://www.sourcexchange.com/Project2-Summary.html.

  • Several existing Open Source tools may serve as inspiration in this category; in particular, Expect and DejaGnu can be used to run tests on both interactive and non-interactive applications.

  • Sun's JavaStar is a GUI testing tool for Java.

  • Entrants are also strongly encouraged to check out the Testing Craft web site.

  • A quasi-free testing tool that was popular some years ago is TET. It was originally designed for Posix compatibility testing, and some of that shows. But it's gotten a fair amount of use.

  • Some data-driven approaches are documented on Bret Pettichord's page. The paper by Ed Kit describes Hans Buwalda's TestFrame, which is a mature approach and toolkit. Hans wrote a fairly good paper describing the approach, but it's not online. The paper by Bret (Success with Test Automation) also describes data-driven tests. These are mainly used for GUI tests, but nothing inherently limits them to that.

  • Fewster and Graham's Software Test Automation has a reasonable description of organizing test suites. It isn't fancy, but most people don't need anything too fancy.


Some questions and suggestions from judges, entrants, and bystanders are listed below. Please note that these are not mandatory requirements, but are instead intended to spark ideas and discussion.

  1. How do users put tests for statically-compiled languages (such as Fortran-77 and C++), dynamically-loaded languages (such as Java), and fully-dynamic languages (such as Scheme and Python) in a single framework?

  2. How do developers keep tests up to date with the code that is being tested? In particular, how does the testing framework interact with the developers' version control system?

  3. How do users specify what counts as a "correct" answer (e.g. within numerical roundoff tolerances, or within a certain degree of pixel accuracy for images)?

  4. How are summary statistics generated and reported? How is responsibility for fixes and breakages reported?

  5. How do test suites manage special features of particular languages, such as inheritance in object-oriented languages, or templates in languages such as C++?
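As one possible answer to question 3 above for numerical results, a test can compare values within combined relative and absolute tolerances rather than testing floating-point values for exact equality. The function name and tolerance values below are arbitrary illustrations.

```python
import math

# Hypothetical sketch: declare a result "correct" if it falls within
# a relative tolerance of the expected value, or within a small
# absolute tolerance near zero.
def close_enough(actual, expected, rel_tol=1e-9, abs_tol=1e-12):
    scale = max(abs(actual), abs(expected))
    return abs(actual - expected) <= max(rel_tol * scale, abs_tol)

# 0.1 + 0.2 is not exactly 0.3 in binary floating point...
assert (0.1 + 0.2) != 0.3
# ...but it is "correct" within tolerance.
assert close_enough(0.1 + 0.2, 0.3)
assert close_enough(math.sqrt(2) ** 2, 2.0)
```

Image comparison raises the analogous question in two dimensions: how many pixels may differ, and by how much, before a rendering test is declared a failure.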

Please send other suggestions and ideas to suggestions@software-carpentry.com.


Last modified 2001/09/22 16:04:06.77067 GMT-6