The Structure & Interpretation of Computer Programs

Scheme Programming Tips

Formatting Conventions

At this point, you have seen the most basic core of the Scheme language -- its syntax, evaluation rules, and some special forms. From a purely logistical perspective, that's plenty to get you started writing your own programs. So why, you might wonder, should you care about issues like formatting your code?

The main reason is, ironically, the syntax of the language. Because Scheme has such a simple syntax, it would be very easy to write programs that are correct, but completely unreadable by humans. If you don't believe me, here is a Scheme program to compute the nth Fibonacci number, with all the formatting removed:

    (lambda(n)(let l((p 0)(c 1)(n n))(cond((< n 1)c)(else(l c (+ p
    c)(- n 1))))))

Here is the same thing with nice neat indentation, descriptive variable names, and comments:

    ;; Compute the nth Fibonacci number
    (lambda (n)
      (let loop ((previous 0) (current 1) (remaining n))
        (cond ((< remaining 1) current)
              (else
               (loop current
                     (+ previous current)
                     (- remaining 1))))))

The computer does not care how you type your code, but when you are testing and debugging your own programs, you yourself will care very much. (If you don't believe that, take some of the examples that follow and remove all the line breaks and tabs. Then try to figure out which expressions go with which operators.)

Fortunately, the built-in editor in DrSwindle is very smart about these issues, and will help you out by balancing your parentheses and providing reasonable indentation. But, if you are working in some other text editor, it is good to keep a few simple principles in mind. None of this is crucial to your understanding of the Scheme language, but these rules of thumb can help make your life easier.

Line Up Arguments Vertically

When you are writing a combination that represents a function call, you frequently will not have room to put all the arguments on the same line, especially if those arguments are themselves complex expressions. It is a good idea to line up all the arguments in a column, if you have to put them on separate lines:

      (/ (factorial n)
         (* (factorial r)
            (factorial (- n r))))
         ^  ^----- arguments of '*'
         |-------- arguments of '/'

Indent to Indicate Nesting

When you have a sequence of one or more expressions nested inside a special form such as if or lambda, indent those expressions a couple of spaces to indicate their relationship to the special form:

      (lambda (a b c)
        (display "This is the body of the 'lambda'")

        (let ((b^2 (* b b)))
          (display "This is the body of the 'let'")
          (/ (+ b^2 (sqrt (* 4 a c)))
             (* 2 a))))
        ^ ^------------- body of 'let'
        |--------------- body of 'lambda'

Don't Line Up Closing Parentheses

In C and related programming languages, it is customary to line up closing brackets, to indicate nesting structure, e.g.:

      if(a >= b) {
         if(a >= c) {
            if(a >= d) {
               return a;
            }          /* closing bracket for inner 'if'  */
         }             /* closing bracket for middle 'if' */
      }                /* closing bracket for outer 'if'  */

In Scheme programs, this convention is not recommended. Because everything is fully parenthesized, you wind up wasting a lot of space, and the code becomes harder to read. If you indent your code carefully, the structure will be obvious anyway. The recommended style is:

      (if (>= a b)
          (if (>= a c)
              (if (>= a d)
                  a)))  ; all closing parentheses here

rather than:

      (if (>= a b)
          (if (>= a c)
              (if (>= a d)
                  a
              ) ; yuck!
          ) ; yuck!
      ) ; yuck!

By stacking all your closing parentheses on one line, in an editor which balances your parentheses, you can just keep typing closing parentheses until the editor shows you that your whole expression has been closed.

Use Block Comments

The semicolon (;) is used to introduce comments in a Scheme program. Everything from the semicolon to the end of the line is considered a comment, and is ignored by the Scheme interpreter. As in all other programming languages, it is a good idea to comment your Scheme code.

In general, avoid commenting each individual line of your program -- if someone wants to know what your program does in that much detail, they will read the code itself. Nevertheless, it's worthwhile to put a comment at the top of each function you define, describing how you call it, and what it's supposed to do. For instance:

      ;; (solve n) - computes the solution to Merkwürdig's Equation
      ;; for an integer value n.  Assumes n > 0.
      (define solve
        (lambda ((n <integer>))
          ...))

It is also good to put comments at the top of each source file, describing what the file contains, who wrote it, any general requirements, and so forth. For example:

      ;;; mycode.scm
      ;;; by Jonathan Q. Public
      ;;; This file contains code to solve systems of low-order complex 
      ;;; Merkwürdig and Schreklich equations in Euclidean space.
      ;;; You need to have a copy of Bob Jones's freeware Frobulator 
      ;;; library in order to use these routines.

Note that the number of semicolons you use doesn't matter to the interpreter; a single semicolon is enough to begin a comment, and the rest is merely a point of aesthetics. One typical style is to use three semicolons for block comments at the top of the file, two for comments describing a function, and one for comments in the middle of the code. Feel free to adopt your own style; in this matter, consistency is more important than form.

Comments can be put on the same line as other code, but keep them short and sweet. Block comments outside the body of the code are much easier to deal with.
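
As a small illustration of the three-level convention described above, the sketch below uses three semicolons for the file header, two for the function description, and one for a short remark on the same line as code. The file name geometry.scm and the function circle-area are made up for this example.

```scheme
;;; geometry.scm
;;; Hypothetical example file illustrating the comment levels.

;; (circle-area r) - computes the area of a circle of radius r.
;; Assumes r >= 0.
(define circle-area
  (lambda (r)
    (* 3.14159 r r)))   ; 3.14159 approximates pi
```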

Use Reasonable Line Lengths

Avoid using line lengths greater than 80 characters. In fact, if you plan to e-mail code, you would do well to limit your lines to 72 characters. This helps other people when they try to read your code on a screen which may well be narrower than yours, and more importantly, makes it so that printouts of your code don't run off the edge of the page.

Other General Suggestions

  • Scheme source files should be given names ending in the extension ".scm". Apart from Windows, most operating systems do not require this, but it makes it easier for you to quickly identify Scheme source files.

  • Include your name in all source files you write. It's also helpful if you include an indication of what assignment the code is for, and if you have solved multiple problems in one file, which code goes with what problem.

  • Make backups of your working code. If you have something that works, and you want to go back and make some changes or experiment, make a copy of the file, and make your changes to the copy. That way, you can always go back to what you had working before, in case your new stuff doesn't work out.

Programming Efficiency

There are several issues of programming style beyond simple text formatting, that can make your life easier as a programmer. These are not limited to Scheme, although they are presented here using Scheme examples.

Write Short Functions

Write short functions, each of which performs a single, well-defined operation. Small functions are easier to read, write, test, debug, and understand. Small, well-focused functions are also easier to re-use, which in turn makes your future programs easier to construct.
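
One way to picture this advice is the sketch below, which builds a small computation out of three short functions rather than one larger one. The names square, sum-of-squares, and hypotenuse are chosen for this example only.

```scheme
;; (square x) - returns x multiplied by itself.
(define square
  (lambda (x)
    (* x x)))

;; (sum-of-squares a b) - returns the sum of the squares of a and b.
(define sum-of-squares
  (lambda (a b)
    (+ (square a) (square b))))

;; (hypotenuse a b) - returns the length of the hypotenuse of a
;; right triangle whose legs have lengths a and b.
(define hypotenuse
  (lambda (a b)
    (sqrt (sum-of-squares a b))))
```

Each piece can be tried on its own at the prompt, and square and sum-of-squares remain available for other programs. For instance, (hypotenuse 3 4) evaluates to 5, which some implementations may display as 5.0.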

Use Descriptive Names

The names of variables should give a clear idea of what the variable represents, and the names of functions should clearly indicate what they do. If the purpose isn't clear from the name, then include a short comment documenting what the variable or function is for.

In Scheme, there are a few unofficial naming conventions which make life a bit easier:

  • Predicates (functions which answer only true or false) are typically given names ending in a question mark (?). Examples include zero?, null?, number?, and odd?. Some other dialects of Lisp, such as Common Lisp, use "p" for this purpose, instead, e.g. zerop, numberp, etc.

  • Functions which destructively modify the contents of a storage location (assignment) are typically given names ending in an exclamation mark (!). Examples include set!, set-car!, set-cdr!, and sort!.

  • Functions that construct a new instance of some data type begin with make-. Examples include make-string and make-vector.

  • Global variables are given names that begin and end with asterisks (*), to keep them from interfering with locally bound variables.

In addition, variables that hold classes will generally be given names surrounded by angle brackets, e.g. <integer>, <string>, etc.

In general, if you have a group of functions or variables with related meaning, consider using some kind of conventional naming strategy. For instance, all of the classes in the DrScheme user interface library are given names ending in a percent sign (%). Conventions of this kind go a long way to make your code understandable to other humans.
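
The naming conventions listed above might look like this in practice. This is only a sketch: the "counter" data type (a one-element vector holding a count) and all of the names below are invented for the example.

```scheme
;; Global variable: the asterisks mark it as global.
(define *initial-count* 0)

;; Constructor: the name begins with make-.
(define make-counter
  (lambda ()
    (make-vector 1 *initial-count*)))

;; Predicate: the name ends in a question mark.
(define counter-zero?
  (lambda (counter)
    (zero? (vector-ref counter 0))))

;; Destructive operation: the name ends in an exclamation mark.
(define counter-increment!
  (lambda (counter)
    (vector-set! counter 0 (+ 1 (vector-ref counter 0)))))
```

A reader who sees (counter-increment! c) knows from the name alone that c is being modified, and one who sees (counter-zero? c) knows to expect a true or false answer.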

Avoid Nesting 'if' and 'let'

Rather than nesting multiple if expressions, consider using the cond special form. Not only is it easier to read (and less indentation work for you), but cond allows multiple expressions in each clause, without having to use begin explicitly.
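
The point about begin can be seen in the following sketch, which writes the same function twice. Both versions print a word describing the sign of n and then return -1, 0, or 1. The function names are made up for this example.

```scheme
;; With 'if', a branch that holds two expressions needs 'begin':
(define describe-sign-if
  (lambda (n)
    (if (< n 0)
        (begin (display "negative")
               -1)
        (if (= n 0)
            (begin (display "zero")
                   0)
            (begin (display "positive")
                   1)))))

;; With 'cond', each clause may hold several expressions directly;
;; the value of the last one is the value of the clause.
(define describe-sign
  (lambda (n)
    (cond ((< n 0)
           (display "negative")
           -1)
          ((= n 0)
           (display "zero")
           0)
          (else
           (display "positive")
           1))))
```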

Similarly, if you have a bunch of local variables, each of which refers to the previous ones, use let* instead of nesting multiple let expressions:

        Bad:                            Good:
        (if (zero? n)                   (cond ((zero? n) 0)
            0                                 ((= n 1) 1)
            (if (= n 1)                       (else
                1                               (* n (fact (- n 1)))))
                (* n (fact (- n 1)))))

        (let ((x (car foo)))            (let* ((x (car foo))
          (let ((y (car x)))                   (y (car x))
            (let ((z (+ y 15)))                (z (+ y 15)))
              (cons z 6))))               (cons z 6))

Avoid Global Variables

A common inclination for programmers accustomed to C, Pascal, or Fortran, is to use global variables to pass information around in a program. While this is possible, it is much less necessary in a functional language such as Scheme, which has nested scopes and proper closures. Some reasonable uses of global variables include:

  • To provide hooks into the mechanisms of the program

  • To hold parameters which, when changed, represent a major change in the behaviour of the program (e.g., default values)

  • To pass information between programs

  • To provide customizable defaults

In general, a global variable is a reasonable thing to use when function A needs to affect function B, but function A does not call function B. Otherwise, they should be avoided, since they can make your program very confusing.
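
As a rough illustration of how closures can stand in for a global variable, compare the two styles below. The names *factor*, scale-by-global, and make-scaler are invented for this sketch.

```scheme
;; Global-variable style: scale-by-global depends on *factor*, so its
;; behaviour changes whenever any part of the program changes *factor*.
(define *factor* 10)

(define scale-by-global
  (lambda (x)
    (* *factor* x)))

;; Closure style: make-scaler returns a new function that remembers
;; its own factor in a local scope, where nothing else can reach it.
(define make-scaler
  (lambda (factor)
    (lambda (x)
      (* factor x))))

(define times-ten (make-scaler 10))
(define times-two (make-scaler 2))
```

Here (times-ten 4) evaluates to 40 and (times-two 4) evaluates to 8. The two scalers coexist without interfering with each other, which a single global *factor* would not allow.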

Avoid Assignment

Even functional languages such as Lisp and Scheme provide assignment (e.g. set!), for reasons of efficiency. However, do not be tempted to use assignment as you would in a C or Java program. And, if you do choose to use assignment, try to limit assignment to locally bound variables.
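
To make the contrast concrete, here is a sketch of one small task, adding up a list of numbers, written both ways. The function names are made up for this example. The first version uses set! on a locally bound variable; the second passes the running total along as an argument and needs no assignment at all.

```scheme
;; Assignment style, as one might write it in C or Java.
;; The variable total is locally bound, so the assignment is at
;; least confined to this one function.
(define sum-list-with-set!
  (lambda (numbers)
    (let ((total 0))
      (for-each (lambda (x)
                  (set! total (+ total x)))
                numbers)
      total)))

;; Functional style: the running total is an argument of the loop.
(define sum-list
  (lambda (numbers)
    (let loop ((remaining numbers) (total 0))
      (if (null? remaining)
          total
          (loop (cdr remaining)
                (+ total (car remaining)))))))
```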

Write Modular Code

Code is modular if it is separated into components which can be individually used and tested in isolation. Actually, modularity isn't so much an all or nothing proposition, but rather a matter of degree. The fewer interdependencies the pieces of your program have, the easier your task will be.

The main advantage of modularity is that it lets you test out each piece of your program on its own, before you put the whole thing together and try to make it run. And, after you have put together the pieces, modularity makes it easier for you to find and eliminate bugs which inevitably crop up.

Another big advantage of modularity is that it makes your code re-usable. It is much easier to re-use code you have already written and tested than to write the same thing from scratch every time. For small programs, you might not care, but as you write more and more code, you will find that getting into this discipline is incredibly useful.

Test Your Code

How do you know your code works? One way might be to construct a mathematical proof that it is correct. Unless you do that, however, your only alternative is to try the program out on some known input values, and make sure it does what it is supposed to. That is the key idea behind code testing.

Each test generally consists of a set of input values, and the corresponding (expected) output values. To run the test, you invoke your code on the input values, and make sure the actual output values match the expected output values. If they do not match, you know there is a problem with your code. If your tests are chosen carefully, you may be able to gain some information about the nature of the error, by looking at how the actual output values differ from the expected output values.
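
A test of this kind can be expressed in a few lines of Scheme. The helper below is only a sketch, not part of any standard library; run-test and the function under test, double, are names invented for this example.

```scheme
;; (run-test name expected actual) - prints whether the actual value
;; matches the expected one, and returns #t or #f accordingly.
(define run-test
  (lambda (name expected actual)
    (let ((passed (equal? expected actual)))
      (display name)
      (cond (passed
             (display ": ok"))
            (else
             (display ": FAILED, expected ")
             (write expected)
             (display " but got ")
             (write actual)))
      (newline)
      passed)))

;; The function under test (assumed for this example).
(define double
  (lambda (n)
    (* 2 n)))

(run-test "double of 0" 0 (double 0))
(run-test "double of 21" 42 (double 21))
```

Printing both the expected and the actual value on failure is what lets you see how the two differ, which is often the first clue to the nature of the error.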

Broadly speaking, these are some good criteria for choosing test cases:

  • Choose test inputs that will cause all the program code to be executed. For instance, if your program chooses between two alternatives, make sure you choose inputs that can drive each alternative.

  • Choose tests that are easily verified. You should be able to construct the expected output values for your tests easily, and verify that they are correct. (The last thing you want to do is waste time debugging your tests as well as your code!)

  • Choose some inputs outside the intended set. For instance, if your code is supposed to read a string of digits from the user, include a test case that gives it a string of letters and punctuation, to make sure your code can deal with it.

  • Test boundary cases. These are the values that lie at the extremes of the input data range, or values lying between two parts of the input data range where the function behaves differently. Thus when testing an absolute value function on integers, test -1, 0, and 1. The empty string and empty list are usually boundary cases for string and list processing programs.

  • Avoid redundant cases. If you have a case that tests a particular behaviour of your code, don't include another case for the same behaviour. The smaller your testing set, the easier your job is.
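
The boundary cases mentioned above can be written down as a small table of inputs and expected outputs. The following is a sketch under that assumption; my-abs, abs-cases, length-cases, and all-pass? are names invented for this example.

```scheme
;; The function under test (written for this example).
(define my-abs
  (lambda (n)
    (if (< n 0)
        (- n)
        n)))

;; Each entry pairs an input with its expected output.  The inputs
;; -1, 0, and 1 sit on either side of the point where the behaviour
;; of my-abs changes.
(define abs-cases
  '((-1 . 1) (0 . 0) (1 . 1)))

;; For a list-processing function, the empty list is a boundary case.
(define length-cases
  '((() . 0) ((a) . 1)))

;; (all-pass? f cases) - returns #t if f maps every input in cases
;; to its expected output, and #f otherwise.
(define all-pass?
  (lambda (f cases)
    (cond ((null? cases) #t)
          ((equal? (f (car (car cases))) (cdr (car cases)))
           (all-pass? f (cdr cases)))
          (else #f))))

(all-pass? my-abs abs-cases)
(all-pass? length length-cases)
```

Three cases are enough to cover both branches of my-abs and the boundary between them; adding 2 or -2 would only repeat behaviour that is already tested.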

If you establish a good set of tests, then each time you make a change to your code, you can run a regression test, making sure all your old tests still work after the change was made. This helps to keep new bugs from creeping in unnoticed.

Edsger Dijkstra, a well-known computer scientist whose research includes techniques for proving the correctness of computer programs, is quoted as having said that "testing can only reveal the presence of bugs, not prove their absence." However, well-chosen test cases are still a valuable tool in any programmer's arsenal.

Department of Computer Science, Dartmouth College, Hanover, New Hampshire, USA