Beyond Lisp, some new functional programming languages are starting to emerge after the ongoing research of the eighties
Dick Pountain
The idea of functional programming--namely, that mathematical specifications should be executed directly as computer programs--has been around since the dawn of modern computing. Lisp, the grandparent of functional languages, emerged soon after FORTRAN made its debut in the mid-1950s; in fact, FORTRAN pioneer John Backus devoted most of his subsequent career to studying functional programming systems.
What's kept FPLs (functional programming languages) out of the mainstream thus far has always been their desperately slow performance and memory greed when compared to
imperative languages such as Pascal and C. Only now, following a decade of crucial research breakthroughs, are we seeing the arrival of industrial-strength FPLs that can compete with C in both time and space efficiency.
What is it about FPLs that makes people persevere in this El Dorado quest? In a word, provability. Pure functional programs possess the mathematical property of referential transparency, which roughly means that the same expression always represents the same value. This transparency enables you to reason about program execution, and hence to mathematically prove a program's correctness. The possibility of writing provably correct programs (rather than spending your time picking out bugs) could revolutionize the economics of software production.
Imperative languages such as Pascal and C are not referentially transparent because they're based on destructive updating. For instance, when you execute an assignment statement such as x := 12, the current contents of x are destroyed and
replaced by 12. Subprograms that employ destructive updating have side effects that can alter the execution of other subprograms, and this destroys referential transparency.
The whole history of imperative languages so far has been a battle to gain control over these side effects, first via structured programming, then modular programming, and now via the encapsulated methods of object-oriented languages. On the plus side, destructive updating is very efficient on present-day computers whose CPU registers, RAM, and magnetic-disk storage hardware all work via destructive updating.
By contrast, the variables in a pure FPL program are like those used in algebra: They represent an initially unknown value that, once computed, doesn't change. In a pure FPL program, you can't change the value of a variable once it has become bound, and the only way to pass a value into a function is via its arguments. Execution of an FPL program starts with the evaluation of an initial expression and leads to a tree of
nested sub-expressions whose results depend only on the values of their function arguments.
What's more, the order in which sub-expressions are evaluated can't affect the final result, which means that functional programs are inherently suited to parallel execution. FPLs use recursion instead of looping to perform repetitive computations, and they work with dynamic on-the-fly data structures, such as lists, tuples, and trees, whose memory is allocated and disposed of automatically by the system. A functional programmer is never exposed to memory leaks or dangling pointers. FPLs are so expressive that programs tend to be an order of magnitude shorter than their imperative-language equivalents. Look, for example, at the elegant quicksort for lists expressed in the Miranda language in the listing below.
Implementing FPLs
The downside to FPLs is that their virtues can cause the inefficiency that has made them impractical for commercial use: The functional model of program execution is too far aw
ay from the reality of register-based computers. Recursion and dynamic data structures, which must be copied to be updated, can cause the time and space complexity of functional programs to explode compared to their imperative equivalents, which reuse resources. Historically, languages like Lisp have always compromised by adding destructive assignment and explicit looping, thus abandoning referential transparency.
FPL research throughout the 1980s concentrated on the basic theory of functional computation models, and the new understanding gained is now bearing fruit in a generation of new FPLs, such as Hope, Miranda, Haskell, Concurrent Clean, and Erlang, that combine efficient execution with referential transparency (see the text box ``The Erlang Language'' on page 184).
Pure FPLs can be thought of as deriving from Alonzo Church's lambda-calculus (introduced in 1932), which, when combined with Alan Turing's work, founded the modern theory of computers and computability. Lambda-calculus is a goo
d tool to use for investigating the semantics of FPLs, but it's not so good for implementing them because of tricky problems about variable renaming.
However, several FPLs have been based on a subclass of lambda-calculus called combinators.
Modern FPLs tend instead to be based on TRSes (term-rewriting systems) that use pattern matching to choose among a set of rules that define how each sub-expression will be rewritten, or
reduced, toward the result. Even if the order of reducing sub-expressions can't affect the value of the result, it can crucially affect time and space complexity, as well as whether the evaluation ever terminates.
The science of compiling FPLs hinges on this question of reduction strategy; two key issues that compiler designers must face are lazy versus strict evaluation (explained below) and which strategies are normalizing for a particular class of TRS (which roughly means that they will converge on a unique answer).
Recently, an extension of TRSes cal
led GRSes (graph-rewriting systems) have come to the fore; they represent programs internally as directed graphs (i.e., pointers) rather than terms. GRSes improve efficiency by sharing computations to avoid duplication of work; for instance, where a TRS might have to evaluate the same sub-expression twice, a GRS can redirect the graph to point to the result of a first evaluation.
Concurrent Clean
The Concurrent Clean language, developed at the University of Nijmegen in Holland, is a good example of the new style of efficient FPL. Clean is a pure, lazy, higher-order functional language that supports concurrent processes and distributed execution. A lazy implementation is one in which sub-expressions are reduced only if they are needed to reach the result; the opposite, a strict implementation (e.g., Lisp), always evaluates a function's arguments before reducing the function. Although strict evaluation can be more efficient, lazy evaluation adds greatly to expressive power by handling, for example, in
finite lists that would never terminate if evaluated strictly. Concurrent Clean permits selected arguments to be declared strict as an optimization.
Clean is implemented as a GRS, using a popular reduction strategy known as the Functional Strategy. Although it compiles to native machine code, internally the Clean compiler generates intermediate code for an abstract machine. This ABC machine (so called because it uses three stacks: A, B, and C) has a hybrid architecture, part of which is an idealized graph-rewriting engine with its own graph store and A stack, while the other part mimics a conventional computer that has a program counter and uses the B stack for operands, and the C stack for return addresses.
Concurrent Clean achieves speeds that are comparable to those attained by C by compiling wherever possible to this conventional part of the machine, which can be mapped into, say, Motorola 68020 code as efficiently as C. The compiler employs scores of subtle tricks to delay writing to the re
latively inefficient graph store to avoid building certain subgraphs and to pass arguments via the B stack rather than via graph nodes.
Clean is structured into separately compiled definition and implementation modules along Modula-2 lines. It's a strongly typed language that supports polymorphic, abstract, algebraic, and synonym types. The compiler infers the types of objects and uses type information to generate better code.
Clean's type system also features an enormously powerful new concept called polymorphic uniqueness types. To describe this concept in a nutshell, any argument can be declared to be of Unique type, which means it won't be shared by any other function application and can therefore be destructively updated safely without violating the pure functional properties of the program. If such an argument is not used within its own function body, it gets put in the garbage immediately.
This scheme allows Clean to implement records, arrays, and files, which are as time- and spac
e-efficient as those of imperative languages, to interface directly to C programs and, hence, to perform efficient windowed I/O via GUI operating systems such as the Mac's System 7 and the X Window System.
Free Unix and Mac versions of Concurrent Clean are now available from the University of Nijmegen via ftp (ftp.cs.kun.nl). DOS and OS/2 versions are promised for later this year.
A Functional Future
With the performance penalty of functional languages finally lifted, expect to gradually see more commercial use of these languages, such as Concurrent Clean and Erlang.
The functional paradigm is unlikely to displace C++ anytime soon, but as programmers become more aware that object orientation is not a perfect panacea, there should be room for both, or--dare I suggest it?--for some kind of hybrid approach.
A quicksort in the Miranda language
quick_sort:: [num] -> [num]
quick_sort[] = []
quick_sort(a:x) = quick_sort[b b
<
- x; b
<
= a]++
[a]++
quick_sort[b b
<
- x; b > a]
Dick Pountain is a BYTE contributing editor based in London. You can reach him on the Internet or BIX at
dickp@bix.com
.