Department of Computer Science
Queen Mary and Westfield College
University of London 1
The early sixties saw conventional labels ``got rid of'', and then reintroduced in a nuclear variant that was hoped to be so awesome that they would never be used again except for exceptions. But guilt-free remote control of dangerous instruments was already undermining the entente.
This note is a very personal view of, and from, the first half of that decade, with very close horizons. But there is a lesson to be drawn. In the phrase ``the meaning of rest of the program'', there has been the same snare that sometimes traps people into not recognizing when, and when not, it's safe to confuse a node of a DAG with its reachable sub-DAG.
startsection section1@-3.5ex plus -1ex minus -.2ex2.3ex plus .2ex J
J popped out onto the the third galleys of the 1965 CACM article ``A Correspondence between ALGOL 60 and Church's Lambda Notation'' . So much for refereeing. Indeed, scrutiny of that paper seemed concerned only with whether a space should separate ``algol'' and ``60''.
It was a neat but semantically inessential packaging of the program-closures that had been present in previous versions of that paper. J systemised three ``enhancements'', i.e., corruptions, of strict lambda (viewed operationally), that helped it to model goto's. It might be said that they were so powerful that they could hardly help not.
The three separable actions were: capturing a calculational context, called a ``dump''; attaching a dump to construct a doctored function, called a ``program-closure''; and doctoring an application of such a function so as to override the ``natural continuation''  of that application.
Before rushing to any conclusion it must be noted that this use of the word is scarcely significant in the light of subsequent developments. Such a borrowing from ordinary language can certainly bear several meanings.
A step of dump-capturing embodied (i.e., reified) a dump in the form
of a ``program-closure-constructor'' - a function that did the
attaching. An alternative and semantically equivalent embodiment is
yielded if the dump is attached, by an indivisible step, to the
Taking JI as the primitive rather than J, there is a loss of expressiveness about where forgetting occurs. With J, but not with JI, the context-overriding gave a means of expressing what Strachey, I think, referred to as ``early argument and link disposal'' for an outermost function-call, especially tail-recursions.
The above equality is enough to justify the earlier assertion
concerning the inessentialness of J.
It was the good fortune of continuations to be reaching the christening font around the same time as denotational semantics (of programming languages) was reaching birth. Like most given names the endowment was older than the endowed.
This was also the time when theorems were making forced entry. As an old friend of mine used to say: Show me a theorem and I'll reach for my gun. The grounds for such apparent bigotry are of course that if you want to slow down a creative development, then ``Hold on while we prove some theorems about it'' is as sure a way of pouring treacle over it as setting up a committee of experts, or an evaluation exercise. Early writing on programming is significantly short of proofs. (Let us be generous to the formalisers and accept their inadequate but correct plea that a theorem's nothing really, it's only a sentence.)
To the question posed above, namely: How much of the calculational context?, denotational semantics, and especially domain equations, provided a convenient fudge. You need not commit yourself about whether the ``o'' meant the overall outcome, or the next continuation.
In the above reduction of J to program-closures it is not, of course, significant whether each program-closure emerges from a definiens. The usual let-equivalence does hold, provided there are no top-level J's in the dot-dot-dot's. But the reduction displays the possible loss of control over the phasing of tail-function-economising. As implied above, the identity function does not leave room for much to be done after forgetting. Or, to put it the other way round, it delays the forgetting until there's no point.
Dumps and program-closures are data-items, with all the implied latency for unruly multiple use and other privileges of first-class-citizenship. The same current context can be captured at zero or more steps. The same (calculational occurrence of a) reified dump can be attached to functions and program-closures at zero or more (separate) steps The same (calculational occurrence of a) program-closure can be applied at zero or more (separate) steps.
The last two of these capabilities were drawn upon to model goto's. It is no surprise that with such a powerful tool, contexts (i.e., dumps) have tree-structure with dumps occurring as parts. In the context of sharing or self-referential-ness, this means graph-structure, perhaps cyclic. So these continuations had continuations. Indeed, that was their merit since it provided extra behavioural expressiveness that avoided the notorious pile-up of calculational right-hand brackets, with the concomitant delay of forgetting.
There were two kinds of pile-up threat, the end-of-the-program threat just referred to, and the end-of-the-procedure one. For each, and both, aversion depended on the ``linearity'' of the lambda-expressions concerned , and on the feature that a label was modeled in terms of a ``substantial'' program-closure, not a mere JI.
It is tautologous that if a continuation is the meaning of the rest of the program, and there are lots of them, and the program is single-threaded, then any notion that avoids the pile-up is not a notion of continuation.
The step of attaching is the most direct way of supplying to the function its ``continuation'' argument. Or rather, of supplying it with one that overrides the default ``natural'' one.
But this reveals the sense in which such ``continuations'' were no more, and no less, ``the meaning of the rest of the program'' (to borrow Reynolds'  unattributed quotation), than was a dump in its strict lambda, unreified, unnamed occurrence. A dump was in that sense a continuation whether or not it was anonymous.
The D of an SECD-state is a stack of SEC-triples. (The oldest D is empty.) In each SEC, the S and the C are so balanced that it needs one data-item to make it a performable deliverer of one data-item.
This stack, which is disjoint from the S, can be viewed as a list of functions whose composition delivers the final result. The list shrinks by one as each item is completed, and grows by one whenever, within any one item, a closure is encountered. When that is a program-closure the length of this stack jumps up, or down, or happens to stay the same. A claim can be made that any segmentation of this list constitutes a sequence of continuations.
Thus, in an alternative grouping of an SECD-state, it consists of the function-plus-argument about to be reduced, together with a two-level list of control-items.
However it would be incorrect to infer from this account that E's, or EC's, or SEC's, or SECD's occur exclusively in linear or stack-like configurations. Their incorporation into data-items leads to tree-, or DAG-, or (cyclic) graph-configurations.
It is difficult to imagine these fairly crisp assertions coming to light without the abstract machine that underlies them. They are further facilitated by the terminology of calculation-``steps'' connecting ``data-items''.
startsection section1@-3.5ex plus -1ex minus -.2ex2.3ex plus .2ex The Naur Provenance
Peter Naur, in his 1963 BIT article ``The design of the GIER ALGOL compiler, part I'' , had already introduced program-points as a calculational feature. Program-closures freed them from their dependence on stack-discipline, and hence provided an obvious compositional denotation for a label.
I quote: The treatment of jumps springs from the observation that the symbol ``goto'' in Algol 60 is redundant, and could be ignored by a processor .
So, in addition to the familiar lambda-dot syntax, there was a second binding operator, then written lambda-colon. It occurred only in its void-to-void form, i.e., argumentless and resultless, but otherwise did not differ syntactically. For any (operational) interpretation of a void-to-void lambda-dot expression there was also one for the corresponding lambda-colon expression.. The difference lay exactly in what followed performance of the lambda-body.
A (regular) closure application resumed the natural continuation that had been ``dumped'' at its commencement. By contrast, application of a program-closure resumed the dump current at the point of evaluation of the lambda-expression - function-definition point rather than function-application point.
There is one gory detail that deserves attention here only if, as I believe, the current orthodoxy of ``static analysis'' falls into the same trap that caught language-designers when they bent their languages in the light of performance-costs. Both exigencies compromise sensible choice of denotations.
ALGOL60 (sic) permitted jumping into the middles of conditional arms and other nested subphrases. Since each such phrase was modeled by a function, and the ``natural'' composition was expressed outside them, it became necessary to override the natural continuation by ensuring a terminal jump to a possibly added label. Approximately, this was overriding an override.
Thus the denotation of every imperative was parametrised by the imperative following it - either another imperative, or a label-call, or for a ``genuine'' natural continuation, an ``empty statement''.
So, by a curious fortuity, a model of Algol 60 that sought to avoid them, nevertheless introduced added labels. By contrast with van Wijngaarden's treatment , the technique used there for closed phrases was here resorted to for open phrases.
What is referred to above as the ``continuation'' argument can be supplied in several ways, some anonymous and some named. The ``natural'' default as expressed by the language-syntax can be overridden either by complicating the ontology (as with program closures), or by source-to-source tampering with the text, or by complicating the interpreter. This last was the one chosen by Strachey , who met the same problem in CPL , which was his current test-bed language-design project.
Algol 60's inelegant jumping into middles was a lucky provocation for making a separation between the ``what happens next'' parameter and the way it is supplied.
Continuation was a category confusion, like name. When a program-closure constructed within a (performance of a) lambda-body resumes the natural continuation of that lambda-expression, and then proceeds through one more level to the ``end of the program'', there are, in one of the senses current in 1964, three not one continuation. There is (sic) the steps up to leaving the lambda, the steps thereafter, and the concatenation of the two.
The paradox that continuations have continuations is resolved by observing that the attribute of being a continuation is contextual, i.e., positional, not innate. With the advent of domain-expressions, they became misleadingly used to elevate a positional feature of occurrences of data-items into a type of data-item. There are not certain data-items (be they program-closures, or regular closures, or functions, or whatever) that are continuations, but certain calculational occurrences of them. Recall the futility of dividing the world into names and named, or, for that matter, into uncles and nephews.
In particular a J'd function may occur as a continuation, and its J'd attachment occurs as its continuation. In the phrase ``the meaning of the rest of the program'', there is the same snare that traps people into not recognizing when, and when not, it's safe to confuse a node of a DAG with its reachable sub-DAG.
startsection section1@-3.5ex plus -1ex minus -.2ex2.3ex plus .2ex The McCarthy Provenance
When theory moves fast in computing it's sometimes backwards. No sooner was there ``label-free algol''  (later known as structured programming) than generalised labels happened.
But functional purity is less quick to give birth to instructions from its rib.
Pulling labels out of their textually defining positions, and replacing them by function-definitions, was described by McCarthy , in the case of flat flow-charts of imperatives. There the calculational right-hand brackets could pile up without affront.
Each of McCarthy's functions is the meaning of the rest of the program. Moreover there are phrases that denote it - the name, and the definiens.
McCarthy's reduction was a necessary point of departure for transforming Naur's program-points into definitions - textually ``pp'' for program-point, and semantically program-closures.
In 1964, the ACM office was a few minutes walk away from mine, and a nice lady on their editorial staff was helpfully allowing me repeated proof-copies to tune the printers' alignment of some deeply nested formulas.
The period between submission and publication gave time for more exercises in strict lambda programming. The name NPL, for New Programming Language currently designated a debate about language-design that led to IBM's PL1, and thus there came to be circulated a kite-flying program about thermodynamics that presented an irresistible challenge . By November 1964 this had focussed attention on two shortcomings. Some functional projections (i.e., specialisations) couldn't easily be accomodated hierarchically, and it was disappointing that flatness might beat nested definitions in achieving terseness and clarity. (Anticipations of OOPS?)
Out of this irritating untidiness of the real world there arose however one pattern that could be rescued, but only by resorting to the ominously growing list of deviations from strict lambda. (State, assignment, and sharing had already crept in, though only to two happily circumscribed situations, namely Y and ``delayed'' evaluation  (also, I think, Strachey's term), aka call-by-need.)
The other deviation was ``program-closures''.
Re-engineering a single program did not, of course, bump so hard against the limitations of strict lambda as did compiling a whole language of programs. With the above caveat concerning slight distortions needed to fit hierarchical shape, assignment was harmlessly eliminable when done by hand and mind.
One current programming technique for handling conditions (i.e., for handling raised exceptions) was by disjointly-unioned error indications, delivered level-by-level. But the violence this did to our thermodynamics program was an argument for hiding it in the interpretative mechanism.
Only upon realizing that the handling program could be located in the nesting structure of both performance and text, did a solution emerge.
The ominous list would not grow longer if, to the familiar idea of non-local jumps, was added the less familiar but pressing idea of jumps that had arguments and results and did some calculating in between. The relevance of JI was not, at the time, noticed.
If regular jumps resemble void-to-void procedures then conjecturally ``generalised'' jumps resemble non-void-to-non-void procedures. They had no place in modelling ALGOL60, but became exactly apt for exception-handling.
Aesthetically, they uncovered a pleasing orthogonality between the two deviations from strict lambda. You could, of course, have assignment without jumps, but now within the same framework there were, more dramatically, jumps without assignment.
startsection section1@-3.5ex plus -1ex minus -.2ex2.3ex plus .2ex The Return-link
So reductions were afoot in opposite directions. Goto's were being expressed as calls (without return) . Calls (with return) were being expressed as goto's and then, by reverse wind, eliminated in favour of calls (without return), and thus generalising McCarthy's invention to the non-flat case .
Concerning this there was a certain nostalgia among old programmers of stackless architecture who'd lived through the anguished cries about early fortran's failure to provide ``re-enterability'' (which was of course required merely for a nested performance such as arises from what can loosely be called self-application, and not exclusively for definitional self-referentialness). Shunting arguments, including return links, around had been a familiar chore.
Return-links were inevitably among the arguments of a closed subroutine. Their discovery was probably plural. For example, out of Cambridge (England) had come what IBM, I think, called the Wheeler jump.
The control-address-as-data-item became the stack-frame-as-data-item became the program-closure-as-data-item.
The idea of pulling an ordinary lambda-expression out from inside a program-closure, and the possibility of indicating their relationship uniformly by a name (``J''), should have come sooner. It was exactly another instance of the way Curry recommended looking for the lambda in every binding operator (such as differentiation and integration), and hoping for a higher-order operation to bridge the gap. Had Y emerged from LISP's ``LABEL'' it would have been a copy-book example of the phenomenon.
This occurred November 20 1964, and was surreptiously incorporated in the third galley-proofs, as a small simplification of the expressions for program-closure-construction in my next visit to the ACM office. A more careful account, with one substantial error appeared in . Defined as a feature interpreted by a state-transition-function, it exactly failed to express the early forgetting that was claimed for it. The error was corrected in . Another paper  was submitted later and appeared much later, without these late thoughts.
It is perhaps a tribute to the robustness of the concept that several scope errors, overlooked during that enjoyable bustle, are easily reparable. But a matching claim could be made for goto's!
startsection section1@-3.5ex plus -1ex minus -.2ex2.3ex plus .2ex Conclusion
Finally I would admonish anyone inventing a new kind of control concept to provide one, possibly optional, syntax in which its instances can be named, and can emerge as the denotations of the right-hand-sides of let-clauses. You don't have to tell anyone. And only then start writing your equations.