Closure (computer science)

From Wikipedia, the free encyclopedia

Jump to: navigation, search

In computer science, a closure is a function that is evaluated in an environment containing one or more bound variables. When called, the function can access these variables. The explicit use of closures is associated with functional programming and with languages such as ML and Lisp. Constructs such as objects in other languages can also be modeled with closures.

In some languages, a closure may occur when a function is defined within another function, and the inner function refers to local variables of the outer function. At runtime, when the outer function executes, a closure is formed, consisting of the inner function’s code and references to any variables of the outer function required by the closure.

A closure can be used to associate a function with a set of "private" variables, which persist over several invocations of the function. The scope of the variable encompasses only the closed-over function, so it cannot be accessed from other program code. However, the variable is of indefinite extent, so a value established in one invocation remains available in the next. As a consequence, closures can be used to hide state, and thus to implement object-oriented programming.

The concept of closures was developed in the 60’s and was first fully implemented as a language feature in the programming language Scheme. Since then, many languages have been designed to support closures.


Contents

[edit] Closures and first-class functions

Closures typically appear in languages in which functions are first-class values—in other words, such languages allow functions to be passed as arguments, returned from function calls, bound to variable names, etc., just like simpler types such as strings and integers. For example, consider the following Scheme function:

; Return a list of all books with at least THRESHOLD copies sold.
  (define (best-selling-books threshold)
    (filter 
      (lambda (book) (>= (book-sales book) threshold))
      book-list))

In this example, the lambda expression (lambda (book) (>= (book-sales book) threshold)) appears within the function best-selling-books. When the lambda expression is evaluated, Scheme creates a closure consisting of the code for the lambda and a reference to the threshold variable, which the lambda uses.

The closure is then passed to the filter function, which calls it repeatedly to determine which books are to be added to the result list and which are to be discarded. Because the closure itself has a reference to threshold, it can use that variable each time filter calls it. The function filter itself might be defined in a completely separate file.

Here is the same example rewritten in ECMAScript, another popular language with support for closures:

// Return a list of all books with at least 'threshold' copies sold.
  function bestSellingBooks(threshold) {
    return bookList.filter(
        function(book) { return book.sales >= threshold; }
      );
  }

The function keyword is used here instead of lambda, and an Array.filter method [1] instead of a global filter function, but otherwise the structure and the effect of the code is the same.

A function may create a closure and return it. The following example is a function that returns a function.

In Scheme:

; Return a function that approximates the derivative of f
  ; using an interval of dx, which should be appropriately small.
  (define (derivative f dx)
    (lambda (x) (/ (- (f (+ x dx)) (f x)) dx)))

In ECMAScript:

// Return a function that approximates the derivative of f
  // using an interval of dx, which should be appropriately small.
  function derivative(f, dx) {
    return function(x) {
      return (f(x + dx) - f(x)) / dx;
    };
  }

Because the closure in this case outlives the scope of the function that creates it, the variables f and dx live on after the function derivative returns. In languages without closures, the lifetime of a local variable coincides with the execution of the scope where that variable is declared. In languages with closures, variables must continue to exist as long as any existing closures have references to them. This is most commonly implemented using some form of garbage collection.

[edit] Uses of closures

Closures have many uses:

  • Designers of software libraries can allow users to customize behavior by passing closures as arguments to important functions. For example, a function that sorts values can accept a closure argument that compares the values to be sorted according to a user-defined criterion.
  • Because closures delay evaluation—i.e., they do not "do" anything until they are called—they can be used to define control structures. For example, all Smalltalk's standard control structures, including branches (if/then/else) and loops (while and for), are defined using objects whose methods accept closures. Users can easily define their own control structures as well.
  • Multiple functions can be produced which close over the same environment, enabling them to communicate privately by altering that environment.

In Scheme

(define foo #f)
(define bar #f)
 
(let ((secret-message "none"))
  (set! foo (lambda (msg) (set! secret-message msg)))
  (set! bar (lambda () secret-message)))
 
(display (bar)) ; prints "none"
(newline)
(foo "meet me by the docks at midnight")
(display (bar)) ; prints "meet me by the docks at midnight"
  • Closures can be used to implement object systems [2], and may in fact be better alternatives to objects [3].

Note: Some speakers call any data structure that binds a lexical environment a closure, but the term usually refers specifically to functions.

[edit] Differences in semantics

As different languages do not always have a common definition of the lexical environment, their definitions of closure may vary as well. The commonly held minimalist definition of the lexical environment defines it as a set of all bindings of variables in the scope, and that is also what closures in any language have to capture. It should be noted though that the meaning of a variable binding also differs. In imperative languages, variables bind to locations in memory, which can in turn store values. The binding can never change, but the value in the bound location can. In such languages, since closure captures the binding, any operation on the variable, whether done from the closure or not, is performed on the same memory location. Here is an example illustrating the concept in ECMAScript, which is one such language:

var f, g;
 function foo()
 {
   var x = 0;
   f = function() { return ++x; };
   g = function() { return --x; };
   x = 1;
   print(f()); // "2"
 }
 foo();
 print(g()); // "1"
 print(f()); // "2"

Note how function foo, as well as both closures, use the same memory location bound to local variable x together.

On the other hand, many functional languages, such as ML, bind variables directly to values. In this case, since there is no way to change the value of the variable once it is bound, there is no need to share the state between closures - they just use the same values.

Yet another subset, lazy functional languages such as Haskell, bind variables to a result of a computation in the future. Consider this example in Haskell:

foo x y = let r = x / y
           in (\z -> z + r)
 f = foo 1 0
 main = do putStr (show (f 123))

The binding of r captured by the closure defined within function foo is to the computation (x / y) - which in this case results in division by zero. However, since it is the computation that is captured, and not the value, the error only manifests itself when the closure is invoked, and actually attempts to use the captured binding.

Yet more differences manifest themselves as other lexically scoped constructs, such as return, break and continue statements in C-like languages. Such constructs can in general be considered to be accessible lexical bindings to an escape continuation (in case of break and continue, such interpretation requires looping constructs to be considered in terms of recursive function calls). In some languages, such as ECMAScript, this binding is implicitly reintroduced in every closure, thus hiding the lexical binding from the enclosing scope - thus, a return statement from within the closure transfers control to the code which called it. In Smalltalk, however, only top-level definitions introduce such binding, and closures capture it rather than rebinding it. The following examples in ECMAScript and Smalltalk highlight the difference:

"Smalltalk"
  foo
    | xs |
    xs := #(1 2 3 4).
    xs do: [:x | ^x].
    ^0
  bar
    Transcript show: (self foo) "prints 1"
// ECMAScript
  function foo() {
    var xs = new Array(1, 2, 3, 4);
    xs.forEach(function(x) { return x; });
    return 0;
  }
  print(foo()); // prints 0

Both code snippets seem to do the same thing at the first glance, bearing in mind that ^ in Smalltalk is analogous to ECMAScript return. The difference is that in ECMAScript example, return will leave the closure, but not function foo, whereas in Smalltalk example, ^ will return from method foo, and not just from the closure. The latter allows more expressiveness - in Smalltalk, do: is defined as a plain method, and there is nothing special about it, while ECMAScript has to introduce a special language construct, foreach, to express the same thing. On the other hand, there is an issue of captured continuation outliving the scope in which it was created. Consider:

foo
    ^[ x: | ^x ]
  bar
    | f |
    f := self foo.
    f value: 123 "error!"

When block returned by method foo is invoked, it attempts to return a value from foo. Since the call to foo has already completed, this operation results in an error.

Some languages, such as Ruby, allow the programmer to choose the way he wants return to be captured. An example in Ruby:

def foo
    f = Proc.new { return "return from foo from inside proc" }
    f.call # control leaves foo here
    return "return from foo"
  end
 
  def bar
    f = lambda { return "return from lambda" }
    f.call # control does not leave bar here
    return "return from bar"
  end
 
  puts foo # prints "return from foo from inside proc"
  puts bar # prints "return from bar"

Both Proc.new and lambda in this example are ways to create a closure, but semantics of the closures thus created are different with respect to the return statement.

In Scheme, definition and scope of the return control statement is explicit (and only arbitrarily named 'return' for the sake of the example). The following is a direct translation of the Ruby sample.

(define call/cc call-with-current-continuation)
 
(define (foo)
  (call/cc 
   (lambda (return)
     (define (f) (return "return from foo from inside proc"))
     (f) ; control leaves foo here
     (return "return from foo"))))
 
(define (bar)
  (call/cc
   (lambda (return)
     (define (f) (call/cc (lambda (return) (return "return from lambda"))))
     (f) ; control does not leave bar here
     (return "return from bar"))))
 
(display (foo)) ; prints "return from foo from inside proc"
(newline)
(display (bar)) ; prints "return from bar"

[edit] Implementation and theory

Closures are typically implemented with a special data structure that contains a pointer to the function code, plus a representation of the function's lexical environment (e.g., the set of available variables and their values) at the time when the closure was created.

A language implementation cannot easily support full closures if its run-time memory model allocates all local variables on a linear stack. In such languages, a function's local variables are deallocated when the function returns. However, a closure requires that the free variables it references survive the enclosing function's execution. Therefore those variables must be allocated so that they persist until no longer needed. This explains why typically languages that natively support closures use garbage collection.

A typical modern Scheme implementation allocates local variables that might be used by closures dynamically and stores all other local variables on the stack.

Closures are closely related to Actors in the Actor model of concurrent computation where the values in the function's lexical environment are called acquaintances. An important issue for closures in concurrent programming languages is whether the variables in a closure can be updated and if so how these updates can be synchronized. Actors provide one solution (Will Clinger 1981).

[edit] See also

[edit] References

[edit] External links

Personal tools