Y combinator

August 6, 2023

tags: #plt, #lisp, #lambda calculus, #learning in public


Motivation

Can you implement recursion if a function isn’t allowed to reference itself? A function might not be able to reference itself for a number of reasons: it could be an anonymous function, a combinator1, or perhaps as a purely intellectual exercises you’d like to implement recursion without assuming its existence.

Consider a classic example of recursion:

(define (factorial n)
  (if (< n 2)
      1
      (* n (factorial (- n 1)))))

The body of factorial references itself on the last line, and it can only do that because the function has a name. Regardless of programming language, you’ll find this typical of most recursive functions: they have a name, and that name is used in the body of the function.

Which brings us back to the original question: can you implement recursion if functions aren’t allowed to reference themselves (either because they’re anonymous, non-recursive, or combinators)? 2

Explanation

The answer is yes and the solution is called the Y combinator.

Here’s the Y combinator.

Y = λf.(λx.f (x x)) (λx.f (x x))

In untyped lambda calculus, there are only three notations: 1) variables like x, f, or y, 2) function definitions that use λ to denote the start of the function and . to separate the arguments (on the left) from the body (on the right), and 3) function invocation where you write a function next to its arguments (e.g. f x y).

The Y combinator is a function (as denoted by λ), that takes a single argument f. The body of the function is the two repeating (λx.f (x x)) (λx.f (x x)). It’s not recursive since it doesn’t reference itself in its body.

What you’ll notice is that the body is itself another function and its corresponding argument. The first instance of (λx.f (x x)) is the function and the second instance of (λx.f (x x)) is the argument passed to the first instance. In other words, the second (λx.f (x x)) becomes the x’s in the function body f (x x).

So if you were to substitute the second (λx.f (x x)) into the first (λx.f (x x)), you would get

Y = λf.(f ((λx.f (x x)) (λx.f (x x))))

The only difference is the extra f. As a result, you could again pass the second (λx.f (x x)) as an argument into the first (λx.f (x x)). Giving you

Y = λf.(f (f ((λx.f (x x)) (λx.f (x x)))))

which adds another f. And of course, you can do this as many times as you want, giving you

Y = λf.(f (f (f (f (f (f ... ((λx.f (x x)) (λx.f (x x)))))...))))

In other words, the Y combinator takes a function f and returns another function that looks like f “applied” to itself an infinite number of times. In other words, Y gives us something that looks a lot like recursion despite not being recursive itself.

Note: This infinite “recursion” of f’s is still itself a function. We’re not yet in the territory of working with primitive types like Int for a concrete example like factorial :: Int -> Int. All we’ve done is shown that Y is inherently “recursive” despite not referencing itself.

Looking at this definition of Y might raise a couple question.

Q: Can we even define Y without it leading to a stack overflow?

This is an implementation detail and involves some form of lazy-evaluation, where the first f only calls the second f if needed, and the second f only calls the third f if needed, and so on and so forth. In other words, it doesn’t happen all at once.

Q: Aren’t recursive functions only useful if they can stop at some point and return a value? How would this function know when to stop?

The implementation of f (not the implementation of Y) contains the “knowledge” of when to stop and return a value. We can illustrate this with an example. Suppose we wanted to use Y to implement factorial. What function would we use as input into Y to give us factorial? Suppose we defined the following function:

; Credit to Mike Vanier for the tutorial
; Source: https://mvanier.livejournal.com/2897.html

(define (almost-factorial f)
  (lambda (n)
    (if (= n 0)
        1
        (* n (f (- n 1))))))

Note that this function is also not recursive despite looking pretty close to the implementation of factorial we had above. Instead of referencing itself, it takes as a parameter a function and can either halt (if n equals 0) or invoke the parameter.

If we passed this function almost-factorial to Y, Y would expand it out and give us something like (almost-factorial (almost-factorial (almost-factorial ...))), which is exactly the factorial function we want.

If it isn’t clear why applying almost-factorial to itself an infinite number of times would give us exactly the factorial function we want, first notice that almost-factorial by itself works if n is 0. For example, (almost-factorial 0) would give us the correct answer. However, for n = 1 or higher, it would try and invoke f which doesn’t exist, and thus the function would fail.

On the other hand,(almost-factorial almost-factorial) would work for both n = 0 and n = 1, but not for n = 2 or higher. Continuing one more time, (almost-factorial (almost-factorial almost-factorial)) would work for n <= 2. By induction, using Y to expand almost-factorial as many times as necessary, we’ll have a functioning factorial implementation.

Summary

Step 1: We start with a recursive function.

(define (factorial)
  (lambda (n)
    (if (= n 0)
        1
        (* n (factorial (- n 1))))))

Step 2: We then remove the self-referential call and instead pass it in as a parameter.

; Credit to Mike Vanier for the tutorial
; Source: https://mvanier.livejournal.com/2897.html

(define (almost-factorial f)
  (lambda (n)
    (if (= n 0)
        1
        (* n (f (- n 1))))))

We know the above function would work as intended if we passed it to itself as an argument.

Step 3: The Y combinator takes care of passing a function to itself as many times as necessary.

; Credit to Mike Vanier for the tutorial
; Source: https://mvanier.livejournal.com/2897.html

(define Y 
  (lambda (f)
    ((lambda (x) (f (lambda (y) ((x x) y))))
     (lambda (x) (f (lambda (y) ((x x) y)))))))

This gives us recursion as intended.

Further Reading

footnotes

  1. Functions that don’t have access to free variables (i.e. variables that aren’t parameters or that aren’t defined in the body of the function) are call combinators. For example, the definition of factorial above is not a combinator, since the reference to factorial in the last line is a reference to a free variable.

  2. Anonymous functions by themselves cannot be recursive since they don’t have a name to self-reference. For a similar reason, combinators cannot by themselves be recursive. For the purposes of this discussion, anonymous functions, non-recursive functions, and combinators are the same.