Binary lambda calculus

From Esolang

Jump to: navigation, search

Binary lambda calculus (BLC) is a version of lambda calculus with provisions for binary I/O, a standard binary encoding of lambda terms, and a designated universal machine.

NOTE: The Wikipedia version of this page is properly formatted and more up to date.

Contents

[edit] Background

BLC is designed to provide a very simple and elegant concrete definition of descriptional complexity (Kolmogorov complexity).

Roughly speaking, the complexity of an object is the length of its shortest description.

To make this precise, we take descriptions to be bitstrings, and identify a description method with some computational device, or machine, that transforms descriptions into objects. Objects are usually also just bitstrings, but can have additional structure as well, e.g., pairs of strings.

Originally, Turing machines were used for this purpose, being perhaps the most well known formalism for computation. But they are somewhat lacking in ease of construction and composability. Another classical computational formalism, the lambda calculus, offers distinct advantages in ease of use. The lambda calculus doesn't incorporate any notion of I/O though, so one needs to be designed.

[edit] Binary I/O

Adding a readbit primitive function to lambda calculus, as Chaitin did for LISP, would destroy its referential transparency, unless one distinguishes between an I/O action and its result, as Haskell does with its monadic I/O. But that requires a type system, an unnecessary complication.

Instead, BLC requires translating bitstrings into lambda terms, to which the machine (itself a lambda term) can be readily applied.

Bits 0 and 1 are translated into the standard lambda booleans B0 = True and B1 = False:

True = λx λy . x
False = λx λy . y

which can be seen to directly implement the if-then-else operator.

Now consider the pairing function

〈,〉 = λx λy λz . z x y

applied to two terms M and N

M, N 〉 = λz . z M N.

Applying this to a boolean yields the desired component of choice.

A string s = b0b1bn−1 is represented by repeated pairing as

〈B0, 〈B1 … 〈Bn−1, z 〉 … 〉〉 which is denoted as s : z.

The z appearing in the above expression requires some further explanation.

[edit] Delimited versus undelimited

Descriptional complexity actually comes in two distinct flavors, depending on whether the input is considered to be delimited.

Knowing the end of your input makes it easier to describe objects. For instance, you can just copy the whole input to output. This flavor is call plain or simple complexity.

But in a sense it is additional information. A file system for instance needs to separately store the length of files. The C language, uses the null character to denote the end of a string, but this comes at the cost of not having that character available within strings.

The other flavor is called prefix complexity, named after prefix codes, where the machine needs to figure out from the input read so far, whether it needs to read more bits. We say that the input is self-delimiting. This works better for communication channels, since one can send multiple descriptions, one after the other, and still tell them apart.

In the I/O model of BLC, the flavor is dictated by the choice of z. If we keep it as a free variable, and demand that it appears as part of the output, then the machine must be working in a self-delimiting manner. If on the other hand we use a lambda term specifically designed to be easy to distinguish from any pairing, then the input becomes delimited. BLC chooses False for this purpose but gives it the more descriptive alternative name of Nil. Dealing with lists that may be Nil is straightforward: since

x, yM N = M x y N, and
Nil M N = N

one can write function M and N to deal with the two cases, the only caveat being that N will be passed to M as its third argument.

[edit] Universality

Wouldn't it be nice if a single description method is as good as or better than any other method? Fortunately, such methods exist!

That is, we can find a description method U such that for any other description method D, there is a constant c (depending only on D) such that no object takes more than c extra bits to describe with method U than with method D. And BLC is designed to make these constants relatively small. In fact the constant will be the length of a binary encoding of a D-interpreter written in BLC, and U will be a lambda term that parses this encoding and runs this decoded interpreter on the rest of the input. U won't even have to know whether the description are delimited or not; it works the same either way.

Having already solved the problem of translating bitstring into lambda calculus, we now face the opposite problem: how to encode lambda terms into bitstrings?

[edit] Lambda encoding

First we need to write our lambda terms in a particular notation using what is known as De Bruijn indices. Our encoding is then defined recursively as follows

<math>\widehat{\lambda M}= 00\widehat{M}</math>
<math>\widehat{M\ N}=01\widehat{M}\widehat{N}</math>
<math>\widehat{i}=1^i0</math>

For instance, the pairing function is written <math>\lambda \lambda \lambda. 1 3 2</math> in De Bruijn format, which has encoding <math>00\ 00\ 00\ 01\ 01\ 10\ 1110\ 110</math>.

A closed lambda term is one in which all variables are bound, i.e. without any free variables. In De Bruijn format, this means that an index i can only appear within at least i nested lambdas. The number of closed terms of size n bits is given by sequence A114852 of the On-Line Encyclopedia of Integer Sequences.

The shortest possible closed term is the identity function <math>\widehat{\lambda 1} = 0010</math>. In delimited mode, this machine just copies its input to its output. cat in half a byte!

So, what is this universal machine U?

Here it is, in De Bruijn format (all indices are single digit):

<math>(\lambda 1 1) (\lambda (\lambda \lambda \lambda 1 (\lambda \lambda 1 (\lambda 3 (6 (\lambda 2 (6 (\lambda \lambda 3 (\lambda 1 2 3))) (7 (\lambda 7 (\lambda 3 1 (2 1)))))) (1 (5 (\lambda 1 2)) (\lambda 7 (\lambda 7 (\lambda 2 (1 4))) 3))))) (1 1)) (\lambda 1 ((\lambda 1 1) (\lambda 1 1)))</math>

and in all its binary glory:

01010001101000010000000110000001100001011110011111110000101
11001111111000000111100001011011011100111111110000111111110
00010111101001110100101100111111000011011000010111111110000
11111111000011100110111101110011010000110010001101000011010

[edit] Complexity, concretely

In general, we can make complexity of an object conditional on several other objects that are provided as additional argument to the universal machine. Plain (or simple) complexity KS and prefix complexity KP are defined by

<math>KS(x|y_1,\ldots,y_{k}) = \min \{ \ell(p)\ |\ U\ (p:Nil)\ y_1\ \ldots\ y_{k} = x \}</math>
<math>KP(x|y_1,\ldots,y_{k}) = \min \{ \ell(p)\ |\ U\ (p:z)\ y_1\ \ldots\ y_{k} = \langle x,z \rangle \}</math>

[edit] Theorems, concretely

The identity program <math>\lambda 1</math> proves that

<math>KS(x) \leq \ell(x) + 4</math>

The program <math>\lambda \lambda (\lambda 1 1) (\lambda (\lambda \lambda \lambda 1 (\lambda \lambda \lambda \lambda 1 (\lambda 8 (\lambda 8 (\lambda 1 3 2)) ((\lambda 1 1) (\lambda (\lambda \lambda 1 (\lambda \lambda 2 (1 (\lambda \lambda \lambda \lambda 1 (\lambda \lambda 1) (8 5)) 1) (\lambda 1 (\lambda \lambda 2) 2))) (1 1)) 6))) (2 1)) (1 1)) (\lambda \lambda \lambda 1 3 2) 1 2</math> proves that

<math>KP(x|\ell(x)) \leq \ell(x)+239</math>

The program <math>(\lambda 1 1) (\lambda \lambda \lambda 1 (\lambda 1 (3 (\lambda \lambda 1)) (4 4 ((\lambda 1 1) (\lambda (\lambda \lambda \lambda 1 (\lambda \lambda \lambda \lambda 1 (\lambda 8 (\lambda 8 (\lambda 1 3 2)) ((\lambda 1 1) (\lambda (\lambda \lambda 1 (\lambda \lambda 2 (1 (\lambda \lambda \lambda \lambda 1 (\lambda \lambda 1) (8 5)) 1)</math> <math> (\lambda 1 (\lambda \lambda 2) 2))) (1 1)) 6))) (2 1)) (1 1)) (\lambda 4 ((\lambda 1 1) (\lambda \lambda \lambda 2 (\lambda \lambda \lambda 4 (\lambda \lambda 5 1 2) (3 (\lambda 1) (6 6) 2)) (1 (\lambda \lambda 2) 2)) 1)))))) (\lambda \lambda \lambda 1 3 2)</math> proves that

<math>KP(x) \leq \ell(\overline{x})+401</math>

where <math>\overline{x}</math> is a self-delimiting code for x defined by

<math>\begin{array}{ll}

\overline{0} & = 0 \\ \overline{n+1} & = 1\ \overline{\ell(n)} \\ \end{array}</math> in which we identify numbers and bitstrings according to lexicographic-in-reverse order. This code has the nice property that for all k,

<math>\ell(\overline{n}) \leq \ell(n)+\ell(\ell(n))+\cdots+ \ell^{k-1}(n) + O(\ell^k(n))</math>
Number String Delimited
0 0
1 0 10
2 1 110 0
3 00 110 1
4 10 1110 0 00
5 01 1110 0 10
6 11 1110 0 01
7 000 1110 0 11
8 100 1110 1 000
9 010 1110 1 100

[edit] Symmetry of information

The program

<math>(\lambda (\lambda (\lambda \lambda (\lambda 1 2 (\lambda \lambda 3 1 (5 4) (\lambda \lambda \lambda 1 (\lambda 1 6 4) 2))) ((\lambda 1 1) (\lambda (\lambda \lambda \lambda 1 (\lambda \lambda 1 (\lambda 3 (6 (\lambda 2 (6 (\lambda \lambda 3 (\lambda 1 2 3))) (7 (\lambda 7 (\lambda 3 1 (2 1)))))) (1 (5 (\lambda 1 2))</math> <math>(\lambda 7 (\lambda 7 (\lambda 2 (1 4))) 3))))) (1 1)) (\lambda 1 1))) ((\lambda 1 1) (\lambda (\lambda \lambda \lambda 1 (\lambda \lambda 1 (\lambda (\lambda 4 (7 (\lambda \lambda 4 (8 (\lambda 4 (3 1)) (\lambda \lambda \lambda 3 4)) (9 (\lambda \lambda 10 (\lambda 6 (5 (3 1))) (\lambda \lambda \lambda 2 6 4)))))</math> <math>(\lambda (\lambda 1 1) (\lambda \lambda \lambda 1 (\lambda 1 (3 (\lambda 1 (\lambda 1))) (4 4 (\lambda 4 (16 1))))) (\lambda 8 (\lambda 4 (2 (\lambda \lambda 2) 3 1)) (\lambda \lambda \lambda 1 4)) 4)) (\lambda \lambda 1 5 (\lambda 1 4 3))))) (1 1)) (\lambda \lambda \lambda 3 ((\lambda 1 1) (\lambda \lambda \lambda </math> <math>\lambda \lambda (\lambda 1 1) (\lambda (\lambda \lambda \lambda 2 (\lambda \lambda \lambda \lambda 3 (\lambda 8 5 (\lambda 1 2 7))) (\lambda \lambda 5 2 3 (\lambda 1 (6 2 4)) (\lambda \lambda \lambda 1) (\lambda \lambda \lambda 1)) (\lambda 1 (\lambda \lambda 1 (\lambda \lambda 2)) 2)) (1 1)) (\lambda \lambda \lambda 2 </math> <math> (\lambda \lambda \lambda 2 10 (9 (\lambda \lambda \lambda 1 11))) (12 (\lambda \lambda 1))) (\lambda \lambda \lambda \lambda 2) 2 2 (1 (\lambda 6 6 5 (\lambda 5 (\lambda \lambda \lambda 3 (\lambda \lambda \lambda 2 (\lambda \lambda \lambda 2 (\lambda \lambda \lambda 1 (\lambda 1 (\lambda 1))) (20 11)) 7))) (11 3)))) </math> <math>2 (\lambda 1) (\lambda 1 (\lambda 1)) 1 (\lambda \lambda 2) 1 (\lambda \lambda 1))))) (\lambda \lambda \lambda \lambda 3 (\lambda \lambda \lambda 3 (\lambda \lambda \lambda 1 (10 11 (\lambda 1) (\lambda 1 (\lambda 1))))))) (\lambda \lambda \lambda 1 (\lambda 4 (\lambda 4 (\lambda \lambda 1 4 (3 2)))))</math>

proves that

<math>KP(x,y) \leq KP(x) + KP(y|x^{\ast}) + 1388</math>

where <math>x^{\ast}</math> is a shortest program for x.

This inequality is the easy half of an equality (up to a constant) known as Symmetry of information.

[edit] A quine

The term <math>Q=\lambda (\lambda 1 1) (\lambda \lambda 1 (\lambda \lambda \lambda \lambda 1 4 (6 6 3 2))) 1 1</math> concatenates two copies of its input, proving that

<math>KS(x x) \leq \ell(x) + 74</math>

Applying it to its own encoding gives a 148 bit quine:

<math>U (\widehat{Q}\widehat{Q}:Nil) = \widehat{Q}\widehat{Q}</math>

[edit] Compression

So far, we've seen surprisingly little in the way of actually compressing an object into a shorter description; in the last example, we were only breaking even. For <math>\ell(x)>74</math> though, we do actually compress <math>xx</math> by <math>\ell(x)-74</math> bits.

What could be the shortest program that produces a larger output? The following is a good candidate: the program <math>(\lambda 1 1 1 1 (\lambda \lambda 1 (\lambda \lambda 1) 2)) (\lambda \lambda 2 (2 1))</math>, of size 55 bits, uses Church numerals to output exactly <math>2^{2^{2^2}}=65536</math> ones. That beats both gzip and bzip2, who need 360 and 352 bits respectively, to describe the same output (as a 8192 byte file with a single letter name).

[edit] Halting probability

The halting probability of the prefix universal machine is defined as the probability it will output any term that has a normal form (this includes all translated strings):

<math>\Omega_{\lambda} = \sum_{U (p:z) = \langle x,z \rangle,\,x\in NF} 2^{-\ell(p)}</math>

With some effort, we can determine the first 4 bits of this particular number of wisdom:

<math>\Omega_{\lambda} = .0001\ldots_2</math>

where probability .00012 = 2−4 is already contributed by programs 00100 and 00101 for terms True and False.

[edit] BLC8: byte sized I/O

While bit streams are nice in theory, they fair poorly in interfacing with the real world. The language BLC8 is a more practical variation on BLC in which programs operate on a stream of bytes, where each byte is represented as a delimited list of 8 bits in little-endian order. Adapting U to this model gives a universal byte oriented machine U8. One approach for U8 is to convert the byte-stream into a bitstream prior to parsing a lambda term. After parsing, the remainder of the bitstream is then converted back into a byte stream. Programs can be easily padded to use a whole number of bytes. For instance, a closed term M is equivalent to <math>(\lambda M)n</math>, and one can always choose <math>1 \leq n \leq 8</math> such that <math>\ell(\overline{(\lambda M)n}) = \ell(\overline{M}) + n+5</math> is a multiple of 8.

[edit] Brainfuck

The following BLC8 program

<math>

(\lambda (\lambda (\lambda 1 1) (\lambda (\lambda \lambda \lambda 1 (\lambda 1 (\lambda \lambda 1 (\lambda \lambda 1 (\lambda \lambda 1) (\lambda \lambda 1) (\lambda \lambda (\lambda 3 (1 (7 (5 (\lambda \lambda \lambda \lambda \lambda 1 (\lambda 6 (\lambda 1) 1 4 3)) (\lambda \lambda 2 (\lambda \lambda 1 3 2) 1))(\lambda \lambda 2 (\lambda 1) ((\lambda 1 1) </math> <math> (\lambda \lambda (\lambda \lambda 1 (\lambda \lambda \lambda \lambda 4 (1 (\lambda \lambda 1) (7 6 (\lambda 1) 3)) (1 (\lambda \lambda 2) (7 (\lambda 1) 6 3))) 1) (2 2 1)) 7 1)))) (7 (1 (5 (\lambda \lambda \lambda \lambda 2 (\lambda \lambda 6 (\lambda 1) 2 1 (\lambda 1 6 4))) (\lambda \lambda \lambda \lambda 1 (\lambda 5 (\lambda 1) 1 </math> <math> (\lambda 1 5 4))))) (5 (10 (\lambda 1 (\lambda 1))) (11 (\lambda 2 ((\lambda 1 1) (\lambda \lambda \lambda \lambda (\lambda 1 1) (\lambda \lambda 1 (\lambda \lambda \lambda 3 (5 5 2) (\lambda \lambda 2)) 1) 1 (3 (\lambda 5 5 4 (\lambda 4 (\lambda 3 (2 1))))) (2 (\lambda 1)) 1) 1)))))) </math> <math> (\lambda 11 (\lambda 11 (\lambda 3 (\lambda 3 (\lambda 3 (\lambda 3 (2 1)))))))))))) (1 1)) (\lambda 1 (\lambda \lambda \lambda \lambda 4) 3 2 2)) ((\lambda 1 1) (\lambda \lambda \lambda 1 2 (3 3 2)) 1)) ((\lambda \lambda 2 (2 (2 1))) (\lambda \lambda 2 (2 1)) (\lambda \lambda 1 (\lambda \lambda 2) 2) (\lambda \lambda 1)) </math>

is an unbounded tape Brainfuck interpreter in 936 bits (or, equivalently, 117 bytes). The formatting obscures the fact that line 3 has two double digit indices (10 and 11), while line 4 starts with another 2 occurrences of 11.

This provides a nice example of the property that universal description methods differ by at most a constant in complexity. Writing a BLC8 interpreter in Brainfuck, which would provide a matching upper bound in the other direction, is left as an exercise for die-hard Brainfuck programmers.

The interpreter expects its input to consist of a Brainfuck program followed by a ] followed by the input for the Brainfuck program. The interpreter only looks at bits 0,1,4 of each character to determine which of ,-.+<>][ it is, so any characters other than those 8 should be stripped from a Brainfuck program. Any unconsumed input is appended to the output of the Brainfuck program. Consuming more input than is available results in an error (the non-list result <math>\lambda x.x</math>).

To conform to the more standard behaviour of not changing the current cell on reaching EOF, replace the fragment <math>1 (\lambda 6 (\lambda 1) 1 4 3)</math> on line 1 with <math>(\lambda 2 1 (1 5 2 1)) (\lambda \lambda \lambda 8 (\lambda 1) 3 6 5 2)</math> at a cost of 45 more bits.

This interpreter is a rough translation of the following version written in Haskell

putc = do (     _,      _,b,      _) <- get; tell [b]
getc = do (  left,  right,_,b:input) <- get; put (  left,  right,     b,input)
prev = do (l:left,  right,b,  input) <- get; put (  left,b:right,     l,input)
next = do (  left,r:right,b,  input) <- get; put (b:left,  right,     r,input)
decr = do (  left,  right,b,  input) <- get; put (  left,  right,pred b,input)
incr = do (  left,  right,b,  input) <- get; put (  left,  right,succ b,input)
loop body = do (_,_,b,_) <- get; when (b /= '\0') (body >> loop body)
parseInstr = liftM loop (between (char '[') (char ']') parseInstrs)
            <|> (char '<' >> return prev)
            <|> (char '>' >> return next)
            <|> (char '-' >> return decr)
            <|> (char '+' >> return incr)
            <|> (char '.' >> return putc)
            <|> (char ',' >> return getc)
            <|> (noneOf "]" >> return (return ()))
parseInstrs = liftM sequence_ (many parseInstr)
main = do [name] <- getArgs
         source <- readFile name
         let init = (repeat '\0', repeat '\0', '\0', repeat  '\0')
         putStrLn $ either show (execWriter . (`evalStateT` init)) (parse parseInstrs
name source)

[edit] References

  • Tromp, John, Binary Lambda Calculus and Combinatory Logic, in Randomness And Complexity, from Leibniz To Chaitin, ed. Cristian S. Calude, World Scientific Publishing Company, Oct 2008. (pdf version).

[edit] See also

[edit] External links

Personal tools