CSCI 270: Introduction to Algorithms and Theory of Computing
Fall 2017
Prof: Leonard Adleman
Scribe: Joseph Bebel

We will now discuss computer programs, a concrete manifestation of what we've been calling algorithms. Computer programs are written in specific programming languages. Here, for simplicity, we will consider a simplified version of BASIC that we call SBASIC. It has the usual arithmetic operations and the usual decision logic that most programming languages have (if/then/else, for loops, goto, etc.).

One major simplification we will make is to assume that the input and output of an SBASIC program are natural numbers, and that input can only occur in the first line of a program and output can only occur on the last line. We say that an SBASIC program halts on a given input if, when executed on that input, the program reaches this last line and produces some output. Note that some SBASIC programs may halt on some inputs and not halt on others.

One important concept is that each SBASIC program has an index. Let us list the SBASIC programs by the length of their source code, breaking ties between programs of the same length in lexicographical order. Then there is a 1st SBASIC program, a 2nd SBASIC program, and so on (ad infinitum). Since every program appears at exactly one position in this infinite list, each SBASIC program has a unique natural number as its index, and each natural number is the index of a unique SBASIC program. We write $B_i$ for the SBASIC program with index $i$. You can also think of the index as the natural number (binary string) corresponding to that SBASIC program's source code.

We can ask, for a set of natural numbers, whether there exists an SBASIC program that decides if a number is in that set or not:

The Decision Problem for $S \subseteq \mathbb{N}$
INPUT: $n$, a natural number
OUTPUT: 1 if $n \in S$; 0 if $n \notin S$

For example, consider the set $E = \{0, 2, 4, 6, 8, 10, \ldots\}$ of even numbers. The decision problem for $E$ is then: on input an even number, output 1; otherwise, output 0.

DEF 1. A subset $S \subseteq \mathbb{N}$ is decidable iff there exists an SBASIC program that solves the decision problem for $S$.

DEF 2. A subset $S \subseteq \mathbb{N}$ is undecidable iff there does not exist an SBASIC program that solves the decision problem for $S$.

Note that the set $E$ is decidable: the SBASIC program just needs to look at one bit of the input (the lowest-order bit of its binary representation) to decide whether the input is even or odd.
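To make the indexing and the decider for $E$ concrete, here is a small Python sketch (Python stands in for SBASIC, which we cannot run). It assumes source code is a binary string over "01" and lists strings shortest-first, breaking ties lexicographically, so every string reaches a finite position. The function names are hypothetical; `index_of` gives a 0-based position (add 1 to match "1st, 2nd, ..."), and the sketch ignores which strings are syntactically valid SBASIC programs. `decide_even` solves the decision problem for $E$ by inspecting one bit.

```python
from itertools import count, islice, product

# Assumption: source code is viewed as a binary string over the alphabet "01".
ALPHABET = "01"


def shortlex(alphabet: str = ALPHABET):
    """Yield every string over `alphabet`: shorter strings first, ties broken lexicographically."""
    for length in count(0):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)


def index_of(s: str) -> int:
    """0-based position of binary string s in shortlex order: read '1' + s in binary, subtract 1."""
    return int("1" + s, 2) - 1


def decide_even(n: int) -> int:
    """Solve the decision problem for E by inspecting only the lowest-order bit of n."""
    return 1 if (n & 1) == 0 else 0


if __name__ == "__main__":
    for i, s in enumerate(islice(shortlex(), 7)):
        print(i, repr(s), index_of(s))
    print([decide_even(n) for n in range(6)])
```

The closed form in `index_of` works because prefixing a 1 turns each length class of binary strings into a consecutive block of integers.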
We can now pose the following question: given an SBASIC program and an input, does the program halt on that input? We can ask this question more precisely about SBASIC programs that are given their own index as input. It turns out that there is no SBASIC program that solves that decision problem:

Theorem 1 (Undecidability of the Halting Problem). $K = \{i \mid B_i \text{ on input } i \text{ halts}\}$ is undecidable.

The Decision Problem for $K$
INPUT: $n$, a natural number
OUTPUT: 1 if $n \in K$; 0 if $n \notin K$

Proof. Assume there exists an SBASIC program $P$ that solves the decision problem for $K$. Consider the following SBASIC program:

    INPUT G
    execute P on input G
    if P outputs 1:
        A: GOTO A
    else if P outputs 0:
        PRINT 0

Let $c$ be the index of this program, so that it is $B_c$. Is $c \in K$?

If $c \in K$, then since $P$ solves the decision problem for $K$, $P$ outputs 1 on input $c$. Therefore, $B_c$ on input $c$ goes into an infinite loop. Therefore, $c \notin K$, which is a contradiction.

If $c \notin K$, then since $P$ solves the decision problem for $K$, $P$ outputs 0 on input $c$. Therefore, $B_c$ on input $c$ prints 0 and halts. Therefore, $c \in K$, which is a contradiction.

Since we reach a contradiction in both cases, our original assumption must be false: there does not exist an SBASIC program that solves the decision problem for $K$, and $K$ is undecidable.
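The diagonal construction in the proof of Theorem 1 can be mimicked in Python under a hypothetical model: a "program" is a generator function whose every `yield` is one step and whose `return v` means "halt and output v", and a program is passed to itself directly instead of via its index. This is an illustration, not a proof: it only checks, for two specific candidate deciders, that the constructed program does the opposite of what the decider predicts. A finite step budget cannot prove that a program loops; here we know it does by construction.

```python
from typing import Callable, Generator, Optional

# Hypothetical model of a "program": a generator function of one argument.
# Each `yield` is one step of computation; `return v` means "halt and output v".
Program = Callable[[object], Generator[None, None, int]]
Decider = Callable[[Program, object], bool]


def run(prog: Program, arg: object, max_steps: int) -> Optional[int]:
    """Run prog on arg for at most max_steps steps; return its output, or None if it has not halted."""
    gen = prog(arg)
    for _ in range(max_steps):
        try:
            next(gen)
        except StopIteration as stop:
            return stop.value
    return None


def diagonal(halts: Decider) -> Program:
    """Build the program from the proof, given a claimed halting decider `halts` (playing P)."""

    def b(g: object) -> Generator[None, None, int]:
        if halts(b, g):  # P says "halts" (outputs 1) ...
            while True:  # ... so loop forever, like A: GOTO A
                yield
        return 0  # P says "does not halt" (outputs 0), so PRINT 0 and halt

    return b


if __name__ == "__main__":
    candidates = {"always 1": lambda p, x: True, "always 0": lambda p, x: False}
    for name, halts in candidates.items():
        b = diagonal(halts)
        predicted = halts(b, b)
        actual = run(b, b, max_steps=10_000) is not None
        print(f"{name}: predicts halts={predicted}, halted within budget={actual}")
```

Whatever a candidate decider answers about `b` on input `b`, the construction makes `b` behave the other way, which is exactly the case split in the proof.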
So have we proven that the decision problem for $K$ is not solvable? Not quite: we've shown that there does not exist an SBASIC program that solves the decision problem for $K$, but maybe you can solve it in some other programming language, like C++. We claim that, in fact, every program in another programming language can be translated (compiled) into an SBASIC program with equivalent behavior. This idea has been explored over the years and has resulted in something we might call Church's Thesis:

Church's Thesis: If an algorithm has an input/output behavior, then there is an SBASIC program with the same input/output behavior.

Note that Church's Thesis is philosophy, not science or mathematics. It says that every machine you can conceivably build in the real world, whatever its input/output behavior, can be simulated by some equivalent SBASIC program. Church's Thesis is more frequently stated in terms of Turing machines rather than SBASIC programs. But Turing machines are horrible to program, and in principle it doesn't matter: SBASIC programs (or C++ programs, or Java programs, etc.) are all equivalent in this regard. We cannot prove a philosophical claim, but if we accept Church's Thesis, then whatever we prove about SBASIC programs applies to all computers and computer programs, including those that might exist in the future.

So the tool (the computer) that (presumably) we are dedicating our lives to studying and using is fundamentally flawed. In fact, most problems are undecidable (there are only countably many SBASIC programs but uncountably many subsets of $\mathbb{N}$), and most problems
which are decidable are not solvable in polynomial time. These are the fundamental limits of the tool.

Another point to make about the Undecidability of the Halting Problem: it depends heavily on self-reference, or what is sometimes called diagonalization. It is like the sentence "This sentence is false": it cannot be true, because then it would imply its own falsehood, and it cannot be false, because then it would be true. While such logical contradictions are more mathematical toys than anything else, the basic concept of self-reference is critical to the Undecidability of the Halting Problem.

Let's consider another thought experiment. Can we build a robot that can assemble exact copies of itself? Let's say that we build such a robot and leave it in a big warehouse full of its parts. If we build a sufficiently sophisticated robot, maybe it can assemble the parts into a copy of itself. But then there remains the question of how the new robot gets its software. Somehow, it has to be copied from the old robot. Maybe this is possible by ensuring that the software is stored in some readable form on the robot's disk. But actually, that is not necessary.

Kleene Recursion Theorem: Put simply, every program can compute its own index. That is, every program can have a line of code that performs the operation j = MY_INDEX. So a program, midway through its execution, can somehow (we don't say how) obtain a reference to, or a copy of, itself.

Why is this important? Consider the following set and the decision problem for that set:

$S = \{i \mid B_i \text{ on input } 0 \text{ outputs } 1\}$

At first glance, it may not be clear whether this problem is decidable or undecidable. On one hand, it seems just as difficult as the decision problem for $K$. On the other hand, it is not immediately clear how to obtain self-reference: for the decision problem for $K$, we only talk about the behavior of each SBASIC program on input its own index, so the program in the proof of Theorem 1 can simply treat its input as its own index.
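A familiar relative of the Kleene Recursion Theorem is a quine: a program that prints its own source text without reading any file. The Python sketch below is only an illustration of that kind of self-reference, not a proof of the theorem. The program obtains its own text by construction: a template with a hole (`%r`) is filled with its own quoted form. Treating the ASCII source as one big binary number (`index_of`, a hypothetical name) is our assumption standing in for a program's index.

```python
import contextlib
import io

# TEMPLATE is a program text with a hole (%r) where its own quoted text goes.
TEMPLATE = 's = %r\nprint(s %% s)'
# Filling the hole with TEMPLATE itself yields a complete two-line program.
SOURCE = TEMPLATE % TEMPLATE


def run_program(src: str) -> str:
    """Execute Python source text and return everything it prints."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(src, {})
    return buf.getvalue()


def index_of(src: str) -> int:
    """Hypothetical stand-in for an index: the ASCII source read as one binary number."""
    return int.from_bytes(src.encode("ascii"), "big")


if __name__ == "__main__":
    print(SOURCE)
    print(run_program(SOURCE) == SOURCE + "\n")  # compares the program's output with its own text
    print(index_of(SOURCE))
```

A program with a line like j = MY_INDEX can be built the same way: embed a template of the rest of the program and compute the index of the filled-in text.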
However, using the Kleene Recursion Theorem, we can show that this set is also undecidable:

Theorem 2. $S$ is undecidable.

Proof. Assume there exists an SBASIC program $P$ that solves the decision problem for $S$. Consider the following SBASIC program:

    N ← MY_INDEX
    execute P on input N
    if P outputs 1:
        A: GOTO A
    else if P outputs 0:
        PRINT 1

Let $c$ be the index of this program, so that it is $B_c$ and, by the Kleene Recursion Theorem, its first line sets N = c. Is $c \in S$?
If $c \in S$, then since $P$ solves the decision problem for $S$, $P$ outputs 1 on input $c$. Therefore, $B_c$ on input 0 goes into an infinite loop and never outputs 1. Therefore, $c \notin S$, which is a contradiction.

If $c \notin S$, then since $P$ solves the decision problem for $S$, $P$ outputs 0 on input $c$. Therefore, $B_c$ on input 0 prints 1. Therefore, $c \in S$, which is a contradiction.

Since we reach a contradiction in both cases, our original assumption must be false: there does not exist an SBASIC program that solves the decision problem for $S$, and $S$ is undecidable.

Now consider the set of programs that halt on every input:

$T = \{i \mid (\forall n)[B_i \text{ on input } n \text{ halts}]\}$

Theorem 3. $T$ is undecidable.

Proof. Assume there exists an SBASIC program $P$ that solves the decision problem for $T$. Consider the following SBASIC program (which ignores its input):

    N ← MY_INDEX
    execute P on input N
    if P outputs 1:
        A: GOTO A
    else if P outputs 0:
        PRINT 0

Let $c$ be the index of this program, so that it is $B_c$. Is $c \in T$?

If $c \in T$, then since $P$ solves the decision problem for $T$, $P$ outputs 1 on input $c$. Therefore, $B_c$ goes into an infinite loop on every input. Therefore, $c \notin T$, which is a contradiction.

If $c \notin T$, then since $P$ solves the decision problem for $T$, $P$ outputs 0 on input $c$. Therefore, $B_c$ prints 0 and halts on every input. Therefore, $c \in T$, which is a contradiction.

Since we reach a contradiction in both cases, our original assumption must be false: there does not exist an SBASIC program that solves the decision problem for $T$, and $T$ is undecidable.
So there are decision problems that cannot be solved by SBASIC programs. Can we still say something (computationally) about these undecidable sets, or are computers totally useless here? We will introduce the concept of recursively enumerable (listable) sets; to do so, we relax the definition of SBASIC program to allow output at arbitrary lines (and to continue executing after producing output).

DEF 3. A set $S \subseteq \mathbb{N}$ is recursively enumerable, or listable, iff there exists an SBASIC program $P$ that (given no input) prints a sequence of numbers such that:
1. $P$ never prints a number not in $S$;
2. for every element $n \in S$, $P$ eventually outputs $n$ (given enough time).

Note that if the set $S$ has infinitely many elements, then a program that lists $S$ must run forever.

Every decidable set $S$ is also listable: loop over all numbers $n = 0, 1, 2, \ldots$, run the program that solves the decision problem for $S$ on $n$, and if it outputs 1, output $n$; then go on to the next number. But is every listable set also decidable?

Theorem 4. $K$ is listable.

Proof. The following program lists $K$:

    for i ← 1 to ∞:
        for j ← 1 to i:
            execute B_j on input j for i steps
            if B_j halted within i steps: output j
            otherwise, continue to the next j

We need to show that the output satisfies the definition of listable. First, assume a number $n$ is output by this program: then $B_n$ was run on input $n$ for some number of steps and halted. Therefore $n \in K$, so the program only outputs elements of $K$. Second, does it output every element of $K$? Assume $n \in K$. Then $B_n$ halts on input $n$ after $t$ steps, for some number $t$. In the iteration of the outer loop with $i = \max(n, t)$, the inner loop reaches $j = n$, so $B_n$ is run on input $n$ for $i \geq t$ steps, halts, and $n$ is output. (The program outputs each element of $K$ infinitely many times, once for every sufficiently large $i$; the definition of listable allows this.) Therefore, $K$ is listable.

Are there any unlistable sets?

Theorem 5. For all $X \subseteq \mathbb{N}$, if $X$ is undecidable, then either $X$ is not listable or $\overline{X}$ is not listable.

Proof. Assume $P$ lists $X$ and $P'$ lists $\overline{X}$, the complement $\mathbb{N} \setminus X$. Consider the following program:
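    INPUT G
    for i ← 1 to ∞:
        run P until it outputs i numbers
        if the i-th number output by P is G: output 1
        run P' until it outputs i numbers
        if the i-th number output by P' is G: output 0

This program decides $X$. Since $X$ is undecidable, both $X$ and $\overline{X}$ are infinite (every finite set, and every set with a finite complement, is decidable), so $P$ and $P'$ each produce infinitely many outputs and every "run ... until it outputs i numbers" step finishes. Every number $n$ is in either $X$ or $\overline{X}$, so either $P$ or $P'$ eventually outputs $n$. By running both listers in alternation until one of them outputs $n$, we decide whether $n \in X$. This contradicts the assumption that $X$ is undecidable, so it must be the case that either $X$ or $\overline{X}$ is not listable.

Theorem 6. $\overline{K}$ is not listable.

Proof. $K$ is undecidable (Theorem 1) and listable (Theorem 4), so by Theorem 5, $\overline{K}$ is not listable.

Is it possible for both a set and its complement to be unlistable? Yes:

Theorem 7. $T$ is not listable.

Proof. Assume there exists an SBASIC program $P$ that lists $T$. Consider the following SBASIC program:

    INPUT G
    N ← MY_INDEX
    execute P until it outputs G numbers
    if the G-th output of P is N:
        A: GOTO A
    else:
        PRINT 0

Let $c$ be the index of this program, so that it is $B_c$. Is $c \in T$?

If $c \in T$, then since $P$ lists $T$, $P$ eventually outputs $c$, say as its $G$-th output. Therefore, on input $G$, $B_c$ computes the $G$-th output of $P$, finds that it is $c$, and goes into an infinite loop. Therefore, $c \notin T$, which is a contradiction.

If $c \notin T$, then since $P$ lists $T$, $P$ never outputs $c$, so for every input $G$ the $G$-th output of $P$ is not $c$ (and since $T$ is infinite, $P$ always eventually produces $G$ outputs). Therefore, $B_c$ prints 0 and halts on every input. Therefore, $c \in T$, which is a contradiction.

Since we reach a contradiction in both cases, our original assumption must be false: there does not exist an SBASIC program that lists $T$, and $T$ is unlistable.

Theorem 8. $\overline{T}$ is not listable.

Proof. Assume there exists an SBASIC program $P$ that lists $\overline{T}$. Consider the following SBASIC program (which ignores its input):

    N ← MY_INDEX
    for i ← 1 to ∞:
        execute P until it outputs i numbers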
        if the i-th output of P is N:
            GOTO A
        else:
            next i
    A: PRINT 0

Let $c$ be the index of this program, so that it is $B_c$. Is $c \in \overline{T}$?

If $c \in \overline{T}$, then since $P$ lists $\overline{T}$, $P$ eventually outputs $c$. Therefore, for some value of $i$, $B_c$ computes the $i$-th output of $P$, finds that it is $c$, and jumps to the PRINT 0 line; since $B_c$ ignores its input, this happens on every input. Therefore, $c \in T$, which is a contradiction.

If $c \notin \overline{T}$, then since $P$ lists $\overline{T}$, for all values of $i$ the $i$-th output of $P$ is never $c$. Therefore, $B_c$ never exits the loop, always incrementing $i$, and thus never halts. Therefore, $c \notin T$, that is, $c \in \overline{T}$, which is a contradiction.

Since we reach a contradiction in both cases, our original assumption must be false: there does not exist an SBASIC program that lists $\overline{T}$, and $\overline{T}$ is unlistable. Therefore $T$ and $\overline{T}$ are both undecidable and both unlistable.

Is unlistable the worst thing that a set can be?

Shortest Program: Consider the string 0000000000000000000000. Would you think that this string was generated randomly (e.g., by flipping a coin and writing 0 if you get heads and 1 if you get tails)? Of course it is possible to get this sequence by coin flips, but it seems unlikely. If our concern is getting roughly equal numbers of 0s and 1s, then what about the string 0101010101010101010101? Is this likely to come from coin flips? Again, it is possible to get this string randomly, but there is something about it that suggests it was not random.

It can sometimes be difficult to tell. For example, consider the string 0110111001011101111000. This might begin to look more random, but in fact it is the numbers 0 through 8, each written in binary, concatenated together.

One problem with trying to call these strings random or not random is that each of them is exactly as likely to occur as any other 22-bit string: you get each possible string with probability $2^{-22}$, including 0000000000000000000000 and 1101110110101101010101 (an actual randomly generated string). So why does one string seem more random than the other? Kolmogorov gave a computational answer to this question.
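Each of the first three strings above has a one-line description, which the Python sketch below turns into code; it also checks that the third string really is 0 through 8 in binary, concatenated. The variable names are ours. The last string is the randomly generated one quoted in the text, included only for comparison; the sketch makes no claim about whether it has a short description. The probability is computed exactly with `fractions.Fraction`.

```python
from fractions import Fraction


def concat_binary(upto: int) -> str:
    """Write 0, 1, ..., upto in binary (no leading zeros) and concatenate the results."""
    return "".join(format(k, "b") for k in range(upto + 1))


all_zeros = "0" * 22
alternating = "01" * 11
counting = concat_binary(8)
coin_flips = "1101110110101101010101"  # the randomly generated string quoted in the text

# Under fair coin flips, every specific 22-bit string has the same probability.
prob = Fraction(1, 2**22)

if __name__ == "__main__":
    for s in (all_zeros, alternating, counting, coin_flips):
        print(s, len(s), prob)
```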
He observed that non-random strings have short programs that output them. For example, consider a 1-million-bit string of only 0s. There is a program of length approximately 1 million bits that outputs that string: simply hard-code the string into the program and output it. But there is a much shorter program, which simply has a loop that counts from 1 to 1 million and outputs 0 on each iteration. This program is much shorter than the hard-coding program (its length grows only with the number of digits needed to write down 1 million), yet it produces the same output.
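The two programs for the all-zero string can be written out as Python source text and compared. In this sketch (function names are hypothetical), length is counted in characters, matching the ASCII-character measure used for SBASIC programs, and both sources are executed to confirm that they print the same string. The hard-coded program grows by one character per output symbol, while the loop program grows only with the number of decimal digits in $n$.

```python
import contextlib
import io


def run_program(src: str) -> str:
    """Execute Python source text and return everything it prints."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(src, {})
    return buf.getvalue()


def hard_coded_program(n: int) -> str:
    """Program text that contains the n-character string of 0s verbatim."""
    return 'print("' + "0" * n + '", end="")'


def loop_program(n: int) -> str:
    """Program text that prints the same string by looping n times."""
    return (
        "out = []\n"
        f"for _ in range({n}):\n"
        '    out.append("0")\n'
        'print("".join(out), end="")\n'
    )


if __name__ == "__main__":
    n = 1_000_000
    same = run_program(hard_coded_program(n)) == run_program(loop_program(n))
    print(same, len(hard_coded_program(n)), len(loop_program(n)))
```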
Similar short programs can output the other non-random strings we've discussed. However, a randomly generated string, with high probability, has no program significantly shorter than the one that outputs it hard-coded.

DEF 4. For every number $n$, an SBASIC program $P$ is a shortest program for $n$ iff:
1. on input 0, $P$ outputs $n$; and
2. there does not exist another program $P'$ that on input 0 outputs $n$ and whose length is less than the length of $P$.

The length of a program $P$ is the number of ASCII characters in its source code.

$Sh = \{i \mid (\exists n)[B_i \text{ is a shortest program for } n]\}$

DEF 5. For all $X \subseteq \mathbb{N}$, $X$ is immune iff
1. $X$ is infinite, and
2. $X$ has no infinite subset that is listable.

Theorem 9. $Sh$ is immune.

Proof. First, $Sh$ is infinite: every number $n$ has at least one shortest program, and a program outputs only one number on input 0, so different numbers have different shortest programs. Now assume there exists an SBASIC program $P$ that lists some infinite subset of $Sh$. Consider the following SBASIC program:

    N ← MY_INDEX
    for i ← 1 to ∞:
        execute P until it produces its i-th output, k
        if B_k is an SBASIC program longer than B_N:
            execute B_k on input 0
            PRINT whatever B_k outputs
        else:
            next i

Let $c$ be the index of this program, so that it is $B_c$. What is the output of $B_c$ on input 0? It first computes its own index, then runs $P$ until $P$ outputs the index of a shortest program that is longer than $B_c$. This must eventually happen if $P$ lists an infinite subset of $Sh$: $P$ outputs infinitely many distinct indices, but there are only finitely many programs whose length is at most that of $B_c$ (assuming SBASIC programs are written in ASCII). So $P$ must eventually output the index $k$ of an SBASIC program $B_k$ longer than $B_c$. Since $B_k$ is a shortest program for some number $n$, it halts on input 0 with output $n$, and $B_c$ then outputs the same number $n$. However, $B_c$ is shorter than $B_k$, contradicting the claim that $B_k$ is a shortest program for $n$ and that $k \in Sh$. Therefore, $P$ does not list an infinite subset of $Sh$, and $Sh$ is immune.
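For contrast with Theorem 9, here is a Python sketch of a toy language of our own invention (not SBASIC): a "program" is an expression over single digits with + and *, where * binds tighter than +. Because every expression in this toy language evaluates (halts), a shortest expression for any $n \geq 0$ can be found by brute force, trying lengths 1, 3, 5, ... in order. Theorem 9 shows that no such search can work for SBASIC; one obstacle is that a shorter candidate program might run forever on input 0, which we cannot decide in general.

```python
from itertools import count, product
from math import prod

DIGITS = "0123456789"
OPS = "+*"


def evaluate(expr: str) -> int:
    """Value of a toy expression such as '1+2*9' ('*' binds tighter than '+')."""
    return sum(prod(int(d) for d in term.split("*")) for term in expr.split("+"))


def expressions(num_ops: int):
    """Yield every toy expression with exactly num_ops operators (length 2*num_ops + 1)."""
    for digits in product(DIGITS, repeat=num_ops + 1):
        for ops in product(OPS, repeat=num_ops):
            yield digits[0] + "".join(op + d for op, d in zip(ops, digits[1:]))


def shortest_expression(n: int) -> str:
    """Brute force for n >= 0: try lengths 1, 3, 5, ... and return the first expression equal to n."""
    for num_ops in count(0):
        for expr in expressions(num_ops):
            if evaluate(expr) == n:
                return expr
    raise AssertionError("unreachable: count() never stops")


if __name__ == "__main__":
    for n in (7, 19, 81, 100):
        print(n, shortest_expression(n))
```

The search always terminates for $n \geq 0$ because $n$ can be written as a sum of 9s plus one more digit; the only thing that made it easy is that every candidate halts.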
The study of shortest programs has implications for data compression. Consider a PNG image, which is a lossless compression of a bitmap. Some images are very compressible: they are very large images with very short representations. The compressed form can be thought of as a short program that outputs that image. Yet we know that a random bitmap image is, with high probability, incompressible: there does not exist a representation that is much shorter than the raw data.
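PNG compresses image data with DEFLATE, the algorithm behind Python's standard `zlib` module, so a quick experiment with `zlib` shows the same contrast. This is a sketch under simplifying assumptions: it compresses raw bytes directly rather than producing an actual PNG file, and `compressed_size` is a hypothetical helper name. A blank "bitmap" of zero bytes shrinks dramatically, while uniformly random bytes from `os.urandom` do not shrink at all.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Length of data after zlib (DEFLATE) compression at the highest level."""
    return len(zlib.compress(data, 9))


SIZE = 1_000_000
blank = bytes(SIZE)  # a "blank bitmap": one million zero bytes
noise = os.urandom(SIZE)  # one million uniformly random bytes

if __name__ == "__main__":
    print(SIZE, compressed_size(blank), compressed_size(noise))
```

With high probability the random input comes out at roughly its original size, matching the claim that random data has no much shorter representation.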