Math 4242 Polynomial Time algorithms, IndependentSet problem

Many of the algorithms we have looked at so far have a reasonable running time. Not all algorithms do. We make this idea more precise.

Definition: An algorithm is said to be a polytime or polynomial-time algorithm if its algorithmic complexity is at most a POLYNOMIAL function of the size of its input. More precisely, an algorithm is said to be a polynomial-time algorithm if there exists a polynomial $p$ and a natural number $N$ with the property that whenever $t > N$ and the algorithm is given an input of length $t$ (machine words), the number of steps (machine-word operations) required by the algorithm is at most $p(t)$. EndDefinition

We must talk about the size of the input. It is common to measure the length of the input in machine words (so groupings of 32 or 64 wires/bits). We must also compare this to the algorithmic complexity of the algorithm, which we measure in terms of the number of machine-word operations.

To show an algorithm is a polynomial-time algorithm, one typically has a function describing (upper bounding) the algorithmic complexity of the algorithm, and shows that this function is less than or equal to a polynomial evaluated at the length of the input to the algorithm.

Example: The selection sort algorithm takes as input a list of $t$ numbers, each of which is a fixed-precision number that fits in a machine word. The selection sort algorithm consists of $O(t^2)$ operations; each of those operations requires comparing or moving these numbers (so each operation is a machine-word operation). So the number of machine-word operations required by the selection sort algorithm is at most $Ct^2$, where $C$ is a constant depending on the implementation, but not on $t$, the length of the list. Let $p(t) = Ct^2$, let $N = 1$, and let $t > N$. Then observe that

(algorithmic complexity of selection sort) $\le Ct^2 = p(t)$.
So the size of the input is $t$ machine words, and the running time of the algorithm is at most $Ct^2$ machine-word operations, which is (at most) a polynomial function of the input size $t$. Hence selection sort is a polynomial-time (polytime) algorithm.

Example: The binary search algorithm takes as input a list of $n$ numbers (again each number is a fixed-precision number in a machine word), and a given element $x$ (also in a machine word). Thus the input to binary search has size $t = n + 1$ machine words. Binary search requires $O(\log_2(n))$ machine-word operations. Hence the number of word operations required for binary search is at most $C \log_2(n)$, for some constant real number $C$ (independent of $n$). Consider the polynomial $p(x) = Cx$. Evaluating this polynomial at the size of the input (which is $n + 1$) gives $p(n + 1) = C(n + 1)$. Since

(algorithmic complexity of binary search) $\le C \log_2(n) \le C(n + 1) = p(n + 1)$,

we see the algorithmic complexity of binary search (measured in word operations) is at most a polynomial function of the size of the input to binary search (again measured in number of words).
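The inequality in the binary search example can be checked by counting loop iterations. The following is a minimal sketch, not the course's implementation: the function name `binary_search_steps` is hypothetical, the list is assumed to be sorted, and one loop iteration stands in for a constant number of machine-word operations.

```python
def binary_search_steps(sorted_list, x):
    """Return (index of x or None, number of loop iterations used).

    Assumption: sorted_list is sorted in increasing order.
    """
    lo, hi = 0, len(sorted_list) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_list[mid] == x:
            return mid, steps
        if sorted_list[mid] < x:
            lo = mid + 1   # x can only be in the right half
        else:
            hi = mid - 1   # x can only be in the left half
    return None, steps


# Compare the iteration count with the input size t = n + 1.
for n in (10, 1000, 100000):
    data = list(range(n))
    _, steps = binary_search_steps(data, n)  # n is not in the list
    print(n, steps, n + 1)
```

In each printed row the middle number (iterations) should be far smaller than the last one (the input size $n + 1$), which is the sense in which the bound $C(n + 1)$ is very generous.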

So binary search is a polynomial-time algorithm. More concisely, since the input to binary search is $n + 1$ machine words, and the binary search algorithm requires at most $C(n + 1)$ machine-word operations (in fact much less than this!), we see that the binary search algorithm is a polynomial-time algorithm.

Example: The algorithm for multiplying two binary numbers takes as input two lists of length $n$. So the input to the binary multiplication algorithm has length $t = 2n$ machine words. The runtime of the algorithm was $O(n^2)$, so the runtime is at most $Cn^2$ for some real number $C$. Since $n = t/2$, the runtime is at most $Ct^2/4 = C't^2$ (with $C' = C/4$), which is clearly a polynomial in $t$, the length of the input. Thus this algorithm is a polytime algorithm.

Example: There are polynomial-time algorithms for solving linear programs. This means that if one describes a linear program as some sequence of data (numbers for the coefficients of the objective function, numbers describing the coefficients involved in the constraints, and so on), there are algorithms that output the optimal solution to the optimization problem described by this linear program, and the runtime of the algorithm is at most a polynomial in the length of the input (the amount of data required to describe the linear program).

Most of the algorithms we have studied have had big-O complexity of either $O(\log_2(n))$, or $O(n)$, or $O(n \log_2(n))$, or $O(n^2)$, and for the most part, using arguments like the above, one easily sees these algorithms are polytime algorithms also.

Non-example: Some algorithms take inputs of very small length and have to output a lot of data. The following algorithm is an example of this sort and is NOT a polynomial-time algorithm. Consider an algorithm that takes as input an array of length $t$ of the form [0, 0, 0, 0, 0,..., 1] representing a number $n = 2^{t-1}$ in binary. The algorithm outputs the list of numbers [1, 2, 3,..., n]. The size of the input of the algorithm is $t$.
However, this algorithm clearly requires at least $Cn = C2^{t-1}$ steps to do its job, because the output list has length $n$. So this algorithm has an input of size $t$ machine words, and requires at least $C2^{t-1}$ machine-word operations to run. Is this algorithm's runtime at most a polynomial function of the size of its input? In other words, can we ever have a polynomial $p$ such that $2^{t-1} \le p(t)$ for all large $t$? No. So this algorithm is not a polynomial-time algorithm. The number of steps required is vastly more than the size of the input; in fact the number of steps is an EXPONENTIAL function of the size of the input. So not all algorithms are polynomial-time algorithms.

Computer scientists sometimes refer to polynomial-time algorithms as "efficient". So if you see a computer scientist or a book referring to an algorithm as efficient, they probably don't mean the colloquial meaning of the word; they mean polytime. There are good theoretical and practical reasons for using the word efficient, but always remember that what it really means is polytime.

Another Example: Recall that an independent set in a graph $G = (V, E)$ is a set of vertices $S$ with the property that for each $e \in E$, $|e \cap S| \le 1$. In other words, there are no edges between any two of the vertices in $S$. For example, here is a description of a graph $G$ and a set of vertices $S$ in $G$.

V = {1,2,3,4,5,6,7}    # these are the vertices of the graph
E = {
{1,2}, {1,4}, {2,3}, {2,4}, {2,5}, {3,4}, {5,6}, {5,7},
}
S = {1,3,6,7}

Consider an algorithm that takes the above information as input. To make things concrete, let's imagine that this information is contained in a file and we are writing the program in Python. Can we come up with an algorithm that confirms that the set $S$ is an independent set in $G$? How might it do this, and how many operations are required?

Let us describe the size of the input, approximately. Suppose $|V| = n$, $|E| = e$, and $|S| = m$. For now let's ignore the commas and brackets in the input. Then the input consists of $n$ vertex labels, $2e$ edge endpoints, and $m$ elements of $S$. We assume each of these fits in a machine word, and so this is $n + 2e + m$ machine words in total. If we now consider the number of brackets and commas required, we get a total of $2n + 4$ for the first line, then at most $4(e + 2)$ characters for the lines describing the edge set, and finally at most $2m + 4$ characters for the line describing the set $S$. Hence the total size of the input is $t = 2n + 4e + 2m + 16$.

You can imagine an algorithm in Python that reads this file in line by line and picks the lines apart, and from them creates a Python list that holds the vertex set, another for the edge set, and another for the set $S$. To check that $S$ is indeed an independent set, the algorithm could try all possible PAIRS of vertices in $S$ and for each one check that the pair is NOT an edge in the graph. If we ever find a pair of vertices in $S$ that is an edge in $G$, then $S$ is not an independent set. Otherwise no pair of vertices in $S$ forms an edge, and so $S$ is an independent set.

The amount of work required by this algorithm comes in two parts. The first part reads in the $2n + 4e + 2m + 16$ characters in the file and makes the lists. The second part checks whether any of the $m(m-1)/2$ pairs of vertices in $S$ form an edge (by looking in the list of edges).
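The pair-by-pair check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name `is_independent_set` is hypothetical, and the file is assumed to have been parsed already, so the edges and the set $S$ arrive as Python lists.

```python
def is_independent_set(edges, S):
    """Return True if no two vertices of S are joined by an edge.

    edges: list of 2-element tuples, one per edge (assumed already parsed).
    S:     list of vertices to test.
    """
    members = list(S)
    # Try every unordered pair {i, j} of vertices in S: m(m-1)/2 pairs.
    for a in range(len(members)):
        for b in range(a + 1, len(members)):
            i, j = members[a], members[b]
            # Linear scan over all e edges, looking for the edge {i, j}.
            for (u, v) in edges:
                if {u, v} == {i, j}:
                    return False
    return True


# The example graph and vertex set from the text.
E = [(1, 2), (1, 4), (2, 3), (2, 4), (2, 5), (3, 4), (5, 6), (5, 7)]
S = [1, 3, 6, 7]
print(is_independent_set(E, S))  # prints True
```

The three nested loops correspond directly to the $m(m-1)/2$ pairs and the scan over $e$ edges, so the loop body runs at most $\frac{m(m-1)}{2} \cdot e$ times.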
There are many ways to do this, but one can imagine a simple nested loop to try all pairs of vertices $i, j$ in $S$, and for each pair another loop that traverses the list of $e$ edges in $G$, checking that none of them is $\{i, j\}$. (There are faster ways to do this, for example using a hash table. We don't care right now.) An estimate of the number of machine-word operations required by this algorithm is:

$C_1(2n + 4e + 2m + 16) + C_2\left(\frac{m(m-1)}{2}\right)e.$

We claim that this expression is at most a polynomial in $t$, the total size of the input. For example, since $m \le t$ and $e \le t$, the above expression is at most

$C_1 t + C_2 t^3,$
which is a polynomial in $t$. Since the amount of work required by this algorithm is at most a polynomial in the size of the input to the algorithm, this algorithm is a POLYNOMIAL TIME ALGORITHM.

1 THE DECISION-INDEPENDENT-SET problem.

Recall that a graph is a pair consisting of a vertex set and an edge set. Recall also that an independent set of vertices in a graph is a set of vertices that has no edge of the graph between any two of them. The decision-independent-set problem is as follows. Given an encoding of a graph (a string, say, describing the vertex and edge set of the graph) and an integer $m$, determine if the graph described contains AN independent set of size at least $m$. The input to such an algorithm would look like (as above):

V = {1,2,3,... n}
E = {... {vertex, vertex } }
m    # desired minimum size of independent set of vertices

An algorithm solving the decision-independent-set problem would take, as input, a string encoding the above data, and the job of the algorithm is to always correctly output yes or no, depending on whether or not the given graph has an independent set of size at least $m$.

It is important to understand that there ARE algorithms that (correctly!) solve the problem, in that they output the correct answer for a given input. For example, given the input string describing a graph on $n$ vertices and a target size $k$, one could exhaustively try every possible vertex set of size $k$ and test whether it is an independent set in the graph. However, this exhaustive algorithm has very, very many steps in comparison to the size of its input, and it is easy to show this exhaustive search algorithm is not a polynomial-time algorithm.

To elaborate on the above, suppose we were interested in whether a graph $G$ with $n$ vertices contains an independent set of size $n/2$.
To answer this question using the above method, we would have to try each of the $\binom{n}{n/2}$ possible vertex subsets of size $n/2$, and for each of them use the previous algorithm to check whether that vertex subset is an independent set in the graph or not. So at a minimum, this algorithm would have to perform $\binom{n}{n/2}$ machine-word operations, and in reality it would be a good deal more than this. However, note that

$\binom{n}{n/2} \ge \frac{2^n}{n+1},$

which can be seen by considering the binomial expansion of $(1+1)^n$: it is a sum of $n + 1$ binomial coefficients adding up to $2^n$, the largest binomial coefficient is the middle one, and this largest one must be at least the average value of all the binomial coefficients.
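The exhaustive search and the lower bound on its cost can both be sketched in a few lines. This is a minimal sketch under assumptions, not the method of any particular library: the function name `has_independent_set` is hypothetical, the graph is assumed to be already parsed into Python lists, and edges are stored in a hash set (the faster lookup mentioned earlier) rather than scanned linearly.

```python
from itertools import combinations
from math import comb


def has_independent_set(vertices, edges, k):
    """Exhaustively test every k-subset of the vertices.

    Returns (answer, number of subsets examined).
    """
    edge_set = {frozenset(e) for e in edges}  # hash-based edge lookup
    tried = 0
    for candidate in combinations(vertices, k):
        tried += 1
        # The candidate is independent if none of its pairs is an edge.
        if all(frozenset(pair) not in edge_set
               for pair in combinations(candidate, 2)):
            return True, tried
    return False, tried


# The example graph from earlier in these notes.
V = [1, 2, 3, 4, 5, 6, 7]
E = [(1, 2), (1, 4), (2, 3), (2, 4), (2, 5), (3, 4), (5, 6), (5, 7)]
print(has_independent_set(V, E, 4))
print(has_independent_set(V, E, 5))

# Worst-case number of subsets for k = n/2, next to the bound 2^n / (n + 1).
for n in (10, 20, 40, 60):
    print(n, comb(n, n // 2), 2**n // (n + 1))
```

When the answer is "no", every one of the $\binom{n}{k}$ subsets is examined, which is where the exponential cost comes from; the last loop prints how quickly that count grows with $n$.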

This means that the above algorithm is not a polynomial-time algorithm. If it were, we would have a polynomial $p$ in $n$ with the property that

(cost of algorithm) $\le p(n)$

(the input size is itself at most a polynomial in $n$, since a graph on $n$ vertices has at most $n(n-1)/2$ edges), but since

$\frac{2^n}{n+1} \le \binom{n}{n/2} \le$ (cost of algorithm),

this would give that

$2^n \le (n+1)\,p(n)$, a polynomial in $n$,

which is clearly impossible using what we know about exponential functions: an exponential function will always be (eventually!!! for large enough $n$) larger than any polynomial in $n$. Hence the ABOVE algorithm for answering the decision-independent-set problem (when $m = n/2$ at least) is NOT a polynomial-time algorithm.

You might wonder whether we are just being stupid above and maybe there is a better way to solve this problem. However, although many smart people have tried, as of today nobody knows an algorithm which answers the decision-independent-set problem above (with $m = n/2$) correctly and whose worst-case runtime is at most a polynomial function of the size of the input. This means all known algorithms for solving the decision-independent-set problem have a worst-case runtime that is too big to be of use for all except very, very small graphs.

This is an important problem that shows up in many places you would not expect. So people are interested in solving it. But we don't have good (polytime) algorithms to solve it. We will explore this problem and how it relates to other hard problems in the coming lectures.