Computational complexity
Heuristic Algorithms
Giovanni Righini
University of Milan, Department of Computer Science (Crema)
Definitions: problems and instances
A problem is a general question expressed in mathematical terms. Usually the same question can be asked about many examples: these are the instances of the problem. For instance:
Problem: Is n prime?
Instance: Is 7 prime?
A solution S is the answer corresponding to a specific instance. Formally, a problem P is a function that maps instances from a set $\mathcal{I}$ into solutions from a set $\mathcal{S}$:
$P : \mathcal{I} \to \mathcal{S}$
A priori, we do not know how to compute it: we need an algorithm.
Definitions: algorithms
An algorithm is a procedure with the following properties:
- it is formally defined;
- it is deterministic;
- it is made of elementary operations;
- it is finite.
An algorithm for a problem P is an algorithm whose steps are determined by an instance $I \in \mathcal{I}$ of P and produce a solution $S \in \mathcal{S}$:
$A : \mathcal{I} \to \mathcal{S}$
An algorithm defines a function and it also computes it. If the function A coincides with P, the algorithm is exact; otherwise, it is heuristic.
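As an illustration, a minimal sketch of an exact algorithm for the problem "Is n prime?". Trial division is chosen here only as an example (the slides do not prescribe a method); it is formally defined, deterministic, made of elementary operations and finite.

```python
def is_prime(n: int) -> bool:
    """Answer the question 'Is n prime?' for one instance n by trial division."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:        # finite: d grows until it exceeds sqrt(n)
        if n % d == 0:       # elementary operation: one remainder test
            return False
        d += 1
    return True

# The instance "Is 7 prime?" is mapped to its solution.
print(is_prime(7))
```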
Algorithm characteristics
A heuristic algorithm should be:
1. effective: it should compute solutions with a value close to the optimum;
2. efficient: its computational complexity should be low, at least compared with an exact algorithm;
3. robust: it should remain effective and efficient for any possible input.
To compute a solution, an algorithm needs some resources. The two most important ones are:
- space (amount of memory required to store data);
- time (number of elementary steps to be performed to compute the final result).
Complexity
Time is usually considered the most critical resource because:
- time is subtracted from other computations more often than space;
- it is often possible to use very large amounts of space at a very low cost, but the same does not hold for time;
- the need for space is upper bounded by the need for time, because space is reusable.
Intuitively, the larger an instance, the larger the amount of resources needed to compute its solution. However, how the computational cost grows with the instance size is not always the same: it depends on the problem and on the algorithm.
By computational complexity of an algorithm we mean the rate at which the consumption of computational resources grows when the size of the instance grows.
Measuring the time complexity
The time needed to solve a problem depends on:
- the specific instance to be solved;
- the algorithm used;
- the machine that executes the algorithm;
- ...
We want a measure of the time complexity with the following characteristics:
- independent of the technology, i.e. it must be the same when the computation is done on different hardware;
- synthetic and formally defined, i.e. it must be represented by a simple and well-defined mathematical expression;
- ordinal, i.e. it must allow algorithms to be ranked according to their complexity.
The observed computing time does not satisfy these requirements.
Time complexity
The asymptotic worst-case time complexity of an algorithm provides the required measure in this way:
1. we measure the number T of elementary operations executed (which is computer-independent);
2. we choose a number n that measures the size of an instance, i.e. that determines the number of bits needed to encode it (e.g., the number of elements in the ground set in a combinatorial optimization problem);
3. we find the maximum number of elementary operations needed to solve instances of size n:
$T(n) = \max_{I \in \mathcal{I}_n} T(I) \quad \forall n \in \mathbb{N}$
(this reduces the complexity to a function $T : \mathbb{N} \to \mathbb{N}$);
4. we approximate $T(n)$ with a simpler function $f(n)$, for which we are only interested in the asymptotic trend for $n \to +\infty$ (complexity is more important when instances are larger);
5. finally, we collect these functions in complexity classes.
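Steps 1 to 3 can be sketched on a toy case. The example below is an assumption made for illustration (linear search over binary sequences is not discussed in the slides): it counts comparisons as elementary operations and takes the maximum over all instances of size n.

```python
from itertools import product

def linear_search_ops(items, target):
    """T(I): number of comparisons made by a linear search on one instance."""
    ops = 0
    for x in items:
        ops += 1                  # one elementary operation: a comparison
        if x == target:
            break
    return ops

def worst_case(n, alphabet=(0, 1)):
    """T(n): maximum of T(I) over all instances I of size n.

    Hypothetical instance set: every sequence of length n over `alphabet`,
    paired with every target in the alphabet plus one absent value (None).
    """
    targets = tuple(alphabet) + (None,)
    return max(
        linear_search_ops(items, t)
        for items in product(alphabet, repeat=n)
        for t in targets
    )

for n in range(1, 6):
    print(n, worst_case(n))
```

On this toy instance set the maximum is reached when the target is absent, so the printed values grow linearly with n.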
Notation: Θ
$T(n) \in \Theta(f(n))$ means that
$\exists c_1, c_2 \in \mathbb{R}^+, n_0 \in \mathbb{N} : c_1 f(n) \le T(n) \le c_2 f(n)$ for all $n \ge n_0$
where $c_1$, $c_2$ and $n_0$ are constant values, independent of n.
$T(n)$ is between $c_1 f(n)$ and $c_2 f(n)$
- for a suitable small value $c_1$,
- for a suitable large value $c_2$,
- for any size larger than $n_0$.
[Figure: plot of $T(n)$ against n, lying between $c_1 f(n)$ and $c_2 f(n)$ for $n \ge n_0$.]
Asymptotically, $f(n)$ is an estimate of $T(n)$ within a constant factor: for large instances, the computing time is proportional to $f(n)$.
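As a worked example (the function below is chosen for illustration and does not come from the slides), take $T(n) = 3n^2 + 5n$ and $f(n) = n^2$. Since $5n \le 5n^2$ for $n \ge 1$:

```latex
3n^2 \;\le\; 3n^2 + 5n \;\le\; 3n^2 + 5n^2 = 8n^2 \qquad \text{for all } n \ge 1
```

so the definition holds with $c_1 = 3$, $c_2 = 8$ and $n_0 = 1$, i.e. $T(n) \in \Theta(n^2)$.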
Notation: O
$T(n) \in O(f(n))$ means that
$\exists c \in \mathbb{R}^+, n_0 \in \mathbb{N} : T(n) \le c f(n)$ for all $n \ge n_0$
where c and $n_0$ do not depend on n.
$T(n)$ is upper bounded by $c f(n)$
- for a suitable large value c,
- for any n larger than a suitable $n_0$.
[Figure: plot of $T(n)$ against n, lying below $c f(n)$ for $n \ge n_0$.]
Asymptotically, $f(n)$ is an upper bound for $T(n)$ within a constant factor: for large instances the computing time is at most proportional to $f(n)$.
Notation: Ω
$T(n) \in \Omega(f(n))$ means that
$\exists c > 0, n_0 \in \mathbb{N} : T(n) \ge c f(n)$ for all $n \ge n_0$
where c and $n_0$ do not depend on n.
$T(n)$ is lower bounded by $c f(n)$
- for some suitable small value of c,
- for any n larger than $n_0$.
[Figure: plot of $T(n)$ against n, lying above $c f(n)$ for $n \ge n_0$.]
Asymptotically, $f(n)$ is a lower bound for $T(n)$ within a constant factor: for large instances the computing time is at least proportional to $f(n)$.
Combinatorial optimization
In combinatorial optimization problems it is natural to define the size of an instance as the cardinality n of its ground set E.
An explicit enumeration algorithm
- considers each subset $S \subseteq E$,
- evaluates whether it is feasible ($x \in X$) in $\alpha(n)$ time,
- evaluates the objective function $f(x)$ in $\beta(n)$ time,
- records the best value found.
Since the number of solutions is exponential in n, its complexity is at least exponential, even if $\alpha(n)$ and $\beta(n)$ are polynomials (as often occurs).
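The enumeration scheme can be sketched as follows. This is a minimal illustration for a minimization problem; the function name `enumerate_best` and the small covering instance are hypothetical, not taken from the slides.

```python
from itertools import combinations

def enumerate_best(ground_set, is_feasible, objective):
    """Explicit enumeration: scan all 2^n subsets, keep the best feasible one."""
    best_value, best_subset = None, None
    elements = list(ground_set)
    for k in range(len(elements) + 1):
        for subset in combinations(elements, k):
            if not is_feasible(subset):        # feasibility test, alpha(n) time
                continue
            value = objective(subset)          # objective evaluation, beta(n) time
            if best_value is None or value < best_value:
                best_value, best_subset = value, subset
    return best_value, best_subset

# Hypothetical instance: pick items whose weights cover a demand of 7 at minimum cost.
weight = {"a": 3, "b": 4, "c": 5}
cost = {"a": 2, "b": 3, "c": 6}
best = enumerate_best(
    weight,
    is_feasible=lambda s: sum(weight[e] for e in s) >= 7,
    objective=lambda s: sum(cost[e] for e in s),
)
print(best)
```

The two nested loops visit every subset exactly once, so the number of feasibility tests is $2^n$ regardless of how cheap each test is.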
Polynomial and exponential complexity
In combinatorial optimization, the main distinction is between
- polynomial complexity: $T(n) \in O(n^d)$ for a constant $d > 0$;
- exponential complexity: $T(n) \in \Omega(d^n)$ for a constant $d > 1$.
The algorithms of the former type are efficient; those of the latter type are inefficient.
In general, heuristic algorithms are polynomial and they are used when the corresponding exact algorithms are exponential.
Assuming 1 operation/µsec:
n     n^2 op.     2^n op.
1     1 µsec      2 µsec
10    0.1 msec    1 msec
20    0.4 msec    1 sec
30    0.9 msec    17.9 min
40    1.6 msec    12.7 days
50    2.5 msec    35.7 years
60    3.6 msec    366 centuries
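The figures in the table can be recomputed with a few lines. This is a sketch under the table's own assumption of 1 operation per microsecond; the helper name `seconds_needed` is introduced here for illustration.

```python
OPS_PER_SECOND = 1_000_000  # assumption from the table: 1 operation per microsecond

def seconds_needed(operations: int) -> float:
    """Time in seconds to execute the given number of elementary operations."""
    return operations / OPS_PER_SECOND

for n in (1, 10, 20, 30, 40, 50, 60):
    poly = seconds_needed(n ** 2)
    expo = seconds_needed(2 ** n)
    print(f"n={n:2d}  n^2: {poly:.6f} s   2^n: {expo:.3e} s")
```

Dividing the exponential column by 60, 86400 or 365 * 86400 converts it to minutes, days or years (taking a 365-day year).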
Problem transformations and reductions
Sometimes it is possible and convenient to reformulate an instance of a problem P into an instance of a problem Q and then to transform the solution of the latter back into a solution of the former.
Polynomial transformation from P to Q:
- given any instance of P, a corresponding instance of Q is defined in polynomial time;
- the instance of Q is solved by a suitable algorithm, providing a solution $S_Q$;
- from $S_Q$ a corresponding solution $S_P$ is obtained in polynomial time.
Examples: VCP to SCP, MCP to MISP and MISP to MCP.
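A minimal sketch of one such transformation, assuming MISP and MCP denote the maximum independent set and maximum clique problems (the slide does not expand the acronyms). The instance is transformed by taking the complement graph; the brute-force `max_clique` is only a hypothetical stand-in for "a suitable algorithm" for Q.

```python
from itertools import combinations

def complement(n, edges):
    """Instance transformation (polynomial time): build the complement graph."""
    present = {frozenset(e) for e in edges}
    return [
        (i, j) for i, j in combinations(range(n), 2)
        if frozenset((i, j)) not in present
    ]

def max_clique(n, edges):
    """Stand-in solver for Q: brute-force maximum clique (exponential time)."""
    adjacent = {frozenset(e) for e in edges}
    for k in range(n, 0, -1):
        for subset in combinations(range(n), k):
            if all(frozenset(p) in adjacent for p in combinations(subset, 2)):
                return set(subset)
    return set()

def max_independent_set(n, edges):
    """Solve P by transforming to Q, solving Q, and mapping the solution back."""
    clique = max_clique(n, complement(n, edges))
    return clique  # the same vertex set is independent in the original graph

# Path graph 0-1-2-3: a largest independent set has two vertices.
print(max_independent_set(4, [(0, 1), (1, 2), (2, 3)]))
```

Here the back-transformation is the identity on vertex sets, which is what makes this pair of problems a convenient example.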
Problem transformations and reductions
Polynomial reduction from P to Q:
- given any instance of P, an algorithm A is executed a polynomial number of times to solve instances of a problem Q, obtained in polynomial time from the instance of P and from the results of the previous runs;
- from the solutions computed, a solution of the instance of P is obtained.
Examples: BPP to PMSP and PMSP to BPP.
In both cases:
- if A is polynomial/exponential, the overall algorithm turns out to be polynomial/exponential;
- if A is exact/heuristic, the overall algorithm turns out to be exact/heuristic.
Optimization vs. decision
A polynomial reduction links optimization and decision problems.
Optimization problem: given a function f and a feasible region X, what is the minimum of f in X?
$f^* = \min_{x \in X} f(x) = ?$
Decision problem: given a function f, a value k and a feasible region X, do solutions with a value not larger than k exist?
$\exists x \in X : f(x) \le k ?$
The two problems are polynomially equivalent:
- the decision problem can be solved by solving the optimization problem and then comparing the optimal value with k;
- the optimization problem can be solved by repeatedly solving the decision problem for different values of k, tuned by dichotomous search.
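The second direction can be sketched as a dichotomous (binary) search on k. The sketch assumes integer objective values and known bounds `lo` and `hi` on the optimum; the oracle `decide` and the toy instance are hypothetical.

```python
def minimize_with_decision_oracle(decide, lo, hi):
    """Find the minimum objective value using only yes/no answers.

    decide(k) answers the decision problem: does a solution with value <= k exist?
    Assumes integer values and lo <= optimum <= hi (so decide(hi) is True).
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if decide(mid):
            hi = mid          # a solution of value <= mid exists
        else:
            lo = mid + 1      # every solution has value > mid
    return lo

# Toy instance: the feasible solutions have these objective values.
values = [17, 42, 23, 8, 15]
optimum = minimize_with_decision_oracle(
    lambda k: any(v <= k for v in values), lo=0, hi=100
)
print(optimum)
```

The number of oracle calls is logarithmic in `hi - lo`, hence polynomial in the number of bits needed to write the bounds.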
Drawbacks of worst-case analysis
The worst-case time complexity has some relevant drawbacks:
- it does not consider the performance of the algorithm on the easy/small instances; in practice the most difficult instances could be rare or unrealistic;
- it provides a rough estimate of the growth of the computing time, not of the computing time itself;
- the estimate can be very rough, to the point of being useless;
- it may be misleading: algorithms with worse worst-case computational complexity can be very efficient in practice, even more so than algorithms with better worst-case computational complexity.
Other complexity measures
To overcome these drawbacks one could employ different definitions of computational complexity:
- parameterized complexity expresses T as a function of some other relevant parameter k besides the size of the instance n: $T(n, k)$;
- average-case complexity assumes a probability distribution on $\mathcal{I}$ and evaluates the expected value of $T(I)$ on $\mathcal{I}_n$:
$T(n) = E[T(I) \mid I \in \mathcal{I}_n]$
If the distribution has some parameter k, the average-case complexity is also parameterized, i.e. it provides $T(n, k)$.
Average-case complexity
Average-case complexity analysis and classification is more reliable when algorithms are efficient on almost all instances (e.g. the simplex algorithm for linear programming).
We would like to evaluate the expected value of $T(I)$ on $\mathcal{I}_n$ for each $n \in \mathbb{N}$:
$T(n) = E[T(I) \mid I \in \mathcal{I}_n]$
This requires defining the probability distribution of the instances.
- The most frequent hypothesis is equiprobability (when we do not have any other information).
- Other assumptions must be based on some specific probabilistic model of the problem (often depending on some parameters).
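Under equiprobability the expected value is a plain average over the instances of size n. The sketch below uses a toy instance set chosen for illustration (linear search for a value that is equally likely to sit in any of the n positions); it is not an example from the slides.

```python
from fractions import Fraction

def comparisons(items, target):
    """T(I): comparisons made by a linear search for `target` in `items`."""
    for count, x in enumerate(items, start=1):
        if x == target:
            return count
    return len(items)

def average_case(n):
    """E[T(I)] over n equiprobable instances: search each value t in (0..n-1)."""
    items = list(range(n))
    total = sum(comparisons(items, t) for t in items)
    return Fraction(total, n)

def worst_case(n):
    """max T(I) over the same instance set, for comparison."""
    items = list(range(n))
    return max(comparisons(items, t) for t in items)

print(average_case(9), worst_case(9))
```

On this instance set the average is $(n+1)/2$ while the worst case is n: the two measures differ by a constant factor here, but for other algorithms the gap can be much larger.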
Random instances: binary matrices
Associating a probability with every instance of a problem is useful for two reasons:
- for studying a priori the average-case complexity of an algorithm;
- for evaluating a posteriori the efficiency of the algorithm.
In the case of heuristic algorithms we also want to evaluate their effectiveness (the value of the solutions obtained and the distance from the optimum).
Random binary matrices of given size (m, n):
1. model with uniform probability p:
$\Pr[a_{ij} = 1] = p \quad (i = 1, \ldots, m; \; j = 1, \ldots, n)$
If p = 0.5 it provides equiprobability of all instances.
2. model with fixed density δ: given the mn entries of the matrix, δmn are randomly selected with uniform probability distribution and are set to 1.
The two models tend to be similar for p = δ.
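Both models can be sketched with the standard `random` module. The function names are hypothetical, and rounding δmn to the nearest integer is an assumption made here for the case where δmn is not an integer.

```python
import random

def uniform_probability_matrix(m, n, p, rng):
    """Model 1: each entry is 1 independently with probability p."""
    return [[1 if rng.random() < p else 0 for _ in range(n)] for _ in range(m)]

def fixed_density_matrix(m, n, delta, rng):
    """Model 2: exactly round(delta * m * n) entries, chosen uniformly, are set to 1."""
    ones = round(delta * m * n)                    # assumption: round to an integer
    chosen = set(rng.sample(range(m * n), ones))   # distinct flat positions
    return [[1 if i * n + j in chosen else 0 for j in range(n)] for i in range(m)]

rng = random.Random(0)  # fixed seed so that experiments are reproducible
A = uniform_probability_matrix(4, 5, 0.3, rng)
B = fixed_density_matrix(4, 5, 0.3, rng)
print(sum(map(sum, A)), sum(map(sum, B)))
```

In the first model the number of ones varies from one matrix to another; in the second it is fixed, which is the practical difference between the two.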
Random instances: graphs
Random graphs of size n can be generated as follows:
1. Gilbert model, G(n, p), i.e. uniform probability p:
$\Pr[(i, j) \in E] = p \quad (i \in V, \; j \in V \setminus \{i\})$
Graphs with the same given number of edges m have the same probability $p^m (1 - p)^{n(n-1)/2 - m}$ (different for each m).
If p = 0.5 it coincides with the model where all graphs have the same probability.
2. Erdős-Rényi model, G(n, m): given the number of edges m, m unordered vertex pairs are randomly selected with uniform probability distribution and an edge is generated for each of them.
The two models tend to be similar for $p = \frac{2m}{n(n-1)}$.
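A minimal sketch of the two generators for undirected graphs on vertices 0..n-1, using only the standard library. The function names are hypothetical.

```python
import random
from itertools import combinations

def gilbert_graph(n, p, rng):
    """G(n, p): each unordered vertex pair becomes an edge independently with probability p."""
    return [pair for pair in combinations(range(n), 2) if rng.random() < p]

def erdos_renyi_graph(n, m, rng):
    """G(n, m): m distinct unordered vertex pairs drawn uniformly at random."""
    return rng.sample(list(combinations(range(n), 2)), m)

def matching_probability(n, m):
    """Value of p for which G(n, p) has m edges in expectation: p = 2m / (n(n-1))."""
    return 2 * m / (n * (n - 1))

rng = random.Random(1)  # fixed seed so that experiments are reproducible
g_nm = erdos_renyi_graph(10, 15, rng)
g_np = gilbert_graph(10, matching_probability(10, 15), rng)
print(len(g_nm), len(g_np))
```

The G(n, m) graph always has exactly m edges, while the G(n, p) graph has m edges only on average.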
Phase transitions
Different values of the parameters of the probability distributions correspond to different regions of the instance space.
For several problems we observe that the computing time of the algorithms is significantly different in different regions. In the case of heuristic algorithms the same holds for the quality of the solutions. This has to do with the robustness of the algorithms.
In some cases the changes occur suddenly, at some critical values of the parameters, which is reminiscent of phase transitions in physical systems.
Two things we can do
The design and analysis of heuristic algorithms proceeds in two directions:
- proving theoretical properties of the algorithms, such as:
  - worst-case time complexity (usually polynomial);
  - average-case time complexity or parameterized time complexity;
  - approximation guarantees;
- evaluating the practical usefulness of the algorithms:
  - computing time;
  - approximation;
  - robustness to instances and to parameters (phase transitions).
The termination is often (arbitrarily) decided on the basis of the number of iterations, the computing time elapsed, or the lack of improvements for a certain time. It is used to calibrate the trade-off between approximation and computing time.
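The three termination rules can be sketched in a generic improvement loop. Everything here is hypothetical: `step` stands for one iteration of some heuristic and returns a candidate objective value (minimization), and the default limits are arbitrary.

```python
import time

def run_until_stopped(step, initial, max_iterations=1000, time_limit=1.0, patience=50):
    """Repeat `step` until one of three termination rules fires.

    Returns (best value, iterations performed, reason for stopping).
    """
    best = initial
    stale = 0                                   # iterations since the last improvement
    start = time.perf_counter()
    for iteration in range(1, max_iterations + 1):
        candidate = step(best)
        if candidate < best:
            best, stale = candidate, 0
        else:
            stale += 1
        if stale >= patience:                   # rule 3: lack of improvements
            return best, iteration, "no improvement"
        if time.perf_counter() - start >= time_limit:   # rule 2: computing time
            return best, iteration, "time limit"
    return best, max_iterations, "iteration limit"      # rule 1: number of iterations

# Toy step: decrease the value by one until it reaches zero.
print(run_until_stopped(lambda v: max(v - 1, 0), 10, time_limit=60.0, patience=3))
```

Raising any of the three limits buys more search effort at the price of more computing time, which is the trade-off mentioned above.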