Discrete Optimization. Lecture Notes 2

Disjunctive Constraints

Defining variables and formulating linear constraints can be straightforward or more sophisticated, depending on the problem structure. The language of linear constraints is surprisingly expressive. As another illustration of their modelling power and of the tricks we may have to apply, we discuss a more complicated example involving disjunctive constraints: Sometimes we have pairs of constraints only one of which must be satisfied (logical OR).

For instance, in several job scheduling problems, a machine can process only one job at a time, and running jobs cannot be interrupted. Thus one of the following two propositions must be true for any two jobs i, j running on the same machine: i precedes j OR j precedes i. Suppose that job k needs processing time p_k. One can formulate different objectives (such as: finish the jobs as soon as possible), but at the moment we are only concerned with the formulation of the constraints. We can assume that all work starts at time 0. A schedule is completely described by the start time t_k ≥ 0 of every job k. Obviously our variables must fulfill

t_i + p_i ≤ t_j  OR  t_j + p_j ≤ t_i

for all pairs i, j of jobs. But how can we express this OR by linear constraints connected by AND? One way is to introduce new 0,1-variables: x_ij = 1 if job i precedes job j, and x_ij = 0 otherwise. (Note that we obtain a MIP.) Let M > 0 be some large enough constant. We create the following constraints for all i, j:

t_i - t_j ≤ -p_i + M(1 - x_ij)  AND  t_j - t_i ≤ -p_j + M x_ij.

These constraints express that the jobs do not overlap. (Think about it.)
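To see the big-M constraints at work, here is a minimal sketch (the function names and the constant M = 1000 are our own choices, not part of the notes) that checks, by trying both values of the binary x_ij, whether two given start times can satisfy the constraint pair:

```python
# Sketch: the big-M disjunction for two jobs on one machine,
#   t_i - t_j <= -p_i + M*(1 - x_ij)   AND   t_j - t_i <= -p_j + M*x_ij.

def feasible(t_i, t_j, p_i, p_j, M=1000):
    """True if some choice of the binary x_ij satisfies both constraints."""
    for x_ij in (0, 1):
        if t_i - t_j <= -p_i + M * (1 - x_ij) and t_j - t_i <= -p_j + M * x_ij:
            return True
    return False

def overlap(t_i, t_j, p_i, p_j):
    """True if the processing intervals [t, t+p) intersect."""
    return t_i < t_j + p_j and t_j < t_i + p_i

# Non-overlapping schedules are feasible, overlapping ones are not:
assert feasible(0, 3, 3, 2)       # job i ends exactly when job j starts
assert not feasible(0, 1, 3, 2)   # intervals [0,3) and [1,3) overlap
```

Enumerating x_ij by hand here plays the role that the MIP solver's branching would play in a real model; the point is only that the constraint pair is satisfiable exactly for the non-overlapping schedules.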
We remark that scheduling is a big field, almost a science in itself, with its own standard terminology and a huge variety of optimization problems: Jobs can have release times and due times, preemption may be allowed or not, machines can be identical or different, there can be precedence constraints between jobs, one may aim at minimizing the total work, the makespan, or some weighted combination, and much more.
Minimizing the Sum Norm

In many optimization tasks one is interested in a solution vector x with minimum l_1-norm, that is, min Σ|x_i| under some linear constraints. Because of the absolute-value terms this objective function is not linear. An obvious idea to turn it into an LP is: Introduce new variables y_i to replace |x_i|, and express y_i = |x_i| by constraints. We may state y_i ≥ 0, further y_i = x_i or y_i = -x_i, write them as inequalities, then apply the trick for disjunctive constraints, and so on.

However, in this case there exists a much more elegant way: Instead of the constraints proposed above, we simply introduce a pair of constraints y_i ≥ x_i and y_i ≥ -x_i. This does not seem to solve our problem, as this pair of constraints itself does not express y_i = |x_i|; rather it only says y_i ≥ |x_i|. However, we are minimizing Σ y_i in the end. Assume that y_i > |x_i| for some i, in any solution. Then we can decrease y_i to y_i = |x_i| without violating any constraints, because y_i appears only in the new constraints. This guarantees y_i = |x_i| in any optimal solution, and this is all we need.

Wrap-up

Some general remarks are appropriate here. We have seen that various problems can be written as LP, ILP, or MIP. However, there is much freedom in the definition of variables and the choice of constraints. Often this modelling phase is not straightforward but a creative act, or a matter of experience. The constraints must describe the feasible set, but this does not mean that they are uniquely determined. For example, infinitely many sets of linear constraints describe the same set of integer points! This raises the question which of the many equivalent formulations is favourable. One main criterion is the computational complexity of the algorithms that we can apply.

So far we have addressed problem formulations, but not solution methods. Which algorithms can solve a given problem, once we have, e.g., an ILP for it? Are they fast enough?
Will they always output optimal solutions? If not, how close are the solutions to the optimal ones? Finally, (I)LP is not the only way to model optimization problems. It is often preferred because the mathematical theory of LP is rather general, well understood, and powerful, and there exist generic algorithms, implemented in software packages. But we are not forced to squeeze every problem into ILP form. For many problems, special-purpose algorithms can be simpler
and faster, as they take advantage of special structural features of a problem. In the opposite direction, LP can be further generalized by nonlinear programming, convex optimization, constraint programming, etc., to mention only a few keywords.

We conclude with a literature hint. This is not closely tied to the mathematical course contents, but might be some inspiring reading about modelling issues in real-world optimization problems: R. Barták, C. Sheahan, A. Sheahan: MAK(Euro) - a system for modelling, optimising, and analysing production in small and medium enterprises. SOFSEM 2012, Lecture Notes in Computer Science (Springer), vol. 7147, pp. 600-611 (should be accessible electronically through Chalmers Library).

Algorithmic Complexity of LP and ILP

The next phase after modelling is to actually solve the problems. How difficult is this?

The Simplex Algorithm and the Geometry of LP

We outline a classical algorithm for solving LP. Canonical and standard form of LP are equivalent, since we can transform them into each other. To transform a standard LP into a canonical LP, replace Ax = b with Ax ≤ b and Ax ≥ b. Transforming a canonical LP into a standard LP is more interesting: Replace Ax ≤ b with Ax + s = b and s ≥ 0, where s is a vector of m new variables, called the slack variables. Introducing yet another variable z that represents the objective function value, we can write this form of an LP as a so-called tableau:

s = b - Ax, z = -c^T x.

Due to the minus sign, our goal is now to maximize z. In the following we will assume b ≥ 0, which is the case in many LPs arising from natural applications. The general case where b may also contain negative entries is handled later. In our tableau we may set x := 0, which implies s = b and z = 0. Since b ≥ 0, this is a feasible solution where n of the n + m variables are 0. We call it a basic feasible solution. Next we try to improve this solution, i.e., to raise z.
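As a small illustration (a sketch of our own, not part of the notes), the slack-variable transformation and the resulting basic feasible solution can be written out in a few lines; the row layout and the function name are arbitrary choices:

```python
# Sketch: turn the constraints Ax <= b of a canonical LP (with b >= 0)
# into standard form Ax + s = b by appending an identity block for s.

def with_slacks(A, b):
    """Rows of the system Ax + s = b: [a_1 .. a_n, s_1 .. s_m | b_i]."""
    m = len(A)
    return [A[i] + [1 if j == i else 0 for j in range(m)] + [b[i]]
            for i in range(m)]

rows = with_slacks([[1, 1], [1, 3]], [4, 6])
# Setting x := 0 makes each slack equal to its right-hand side: s = b.
s = [row[-1] for row in rows]
assert s == [4, 6]
```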
In order to describe the general step, we introduce the general notion of a tableau. It looks as follows:

x_B = β - Λx_N, z = z_0 + γ^T x_N.

Here, x_B and x_N are vectors of m and n nonnegative variables called basic and nonbasic variables, respectively. The other symbols stand for constants (matrices, vectors, numbers), and
β ≥ 0 is required. Note that our initial tableau s = b - Ax, z = -c^T x fits in this scheme. By x_N := 0 we get a basic feasible solution with z = z_0.

Now suppose that γ_j > 0 holds for some j. If we increase the jth nonbasic variable, we obviously improve z. We can increase it as long as none of the basic variables becomes negative. As soon as some of the positive basic variables reaches 0, we remove it from the basis, while the increased nonbasic variable is moved into the basis. After this exchange we have to rewrite the tableau. (For the moment we skip the details.) Property β ≥ 0 is preserved, since β = x_B if x_N = 0, and x_B ≥ 0 holds by construction. This exchange is also called a pivot step. We repeatedly apply pivot steps until γ ≤ 0. At this moment we know that the current solution is optimal, since any feasible solution must satisfy x_N ≥ 0, hence z = z_0 + γ^T x_N ≤ z_0.

This algorithm that successively improves basic feasible solutions in pivot steps, exchanging basic and nonbasic variables, is called the simplex algorithm. Its name is explained by the geometric interpretation. Note that the set of points satisfying linear inequality constraints is an intersection of halfspaces, that is, a convex polytope. Specifically, in the (n + m)-dimensional space of variables and slack variables, the m equality constraints describe an n-dimensional subspace in which the feasible set is a convex polytope, also called a simplex. The basic feasible solutions are the vertices of this polytope, because n variables are 0. Since the objective function is linear, it attains its optimum at some vertex of the polytope. It follows that some optimal solution must be a basic feasible solution. The simplex algorithm proceeds from a vertex to a neighbor vertex (along an edge of the polytope) with a better objective value, as long as possible. From convexity it follows that a local optimum is also a global optimum.

We have to discuss the computational details of tableau rewriting.
After every pivot step we must express the new basic variable x_j in terms of the nonbasic variables. We take the equation which had the new nonbasic variable on the left-hand side and solve it for x_j. It contains x_j with a negative coefficient, since it was this equation that limited the increase of x_j. Then we substitute x_j in all other equations.

In a pivot step it may happen that the selected nonbasic variable can be increased forever. But then the LP itself is unbounded and has no finite optimal value. Hence this case is not a problem. Nevertheless, the simplex algorithm suffers from other problems. It may happen that no nonbasic variable can increase, because some basic variable is already 0 and would
become negative. We speak of a degeneracy. In the geometric language, this case appears if more than n bounding hyperplanes of the polytope go through the current vertex. Still we can exchange two variables, but without improving z. In the worst case we may run into a cycle of degenerate tableaus. A simple trick to break such degeneracies is to add small perturbations to b, thus splitting a degenerate vertex into several regular vertices close to each other. Thus we can escape from every degeneracy. In the end we can undo the perturbations and get an exact solution. It also follows that the simplex algorithm always terminates, because the number of different vertices is bounded by the binomial coefficient (n+m choose m).

Remember that we assumed b ≥ 0 in the beginning. It remains to discuss LP, say in canonical form min c^T x, Ax ≤ b, x ≥ 0, with an arbitrary vector b. We introduce a variable x_0 and consider the auxiliary problem min x_0, Ax - x_0 1 ≤ b, x_0 ≥ 0, x ≥ 0, where 1 denotes the vector of m entries 1. Now we start from the tableau s = b - Ax + x_0 1, z = -x_0. We set x = 0 and increase x_0 until s ≥ 0. At this moment we have b + x_0 1 ≥ 0; moreover some slack variable is 0. Exchanging this slack variable with x_0 yields a feasible tableau. Hence we can from now on use the simplex algorithm to solve the auxiliary problem. If the optimal x_0 is nonzero then, by construction, the original LP has no feasible solution. If x_0 = 0, we can finally ignore x_0 and get a feasible tableau for the original problem, hence we can continue with the simplex algorithm. (If x_0 is currently a basic variable, first exchange it once more, and then ignore it.) This procedure settles the case of arbitrary vectors b.

In a pivot step we have in general the choice between several nonbasic variables. We may choose any of them, but we would prefer a choice rule that leads us to the optimum as quickly as possible. Several heuristic rules work well in most cases.
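The pivoting procedure described above, for the easy case b ≥ 0, can be condensed into a short program. The following is a minimal sketch under our own conventions (a dense tableau, the "most negative reduced cost" entering rule, no anti-cycling safeguard), not production code:

```python
# Sketch of the simplex algorithm for  max c^T x, Ax <= b, x >= 0
# with b >= 0, so that x = 0 is a basic feasible solution.

def simplex(A, b, c):
    m, n = len(A), len(c)
    # Tableau rows: [a_1 .. a_n, s_1 .. s_m | rhs]; last row stores -c.
    T = [A[i] + [1 if j == i else 0 for j in range(m)] + [b[i]] for i in range(m)]
    T.append([-cj for cj in c] + [0] * m + [0])
    basis = list(range(n, n + m))        # the slack variables start in the basis
    while True:
        # Entering variable: column with the most negative reduced cost.
        j = min(range(n + m), key=lambda k: T[-1][k])
        if T[-1][j] >= 0:
            break                        # gamma <= 0: current solution is optimal
        # Leaving variable: ratio test over rows with positive pivot entry.
        rows = [i for i in range(m) if T[i][j] > 1e-12]
        if not rows:
            raise ValueError("unbounded LP")  # variable j can increase forever
        r = min(rows, key=lambda i: T[i][-1] / T[i][j])
        # Pivot: solve row r for x_j and substitute into all other rows.
        piv = T[r][j]
        T[r] = [v / piv for v in T[r]]
        for i in range(m + 1):
            if i != r and T[i][j]:
                f = T[i][j]
                T[i] = [vi - f * vr for vi, vr in zip(T[i], T[r])]
        basis[r] = j
    x = [0.0] * n
    for i, bi in enumerate(basis):
        if bi < n:
            x[bi] = T[i][-1]
    return T[-1][-1], x                  # optimal value and an optimal solution

# max 3x1 + 2x2  s.t.  x1 + x2 <= 4,  x1 + 3x2 <= 6: optimum 12 at (4, 0)
val, x = simplex([[1, 1], [1, 3]], [4, 6], [3, 2])
assert abs(val - 12) < 1e-9 and abs(x[0] - 4) < 1e-9
```

Each pass through the loop is one pivot step; the final tableau has γ ≤ 0, which is exactly the optimality condition from the text.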
Besides the simplex algorithm, the so-called interior point methods (not discussed here) are also widely used.

ILP is NP-complete

We presume that you already know the notions of polynomial reduction, NP-completeness, and the satisfiability problem (SAT) for Boolean formulas in conjunctive normal form (CNF). An important fact is that ILP is NP-complete. To see this, we reduce the NP-complete SAT problem to ILP. In other words, we reformulate any instance of this hard logical problem in polynomial time as an ILP. The idea
is really simple: Transform every clause of the given CNF into a linear constraint as follows. The Boolean values 0,1 are interpreted as real numbers, and the logical OR (∨) is replaced with the usual addition of numbers (+). A Boolean variable x_i is interpreted as a real variable x_i. A negated Boolean variable ¬x_i is replaced with 1 - x_i. Now, a clause is true if and only if the sum of these terms is at least 1. Hence our ILP has a feasible solution if and only if the given CNF formula is satisfiable. It also follows that MIP is NP-complete.

A consequence is that an ILP formulation alone does not yield a fast algorithm for an optimization problem. We must also utilize specific features of the problem to get good solutions in reasonable time. Therefore we need various approaches to solve such problems. Beware of a frequent misunderstanding: The result does not mean that every single ILP is hard to solve; it only says that (probably) no fast algorithm exists that would be able to solve all ILPs. However, one example of a specific NP-complete integer optimization problem is 0,1-Knapsack. (We do not prove this here.) This might be astonishing, because this problem has only one linear constraint.

One might expect that integer problems are easier than the corresponding problems with real variables, but the opposite is true: There exist polynomial-time algorithms for LP. The situation is even more bizarre: Polynomial-time algorithms for LP are barely practical. While the simplex method needs exponential time in the worst case, it is practical, in the sense that it works much faster for the vast majority of practical instances. This was known empirically for a long time. Finally this fact has also found a theoretical explanation by an exciting result of Spielman and Teng (Journal of the ACM 51 (2004), pp. 385-463). They analyzed the average runtime in some neighborhood of any instance, that is, an initial instance is slightly modified in a randomized way.
The average time is polynomial, even if the initial instance is nasty.
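Returning to the SAT-to-ILP reduction described earlier: it can be sketched in a few lines. The encoding below (a clause as a list of signed variable indices) and the brute-force feasibility check are our own illustrative choices, not part of the notes:

```python
# Sketch of the clause-to-constraint translation: a positive literal x_i
# contributes the term x_i, a negated literal contributes 1 - x_i, and a
# clause is satisfied iff the sum of its terms is at least 1.
from itertools import product

def clause_value(clause, assignment):
    """Value of the linear left-hand side for a 0/1 assignment (dict: index -> bit)."""
    return sum(assignment[i] if i > 0 else 1 - assignment[-i] for i in clause)

def ilp_feasible(clauses, n):
    """Brute force: is there a 0/1 point with every clause sum >= 1?"""
    return any(
        all(clause_value(c, dict(enumerate(bits, start=1))) >= 1 for c in clauses)
        for bits in product((0, 1), repeat=n)
    )

# (x1 or not x2) and (x2 or x3) and (not x1 or not x3) -- satisfiable
assert ilp_feasible([[1, -2], [2, 3], [-1, -3]], 3)
# (x1) and (not x1) -- unsatisfiable
assert not ilp_feasible([[1], [-1]], 1)
```

Of course, the brute-force check takes exponential time; the point of the reduction is only that building the constraints themselves takes polynomial time.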