Cloud Branching MIP workshop, Ohio State University, 23/Jul/2014 Timo Berthold Xpress Optimization Team Gerald Gamrath Zuse Institute Berlin Domenico Salvagnin Universita degli Studi di Padova This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation s express consent.
MIP & Branching min s.t. c T x Ax b x Z I 0 R C 0 Mixed Integer Program: linear objective & constraints integer variables continuous variables Branching for MIP: improve dual bound fractional variables the optimum of the LP relaxation x i xi x i xi for i I and xi / Z
MIP & Branching min s.t. c T x Ax b x Z I 0 R C 0 Mixed Integer Program: linear objective & constraints integer variables continuous variables Branching for MIP: improve dual bound fractional variables the optimum of the LP relaxation x i xi x i xi for i I and xi / Z
MIP & Branching min s.t. c T x Ax b x Z I 0 R C 0 Mixed Integer Program: linear objective & constraints integer variables continuous variables Branching for MIP: improve dual bound fractional variables the optimum of the LP relaxation x i xi x i xi for i I and xi / Z
MIP & Branching min s.t. c T x Ax b x Z I 0 R C 0 Mixed Integer Program: linear objective & constraints integer variables continuous variables Branching for MIP: improve dual bound fractional variables the optimum of the LP relaxation x i xi x i xi for i I and xi / Z
MIP & Branching min s.t. c T x Ax b x Z I 0 R C 0 Mixed Integer Program: linear objective & constraints integer variables continuous variables Branching for MIP: improve dual bound fractional variables the optimum of the LP relaxation x i xi x i xi for i I and xi / Z
MIP & Branching min s.t. c T x Ax b x Z I 0 R C 0 Mixed Integer Program: linear objective & constraints integer variables continuous variables Branching for MIP: improve dual bound fractional variables the optimum of the LP relaxation x i xi x i xi for i I and xi / Z
MIP & Branching min s.t. c T x Ax b x Z I 0 R C 0 Mixed Integer Program: linear objective & constraints integer variables continuous variables Branching for MIP: improve dual bound fractional variables the optimum of the LP relaxation x i xi x i xi for i I and xi / Z
MIP & Branching min s.t. c T x Ax b x Z I 0 R C 0 Mixed Integer Program: linear objective & constraints integer variables continuous variables Branching for MIP: improve dual bound fractional variables the optimum of the LP relaxation x i xi x i xi for i I and xi / Z
MIP & Branching min s.t. c T x Ax b x Z I 0 R C 0 Mixed Integer Program: linear objective & constraints integer variables continuous variables Branching for MIP: improve dual bound fractional variables the optimum of the LP relaxation x i xi x i xi for i I and xi / Z
Degeneracy naïvely: 1 optimal solution, determined by n constraints/bounds
Degeneracy naïvely: 1 optimal solution, determined by n constraints/bounds at a second thought: 1 optimal solution, k > n tight constraints
Degeneracy naïvely: 1 optimal solution, determined by n constraints/bounds at a second thought: 1 optimal solution, k > n tight constraints but really: an optimal polyhedron
Degeneracy naïvely: 1 optimal solution, determined by n constraints/bounds at a second thought: 1 optimal solution, k > n tight constraints but really: an optimal polyhedron
Degeneracy naïvely: 1 optimal solution, determined by n constraints/bounds at a second thought: 1 optimal solution, k > n tight constraints but really: an optimal polyhedron Goal of this talk: branch on a set (a cloud) of solutions
How degenerated is the typical MIP? Nonbasic columns (incl. slacks) with reduced costs 0.0 120 80 40 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 0 MIPLIB 3, MIPLIB2003, MIPLIB2010, Cor@l (462 instances) only 12.5% not dual degenerated dichotomy: either few degeneracy or a lot of
How degenerated is the typical MIP? Nonbasic columns (incl. slacks) with reduced costs 0.0 120 80 40 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 0 MIPLIB 3, MIPLIB2003, MIPLIB2010, Cor@l (462 instances) only 12.5% not dual degenerated dichotomy: either few degeneracy or a lot of presolving/cuts do not change this picture by much
Two questions Goal of this talk: branch on a cloud of solutions 1. How do we get extra optimal solutions? 2. How do we use this extra knowledge?
Generating cloud points 1. How do we get extra optimal solutions? restrict LP to optimal face min/max each variable (OBBT) feasibility pump objective (pump-reduce [Achterberg2010])
Generating cloud points 1. How do we get extra optimal solutions? restrict LP to optimal face min/max each variable (OBBT) feasibility pump objective (pump-reduce [Achterberg2010]) x 1 = 0.4 x 2 = 0.55
Generating cloud points 1. How do we get extra optimal solutions? restrict LP to optimal face min/max each variable (OBBT) feasibility pump objective (pump-reduce [Achterberg2010]) min x 1 x 1 = 0.4 x 2 = 0.55
Generating cloud points 1. How do we get extra optimal solutions? restrict LP to optimal face min/max each variable (OBBT) feasibility pump objective (pump-reduce [Achterberg2010]) max x 1 x 1 [0.4, 1.0] x 2 [0.55, 0.7]
Generating cloud points 1. How do we get extra optimal solutions? restrict LP to optimal face min/max each variable (OBBT) feasibility pump objective (pump-reduce [Achterberg2010]) min x 2 x 1 [0.4, 1.0] x 2 [0.0, 0.7]
Generating cloud points 1. How do we get extra optimal solutions? restrict LP to optimal face min/max each variable (OBBT) feasibility pump objective (pump-reduce [Achterberg2010]) max x 2 x 1 [0.4, 1.0] x 2 [0.0, 1.0]
Generating cloud points 1. How do we get extra optimal solutions? restrict LP to optimal face min/max each variable (OBBT) feasibility pump objective (pump-reduce [Achterberg2010]) max x 2 x 1 [0.4, 1.0] x 2 [0.0, 1.0] intervals instead of values
Generating cloud points 1. How do we get extra optimal solutions? restrict LP to optimal face min/max each variable (OBBT) feasibility pump objective (pump-reduce [Achterberg2010]) min (x, x) = x j x j x 1 = 0.4 x 2 = 0.55
Generating cloud points 1. How do we get extra optimal solutions? restrict LP to optimal face min/max each variable (OBBT) feasibility pump objective (pump-reduce [Achterberg2010]) min (x, x) = x j x j = x 1 + x 2 x 1 = 0.4 x 2 = 0.55
Generating cloud points 1. How do we get extra optimal solutions? restrict LP to optimal face min/max each variable (OBBT) feasibility pump objective (pump-reduce [Achterberg2010]) min (x, x) = x j x j = x 1 + x 2 x 1 [0.4, 0.55] x 2 [0.55, 0.9]
Generating cloud points 1. How do we get extra optimal solutions? restrict LP to optimal face min/max each variable (OBBT) feasibility pump objective (pump-reduce [Achterberg2010]) min (x, x) = x j x j = +x 1 + x 2 x 1 [0.4, 0.55] x 2 [0.55, 0.9]
Generating cloud points 1. How do we get extra optimal solutions? restrict LP to optimal face min/max each variable (OBBT) feasibility pump objective (pump-reduce [Achterberg2010]) min (x, x) = x j x j = +x 1 + x 2 x 1 [0.4, 0.9] x 2 [0.55, 1.0]
Generating cloud points 1. How do we get extra optimal solutions? restrict LP to optimal face min/max each variable (OBBT) feasibility pump objective (pump-reduce [Achterberg2010]) min (x, x) = x j x j = +x 1 + x 2 x 1 [0.4, 0.9] x 2 [0.55, 1.0] heuristic intervals stalling is cheap!
Branching rules in MIP 2. How do we use this extra knowledge? 1. Pseudocost branching [BenichouEtAl1971] try to estimate LP values, based on history information effective, cheap, but weak in the beginning combine with strong branching (reliability, hybrid) 2. Strong branching [ApplegateEtAl1995] solve tentative LP relaxations for some candidates effective w.r.t. number of nodes, expensive w.r.t. time 3. Most infeasible branching often referred to as a simple standard rule computationally as bad as random branching!
Exploiting cloud points (pseudocost branching) 2.1 How do we use this extra knowledge? z = c T x z = c T x j + j x j x j pseudocost update x j ς + j = xj x and ς j j = xj xj pseudocost-based estimation + j = Ψ + j ( x j x j ) and j x j = Ψ j (x j x j ) x j x j
Exploiting cloud points (pseudocost branching) 2.1 How do we use this extra knowledge? z = c T x z = c T x j + j x j l j x j pseudocost update u j x j ς + j = xj x... better: ς + j j = xj u j pseudocost-based estimation + j = Ψ + j ( x j x j ) and j x j = Ψ j (x j x j ) x j x j
Exploiting cloud points (pseudocost branching) 2.1 How do we use this extra knowledge? z = c T x z = c T x j j + j + j x j l j x j pseudocost update u j x j ς + j = xj x... better: ς + j j = xj u j pseudocost-based estimation + j = Ψ + j ( x j x j )... better: + j x j l j x j u j = Ψ + j ( x j u j ) x j
Observation Lemma Let x be an optimal solution of the LP relaxation at a given branch-and-bound node and xj l j xj u j xj. Then 1. for fixed and, it holds that ς + j ς + j and ς j ς j, respectively; 2. for fixed Ψ + j and Ψ j, it holds that + j + j and j j, respectively.
Exploiting cloud points (strong branching) 2.2 How do we use this extra knowledge? Full strong branching: solves 2 #frac. var s many LPs uses product of improvement values as branching score improvement on both sides better than on one
Exploiting cloud points (strong branching) 2.2 How do we use this extra knowledge? Full strong branching: solves 2 #frac. var s many LPs uses product of improvement values as branching score improvement on both sides better than on one Benefit of cloud intervals: frac. var. gets integral in cloud point one LP spared cloud branching acts as a filter new frac. var. s new candidates (one side known) Idea: Use 3-partition of branching candidates
Exploiting cloud points (most inf branching) 2.3 How do we use this extra knowledge? Most infeasible branching branch on variable with maximal fractionality min(xj xj, xj xj ) computationally as bad as random branching! maybe because xj is random within cloud interval?
Exploiting cloud points (most inf branching) 2.3 How do we use this extra knowledge? Most infeasible branching branch on variable with maximal fractionality min(xj xj, xj xj ) computationally as bad as random branching! maybe because xj is random within cloud interval? x i x i x i x j x j x j Most infeasible with cloud intervals: select x i with maximal min( y z : y [l i, u i ], z Z) direction in which the optimal face has a small diameter
Exploiting cloud points (most inf branching) 2.3 How do we use this extra knowledge? Most infeasible branching branch on variable with maximal fractionality min(xj xj, xj xj ) computationally as bad as random branching! maybe because xj is random within cloud interval? x i l xi i u i x j l j u j x j Most infeasible with cloud intervals: select x i with maximal min( y z : y [l i, u i ], z Z) direction in which the optimal face has a small diameter
Experimental environment implementation within SCIP pristine settings: No cuts, no heurs, DFS cutoff = optimum branch to prove optimality MIPLIB MIPLIB 3.0, 2003, 2010 industry and academics 168 benchmark instances Cor@l huge collection of 350 instances many combinatorial ones mainly collected from NEOS server Focus: Tree size reduction by cloud information
Cloud intervals at root node Cloud statistics: interval size: average: 0.56 g. mean: 0.25 success rate: average: 65% g. mean: 39% LP iter/success: average: 172.15 g. mean: 31.30 From now on, exclude: No objective, solved at root, no dual degeneracy at root, highly symmetric leaves 158 instances in testbed Note: Did not use cloud points for anything else (heuristics, cuts)
Computational results Pseudocost branching pscost cloud solved 118 94 time (all opt) 24.7 83.4 nodes (all opt) 1945 1441 OBBT takes a lot of time! fewer nodes better branching choices more reliable estimates
Computational results Most infeasible branching mostinf cloud solved 88 81 time (all opt) 27.7 58.8 nodes (all opt) 2553 1414 running time doubles but: enumerational effort (=nodes) nearly halved
Computational results Most infeasible branching mostinf cloud solved 88 81 time (all opt) 27.7 58.8 nodes (all opt) 2553 1414 running time doubles but: enumerational effort (=nodes) nearly halved Better than random! :-)
Cloud strong branching algorithm frac. var. gets integral in cloud point one LP spared Idea: Use 3-partition F 2, F 1, F 0 of branching candidates improvements in both directions for F 2 disregard F 1 F 0 improvement in one direction for F 2 F 1 disregard F 0 alternatively: Only use F 2
Cloud strong branching algorithm frac. var. gets integral in cloud point one LP spared Idea: Use 3-partition F 2, F 1, F 0 of branching candidates improvements in both directions for F 2 disregard F 1 F 0 improvement in one direction for F 2 F 1 disregard F 0 alternatively: Only use F 2 stop pump-reduce procedure when new cloud point does not imply new integral bound
Computational results Full strong branching full test sets, default settings, heuristic pump-reduce strong branch cloud branch Test set Nodes Time (s) Nodes Time (s) MIPLIB 691 72.0 661 68.2 COR@L 593 157.3 569 118.3 little less nodes 5.5% faster on MIPLIB (few affected instances) 30% faster on COR@L
Wrap up Conclusion: exploit knowledge of alternative relaxation optima mostinf: 44% nodes, pscost: 26%, hybrid: 13% full strong branching: 30% running time Outlook: faster generation of cloud intervals? predictions when cloud branching is worthwhile? cloud points from alternative relaxations (MINLP!) SB: nonchimerical + cloud + propagation =?
Commercial break Xpress Optimization Suite 7.7 came out in April 2014 Highlights: parallel dual simplex (mini iterations) MIP presolving w.r.t. objective Mosel support for robust constraints and uncertainty improved performance on MISOCP XPRESS MOSEK GUROBI CPLEX SCIP 1.0 1.24 2.85 5.84 16 (MI)SOCP Benchmark http://plato.asu.edu/ftp/socp.html (17/Jun/2014)
Commercial break Xpress Optimization Suite 7.7 came out in April 2014 Highlights: parallel dual simplex (mini iterations) MIP presolving w.r.t. objective Mosel support for robust constraints and uncertainty improved performance on MISOCP XPRESS MOSEK GUROBI CPLEX SCIP 1.0 1.24 2.85 5.84 16 (MI)SOCP Benchmark http://plato.asu.edu/ftp/socp.html (17/Jun/2014) Coming soon: Cloud branching
Thank You Timo Berthold timoberthold@fico.com This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation s express consent.