Lagrangian Relaxation in CP
Willem-Jan van Hoeve
CPAIOR 2016 Master Class
Overview
1. Motivation for using Lagrangian Relaxations in CP
2. Lagrangian-based domain filtering. Example: Traveling Salesman Problem
3. Relaxed Decision Diagrams. Example: Disjunctive Scheduling
4. Lagrangian Propagation: improve communication between constraints
Acknowledgements: Presentations from Andre Cire and Hadrien Cambazard
Lagrangian Relaxation
Lagrangian Relaxation
Move a subset (or all) of the constraints into the objective with penalty multipliers μ:
  min { cx : Ax ≥ b, Cx ≥ d }  becomes  L(μ) = min { cx + μ(b - Ax) : Cx ≥ d },  μ ≥ 0
Weak duality: for any choice of μ, the Lagrangian L(μ) provides a lower bound on the original problem.
Goal: find the optimal μ (providing the best bound), e.g., via subgradient optimization.
Lagrangian Relaxations are Awesome
- Can be applied to nonlinear programming problems (NLPs), LPs, and in the context of integer programming
- Can provide better bounds than the LP relaxation: z_LP ≤ z_Lagr ≤ z_IP
- Provide domain filtering analogous to that based on LP duality
- Can be solved efficiently and/or heuristically
- Can dualize difficult constraints
- Can exploit the problem structure, e.g., the Lagrangian relaxation may decouple, or L(μ) may be very fast to solve combinatorially
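The weak-duality claim above can be written out in one line; a standard derivation, stated for a generic formulation min { cx : Ax ≥ b, x ∈ X } with the constraints Ax ≥ b dualized (symbols chosen for illustration):

```latex
\[
  L(\mu) \;=\; \min_{x \in X} \; cx + \mu(b - Ax), \qquad \mu \ge 0.
\]
% For any x' feasible in the original problem (Ax' >= b, x' in X),
% the penalty term is nonpositive, so:
\[
  L(\mu) \;\le\; cx' + \mu(b - Ax') \;\le\; cx'
  \quad\Longrightarrow\quad L(\mu) \le z^{*}.
\]
% The Lagrangian dual asks for the best such bound:
\[
  z_{\mathrm{Lagr}} \;=\; \max_{\mu \ge 0} \; L(\mu).
\]
```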
Example Application: TSP
Visit all vertices exactly once, with minimum total distance.
Graph G = (V, E) with vertex set V and edge set E, |V| = n
w(i,j): distance between i and j
CP Model for the TSP (version 1)
Permutation model: variable x_i represents the i-th city to be visited; introduce a copy of vertex 1: vertex n+1.
  min z
  s.t. z = Σ_{i=1..n} w(x_i, x_{i+1})
       alldifferent(x_1, ..., x_n)
       x_{n+1} = x_1
CP Solving
  min z
  s.t. z = Σ_{i=1..4} w(x_i, x_{i+1})   (1)
       alldifferent(x_1, ..., x_4)      (2)
       x_5 = x_1                        (3)
Variable domains: D(z) = {0..inf}, D(x_i) = {1,2,3,4} for i = 1,...,5
[figure: 4-city example graph with edge weights]
Propagate (1) to update the bounds of D(z) (lower bound 16)
CP Solving (cont'd)
Branch: x_1 = 1 (vs. x_1 ≠ 1)
Propagate (2): D(x_2) = {2,3,4}, D(x_3) = {2,3,4}, D(x_4) = {2,3,4}
Propagate (3): D(x_5) = {1}
Propagate (1): no updates
CP Solving (cont'd)
Branch further on x_2 and x_3: D(x_4) becomes a singleton, and D(z) is reduced to the corresponding tour length.
Drawback of the First CP Model
- The objective is handled separately from the constraints
- Interaction via domain propagation only
- Weak bounds
Other CP Models for the TSP
Successor model: variable next_i represents the immediate successor of city i.
  min z = Σ_i w(i, next_i)
  s.t. alldifferent(next_1, ..., next_n)
       path(next_1, ..., next_n)
Objective and constraints are still decoupled.

Integrated model using an optimization constraint [Focacci et al., 1999, 2002]:
  min z
  s.t. alldifferent(next_1, ..., next_n)   [redundant]
       WeightedPath(next, w, z)
Optimization Constraint WeightedPath(next, w, z)
Ensures that the variables next_i represent a Hamiltonian path such that the total weight of the path equals variable z.
Benefits:
- Stronger bounds from the constraint structure, e.g., based on an LP relaxation
- Cost-based domain filtering: if next_i = v leads to a path of weight > max(D(z)), remove v from D(next_i)
Relaxations for the TSP
Observe that the TSP is a combination of two constraints:
- The degree of each node is 2
- The solution is connected (no subtours)
Relaxations:
- relax connectedness: Assignment Problem
- relax the degree constraints: 1-Tree Relaxation
1-Tree Relaxation
Relax the degree constraints [Held & Karp, 1970, 1971].
E.g., a minimum spanning tree (has n-1 edges).
A 1-tree extends this with one more edge:
- Choose any node v (called the 1-node)
- Build a minimum spanning tree T on the subgraph induced by V\{v}
- Add the two smallest edges linking v to T
P.S. An MST can be found in O(m α(m,n)) time.
Improved 1-Tree: Lagrangian Relaxation
Let binary variable x_e represent whether edge e is used:
  min Σ_e w_e x_e
  s.t. Σ_{e ∈ δ(i)} x_e = 2  for all i ∈ V   (degree constraints)
       x forms a 1-tree
Dualize the degree constraints with Lagrangian multipliers π_i (penalties for node-degree violation):
  L(π) = min { Σ_e w_e x_e + Σ_i π_i (2 - Σ_{e ∈ δ(i)} x_e) : x forms a 1-tree }
Held-Karp Bound
The best bound, max_π L(π), is the Held-Karp bound.
How to find the best penalties π? In this case, we can exploit a combinatorial interpretation: no need to solve an LP.
Held-Karp Iteration
1. Find a minimum 1-tree T w.r.t. the reduced weights w'(i,j) = w(i,j) - π_i - π_j
2. Lower bound: cost(T) + 2 Σ_i π_i
3. If T is not a tour, update the multipliers as π_i += (2 - degree(i)) * β and repeat
(step size β differs per iteration)
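The iteration can be sketched in a few lines of code; a minimal illustration (the helper names `kruskal_mst` and `held_karp_iteration` and the toy 4-node instance below are mine, not the slides' example):

```python
def kruskal_mst(nodes, edges):
    """Minimum spanning tree by Kruskal's algorithm with union-find.
    edges: list of (weight, u, v) tuples; returns the chosen edges."""
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v
    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

def held_karp_iteration(n, w, pi, beta):
    """One subgradient step of the Held-Karp bound.
    n: number of nodes (node 0 acts as the 1-node); w: dict (i,j) -> weight
    with i < j; pi: node multipliers; beta: step size.
    Returns (lower_bound, updated_pi)."""
    # reduced weights w'(i,j) = w(i,j) - pi_i - pi_j
    red = [(wij - pi[i] - pi[j], i, j) for (i, j), wij in w.items()]
    # MST on V \ {0}, plus the two cheapest edges at the 1-node
    mst = kruskal_mst(range(1, n), [e for e in red if e[1] != 0])
    one_edges = sorted(e for e in red if e[1] == 0)[:2]
    tree = mst + one_edges
    bound = sum(e[0] for e in tree) + 2 * sum(pi)
    # subgradient update pushes every node degree toward 2
    deg = [0] * n
    for _, u, v in tree:
        deg[u] += 1
        deg[v] += 1
    new_pi = [pi[i] + beta * (2 - deg[i]) for i in range(n)]
    return bound, new_pi
```

For example, on the toy instance w = {(0,1): 1, (0,2): 4, (0,3): 3, (1,2): 2, (1,3): 5, (2,3): 6} with pi = [0, 0, 0, 0] and beta = 1, one iteration yields the bound 11, one below the optimal tour value 12.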
Example
[figure: Held-Karp iterations on the example graph; after each multiplier update the reduced weights w'(i,j) = w(i,j) - π_i - π_j change, and a new minimum 1-tree is computed]
After a few iterations the bound reaches the optimal tour value.
Cost-based propagation?
Embed the 1-Tree in CP
We need to reason on the graph structure: manipulate the graph, remove costly edges, etc.
This is not easily done with successor and position variables: e.g., how can we enforce that a given edge e = (i,j) is mandatory?
Ideally, we want access to the graph rather than local successor/predecessor information: modify the definition of our global constraint.
One More CP Model for the TSP
Integrated model based on a graph representation [Benchimol et al., 2012]:
  min z
  s.t. weighted-circuit(X, G, z)
where G = (V, E, w) is the graph with vertex set V, edge set E, and weights w.
The set of binary variables X = { x_e : e ∈ E } represents the tour; in CP, X can be modeled using a set variable.
Variable z represents the total weight.
Propagation
Given the constraint weighted-circuit(X, G, z), apply the Held-Karp relaxation to:
- remove sub-optimal edges (x_e = 0): edges whose forced inclusion yields total weight > max(D(z))
- force mandatory edges (x_e = 1)
- update the bounds of z
[figure: reduced-weight graph with its Held-Karp bound; suppose D(z) is fixed to the optimal tour value]
Effectiveness depends on the multipliers! [Sellmann, 2004]
A different (even suboptimal) choice of multipliers can allow more edges to be filtered.
[Benchimol et al., 2010, 2012] [Regin et al., 2010]
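The edge-removal step can be illustrated on the spanning-tree part of the relaxation: forcing a non-tree edge into a minimum spanning tree creates a cycle, and the cheapest repair drops the heaviest tree edge on that cycle, which gives the edge's marginal cost. A simplified sketch (helper names are mine; a full weighted-circuit propagator works with 1-trees and reduced weights, as in the cited papers):

```python
from collections import defaultdict, deque

def max_edge_on_tree_path(tree_adj, u, v):
    """Largest edge weight on the unique tree path from u to v (BFS)."""
    best = {u: 0}
    q = deque([u])
    while q:
        x = q.popleft()
        for y, w in tree_adj[x]:
            if y not in best:
                best[y] = max(best[x], w)
                q.append(y)
    return best[v]

def filter_nontree_edges(tree, edges, bound, ub):
    """Marginal-cost filtering on a spanning-tree relaxation.
    tree, edges: lists of (weight, u, v); bound: tree cost (lower bound);
    ub: best known tour value, i.e. max(D(z)).
    The bound with edge e forced is bound + w(e) - heaviest edge on the
    tree cycle that e closes; if that exceeds ub, e can be removed."""
    tree_adj = defaultdict(list)
    tree_set = set()
    for w, a, b in tree:
        tree_adj[a].append((b, w))
        tree_adj[b].append((a, w))
        tree_set.add((a, b))
    removed = []
    for w, a, b in edges:
        if (a, b) in tree_set:
            continue
        marginal = w - max_edge_on_tree_path(tree_adj, a, b)
        if bound + marginal > ub:
            removed.append((a, b))
    return removed
```

For instance, with the path tree 0-1-2-3 of weights 1, 2, 3 (bound 6) and upper bound 9, the heavy chords (0,3) of weight 10 and (1,3) of weight 8 are both filtered.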
Impact of Propagation (st70 from TSPLIB)
[figures: search trees with upper bound = 700 and with upper bound = 675]
Impact on a Complete (Exact) Solver
Randomly generated symmetric TSPs, time limit 1800s; averages per size class.
[table: results]
Extended to the ATSP [Fages & Lorca, 2012] and the degree-constrained MST [Fages et al., 2016]
Many More Applications
- TSP [Caseau et al., 1997]
- TSPTW [Focacci et al., 2000, 2002]
- Traveling Tournament Problem [Benoist et al., 2001]
- Capacitated Network Design [Sellmann et al., 2002]
- Automated Recording Problem [Sellmann et al., 2003]
- Network Design [Cronholm et al., 2004]
- Resource-Constrained Shortest Path Problem [Gellermann et al., 2005] [Gualandi, 2012]
- Personnel Scheduling [Menana et al., 2009]
- Multileaf Collimator Sequencing [Cambazard et al., 2009]
- Parallel Machine Scheduling [Edis et al., 2011]
- Traveling Purchaser Problem [Cambazard, 2012]
- Empirical Model Learning [Lombardi et al., 2013]
- NPV in Resource-Constrained Projects [Gu et al., 2013]
Disjunctive Scheduling / Sequencing
Sequencing and scheduling of activities on a resource.
Activities:
- Processing time: p_i
- Release time: r_i
- Deadline: d_i
Resource:
- Nonpreemptive
- Processes one activity at a time
[figure: three activities scheduled on a timeline]
Variants and Extensions
- Precedence relations between activities
- Sequence-dependent setup times
- Various objective functions: makespan, sum of setup times, (weighted) sum of completion times, (weighted) tardiness, number of late jobs
Includes: TSP with time windows, single-machine scheduling, the sequential ordering problem
Running Example
3 tasks to perform on a machine; each task has a processing time and a release time.
The machine can perform at most one task at a time, non-preemptively.
Objective: minimize completion time.
[figure: task table with release dates, and two candidate schedules: one with completion time 11, and a better one with completion time 10]
CP Model
  min C_max
  s.t. NoOverlap([s_j, p_j] : all j)
       C_max = max{s_1 + p_1, ..., s_n + p_n}
       s_i ∈ {r_i, ..., T}, for all i
Conventional propagation:
- Use the Cartesian product of the variable domains as relaxation
- Propagate domains to strengthen the relaxation
Relaxed Decision Diagram
[figure: layered diagram from root r to terminal t for the 3-task running example; arcs in layer 1 assign the first task, layer 2 the second task, layer 3 the third task]
- Layered acyclic graph
- Arcs have weights
- Paths correspond to solutions, path lengths to the solution costs
It is a relaxation: all feasible solutions are encoded by some path, but the diagram may contain some infeasible solutions as well.
Relaxed Decision Diagram (cont'd)
[figure: example paths in the relaxed diagram]
- Some paths are infeasible, e.g., one of completion time 9 that repeats a task
- The shortest path has length 8: a lower bound on the optimal solution value
Uses of Relaxed Decision Diagrams
- Obtain optimization bounds (as LP relaxations do)
- As an inference mechanism, for example as global constraints in CP (MDD-based CP)
- To guide search (in new branch-and-bound methods)
Issues
Solutions of a relaxed DD may violate several constraints of the problem.
[figure: a path that repeats a task]
Violation: all tasks must be performed once, i.e., Σ_{e: v(e)=i} x_e = 1 for all tasks i
Remedy: Lagrangian Relaxation [Bergman et al., 2015]
  min z = shortest path
  s.t. Σ_{e: v(e)=i} x_e = 1, for all tasks i
       (+ other problem constraints)
Dualize with Lagrangian multipliers λ_i:
  min z = shortest path + Σ_i λ_i (1 - Σ_{e: v(e)=i} x_e)
  s.t. (other problem constraints)
This is done by updating shortest path weights!
General Approach
We penalize infeasible solutions in a relaxed DD:
- Any separable constraint of the form f_1(x_1) + f_2(x_2) + ... + f_n(x_n) ≤ c (similarly for = or ≥) that must be satisfied by solutions of an MDD can be dualized
- We need only focus on the shortest-path solution: identify a violated constraint and penalize it
- Systematic way, directly adapted from LP
- Shortest paths are very fast to compute
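Under this scheme, computing the Lagrangian bound is one shortest-path pass over the diagram with adjusted arc weights w'(e) = w(e) - λ_{v(e)}. A minimal sketch (the arc representation and function name are mine):

```python
from collections import defaultdict

def lagrangian_shortest_path(arcs, lam):
    """Lagrangian bound from a layered relaxed DD.
    arcs: (tail, head, task, weight) tuples in topological order; node 0
    is the root and the highest-numbered node the terminal.
    Each arc weight is adjusted by the multiplier of the task it assigns,
        w'(e) = w(e) - lam[task],
    and the bound  shortest_path + sum(lam)  is valid for ANY lam."""
    dist = defaultdict(lambda: float("inf"))
    dist[0] = 0.0
    for tail, head, task, w in arcs:
        cand = dist[tail] + w - lam[task]
        if cand < dist[head]:
            dist[head] = cand
    terminal = max(dist)
    return dist[terminal] + sum(lam.values())
```

With all multipliers at zero the bound can be weak because the shortest path may repeat a task; making a repeated task's arcs heavier (a negative λ here, since w' = w - λ) raises the bound while keeping it valid.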
Improving the Relaxed Decision Diagram
[figure: the shortest path, of completion time 8, repeats one task and skips another]
Penalization:
- If a task is repeated, increase its arc weights
- If a task is unused, decrease its arc weights
New shortest path: 10
Guaranteed to be a valid lower bound for any penalties.
Additional Filtering
If the minimum solution value through an arc exceeds max(D(z)), then the arc can be deleted.
Suppose a solution of value 10 is known: every arc whose best path value exceeds 10 is removed.
MDD filtering extends to Lagrangian weights: more filtering possible.
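The through-arc test amounts to one forward and one backward shortest-path sweep; a sketch (arc representation and function name are mine), assuming arcs are listed in topological order, node 0 is the root, and the last node is the terminal:

```python
def filter_dd_arcs(arcs, n_nodes, ub):
    """Delete DD arcs that cannot lie on any solution of value <= ub.
    arcs: (tail, head, weight) tuples in topological order.
    The cheapest path through an arc costs down[tail] + w + up[head];
    if that exceeds ub, the arc is dropped."""
    INF = float("inf")
    down = [INF] * n_nodes          # shortest root -> node
    down[0] = 0
    for t, h, w in arcs:
        down[h] = min(down[h], down[t] + w)
    up = [INF] * n_nodes            # shortest node -> terminal
    up[-1] = 0
    for t, h, w in reversed(arcs):
        up[t] = min(up[t], w + up[h])
    return [(t, h, w) for t, h, w in arcs if down[t] + w + up[h] <= ub]
```

The same two sweeps give the Lagrangian variant by first replacing each weight with its multiplier-adjusted value.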
Impact on TSP with Time Windows
TSPTW instances (Dumas and GendreauDumasExtended) (Constraints, 2015)
[figure: results]
Beyond Single Optimization Constraints
Constraint propagation is CP's major strength:
- rich modeling interface, fast domain filtering, combinatorial programming
...but also its major weakness (when using domains):
- the Cartesian product of the variable domains is a weak relaxation
- conventional domain propagation has limited communication power
Propagating relaxed MDDs can help, but not in all cases.
Up next: Lagrangian Propagation instead of domain propagation, via Lagrangian decomposition [Bergman et al., CP 2015], [Ha et al., CP 2015]
Lagrangian Decomposition [Guignard & Kim, 1987]
Duplicate the variables and dualize the linking (copy) constraint:
  min { cx : Ax ≥ b, Cx ≥ d }  =  min { cx : Ax ≥ b, Cy ≥ d, x = y }
  L(λ) = min { cx + λ(x - y) : Ax ≥ b, Cy ≥ d }
       = min { (c + λ)x : Ax ≥ b } + min { -λy : Cy ≥ d }
The bound from Lagrangian Decomposition is at least as strong as the Lagrangian relaxation from dualizing either Ax ≥ b or Cx ≥ d.
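On a toy instance the decomposition can be emulated by brute-force enumeration of the two subproblems; an illustrative sketch (the function name, step scheme, and instance below are mine; real implementations solve each subproblem with its dedicated algorithm):

```python
def lagrangian_decomposition_bound(cost, X1, X2, steps=50, beta=1.0):
    """Toy Lagrangian decomposition for min cost*x with x in X1 and in X2.
    X1, X2: explicit lists of 0/1 tuples (the two constraints' solution
    sets). Duplicate x into y, dualize the linking constraint x = y with
    multipliers lam, and solve both parts independently by enumeration.
    Returns the best lower bound found."""
    n = len(cost)
    lam = [0.0] * n
    best = float("-inf")
    for _ in range(steps):
        # subproblem 1: min (c + lam) * x over X1
        x = min(X1, key=lambda s: sum((cost[i] + lam[i]) * s[i] for i in range(n)))
        # subproblem 2: min (-lam) * y over X2
        y = min(X2, key=lambda s: sum(-lam[i] * s[i] for i in range(n)))
        bound = (sum((cost[i] + lam[i]) * x[i] for i in range(n))
                 + sum(-lam[i] * y[i] for i in range(n)))
        best = max(best, bound)
        # subgradient ascent on the dualized constraint x = y
        lam = [lam[i] + beta * (x[i] - y[i]) for i in range(n)]
    return best
```

For example, with cost (1, 2, 3), X1 the tuples with exactly two ones, and X2 the tuples whose first entry is 1, the bound converges to 3, the value of the best solution in the intersection.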
Motivating CP Example
[figure: two overlapping constraints on shared variables with no common solution]
Domain consistent; even pairwise consistent.
Constraint propagation has no effect here.
Lagrangian Decomposition for CP
[figure: each constraint is represented as an optimization subproblem over its own copy of the variables, so the copies can be optimized independently while the linking (copy) constraints are dualized with multipliers]
Final Decomposition
[figure: the decomposed objective; the sum gives an upper bound on satisfiability]
- Can be used for feasibility problems and optimization problems
- Synchronizes the support solutions within the constraints
- Systematic method to improve bounding in CP
- Generic relaxation
- Extended cost-based domain filtering!
Global Lagrangian Domain Filtering
Consider a Lagrangian Decomposition with subproblems j = 1..m.
Let z_j^{x_i=v} be the objective value of the j-th subproblem, subject to x_i = v.
Given a required bound B, we have: if Σ_j z_j^{x_i=v} < B, then v can be removed from D(x_i).
E.g., for our example, we can deduce several removals, such as x_1 ≠ a and x_5 ≠ c.
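The rule itself is straightforward once the per-subproblem values are available; a minimal sketch (the data layout z[j][(i, v)] is hypothetical; in practice these values come from, e.g., constrained shortest paths in each constraint's MDD):

```python
def lagrangian_filter(domains, z, B):
    """Global Lagrangian domain filtering sketch.
    domains: dict variable -> set of values; z[j][(i, v)]: best objective
    value of subproblem j when x_i = v is enforced; B: required bound.
    If even the summed subproblem optima with x_i = v cannot reach B,
    the assignment is inconsistent and v is removed from D(x_i)."""
    for i, dom in domains.items():
        for v in list(dom):
            if sum(zj[(i, v)] for zj in z) < B:
                dom.discard(v)
    return domains
```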
Discussion
- Keep the Lagrangian multipliers under control:
  apply to a select subset of constraints;
  can solve at the root node and keep multipliers fixed during search;
  optimality is not required for Lagrangian relaxation
- Computing z_j^{x_i=v} may be challenging:
  depends on the constraint structure;
  can use a relaxation of the constraint instead (any bound holds);
  can use a Relaxed Decision Diagram: automatic extension
Experimental Results: Alldifferent
[figures: root-node bound improvement; performance-plot comparison]
Systems of overlapping alldifferent constraints, with a weighted sum as objective. Each MDD represents a subset of the constraints.
Experimental Results: Set Covering
[figure: optimality gap]
- Single MDD relaxation, widths 2,000 and 20,000
- Lagrangian Decomposition splits the constraints into multiple MDDs
- Problem instances with increasing bandwidth
- Lagrangian Decomposition is much more stable!
More References
- Fontaine, Michel, Van Hentenryck [CP 2014]
- Ha, Quimper, Rousseau [CP 2015]
- Bergman, Cire, v.H. [CP 2015]
- Chu, Gange, Stuckey, Lagrangian Decomposition via Subproblem Search, CPAIOR 2016 (Monday May 30, 10:45-12:00 session)
Summary
Lagrangian Relaxation and Decomposition provide a systematic and efficient approach to:
- improve optimization bounds in CP
- improve constraint propagation
CP is an ideal environment for automated Lagrangian relaxations:
- the problem is represented with building blocks (constraints) that communicate
- combinatorial programming