Introduction to Constrained Optimization: Duality and KKT Conditions
Pratik Shah {pratik.shah [at] lnmiit.ac.in}
The LNM Institute of Information Technology
www.lnmiit.ac.in
February 13, 2013
Geometry of the Problem

Let us try to solve the optimization problem

    \min_{x \in \mathbb{R}^2} f(x) = \min_{x \in \mathbb{R}^2} \; x_1^2 + x_2^2    (1)

subject to

    g(x) = 2 - x_1 - x_2 \le 0    (2)

Constrained optimization demands more than stationarity at the optimal point, i.e. \nabla_x f(x) = 0 is not sufficient. Why?
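A quick way to see why, worked out for this example: the only stationary point of f is the unconstrained minimizer

    \nabla_x f(x) = (2x_1, 2x_2) = 0 \;\Rightarrow\; x = (0, 0),

but g(0, 0) = 2 > 0, so this point is infeasible. The constrained minimizer must therefore lie on the boundary g(x) = 0, where \nabla_x f does not vanish.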
How do we treat the Constraint?

Solve this constrained optimization with a Lagrange multiplier,

    L(x, \alpha) = f(x) + \alpha g(x)    (3)
                 = (x_1^2 + x_2^2) + \alpha (2 - x_1 - x_2),

and the solution is

    x^* = (x_1, x_2) = (1, 1).    (4)

Before proceeding further, let us first fix some notation and concepts.
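A minimal sketch of how (4) is obtained, assuming the constraint is active at the optimum (as the geometry above suggests): setting \nabla_x L(x, \alpha) = 0 gives

    2x_1 - \alpha = 0, \quad 2x_2 - \alpha = 0 \;\Rightarrow\; x_1 = x_2 = \alpha / 2,

and substituting into the active constraint 2 - x_1 - x_2 = 0 yields \alpha = 2, hence x^* = (1, 1) with optimal value f(x^*) = 2.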
Convex Optimization [1]

Study of mathematical optimization problems of the form

    \text{minimize}_{x \in \mathbb{R}^n} \; f(x) \quad \text{subject to} \quad x \in C    (5)

where x \in \mathbb{R}^n is a vector known as the optimization variable, f : \mathbb{R}^n \to \mathbb{R} is a convex function that we want to minimize, and C \subseteq \mathbb{R}^n is a convex set describing the set of feasible solutions.

Recall the definition of a convex set and a convex function.¹

¹ A function f : S \to \mathbb{R} is convex if S is a convex set, and for any x, y \in S and \lambda \in [0, 1], we have f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y). A function f is concave if -f is convex.
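A one-line worked check of this definition for the scalar function f(x) = x^2 (and hence, term by term, for the objective x_1^2 + x_2^2 of the running example):

    \lambda x^2 + (1 - \lambda) y^2 - (\lambda x + (1 - \lambda) y)^2 = \lambda (1 - \lambda)(x - y)^2 \ge 0,

so the required inequality f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y) holds for all x, y and \lambda \in [0, 1].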
Lagrange Duality

The theory of Lagrange duality is the study of optimal solutions to convex optimization problems.

In this lecture we will discuss differentiable convex optimization problems of the form

    \text{minimize}_{x \in \mathbb{R}^n} \; f(x)
    \text{subject to} \; g_i(x) \le 0, \; i = 1, \ldots, m
                      \; h_i(x) = 0, \; i = 1, \ldots, p    (6)

where f and the g_i are differentiable convex functions and the h_i are affine functions.
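The running example (1)-(2) already has this form with m = 1 and p = 0: f(x) = x_1^2 + x_2^2 and g_1(x) = 2 - x_1 - x_2, both differentiable and convex (g_1 is affine), with no equality constraints.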
The Lagrangian

Intuitively, the Lagrangian can be thought of as a modified version of the original objective function. The Lagrangian L : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R} is defined as

    L(x, \alpha, \beta) = f(x) + \sum_{i=1}^{m} \alpha_i g_i(x) + \sum_{i=1}^{p} \beta_i h_i(x).    (7)

Primal variables: x \in \mathbb{R}^n
Dual variables: \alpha \in \mathbb{R}^m and \beta \in \mathbb{R}^p

The Lagrange multipliers \alpha_i and \beta_i can be thought of as costs associated with violating different constraints.
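A small Python sketch of (7), not from the slides, to make the structure concrete; the problem data f, g_i and h_i are passed in as ordinary callables, and the example values at the end are the x^* and \alpha^* found earlier:

# Generic Lagrangian: L(x, alpha, beta) = f(x) + sum_i alpha_i g_i(x) + sum_i beta_i h_i(x)
def lagrangian(f, gs, hs, x, alpha, beta):
    """f: objective; gs: inequality constraints g_i; hs: equality constraints h_i."""
    return (f(x)
            + sum(a * g(x) for a, g in zip(alpha, gs))
            + sum(b * h(x) for b, h in zip(beta, hs)))

# Running example: f(x) = x1^2 + x2^2, one inequality constraint, no equality constraints.
f = lambda x: x[0] ** 2 + x[1] ** 2
gs = [lambda x: 2.0 - x[0] - x[1]]
hs = []

print(lagrangian(f, gs, hs, x=(1.0, 1.0), alpha=[2.0], beta=[]))  # prints 2.0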
Intuition behind Lagrange Duality

For any convex optimization problem, there always exist settings of the dual variables such that the unconstrained minimum of the Lagrangian with respect to the primal variables (keeping the dual variables fixed) coincides with the solution of the original constrained optimization problem [2].
Primal Problem

Consider the optimization problem

    \min_x \underbrace{\Big\{ \max_{\alpha, \beta : \alpha_i \ge 0 \,\forall i} L(x, \alpha, \beta) \Big\}}_{\text{call this } \theta_P(x)} = \min_x \theta_P(x)    (8)

The function \theta_P : \mathbb{R}^n \to \mathbb{R} is called the primal objective, and the unconstrained optimization problem on the right-hand side is known as the primal problem.

A point x \in \mathbb{R}^n is primal feasible if g_i(x) \le 0, i = 1, \ldots, m and h_i(x) = 0, i = 1, \ldots, p.

The optimal value of the primal objective is denoted by p^* = \theta_P(x^*), and x^* \in \mathbb{R}^n is called the solution of the primal problem.
Interpretation of Primal Problem²

    \theta_P(x) = \max_{\alpha, \beta : \alpha_i \ge 0 \,\forall i} L(x, \alpha, \beta)    (9)
                = \max_{\alpha, \beta : \alpha_i \ge 0 \,\forall i} \Big[ f(x) + \sum_{i=1}^{m} \alpha_i g_i(x) + \sum_{i=1}^{p} \beta_i h_i(x) \Big]
                = f(x) + \max_{\alpha, \beta : \alpha_i \ge 0 \,\forall i} \Big[ \sum_{i=1}^{m} \alpha_i g_i(x) + \sum_{i=1}^{p} \beta_i h_i(x) \Big]

    \theta_P(x) = f(x) + \begin{cases} 0 & \text{if } x \text{ is primal feasible} \\ +\infty & \text{if } x \text{ is primal infeasible} \end{cases}    (10)

Here f(x) is the original objective, and the second term acts as a barrier function for carving away infeasible solutions.

² Observe that the primal objective, \theta_P(x), is a convex function of x.
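For the running example, (10) reads

    \theta_P(x) = x_1^2 + x_2^2 + \max_{\alpha \ge 0} \alpha (2 - x_1 - x_2),

which equals x_1^2 + x_2^2 whenever 2 - x_1 - x_2 \le 0 (the maximizing \alpha is 0) and +\infty otherwise (\alpha can be taken arbitrarily large). Minimizing \theta_P therefore recovers the constrained problem, with p^* = \theta_P(x^*) = 2 at x^* = (1, 1).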
Dual Problem

Switching the order of min and max, we obtain an entirely different optimization problem

    \max_{\alpha, \beta : \alpha_i \ge 0 \,\forall i} \underbrace{\Big\{ \min_x L(x, \alpha, \beta) \Big\}}_{\text{call this } \theta_D(\alpha, \beta)} = \max_{\alpha, \beta : \alpha_i \ge 0 \,\forall i} \theta_D(\alpha, \beta)    (11)

The function \theta_D : \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R} is called the dual objective, and the unconstrained optimization problem on the right-hand side is known as the dual problem.

Generally, we say that (\alpha, \beta) are dual feasible if \alpha_i \ge 0, i = 1, \ldots, m.

The optimal value of the dual objective is denoted by d^* = \theta_D(\alpha^*, \beta^*), and (\alpha^*, \beta^*) \in \mathbb{R}^m \times \mathbb{R}^p is called the solution of the dual problem.
Recall the Example

Let us revisit the example that we solved in the beginning:

    \min_{x \in \mathbb{R}^2} f(x) = \min_{x \in \mathbb{R}^2} \; x_1^2 + x_2^2    (12)

subject to

    g(x) = 2 - x_1 - x_2 \le 0    (13)

Do you see the primal and dual formulations?
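Working this out as a sketch: the primal objective is \theta_P(x) = x_1^2 + x_2^2 for feasible x (and +\infty otherwise), while the dual objective is

    \theta_D(\alpha) = \min_x \big[ x_1^2 + x_2^2 + \alpha (2 - x_1 - x_2) \big] = 2\alpha - \alpha^2 / 2,

since the inner minimum is attained at x_1 = x_2 = \alpha / 2. Maximizing over \alpha \ge 0 gives \alpha^* = 2 and d^* = 2, which matches p^* = f(1, 1) = 2.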
Interpretation of Dual Problem

Lemma
1. If (\alpha, \beta) are dual feasible, then \theta_D(\alpha, \beta) \le p^*.
2. (Weak duality) For any pair of primal and dual problems, d^* \le p^*.
3. (Strong duality) For any pair of primal and dual problems satisfying certain technical conditions, called constraint qualifications, d^* = p^*.
4. (Complementary slackness) If strong duality holds, then \alpha_i^* g_i(x^*) = 0 for each i = 1, \ldots, m.
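A small Python sketch, not from the slides, illustrating items 1 and 2 on the running example using the closed forms derived above (\theta_D(\alpha) = 2\alpha - \alpha^2 / 2 and p^* = 2): every dual feasible \alpha gives a lower bound on p^*, with equality exactly at \alpha^* = 2.

# Weak duality check for  min x1^2 + x2^2  s.t.  2 - x1 - x2 <= 0.
def theta_D(alpha):
    """Dual objective in closed form (derived above): min over x of the Lagrangian."""
    return 2.0 * alpha - 0.5 * alpha ** 2

p_star = 2.0  # primal optimal value f(1, 1)

for alpha in [0.0, 0.5, 1.0, 2.0, 3.0, 5.0]:
    d = theta_D(alpha)
    assert d <= p_star + 1e-12  # theta_D(alpha) <= p* for every dual feasible alpha
    print(f"alpha = {alpha:4.1f}   theta_D = {d:6.3f}   gap = {p_star - d:6.3f}")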
The KKT Conditions [3]

Theorem
Suppose that x^* \in \mathbb{R}^n, \alpha^* \in \mathbb{R}^m and \beta^* \in \mathbb{R}^p satisfy the following conditions:
1. (Primal feasibility) g_i(x^*) \le 0, i = 1, \ldots, m and h_i(x^*) = 0, i = 1, \ldots, p,
2. (Dual feasibility) \alpha_i^* \ge 0, i = 1, \ldots, m,
3. (Complementary slackness) \alpha_i^* g_i(x^*) = 0 for each i = 1, \ldots, m, and
4. (Lagrangian stationarity) \nabla_x L(x^*, \alpha^*, \beta^*) = 0.
Then x^* is primal optimal and (\alpha^*, \beta^*) are dual optimal. Furthermore, if strong duality holds, then any primal optimal x^* and dual optimal (\alpha^*, \beta^*) must satisfy conditions 1 through 4.

These conditions are known as the Karush-Kuhn-Tucker (KKT) conditions.
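A small Python sketch, not from the slides, that checks the four conditions for the running example at the candidate pair x^* = (1, 1), \alpha^* = 2 found earlier (there are no equality constraints, so \beta plays no role):

# KKT check for  min x1^2 + x2^2  s.t.  g(x) = 2 - x1 - x2 <= 0.
def g(x):
    return 2.0 - x[0] - x[1]

def grad_f(x):
    return (2.0 * x[0], 2.0 * x[1])

def grad_g(x):
    return (-1.0, -1.0)

x_star = (1.0, 1.0)
alpha_star = 2.0
tol = 1e-9

assert g(x_star) <= tol                          # 1. primal feasibility
assert alpha_star >= 0.0                         # 2. dual feasibility
assert abs(alpha_star * g(x_star)) <= tol        # 3. complementary slackness
stationarity = [df + alpha_star * dg for df, dg in zip(grad_f(x_star), grad_g(x_star))]
assert all(abs(s) <= tol for s in stationarity)  # 4. Lagrangian stationarity

print("All four KKT conditions hold at x* = (1, 1), alpha* = 2.")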
References

S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

Chuong B. Do. Convex optimization overview (cnt'd), 2009.

J. Nocedal and S. J. Wright. Numerical Optimization. Springer Series in Operations Research. Springer-Verlag, 1999.