Kernels and Constrained Optimization

Gyles Hart
5 years ago

Machine Learning 1, WS2014, Module IN2064, Sheet 8

Machine Learning Worksheet 8: Kernels and Constrained Optimization

1 Kernelized k-nearest neighbours

To classify a point $x$, the k-nearest neighbours algorithm finds the $k$ training samples $N = \{x^{(s_1)}, x^{(s_2)}, \ldots, x^{(s_k)}\}$ that have the shortest distance $\|x - x^{(s_i)}\|_2$ to $x$. The label that occurs most often in the neighbour set $N$ is then assigned to $x$.

Problem 1: Formulate the k-nearest neighbours algorithm in feature space by introducing the feature map $\varphi(x)$. Then rewrite the k-nearest neighbours algorithm so that it only depends on the scalar product in feature space $K(x, y) = \varphi(x)^T \varphi(y)$.

The distance to a training sample in feature space is given by $\|\varphi(x) - \varphi(x^{(s_i)})\|_2$. We can replace this by the squared distance because squaring is monotonic on non-negative values and therefore does not change which points are nearest to $x$. Thus we have

$$\|\varphi(x) - \varphi(x^{(s_i)})\|_2^2 = (\varphi(x) - \varphi(x^{(s_i)}))^T (\varphi(x) - \varphi(x^{(s_i)})) = \varphi(x)^T \varphi(x) - 2\varphi(x)^T \varphi(x^{(s_i)}) + \varphi(x^{(s_i)})^T \varphi(x^{(s_i)}).$$

The first term is constant when searching for the $k$ training samples that minimize this function. Hence we can drop the first term and must find the $k$ training samples $x^{(s_i)}$ that minimize

$$\varphi(x^{(s_i)})^T \varphi(x^{(s_i)}) - 2\varphi(x)^T \varphi(x^{(s_i)}) = K(x^{(s_i)}, x^{(s_i)}) - 2K(x, x^{(s_i)}).$$

2 Convex functions

Problem 2: Given two convex functions $f(x)$ and $g(x)$, show that the sum $h(x) = f(x) + g(x)$ and the scaled function $u(x) = c f(x)$ with $c \ge 0$ are convex.

We have for arbitrary $x$, $y$ and $0 \le \alpha \le 1$:

$$h((1-\alpha)x + \alpha y) = f((1-\alpha)x + \alpha y) + g((1-\alpha)x + \alpha y) \le (1-\alpha)f(x) + \alpha f(y) + (1-\alpha)g(x) + \alpha g(y) = (1-\alpha)h(x) + \alpha h(y)$$

and, since $c \ge 0$,

$$u((1-\alpha)x + \alpha y) = c f((1-\alpha)x + \alpha y) \le c[(1-\alpha)f(x) + \alpha f(y)] = (1-\alpha)u(x) + \alpha u(y).$$

Problem 3: Consider the family of convex functions $f_\theta(x)$, that is, for every $\theta \in \mathbb{R}$ the function $f_\theta(x)$ is convex. Prove that the pointwise maximum $g(x) = \max_\theta f_\theta(x)$ is convex.

By definition of convexity, for every $\theta$ we have for arbitrary $x$, $y$ and $0 \le \alpha \le 1$

$$f_\theta((1-\alpha)x + \alpha y) \le (1-\alpha)f_\theta(x) + \alpha f_\theta(y).$$

By definition of the maximum we have

$$(1-\alpha)\max_{\theta'} f_{\theta'}(x) + \alpha \max_{\theta'} f_{\theta'}(y) \ge (1-\alpha)f_\theta(x) + \alpha f_\theta(y)$$

for any $\theta$. We choose $\theta$ to be the maximizer of $f_\theta((1-\alpha)x + \alpha y)$ w.r.t. $\theta$. Thus we have

$$(1-\alpha)\max_{\theta'} f_{\theta'}(x) + \alpha \max_{\theta'} f_{\theta'}(y) \ge (1-\alpha)f_\theta(x) + \alpha f_\theta(y) \ge f_\theta((1-\alpha)x + \alpha y) = \max_{\theta'} f_{\theta'}((1-\alpha)x + \alpha y).$$

Problem 4: Show that the Lagrange dual function $g(\alpha) = \min_x L(x, \alpha)$ is concave. (A function $f(x)$ is concave if and only if $-f(x)$ is convex.)

The Lagrangian is given by

$$L(x, \alpha) = f_0(x) + \sum_i \alpha_i f_i(x).$$

This can also be interpreted as a family of functions $L_x(\alpha) = L(x, \alpha)$, that is, we interpret $x$ as fixed. Thus $L_x(\alpha)$ has the form

$$L_x(\alpha) = C_{0,x} + \sum_i C_{i,x}\,\alpha_i,$$

where the $C$s are constants. Since $L_x(\alpha)$ is affine in $\alpha$, applying the definition of a convex function shows immediately that $L_x(\alpha)$ is convex for any values of the $C$s. The Lagrange dual function is given by the pointwise minimum of this family of functions,

$$g(\alpha) = \min_x L(x, \alpha) = \min_x L_x(\alpha) = -\max_x\,(-L_x(\alpha)).$$

The negative Lagrangian $-L_x(\alpha)$ is also convex because the negation only changes the sign of the constants $C_{0,x}$ and $C_{i,x}$. Thus, as shown in the previous exercise, $-g(\alpha) = \max_x(-L_x(\alpha))$ is convex and $g(\alpha)$ is concave. Note that $g(\alpha)$ is concave even when the objective function and/or the constraints are not convex.

3 A simple constrained optimization problem

Consider the simple optimization problem

$$\text{minimize } f_0(x) = x^2 + 1 \quad \text{subject to } f_1(x) = (x-2)(x-4) \le 0.$$

Problem 5: Plot the objective $f_0(x)$ and the constraint $f_1(x)$ versus $x$ in one plot. Show the feasible points. Use this plot to directly give the solution of the optimization problem.

[Figure: plot of $f_0(x)$ and $f_1(x)$ versus $x$.]

The objective $f_0(x)$ is in blue, the constraint $f_1(x)$ in red. The feasible region $[2, 4]$ is marked in pink. As $f_0(x)$ is increasing in $x$ on the feasible region, the minimizer $x^*$ is the smallest feasible $x$. Thus $x^* = 2$ solves the optimization problem with the minimal value $f_0(2) = 5$.

Problem 6: Derive the Lagrangian $L(x, \alpha)$ and use a computer program to plot it for $\alpha \in \{0, 0.5, 1, 1.5, 2, 3, 4, 5, 8\}$. For which regions is the value of the Lagrangian larger than the objective function? For which regions is the value of the Lagrangian smaller than the objective function? Which points are unaffected? What is the upper bound of $\min_x L(x, \alpha)$ for all $\alpha \ge 0$?

The Lagrangian is given by

$$L(x, \alpha) = f_0(x) + \alpha f_1(x) = x^2 + 1 + \alpha(x-2)(x-4).$$

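[Figure: series of plots of the Lagrangian for the listed values of $\alpha$.]

The plot series shows increasing $\alpha$ from left to right. For $\alpha > 0$ the Lagrangian is larger than the objective in all regions where the constraint is violated, because there $f_1(x) > 0$ and $\alpha$ is positive, and smaller in all regions where the constraint is strictly satisfied, $f_1(x) < 0$. Points for which the constraint is satisfied with equality, $f_1(x) = 0$, are unaffected. Thus the Lagrangian penalizes the objective function in regions where the constraint is violated. The strength of the penalty is controlled by the Lagrange multiplier $\alpha$. Because the Lagrangian never exceeds the objective at feasible points, $\min_x L(x, \alpha) \le f_0(x^*) = 5$ for all $\alpha \ge 0$.

Problem 7: Derive and plot the Lagrange dual function $g(\alpha)$. State the dual problem.

We calculate the minimum of $L(x, \alpha)$ w.r.t. $x$,

$$\frac{\partial L}{\partial x} = 2x + \alpha[(x-4) + (x-2)] = (2 + 2\alpha)x - 6\alpha = 0 \quad\Rightarrow\quad x_{\min}(\alpha) = \frac{3\alpha}{\alpha + 1}.$$

The Lagrange dual function is given by

$$g(\alpha) = L(x_{\min}(\alpha), \alpha) = 10 - \alpha - \frac{9}{1 + \alpha}.$$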
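The closed form of the dual function can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names, the search interval $[-5, 10]$ and the grid resolution are illustrative choices, not part of the worksheet. It minimizes the Lagrangian over a dense grid of $x$ values and compares the result with $10 - \alpha - 9/(1+\alpha)$.

```python
import numpy as np

def f0(x):
    """Objective f_0(x) = x^2 + 1."""
    return x**2 + 1.0

def f1(x):
    """Constraint function f_1(x) = (x - 2)(x - 4); feasible where f1(x) <= 0."""
    return (x - 2.0) * (x - 4.0)

def lagrangian(x, alpha):
    """L(x, alpha) = f_0(x) + alpha * f_1(x)."""
    return f0(x) + alpha * f1(x)

def dual_closed_form(alpha):
    """g(alpha) as derived analytically above."""
    return 10.0 - alpha - 9.0 / (1.0 + alpha)

# Assumed search grid for x (step 1e-4); the minimizer 3*alpha/(1+alpha) lies in [0, 3).
X_GRID = np.linspace(-5.0, 10.0, 150001)

def dual_numeric(alpha):
    """Approximate g(alpha) = min_x L(x, alpha) by brute force over X_GRID."""
    return lagrangian(X_GRID, alpha).min()

for alpha in [0, 0.5, 1, 1.5, 2, 3, 4, 5, 8]:
    print(alpha, dual_numeric(alpha), dual_closed_form(alpha))
```

The two printed columns should agree up to the grid resolution. The same `lagrangian` function can be used to confirm the observations of Problem 6, for instance that it lies below `f0` inside $(2, 4)$ and above it outside $[2, 4]$ for positive `alpha`.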

The dual problem is

$$\text{maximize } g(\alpha) = 10 - \alpha - \frac{9}{1 + \alpha} \quad \text{subject to } \alpha \ge 0.$$

Problem 8: Find the dual optimal value and the dual optimal solution $\alpha^*$.

The dual function $g(\alpha)$ is concave, thus its maximum is found by setting the derivative to zero:

$$\frac{dg}{d\alpha} = -1 + 9(1 + \alpha)^{-2} = 0 \;\Rightarrow\; (1 + \alpha)^2 = 9 \;\Rightarrow\; 1 + \alpha = \pm 3 \;\Rightarrow\; \alpha_1 = 2,\quad \alpha_2 = -4.$$

Only $\alpha_1$ satisfies the constraint $\alpha \ge 0$, thus the dual optimal solution is $\alpha^* = 2$ and the dual optimal value is $g(\alpha^*) = 5$.

Problem 9: Is the dual optimal value also the minimum of the original optimization problem?

Yes, it is. $f_0$ and $f_1$ are both convex and at the point $x = 3$ we have $f_1(3) = -1 < 0$. Thus Slater's constraint qualification applies and the duality gap is zero: the dual optimal value $g(\alpha^*) = 5$ equals the primal optimal value $f_0(2) = 5$.
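The results of Problems 8 and 9 can be reproduced with a brute-force search. This is a rough sketch, assuming NumPy; the grid bounds and variable names are arbitrary illustrative choices. It maximizes $g(\alpha)$ over a grid of $\alpha \ge 0$ and compares the dual optimum with the primal optimum found by searching the feasible interval $[2, 4]$.

```python
import numpy as np

def g(alpha):
    """Lagrange dual function g(alpha) = 10 - alpha - 9 / (1 + alpha)."""
    return 10.0 - alpha - 9.0 / (1.0 + alpha)

# Dual problem: maximize g over alpha >= 0 (grid truncated at 10 as an assumption).
alphas = np.linspace(0.0, 10.0, 100001)
dual_values = g(alphas)
best = np.argmax(dual_values)
alpha_star = alphas[best]
d_star = dual_values[best]

# Primal problem: minimize x^2 + 1 over the feasible region [2, 4].
xs = np.linspace(2.0, 4.0, 20001)
primal_values = xs**2 + 1.0
x_star = xs[np.argmin(primal_values)]
p_star = primal_values.min()

print("dual:  ", alpha_star, d_star)
print("primal:", x_star, p_star)
print("duality gap:", p_star - d_star)
```

Up to the grid resolution, the search should return $\alpha^* \approx 2$ and a duality gap close to zero, in line with the analytic result.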

Problem 10: Is the constraint $f_1$ active or inactive? Can you also see this from the plot of the primal problem? What does it mean when a constraint is active, i.e. what is the effect of an active constraint on the solution?

The constraint $f_1$ is active (fulfilled with equality, $f_1(x^*) = 0$) because the corresponding Lagrange multiplier $\alpha^* = 2$ is not zero; see the KKT condition of complementary slackness, $\alpha^* f_1(x^*) = 0$. We can also see this from the plot of the primal problem in Section 3, because the minimum of the constrained problem lies on the edge of the feasible region and the objective function $f_0(x)$ does not attain its unconstrained minimum there. If a constraint is active, it limits the solution: if the constraint were dropped, we could obtain a lower value of the objective function.
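Returning to Section 1, the kernelized neighbour search derived in Problem 1 can be sketched in a few lines. This is only an illustration, assuming NumPy; the Gaussian kernel, the function names `rbf_kernel` and `kernel_knn_predict`, and the toy data are hypothetical choices that do not appear in the worksheet. Any valid kernel could be passed in instead.

```python
from collections import Counter

import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||a - b||^2); an illustrative choice of K."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

def kernel_knn_predict(x, X_train, y_train, k=3, kernel=rbf_kernel):
    """Classify x by majority vote among its k nearest neighbours in feature space.

    Neighbours are ranked by K(x_s, x_s) - 2 K(x, x_s); the term K(x, x)
    is the same for every training sample and is therefore omitted.
    """
    scores = np.array([kernel(xs, xs) - 2.0 * kernel(x, xs) for xs in X_train])
    nearest = np.argsort(scores)[:k]          # indices of the k smallest scores
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters with labels "a" and "b".
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.0],
                    [3.0, 3.0], [3.1, 2.9], [2.9, 3.2]])
y_train = ["a", "a", "a", "b", "b", "b"]

print(kernel_knn_predict([0.05, 0.1], X_train, y_train, k=3))
print(kernel_knn_predict([3.0, 3.1], X_train, y_train, k=3))
```

With the linear kernel $K(x, y) = x^T y$ the ranking reduces to ordinary squared Euclidean distance, so the sketch then behaves like the standard k-nearest neighbours algorithm.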
