Efficient Iterative Semi-supervised Classification on Manifold

1. Efficient Iterative Semi-supervised Classification on Manifold
M. Farajtabar, H. R. Rabiee, A. Shaban, A. Soltani-Farani
Sharif University of Technology, Tehran, Iran
Presented by Pooria Joulani, University of Alberta
December 11, 2011
M. Farajtabar et al. Efficient Iterative Semi-supervised Classification on Manifold. December 11, 2011. 1 / 22

2. Outline
1. Introduction, Graph Transduction
2. The Algorithm, Analysis
3. The Algorithm, Analysis
4. Setup, Scenarios
Summary and Future Works

3. Semi-supervised Learning
Semi-supervised learning (SSL): utilize unlabeled data to enhance classification.
Manifold assumption: the labeling function varies smoothly with respect to the underlying manifold.
The manifold structure is modeled by the neighborhood graph of the data points.
Applications include image segmentation, handwritten digit recognition, and text classification.
SSL is advantageous when there is a large amount of unlabeled data, which leads to better utilization of the underlying geometry.
Large-scale setting: time and memory limitations call for an efficient implementation.

4. Graph Transduction Algorithms
Graph transduction: a simple form of manifold regularization algorithms.
Can be formulated as
$\arg\min_x \; \frac{1}{2} x^T A x - b^T x$, (1)
where $A \in \mathbb{R}^{n \times n}$ and $b, x \in \mathbb{R}^n$.
Equivalent to solving the system of linear equations $Ax = b$.
Fortunately, $A$ is a sparse symmetric positive definite matrix.
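The equivalence between the quadratic problem (1) and the linear system can be checked numerically. The sketch below is only an illustration: the 3 x 3 matrix, the right-hand side, and the helper names (`matvec`, `objective`, `gradient`) are assumptions chosen for the example and do not come from the slides.

```python
# Toy check that the minimizer of 0.5 * x^T A x - b^T x solves A x = b.
# A, b, and all helper names are illustrative assumptions.

def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def objective(A, b, x):
    """Value of the quadratic objective in equation (1)."""
    return 0.5 * dot(x, matvec(A, x)) - dot(b, x)

def gradient(A, b, x):
    """Gradient of the objective, A x - b (A is symmetric)."""
    return [ax_i - b_i for ax_i, b_i in zip(matvec(A, x), b)]

# A small sparse symmetric positive definite (tridiagonal) matrix.
A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
b = [1.0, 0.0, 1.0]
x_star = [1.0, 1.0, 1.0]          # satisfies A x = b

grad_at_solution = gradient(A, b, x_star)      # vanishes at the solution
value_at_solution = objective(A, b, x_star)
value_perturbed = objective(A, b, [1.1, 0.9, 1.0])  # any other point is worse
```

Because $A$ is positive definite, the objective is strictly convex, so the point where the gradient $Ax - b$ vanishes is the unique minimizer.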

5. Naive Solutions
Naive solutions require $O(n^3)$ operations.
Methods that take into account the sparse structure of $A$ can cost much less.
Taking the inverse of $A$ directly is an obviously bad choice for several reasons:
  It requires $O(n^3)$ operations regardless of the sparsity.
  $A$ may be nearly singular, in which case the inverse operation is numerically unstable.
  The inverse of $A$ is usually not sparse, in which case a large amount of memory is needed to store and process $A^{-1}$.
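The loss of sparsity under inversion is easy to see on a small example. The following sketch assumes a 6 x 6 tridiagonal matrix as a stand-in for a sparse $A$ and uses a plain Gauss-Jordan routine (the function names `inverse` and `count_nonzeros` are hypothetical); it is not the procedure used in the presented work.

```python
# Illustration only: a sparse tridiagonal matrix has a fully dense inverse.
# The matrix and helper names are assumptions made for this example.

def inverse(A):
    """Gauss-Jordan inverse without pivoting (adequate for SPD matrices)."""
    n = len(A)
    # Augment A with the identity and reduce [A | I] to [I | A^-1].
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = M[col][col]            # positive for SPD input
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [row[n:] for row in M]

def count_nonzeros(A, tol=1e-12):
    return sum(1 for row in A for v in row if abs(v) > tol)

n = 6
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]

nnz_A = count_nonzeros(A)              # diagonal plus two off-diagonals
nnz_inv = count_nonzeros(inverse(A))   # every entry of the inverse is nonzero
```

Here $A$ stores $3n - 2$ nonzeros while its inverse stores $n^2$, which is the memory argument made on the slide.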

6. Two Approaches
Reformulate the manifold regularization problem:
  Linear kernel
  Sparsified regularizer
Solve the original formulation via:
  Factorization methods: LQ, LU, Cholesky
  Optimization algorithms: gradient descent, conjugate gradient, quasi-Newton
  Iterative methods: LP, LGC, LNP

7. Problem Statement
Let $X_u = \{x_1, \dots, x_u\}$ and $X_l = \{x_{u+1}, \dots, x_{u+l}\}$ be the sets of unlabeled and labeled data points, respectively, where $n = u + l$.
Let $y$ be a vector of length $n$ with $y_i = 0$ for unlabeled $x_i$ and $y_i$ equal to $+1$ or $-1$ according to the class label otherwise.
Our goal is to predict the labels of $X = X_u \cup X_l$ as $f$.
Let $W$ be the weight matrix of the k-NN graph of $X$:
$W(i, j) = \exp\left(-\|x_i - x_j\|^2 / 2\sigma\right)$, (2)
where $\sigma$ is the bandwidth parameter.
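A minimal sketch of building the sparse weight matrix of equation (2) is given below. Several details are assumptions not stated on the slide: the graph is symmetrized (an edge is kept if either endpoint lists the other among its $k$ nearest neighbors), neighbors are found by brute force, and the function name `knn_weights` and the toy points are hypothetical.

```python
import math

def knn_weights(points, k, sigma):
    """Sparse k-NN weight matrix, one dict of {neighbor: weight} per node.

    Assumption: the graph is symmetrized, so W[i][j] == W[j][i].
    """
    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    n = len(points)
    W = [dict() for _ in range(n)]
    for i in range(n):
        # Brute-force search: the k closest points other than i itself.
        neighbors = sorted((j for j in range(n) if j != i),
                           key=lambda j: sqdist(points[i], points[j]))[:k]
        for j in neighbors:
            # Gaussian weight from equation (2), with 2 * sigma as on the slide.
            w = math.exp(-sqdist(points[i], points[j]) / (2.0 * sigma))
            W[i][j] = w
            W[j][i] = w
    return W

# Two well-separated groups on a line (toy data).
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,)]
W = knn_weights(points, k=1, sigma=1.0)
```

With constant $k$, each row of $W$ holds only a handful of entries, which is what makes the later matrix-vector products cost $O(n)$.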

8. Problem Statement (cont.)
The family of graph transduction algorithms can be formulated as the following optimization problem:
$\arg\min_f \; f^T Q f + (f - y)^T C (f - y)$, (3)
where $Q$ is a regularization matrix and $C$ is a diagonal matrix with $C_{ii}$ equal to the importance of the $i$-th node sticking to $y_i$.
The first term represents the smoothness of the predicted labels with respect to the underlying manifold.
The second term is the squared error of the predicted labels compared with the initial ones, weighted by $C$.
Choosing different $Q$ and $C$ leads to various manifold classification methods:
  Tikhonov Regularization
  Label Propagation and Harmonic Solution
  Local and Global Consistency

9. Problem Statement (cont.)
Define the diagonal matrix $D$ with $D(i, i) = \sum_{j=1}^{n} W(i, j)$ and symmetrically normalize $W$ by $S = D^{-1/2} W D^{-1/2}$.
The Laplacian matrix is $L = I - S$.
In Local and Global Consistency (LGC), $Q = L$ and $C = \mu I$, i.e. we want to minimize
$R(f) = f^T L f + (f - y)^T C (f - y)$. (4)
It may easily be shown that the solution is
$f^* = (L + C)^{-1} C y = (1 - \alpha)(I - \alpha S)^{-1} y$, (5)
where $\alpha = \frac{1}{\mu + 1}$.
An iterative algorithm to compute this solution:
$f^{(t+1)} = \alpha S f^{(t)} + (1 - \alpha) y$. (6)
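The iteration (6) can be sketched in a few lines. The example below assumes a 4-node path graph with unit weights, one labeled point at each end, and $\mu = 0.5$; the function names `normalize` and `lgc` are hypothetical, and dense lists are used only to keep the sketch short.

```python
import math

def normalize(W):
    """S = D^{-1/2} W D^{-1/2} for a dense symmetric weight matrix."""
    n = len(W)
    d = [sum(row) for row in W]
    return [[W[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)]
            for i in range(n)]

def lgc(W, y, mu, iterations=200):
    """Fixed-point iteration f <- alpha * S f + (1 - alpha) * y, equation (6)."""
    S = normalize(W)
    alpha = 1.0 / (mu + 1.0)
    f = list(y)
    for _ in range(iterations):
        Sf = [sum(s * v for s, v in zip(row, f)) for row in S]
        f = [alpha * sf_i + (1.0 - alpha) * y_i for sf_i, y_i in zip(Sf, y)]
    return f

# Path graph 0 - 1 - 2 - 3; node 0 is labeled +1, node 3 is labeled -1.
W = [[0.0, 1.0, 0.0, 0.0],
     [1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0]]
y = [1.0, 0.0, 0.0, -1.0]
f = lgc(W, y, mu=0.5)
labels = [1 if v > 0 else -1 for v in f]   # predicted classes
```

On this toy graph the unlabeled nodes inherit the sign of the nearer labeled node, and the limit satisfies $(L + C) f = C y$.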

10. Gradient Descent
The gradient of the objective function is $\nabla R = 2(Lf + C(f - y))$.
Gradient descent update rule:
$f^{(t+1)} = f^{(t)} - 2\alpha\left(L f^{(t)} + C(f^{(t)} - y)\right)$. (7)
The stopping criterion is $\|\nabla R\| \le \eta$.
Choosing $\alpha$ appropriately is essential for convergence.
Applying exact line search, the number of iterations $t$ satisfies
$t \le \frac{\log\left(\frac{R^{(0)} - R^*}{R^{(t)} - R^*}\right)}{\log(1/z)}$, (8)
where $R^{(t)}$ is the objective value at iteration $t$, $R^*$ is the optimal value, and $z$ is a constant equal to $1 - \frac{\lambda_{\min}(L + C)}{\lambda_{\max}(L + C)}$.
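For a quadratic objective, exact line search has a closed form: along the direction $-\nabla R$, the optimal step is $\alpha = \frac{\nabla R^T \nabla R}{\nabla R^T (\nabla^2 R) \nabla R}$ with $\nabla^2 R = 2(L + C)$. The sketch below applies this to the LGC objective with $C = \mu I$ on an assumed 4-node path graph; the starting point, tolerance, and function names are illustrative assumptions.

```python
import math

def matvec(S, v):
    return [sum(s * x for s, x in zip(row, v)) for row in S]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gradient_descent(S, y, mu, eta=1e-10, max_iter=1000):
    """Gradient descent with exact line search on
    R(f) = f^T (I - S) f + mu * ||f - y||^2."""
    f = [0.0] * len(y)
    for t in range(max_iter):
        Sf = matvec(S, f)
        # grad R = 2 (L f + C (f - y)) = 2 ((1 + mu) f - S f - mu y)
        g = [2.0 * ((1.0 + mu) * f_i - sf_i - mu * y_i)
             for f_i, sf_i, y_i in zip(f, Sf, y)]
        if math.sqrt(dot(g, g)) <= eta:      # stopping criterion
            return f, t
        Sg = matvec(S, g)
        # Hessian-vector product: 2 (L + C) g
        Hg = [2.0 * ((1.0 + mu) * g_i - sg_i) for g_i, sg_i in zip(g, Sg)]
        alpha = dot(g, g) / dot(g, Hg)       # exact minimizer along -g
        f = [f_i - alpha * g_i for f_i, g_i in zip(f, g)]
    return f, max_iter

# Normalized weights of the path graph 0 - 1 - 2 - 3 (degrees 1, 2, 2, 1).
r = 1.0 / math.sqrt(2.0)
S = [[0.0, r, 0.0, 0.0],
     [r, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, r],
     [0.0, 0.0, r, 0.0]]
y = [1.0, 0.0, 0.0, -1.0]
f_gd, iterations_gd = gradient_descent(S, y, mu=0.5)
```

Each iteration needs two products with $S$ (one for the gradient, one for the step size), so the per-iteration cost stays proportional to the number of nonzeros of $S$.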

11. Gradient Descent (cont.)
Theorem 1. The maximum number of iterations for gradient descent with exact line search and fixed $(\eta, \mu)$ is $O(\log n)$.
To be exact:
$t \le \frac{(2 + \mu)\log\left(\frac{2n}{\eta}\right)}{2\log\left(1 + \frac{\mu}{2}\right)}$. (9)
Each iteration costs a sparse matrix-vector multiplication plus vector sums.
This is $O(n)$ per iteration, given that the neighborhood size $k$ is constant and small.
The result is an $O(n \log n)$ rate of growth with respect to the number of data, $n$.
The bound is valid for other graph transduction algorithms.

12. Newton's Algorithm
Approximating the inverse Hessian.
Newton's update rule for our problem is
$f^{(t+1)} = f^{(t)} - \alpha (\nabla^2 R)^{-1} \nabla R$. (10)
The inverse Hessian can be expanded as
$(\nabla^2 R)^{-1} = \frac{1}{2}(L + C)^{-1} = \frac{1}{2}(I - S + C)^{-1}$
$= \frac{1}{2}\left(I - (I + C)^{-1} S\right)^{-1} (I + C)^{-1}$ (11)
$= \frac{1}{2}\left(\sum_{i=0}^{\infty} \left((I + C)^{-1} S\right)^i\right)(I + C)^{-1}$.
Using the first $m$ terms of the above summation leads to an approximation of the inverse Hessian:
$(\nabla^2 R)^{-1} \approx \frac{1}{2}\left(\sum_{i=0}^{m-1} \left((I + C)^{-1} S\right)^i\right)(I + C)^{-1}$. (12)

13. Approximate Newton's Algorithm
Rewriting Newton's method with the approximated inverse Hessian and step size $\alpha = 1$, and doing some math:
$f^{(t+1)} = H^m f^{(t)} + g_m$, (13)
where
$H = (I + C)^{-1} S$, (14)
$g_m = \left(\sum_{i=0}^{m-1} H^i\right)(I + C)^{-1} C y$. (15)
This update rule is performed iteratively from an initial $f^{(0)}$ until the stopping criterion $\|\nabla R\| \le \eta$ is reached.
LGC's default iterative procedure is a special case of the proposed method with $m = 1$.
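The update (13) can be sketched for the LGC case $C = \mu I$, where $(I + C)^{-1}$ reduces to the scalar $\frac{1}{1 + \mu}$. One choice in this sketch is an assumption rather than something taken from the slides: $H^m$ is never formed explicitly; instead $H$ is applied $m$ times per iteration. The path graph, tolerance, and function names are also illustrative.

```python
import math

def matvec(S, v):
    return [sum(s * x for s, x in zip(row, v)) for row in S]

def approx_newton(S, y, mu, m, eta=1e-10, max_iter=1000):
    """Iterate f <- H^m f + g_m with H = (I + C)^{-1} S and C = mu * I."""
    inv = 1.0 / (1.0 + mu)                   # (I + C)^{-1} as a scalar

    def apply_H(v):
        return [inv * s for s in matvec(S, v)]

    # g_m = (sum_{i < m} H^i) (I + C)^{-1} C y, accumulated term by term.
    term = [inv * mu * y_i for y_i in y]
    g_m = [0.0] * len(y)
    for _ in range(m):
        g_m = [a + b for a, b in zip(g_m, term)]
        term = apply_H(term)

    f = [0.0] * len(y)
    for t in range(max_iter):
        Sf = matvec(S, f)
        grad = [2.0 * ((1.0 + mu) * f_i - sf_i - mu * y_i)
                for f_i, sf_i, y_i in zip(f, Sf, y)]
        if math.sqrt(sum(g * g for g in grad)) <= eta:   # stopping criterion
            return f, t
        for _ in range(m):                   # apply H m times, i.e. H^m f
            f = apply_H(f)
        f = [a + b for a, b in zip(f, g_m)]
    return f, max_iter

# Normalized weights of the path graph 0 - 1 - 2 - 3 (degrees 1, 2, 2, 1).
r = 1.0 / math.sqrt(2.0)
S = [[0.0, r, 0.0, 0.0],
     [r, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, r],
     [0.0, 0.0, r, 0.0]]
y = [1.0, 0.0, 0.0, -1.0]
f_m1, iters_m1 = approx_newton(S, y, mu=0.5, m=1)   # LGC's default iteration
f_m2, iters_m2 = approx_newton(S, y, mu=0.5, m=2)
```

With $m = 1$ the update is exactly iteration (6); with $m = 2$ each step covers two LGC steps, so fewer outer iterations are needed to reach the same tolerance.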

14. Analysis
Theorem 2. The approximate Newton's method converges to the solution of LGC.
Theorem 3. For the approximate Newton's method, the stopping criterion $\|\nabla R\| \le \eta$ is reached in $O(\log n)$ iterations.
To be exact:
$t \le \frac{\log\left(\frac{(2 + \mu) n}{\eta}\right)}{m \log(1 + \mu)}$. (16)
$m$ is empirically set to 1, 2, or 3. A larger $m$ disturbs sparsity.
Given that the neighborhood size $k$ is constant and small, the cost of each iteration equals that of a sparse matrix-vector multiplication, i.e., $O(n)$.
Given that $\eta$ and $\mu$ are constant, the time complexity is $O(n \log n)$.
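To get a feel for the bound (16), it can be evaluated for growing $n$. In the sketch below $\mu = 0.5$ follows the experimental setup, while $\eta = 10^{-3}$ is a purely hypothetical value chosen for illustration, and the function name is an assumption.

```python
import math

def newton_iteration_bound(n, mu, eta, m):
    """Right-hand side of the bound (16) on the number of iterations."""
    return math.log((2.0 + mu) * n / eta) / (m * math.log(1.0 + mu))

mu = 0.5        # value used in the experiments
eta = 1e-3      # hypothetical tolerance, for illustration only

# The bound grows by a constant amount each time n is multiplied by 10.
bounds_m2 = {n: newton_iteration_bound(n, mu, eta, m=2)
             for n in (10**3, 10**4, 10**5, 10**6)}
bound_m1 = newton_iteration_bound(10**6, mu, eta, m=1)
```

Multiplying $n$ by 10 adds $\frac{\log 10}{m \log(1 + \mu)}$ to the bound, and doubling $m$ halves it, which matches the logarithmic dependence stated in Theorem 3.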

15. Illustration
[Figure: optimization for two data points from MNIST. Panels: gradient descent; LGC (approximate Newton, m = 1); approximate method (m = 2).]
Consider the directions which the methods find.

16. Setup
Data from two classes of MNIST: handwritten digit recognition.
Data from two classes of Covertype: forest cover prediction.
7000 data from the Classic dataset: text categorization.
Comparison with CHOLMOD and LGC's default implementation.
5-NN for neighborhood construction.
Bandwidth size set to the mean of the standard deviation of the data.
2% of the data points are labeled.
$\mu$ is set to 0.5.
$\eta$ is chosen so that convergence to the optimal solutions is empirically ensured.
Number of iterations, accuracy, and distance to optimum are reported as averages over 10 runs with different random labelings.

17. Number of Iterations
[Figure: number of iterations versus number of data for LGC, approximate Newton (m = 2), and gradient descent. Panels: (a) MNIST, (b) Covertype, (c) Classic.]

18. Accuracy
[Figure: accuracy versus number of data for LGC, approximate Newton (m = 2), gradient descent, and CHOLMOD. Panels: (d) MNIST, (e) Covertype, (f) Classic.]

19. Distance from Optimum
[Figure: $\|f^{(t)} - f^*\|$ versus number of iterations for LGC, approximate Newton (m = 2), and gradient descent. Panels: (g) MNIST, (h) Covertype, (i) Classic.]

20. Time
[Figure: duration (sec) versus number of data on MNIST. Panel (j): approximate Newton (m = 2) and CHOLMOD. Panel (k): LGC, approximate Newton, and gradient descent.]

21. Summary
A novel approximation to Newton's method is proposed for solving graph transduction problems.
A theoretical analysis of the number of iterations is given for the proposed method and the gradient descent method.
The number of iterations has a logarithmic dependence on the number of data.
A reasonable approach when a large amount of data is being classified.
Future works:
  Analysis of robustness against noise
  Incorporating a low-cost line search with the proposed method

22. Thanks for your attention.
