Generalized trace ratio optimization and applications
Mohammed Bellalij, Saïd Hanafi, Rita Macedo and Raca Todosijevic
University of Valenciennes, France
PGMO Days, 2-4 October 2013, ENSTA ParisTech
Outline
1 Trace Ratio Optimization Problem (TROP)
2 Applications
3 Fisher's Linear Discriminant Analysis
4 Generalized TROP (GTROP - STROP)
5 Mathematical analysis
6 Summary
Trace Ratio Optimization Problem (TROP)
Trace Ratio Optimization Problem

(TROP)  maximize  Tr(X^T A X) / Tr(X^T B X)
        s.t.  X^T C X = I,  X ∈ Ω

where A, B, C are matrices with appropriate dimensions, I is the identity matrix, and Tr(.) is the trace of a square matrix.
Continuous case: Ω = R^{m×d}, [0,1]^{m×d} or [-1,1]^{m×d}
Binary case: Ω = {0,1}^{m×d} or {-1,1}^{m×d}
TROP is a hard problem. There is no loss of generality in assuming that C = I.
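A minimal numerical sketch (our own, not from the slides) of why one may assume C = I: for symmetric positive definite C, the substitution Xt = C^{1/2} X turns the constraint X^T C X = I into Xt^T Xt = I and leaves the objective a trace ratio with transformed matrices At = C^{-1/2} A C^{-1/2}, Bt = C^{-1/2} B C^{-1/2}.

```python
import numpy as np

# Sketch: reduction of the constraint X^T C X = I to the case C = I.
# All matrices here are random synthetic data for illustration only.
rng = np.random.default_rng(0)
m, d = 6, 2
M = rng.normal(size=(m, m)); A = (M + M.T) / 2          # symmetric A
Bm = rng.normal(size=(m, m)); B = Bm @ Bm.T + np.eye(m)  # SPD B
Cm = rng.normal(size=(m, m)); C = Cm @ Cm.T + np.eye(m)  # SPD C

# Symmetric square root of C via its eigendecomposition
w, V = np.linalg.eigh(C)
C_half_inv = V @ np.diag(1 / np.sqrt(w)) @ V.T
At = C_half_inv @ A @ C_half_inv
Bt = C_half_inv @ B @ C_half_inv

# Any Xt with Xt^T Xt = I corresponds to X = C^{-1/2} Xt with X^T C X = I
Xt = np.linalg.qr(rng.normal(size=(m, d)))[0]
X = C_half_inv @ Xt
r1 = np.trace(X.T @ A @ X) / np.trace(X.T @ B @ X)
r2 = np.trace(Xt.T @ At @ Xt) / np.trace(Xt.T @ Bt @ Xt)
assert np.isclose(r1, r2)                     # same objective value
assert np.allclose(X.T @ C @ X, np.eye(d))    # original constraint holds
```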
Discrete Trace Ratio Optimization Problem
Discrete TROP is more difficult than the continuous case.
Continuous TROP is a relaxation of discrete TROP, so it provides an upper bound on the optimal value of the discrete problem.
Linear algebra techniques are powerful for solving continuous TROP: much lower computational cost and guaranteed approximation quality.
Applications
Many real-world problems can be modeled as trace ratio, or more generally as sum-of-trace-ratios, optimization problems. They arise in:
Fisher discriminant analysis in pattern recognition (data mining, machine learning)
The downlink of a multi-user MIMO system (wireless communication and signal processing)
The cell formation problem (cellular manufacturing)
Data mining is the process of analyzing data in order to extract useful knowledge, such as:
Clustering: unsupervised learning
Classification: supervised learning
Feature selection: suppression of irrelevant or redundant features
Dimensionality reduction: a major tool of data mining
Most machine learning and data mining techniques may not be effective for high-dimensional data. Dimensionality reduction plays a fundamental role in data mining:
Map the data from a high-dimensional space to a low-dimensional space
Reduce noise and redundancy in the data before performing a task
Dimensionality reduction usually entails embedding the data in a space of reduced dimension that preserves most of its interesting details.
Fisher's Linear Discriminant Analysis
Some techniques
Several techniques for dimensionality reduction end with solving a TROP:
Fisher's Linear Discriminant Analysis
Support Vector Machines or kernel methods
Graph embedding, and so on
Fisher's Linear Discriminant Analysis (LDA)
LDA attempts to find linear projections of the data that are optimal for discrimination:
Inter-class separability: a measure of how well separated two distinct classes are
Intra-class compactness: a measure of how tightly clustered items of the same class are
Goal: search for a transformation matrix that maximizes inter-class separability while maximizing intra-class compactness (i.e., minimizing the within-class scatter).
The natural model for these dual objectives is a trace ratio problem.
LDA: objective deduction
Let µ be the mean of the data X, and µ^{(k)} the mean of the k-th class (of size n_k, k = 1,...,c). Define

A = Σ_{k=1}^{c} n_k (µ^{(k)} - µ)(µ^{(k)} - µ)^T,   B = Σ_{k=1}^{c} Σ_{x_i ∈ class k} (x_i - µ^{(k)})(x_i - µ^{(k)})^T.

Criterion: maximize the ratio of two traces Tr(X^T A X) / Tr(X^T B X)
Constraint: X^T X = I (orthogonal projection)
Alternative: solve the easier problem instead: max_{X^T B X = I} Tr(X^T A X)
Solution: the d largest generalized eigenvectors, i.e. A w_i = λ_i B w_i.
However, its solution may deviate from the original objective.
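The alternative "ratio trace" problem above can be sketched numerically. This is an illustrative example on synthetic two-class data (all names and sizes are our own); it reduces the generalized eigenproblem A w = λ B w to a symmetric one via a Cholesky factorization of B.

```python
import numpy as np

# Sketch of the ratio-trace LDA alternative on synthetic data:
# build the between-class scatter A and within-class scatter B, then
# solve A w = lambda B w and keep the leading generalized eigenvector.
rng = np.random.default_rng(0)
X1 = rng.normal(loc=0.0, size=(60, 4))       # class 1
X2 = rng.normal(loc=3.0, size=(60, 4))       # class 2
X = np.vstack([X1, X2]); mu = X.mean(axis=0)

A = np.zeros((4, 4)); B = np.zeros((4, 4))
for Xk in (X1, X2):
    muk = Xk.mean(axis=0)
    A += len(Xk) * np.outer(muk - mu, muk - mu)   # between-class scatter
    B += (Xk - muk).T @ (Xk - muk)                # within-class scatter

# A w = lambda B w  <=>  (L^{-1} A L^{-T}) v = lambda v with B = L L^T, w = L^{-T} v
L = np.linalg.cholesky(B)
Li = np.linalg.inv(L)
vals, vecs = np.linalg.eigh(Li @ A @ Li.T)    # ascending eigenvalues
W = Li.T @ vecs[:, -1:]                       # top generalized eigenvector (d = 1)
ratio = np.trace(W.T @ A @ W) / np.trace(W.T @ B @ W)
assert ratio > 1   # well-separated classes: between-class scatter dominates
```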
Generalized TROP (GTROP - STROP)
Discriminative hypergraph partitioning
Cell formation problem
Discriminative hypergraph partitioning
1 Hypergraph partitioning seeks an optimal partitioning of the vertex set V of a given graph G = (V, A) into κ disjoint subsets.
2 The discriminative hypergraph partitioning criterion considers both the intra-cluster compactness and the inter-cluster separability, using a similarity matrix S, and thus aims to solve

max g(X) = (1/κ) Σ_{j=1}^{κ} (X_j^T S X_j) / (X_j^T Q X_j)
s.t. X ∈ {0,1}^{N×κ}, X 1_κ = 1_N

where Q = D - S, D is the diagonal matrix with D_ii = Σ_j S_ij, and X_j is the j-th column of X.
With normalized indicator columns, the criterion reads

max g(X) = (1/κ) Σ_{j=1}^{κ} ([X_j (X_j^T X_j)^{-1/2}]^T S [X_j (X_j^T X_j)^{-1/2}]) / ([X_j (X_j^T X_j)^{-1/2}]^T Q [X_j (X_j^T X_j)^{-1/2}])
s.t. X ∈ {0,1}^{N×κ}, X 1_κ = 1_N

Li et al. (2013) show that this problem is equivalent to

max g(P) = (1/κ) Σ_{j=1}^{κ} (P_j^T S P_j) / (P_j^T Q P_j)   s.t. P^T P = I_κ    (1)

where P = (P_1 P_2 ... P_κ) = X (X^T X)^{-1/2}.
Problem (1) consists in maximizing a sum of trace ratios (specifically, a sum of generalized Rayleigh quotients of the pencil (S, Q)).
The problem is difficult to solve: it admits multiple local non-global maxima. So (1) is approximated as

max F(P) = (Σ_{j=1}^{κ} P_j^T S P_j) / (Σ_{j=1}^{κ} P_j^T Q P_j) = Tr(P^T S P) / Tr(P^T Q P)   s.t. P^T P = I_κ

which is a continuous trace ratio problem.
Cell formation problem
A new generalized (singular) trace ratio problem:

(STROP)  maximize  Tr(X^T A Y) / (1 + Tr(X^T B Y))
         s.t.  X ∈ Ω_X, Y ∈ Ω_Y    (2)

We show that the cell formation problem can be formulated as (2).
Cell formation problem
Application: group technology or cellular manufacturing.
System: machines and parts interacting.
Partition the system into subsystems to maximize efficiency:
Interactions between the machines and the parts within a subsystem are maximized
Interactions between parts of different subsystems are reduced as much as possible
Problem formulation: singular trace ratio
I: set of M machines, J: set of P parts, K: set of C cells.

a_ij = 1 if machine i processes part j, 0 otherwise.

Measure of autonomy:

Eff = (a - a_1^Out) / (a + a_0^In)

a = Σ_{i=1}^{M} Σ_{j=1}^{P} a_ij : total number of entries equal to 1 in matrix A
a_1^Out : number of exceptional elements (ones outside the diagonal blocks)
a_0^In : number of zero entries in the diagonal blocks

Objective: maximize Eff.
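The autonomy measure can be computed directly from the incidence matrix once a cell assignment is fixed. Below is a small sketch on a toy 4-machine, 4-part instance of our own making (the matrix and assignment are illustrative, not from the slides).

```python
import numpy as np

# Sketch: compute Eff = (a - a1_out) / (a + a0_in) for a toy instance.
A = np.array([[1, 1, 0, 0],      # machine-part incidence a_ij
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 1, 1, 1]])
machine_cell = np.array([0, 0, 1, 1])   # cell of each machine
part_cell    = np.array([0, 0, 1, 1])   # cell of each part

a = A.sum()                                            # total number of ones
inside = machine_cell[:, None] == part_cell[None, :]   # diagonal-block mask
a1_out = A[~inside].sum()        # exceptional elements (ones outside blocks)
a0_in  = (1 - A)[inside].sum()   # zeros inside the diagonal blocks
eff = (a - a1_out) / (a + a0_in)
# Here a = 9, a1_out = 1 (the one at machine 4, part 2), a0_in = 0, so eff = 8/9
```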
Consider
x_ik = 1 if machine i belongs to cell k, 0 otherwise (i = 1,...,M, k = 1,...,C)
y_jk = 1 if part j belongs to cell k, 0 otherwise (j = 1,...,P, k = 1,...,C)
Some constraints:
Each machine and each part is assigned to exactly one cell:
Σ_{k=1}^{C} x_ik = 1 (i = 1,...,M) and Σ_{k=1}^{C} y_jk = 1 (j = 1,...,P)
At least one machine and one part in each cell:
Σ_{i=1}^{M} x_ik ≥ 1 and Σ_{j=1}^{P} y_jk ≥ 1 (k = 1,...,C)
We verify that
a_1^Out = a - Σ_{k=1}^{C} Σ_{i=1}^{M} Σ_{j=1}^{P} a_ij x_ik y_jk
a_0^In = Σ_{k=1}^{C} Σ_{i=1}^{M} Σ_{j=1}^{P} (1 - a_ij) x_ik y_jk
Objective function

Eff = (Σ_{k=1}^{C} Σ_{i=1}^{M} Σ_{j=1}^{P} a_ij x_ik y_jk) / (a + Σ_{k=1}^{C} Σ_{i=1}^{M} Σ_{j=1}^{P} (1 - a_ij) x_ik y_jk)

The cell formation problem may be formulated as a generalized (singular) trace ratio problem

(STROP)  maximize  Tr(X^T A' Y) / (1 + Tr(X^T B' Y))
         s.t.  X ∈ Ω_X, Y ∈ Ω_Y

with A' = (a_ij / a) and B' = ((1 - a_ij) / a).
Remarks: X^T X and Y^T Y are regular diagonal matrices.
Another generalized TROP
Maximization of the sum of the trace ratio on the Stiefel manifold (L.-H. Zhang, R.-C. Li, 2013):

(GTROP)  maximize  Tr(X^T A X) / Tr(X^T B X) + Tr(X^T C X)
         s.t.  X^T X = I (I: identity matrix of size d), X ∈ R^{m×d} (d < m).
Mathematical analysis
Lanczos & Newton method for TROP
Singular Value Decomposition & Newton method for STROP
Closed form solution
Given a symmetric matrix A of dimension m×m and an arbitrary orthonormal matrix X of dimension m×d.
Standard eigenvalue problem:
max{Tr(X^T A X) : X^T X = I, X ∈ R^{m×d}} = λ_1 + ... + λ_d
The eigenvalues λ_1, ..., λ_d are labeled decreasingly; the maximizer X* is an orthogonal basis of the eigenspace of A associated with the d algebraically largest eigenvalues.
Generalized eigenvalue problem:
Assuming that B is positive definite, we have
max_{X ∈ R^{m×d}, X^T B X = I} Tr(X^T A X) = Tr(X*^T A X*) = λ_1 + ... + λ_d
(eigenvalues labeled decreasingly for the generalized problem A w = λ B w; X* contains the eigenvectors associated with the first d eigenvalues, with X*^T B X* = I).
Existence and uniqueness of a solution of TROP
The Trace Ratio Optimization Problem (TROP) does not have a closed-form solution.
Trace ratio vs. ratio trace: TROP is often simplified into a more accommodating problem:
max_{X^T X = I} Tr((X^T B X)^{-1} (X^T A X)).
The obtained solution does not necessarily maximize the corresponding trace ratio problem.
TROP admits a finite maximum value ρ*: it has no local non-global solution, and ρ* is reached for certain (non-unique) orthogonal matrices X*. (T. T. Ngo, M. Bellalij and Y. Saad, 2012)
From trace ratio to trace difference
Solving the trace ratio problem is equivalent to finding the solution of the scalar equation f(ρ) = 0, where

f(ρ) = max_{X^T X = I} Tr(X^T (A - ρB) X)

Proposition
1 f is a strictly decreasing function of ρ
2 f(ρ) = 0 iff ρ = ρ*
3 f'(ρ) = -Tr(X(ρ)^T B X(ρ))
So, finding the optimal solution amounts to a search for the unique root of f(ρ).
Newton-Lanczos algorithm for TROP
Select an initial m×d orthonormal matrix X; compute ρ = Tr(X^T A X) / Tr(X^T B X)
While not converged do
  Call the Lanczos algorithm to compute the d largest eigenvalues of A - ρB and the associated eigenvectors [w_1, w_2, ..., w_d] → X
  Set ρ := Tr(X^T A X) / Tr(X^T B X)
End while
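The iteration above can be sketched in a few lines. This is an illustrative version on small dense random matrices, with a dense symmetric eigensolver standing in for the Lanczos procedure the slides use on large sparse problems; the data and tolerances are our own.

```python
import numpy as np

# Sketch of the Newton iteration for TROP: maximize Tr(X^T A X)/Tr(X^T B X)
# over X^T X = I. np.linalg.eigh plays the role of the Lanczos algorithm.
rng = np.random.default_rng(1)
m, d = 30, 3
M = rng.normal(size=(m, m)); A = (M + M.T) / 2          # symmetric A
N = rng.normal(size=(m, m)); B = N @ N.T + np.eye(m)     # positive definite B

def top_eigvecs(C, d):
    """Orthonormal basis of the eigenspace of the d largest eigenvalues of C."""
    _, V = np.linalg.eigh(C)        # eigenvalues in ascending order
    return V[:, -d:]

X = np.linalg.qr(rng.normal(size=(m, d)))[0]             # initial orthonormal X
rho = np.trace(X.T @ A @ X) / np.trace(X.T @ B @ X)
for _ in range(100):
    X = top_eigvecs(A - rho * B, d)                      # eigenvector step
    rho_new = np.trace(X.T @ A @ X) / np.trace(X.T @ B @ X)
    if abs(rho_new - rho) < 1e-12:
        rho = rho_new
        break
    rho = rho_new

# At the fixed point, f(rho) = max Tr(X^T (A - rho B) X) vanishes
f_rho = np.trace(X.T @ (A - rho * B) @ X)
```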
Singular trace ratio optimization problem
Given real matrices A and B of dimension m×n. Let O_{p,k} = {Z ∈ R^{p×k} : Z^T Z = I_k} and O_{m,n,k} = O_{m,k} × O_{n,k}.
Goal: find a pair of orthogonal matrices X ∈ O_{m,k} and Y ∈ O_{n,k}, an optimal solution of the problem:

max_{(X,Y) ∈ O_{m,n,k}} Tr[X^T A Y] / (1 + Tr[X^T B Y]).

We will assume that the matrix B verifies 1 + Tr[X^T B Y] > 0 for any (X,Y) ∈ O_{m,n,k}.
Singular value decomposition (SVD) for trace optimization
Let C be an m×n real matrix with rank(C) = r ≤ n < m.
1 There are positive real numbers σ_1 ≥ σ_2 ≥ ... ≥ σ_r > 0 and square orthogonal matrices U ∈ O_{m,m} and V ∈ O_{n,n} such that

U^T C V = Σ = [ Σ_r        O_{r,n-r}   ]  ∈ R^{m×n},    Σ_r = diag(σ_1, σ_2, ..., σ_r)
              [ O_{m-r,r}  O_{m-r,n-r} ]

where σ_1, σ_2, ..., σ_r are the singular values of C.
2 The trace of X^T C Y reaches its maximum (resp. minimum) when X and Y are orthogonal bases of the left and right singular subspaces of C associated with the k algebraically largest (resp. smallest) singular values.
Closed form solution: singular value problem
Theorem. Let C be an m×n real matrix with m > n ≥ r. Suppose U^T C V = Σ is the SVD of C. Let the columns of U be partitioned into two blocks, U = [U_1, U_2] with U_1 ∈ R^{m×n} and U_2 ∈ R^{m×(m-n)}. Then

max_{(X,Y) ∈ O_{m,n,k}} Tr[X^T C Y] = Σ_{i=1}^{k} σ_i = Tr[U_{1,k}^T C V_k].
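The theorem can be verified numerically: the top-k left and right singular vectors attain Tr[X^T C Y] = σ_1 + ... + σ_k, and random feasible pairs never exceed this value. A short sketch on random data:

```python
import numpy as np

# Numerical check: over orthonormal X (m×k) and Y (n×k), the maximum of
# Tr(X^T C Y) equals the sum of the k largest singular values of C.
rng = np.random.default_rng(2)
m, n, k = 8, 5, 2
C = rng.normal(size=(m, n))

U, s, Vt = np.linalg.svd(C)         # singular values in decreasing order
X = U[:, :k]                         # top-k left singular vectors
Y = Vt[:k, :].T                      # top-k right singular vectors
assert np.isclose(np.trace(X.T @ C @ Y), s[:k].sum())

for _ in range(200):                 # random feasible pairs stay below the optimum
    Xr = np.linalg.qr(rng.normal(size=(m, k)))[0]
    Yr = np.linalg.qr(rng.normal(size=(n, k)))[0]
    assert np.trace(Xr.T @ C @ Yr) <= s[:k].sum() + 1e-9
```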
Existence and uniqueness of a solution of STROP
The problem STROP admits a finite maximum value ρ*. It is reached for certain (non-unique) orthogonal matrices X* and Y*: thanks to the cyclic property of the trace, any simultaneous orthogonal transformation of the columns of X and Y leaves the objective function unchanged (U = X* R, V = Y* R for any regular matrix R ∈ R^{k×k} such that R^{-1} = R^T).
We have Tr[X*^T (A - ρ* B) Y*] = ρ* because 1 + Tr[X^T B Y] > 0. Therefore, we have the following necessary condition for the triplet (ρ*, X*, Y*) to be optimal:

max_{(X,Y) ∈ O_{m,n,k}} Tr[X^T (A - ρ* B) Y] = Tr[X*^T (A - ρ* B) Y*] = ρ*.

Let g(ρ) = max_{(X,Y) ∈ O_{m,n,k}} Tr[X^T (A - ρ B) Y].
Then we just need to solve the scalar equation g(ρ) = ρ.
Properties of f(ρ) = g(ρ) - ρ
Evaluating g(ρ) consists in computing the left and right singular vectors X(ρ) and Y(ρ) associated with the k largest singular values of A - ρ B, so that g(ρ) = Tr[X(ρ)^T (A - ρ B) Y(ρ)].
Under the assumption that B verifies 1 + Tr[X^T B Y] > 0, we have:
1 f is differentiable at ρ with f'(ρ) = -Tr[X(ρ)^T B Y(ρ)] - 1, and f is a strictly decreasing function
2 f is convex
3 f(ρ) = 0 iff ρ = ρ*
Fractional iteration of the Newton approximation formula
Newton's method to approximate the unique fixed point of g:

ρ_new = ρ - (Tr[X(ρ)^T (A - ρ B) Y(ρ)] - ρ) / (-Tr[X(ρ)^T B Y(ρ)] - 1) = Tr[X(ρ)^T A Y(ρ)] / (1 + Tr[X(ρ)^T B Y(ρ)]).

The Newton-SVD algorithm includes the following three iterative steps:
1 Compute the trace ratio ρ = Tr[X^T A Y] / (1 + Tr[X^T B Y])
2 Run the SVD algorithm to compute the k largest singular values of A - ρB as well as the associated singular vectors X(ρ) and Y(ρ)
3 Repeat the above two steps until convergence.
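The three steps above can be sketched directly with a dense SVD. This is an illustrative version on small random matrices of our own choosing; B is scaled small so the positivity assumption 1 + Tr[X^T B Y] > 0 holds throughout.

```python
import numpy as np

# Sketch of the Newton-SVD iteration for STROP:
# maximize Tr(X^T A Y) / (1 + Tr(X^T B Y)) over orthonormal X (m×k), Y (n×k).
rng = np.random.default_rng(3)
m, n, k = 12, 8, 2
A = rng.normal(size=(m, n))
B = 0.05 * rng.normal(size=(m, n))    # small B keeps 1 + Tr(X^T B Y) > 0

def top_singular_pair(C, k):
    """Left/right singular vectors for the k largest singular values of C."""
    U, _, Vt = np.linalg.svd(C)
    return U[:, :k], Vt[:k, :].T

X, Y = top_singular_pair(A, k)        # initial feasible pair
rho = np.trace(X.T @ A @ Y) / (1 + np.trace(X.T @ B @ Y))
for _ in range(100):
    X, Y = top_singular_pair(A - rho * B, k)   # SVD step
    rho_new = np.trace(X.T @ A @ Y) / (1 + np.trace(X.T @ B @ Y))
    if abs(rho_new - rho) < 1e-12:
        rho = rho_new
        break
    rho = rho_new

# At the fixed point, g(rho) = Tr[X^T (A - rho B) Y] equals rho
g_rho = np.trace(X.T @ (A - rho * B) @ Y)
```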
Summary
SUMMARY
The trace ratio formulation is shared by many real-world application problems.
A new formulation of the cell formation problem as a trace ratio: the Singular Trace Ratio Optimization Problem (STROP).
Continuous TROP and STROP do not admit local non-global maxima.
A direct solution to TROP or STROP is presented: the maximum is the unique zero of a scalar function.
Both can be solved numerically by a judicious use of the Lanczos procedure, a good initialization, and inexact eigenvector calculations in the early stages of the Newton procedure.
REFERENCES
T. T. Ngo, M. Bellalij and Y. Saad. The trace ratio optimization problem, SIAM Review, 54(3), 2012.
X. Li, W. Hu, C. Shen, A. Dick and Z. Zhang. Context-Aware Hypergraph Construction for Robust Spectral Clustering, IEEE Transactions on Knowledge and Data Engineering, 2013.
L.-H. Zhang and R.-C. Li. Maximization of the Sum of the Trace Ratio on the Stiefel Manifold, Technical Report.

Thanks!
More informationData Analysis 3. Support Vector Machines. Jan Platoš October 30, 2017
Data Analysis 3 Support Vector Machines Jan Platoš October 30, 2017 Department of Computer Science Faculty of Electrical Engineering and Computer Science VŠB - Technical University of Ostrava Table of
More informationIntroduction to Modern Control Systems
Introduction to Modern Control Systems Convex Optimization, Duality and Linear Matrix Inequalities Kostas Margellos University of Oxford AIMS CDT 2016-17 Introduction to Modern Control Systems November
More informationMachine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme, Nicolas Schilling
Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim,
More information1. Lecture notes on bipartite matching
Massachusetts Institute of Technology 18.453: Combinatorial Optimization Michel X. Goemans February 5, 2017 1. Lecture notes on bipartite matching Matching problems are among the fundamental problems in
More informationMATRIX INTEGRALS AND MAP ENUMERATION 1
MATRIX INTEGRALS AND MAP ENUMERATION 1 IVAN CORWIN Abstract. We explore the connection between integration with respect to the GUE and enumeration of maps. This connection is faciliated by the Wick Formula.
More informationCIS 580, Machine Perception, Spring 2014: Assignment 4 Due: Wednesday, April 10th, 10:30am (use turnin)
CIS 580, Machine Perception, Spring 2014: Assignment 4 Due: Wednesday, April 10th, 10:30am (use turnin) Solutions (hand calculations, plots) have to be submitted electronically as a single pdf file using
More informationBehavioral Data Mining. Lecture 18 Clustering
Behavioral Data Mining Lecture 18 Clustering Outline Why? Cluster quality K-means Spectral clustering Generative Models Rationale Given a set {X i } for i = 1,,n, a clustering is a partition of the X i
More informationConvex sets and convex functions
Convex sets and convex functions Convex optimization problems Convex sets and their examples Separating and supporting hyperplanes Projections on convex sets Convex functions, conjugate functions ECE 602,
More informationGlobally and Locally Consistent Unsupervised Projection
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence Globally and Locally Consistent Unsupervised Projection Hua Wang, Feiping Nie, Heng Huang Department of Electrical Engineering
More informationMy favorite application using eigenvalues: partitioning and community detection in social networks
My favorite application using eigenvalues: partitioning and community detection in social networks Will Hobbs February 17, 2013 Abstract Social networks are often organized into families, friendship groups,
More informationModelling and Visualization of High Dimensional Data. Sample Examination Paper
Duration not specified UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE Modelling and Visualization of High Dimensional Data Sample Examination Paper Examination date not specified Time: Examination
More informationUnsupervised Learning
Unsupervised Learning Unsupervised learning Until now, we have assumed our training samples are labeled by their category membership. Methods that use labeled samples are said to be supervised. However,
More informationA Course in Machine Learning
A Course in Machine Learning Hal Daumé III 13 UNSUPERVISED LEARNING If you have access to labeled training data, you know what to do. This is the supervised setting, in which you have a teacher telling
More informationConvex Optimization. 2. Convex Sets. Prof. Ying Cui. Department of Electrical Engineering Shanghai Jiao Tong University. SJTU Ying Cui 1 / 33
Convex Optimization 2. Convex Sets Prof. Ying Cui Department of Electrical Engineering Shanghai Jiao Tong University 2018 SJTU Ying Cui 1 / 33 Outline Affine and convex sets Some important examples Operations
More informationA fast algorithm for sparse reconstruction based on shrinkage, subspace optimization and continuation [Wen,Yin,Goldfarb,Zhang 2009]
A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization and continuation [Wen,Yin,Goldfarb,Zhang 2009] Yongjia Song University of Wisconsin-Madison April 22, 2010 Yongjia Song
More informationRadial Basis Function Networks: Algorithms
Radial Basis Function Networks: Algorithms Neural Computation : Lecture 14 John A. Bullinaria, 2015 1. The RBF Mapping 2. The RBF Network Architecture 3. Computational Power of RBF Networks 4. Training
More informationClustering. Supervised vs. Unsupervised Learning
Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now
More informationConvex Optimization - Chapter 1-2. Xiangru Lian August 28, 2015
Convex Optimization - Chapter 1-2 Xiangru Lian August 28, 2015 1 Mathematical optimization minimize f 0 (x) s.t. f j (x) 0, j=1,,m, (1) x S x. (x 1,,x n ). optimization variable. f 0. R n R. objective
More informationThe clustering in general is the task of grouping a set of objects in such a way that objects
Spectral Clustering: A Graph Partitioning Point of View Yangzihao Wang Computer Science Department, University of California, Davis yzhwang@ucdavis.edu Abstract This course project provide the basic theory
More informationResearch Interests Optimization:
Mitchell: Research interests 1 Research Interests Optimization: looking for the best solution from among a number of candidates. Prototypical optimization problem: min f(x) subject to g(x) 0 x X IR n Here,
More informationIntroduction aux Systèmes Collaboratifs Multi-Agents
M1 EEAII - Découverte de la Recherche (ViRob) Introduction aux Systèmes Collaboratifs Multi-Agents UPJV, Département EEA Fabio MORBIDI Laboratoire MIS Équipe Perception et Robotique E-mail: fabio.morbidi@u-picardie.fr
More information60 2 Convex sets. {x a T x b} {x ã T x b}
60 2 Convex sets Exercises Definition of convexity 21 Let C R n be a convex set, with x 1,, x k C, and let θ 1,, θ k R satisfy θ i 0, θ 1 + + θ k = 1 Show that θ 1x 1 + + θ k x k C (The definition of convexity
More informationWeek 5. Convex Optimization
Week 5. Convex Optimization Lecturer: Prof. Santosh Vempala Scribe: Xin Wang, Zihao Li Feb. 9 and, 206 Week 5. Convex Optimization. The convex optimization formulation A general optimization problem is
More informationREVIEW OF FUZZY SETS
REVIEW OF FUZZY SETS CONNER HANSEN 1. Introduction L. A. Zadeh s paper Fuzzy Sets* [1] introduces the concept of a fuzzy set, provides definitions for various fuzzy set operations, and proves several properties
More informationSection Notes 5. Review of Linear Programming. Applied Math / Engineering Sciences 121. Week of October 15, 2017
Section Notes 5 Review of Linear Programming Applied Math / Engineering Sciences 121 Week of October 15, 2017 The following list of topics is an overview of the material that was covered in the lectures
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra)
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 1: Course Overview; Matrix Multiplication Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 21 Outline 1 Course
More informationAlgebraic Graph Theory- Adjacency Matrix and Spectrum
Algebraic Graph Theory- Adjacency Matrix and Spectrum Michael Levet December 24, 2013 Introduction This tutorial will introduce the adjacency matrix, as well as spectral graph theory. For those familiar
More informationA Dendrogram. Bioinformatics (Lec 17)
A Dendrogram 3/15/05 1 Hierarchical Clustering [Johnson, SC, 1967] Given n points in R d, compute the distance between every pair of points While (not done) Pick closest pair of points s i and s j and
More informationKernels and representation
Kernels and representation Corso di AA, anno 2017/18, Padova Fabio Aiolli 20 Dicembre 2017 Fabio Aiolli Kernels and representation 20 Dicembre 2017 1 / 19 (Hierarchical) Representation Learning Hierarchical
More informationSequential Coordinate-wise Algorithm for Non-negative Least Squares Problem
CENTER FOR MACHINE PERCEPTION CZECH TECHNICAL UNIVERSITY Sequential Coordinate-wise Algorithm for Non-negative Least Squares Problem Woring document of the EU project COSPAL IST-004176 Vojtěch Franc, Miro
More informationECG782: Multidimensional Digital Signal Processing
ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting
More informationUnlabeled Data Classification by Support Vector Machines
Unlabeled Data Classification by Support Vector Machines Glenn Fung & Olvi L. Mangasarian University of Wisconsin Madison www.cs.wisc.edu/ olvi www.cs.wisc.edu/ gfung The General Problem Given: Points
More informationTutorial on Convex Optimization for Engineers
Tutorial on Convex Optimization for Engineers M.Sc. Jens Steinwandt Communications Research Laboratory Ilmenau University of Technology PO Box 100565 D-98684 Ilmenau, Germany jens.steinwandt@tu-ilmenau.de
More information15.082J and 6.855J. Lagrangian Relaxation 2 Algorithms Application to LPs
15.082J and 6.855J Lagrangian Relaxation 2 Algorithms Application to LPs 1 The Constrained Shortest Path Problem (1,10) 2 (1,1) 4 (2,3) (1,7) 1 (10,3) (1,2) (10,1) (5,7) 3 (12,3) 5 (2,2) 6 Find the shortest
More informationDeep Learning for Computer Vision
Deep Learning for Computer Vision Spring 2018 http://vllab.ee.ntu.edu.tw/dlcv.html (primary) https://ceiba.ntu.edu.tw/1062dlcv (grade, etc.) FB: DLCV Spring 2018 Yu Chiang Frank Wang 王鈺強, Associate Professor
More informationROUGH MEMBERSHIP FUNCTIONS: A TOOL FOR REASONING WITH UNCERTAINTY
ALGEBRAIC METHODS IN LOGIC AND IN COMPUTER SCIENCE BANACH CENTER PUBLICATIONS, VOLUME 28 INSTITUTE OF MATHEMATICS POLISH ACADEMY OF SCIENCES WARSZAWA 1993 ROUGH MEMBERSHIP FUNCTIONS: A TOOL FOR REASONING
More informationApplied Lagrange Duality for Constrained Optimization
Applied Lagrange Duality for Constrained Optimization Robert M. Freund February 10, 2004 c 2004 Massachusetts Institute of Technology. 1 1 Overview The Practical Importance of Duality Review of Convexity
More informationCLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS
CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of
More informationMining di Dati Web. Lezione 3 - Clustering and Classification
Mining di Dati Web Lezione 3 - Clustering and Classification Introduction Clustering and classification are both learning techniques They learn functions describing data Clustering is also known as Unsupervised
More informationImage Analysis, Classification and Change Detection in Remote Sensing
Image Analysis, Classification and Change Detection in Remote Sensing WITH ALGORITHMS FOR ENVI/IDL Morton J. Canty Taylor &. Francis Taylor & Francis Group Boca Raton London New York CRC is an imprint
More informationThe Singular Value Decomposition: Let A be any m n matrix. orthogonal matrices U, V and a diagonal matrix Σ such that A = UΣV T.
Section 7.4 Notes (The SVD) The Singular Value Decomposition: Let A be any m n matrix. orthogonal matrices U, V and a diagonal matrix Σ such that Then there are A = UΣV T Specifically: The ordering of
More informationMathematics. Linear Programming
Mathematics Linear Programming Table of Content 1. Linear inequations. 2. Terms of Linear Programming. 3. Mathematical formulation of a linear programming problem. 4. Graphical solution of two variable
More information