Generalized trace ratio optimization and applications
Mohammed Bellalij, Saïd Hanafi, Rita Macedo and Raca Todosijevic
University of Valenciennes, France
PGMO Days, 2-4 October 2013, ENSTA ParisTech
Outline
1 Trace Ratio Optimization Problem (TROP)
2 Applications
3 Fisher's Linear Discriminant Analysis
4 Generalized TROP (GTROP - STROP)
5 Mathematical analysis
6 Summary
Trace Ratio Optimization Problem (TROP)
Trace Ratio Optimization Problem

(TROP)  maximize  Tr(X^T A X) / Tr(X^T B X)
        s.t.  X^T C X = I,  X ∈ Ω

where A, B, C are matrices with appropriate dimensions, I is the identity matrix, and Tr(.) is the trace of a square matrix.
Continuous case: Ω = R^{m×d}, [0,1]^{m×d} or [-1,1]^{m×d}
Binary case: Ω = {0,1}^{m×d} or {-1,1}^{m×d}
TROP is a hard problem. There is no loss of generality in assuming that C = I.
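A minimal numerical sketch (our own, not from the slides) of why one may assume C = I: for symmetric positive definite C, the substitution Xt = C^{1/2} X turns the constraint X^T C X = I into Xt^T Xt = I and leaves the objective a trace ratio with transformed matrices At = C^{-1/2} A C^{-1/2}, Bt = C^{-1/2} B C^{-1/2}.

```python
import numpy as np

# Sketch: reduction of the constraint X^T C X = I to the case C = I.
# All matrices here are random synthetic data for illustration only.
rng = np.random.default_rng(0)
m, d = 6, 2
M = rng.normal(size=(m, m)); A = (M + M.T) / 2          # symmetric A
Bm = rng.normal(size=(m, m)); B = Bm @ Bm.T + np.eye(m)  # SPD B
Cm = rng.normal(size=(m, m)); C = Cm @ Cm.T + np.eye(m)  # SPD C

# Symmetric square root of C via its eigendecomposition
w, V = np.linalg.eigh(C)
C_half_inv = V @ np.diag(1 / np.sqrt(w)) @ V.T
At = C_half_inv @ A @ C_half_inv
Bt = C_half_inv @ B @ C_half_inv

# Any Xt with Xt^T Xt = I corresponds to X = C^{-1/2} Xt with X^T C X = I
Xt = np.linalg.qr(rng.normal(size=(m, d)))[0]
X = C_half_inv @ Xt
r1 = np.trace(X.T @ A @ X) / np.trace(X.T @ B @ X)
r2 = np.trace(Xt.T @ At @ Xt) / np.trace(Xt.T @ Bt @ Xt)
assert np.isclose(r1, r2)                     # same objective value
assert np.allclose(X.T @ C @ X, np.eye(d))    # original constraint holds
```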
Discrete Trace Ratio Optimization Problem
Discrete TROP is more difficult than the continuous case.
Continuous TROP is a relaxation of discrete TROP, so it provides an upper bound on the optimal value of the discrete problem.
Linear algebra techniques are powerful for solving continuous TROP: much lower computational cost and guaranteed approximation quality.
Applications
Many real-world problems can be modeled as trace ratio, or more generally as sum-of-trace-ratios, optimization problems. They arise in:
Fisher discriminant analysis in pattern recognition (data mining, machine learning)
The downlink of a multi-user MIMO system (wireless communication and signal processing)
The cell formation problem (cellular manufacturing)
Data mining is the process of analyzing data in order to extract useful knowledge, such as:
Clustering: unsupervised learning
Classification: supervised learning
Feature selection: suppression of irrelevant or redundant features
Dimensionality reduction: a major tool of data mining
Most machine learning and data mining techniques may not be effective for high-dimensional data. Dimensionality reduction plays a fundamental role in data mining:
Map the data from a high-dimensional space to a low-dimensional space
Reduce noise and redundancy in the data before performing a task
Dimensionality reduction usually entails embedding the data in a space of reduced dimension that preserves most of its interesting details.
Fisher's Linear Discriminant Analysis
Some techniques
Several techniques for dimensionality reduction end with solving a TROP:
Fisher's Linear Discriminant Analysis
Support Vector Machines or kernel methods
Graph embedding, and so on
Fisher's Linear Discriminant Analysis (LDA)
LDA attempts to find linear projections of the data that are optimal for discrimination:
Inter-class separability: a measure of how well separated two distinct classes are
Intra-class compactness: a measure of how tightly clustered items of the same class are
Goal: search for a transformation matrix that maximizes inter-class separability while maximizing intra-class compactness (i.e., minimizing the within-class scatter).
The natural model for these dual objectives is a trace ratio problem.
LDA: objective deduction
Let µ be the mean of the data X, and µ^{(k)} the mean of the k-th class (of size n_k, k = 1,...,c). Define

A = Σ_{k=1}^{c} n_k (µ^{(k)} - µ)(µ^{(k)} - µ)^T,   B = Σ_{k=1}^{c} Σ_{x_i ∈ class k} (x_i - µ^{(k)})(x_i - µ^{(k)})^T.

Criterion: maximize the ratio of two traces Tr(X^T A X) / Tr(X^T B X)
Constraint: X^T X = I (orthogonal projection)
Alternative: solve the easier problem instead: max_{X^T B X = I} Tr(X^T A X)
Solution: the d largest generalized eigenvectors, i.e. A w_i = λ_i B w_i.
However, its solution may deviate from the original objective.
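The alternative "ratio trace" problem above can be sketched numerically. This is an illustrative example on synthetic two-class data (all names and sizes are our own); it reduces the generalized eigenproblem A w = λ B w to a symmetric one via a Cholesky factorization of B.

```python
import numpy as np

# Sketch of the ratio-trace LDA alternative on synthetic data:
# build the between-class scatter A and within-class scatter B, then
# solve A w = lambda B w and keep the leading generalized eigenvector.
rng = np.random.default_rng(0)
X1 = rng.normal(loc=0.0, size=(60, 4))       # class 1
X2 = rng.normal(loc=3.0, size=(60, 4))       # class 2
X = np.vstack([X1, X2]); mu = X.mean(axis=0)

A = np.zeros((4, 4)); B = np.zeros((4, 4))
for Xk in (X1, X2):
    muk = Xk.mean(axis=0)
    A += len(Xk) * np.outer(muk - mu, muk - mu)   # between-class scatter
    B += (Xk - muk).T @ (Xk - muk)                # within-class scatter

# A w = lambda B w  <=>  (L^{-1} A L^{-T}) v = lambda v with B = L L^T, w = L^{-T} v
L = np.linalg.cholesky(B)
Li = np.linalg.inv(L)
vals, vecs = np.linalg.eigh(Li @ A @ Li.T)    # ascending eigenvalues
W = Li.T @ vecs[:, -1:]                       # top generalized eigenvector (d = 1)
ratio = np.trace(W.T @ A @ W) / np.trace(W.T @ B @ W)
assert ratio > 1   # well-separated classes: between-class scatter dominates
```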
Generalized TROP (GTROP - STROP)
Discriminative hypergraph partitioning
Cell formation problem
Discriminative hypergraph partitioning
1 Hypergraph partitioning seeks an optimal partitioning of the vertex set V of a given graph G = (V, A) into κ disjoint subsets.
2 The discriminative hypergraph partitioning criterion considers both the intra-cluster compactness and the inter-cluster separability, using a similarity matrix S, and thus aims to solve

max g(X) = (1/κ) Σ_{j=1}^{κ} (X_j^T S X_j) / (X_j^T Q X_j)
s.t. X ∈ {0,1}^{N×κ}, X 1_κ = 1_N

where Q = D - S, D is the diagonal matrix with D_ii = Σ_j S_ij, and X_j is the j-th column of X.
With normalized indicator columns, the criterion reads

max g(X) = (1/κ) Σ_{j=1}^{κ} ([X_j (X_j^T X_j)^{-1/2}]^T S [X_j (X_j^T X_j)^{-1/2}]) / ([X_j (X_j^T X_j)^{-1/2}]^T Q [X_j (X_j^T X_j)^{-1/2}])
s.t. X ∈ {0,1}^{N×κ}, X 1_κ = 1_N

Li et al. (2013) show that this problem is equivalent to

max g(P) = (1/κ) Σ_{j=1}^{κ} (P_j^T S P_j) / (P_j^T Q P_j)   s.t. P^T P = I_κ    (1)

where P = (P_1 P_2 ... P_κ) = X (X^T X)^{-1/2}.
Problem (1) consists in maximizing a sum of trace ratios (specifically, a sum of generalized Rayleigh quotients of the pencil (S, Q)).
The problem is difficult to solve: it admits multiple local non-global maxima. So (1) is approximated as

max F(P) = (Σ_{j=1}^{κ} P_j^T S P_j) / (Σ_{j=1}^{κ} P_j^T Q P_j) = Tr(P^T S P) / Tr(P^T Q P)   s.t. P^T P = I_κ

which is a continuous trace ratio problem.
Cell formation problem
A new generalized (singular) trace ratio problem:

(STROP)  maximize  Tr(X^T A Y) / (1 + Tr(X^T B Y))
         s.t.  X ∈ Ω_X, Y ∈ Ω_Y    (2)

We show that the cell formation problem can be formulated as (2).
Cell formation problem
Application: group technology or cellular manufacturing.
System: machines and parts interacting.
Partition the system into subsystems to maximize efficiency:
Interactions between the machines and the parts within a subsystem are maximized
Interactions between parts of different subsystems are reduced as much as possible
Problem formulation: singular trace ratio
I: set of M machines, J: set of P parts, K: set of C cells.

a_ij = 1 if machine i processes part j, 0 otherwise.

Measure of autonomy:

Eff = (a - a_1^Out) / (a + a_0^In)

a = Σ_{i=1}^{M} Σ_{j=1}^{P} a_ij : total number of entries equal to 1 in matrix A
a_1^Out : number of exceptional elements (ones outside the diagonal blocks)
a_0^In : number of zero entries in the diagonal blocks

Objective: maximize Eff.
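The autonomy measure can be computed directly from the incidence matrix once a cell assignment is fixed. Below is a small sketch on a toy 4-machine, 4-part instance of our own making (the matrix and assignment are illustrative, not from the slides).

```python
import numpy as np

# Sketch: compute Eff = (a - a1_out) / (a + a0_in) for a toy instance.
A = np.array([[1, 1, 0, 0],      # machine-part incidence a_ij
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 1, 1, 1]])
machine_cell = np.array([0, 0, 1, 1])   # cell of each machine
part_cell    = np.array([0, 0, 1, 1])   # cell of each part

a = A.sum()                                            # total number of ones
inside = machine_cell[:, None] == part_cell[None, :]   # diagonal-block mask
a1_out = A[~inside].sum()        # exceptional elements (ones outside blocks)
a0_in  = (1 - A)[inside].sum()   # zeros inside the diagonal blocks
eff = (a - a1_out) / (a + a0_in)
# Here a = 9, a1_out = 1 (the one at machine 4, part 2), a0_in = 0, so eff = 8/9
```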
Consider
x_ik = 1 if machine i belongs to cell k, 0 otherwise (i = 1,...,M, k = 1,...,C)
y_jk = 1 if part j belongs to cell k, 0 otherwise (j = 1,...,P, k = 1,...,C)
Some constraints:
Each machine and each part is assigned to exactly one cell:
Σ_{k=1}^{C} x_ik = 1 (i = 1,...,M) and Σ_{k=1}^{C} y_jk = 1 (j = 1,...,P)
At least one machine and one part in each cell:
Σ_{i=1}^{M} x_ik ≥ 1 and Σ_{j=1}^{P} y_jk ≥ 1 (k = 1,...,C)
We verify that
a_1^Out = a - Σ_{k=1}^{C} Σ_{i=1}^{M} Σ_{j=1}^{P} a_ij x_ik y_jk
a_0^In = Σ_{k=1}^{C} Σ_{i=1}^{M} Σ_{j=1}^{P} (1 - a_ij) x_ik y_jk
Objective function

Eff = (Σ_{k=1}^{C} Σ_{i=1}^{M} Σ_{j=1}^{P} a_ij x_ik y_jk) / (a + Σ_{k=1}^{C} Σ_{i=1}^{M} Σ_{j=1}^{P} (1 - a_ij) x_ik y_jk)

The cell formation problem may be formulated as a generalized (singular) trace ratio problem

(STROP)  maximize  Tr(X^T A' Y) / (1 + Tr(X^T B' Y))
         s.t.  X ∈ Ω_X, Y ∈ Ω_Y

with A' = (a_ij / a) and B' = ((1 - a_ij) / a).
Remarks: X^T X and Y^T Y are regular diagonal matrices.
Another generalized TROP
Maximization of the sum of the trace ratio on the Stiefel manifold (L.-H. Zhang, R.-C. Li, 2013):

(GTROP)  maximize  Tr(X^T A X) / Tr(X^T B X) + Tr(X^T C X)
         s.t.  X^T X = I (I: identity matrix of size d), X ∈ R^{m×d} (d < m).
Mathematical analysis
Lanczos & Newton method for TROP
Singular Value Decomposition & Newton method for STROP
Closed form solution
Given a symmetric matrix A of dimension m×m and an arbitrary orthonormal matrix X of dimension m×d.
Standard eigenvalue problem:
max{Tr(X^T A X) : X^T X = I, X ∈ R^{m×d}} = λ_1 + ... + λ_d
The eigenvalues λ_1, ..., λ_d are labeled decreasingly; the maximizer X* is an orthogonal basis of the eigenspace of A associated with the d algebraically largest eigenvalues.
Generalized eigenvalue problem:
Assuming that B is positive definite, we have
max_{X ∈ R^{m×d}, X^T B X = I} Tr(X^T A X) = Tr(X*^T A X*) = λ_1 + ... + λ_d
(eigenvalues labeled decreasingly for the generalized problem A w = λ B w; X* contains the eigenvectors associated with the first d eigenvalues, with X*^T B X* = I).
Existence and uniqueness of a solution of TROP
The Trace Ratio Optimization Problem (TROP) does not have a closed-form solution.
Trace ratio vs. ratio trace: TROP is often simplified into a more accommodating problem:
max_{X^T X = I} Tr((X^T B X)^{-1} (X^T A X)).
The obtained solution does not necessarily maximize the corresponding trace ratio problem.
TROP admits a finite maximum value ρ*: it has no local non-global solution, and ρ* is reached for certain (non-unique) orthogonal matrices X*. (T. T. Ngo, M. Bellalij and Y. Saad, 2012)
From trace ratio to trace difference
Solving the trace ratio problem is equivalent to finding the solution of the scalar equation f(ρ) = 0, where

f(ρ) = max_{X^T X = I} Tr(X^T (A - ρB) X)

Proposition
1 f is a strictly decreasing function of ρ
2 f(ρ) = 0 iff ρ = ρ*
3 f'(ρ) = -Tr(X(ρ)^T B X(ρ))
So, finding the optimal solution amounts to a search for the unique root of f(ρ).
Newton-Lanczos algorithm for TROP
Select an initial m×d orthonormal matrix X; compute ρ = Tr(X^T A X) / Tr(X^T B X)
While not converged do
  Call the Lanczos algorithm to compute the d largest eigenvalues of A - ρB and the associated eigenvectors [w_1, w_2, ..., w_d] → X
  Set ρ := Tr(X^T A X) / Tr(X^T B X)
End while
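The iteration above can be sketched in a few lines. This is an illustrative version on small dense random matrices, with a dense symmetric eigensolver standing in for the Lanczos procedure the slides use on large sparse problems; the data and tolerances are our own.

```python
import numpy as np

# Sketch of the Newton iteration for TROP: maximize Tr(X^T A X)/Tr(X^T B X)
# over X^T X = I. np.linalg.eigh plays the role of the Lanczos algorithm.
rng = np.random.default_rng(1)
m, d = 30, 3
M = rng.normal(size=(m, m)); A = (M + M.T) / 2          # symmetric A
N = rng.normal(size=(m, m)); B = N @ N.T + np.eye(m)     # positive definite B

def top_eigvecs(C, d):
    """Orthonormal basis of the eigenspace of the d largest eigenvalues of C."""
    _, V = np.linalg.eigh(C)        # eigenvalues in ascending order
    return V[:, -d:]

X = np.linalg.qr(rng.normal(size=(m, d)))[0]             # initial orthonormal X
rho = np.trace(X.T @ A @ X) / np.trace(X.T @ B @ X)
for _ in range(100):
    X = top_eigvecs(A - rho * B, d)                      # eigenvector step
    rho_new = np.trace(X.T @ A @ X) / np.trace(X.T @ B @ X)
    if abs(rho_new - rho) < 1e-12:
        rho = rho_new
        break
    rho = rho_new

# At the fixed point, f(rho) = max Tr(X^T (A - rho B) X) vanishes
f_rho = np.trace(X.T @ (A - rho * B) @ X)
```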
Singular trace ratio optimization problem
Given real matrices A and B of dimension m×n. Let O_{p,k} = {Z ∈ R^{p×k} : Z^T Z = I_k} and O_{m,n,k} = O_{m,k} × O_{n,k}.
Goal: find a pair of orthogonal matrices X ∈ O_{m,k} and Y ∈ O_{n,k}, an optimal solution of the problem:

max_{(X,Y) ∈ O_{m,n,k}} Tr[X^T A Y] / (1 + Tr[X^T B Y]).

We will assume that the matrix B verifies 1 + Tr[X^T B Y] > 0 for any (X,Y) ∈ O_{m,n,k}.
Singular value decomposition (SVD) for trace optimization
Let C be an m×n real matrix with rank(C) = r ≤ n < m.
1 There are positive real numbers σ_1 ≥ σ_2 ≥ ... ≥ σ_r > 0 and square orthogonal matrices U ∈ O_{m,m} and V ∈ O_{n,n} such that

U^T C V = Σ = [ Σ_r        O_{r,n-r}   ]  ∈ R^{m×n},    Σ_r = diag(σ_1, σ_2, ..., σ_r)
              [ O_{m-r,r}  O_{m-r,n-r} ]

where σ_1, σ_2, ..., σ_r are the singular values of C.
2 The trace of X^T C Y reaches its maximum (resp. minimum) when X and Y are orthogonal bases of the left and right singular subspaces of C associated with the k algebraically largest (resp. smallest) singular values.
Closed form solution: singular value problem
Theorem. Let C be an m×n real matrix with m > n ≥ r. Suppose U^T C V = Σ is the SVD of C. Let the columns of U be partitioned into two blocks, U = [U_1, U_2] with U_1 ∈ R^{m×n} and U_2 ∈ R^{m×(m-n)}. Then

max_{(X,Y) ∈ O_{m,n,k}} Tr[X^T C Y] = Σ_{i=1}^{k} σ_i = Tr[U_{1,k}^T C V_k].
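The theorem can be verified numerically: the top-k left and right singular vectors attain Tr[X^T C Y] = σ_1 + ... + σ_k, and random feasible pairs never exceed this value. A short sketch on random data:

```python
import numpy as np

# Numerical check: over orthonormal X (m×k) and Y (n×k), the maximum of
# Tr(X^T C Y) equals the sum of the k largest singular values of C.
rng = np.random.default_rng(2)
m, n, k = 8, 5, 2
C = rng.normal(size=(m, n))

U, s, Vt = np.linalg.svd(C)         # singular values in decreasing order
X = U[:, :k]                         # top-k left singular vectors
Y = Vt[:k, :].T                      # top-k right singular vectors
assert np.isclose(np.trace(X.T @ C @ Y), s[:k].sum())

for _ in range(200):                 # random feasible pairs stay below the optimum
    Xr = np.linalg.qr(rng.normal(size=(m, k)))[0]
    Yr = np.linalg.qr(rng.normal(size=(n, k)))[0]
    assert np.trace(Xr.T @ C @ Yr) <= s[:k].sum() + 1e-9
```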
Existence and uniqueness of a solution of STROP
The problem STROP admits a finite maximum value ρ*. It is reached for certain (non-unique) orthogonal matrices X* and Y*: thanks to the cyclic property of the trace, any simultaneous orthogonal transformation of the columns of X and Y leaves the objective function unchanged (U = X* R, V = Y* R for any regular matrix R ∈ R^{k×k} such that R^{-1} = R^T).
We have Tr[X*^T (A - ρ* B) Y*] = ρ* because 1 + Tr[X^T B Y] > 0. Therefore, we have the following necessary condition for the triplet (ρ*, X*, Y*) to be optimal:

max_{(X,Y) ∈ O_{m,n,k}} Tr[X^T (A - ρ* B) Y] = Tr[X*^T (A - ρ* B) Y*] = ρ*.

Let g(ρ) = max_{(X,Y) ∈ O_{m,n,k}} Tr[X^T (A - ρ B) Y].
Then we just need to solve the scalar equation g(ρ) = ρ.
Properties of f(ρ) = g(ρ) - ρ
Evaluating g(ρ) consists in computing the left and right singular vectors X(ρ) and Y(ρ) associated with the k largest singular values of A - ρ B, so that g(ρ) = Tr[X(ρ)^T (A - ρ B) Y(ρ)].
Under the assumption that B verifies 1 + Tr[X^T B Y] > 0, we have:
1 f is differentiable at ρ with f'(ρ) = -Tr[X(ρ)^T B Y(ρ)] - 1, and f is a strictly decreasing function
2 f is convex
3 f(ρ) = 0 iff ρ = ρ*
Fractional iteration of the Newton approximation formula
Newton's method to approximate the unique fixed point of g:

ρ_new = ρ - (Tr[X(ρ)^T (A - ρ B) Y(ρ)] - ρ) / (-Tr[X(ρ)^T B Y(ρ)] - 1) = Tr[X(ρ)^T A Y(ρ)] / (1 + Tr[X(ρ)^T B Y(ρ)]).

The Newton-SVD algorithm includes the following three iterative steps:
1 Compute the trace ratio ρ = Tr[X^T A Y] / (1 + Tr[X^T B Y])
2 Run the SVD algorithm to compute the k largest singular values of A - ρB as well as the associated singular vectors X(ρ) and Y(ρ)
3 Repeat the above two steps until convergence.
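The three steps above can be sketched directly with a dense SVD. This is an illustrative version on small random matrices of our own choosing; B is scaled small so the positivity assumption 1 + Tr[X^T B Y] > 0 holds throughout.

```python
import numpy as np

# Sketch of the Newton-SVD iteration for STROP:
# maximize Tr(X^T A Y) / (1 + Tr(X^T B Y)) over orthonormal X (m×k), Y (n×k).
rng = np.random.default_rng(3)
m, n, k = 12, 8, 2
A = rng.normal(size=(m, n))
B = 0.05 * rng.normal(size=(m, n))    # small B keeps 1 + Tr(X^T B Y) > 0

def top_singular_pair(C, k):
    """Left/right singular vectors for the k largest singular values of C."""
    U, _, Vt = np.linalg.svd(C)
    return U[:, :k], Vt[:k, :].T

X, Y = top_singular_pair(A, k)        # initial feasible pair
rho = np.trace(X.T @ A @ Y) / (1 + np.trace(X.T @ B @ Y))
for _ in range(100):
    X, Y = top_singular_pair(A - rho * B, k)   # SVD step
    rho_new = np.trace(X.T @ A @ Y) / (1 + np.trace(X.T @ B @ Y))
    if abs(rho_new - rho) < 1e-12:
        rho = rho_new
        break
    rho = rho_new

# At the fixed point, g(rho) = Tr[X^T (A - rho B) Y] equals rho
g_rho = np.trace(X.T @ (A - rho * B) @ Y)
```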
Summary
SUMMARY
The trace ratio formulation is shared by many real-world application problems.
A new formulation of the cell formation problem as a trace ratio: the Singular Trace Ratio Optimization Problem (STROP).
Continuous TROP and STROP do not admit local non-global maxima.
A direct solution to TROP or STROP is presented: the maximum is the unique zero of a scalar function.
Both can be solved numerically by a judicious use of the Lanczos procedure, a good initialization, and inexact eigenvector calculations in the early stages of the Newton procedure.
REFERENCES
T. T. Ngo, M. Bellalij and Y. Saad. The trace ratio optimization problem, SIAM Review, 54(3), 2012.
X. Li, W. Hu, C. Shen, A. Dick and Z. Zhang. Context-Aware Hypergraph Construction for Robust Spectral Clustering, IEEE Transactions on Knowledge and Data Engineering, 2013.
L.-H. Zhang and R.-C. Li. Maximization of the Sum of the Trace Ratio on the Stiefel Manifold, Technical Report.

Thanks!
More informationData Analysis 3. Support Vector Machines. Jan Platoš October 30, 2017
Data Analysis 3 Support Vector Machines Jan Platoš October 30, 2017 Department of Computer Science Faculty of Electrical Engineering and Computer Science VŠB - Technical University of Ostrava Table of
More informationIntroduction to Modern Control Systems
Introduction to Modern Control Systems Convex Optimization, Duality and Linear Matrix Inequalities Kostas Margellos University of Oxford AIMS CDT 2016-17 Introduction to Modern Control Systems November
More informationMachine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme, Nicolas Schilling
Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim,
More information1. Lecture notes on bipartite matching
Massachusetts Institute of Technology 18.453: Combinatorial Optimization Michel X. Goemans February 5, 2017 1. Lecture notes on bipartite matching Matching problems are among the fundamental problems in
More informationMATRIX INTEGRALS AND MAP ENUMERATION 1
MATRIX INTEGRALS AND MAP ENUMERATION 1 IVAN CORWIN Abstract. We explore the connection between integration with respect to the GUE and enumeration of maps. This connection is faciliated by the Wick Formula.
More informationCIS 580, Machine Perception, Spring 2014: Assignment 4 Due: Wednesday, April 10th, 10:30am (use turnin)
CIS 580, Machine Perception, Spring 2014: Assignment 4 Due: Wednesday, April 10th, 10:30am (use turnin) Solutions (hand calculations, plots) have to be submitted electronically as a single pdf file using
More informationBehavioral Data Mining. Lecture 18 Clustering
Behavioral Data Mining Lecture 18 Clustering Outline Why? Cluster quality K-means Spectral clustering Generative Models Rationale Given a set {X i } for i = 1,,n, a clustering is a partition of the X i
More informationConvex sets and convex functions
Convex sets and convex functions Convex optimization problems Convex sets and their examples Separating and supporting hyperplanes Projections on convex sets Convex functions, conjugate functions ECE 602,
More informationGlobally and Locally Consistent Unsupervised Projection
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence Globally and Locally Consistent Unsupervised Projection Hua Wang, Feiping Nie, Heng Huang Department of Electrical Engineering
More informationMy favorite application using eigenvalues: partitioning and community detection in social networks
My favorite application using eigenvalues: partitioning and community detection in social networks Will Hobbs February 17, 2013 Abstract Social networks are often organized into families, friendship groups,
More informationModelling and Visualization of High Dimensional Data. Sample Examination Paper
Duration not specified UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE Modelling and Visualization of High Dimensional Data Sample Examination Paper Examination date not specified Time: Examination
More informationUnsupervised Learning
Unsupervised Learning Unsupervised learning Until now, we have assumed our training samples are labeled by their category membership. Methods that use labeled samples are said to be supervised. However,
More informationA Course in Machine Learning
A Course in Machine Learning Hal Daumé III 13 UNSUPERVISED LEARNING If you have access to labeled training data, you know what to do. This is the supervised setting, in which you have a teacher telling
More informationConvex Optimization. 2. Convex Sets. Prof. Ying Cui. Department of Electrical Engineering Shanghai Jiao Tong University. SJTU Ying Cui 1 / 33
Convex Optimization 2. Convex Sets Prof. Ying Cui Department of Electrical Engineering Shanghai Jiao Tong University 2018 SJTU Ying Cui 1 / 33 Outline Affine and convex sets Some important examples Operations
More informationA fast algorithm for sparse reconstruction based on shrinkage, subspace optimization and continuation [Wen,Yin,Goldfarb,Zhang 2009]
A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization and continuation [Wen,Yin,Goldfarb,Zhang 2009] Yongjia Song University of Wisconsin-Madison April 22, 2010 Yongjia Song
More informationRadial Basis Function Networks: Algorithms
Radial Basis Function Networks: Algorithms Neural Computation : Lecture 14 John A. Bullinaria, 2015 1. The RBF Mapping 2. The RBF Network Architecture 3. Computational Power of RBF Networks 4. Training
More informationClustering. Supervised vs. Unsupervised Learning
Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now
More informationConvex Optimization - Chapter 1-2. Xiangru Lian August 28, 2015
Convex Optimization - Chapter 1-2 Xiangru Lian August 28, 2015 1 Mathematical optimization minimize f 0 (x) s.t. f j (x) 0, j=1,,m, (1) x S x. (x 1,,x n ). optimization variable. f 0. R n R. objective
More informationThe clustering in general is the task of grouping a set of objects in such a way that objects
Spectral Clustering: A Graph Partitioning Point of View Yangzihao Wang Computer Science Department, University of California, Davis yzhwang@ucdavis.edu Abstract This course project provide the basic theory
More informationResearch Interests Optimization:
Mitchell: Research interests 1 Research Interests Optimization: looking for the best solution from among a number of candidates. Prototypical optimization problem: min f(x) subject to g(x) 0 x X IR n Here,
More informationIntroduction aux Systèmes Collaboratifs Multi-Agents
M1 EEAII - Découverte de la Recherche (ViRob) Introduction aux Systèmes Collaboratifs Multi-Agents UPJV, Département EEA Fabio MORBIDI Laboratoire MIS Équipe Perception et Robotique E-mail: fabio.morbidi@u-picardie.fr
More information60 2 Convex sets. {x a T x b} {x ã T x b}
60 2 Convex sets Exercises Definition of convexity 21 Let C R n be a convex set, with x 1,, x k C, and let θ 1,, θ k R satisfy θ i 0, θ 1 + + θ k = 1 Show that θ 1x 1 + + θ k x k C (The definition of convexity
More informationWeek 5. Convex Optimization
Week 5. Convex Optimization Lecturer: Prof. Santosh Vempala Scribe: Xin Wang, Zihao Li Feb. 9 and, 206 Week 5. Convex Optimization. The convex optimization formulation A general optimization problem is
More informationREVIEW OF FUZZY SETS
REVIEW OF FUZZY SETS CONNER HANSEN 1. Introduction L. A. Zadeh s paper Fuzzy Sets* [1] introduces the concept of a fuzzy set, provides definitions for various fuzzy set operations, and proves several properties
More informationSection Notes 5. Review of Linear Programming. Applied Math / Engineering Sciences 121. Week of October 15, 2017
Section Notes 5 Review of Linear Programming Applied Math / Engineering Sciences 121 Week of October 15, 2017 The following list of topics is an overview of the material that was covered in the lectures
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra)
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 1: Course Overview; Matrix Multiplication Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 21 Outline 1 Course
More informationAlgebraic Graph Theory- Adjacency Matrix and Spectrum
Algebraic Graph Theory- Adjacency Matrix and Spectrum Michael Levet December 24, 2013 Introduction This tutorial will introduce the adjacency matrix, as well as spectral graph theory. For those familiar
More informationA Dendrogram. Bioinformatics (Lec 17)
A Dendrogram 3/15/05 1 Hierarchical Clustering [Johnson, SC, 1967] Given n points in R d, compute the distance between every pair of points While (not done) Pick closest pair of points s i and s j and
More informationKernels and representation
Kernels and representation Corso di AA, anno 2017/18, Padova Fabio Aiolli 20 Dicembre 2017 Fabio Aiolli Kernels and representation 20 Dicembre 2017 1 / 19 (Hierarchical) Representation Learning Hierarchical
More informationSequential Coordinate-wise Algorithm for Non-negative Least Squares Problem
CENTER FOR MACHINE PERCEPTION CZECH TECHNICAL UNIVERSITY Sequential Coordinate-wise Algorithm for Non-negative Least Squares Problem Woring document of the EU project COSPAL IST-004176 Vojtěch Franc, Miro
More informationECG782: Multidimensional Digital Signal Processing
ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting
More informationUnlabeled Data Classification by Support Vector Machines
Unlabeled Data Classification by Support Vector Machines Glenn Fung & Olvi L. Mangasarian University of Wisconsin Madison www.cs.wisc.edu/ olvi www.cs.wisc.edu/ gfung The General Problem Given: Points
More informationTutorial on Convex Optimization for Engineers
Tutorial on Convex Optimization for Engineers M.Sc. Jens Steinwandt Communications Research Laboratory Ilmenau University of Technology PO Box 100565 D-98684 Ilmenau, Germany jens.steinwandt@tu-ilmenau.de
More information15.082J and 6.855J. Lagrangian Relaxation 2 Algorithms Application to LPs
15.082J and 6.855J Lagrangian Relaxation 2 Algorithms Application to LPs 1 The Constrained Shortest Path Problem (1,10) 2 (1,1) 4 (2,3) (1,7) 1 (10,3) (1,2) (10,1) (5,7) 3 (12,3) 5 (2,2) 6 Find the shortest
More informationDeep Learning for Computer Vision
Deep Learning for Computer Vision Spring 2018 http://vllab.ee.ntu.edu.tw/dlcv.html (primary) https://ceiba.ntu.edu.tw/1062dlcv (grade, etc.) FB: DLCV Spring 2018 Yu Chiang Frank Wang 王鈺強, Associate Professor
More informationROUGH MEMBERSHIP FUNCTIONS: A TOOL FOR REASONING WITH UNCERTAINTY
ALGEBRAIC METHODS IN LOGIC AND IN COMPUTER SCIENCE BANACH CENTER PUBLICATIONS, VOLUME 28 INSTITUTE OF MATHEMATICS POLISH ACADEMY OF SCIENCES WARSZAWA 1993 ROUGH MEMBERSHIP FUNCTIONS: A TOOL FOR REASONING
More informationApplied Lagrange Duality for Constrained Optimization
Applied Lagrange Duality for Constrained Optimization Robert M. Freund February 10, 2004 c 2004 Massachusetts Institute of Technology. 1 1 Overview The Practical Importance of Duality Review of Convexity
More informationCLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS
CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of
More informationMining di Dati Web. Lezione 3 - Clustering and Classification
Mining di Dati Web Lezione 3 - Clustering and Classification Introduction Clustering and classification are both learning techniques They learn functions describing data Clustering is also known as Unsupervised
More informationImage Analysis, Classification and Change Detection in Remote Sensing
Image Analysis, Classification and Change Detection in Remote Sensing WITH ALGORITHMS FOR ENVI/IDL Morton J. Canty Taylor &. Francis Taylor & Francis Group Boca Raton London New York CRC is an imprint
More informationThe Singular Value Decomposition: Let A be any m n matrix. orthogonal matrices U, V and a diagonal matrix Σ such that A = UΣV T.
Section 7.4 Notes (The SVD) The Singular Value Decomposition: Let A be any m n matrix. orthogonal matrices U, V and a diagonal matrix Σ such that Then there are A = UΣV T Specifically: The ordering of
More informationMathematics. Linear Programming
Mathematics Linear Programming Table of Content 1. Linear inequations. 2. Terms of Linear Programming. 3. Mathematical formulation of a linear programming problem. 4. Graphical solution of two variable
More information