Efficient Algorithm for Distance Metric Learning
Yipei Wang
Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA
yipeiw@cmu.edu

Abstract

Distance metric learning provides an approach to transfer knowledge from sparsely labeled data to unlabeled data. The learned metric is better suited to measuring the semantic similarity among instances. The main idea is to build an objective function from equivalence and in-equivalence constraints and to pose metric learning as an optimization problem. In this paper, we propose to unify different metric learning algorithms in a semidefinite programming (SDP) framework. Classical semidefinite programming algorithms are extremely expensive on larger problems, so we discuss efficient algorithms for large-scale metric learning. We investigate a recently proposed algorithm derived from the Frank-Wolfe algorithm and propose novel acceleration strategies based on the special structure of the problem. We compare different algorithms on 3 UCI datasets in a clustering problem.

1 Introduction

A proper distance metric has a crucial effect on the performance of distance-based supervised and unsupervised learning. For instance, the performance of the K-means clustering algorithm, KNN classifiers and SVM classifiers depends critically on a good metric. Metric learning provides approaches to transfer knowledge learned from sparsely labeled data to unlabeled data. Recently this problem has been actively studied [1][2][3][4][5][16]. These methods have been applied to many real-world problems such as image retrieval [7], face verification [6] and bioinformatics [8].

Previous works often use different formulations and provide specific optimization techniques to solve the problem. For example, Xing [1] poses the problem as a convex optimization problem and designs an iterative gradient descent algorithm to solve it.
In [2], the authors incorporate the idea of a margin into the cost function and minimize it through an alternating projection algorithm. In [4], the Mahalanobis distance metric is learned by directly maximizing a stochastic variant of the leave-one-out KNN score on the training set. That objective is not convex, and gradient search is used to find a maximum.

Though the distance metric can be a general function, the prevalent form of the distance function is $(x - y)^T A (x - y)$ with $A \succeq 0$. This corresponds to a linear transformation of the data; nonlinear transformations can be obtained by using kernels. Inspired by the form of the distance function, can we derive a unified semidefinite programming framework? In the following, we discuss the reformulation of two previous algorithms into standard semidefinite programming (SDP) form, and we implemented them with the open-source SDP solver SeDuMi [9].

Another problem is the efficiency of the algorithm on large-scale problems, with either high dimension or large data size. Recent works have studied multiple techniques for large-scale semidefinite programming [11]. The work mainly falls into two lines of research. One direction is to develop first-order methods designed for solving generic optimization problems. These methods use approximations to reduce the per-iteration cost. The second direction is to design algorithms that exploit the special structure of the problem [11][14]. These algorithms include the Frank-Wolfe algorithm, block coordinate descent, cutting-plane methods, etc. Recently, the development of subsampling techniques has also led to some efficient algorithms [15] for large-scale problems. Here we follow a recently proposed approach [5], which combines the two main directions. The authors first reformulate the problem as an eigenvalue optimization problem and then design an efficient algorithm by combining smoothing techniques with the Frank-Wolfe algorithm. By investigating the sparse structure of the problem, we further propose several acceleration strategies to improve efficiency.

2 Related Work

We mainly focus on two metric learning methods in the rest of the paper. Both learn the metric through convex optimization, and one of them (LMNN) achieves state-of-the-art performance on multiple datasets.

2.1 Review of method by Xing

Problem Formulation

In Xing's method, we are given pairs of equivalence constraints:

S: $(x_i, x_j) \in S$ if $x_i$ and $x_j$ are similar.

The criterion for the desired metric is that pairs of points in S have a small distance. Writing $\|x_i - x_j\|_A^2 = (x_i - x_j)^T A (x_i - x_j)$, this is cast as the convex optimization problem below:

$$\min_{A} \sum_{(x_i,x_j)\in S} \|x_i - x_j\|_A^2 \quad \text{s.t.} \quad \sum_{(x_i,x_j)\in D} \|x_i - x_j\|_A \ge 1,\; A \succeq 0 \qquad (1)$$

It uses the in-equivalence constraints as the condition. Here, D can be a set of pairs of points known to be dissimilar if such information is explicitly available; otherwise, we take all pairs not in S. Without this condition, the problem is solved trivially by $A = 0$, which is not useful. As mentioned in the paper, the constraint is not formulated as $\sum_{(x_i,x_j)\in D} \|x_i - x_j\|_A^2 \ge 1$ because that would result in A always being rank 1.

Optimization

We derive the optimization steps in order to analyze their computational cost. Details are given in the appendix.

1. Newton method for diagonal A.
Xing proved that the original optimization problem is equivalent to minimizing the function

$$g(A) = \sum_{(x_i,x_j)\in S} \|x_i - x_j\|_A^2 - c \log\Big( \sum_{(x_i,x_j)\in D} \|x_i - x_j\|_A \Big)$$

For diagonal A the computational cost is O(n). But when A is a full matrix, it has $n^2$ parameters and inverting the Hessian matrix requires $O(n^6)$ time, which is too expensive to be acceptable.

2. Projected gradient search

For a full matrix A, Xing proposed an iterative projected gradient search to solve the optimization problem efficiently. The problem is posed in the equivalent form below:

$$\max_{A} \; g(A) = \sum_{(x_i,x_j)\in D} \|x_i - x_j\|_A \quad \text{s.t.} \quad f(A) = \sum_{(x_i,x_j)\in S} \|x_i - x_j\|_A^2 \le 1,\; A \succeq 0 \qquad (2)$$

The algorithm takes a gradient step on $g(A)$ and then repeatedly projects A onto the sets $C_1 = \{A : \sum_{(x_i,x_j)\in S} \|x_i - x_j\|_A^2 \le 1\}$ and $C_2 = \{A : A \succeq 0\}$. The algorithm is shown below:

Iterate
    Iterate (projection)
        $A := P_{C_1}(A)$
        $A := P_{C_2}(A)$
    until A converges
    $A := A + t\,(\nabla_A g(A))_{\perp \nabla_A f(A)}$
until convergence

The projection onto $C_1$ can be solved analytically. With $A_0$ the matrix before projection and $\langle X_s, A_0 \rangle > 1$,

$$A = \frac{1 - \langle X_s, A_0 \rangle}{\|X_s\|_F^2}\, X_s + A_0$$

and $A = A_0$ when the constraint already holds. The projection onto $C_2$ is completed through a decomposition of the matrix:

$$A = U^T \Sigma U, \qquad \Sigma_+ = \max(0, \Sigma), \qquad A_+ = U^T \Sigma_+ U$$

2.2 Review of LMNN method

This work aims to learn a Mahalanobis matrix for kNN classification. Compared to Xing's method, we are given more information (the class label of each point in the training set) than just the equivalence constraints. Here, we use $y_{ij} \in \{0, 1\}$ to indicate whether or not the class labels $y_i$ and $y_j$ match, and $\eta_{ij} \in \{0, 1\}$ to indicate whether input $x_j$ is a k-nearest neighbor of input $x_i$. The criterion is that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. The cost function is given by:

$$\text{cost}(A) = \sum_{i,j} \eta_{ij} \|x_i - x_j\|_A^2 + c \sum_{i,j,l} \eta_{ij} (1 - y_{il}) \max\big(1 + \|x_i - x_j\|_A^2 - \|x_i - x_l\|_A^2,\; 0\big)$$

The optimization of the cost function is given by:

$$\min_{A,\epsilon} \sum_{i,j} \eta_{ij} \|x_i - x_j\|_A^2 + c \sum_{i,j,l} \eta_{ij} (1 - y_{il})\, \epsilon_{ijl} \quad \text{s.t.} \quad \|x_i - x_l\|_A^2 - \|x_i - x_j\|_A^2 \ge 1 - \epsilon_{ijl},\; \epsilon_{ijl} \ge 0,\; A \succeq 0 \qquad (3)$$

Most of the slack variables $\epsilon_{ijl}$ never attain positive values. The method is based on a combination of sub-gradient descent in the matrices L and M ($M = L^T L$), the latter mainly to verify that the global minimum has been reached. The alternating projection algorithm is proved to converge [17].

3 The unified semidefinite programming framework

We will discuss how to transform the two metric learning algorithms into standard semidefinite programming problems.
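Both reformulations in this section rest on the identity $\|x - y\|_A^2 = \langle (x - y)(x - y)^T, A \rangle$ with $\langle P, Q \rangle = \mathrm{Tr}(P^T Q)$, which makes every distance term linear in A. The following is a minimal pure-Python check of that identity; the vectors, the matrix and the function names are made-up illustration values, not taken from the paper.

```python
# Numerical check of ||x - y||_A^2 = <(x - y)(x - y)^T, A>.
# All inputs below are hypothetical illustration values.

def mahalanobis_sq(x, y, A):
    """Squared distance (x - y)^T A (x - y) for a square matrix A."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    return sum(d[r] * A[r][c] * d[c] for r in range(n) for c in range(n))

def outer_diff(x, y):
    """Outer product (x - y)(x - y)^T as a list of rows."""
    d = [xi - yi for xi, yi in zip(x, y)]
    return [[di * dj for dj in d] for di in d]

def frobenius_inner(P, Q):
    """<P, Q> = Tr(P^T Q), i.e. the sum of elementwise products."""
    return sum(p * q for row_p, row_q in zip(P, Q) for p, q in zip(row_p, row_q))

x = [1.0, 2.0]
y = [3.0, -1.0]
A = [[2.0, 0.5],
     [0.5, 1.0]]

lhs = mahalanobis_sq(x, y, A)                # quadratic form in x - y
rhs = frobenius_inner(outer_diff(x, y), A)   # linear function of A
print(lhs, rhs)
```

Because the right-hand side is linear in A, sums of squared distances over S or D become inner products with fixed, precomputable matrices, which is what an SDP solver expects.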
3.1 Xing's method

We use the notation $X_s = \sum_{(x_i,x_j)\in S} (x_i - x_j)(x_i - x_j)^T$ and $X_\alpha = (x_i - x_j)(x_i - x_j)^T$ for the $\alpha$-th pair $(x_i, x_j) \in D$. Then $\sum_{(x_i,x_j)\in S} \|x_i - x_j\|_A^2$ can be written as $\langle X_s, A \rangle$. We modify the original problem as below so that it is easy to reformulate as an SDP problem.

$$\max_{A} \; \min_{(x_i,x_j)\in D} \|x_i - x_j\|_A^2 \quad \text{s.t.} \quad \sum_{(x_i,x_j)\in S} \|x_i - x_j\|_A^2 \le 1,\; A \succeq 0 \qquad (4)$$

The problem can be rewritten as:

$$\max_{A, t} \; t \quad \text{s.t.} \quad \langle X_\alpha, A \rangle \ge t,\; \alpha = 1, \dots, |D|, \qquad \langle X_s, A \rangle \le 1, \qquad A \succeq 0 \qquad (5)$$

The problem can be transformed into standard form as below:

$$\max_{X} \; \langle C, X \rangle \quad \text{s.t.} \quad \langle MD_\alpha, X \rangle = 0,\; \alpha = 1, \dots, |D|, \qquad \langle MS, X \rangle = 1, \qquad X \succeq 0 \qquad (6)$$

where

$$X = \begin{pmatrix} A & & \\ & t & \\ & & d_{(|D|+1)\times(|D|+1)} \end{pmatrix}, \qquad C = \begin{pmatrix} 0_{n \times n} & & \\ & 1 & \\ & & 0_{(|D|+1)\times(|D|+1)} \end{pmatrix}$$

Here d is a diagonal matrix whose diagonal elements are the slack variables that turn the inequalities into equalities.

$$MD_\alpha = \begin{pmatrix} X_\alpha & & & \\ & -1 & & \\ & & -E_\alpha & \\ & & & 0 \end{pmatrix}, \qquad MS = \begin{pmatrix} X_s & & & \\ & 0 & & \\ & & 0_{|D| \times |D|} & \\ & & & 1 \end{pmatrix}$$

For the matrix $E_\alpha \in R^{|D| \times |D|}$, $(E_\alpha)_{\alpha\alpha} = 1$ and all other elements equal 0.

3.2 LMNN method

We use the notation $C_0 = \sum_{i,j} \eta_{ij} (x_i - x_j)(x_i - x_j)^T$. The original problem defined in Section 2.2 can be transformed into SDP form as below:

$$\min_{X} \; \langle C, X \rangle \quad \text{s.t.} \quad \langle B_{ijl}, X \rangle \ge 1 \text{ for all } ijl, \qquad X \succeq 0 \qquad (7)$$

$$X = \begin{pmatrix} A & \\ & \epsilon \end{pmatrix}, \qquad C = \begin{pmatrix} C_0 & \\ & c\,Y_N \end{pmatrix}, \qquad B_{ijl} = \begin{pmatrix} (x_i - x_l)(x_i - x_l)^T - (x_i - x_j)(x_i - x_j)^T & \\ & E_{ijl} \end{pmatrix}$$

Here $\epsilon$ is the diagonal block containing all the slack variables $\epsilon_{ijl}$, and $Y_N = \mathrm{diag}([\eta_{ij}(1 - y_{il}), \dots])$. The index order should be consistent with the ijl index order in $\epsilon$. $E_{ijl}$ is a diagonal matrix in which only the diagonal element corresponding to ijl equals 1 and the others equal 0. The problem can be easily transformed into standard form by adding slack variables to the inequality constraints.

4 Large-scale problem

4.1 Algorithms for large-scale problem and DML-eig method

Most semidefinite programming solvers are based on interior point methods, and computing the Hessian becomes very hard on larger problems. A series of algorithms have been proposed to address the problem. A lot of effort has focused on exploiting structural properties of the problem, and the proper algorithm depends on the type of problem. More general are first-order methods, which seek to significantly reduce the per-iteration complexity of optimization algorithms rather than the total computational cost. Another recent trend is to use subsampling to reduce the computational cost of each iteration. Recently Ying [5] proposed a method derived from a structure-based method, the Frank-Wolfe algorithm. They modify the algorithm with smoothing techniques, which allow gradient search on the approximated function instead of a subgradient method on the initial problem. Here we briefly review their method.

Ying proved the theorem: Assume that $X_S$ is invertible and, for any $\tau \in D$, let $\tilde{X}_\tau = X_S^{-1/2} X_\tau X_S^{-1/2}$. Then, problem (4) is equivalent to the following problem

$$\max_{S \in P} \; \min_{u \in \Delta} \sum_{\tau \in D} u_\tau \langle \tilde{X}_\tau, S \rangle$$

where $\Delta = \{u \in R^{|D|} : u_\tau \ge 0, \sum_{\tau \in D} u_\tau = 1\}$ and $P = \{M \in S_+^d : \mathrm{Tr}(M) = 1\}$.

Ying further proposes an efficient algorithm for DML-eig, a new first-order method that combines the Frank-Wolfe algorithm with smoothing techniques. Following [5], let

$$f_\mu(S) = \min_{u \in \Delta} \sum_{\tau \in D} u_\tau \langle \tilde{X}_\tau, S \rangle + \mu \sum_{\tau \in D} u_\tau \ln u_\tau$$

where $\mu > 0$ is the smoothing parameter.

Algorithm: Approximate Frank-Wolfe Algorithm for DML-eig
Parameters: smoothing parameter $\mu > 0$, tolerance value tol, step sizes $\alpha_t \in (0, 1)$, $t \in N$
Initialization: set $S_1^\mu \in S_+^d$ with $\mathrm{Tr}(S_1^\mu) = 1$
for t = 1, 2, 3, ...
    $Z_t^\mu = \arg\max\{ f_\mu(S_t^\mu) + \langle Z, \nabla f_\mu(S_t^\mu) \rangle : Z \in S_+^d, \mathrm{Tr}(Z) = 1 \}$, that is, $Z_t^\mu = v v^T$, where v is the leading eigenvector of $\nabla f_\mu(S_t^\mu)$
    $S_{t+1}^\mu = (1 - \alpha_t) S_t^\mu + \alpha_t Z_t^\mu$
    if $|f_\mu(S_{t+1}^\mu) - f_\mu(S_t^\mu)| < tol$ then break

The step sizes need to satisfy $\sum_{t \in N} \alpha_t = \infty$ and $\lim_{t \to \infty} \alpha_t = 0$.

4.2 Acceleration Strategies

We observe that the dense part of the matrix is the Gram matrix of the samples, so the complexity of the problem depends on the feature dimension d. The DML-eig algorithm reduces the per-iteration computational cost to $O(d^2)$, because it computes only the leading eigenvector instead of a full decomposition of the matrix. The constraints all come from the in-equivalence constraints, and the computational cost is also proportional to their number. However, only a few of them should be active under our formulation. Therefore, we can use the Euclidean distance to pre-filter the in-equivalence constraints, keeping fewer of them, before applying the optimization algorithms, which accelerates the optimization.

Another idea for accelerating the DML-eig algorithm is to find a better initialization at low computational cost. Relevant Component Analysis (RCA) [16] is a metric learning method that considers only the equivalence constraints and has low computational cost. So we explored using the result of RCA as initialization.

5 Experiments

5.1 Dataset and Evaluation Criteria

We experiment with 3 UCI datasets: iris, wine and protein. The number of classes and the feature dimension of each dataset are:
iris: 3 classes, d=4;
wine: 3 classes, d=12;
protein: 6 classes, d=20.

We follow the criteria used by Xing to evaluate the quality of learned metrics in a clustering application (we use K-means with the learned metric here). Let $c_i$ be the true cluster label and $\hat{c}_i$ be the label assigned by an automatic clustering algorithm.

$$\text{Accuracy} = \sum_{i > j} \frac{1\{1\{c_i = c_j\} = 1\{\hat{c}_i = \hat{c}_j\}\}}{0.5\, m (m - 1)}$$

where $1\{\cdot\}$ is the indicator function and m is the number of points. All the experiment code has been released.

5.2 Comparison of different optimization technologies

Here we compare the performance of different optimization algorithms for metric learning. Xing's method is taken from his released implementation [10]. We implemented the SDP formulation using the open-source SeDuMi solver [9]. We implemented the DML-eig algorithm in MATLAB. The baseline uses the Euclidean distance. From the results, we can see that both SDP and DML-eig achieve better performance than Xing's optimization algorithms.
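The pairwise accuracy of Section 5.1 can be sketched in a few lines. This is a minimal illustration of the formula only, not the released experiment code; the function name and the toy label lists are assumptions made for the example.

```python
from itertools import combinations

def pairwise_accuracy(true_labels, assigned_labels):
    """Fraction of point pairs on which the clustering and the ground
    truth agree about 'same cluster' versus 'different cluster'."""
    m = len(true_labels)
    if m != len(assigned_labels):
        raise ValueError("label lists must have the same length")
    if m < 2:
        raise ValueError("need at least two points to form a pair")
    agree = sum(
        (true_labels[i] == true_labels[j])
        == (assigned_labels[i] == assigned_labels[j])
        for i, j in combinations(range(m), 2)
    )
    # combinations() yields exactly 0.5 * m * (m - 1) pairs.
    return agree / (0.5 * m * (m - 1))

# Hypothetical toy labels: the measure ignores how cluster ids are named.
acc_perfect = pairwise_accuracy([0, 0, 1, 1], [1, 1, 0, 0])
acc_partial = pairwise_accuracy([0, 0, 1, 1], [0, 1, 1, 1])
print(acc_perfect, acc_partial)
```

Since only pair agreement is counted, no matching between true and assigned cluster ids is needed, which is why the measure suits K-means output.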
Figure 1: Accuracy for different optimization technologies (ratio = 0.9). Methods: Baseline (Euclidean), Newton, Iterative projected gradient, SDP, DML-eig. Datasets: Iris, Wine, Protein.

To better visualize the result, we also show the distance matrix on the protein data, which contains 6 clusters. This is not clear in the Euclidean distance matrix, but a clearer pattern appears when the learned matrix is used.

Figure 2: Distance matrix over different distance functions. Panels: Euclidean, Newton, IPG, DML-eig.

5.3 Results for acceleration strategies

1. We explored using RCA as initialization for the DML-eig algorithm. Unfortunately, this did not reduce the number of iterations.
2. We explored filtering the negative constraints by selecting those with smaller Euclidean distance. The result is shown in the figure below. Both the semidefinite programming algorithm and the DML-eig algorithm converge in fewer iterations, while the performance is only slightly affected.
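The constraint filtering in item 2 can be sketched as follows. The paper states only that the negative (in-equivalence) pairs with smaller Euclidean distance are selected; the `keep_ratio` parameter, the function name and the toy points are hypothetical choices made for this illustration.

```python
import math

def filter_dissimilar_pairs(points, dissimilar_pairs, keep_ratio):
    """Keep the fraction `keep_ratio` of dissimilar index pairs whose
    Euclidean distance is smallest (assumed selection rule)."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must lie in (0, 1]")

    def euclidean(pair):
        i, j = pair
        return math.dist(points[i], points[j])

    ranked = sorted(dissimilar_pairs, key=euclidean)
    n_keep = max(1, math.ceil(keep_ratio * len(ranked)))
    return ranked[:n_keep]

# Hypothetical toy data: four 2-D points and four dissimilar pairs.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (0.0, 0.5)]
pairs = [(0, 1), (0, 2), (0, 3), (1, 2)]
kept = filter_dissimilar_pairs(points, pairs, keep_ratio=0.5)
print(kept)
```

Dissimilar pairs that are already far apart under the Euclidean metric are unlikely to attain the minimum in the max-min objective, so dropping them shrinks the constraint set that every iteration has to touch.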
Figure 3: Iteration number using sampling strategy. Left panel: iteration number of SDP (tolerance = 1e-6) on iris, wine and protein, original vs. sampled. Right panel: iteration number of DML-eig on wine data as a function of the tolerance, original vs. sampled.

References

[1] Xing, Eric P., et al. Distance metric learning with application to clustering with side-information. Advances in Neural Information Processing Systems, 2002.
[2] Kilian Q. Weinberger, Lawrence K. Saul. Distance Metric Learning for Large Margin Nearest Neighbor Classification. Journal of Machine Learning Research, 10, 2009.
[3] J. Davis, B. Kulis, P. Jain, S. Sra, and I. Dhillon. Information-theoretic metric learning. In Proceedings of the Twenty-Fourth International Conference on Machine Learning, pages 209-216, 2007.
[4] J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov. Neighbourhood component analysis. In Advances in Neural Information Processing Systems 17, 2004.
[5] Yiming Ying, Peng Li. Distance Metric Learning with Eigenvalue Optimization. Journal of Machine Learning Research, 2012.
[6] S. Chopra, R. Hadsell, and Y. LeCun. Learning a similarity metric discriminatively with application to face verification. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005.
[7] S. C. H. Hoi, W. Liu, M. R. Lyu, and W.-Y. Ma. Learning distance metrics with contextual constraints for image retrieval. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006.
[8] T. Kato and N. Nagano. Metric learning for enzyme active-site search. Bioinformatics, 26, 2010.
[9] Sturm, J. F. (1999). Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optimization Methods and Software, 11-12. Special issue on Interior Point Methods.
[10] epxing/papers/old_papers/code_metric_online.tar.gz
[11] Alexandre d'Aspremont (2012). Tutorial: Algorithms for Large-Scale Semidefinite Programming.
[12] Olivier Devolder, François Glineur, Yurii Nesterov. First-order methods of smooth convex optimization with inexact oracle. Math. Program., Ser. A.
[13] d'Aspremont, A., Banerjee, O., and El Ghaoui, L. (2008). First-order methods for sparse covariance selection. SIAM Journal on Matrix Analysis and its Applications, 30(1):56-66.
[14] Steven J. Benson, Yinyu Ye, and Xiong Zhang. Solving large-scale sparse semidefinite programs for combinatorial optimization. SIAM J. Optim., 10(2) (19 pages).
[15] Alexandre d'Aspremont (2011). Subsampling algorithms for Semidefinite Programming. Technical report arXiv:0803.1990v6.
[16] Aharon Bar-Hillel, Tomer Hertz, Noam Shental, Daphna Weinshall. Learning a Mahalanobis Metric from Equivalence Constraints. Journal of Machine Learning Research 6 (2005).
[17] Lieven Vandenberghe, Stephen P. Boyd. Semidefinite Programming. SIAM Review, 38(1):49-95, March.
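Appendix

Newton method by Xing

Xing proved that the original optimization problem is equivalent to minimizing the function

$$g(A) = \sum_{(x_i,x_j)\in S} \|x_i - x_j\|_A^2 - c \log\Big( \sum_{(x_i,x_j)\in D} \|x_i - x_j\|_A \Big)$$

When A is a diagonal matrix, Xing points out that the problem can be solved cheaply by Newton's method. We derived the gradient and the Hessian matrix to assess the computational complexity. We define $a = [A_{11}, A_{22}, \dots, A_{nn}]^T$. In the formulas below, powers and quotients of the vector $x_i - x_j$ are taken elementwise.

$$\text{dist}(x_i, x_j) = \sqrt{\big((x_i - x_j)^2\big)^T a}$$

$$\nabla g = \frac{\partial g(A_{11}, A_{22}, \dots, A_{nn})}{\partial a}, \qquad \nabla^2 H = \frac{\partial^2 g(A_{11}, A_{22}, \dots, A_{nn})}{\partial a^2}$$

$$\text{distderive1}(x_i, x_j) = \frac{0.5\,(x_i - x_j)^2}{\text{dist}(x_i, x_j)}, \qquad \text{distderive2}(x_i, x_j) = \frac{-0.25\,(x_i - x_j)^4}{\text{dist}(x_i, x_j)^3}$$

$$\text{sumddist} = \sum_{(x_i,x_j)\in D} \text{dist}(x_i, x_j), \quad \text{sumdderive1} = \sum_{(x_i,x_j)\in D} \text{distderive1}(x_i, x_j), \quad \text{sumdderive2} = \sum_{(x_i,x_j)\in D} \text{distderive2}(x_i, x_j)$$

$$\nabla g = \sum_{(x_i,x_j)\in S} (x_i - x_j)^2 - c\, \frac{\text{sumdderive1}}{\text{sumddist}}$$

$$\nabla^2 H = -c \left[ \frac{\text{sumdderive2}}{\text{sumddist}} - \frac{\text{sumdderive1}^T\, \text{sumdderive1}}{\text{sumddist}^2} \right] \qquad (8)$$

The update step is

$$a = a - t\, [\nabla^2 H]^{-1} \nabla g, \quad t \text{ is the step size}$$

There are n parameters in a and they are separable, so the computational cost is around O(n). But when A is a full matrix, it has $n^2$ parameters and inverting the Hessian matrix requires $O(n^6)$ time, which is too expensive to be acceptable.

Projected gradient search by Xing

For a full matrix A, Xing proposed an iterative projected gradient search to solve the optimization problem efficiently. The problem is posed in the equivalent form below:

$$\max_{A} \; g(A) = \sum_{(x_i,x_j)\in D} \|x_i - x_j\|_A \quad \text{s.t.} \quad f(A) = \sum_{(x_i,x_j)\in S} \|x_i - x_j\|_A^2 \le 1,\; A \succeq 0 \qquad (9)$$

The algorithm takes a gradient step on $g(A)$ and then repeatedly projects A onto the sets $C_1 = \{A : \sum_{(x_i,x_j)\in S} \|x_i - x_j\|_A^2 \le 1\}$ and $C_2 = \{A : A \succeq 0\}$. The algorithm is shown below:

Iterate
    Iterate (projection)
        $A := P_{C_1}(A)$
        $A := P_{C_2}(A)$
    until A converges
    $A := A + t\,(\nabla_A g(A))_{\perp \nabla_A f(A)}$
until convergence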
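The projection of a matrix $A_0$ onto the set $C_1$ solves the optimization problem:

$$\min_{A} \|A - A_0\|_F^2 \quad \text{s.t.} \quad \sum_{(x_i,x_j)\in S} \|x_i - x_j\|_A^2 \le 1 \qquad (10)$$

Consider the dual problem. We denote the inner product of A and B by $\langle A, B \rangle = \mathrm{Tr}(A^T B)$ and define the matrix $X_s$ such that $\langle X_s, A \rangle = \sum_{(x_i,x_j)\in S} \|x_i - x_j\|_A^2$. The Lagrangian function is $L(A, u) = \|A - A_0\|_F^2 + u(\langle X_s, A \rangle - 1)$ with $u \ge 0$. Setting its gradient to zero gives

$$\nabla_A L(A, u) = 2(A - A_0) + u X_s = 0 \;\Rightarrow\; A = -0.5\,u X_s + A_0 \qquad (11)$$

$$g(u) = \min_A L(A, u) = \|0.5\,u X_s\|_F^2 + u\big(\langle X_s, A_0 - 0.5\,u X_s \rangle - 1\big) = -0.25\,u^2\, \mathrm{Tr}(X_s^T X_s) + u\big(\mathrm{Tr}(X_s^T A_0) - 1\big) \qquad (12)$$

The dual problem is $\max g(u)$, $u \ge 0$. Setting $\nabla g(u) = 0$, we get:

$$u = \frac{2\big(\mathrm{Tr}(X_s^T A_0) - 1\big)}{\mathrm{Tr}(X_s^T X_s)} \qquad (13)$$

Using the KKT conditions together with (11) and (13), for $\langle X_s, A_0 \rangle > 1$ we get:

$$A = \frac{1 - \langle X_s, A_0 \rangle}{\|X_s\|_F^2}\, X_s + A_0$$

If $\langle X_s, A_0 \rangle \le 1$, the constraint is inactive ($u = 0$) and $A = A_0$.

Consider the projection onto the set $C_2$:

$$A = U^T \Sigma U, \qquad \Sigma_+ = \max(0, \Sigma), \qquad A_+ = U^T \Sigma_+ U$$

Summary

From the derivation of the projection step, we can see that the projection onto the set $C_1$ has an analytical solution. $X_s$ can be pre-computed and stored, so the only cost is matrix multiplication and the projection step is cheap. The main cost of the projection onto $C_2$ is the matrix decomposition, which usually has $O(n^3)$ time complexity.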
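The two projections summarized above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not Xing's released implementation: the function names and the 2x2 example matrices are made up, and $C_1$ is treated as the half-space $\{A : \langle X_s, A \rangle \le 1\}$ in the Frobenius inner product.

```python
import numpy as np

def project_c1(A0, Xs):
    """Frobenius projection of A0 onto C1 = {A : <Xs, A> <= 1}."""
    violation = np.sum(Xs * A0) - 1.0      # <Xs, A0> - 1
    if violation <= 0.0:
        return A0.copy()                   # constraint inactive, u = 0
    return A0 - (violation / np.sum(Xs * Xs)) * Xs

def project_c2(A):
    """Projection onto the PSD cone by clipping negative eigenvalues.
    NumPy returns A = V diag(w) V^T, so V plays the role of U^T."""
    sym = 0.5 * (A + A.T)                  # guard against round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(sym)
    return (eigvecs * np.maximum(eigvals, 0.0)) @ eigvecs.T

# Hypothetical example matrices.
Xs = np.array([[2.0, 0.0], [0.0, 1.0]])
A0 = np.array([[2.0, 0.0], [0.0, -1.0]])

A1 = project_c1(A0, Xs)   # lands on the boundary <Xs, A> = 1
A2 = project_c2(A1)       # removes the negative eigenvalue
print(A1)
print(A2)
```

After the second step the first constraint may be violated again, which is why the algorithm alternates the two projections until A converges. The `eigh` call is the $O(n^3)$ step identified in the summary.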
More informationOverview Citation. ML Introduction. Overview Schedule. ML Intro Dataset. Introduction to Semi-Supervised Learning Review 10/4/2010
INFORMATICS SEMINAR SEPT. 27 & OCT. 4, 2010 Introduction to Semi-Supervised Learning Review 2 Overview Citation X. Zhu and A.B. Goldberg, Introduction to Semi- Supervised Learning, Morgan & Claypool Publishers,
More informationSDLS: a Matlab package for solving conic least-squares problems
SDLS: a Matlab package for solving conic least-squares problems Didier Henrion, Jérôme Malick To cite this version: Didier Henrion, Jérôme Malick. SDLS: a Matlab package for solving conic least-squares
More informationarxiv: v2 [cs.ds] 2 May 2018
arxiv:1610.05710v2 [cs.ds] 2 May 2018 Feasibility Based Large Margin Nearest Neighbor Metric Learning Babak Hosseini 1 and Barbara Hammer 1 CITEC centre of excellence, Bielefeld University Bielefeld, Germany
More informationConvex Optimization. Lijun Zhang Modification of
Convex Optimization Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Modification of http://stanford.edu/~boyd/cvxbook/bv_cvxslides.pdf Outline Introduction Convex Sets & Functions Convex Optimization
More informationComparison of Interior Point Filter Line Search Strategies for Constrained Optimization by Performance Profiles
INTERNATIONAL JOURNAL OF MATHEMATICS MODELS AND METHODS IN APPLIED SCIENCES Comparison of Interior Point Filter Line Search Strategies for Constrained Optimization by Performance Profiles M. Fernanda P.
More informationIntroduction to Machine Learning
Introduction to Machine Learning Maximum Margin Methods Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574
More informationThe Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem
Int. J. Advance Soft Compu. Appl, Vol. 9, No. 1, March 2017 ISSN 2074-8523 The Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem Loc Tran 1 and Linh Tran
More informationKernels + K-Means Introduction to Machine Learning. Matt Gormley Lecture 29 April 25, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Kernels + K-Means Matt Gormley Lecture 29 April 25, 2018 1 Reminders Homework 8:
More informationAll lecture slides will be available at CSC2515_Winter15.html
CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 9: Support Vector Machines All lecture slides will be available at http://www.cs.toronto.edu/~urtasun/courses/csc2515/ CSC2515_Winter15.html Many
More informationContents. I Basics 1. Copyright by SIAM. Unauthorized reproduction of this article is prohibited.
page v Preface xiii I Basics 1 1 Optimization Models 3 1.1 Introduction... 3 1.2 Optimization: An Informal Introduction... 4 1.3 Linear Equations... 7 1.4 Linear Optimization... 10 Exercises... 12 1.5
More information570 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 20, NO. 2, FEBRUARY 2011
570 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 20, NO. 2, FEBRUARY 2011 Contextual Object Localization With Multiple Kernel Nearest Neighbor Brian McFee, Student Member, IEEE, Carolina Galleguillos, Student
More informationINF 4300 Classification III Anne Solberg The agenda today:
INF 4300 Classification III Anne Solberg 28.10.15 The agenda today: More on estimating classifier accuracy Curse of dimensionality and simple feature selection knn-classification K-means clustering 28.10.15
More informationMetric Learning for Large Scale Image Classification:
Metric Learning for Large Scale Image Classification: Generalizing to New Classes at Near-Zero Cost Thomas Mensink 1,2 Jakob Verbeek 2 Florent Perronnin 1 Gabriela Csurka 1 1 TVPA - Xerox Research Centre
More informationResearch Interests Optimization:
Mitchell: Research interests 1 Research Interests Optimization: looking for the best solution from among a number of candidates. Prototypical optimization problem: min f(x) subject to g(x) 0 x X IR n Here,
More informationMathematical Themes in Economics, Machine Learning, and Bioinformatics
Western Kentucky University From the SelectedWorks of Matt Bogard 2010 Mathematical Themes in Economics, Machine Learning, and Bioinformatics Matt Bogard, Western Kentucky University Available at: https://works.bepress.com/matt_bogard/7/
More informationLearning Anisotropic RBF Kernels
Learning Anisotropic RBF Kernels Fabio Aiolli and Michele Donini University of Padova - Department of Mathematics Via Trieste, 63, 35121 Padova - Italy {aiolli,mdonini}@math.unipd.it Abstract. We present
More informationI How does the formulation (5) serve the purpose of the composite parameterization
Supplemental Material to Identifying Alzheimer s Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis I How does the formulation (5)
More informationParallel and Distributed Sparse Optimization Algorithms
Parallel and Distributed Sparse Optimization Algorithms Part I Ruoyu Li 1 1 Department of Computer Science and Engineering University of Texas at Arlington March 19, 2015 Ruoyu Li (UTA) Parallel and Distributed
More informationML Detection via SDP Relaxation
ML Detection via SDP Relaxation Junxiao Song and Daniel P. Palomar The Hong Kong University of Science and Technology (HKUST) ELEC 5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture
More informationDeep Learning for Computer Vision
Deep Learning for Computer Vision Spring 2018 http://vllab.ee.ntu.edu.tw/dlcv.html (primary) https://ceiba.ntu.edu.tw/1062dlcv (grade, etc.) FB: DLCV Spring 2018 Yu Chiang Frank Wang 王鈺強, Associate Professor
More informationHeterogeneous Multi-Metric Learning for Multi-Sensor Fusion
14th International Conference on Information Fusion Chicago, Illinois, USA, July 5-8, 2011 Heterogeneous Multi-Metric Learning for Multi-Sensor Fusion Haichao Zhang Beckman Institute University of Illinois
More informationConic Optimization via Operator Splitting and Homogeneous Self-Dual Embedding
Conic Optimization via Operator Splitting and Homogeneous Self-Dual Embedding B. O Donoghue E. Chu N. Parikh S. Boyd Convex Optimization and Beyond, Edinburgh, 11/6/2104 1 Outline Cone programming Homogeneous
More information10-701/15-781, Fall 2006, Final
-7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly
More informationAlternating Projections
Alternating Projections Stephen Boyd and Jon Dattorro EE392o, Stanford University Autumn, 2003 1 Alternating projection algorithm Alternating projections is a very simple algorithm for computing a point
More informationLogistic Regression
Logistic Regression ddebarr@uw.edu 2016-05-26 Agenda Model Specification Model Fitting Bayesian Logistic Regression Online Learning and Stochastic Optimization Generative versus Discriminative Classifiers
More informationAn Efficient Algorithm for Local Distance Metric Learning
An Efficient Algorithm for Local Distance Metric Learning Liu Yang and Rong Jin Michigan State University Dept. of Computer Science & Engineering East Lansing, MI 48824 {yangliu1, rongjin}@cse.msu.edu
More informationClustering CS 550: Machine Learning
Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf
More informationProgramming, numerics and optimization
Programming, numerics and optimization Lecture C-4: Constrained optimization Łukasz Jankowski ljank@ippt.pan.pl Institute of Fundamental Technological Research Room 4.32, Phone +22.8261281 ext. 428 June
More informationKeyword Extraction by KNN considering Similarity among Features
64 Int'l Conf. on Advances in Big Data Analytics ABDA'15 Keyword Extraction by KNN considering Similarity among Features Taeho Jo Department of Computer and Information Engineering, Inha University, Incheon,
More informationLinear Discriminant Functions: Gradient Descent and Perceptron Convergence
Linear Discriminant Functions: Gradient Descent and Perceptron Convergence The Two-Category Linearly Separable Case (5.4) Minimizing the Perceptron Criterion Function (5.5) Role of Linear Discriminant
More informationarxiv: v1 [stat.ml] 3 Jan 2012
Random Forests for Metric Learning with Implicit Pairwise Position Dependence arxiv:1201.0610v1 [stat.ml] 3 Jan 2012 Caiming Xiong, David Johnson, Ran Xu and Jason J. Corso Department of Computer Science
More informationImproving Image Segmentation Quality Via Graph Theory
International Symposium on Computers & Informatics (ISCI 05) Improving Image Segmentation Quality Via Graph Theory Xiangxiang Li, Songhao Zhu School of Automatic, Nanjing University of Post and Telecommunications,
More informationPerformance Evaluation of an Interior Point Filter Line Search Method for Constrained Optimization
6th WSEAS International Conference on SYSTEM SCIENCE and SIMULATION in ENGINEERING, Venice, Italy, November 21-23, 2007 18 Performance Evaluation of an Interior Point Filter Line Search Method for Constrained
More informationUnsupervised Learning
Networks for Pattern Recognition, 2014 Networks for Single Linkage K-Means Soft DBSCAN PCA Networks for Kohonen Maps Linear Vector Quantization Networks for Problems/Approaches in Machine Learning Supervised
More informationData Mining in Bioinformatics Day 1: Classification
Data Mining in Bioinformatics Day 1: Classification Karsten Borgwardt February 18 to March 1, 2013 Machine Learning & Computational Biology Research Group Max Planck Institute Tübingen and Eberhard Karls
More informationAn efficient algorithm for rank-1 sparse PCA
An efficient algorithm for rank- sparse PCA Yunlong He Georgia Institute of Technology School of Mathematics heyunlong@gatech.edu Renato Monteiro Georgia Institute of Technology School of Industrial &
More informationKernel Methods & Support Vector Machines
& Support Vector Machines & Support Vector Machines Arvind Visvanathan CSCE 970 Pattern Recognition 1 & Support Vector Machines Question? Draw a single line to separate two classes? 2 & Support Vector
More informationActive Sampling for Constrained Clustering
Paper: Active Sampling for Constrained Clustering Masayuki Okabe and Seiji Yamada Information and Media Center, Toyohashi University of Technology 1-1 Tempaku, Toyohashi, Aichi 441-8580, Japan E-mail:
More informationPerceptron as a graph
Neural Networks Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 10 th, 2007 2005-2007 Carlos Guestrin 1 Perceptron as a graph 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0-6 -4-2
More informationDiscriminative Clustering for Image Co-segmentation
Discriminative Clustering for Image Co-segmentation Armand Joulin Francis Bach Jean Ponce INRIA Ecole Normale Supérieure, Paris January 2010 Introduction Introduction Task: dividing simultaneously q images
More informationNon-Parametric Kernel Learning with Robust Pairwise Constraints
Noname manuscript No. (will be inserted by the editor) Non-Parametric Kernel Learning with Robust Pairwise Constraints Changyou Chen Junping Zhang Xuefang He Zhi-Hua Zhou Received: date / Accepted: date
More informationVisualizing pairwise similarity via semidefinite programming
Visualizing pairwise similarity via semidefinite programming Amir Globerson Computer Science and Artificial Intelligence Laboratory MIT Cambridge, MA 02139 gamir@csail.mit.edu Sam Roweis Department of
More informationMachine Learning. Nonparametric methods for Classification. Eric Xing , Fall Lecture 2, September 12, 2016
Machine Learning 10-701, Fall 2016 Nonparametric methods for Classification Eric Xing Lecture 2, September 12, 2016 Reading: 1 Classification Representing data: Hypothesis (classifier) 2 Clustering 3 Supervised
More informationIE598 Big Data Optimization Summary Nonconvex Optimization
IE598 Big Data Optimization Summary Nonconvex Optimization Instructor: Niao He April 16, 2018 1 This Course Big Data Optimization Explore modern optimization theories, algorithms, and big data applications
More informationSPECTRAL SPARSIFICATION IN SPECTRAL CLUSTERING
SPECTRAL SPARSIFICATION IN SPECTRAL CLUSTERING Alireza Chakeri, Hamidreza Farhidzadeh, Lawrence O. Hall Department of Computer Science and Engineering College of Engineering University of South Florida
More informationComputational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions
Computational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions Thomas Giraud Simon Chabot October 12, 2013 Contents 1 Discriminant analysis 3 1.1 Main idea................................
More informationAdvanced Topics In Machine Learning Project Report : Low Dimensional Embedding of a Pose Collection Fabian Prada
Advanced Topics In Machine Learning Project Report : Low Dimensional Embedding of a Pose Collection Fabian Prada 1 Introduction In this project we present an overview of (1) low dimensional embedding,
More informationFace2Face Comparing faces with applications Patrick Pérez. Inria, Rennes 2 Oct. 2014
Face2Face Comparing faces with applications Patrick Pérez Inria, Rennes 2 Oct. 2014 Outline Metric learning for face comparison Expandable parts model and occlusions Face sets comparison Identity-based
More information