Advanced Topics in Machine Learning Project Report: Low Dimensional Embedding of a Pose Collection

Fabian Prada

1 Introduction

In this project we present an overview of (1) low dimensional embedding, and (2) K-most dissimilar subset detection, for a set of samples for which only the matrix $M$ of pairwise dissimilarities is provided. For evaluation (and visual validation) our dissimilarity matrix is computed on a collection of shapes, but the analysis and techniques discussed can be applied to any set that admits a dissimilarity measure. Following the approach of [2], we compute for each shape $x_i$ its characteristic function $\chi_i$ and its Euclidean distance transform $s_i$. We define

$$M_{ij} = \min_{g \in G} \; \langle \chi_i, s_j \circ g \rangle + \langle s_i, \chi_j \circ g \rangle \qquad (1)$$

where $G$ is the set of rigid transformations in $\mathbb{R}^3$. Note that an ideal dissimilarity matrix $M$ should satisfy metric properties, i.e., $M_{ij} = d(x_i, x_j)$ for some metric $d : X \times X \to \mathbb{R}$; metric properties are necessary for $M$ to admit a perfect embedding. Our shape dissimilarity (1) satisfies the positivity, identity, and symmetry properties but not necessarily the triangle inequality (in our evaluation, fewer than 8% of all triplets violate it). Methods based on the $L^p$ distance of rotation-invariant feature vectors satisfy the triangle inequality but tend to have less discriminative power.

Given a dissimilarity matrix $M$ we derive two types of affinity matrices that will be used in the sections below. We define the product-affinity matrix $A$ as

$$A = -\frac{1}{2} J (M \odot M) J \qquad (2)$$

where $J = I - \frac{1}{N}\mathbf{1}\mathbf{1}^\top$ is a centering operator and $\odot$ denotes the entrywise product. We also define the kernel-affinity matrix $W$ as

$$W_{ij} = \exp\left\{-\frac{M_{ij}^2}{2\sigma^2}\right\} \qquad (3)$$

In contrast to dissimilarity matrices, affinity matrices have large values at entries associated with similar samples. Finally, we denote by $K$ the k-nearest-neighbor mask associated with $M$; the k-nearest-neighbor affinity matrix corresponds to $W \odot K$. Abusing notation, we keep denoting these affinity matrices by $W$.

2 Low Dimensional Embedding

Problem 1: Given the dissimilarity matrix $M$, compute $Y = \{y_1, \ldots, y_N\} \subset \mathbb{R}^d$ that minimizes

$$\arg\min_Y \sum_{i,j} \left( \|y_i - y_j\| - M_{ij} \right)^2 \qquad (4)$$

The problem above is non-convex and does not admit a closed-form solution.

2.1 Multidimensional Scaling (MDS)

MDS states an alternative problem in terms of inner-product affinities:

$$\arg\min_Y \sum_{i,j} \left( \langle y_i, y_j \rangle - A_{ij} \right)^2 = \arg\min_Y \left\| Y^\top Y - A \right\|_F^2 \qquad (5)$$

The closed-form solution to this problem is given, up to an arbitrary rotation $R$, by $Y = R\,\Sigma_d^{1/2} U_d^\top$, where $U_d$ and $\Sigma_d$ collect the top $d$ eigenvectors and eigenvalues of $A$; that is, $Y^\top Y$ is an optimal rank-$d$ approximation of $A$.
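For concreteness, a minimal NumPy sketch of this pipeline is given below: it builds the product-affinity matrix of equation (2), the classical MDS embedding of equation (5), and the kernel affinity of equation (3) with an optional k-nearest-neighbor mask. The dissimilarity matrix M is assumed given, and the function names are our own choices for illustration, not the exact code used in our experiments.

```python
# Minimal sketch of classical MDS (eqs. 2 and 5) and the kernel affinity (eq. 3).
# Assumes an N x N dissimilarity matrix M; function names are illustrative.
import numpy as np

def classical_mds(M, d=3):
    """Return an N x d embedding whose Gram matrix approximates A = -1/2 J (M o M) J."""
    N = M.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N          # centering operator J = I - (1/N) 1 1^T
    A = -0.5 * J @ (M * M) @ J                   # product-affinity matrix, eq. (2)
    evals, evecs = np.linalg.eigh(A)             # A is symmetric; eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:d]            # keep the d largest eigenvalues
    lam = np.clip(evals[idx], 0.0, None)         # A may be indefinite; drop the negative part
    return evecs[:, idx] * np.sqrt(lam)          # rows are the embedded points y_i

def kernel_affinity(M, sigma, k=None):
    """Kernel affinity W_ij = exp(-M_ij^2 / (2 sigma^2)), optionally masked to k nearest neighbors."""
    W = np.exp(-(M * M) / (2.0 * sigma ** 2))    # eq. (3)
    if k is not None:
        N = M.shape[0]
        K = np.zeros_like(W)
        nn = np.argsort(M, axis=1)[:, 1:k + 1]   # k nearest neighbors per row (column 0 is the sample itself, M_ii = 0)
        K[np.repeat(np.arange(N), k), nn.ravel()] = 1.0
        K = np.maximum(K, K.T)                   # symmetrize the mask
        W = W * K
    return W
```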

In equation (2) we formulated the relation between the dissimilarity matrix and the inner-product affinity matrix. Let us formalize this derivation.

Proposition 1. Given $x_1, \ldots, x_N \in \mathbb{R}^D$, let $x_i = U y_i + \mu$ be their principal component decomposition. Define $M_{ij} = \|x_i - x_j\|$ and $A_{ij} = \langle y_i, y_j \rangle$. Then

$$A = -\frac{1}{2} J (M \odot M) J$$

Proof. $(M \odot M)_{ij} = M_{ij}^2 = \|x_i - x_j\|^2 = \|y_i - y_j\|^2 = \|y_i\|^2 - 2\langle y_i, y_j \rangle + \|y_j\|^2$. For any matrix $T$, $(J T J)_{ij} = T_{ij} - \bar{T}_{i\cdot} - \bar{T}_{\cdot j} + \bar{T}$, where $\bar{T}_{i\cdot}$, $\bar{T}_{\cdot j}$, and $\bar{T}$ denote the row, column, and overall means. Since $\bar{y} = 0$, it can be checked that $\overline{(M \odot M)}_{i\cdot} = \|y_i\|^2 + \overline{\|y\|^2}$, $\overline{(M \odot M)}_{\cdot j} = \|y_j\|^2 + \overline{\|y\|^2}$, and $\overline{(M \odot M)} = 2\,\overline{\|y\|^2}$. Thus $-\frac{1}{2}\left(J (M \odot M) J\right)_{ij} = \langle y_i, y_j \rangle$.

From the previous proposition we obtain the next corollary.

Corollary 2. If $M$ is the dissimilarity matrix of points in a $d$-dimensional affine linear space, and $A = -\frac{1}{2} J (M \odot M) J$, then MDS provides a perfect embedding (i.e., energy (4) is zero).

2.2 Laplacian Eigenmaps (LE)

LE minimizes an energy that also admits a closed-form solution:

$$\arg\min_Y \sum_{i,j} w_{ij} \| y_i - y_j \|^2 \quad \text{s.t.} \quad Y D \mathbf{1} = 0 \ \text{ and } \ Y D Y^\top = I \qquad (6)$$

The embedding in this case is given by the second to $(d+1)$-st eigenvectors of the generalized eigenvalue problem $L Y^\top = \lambda D Y^\top$, where $L = D - W$ and $D_{ii} = \sum_j w_{ij}$. Large values of $w_{ij}$ enforce proximity between $y_i$ and $y_j$, while the constraints avoid a trivial solution. In fact, the following condition holds.

Proposition 3. Let $\rho_i = D_{ii} / \sum_i D_{ii}$, and let $\{y_i\}_i$ be the embedding positions. Then the weighted point distribution $(y_i, \rho_i)_i$ has zero mean and covariance $\frac{1}{\sum_i D_{ii}} I_d$.

For our evaluation we set $w_{ij} = \exp\{-M_{ij}^2 / 2\sigma^2\}$, where $\sigma^2$ is set to the mean of the squared dissimilarities $M_{ij}^2$. We evaluate LE with both normalized and unnormalized constraints and with different k-nearest-neighbor masks.
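A corresponding sketch of the LE embedding, using SciPy's symmetric generalized eigensolver, is shown below. It implements the D-normalized constraint of equation (6); the affinity W (possibly masked to k nearest neighbors) is assumed given, and the function name is again our own.

```python
# Minimal sketch of Laplacian Eigenmaps (eq. 6) from a symmetric affinity matrix W.
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(W, d=3):
    """Return an N x d embedding from the 2nd..(d+1)-st generalized eigenvectors of L v = lambda D v."""
    Dvec = W.sum(axis=1)          # degrees D_ii = sum_j w_ij (assumed strictly positive)
    D = np.diag(Dvec)
    L = D - W                     # unnormalized graph Laplacian
    evals, evecs = eigh(L, D)     # generalized symmetric eigenproblem, eigenvalues ascending
    return evecs[:, 1:d + 1]      # skip the constant eigenvector (eigenvalue 0)
```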

2.3 Low Dimensional Embedding Evaluation

All the tests in this section were done on a sequence of 2283 frames captured in a continuous performance.

Test 1: MDS Approximation Ratio

In this experiment we plot the approximation ratio $1 - \frac{\|Y^\top Y - A\|_F^2}{\|A\|_F^2}$ as a function of the embedding dimension $d$. For $d = 3$ we get an approximation ratio of 0.72. Since $A$ is not positive definite it cannot be represented exactly as $Y^\top Y$; however, for this example the best achievable approximation ratio is 0.9809.

Figure 1: Approximation Ratio.

Test 2: Scaled Embedding Error

Given an embedding $Y = \{y_1, \ldots, y_N\}$ we define the optimal scaling $\lambda$ as

$$\arg\min_\lambda \sum_{i,j} \left( \lambda \|y_i - y_j\| - M_{ij} \right)^2 \qquad (7)$$

The solution to the equation above is given by $\lambda = \frac{\sum_{i,j} \|y_i - y_j\|\, M_{ij}}{\sum_{i,j} \|y_i - y_j\|^2}$. The scaled embedding error is the value of the energy in (7) at the optimal scaling.
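A short sketch of the optimal scaling and the resulting scaled embedding error follows (NumPy/SciPy; the helper name is ours, and the embedding Y and dissimilarity matrix M are assumed given):

```python
# Minimal sketch of the optimal scaling (eq. 7) and the scaled embedding error.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def scaled_embedding_error(Y, M):
    """Y: N x d embedding (rows are points y_i); M: N x N dissimilarity matrix."""
    E = squareform(pdist(Y))                 # pairwise Euclidean distances ||y_i - y_j||
    lam = (E * M).sum() / (E * E).sum()      # closed-form least-squares scaling
    return ((lam * E - M) ** 2).sum(), lam
```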

The following are the results of this test for the 3-dimensional embedding on the evaluation sequence:

Zero Emb.   Rand. Emb.   MDS        LE         LE-10NN
5.11e+14    8.78e+13     4.46e+13   4.80e+13   2.37e+14

As expected, MDS provided the best embedding error (by Corollary 2, MDS attains zero error in equation (7) when the dissimilarity matrix is the Euclidean distance of samples in a d-dimensional subspace).

Test 3: Embedding Qualitative Results

In Figure 2 we observe the embeddings provided by LE, LE-10NN and MDS. The embedding for LE-10NN looks like a self-intersecting curve. This is particularly nice for a pose collection obtained from a continuous motion, since consecutive frames are mapped to consecutive points on the curve. However, since the embedding only considers local similarity, very different poses can be (and indeed are) mapped to close points. On the other hand, LE and MDS provided surprisingly similar embeddings. Both embeddings do a satisfactory job of keeping similar poses close and separating dissimilar ones.

Figure 2: 3D Embedding Results.

3 K-Most Dissimilar Subset

The second problem we consider is identifying a subset of samples that are the most dissimilar among themselves. We consider the following formulation of this problem:

Problem 2: Given the dissimilarity matrix $M$, identify a subset of size $k$ that maximizes the minimal pairwise dissimilarity:

$$\max_{S \subset [N],\, |S| = k} \; \min_{i,j \in S} M_{ij} \qquad (8)$$

Observe that this problem is combinatorial, and a brute-force solution would require the evaluation of $\binom{N}{k}$ subsets. This is prohibitively expensive for large values of $N$ and $k$. Rather than considering the entire set of $N$ samples, we propose to consider only a small subset of samples that makes the combinatorial search tractable. The set we consider is the convex hull of a low-dimensional embedding for $d = 2, 3$. For $d = 2$ and $d = 3$ the convex hull of a set of $N$ samples can be computed in $O(N \log h)$, where $h$ is the size of the output [1]. For larger dimensions the complexity is bounded by $O(N^{\lfloor d/2 \rfloor})$. In Figure 3 we show the percentage of samples ($N = 2283$) that belong to the convex hull as a function of the embedding dimension. For $d = 2, 3, 4$ the number of points in the hull for MDS and LE is roughly 1%, 3% and 10% of the input size, respectively.

Figure 3: Convex Hull. Left: percentage of vertices in the hull as a function of $d$. Right: convex hull of the 3D embedding for MDS (middle) and LE (right).
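Extracting the hull vertices of an embedding is direct with a Qhull wrapper such as SciPy's, which implements the Quickhull algorithm of [1]. A minimal sketch, reusing the illustrative embedding function from Section 2 (the helper name is ours):

```python
# Minimal sketch: hull vertices of a d-dimensional embedding via Quickhull [1].
import numpy as np
from scipy.spatial import ConvexHull

def hull_vertex_indices(Y):
    """Y: N x d embedding with d >= 2; returns the indices of the samples on the convex hull."""
    return np.asarray(ConvexHull(Y).vertices)

# Example: percentage of samples on the hull of a 3D MDS embedding.
# Y = classical_mds(M, d=3)
# print(100.0 * len(hull_vertex_indices(Y)) / Y.shape[0])
```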

Test 4: Most Dissimilar Pair and Triplet

For our evaluation set, the most dissimilar pair is (963, 1678) and the most dissimilar triplet is (645, 1679, 2283). The convex hulls of the 2-dimensional embeddings provided by LE and MDS (with 19 and 21 samples, respectively) contain the pair (963, 1678). These convex hulls do not contain the triplet (645, 1679, 2283), but a very close one: (642, 1679, 2283) (see Figure 3). Since the d-dimensional embedding provided by LE and MDS is a projection of the (d+1)-dimensional embedding (a projection onto the first d coordinates), all the samples belonging to the hull of the d-dimensional embedding are still in the hull of the higher-dimensional one. Thus (963, 1678) is kept in the hull of the 3-dimensional embedding for both methods. Moreover, the most dissimilar triplet (645, 1679, 2283) is itself contained in the hull of the 3D embedding for both methods.

Figure 4: Most dissimilar pair (top) and triplet (bottom). Both are within the vertices of the convex hull of the 2D and 3D embeddings provided by LE (depicted in the images) and MDS.
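Test 5 below compares a greedy selection against exhaustive search restricted to the hull vertices. Both strategies are short to implement; minimal sketches are given here, assuming NumPy, the dissimilarity matrix M, and the hull indices from the previous snippet (function names are ours):

```python
# Minimal sketches of the two selection strategies compared in Test 5.
from itertools import combinations
import numpy as np

def most_dissimilar_on_hull(M, hull_idx, k):
    """Exhaustive search of eq. (8) over all k-subsets of the hull vertices."""
    best, best_score = None, -np.inf
    for S in combinations(hull_idx, k):
        score = min(M[i, j] for i, j in combinations(S, 2))  # minimal pairwise dissimilarity
        if score > best_score:
            best, best_score = S, score
    return best, best_score

def greedy_most_dissimilar(M, k):
    """Greedy baseline: start from the most dissimilar pair, then repeatedly add the sample
    maximizing the minimum dissimilarity to the already selected ones (O(N k^2) overall)."""
    i, j = np.unravel_index(np.argmax(M), M.shape)
    selected = [int(i), int(j)]
    while len(selected) < k:
        dist_to_sel = M[:, selected].min(axis=1)   # min dissimilarity of each sample to the selection
        dist_to_sel[selected] = -np.inf            # never re-pick an already selected sample
        selected.append(int(np.argmax(dist_to_sel)))
    return selected
```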

Test 5: Greedy K-Most Dissimilar

A natural approach to generating $k$ dissimilar poses is a greedy one: start by finding the most dissimilar pair, and at each iteration add to the set the element that maximizes the minimum distance to the already selected poses. It is easy to check that such a greedy algorithm has complexity $O(N k^2)$. If we use the greedy K-most dissimilar algorithm to identify the most dissimilar triplet, we get the result (963, 1678, 803), which has a dissimilarity score (i.e., the minimum among its pairwise dissimilarities) of 2.18e+08. Instead, using exhaustive search over the convex hull of the 2-dimensional embedding of LE (which evaluates $\binom{19}{3}$ triplets), we get the triplet (642, 1680, 2283), which has a dissimilarity score of 2.76e+08.

Conclusions

MDS and LE produced satisfactory embeddings of the pose collection: similar poses were mapped to close points in the embedding, while the most dissimilar poses were well separated. Although LE and MDS minimize significantly different energy functions, there was a surprisingly strong similarity between the embedded point distributions and the samples belonging to the convex hull. This was an unexpected result, since there is no direct relation between the LE energy (6) and the optimal embedding energy (4). Further experimental and theoretical analysis is required to understand the extent of this relation.

In this project we proposed a characterization of the most dissimilar subset. Our test case reinforced the hypothesis that motivated this project: the most dissimilar poses (or a very good approximation to them) can be obtained directly from the convex hull of a low-dimensional embedding. Additional evaluation and analysis is required to understand under what conditions this is the case. Since the convex hull can be computed efficiently ($O(N \log h)$ for $d = 2, 3$) and outputs a small subset of vertices (about 1% for $d = 2$ in our test case), it is a very useful tool for obtaining a set of vertices over which the most dissimilar subset computation is tractable.

References

[1] C. Bradford Barber, David P. Dobkin, and Hannu Huhdanpaa. The quickhull algorithm for convex hulls. ACM Trans. Math. Softw., 22(4):469–483, December 1996.

[2] Thomas Funkhouser, Michael Kazhdan, Philip Shilane, Patrick Min, William Kiefer, Ayellet Tal, Szymon Rusinkiewicz, and David Dobkin. Modeling by example. ACM Trans. Graph., 23(3), 2004.