Large-Scale Face Manifold Learning


Large-Scale Face Manifold Learning
Sanjiv Kumar, Google Research, New York, NY
Joint work with A. Talwalkar, H. Rowley and M. Mohri

Face Manifold Learning
- 50 x 50 pixel faces (like 50 x 50 pixel random images) live in R^2500
- The space of face images is significantly smaller than the 256^2500 possible images
- Want to recover the underlying (possibly nonlinear) space: dimensionality reduction

Dimensionality Reduction
Linear techniques: PCA, classical MDS
- Assume data lies in a subspace; find directions of maximum variance
Nonlinear techniques: manifold learning methods
- LLE [Roweis & Saul 00], ISOMAP [Tenenbaum et al. 00], Laplacian Eigenmaps [Belkin & Niyogi 01]
- Assume local linearity of data; need densely sampled data as input
Bottleneck: computational complexity O(n^3)!

Outline
- Manifold learning: ISOMAP
- Approximate spectral decomposition: Nystrom and column-sampling approximations
- Large-scale manifold learning: 18M face images from the web (largest previous study: ~270K points)
- People Hopper: a social application on Orkut

ISOMAP [Tenenbaum et al., 00]
Find the low-dimensional representation that best preserves geodesic distances between points.

ISOMAP [Tenenbaum et al., 00]
Find the low-dimensional representation whose output coordinates best preserve the geodesic distances between points. Recovers the true manifold asymptotically!

ISOMAP [Tenenbaum et al., 00]
Given n input images:
- Find t nearest neighbors for each image: O(n^2)
- Find the shortest-path distance Delta_ij for every pair (i, j): O(n^2 log n)
- Construct the n x n matrix G with entries the centered Delta_ij^2 (for 18M faces, G is an 18M x 18M dense matrix)
- Optimal k reduced dims: U_k Sigma_k^(1/2) from the eigenvectors/eigenvalues of G: O(n^3)!
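The four steps above can be sketched in a few lines of NumPy/SciPy. This is an illustrative toy implementation (dense, single-machine), not the talk's distributed pipeline, and the spiral test data is made up:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

def isomap(X, t=5, k=2):
    """Toy Isomap: t-NN graph -> geodesics -> centered Gram -> top-k eigs."""
    n = X.shape[0]
    # Step 1: t-nearest-neighbor graph with Euclidean edge weights, O(n^2)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    A = np.zeros_like(D)
    for i in range(n):
        nbrs = np.argsort(D[i])[1:t + 1]        # position 0 is the point itself
        A[i, nbrs] = D[i, nbrs]
    A = np.maximum(A, A.T)                      # symmetrize the graph
    # Step 2: shortest-path (geodesic) distances Delta_ij, O(n^2 log n)
    Delta = shortest_path(csr_matrix(A), method='D', directed=False)
    # Step 3: n x n matrix G of centered squared geodesic distances
    H = np.eye(n) - np.ones((n, n)) / n         # centering matrix
    G = -0.5 * H @ (Delta ** 2) @ H
    # Step 4: optimal k reduced dims U_k Sigma_k^(1/2) via O(n^3) eigendecomposition
    evals, evecs = np.linalg.eigh(G)
    top = np.argsort(evals)[::-1][:k]
    return evecs[:, top] * np.sqrt(np.maximum(evals[top], 0.0))

# Example: points along a 3D spiral; Isomap should "unroll" it into 2D
s = np.linspace(0, 4 * np.pi, 80)
X = np.c_[s * np.cos(s), s * np.sin(s), 0.1 * s]
Y = isomap(X, t=4, k=2)
```

The final O(n^3) eigendecomposition is the bottleneck the rest of the talk attacks with sampling.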

Spectral Decomposition
Need an eigendecomposition of the symmetric positive semi-definite matrix G: O(n^3). For n = 18M, G is ~1300 TB, i.e. ~100,000 machines with 12 GB RAM each.
Iterative methods (Jacobi, Arnoldi, Hebbian) [Golub & Van Loan, 83][Gorrell, 06]:
- Need matrix-vector products and several passes over the data
- Not suitable for large dense matrices
Sampling-based methods:
- Column-sampling approximation [Frieze et al., 98]
- Nystrom approximation [Williams & Seeger, 00]
- Relationship and comparative performance?

Approximate Spectral Decomposition
Sample l columns of G uniformly at random without replacement to form the n x l matrix C; W is the l x l block of G at the sampled rows and columns.
- Column-sampling approximation: SVD of C [Frieze et al., 98]
- Nystrom approximation: SVD of W [Williams & Seeger, 00][Drineas & Mahoney, 05]

Column-Sampling Approximation
SVD of the [n x l] matrix C: O(nl^2)! (the [l x l] step alone is O(l^3)!)
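A minimal sketch of column sampling on a small synthetic PSD matrix: take l random columns of G, compute the O(nl^2) SVD of C, and use C's left singular vectors as approximate eigenvectors with the singular values rescaled by sqrt(n/l). The matrix sizes here are made up for illustration:

```python
import numpy as np

def column_sampling_eig(G, l, rng):
    """Column-sampling approximation of the top eigenpairs of an n x n PSD G."""
    n = G.shape[0]
    idx = rng.choice(n, size=l, replace=False)
    C = G[:, idx]                                  # n x l sampled columns
    U, s, _ = np.linalg.svd(C, full_matrices=False)  # O(n l^2), not O(n^3)
    # Left singular vectors of C approximate eigenvectors of G;
    # singular values are rescaled to approximate G's eigenvalues
    return U, np.sqrt(n / l) * s

# Tiny example on a random rank-20 PSD matrix
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 20))
G = A @ A.T                                        # 100 x 100, PSD
U, lam = column_sampling_eig(G, l=40, rng=rng)
```

Note that U is orthonormal by construction (it comes from an SVD), in contrast to the Nystrom eigenvectors discussed next.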

Nystrom Approximation
- Sample l columns of G to form the n x l matrix C; W is the l x l intersection block
- Eigendecomposition of W: O(l^3)!
- Extend W's eigenvectors to all n points through C
- The resulting approximate eigenvectors are not orthonormal!
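A sketch of the Nystrom scheme, using the conventional scaling (eigenvalues of G approximated by (n/l) times those of W, eigenvectors by sqrt(l/n) C U_W Sigma_W^-1); the test matrix is synthetic:

```python
import numpy as np

def nystrom_eig(G, l, rng):
    """Toy Nystrom approximation: eigendecompose only the l x l block W,
    then extend its eigenvectors to all n points through C."""
    n = G.shape[0]
    idx = rng.choice(n, size=l, replace=False)
    C = G[:, idx]                          # n x l sampled columns
    W = C[idx, :]                          # l x l intersection block
    lam_w, U_w = np.linalg.eigh(W)         # O(l^3) instead of O(n^3)
    keep = lam_w > 1e-10 * lam_w.max()     # drop numerically null directions
    lam_w, U_w = lam_w[keep], U_w[:, keep]
    U = np.sqrt(l / n) * C @ (U_w / lam_w)  # Nystrom extension to all n points
    lam = (n / l) * lam_w
    return U, lam

rng = np.random.default_rng(2)
A = rng.standard_normal((100, 10))
G = A @ A.T                                # rank-10 PSD test matrix
U, lam = nystrom_eig(G, l=30, rng=rng)
# With l >= rank(G) the spectral reconstruction is (numerically) exact here,
# but U is NOT orthonormal in general:
gram = U.T @ U
```

The non-orthonormality of U is exactly the issue the "Orthogonalized Nystrom" slide below revisits.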

Nystrom vs. Column-Sampling: Experimental Comparison
A random set of 7K face images; compare eigenvalues, eigenvectors, and low-rank approximations. [Kumar, Mohri & Talwalkar, ICML 09]

Eigenvalues Comparison (figure: % deviation from exact)

Eigenvectors Comparison (figure: principal angle with exact)

Low-Rank Approximations
Nystrom gives better reconstruction than column-sampling!

Orthogonalized Nystrom
Orthogonalized Nystrom gives worse reconstruction than plain Nystrom!

Low-Rank Approximations: Matrix Projection
G_col = C (C^T C)^-1 C^T G
G_nys = C ((n/l) W^2)^-1 C^T G
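The two projection formulas can be sanity-checked on a small synthetic PSD matrix. This sketch assumes the formulas as reconstructed above (the Nystrom scaling follows from its eigenvector definition), uses pseudo-inverses for numerical safety, and picks made-up sizes n = 120, l = 30:

```python
import numpy as np

rng = np.random.default_rng(3)
n, l = 120, 30
A = rng.standard_normal((n, 15))
G = A @ A.T                                  # synthetic PSD matrix, rank 15
idx = rng.choice(n, size=l, replace=False)
C = G[:, idx]                                # n x l sampled columns
W = C[idx, :]                                # l x l intersection block

# Column-sampling: orthogonal projection of G onto span(C)
G_col = C @ np.linalg.pinv(C.T @ C) @ C.T @ G
# Nystrom: projection through its (non-orthonormal) approximate eigenvectors
G_nys = C @ np.linalg.pinv((n / l) * (W @ W)) @ C.T @ G

err_col = np.linalg.norm(G - G_col) / np.linalg.norm(G)
err_nys = np.linalg.norm(G - G_nys) / np.linalg.norm(G)
```

Because the column-sampling version is a true orthogonal projection while the Nystrom version uses non-orthonormal vectors, column-sampling wins in this regime, consistent with the next slide.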

Low-Rank Approximations: Matrix Projection
Column-sampling gives better reconstruction than Nystrom! Theoretical guarantees in special cases [Kumar et al., ICML 09].

How many columns are needed?
(figure: columns needed to get 75% relative accuracy)
Sampling methods:
- Theoretical analysis of uniform sampling
- Adaptive sampling methods
- Ensemble sampling methods
[Kumar et al., ICML 09][Kumar et al., AISTATS 09][Kumar et al., NIPS 09][Deshpande et al., FOCS 06]

So Far
- Manifold learning: ISOMAP
- Approximate spectral decomposition: Nystrom and column-sampling approximations
- Large-scale face manifold learning: 18M face images from the web
- People Hopper: a social application on Orkut

Large-Scale Face Manifold Learning
Constructing the Web dataset:
- Extracted 18M faces from 2.5B internet images (~15 hours on 500 machines)
- Faces normalized to zero mean and unit variance
Graph construction [Talwalkar, Kumar & Rowley, CVPR 08]:
- Exact neighbor search: ~3 months on 500 machines
- Approximate nearest neighbors with spill trees [Liu et al., 04]: 5 NN, ~2 days
- New hashing-based kNN search methods: less than 5 hours! [CVPR 10][ICML 10][ICML 11]

Neighborhood Graph Construction
- Connect each node (face) with its neighbors
- Is the graph connected? Depth-first search finds the largest connected component in 10 minutes on a single machine
- The size of the largest component depends on the number of nearest neighbors (t)

Samples from Connected Components (figures: faces from the largest component and from smaller components)

Graph Manipulation
Approximating geodesics: shortest paths between pairs of face images. Computing all pairs is infeasible: O(n^2 log n)!
Key idea: the sampling-based decomposition needs only a few (l) columns of G, so we only require shortest paths between l nodes and all other nodes: 1 hour on 500 machines (l = 10K).
Computing embeddings (k = 100):
- Nystrom: 1.5 hours, 500 machines
- Column-sampling: 6 hours, 500 machines
- Projections: 15 mins, 500 machines
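The key idea above (compute only the l landmark columns of the geodesic matrix) maps directly onto SciPy's Dijkstra with a restricted source set. The graph here is a small synthetic stand-in for the face neighborhood graph:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

rng = np.random.default_rng(4)
n, l = 200, 10
# Synthetic connected graph: a path through all nodes plus random shortcut edges
rows = list(range(n - 1)) + list(rng.integers(0, n, size=100))
cols = list(range(1, n)) + list(rng.integers(0, n, size=100))
w = rng.uniform(0.1, 1.0, size=len(rows))
graph = csr_matrix((w, (rows, cols)), shape=(n, n))

landmarks = rng.choice(n, size=l, replace=False)
# Shortest paths from the l landmark nodes to ALL nodes: l Dijkstra runs,
# i.e. only the l columns of the geodesic matrix the sampling methods need,
# instead of all n^2 pairs
D = dijkstra(graph, directed=False, indices=landmarks)   # shape (l, n)
```

At the talk's scale this is what turns an infeasible all-pairs computation into an hour on 500 machines.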

18M-Manifold in 2D (figure: Nystrom Isomap embedding)

Shortest Paths on Manifold
18M samples are not enough!

Summary
- Large-scale nonlinear dimensionality reduction using manifold learning on 18M face images
- Fast approximate SVD based on sampling methods
Open questions:
- Does a manifold really exist, or may the data form clusters in low-dimensional subspaces?
- How much data is really enough?

People Hopper
A fun social application on Orkut; the face manifold is constructed from the Orkut database.
- Extracted 13M faces from about 146M profile images (~3 days on 50 machines)
- Color face image (40 x 48 pixels) -> 5760-dim vector
- Faces normalized to zero mean and unit variance in intensity space
- Shortest-path search using bidirectional Dijkstra
- Users can opt out; daily incremental graph update

People Hopper Interface (screenshots)

From the Blogs (screenshots)

CMU-PIE Dataset
- 68 people, 13 poses, 43 illuminations, 4 expressions
- 35,247 faces found by a face detector
- Classification and clustering on poses

Clustering
- K-means clustering after transformation to k = 100 dimensions; the number of clusters is fixed to the number of classes
- Two metrics: purity (points within a cluster come from the same class) and accuracy (points from a class form a single cluster)
Note: the matrix G is not guaranteed to be positive semi-definite in Isomap!
- Nystrom: EVD of W (can ignore negative eigenvalues)
- Column-sampling: SVD of C (signs are lost)!
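The slide does not spell out the purity formula, so this sketch uses the standard definition (fraction of points assigned to the majority true class of their cluster), which matches the description above:

```python
import numpy as np

def purity(labels_true, labels_pred):
    """Cluster purity: for each cluster, count the points belonging to its
    most common true class; return the total as a fraction of all points."""
    total = 0
    for c in np.unique(labels_pred):
        members = labels_true[labels_pred == c]
        total += np.bincount(members).max()   # size of the majority class
    return total / len(labels_true)

# Toy check: two true classes; one predicted cluster mixes them
y_true = np.array([0, 0, 0, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 0])
p = purity(y_true, y_pred)   # 5 of 6 points sit in their cluster's majority class
```

Purity rewards homogeneous clusters; the second metric (accuracy) instead penalizes splitting one class across several clusters.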

Optimal 2D Embeddings (figures)

Laplacian Eigenmaps [Belkin & Niyogi, 01]
Minimize weighted distances between neighbors:
- Find t nearest neighbors for each image: O(n^2)
- Compute the weight matrix W, e.g. heat-kernel weights W_ij = exp(-||x_i - x_j||^2 / sigma^2) for neighbor pairs, 0 otherwise
- Compute the normalized Laplacian G = D^(-1/2) (D - W) D^(-1/2), where D_ii = sum_j W_ij
- Optimal k reduced dims U_k: bottom eigenvectors of G, O(n^3)
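The Laplacian Eigenmaps steps above admit the same kind of toy sketch as Isomap. Heat-kernel weights and the normalized Laplacian follow the standard Belkin-Niyogi formulation; the random test data is made up:

```python
import numpy as np

def laplacian_eigenmaps(X, t=5, k=2, sigma=1.0):
    """Toy Laplacian Eigenmaps: t-NN heat-kernel weights, normalized
    Laplacian, then the bottom non-trivial eigenvectors."""
    n = X.shape[0]
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D2[i])[1:t + 1]                  # skip self
        W[i, nbrs] = np.exp(-D2[i, nbrs] / sigma ** 2)     # heat-kernel weights
    W = np.maximum(W, W.T)                                 # symmetrize
    d = W.sum(axis=1)
    # Normalized Laplacian G = D^(-1/2) (D - W) D^(-1/2)
    Dinv = np.diag(1.0 / np.sqrt(d))
    G = np.eye(n) - Dinv @ W @ Dinv
    evals, evecs = np.linalg.eigh(G)                       # O(n^3) bottleneck
    # Skip the trivial bottom eigenvector, keep the next k
    return evecs[:, 1:k + 1]

rng = np.random.default_rng(5)
X = rng.standard_normal((80, 3))
Y = laplacian_eigenmaps(X, t=6, k=2)
```

As with Isomap, the O(n^3) eigendecomposition is the step the sampling approximations would replace at scale.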

Different Sampling Procedures (figures)