CSC 411: Lecture 14: Principal Components Analysis & Autoencoders

Raquel Urtasun & Rich Zemel, University of Toronto
Nov 4, 2015

Today
- Dimensionality Reduction
- PCA
- Autoencoders

Mixture models and Distributed Representations
- One problem with mixture models: each observation is assumed to come from one of K prototypes
- The constraint that only one prototype is active (responsibilities sum to one) limits representational power
- Alternative: a distributed representation, with several latent variables relevant to each observation
- These can be several binary/discrete variables, or continuous variables

Example: continuous underlying variables
- What are the intrinsic latent dimensions in these two datasets?
- How can we find these dimensions from the data?

Principal Components Analysis
- PCA: the most popular instance of the second main class of unsupervised learning methods, projection methods, a.k.a. dimensionality-reduction methods
- Aim: find a small number of directions in input space that explain the variation in the input data; re-represent the data by projecting it along those directions
- Important assumption: variation contains information
- The data is assumed to be continuous: linear relationship between the data and the learned representation

PCA: Common tool
- Handles high-dimensional data: if the data has thousands of dimensions, it can be difficult for a classifier to deal with
- Such data can often be described by a much lower-dimensional representation
- Useful for:
  - Visualization
  - Preprocessing
  - Modeling: a prior for new data
  - Compression

PCA: Intuition
- Assume we start with N data vectors, of dimensionality D
- Aim to reduce dimensionality: linearly project (multiply by a matrix) into a much lower-dimensional space, M << D
- Search for orthogonal directions in space with the highest variance, and project the data onto this subspace
- The structure of the data vectors is encoded in the sample covariance

Finding principal components
- To find the principal component directions, we center the data (subtract the sample mean from each variable)
- Calculate the empirical covariance matrix
  $$C = \frac{1}{N} \sum_{n=1}^{N} (\mathbf{x}^{(n)} - \bar{\mathbf{x}})(\mathbf{x}^{(n)} - \bar{\mathbf{x}})^T$$
  with $\bar{\mathbf{x}}$ the mean. What is the dimensionality of $\bar{\mathbf{x}}$?
- Find the M eigenvectors with the largest eigenvalues of C: these are the principal components
- Assemble these eigenvectors into a D × M matrix U
- We can now express D-dimensional vectors $\mathbf{x}$ by projecting them to M-dimensional $\mathbf{z}$: $\mathbf{z} = U^T \mathbf{x}$
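A minimal NumPy sketch of these steps, assuming the data are stored as the rows of an N × D array (the function name and shapes are illustrative choices, not from the lecture):

```python
import numpy as np

def pca_components(X, M):
    """Top-M principal components of X (shape N x D) and the sample mean."""
    x_bar = X.mean(axis=0)                 # sample mean, shape (D,) -- same dimensionality as x
    Xc = X - x_bar                         # centered data, shape (N, D)
    C = Xc.T @ Xc / X.shape[0]             # empirical covariance, shape (D, D)
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh since C is symmetric; eigenvalues ascending
    top = np.argsort(eigvals)[::-1][:M]    # indices of the M largest eigenvalues
    U = eigvecs[:, top]                    # D x M matrix of principal components
    return U, x_bar

# Project to the M-dimensional representation. The slide writes z = U^T x;
# here the data are centered first, consistent with the covariance computation:
# U, x_bar = pca_components(X, M=3); Z = (X - x_bar) @ U
```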

Standard PCA
Algorithm: to find M components underlying D-dimensional data
1. Select the top M eigenvectors of C (the data covariance matrix):
   $$C = \frac{1}{N} \sum_{n=1}^{N} (\mathbf{x}^{(n)} - \bar{\mathbf{x}})(\mathbf{x}^{(n)} - \bar{\mathbf{x}})^T = U \Sigma U^T \approx U_{1:M} \Sigma_{1:M} U_{1:M}^T$$
   where U is orthogonal, its columns are unit-length eigenvectors ($U^T U = U U^T = I$), and $\Sigma$ is the matrix with the eigenvalues on its diagonal, each equal to the variance in the direction of its eigenvector
2. Project each input vector $\mathbf{x}$ into this subspace, e.g.
   $$z_j = \mathbf{u}_j^T \mathbf{x}; \qquad \mathbf{z} = U_{1:M}^T \mathbf{x}$$
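In practice the same components are often computed from an SVD of the centered data rather than by forming C explicitly; a sketch of that equivalent route (not covered on the slide):

```python
import numpy as np

def pca_svd(X, M):
    """Top-M principal components via SVD of the centered data matrix."""
    x_bar = X.mean(axis=0)
    Xc = X - x_bar
    # Xc = U_svd @ diag(S) @ Vt: the rows of Vt are unit-length eigenvectors of C,
    # and S**2 / N are the corresponding eigenvalues (variances along those directions)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    U_1M = Vt[:M].T            # D x M matrix, the slide's U_{1:M}
    Z = Xc @ U_1M              # N x M projections, z = U_{1:M}^T x
    return U_1M, Z, S[:M] ** 2 / X.shape[0]
```

Eigenvectors are only defined up to sign, so this route and the eigendecomposition route may return components that differ by a sign flip.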

Two Derivations of PCA
Two views/derivations:
- Maximize variance (the scatter of the green points)
- Minimize error (the red-green distance per datapoint)

PCA: Minimizing Reconstruction Error
- We can think of PCA as projecting the data onto a lower-dimensional subspace
- One derivation is that we want to find the projection such that the best linear reconstruction of the data is as close as possible to the original data:
  $$J = \sum_n \|\mathbf{x}^{(n)} - \tilde{\mathbf{x}}^{(n)}\|^2, \qquad \text{where } \tilde{\mathbf{x}}^{(n)} = \sum_{j=1}^{M} z_j^{(n)} \mathbf{u}_j + \sum_{j=M+1}^{D} b_j \mathbf{u}_j$$
- The objective is minimized when the first M components are the eigenvectors with the maximal eigenvalues:
  $$z_j^{(n)} = (\mathbf{x}^{(n)})^T \mathbf{u}_j; \qquad b_j = \bar{\mathbf{x}}^T \mathbf{u}_j$$
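With these optimal coefficients, $\tilde{\mathbf{x}}^{(n)}$ reduces to $\bar{\mathbf{x}} + U_{1:M} U_{1:M}^T(\mathbf{x}^{(n)} - \bar{\mathbf{x}})$, so the reconstruction and the objective J can be evaluated with a short sketch like the following (reusing the U and x_bar from the earlier sketches; the helper names are my own):

```python
import numpy as np

def reconstruct(X, U, x_bar):
    """Reconstruct each x from its M-dimensional code: x_tilde = x_bar + U z."""
    Z = (X - x_bar) @ U        # codes z^(n), shape (N, M)
    return x_bar + Z @ U.T     # reconstructions, shape (N, D)

def reconstruction_error(X, U, x_bar):
    """J = sum_n ||x^(n) - x_tilde^(n)||^2, minimized by the top-M eigenvectors."""
    return np.sum((X - reconstruct(X, U, x_bar)) ** 2)
```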

Applying PCA to faces
- Run PCA on 2429 19x19 grayscale images (CBCL data)
- Compresses the data: we can get good reconstructions with only 3 components
- PCA for pre-processing: we can apply a classifier to the latent representation
- PCA with 3 components obtains 79% accuracy on face/non-face discrimination on test data, vs. 76.8% for a mixture of Gaussians with 84 states
- Can also be good for visualization

Applying PCA to faces: Learned basis

Applying PCA to digits

Relation to Neural Networks
- PCA is closely related to a particular form of neural network
- An autoencoder is a neural network whose target outputs are its own inputs
- The goal is to minimize reconstruction error

Autoencoders
- Define $\mathbf{z} = f(W\mathbf{x})$ and $\hat{\mathbf{x}} = g(V\mathbf{z})$
- Goal:
  $$\min_{W,V} \frac{1}{2N} \sum_{n=1}^{N} \|\mathbf{x}^{(n)} - \hat{\mathbf{x}}^{(n)}\|^2$$
- If g and f are linear:
  $$\min_{W,V} \frac{1}{2N} \sum_{n=1}^{N} \|\mathbf{x}^{(n)} - VW\mathbf{x}^{(n)}\|^2$$
- In other words, the optimal solution is PCA
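A bare-bones NumPy sketch of the linear case, trained by plain gradient descent (the initialization, learning rate, and iteration count are illustrative assumptions; the data are assumed centered, and at the optimum VW projects onto the same subspace as PCA, though not necessarily onto the individual principal components):

```python
import numpy as np

def train_linear_autoencoder(X, M, lr=0.01, n_iters=2000, seed=0):
    """Linear autoencoder z = Wx, x_hat = Vz, minimizing (1/2N) sum ||x - x_hat||^2.
    X has shape (N, D) and is assumed centered; weights are stored transposed so
    codes are computed row-wise as X @ W."""
    N, D = X.shape
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(D, M))   # encoder weights
    V = rng.normal(scale=0.1, size=(M, D))   # decoder weights
    for _ in range(n_iters):
        Z = X @ W                            # codes, shape (N, M)
        X_hat = Z @ V                        # reconstructions, shape (N, D)
        R = (X_hat - X) / N                  # gradient of the loss w.r.t. X_hat
        grad_V = Z.T @ R                     # d loss / d V
        grad_W = X.T @ (R @ V.T)             # d loss / d W
        V -= lr * grad_V
        W -= lr * grad_W
    return W, V
```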

Autoencoders: Nonlinear PCA
- What if g() is not linear?
- Then we are basically doing nonlinear PCA
- There are some subtleties, but in general this is an accurate description
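As a sketch of the nonlinear case, only the decoder of the previous sketch changes; tanh is an illustrative choice of g (not specified on the slide) and assumes the inputs are scaled to roughly (-1, 1):

```python
import numpy as np

def train_nonlinear_autoencoder(X, M, lr=0.01, n_iters=2000, seed=0):
    """Autoencoder z = Wx, x_hat = tanh(Vz), minimizing (1/2N) sum ||x - x_hat||^2."""
    N, D = X.shape
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(D, M))        # encoder weights
    V = rng.normal(scale=0.1, size=(M, D))        # decoder weights
    for _ in range(n_iters):
        Z = X @ W                                 # codes
        X_hat = np.tanh(Z @ V)                    # nonlinear decoder g = tanh
        Rg = ((X_hat - X) / N) * (1.0 - X_hat ** 2)   # backprop through tanh: g' = 1 - g^2
        grad_V = Z.T @ Rg
        grad_W = X.T @ (Rg @ V.T)
        V -= lr * grad_V
        W -= lr * grad_W
    return W, V
```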

Comparing Reconstructions