Algorithms in Nature: Non-negative Matrix Factorization
Slides adapted from Marshall Tappen and Bryan Russell


Dimensionality Reduction

The curse of dimensionality: too many features make data difficult to visualize and interpret, and make it harder to efficiently learn robust statistical models.

Problem statement: given a set of images,
1. Create basis images that can be linearly combined to reconstruct the original (or new) images.
2. Find weights to reproduce every input image from the basis images, with one set of weights per input image.
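To make the shapes concrete, here is a minimal numpy sketch (not from the slides; the sizes and array names are illustrative) of reconstructing one image as a linear combination of basis images:

```python
import numpy as np

# Illustrative sizes: n pixels per image, r basis images.
rng = np.random.default_rng(0)
n, r = 1024, 16
basis = rng.random((n, r))         # each column is one basis image
weights = rng.random(r)            # one weight vector per input image

# The reconstruction is just a weighted sum of the basis images.
reconstruction = basis @ weights   # shape (n,)
print(reconstruction.shape)
```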

Principal Components Analysis

A low-dimensional representation that minimizes reconstruction error.

[Figure: a face image expressed as a weighted combination of eigenfaces]
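As a hedged illustration (the slides show figures, not code), the eigenfaces can be computed from an SVD of the centered data matrix; random data stands in here for real faces:

```python
import numpy as np

# Rows of X are flattened face images, shape (m_faces, n_pixels).
rng = np.random.default_rng(0)
X = rng.random((100, 64 * 64))           # stand-in for real face data

mean_face = X.mean(axis=0)
Xc = X - mean_face                       # center the data

# SVD of the centered data yields the principal components
# ("eigenfaces") without forming the n x n covariance matrix.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
r = 20
eigenfaces = Vt[:r]                      # (r, n_pixels) basis images

weights = Xc @ eigenfaces.T              # one r-vector of weights per face
reconstruction = mean_face + weights @ eigenfaces

err = np.linalg.norm(X - reconstruction) / np.linalg.norm(X)
print(f"relative reconstruction error with r={r}: {err:.3f}")
```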

PCA weaknesses

- Only allows linear projections.
- The covariance matrix is of size d x d; if d = 10^4, then Σ has 10^8 entries. Solution: singular value decomposition (SVD).
- PCA is restricted to orthogonal vectors in feature space that minimize reconstruction error. Solution: independent component analysis (ICA), which seeks directions that are statistically independent, often measured using information theory.
- Assumes the points are multivariate Gaussian. Solution: kernel PCA, which transforms the input data to other spaces.
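A brief sketch of the named alternatives, using scikit-learn's FastICA and KernelPCA on stand-in data (the library choice and sizes are assumptions, not from the slides):

```python
import numpy as np
from sklearn.decomposition import FastICA, KernelPCA

# Hypothetical data: 100 flattened images with 1,024 pixels each.
rng = np.random.default_rng(0)
X = rng.random((100, 32 * 32))

# ICA: seeks statistically independent (not merely orthogonal) directions.
ica = FastICA(n_components=10, max_iter=500, random_state=0)
S = ica.fit_transform(X)           # independent components per image

# Kernel PCA: implicitly maps the data to another feature space,
# relaxing PCA's linearity and Gaussianity assumptions.
kpca = KernelPCA(n_components=10, kernel="rbf")
Z = kpca.fit_transform(X)

print(S.shape, Z.shape)            # (100, 10) (100, 10)
```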

PCA vs. Neural Networks

PCA:
- Unsupervised dimensionality reduction
- Linear representation that gives the best squared-error fit
- No local minima (exact solution)
- Non-iterative
- Orthogonal vectors ("eigenfaces")

Neural networks:
- Supervised dimensionality reduction
- Non-linear representation that gives the best squared-error fit
- Possible local minima (gradient descent)
- Iterative
- An auto-encoding NN with linear units may not yield orthogonal vectors
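To illustrate the right-hand column, here is a sketch of a linear auto-encoding network trained by gradient descent (the untied-weight setup, random data, and learning rate are assumptions, not the lecture's code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 50))
X -= X.mean(axis=0)                       # center, as in PCA

d, r, lr = 50, 5, 1e-3
A = rng.normal(scale=0.1, size=(d, r))    # encoder weights
B = rng.normal(scale=0.1, size=(r, d))    # decoder weights

for _ in range(3000):                     # iterative, unlike exact PCA
    E = X @ A @ B - X                     # reconstruction error
    A -= lr * 2 * X.T @ E @ B.T           # gradient of ||E||^2 wrt A
    B -= lr * 2 * A.T @ X.T @ E           # gradient of ||E||^2 wrt B

# The trained network spans the principal subspace, but its individual
# weight vectors need not be orthogonal the way eigenfaces are.
G = B @ B.T
print("max |off-diagonal| of B B^T:", np.abs(G - np.diag(np.diag(G))).max())
```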

Is this really how humans characterize and identify faces?

What don't we like about PCA?

- Basis images aren't physically intuitive, whereas humans can explain why a face is a face.
- PCA involves adding up some basis images and subtracting others, which may not make sense in some applications: what does it mean to subtract a face? A document?

Going from the whole to parts...

Recordings from neurons in the temporal lobe of the macaque monkey [Wachsmuth et al. 1994].

Going from the whole to parts... [Wachsmuth et al. 1994]

[Figure: responses of neurons that respond primarily to the body, shown against spontaneous background activity and a control condition]

Going from the whole to parts... [Wachsmuth et al. 1994]

Overall, recordings from 53 neurons:
- 17 (32%) responded to the head only
- 5 (9%) responded to the body only
- 22 (41%) responded to both the head and the body in isolation
- 9 (17%) responded to the whole body only (neither part in isolation)

Suggestive of a parts-based representation (today's topic), possibly with a hierarchy.

Non-negative matrix factorization

- Trained on 2,429 faces.
- Like PCA, except the coefficients in the linear combination must be non-negative.
- Non-negativity yields a sparser encoding (many vanishing coefficients).
- Forcing non-negative coefficients implies an additive combination of basis parts to reconstruct the whole: several versions of mouths, noses, etc.
- A better physical analogue of neurons.
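A minimal sketch of this idea using scikit-learn's NMF (an assumption; the original experiments used Lee and Seung's algorithm on 2,429 real faces, and random non-negative data stands in here):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
V = rng.random((361, 500))            # stand-in: n=361 pixels, m=500 faces

model = NMF(n_components=49, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(V)            # (n, r) non-negative basis "parts"
H = model.components_                 # (r, m) non-negative coefficients

print((W >= 0).all(), (H >= 0).all())  # purely additive encoding
# On real face data, H tends to be sparse (vanishing coefficients).
print("fraction of near-zero coefficients:", (H < 1e-3).mean())
```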

Formal definition of NMF

Factor V ≈ W H, subject to non-negativity constraints on W and H:
- V: n × m matrix holding the image database (n = # pixels/face, m = # faces)
- W: n × r matrix whose r columns are the basis images, each of size n (cf. eigenfaces)
- H: r × m matrix giving the r coefficients that represent each of the m faces

W H is a compressed version of V. How to choose the rank r? We want (n + m) r < n m.
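A quick arithmetic check of the rank condition (assuming 19 × 19 images, so n = 361, to match the 2,429-face dataset; the r values are arbitrary):

```python
# Check (n + m) * r < n * m for a few choices of the rank r.
n, m = 361, 2429                       # pixels per face, number of faces

for r in (10, 49, 100, 300):
    compressed = (n + m) * r           # entries in W plus H
    print(r, compressed < n * m, f"compression={compressed / (n * m):.2f}")
```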

A similar neural network view

- V: n × m matrix; the input image database (n = # pixels/face, m = # faces)
- W: n × r matrix whose r columns are the basis images, each of size n
- H: r × m matrix giving the r coefficients that represent each of the m faces, with non-negativity constraints

[Figure: a two-layer network in which H provides the hidden variables (the parts-based representation) and V provides the original image pixels]

One possible objective function

Reconstruction error (divergence form):

    F = Σ_i Σ_u [ V_iu log(WH)_iu − (WH)_iu ]

Update rule for the coefficients:

    H_au ← H_au Σ_i W_ia V_iu / (WH)_iu

where W_ia is the a-th basis image's value at the i-th pixel, H_au is the a-th coefficient for the u-th face (the quantity being updated), V_iu / (WH)_iu is the ratio of the actual to the reconstructed pixel value for the u-th face, the sum runs over all pixels, and the columns of W are then normalized.

Basic idea: multiply the current value by a factor that depends on the quality of the approximation. If the ratio > 1, we need to increase the reconstructed value (the denominator); if the ratio < 1, we need to decrease it; if the ratio = 1, do nothing.

What is significant about this?

- The update rule is multiplicative instead of additive.
- If the initial values for W and H are non-negative, then W and H can never become negative. This guarantees a non-negative factorization.
- Will it converge? Yes, to a local optimum: see [Lee and Seung, NIPS 2000] for the proof.
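A direct numpy sketch of these multiplicative updates for the divergence objective above (the array shapes, initialization, and iteration count are illustrative, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((100, 60)) + 0.01        # stand-in: n=100 pixels, m=60 faces
n, m, r = V.shape[0], V.shape[1], 10

W = rng.random((n, r))                  # non-negative initial values
H = rng.random((r, m))
W /= W.sum(axis=0, keepdims=True)       # normalize basis columns

for _ in range(200):
    R = V / (W @ H)                     # ratio of actual to reconstructed
    H *= W.T @ R                        # update coefficients (sum over pixels)
    R = V / (W @ H)
    W *= R @ H.T                        # update basis images (sum over faces)
    W /= W.sum(axis=0, keepdims=True)   # normalize each basis column

# Multiplying non-negative numbers keeps W and H non-negative throughout.
print((W >= 0).all() and (H >= 0).all())
```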

PCA vs. NMF

PCA:
- Unsupervised dimensionality reduction
- Orthogonal vectors with positive and negative coefficients
- "Holistic"; difficult to interpret
- Non-iterative
- CS-developed

NMF:
- Unsupervised dimensionality reduction
- Non-negative coefficients
- "Parts-based"; easier to interpret
- Iterative (the algorithm presented here)
- Biologically inspired (alas, there are inhibitory neurons in the brain)

The "Jennifer Aniston neuron" [Quiroga et al., Nature 2005]

UCLA neurosurgeon Itzhak Fried and researcher Quian Quiroga were operating on patients with epileptic seizures. The procedure requires implanting a probe in the brain, but the doctor first needs to map the surgical area (fyi, open brains do not hurt). Mind if I try some exploratory science?

They flashed one-second snapshots of celebrities, animals, objects, and landmark buildings; each person was shown ~2,000 pictures. When Aniston was shown, one neuron in the medial temporal lobe always fired, invariant to different poses, hair styles, smiling or not smiling, etc. It never fired for Julia Roberts, Kobe Bryant, other celebrities, places, animals, etc.

Hierarchical models of object recognition

This stirred a controversy: Are there "grandmother cells" in the brain? [Lettvin, 1969] Or are there populations of cells that respond to a stimulus? Are the cells organized into a hierarchy? (Riesenhuber and Poggio model; see website)