Modeling Visual Cortex V4 in Naturalistic Conditions with Invariant and Sparse Image Representations


Modeling Visual Cortex V4 in Naturalistic Conditions with Invariant and Sparse Image Representations. Bin Yu, Departments of Statistics and EECS, University of California at Berkeley. Rutgers University, May 2, 2014. 1/51

Co-authors. Yu Group: Julien Mairal and Yuval Benjamini (leads). Gallant Lab at UC Berkeley: Ben Willmore, Michael Oliver, Jack Gallant. Supported by NSF STC, Center for Science of Information (CSoI). 2/51

Brain Science 2013 new genomics 3/51


Sources of Knowledge About the Brain: Ways of Understanding the Visual Cortex. Study of lesions and associated impairments; electrodes (single or arrays); imaging studies (fMRI, ...). Image below from Hansen et al. [2007]. 8/51

Classical Models for V1 Simple Cells. Image from Olshausen and Field [2005]. V1 models based on Gabor filters achieve impressive prediction performance with experimental data (single neuron and fMRI); V1 receptive fields are relatively small and well localized. V1 is the most well understood area, but not all is... It serves as a performance benchmark for other areas. 9/51

V4: an intermediate area on the "what" pathway [see Roe et al., 2012]. What we know about V4: affected by attention; diverse selectivity; larger receptive fields, more invariant than V1/V2; no good predictive model with natural image inputs. Question about V4: what are the roles of V4? Roe et al. advocated a background-foreground thesis, among other things. 10/51

Experimental set-up in the Gallant Lab for single neuron data collection Image Sequence Subject Recording Response Signal Filtering Spike Sorting Time Binning Firing Rate 11/51

Objectives and data. Aim: a statistical/computational model with good prediction performance on natural scenes (validation data); elucidation of properties of a population of V4 neurons; biological interpretability. Data: consists of 4000-12000 grayscale images (with no motion or color content) and average firing rates for 71 neurons; the image sequence is shown at 30 Hz; the stimulus is centered on an estimated receptive field (RF) while the subject performs a fixation task; the stimulus size is 2-4 times larger than the estimated RF. 12/51

Outline of Today's Talk: Multi-layer invariant feature extraction; Prediction model via low-rank regularization; Model interpretation. 13/51

Part I: Invariant Image Representation: Feature Extraction 14/51

Methodology. Classical computer vision image representation for scene analysis: 1) dense low-level feature extraction (local histograms of gradient orientations) [Lowe, 2004]; 2) feature encoding into visual words using vector quantization or sparse coding [Olshausen and Field, 1997]; 3) feature pooling. A state-of-the-art pipeline for scene and object recognition [Lazebnik et al., 2006, Yang et al., 2009, Boureau et al., 2010]. Can we exploit some ideas from this line of thinking to mimic invariance properties of V4 neurons? obtain a model tuned to natural images? be as biologically compatible as possible? 15/51

Our pipeline 16/51

First Layer 17/51

Patch Extraction Across Orientation Maps. The 3D-patches: are of size 4 × 4 × 8 = 128 (8 orientations); correspond in the original image domain to 32 × 32 patches; are invariant to local deformations in the original image domain. 18/51
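A minimal numpy sketch of this patch extraction, assuming the first layer has already produced a stack of 8 orientation maps (the 16 × 16 map size and unit stride below are illustrative choices, not values from the talk):

```python
import numpy as np

def extract_3d_patches(maps, size=4, stride=1):
    """Slide a size x size window over a stack of orientation maps
    (shape: n_orient x H x W) and flatten each 3D-patch into a
    vector, mirroring the 4 x 4 x 8 = 128-dimensional patches."""
    n_orient, H, W = maps.shape
    patches = []
    for i in range(0, H - size + 1, stride):
        for j in range(0, W - size + 1, stride):
            patches.append(maps[:, i:i + size, j:j + size].ravel())
    return np.array(patches)

# Hypothetical 8 orientation maps on a 16 x 16 grid
maps = np.abs(np.random.randn(8, 16, 16))
X = extract_3d_patches(maps)
print(X.shape)  # (169, 128): 13 x 13 positions, 4 * 4 * 8 = 128 dims
```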

Patch Extraction Across Orientation Maps. What is the effect of the first processing layer? Comparing patches on orientation maps and in the image domain leads to a different similarity measure. In blue, correlation in the original image domain: 1.0, 0.92, 0.92, 0.91, 0.86, 0.76. In red, correlation in the new domain: 1.0, 0.65, 0.76, 0.97, 0.95, 0.93. 19/51

Contrast Normalization. Given a 3D-patch x in R^128_+, we apply x <- x / max(||x||_2, c). In short, our 3D-patches have similar properties to dense SIFT descriptors [Lowe, 2004] but are based on simple filtering/subsampling and normalization steps. 20/51
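The normalization x <- x / max(||x||_2, c) can be sketched in a few lines of numpy; the threshold c = 0.1 here is a hypothetical choice, not the talk's value:

```python
import numpy as np

def contrast_normalize(x, c=0.1):
    """x <- x / max(||x||_2, c): high-contrast patches are scaled to
    unit norm, while low-energy patches are divided by the constant c,
    which avoids blowing up near-empty patches. c = 0.1 is a
    hypothetical threshold."""
    return x / max(np.linalg.norm(x), c)

print(contrast_normalize(np.array([3.0, 4.0])))    # norm 5 > c -> [0.6 0.8]
print(contrast_normalize(np.array([0.03, 0.04])))  # norm 0.05 < c -> [0.3 0.4]
```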

Second Layer 21/51

Sparse Coding. 1) learn a dictionary on a database of 3D-patches once and for all; 2) encode all 3D-patches of an image to obtain feature maps. Dictionary Learning Formulation: min_{A in R^{p x n}, D in D} sum_{i=1}^n [ (1/2) ||x_i - D alpha_i||_2^2 (data fitting) + lambda ||alpha_i||_1 (sparsity) ]. We also force the codes alpha and the dictionary D to be non-negative. The dictionary is fairly large (p = 2048). The original formulation of Olshausen and Field [1997] was in the image domain. It was successfully used on image descriptors in computer vision [Yang et al., 2009, Boureau et al., 2010]. 22/51
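A small-scale sketch of this non-negative dictionary-learning step using scikit-learn's DictionaryLearning; the talk's own solver may differ, and the dictionary size (32) and penalty values below are toy stand-ins for p = 2048:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.RandomState(0)
X = np.abs(rng.randn(200, 128))  # stand-in for non-negative 3D-patches

# Non-negative dictionary D and codes alpha_i, as on the slide
dico = DictionaryLearning(n_components=32, alpha=1.0, max_iter=10,
                          fit_algorithm='cd',
                          transform_algorithm='lasso_cd',
                          transform_alpha=1.0,
                          positive_code=True, positive_dict=True,
                          random_state=0)
codes = dico.fit_transform(X)    # the alpha_i, one row per patch
print(codes.shape)               # (200, 32)
print(codes.min() >= 0)          # True: codes are non-negative
```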

Third Layer 23/51

Feature Pooling. The sparse codes of an image, A = [alpha_1, alpha_2, ..., alpha_{29 x 29}], are pooled into a single vector beta = (beta_1, ..., beta_p). The pooling operation is the l_2-norm of features: beta_k = sqrt( sum_{i=1}^{29 x 29} (alpha_i^k)^2 ) for one pooling region and k = 1, ..., 2048. Another alternative is the max-pooling operation [Riesenhuber et al., 1999, Cadieu et al., 2007], often used in computer vision [Lazebnik et al., 2006, Yang et al., 2009, Boureau et al., 2010]. 24/51
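Both pooling operators follow directly from the formula beta_k = sqrt(sum_i (alpha_i^k)^2); a sketch with random stand-in codes:

```python
import numpy as np

def l2_pool(codes):
    """codes: (n_locations, p) sparse codes over one pooling region.
    Returns beta with beta_k = sqrt(sum_i (alpha_i^k)^2)."""
    return np.sqrt((codes ** 2).sum(axis=0))

def max_pool(codes):
    """Alternative: beta_k = max_i alpha_i^k (max pooling)."""
    return codes.max(axis=0)

codes = np.abs(np.random.randn(29 * 29, 2048))  # 29 x 29 grid, p = 2048
print(l2_pool(codes).shape, max_pool(codes).shape)  # (2048,) (2048,)
```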

Summary 25/51

Our Feature Extraction Process in a Nutshell. It consists of local simple operations and uses the sparse coding principle. In particular, it has some invariance to small image deformations (first layer); has some selectivity to features learned from natural image statistics (second layer); is shift invariant within the receptive field (third layer). 26/51

Part II: Prediction Model based on Extracted Features 27/51

Prediction Pipeline Neuron Responses Linear Model Input Image Sequence Nonlinear Encoding 28/51

Temporal Aspect of Data. Typical Time Response to an Excitatory Stimulus. 29/51

Prediction with Low-Rank Regularization. Input: image sequence and neuron responses y_1, y_2, ..., y_T. Preprocessing: center and normalize the neuron responses. Image feature vector: (2048 x 5)-dimensional vector beta_t at time t = 1, ..., T. Model: y_t ~ sum_{j=1}^{tau} beta_{t-j}^T w_j with a lag tau = 9 starting at time t + 1. Constraint on the weights w_j: the time window W = [w_1, w_2, ..., w_tau] in R^{p x tau} should be low-rank. Formulation: min_{W in R^{p x tau}} sum_{t=1}^T (1/2) ( y_t - sum_{j=1}^{tau} beta_{t-j}^T w_j )^2 + gamma ||W||_*, where ||.||_* is the trace norm (the sum of singular values, i.e., the l_1-norm of the singular values) [see Fazel et al., 2001]. 30/51
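A toy proximal-gradient (ISTA) sketch of this trace-norm formulation: the proximal operator of gamma ||W||_* soft-thresholds the singular values of W. Problem sizes, step size, and iteration count below are illustrative, not the talk's settings:

```python
import numpy as np

def svt(W, t):
    """Soft-threshold the singular values of W: the proximal
    operator of the trace norm t * ||W||_*."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt

def trace_norm_regression(B, y, gamma, lr=1e-3, n_iter=500):
    """ISTA for min_W sum_t 0.5 * (y_t - <B_t, W>)^2 + gamma * ||W||_*,
    with lagged feature matrices B: (T, p, tau) and weights W: (p, tau)."""
    W = np.zeros(B.shape[1:])
    for _ in range(n_iter):
        resid = np.einsum('tpj,pj->t', B, W) - y   # predictions minus y
        grad = np.einsum('t,tpj->pj', resid, B)    # gradient of the loss
        W = svt(W - lr * grad, lr * gamma)         # proximal step
    return W

# Toy problem with a rank-1 ground-truth weight matrix
rng = np.random.RandomState(0)
T, p, tau = 200, 20, 9
B = rng.randn(T, p, tau)
W_true = np.outer(rng.randn(p), rng.randn(tau))
y = np.einsum('tpj,pj->t', B, W_true)
W_hat = trace_norm_regression(B, y, gamma=1.0)
resid = np.einsum('tpj,pj->t', B, W_hat) - y
print(np.linalg.norm(resid) < np.linalg.norm(y))  # True: fit improves over W = 0
```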

Prediction Performance on Validation Data. The baseline is a well-engineered nonlinear Gabor model [see, e.g., Nishimoto and Gallant, 2011] (state of the art for V1/V2 prediction). 31/51

Prediction Performance on Validation Data 32/51

Prediction Performance on Validation Data Example for a neuron with ρ = 0.67 on validation data 33/51

Prediction Performance: first model to achieve prediction performance on natural images similar to that achieved for V1 and V2 cells; significantly outperforms the Gabor model; 5-fold cross-validation leads to semi time-separable models. 34/51

Part III: Model Interpretation 35/51

Visualization of Dictionary Elements. Difficulty: 3D-patches are not in the image domain. We build a database of one million correspondence pairs of (32 x 32 image patches, 4 x 4 x 8 3D-patches) and find best matches. (A) Complex feature visualization. (B) Feature classification by type: straight line, curve inward, curve outward, multiple curves. 36/51

Categorization of Dictionary Elements 1/2. Feature classification by type: straight line, curve inward, curve outward, multiple curves; white bar, black bar, stripes, smooth transition; white corner, black corner, acute white corner, acute black corner; white blob, black blob, double white blobs, double black blobs. 37/51

Categorization of Dictionary Elements 2/2 White blob Black blob Double white blobs Double black blobs Complex blobs Crosses and Junctions Others We also categorize each dictionary element by orientation; scale; texture affinity. 38/51

Visualization of Feature Channels curves/edges bars corners blobs image 39/51

Model Analysis. Methodology for population analysis: 1) find the peak excitation lag; 2) look at the center pooling region; 3) compute 19 impact values: 8 feature types, 3 scales, 3 texture affinities, 5 orientations; 4) perform a sparse principal component analysis [Zou et al., 2006]. Methodology for individual neuron analysis: retrieve most excitatory and inhibitory images; compute impact values (possibly with refined categories). 40/51
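Step 4 can be sketched with scikit-learn's SparsePCA on a hypothetical neurons-by-impact-values matrix (71 neurons x 19 impact values, with random data standing in for the real impact values):

```python
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.RandomState(0)
# Hypothetical data: one row per neuron, one column per impact value
# (8 feature types + 3 scales + 3 texture affinities + 5 orientations = 19)
impacts = rng.rand(71, 19)

spca = SparsePCA(n_components=3, alpha=1.0, random_state=0)
scores = spca.fit_transform(impacts - impacts.mean(axis=0))
print(spca.components_.shape)  # (3, 19): sparse loading vectors
print(scores.shape)            # (71, 3): per-neuron component scores
```

The l_1 penalty (alpha) zeroes out many loadings, so each component is driven by a small, interpretable subset of the 19 impact values.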

Individual Neuron Analysis neuron 64, ρ = 0.64 41/51

Individual Neuron Analysis neuron 40, ρ = 0.76 42/51

Individual Neuron Analysis neuron 71, ρ = 0.75 43/51

Individual Neuron Analysis neuron 22, ρ = 0.64 44/51

Individual Neuron Analysis neuron 24, ρ = 0.63 45/51

Individual Neuron Analysis neuron 31, ρ = 0.66 46/51

Individual Neuron Analysis neuron 59, ρ = 0.67 47/51

Population Analysis via Sparse PCA 48/51

Population Analysis via Sparse PCA 49/51

Summary. Main conclusions: first quantitative model with natural images that meets the benchmark performance; the most striking observation is the role of contours vs. texture discrimination; V4 neurons are selective to a large diversity of feature types such as bars, edges, corners...; some V4 neurons are selective to orientation, some are not. 50/51

Current Directions: include time and color to deal with fully naturalistic conditions; use the new V4 model for movie reconstruction based on fMRI data; theoretical and simulation studies of deep learning methods; sparse coding: a key step in analyzing fruit fly TF images. 51/51

References I. Y-L. Boureau, F. Bach, Y. LeCun, and J. Ponce. Learning mid-level features for recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010. C. Cadieu, M. Kouh, A. Pasupathy, C.E. Connor, M. Riesenhuber, and T. Poggio. A model of V4 shape selectivity and invariance. Journal of Neurophysiology, 98(3):1733-1750, 2007. M. Fazel, H. Hindi, and S.P. Boyd. A rank minimization heuristic with application to minimum order system approximation. In Proceedings of the 2001 American Control Conference, volume 6, pages 4734-4739, 2001. K.A. Hansen, K.N. Kay, and J.L. Gallant. Topographic organization in and near human visual area V4. The Journal of Neuroscience, 27(44):11896-11911, 2007. 52/51

References II. S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2006. D.G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110, 2004. S. Nishimoto and J.L. Gallant. A three-dimensional spatiotemporal receptive field model explains responses of area MT neurons to naturalistic movies. The Journal of Neuroscience, 31(41):14551-14564, 2011. B.A. Olshausen and D.J. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37:3311-3325, 1997. B.A. Olshausen and D.J. Field. How close are we to understanding V1? Neural Computation, 17(8):1665-1699, 2005. 53/51

References III. M. Riesenhuber and T. Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 2:1019-1025, 1999. A.W. Roe, L. Chelazzi, C.E. Connor, B.R. Conway, I. Fujita, J.L. Gallant, H. Lu, and W. Vanduffel. Toward a unified theory of visual area V4. Neuron, 74(1):12-29, 2012. J. Yang, K. Yu, Y. Gong, and T. Huang. Linear spatial pyramid matching using sparse coding for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009. H. Zou, T. Hastie, and R. Tibshirani. Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2):265-286, 2006. 54/51

SIFT Representation [Lowe, 2004]. 1) compute pixel gradient magnitudes and orientations: rho = |grad I|, theta = arctan(dI/dy / dI/dx); 2) binning + histograms; 3) normalization. Standard parameters: 8 orientations, 16 x 16 patches, 4 x 4 bins. 55/51
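The core of steps 1-2 (per-pixel gradient magnitude and orientation, then a magnitude-weighted orientation histogram) can be sketched as follows, omitting the 4 x 4 spatial binning and the final normalization:

```python
import numpy as np

def orientation_histogram(patch, n_orient=8):
    """Per-pixel gradient magnitude rho and orientation theta,
    then a magnitude-weighted histogram over n_orient bins."""
    gy, gx = np.gradient(patch.astype(float))   # d/dy, d/dx
    rho = np.hypot(gx, gy)                      # rho = |grad I|
    theta = np.arctan2(gy, gx)                  # theta in (-pi, pi]
    bins = ((theta + np.pi) / (2 * np.pi) * n_orient).astype(int) % n_orient
    hist = np.zeros(n_orient)
    np.add.at(hist, bins.ravel(), rho.ravel())  # accumulate magnitudes
    return hist

patch = np.tile(np.arange(16.0), (16, 1))  # ramp increasing left to right
h = orientation_histogram(patch)
print(np.argmax(h))  # 4: all gradient energy falls in the theta = 0 bin
```

With the standard parameters above, one such histogram per 4 x 4 spatial bin over a 16 x 16 patch gives the familiar 4 * 4 * 8 = 128-dimensional SIFT descriptor.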