PIXELS TO VOXELS: MODELING VISUAL REPRESENTATION IN THE HUMAN BRAIN

Lillian Holmes
5 years ago

1 PIXELS TO VOXELS: MODELING VISUAL REPRESENTATION IN THE HUMAN BRAIN
By Pulkit Agrawal, Dustin Stansbury, Jitendra Malik, Jack L. Gallant
University of California, Berkeley
Presented by Tim Patzelt

2 AGENDA
Introduction
  Goals
  Previous models
  New model approach
Methods
  Source data
  Constructing encoding models
Encoding model performance
Investigating voxel tuning
Conclusion

3 GOALS
Why does the human brain perform tasks like image interpretation and object recognition almost effortlessly?
Good progress has already been made in understanding how the brain represents categories of objects and actions, with the help of hand-annotated images.
Model the mapping from low-level input (pixels) to high-level brain activity (voxels) without semantic tags provided by humans.
Provide a new platform for exploring the functional principles of human vision.

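4 PREVIOUS MODELS
Functional localizer approach: regions of interest (ROIs) represent high-level semantic information.
ROIs for animate features: EBA (extrastriate body area), OFA (occipital face area), FFA (fusiform face area).
ROIs for natural scenes: OPA (occipital place area), PPA (parahippocampal place area), RSC (retrosplenial cortex).
Agrawal P, Stansbury D, Malik J, Gallant JL (2014)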

5 PREVIOUS COMPUTATIONAL ENCODING MODEL
Nonlinear mapping between the stimulus and the measured brain activity.
A single experiment can contain an arbitrary number of semantic categories.
The model makes predictions for stimuli that were not used to fit it.
Features: a binary vector indicating the presence or absence of each semantic category (annotated by hand).
Drawbacks: hand annotations bias the fitted encoding model; annotation is slow; only a limited space of encoding models can be explored.
Strength: good prediction of brain activity.
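One common way to realize such an encoding model, offered here as an assumption for illustration rather than the authors' code, is to turn each image into a feature vector (for this model, the hand-annotated binary category indicators) and fit a regularized linear regression from those features to one voxel's response; the nonlinearity then lives in the feature extraction. The sketch below fits ridge regression in closed form by solving (X^T X + alpha I) w = X^T y with a small pure-Python solver. The function names and the penalty `alpha` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented copy [a | b], so the caller's data is not modified.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        # Move the row with the largest pivot into place for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from all rows below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_voxel_ridge(features: Matrix, responses: List[float], alpha: float) -> List[float]:
    """Weights w minimizing ||X w - y||^2 + alpha * ||w||^2 for a single voxel."""
    d = len(features[0])
    # X^T X with the ridge penalty added to the diagonal.
    gram = [[sum(row[i] * row[j] for row in features) + (alpha if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    # X^T y
    rhs = [sum(row[i] * y for row, y in zip(features, responses)) for i in range(d)]
    return solve(gram, rhs)


def predict(features: Matrix, weights: List[float]) -> List[float]:
    """Predicted response to each stimulus: a weighted sum of its features."""
    return [sum(f * w for f, w in zip(row, weights)) for row in features]


# Toy example: three images described by two binary category features.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
w = fit_voxel_ridge(X, y, alpha=1.0)
print(w)  # approximately [0.875, 1.375]
```

A positive `alpha` keeps the system solvable even when features are redundant, and it shrinks the weights toward zero. The sketch omits an intercept term for brevity.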

6 NEW ENCODING MODEL APPROACH
Create candidate feature spaces to model brain activity with state-of-the-art computer vision and machine learning (CV/ML) algorithms, without the need for hand annotations:
1. Fisher Vector encoding of local image descriptors
2. Hierarchical image representation learned by a convolutional neural network (CNN)
3. A category model created by hand, for comparison of the results

7 Agrawal P, Stansbury D, Malik J, Gallant JL (2014)

8 METHODS
Source data:
  Two subjects
  1260 images, each shown twice (training set)
  126 images, each shown 12 times (validation set)
  Responses recorded from voxels in the cerebral cortex
A separate encoding model was fit for each voxel.
Accuracy was expressed as the correlation coefficient between observed and predicted responses.
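Because every voxel has its own model, accuracy is one number per voxel: the Pearson correlation between the measured responses to the validation images and the model's predictions for them. A minimal plain-Python sketch follows; the function names and toy values are hypothetical, and averaging the 12 repeats of each validation image before scoring is a common choice that is assumed here rather than stated in the slides.

```python
import math
from typing import List, Sequence


def pearson_r(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation between one voxel's observed and predicted responses."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def voxel_accuracies(observed: Sequence[Sequence[float]],
                     predicted: Sequence[Sequence[float]]) -> List[float]:
    """One accuracy value per voxel, because each voxel has its own encoding model."""
    return [pearson_r(o, p) for o, p in zip(observed, predicted)]


# Hypothetical responses of two voxels to four validation images.
obs = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 2.0, 1.0]]
pred = [[1.1, 1.9, 3.2, 3.9], [1.0, 2.0, 1.0, 2.0]]
print(voxel_accuracies(obs, pred))
```

The correlation is undefined (division by zero) when either series is constant, so a real pipeline would need to handle that case.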

9 FISHER VECTOR FEATURE REPRESENTATION
SIFT features capture the distribution of edge orientations in one image patch.
Prototypical patch features are learned using a Gaussian Mixture Model (GMM).
The image's feature vector concatenates the distances of all patch descriptors from the GMM mean vectors.
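The mean-vector part of a Fisher Vector can be sketched as follows, following the standard formulation with diagonal-covariance Gaussians: each patch descriptor is softly assigned to every Gaussian, its variance-normalized deviation from that Gaussian's mean is accumulated with the posterior as weight, and the per-Gaussian sums are normalized and concatenated. This is a minimal illustration, not the authors' implementation; it assumes the SIFT descriptors (128-dimensional in practice) have already been extracted and the GMM parameters already fit, and the function names and toy values are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def log_gaussian_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def fisher_vector_means(descriptors: Sequence[Vector], weights: Vector,
                        means: Sequence[Vector], variances: Sequence[Vector]) -> List[float]:
    """Mean-gradient part of a Fisher Vector for one image (K * D values)."""
    k, d = len(weights), len(means[0])
    acc = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Soft assignment of the patch to each Gaussian (posterior), via log-sum-exp.
        logs = [math.log(w) + log_gaussian_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        exps = [math.exp(lg - top) for lg in logs]
        total = sum(exps)
        for j in range(k):
            post = exps[j] / total
            # Posterior-weighted, variance-normalized deviation from the j-th mean.
            for i in range(d):
                acc[j][i] += post * (x[i] - means[j][i]) / math.sqrt(variances[j][i])
    t = len(descriptors)
    # Normalize by patch count and mixture weight, then concatenate over Gaussians.
    return [acc[j][i] / (t * math.sqrt(weights[j])) for j in range(k) for i in range(d)]


# Toy example: two 2-D "descriptors" and a single-component mixture.
fv = fisher_vector_means([[1.0, 2.0], [3.0, 4.0]], [1.0], [[0.0, 0.0]], [[1.0, 1.0]])
print(fv)  # [2.0, 3.0]
```

Many Fisher Vector pipelines also append variance gradients and apply power and L2 normalization; those steps are omitted here.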

10 CNN FEATURE REPRESENTATION
Seven layers (conv-1 to conv-5, fc-6, fc-7) serve as potential feature spaces.
The network was trained for image classification on the ImageNet database: more than 1 million natural images in 1000 distinct object categories.
Stimulus overlap between ImageNet and the estimation (training) set is less than 0.5%.
The feature space (layer) was selected by maximizing voxel prediction accuracy.
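If the layer is chosen separately for each voxel (an assumption in this sketch; the slide only says the feature space was selected by maximizing prediction accuracy), the selection reduces to an argmax over per-layer accuracy scores. The scores below are hypothetical, and ideally they would be computed on data not used to fit the weights, so the choice is not biased toward overfit layers.

```python
from typing import Dict, List, Sequence


def best_layer_per_voxel(accuracy: Dict[str, Sequence[float]]) -> List[str]:
    """Return, for each voxel, the layer whose encoding model predicted it best.

    accuracy maps a layer name to one prediction-accuracy score per voxel.
    """
    layers = list(accuracy)
    n_voxels = len(accuracy[layers[0]])
    return [max(layers, key=lambda layer: accuracy[layer][v]) for v in range(n_voxels)]


# Hypothetical correlation scores for three voxels under three candidate layers.
scores = {
    "conv-1": [0.42, 0.10, 0.05],
    "conv-5": [0.30, 0.35, 0.20],
    "fc-7": [0.15, 0.28, 0.38],
}
print(best_layer_per_voxel(scores))  # ['conv-1', 'conv-5', 'fc-7']
```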

11 19-CAT FEATURE REPRESENTATION
19-dimensional binary vector of high-level semantic categories, annotated by hand.
The 19-Cat model predicts brain activity nearly as well as the more complicated models.
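Turning an image's hand annotations into this kind of binary feature vector is a simple lookup against a fixed, ordered category list. The category names below are hypothetical placeholders (the slides do not list the 19 categories), and the list is shortened to five entries for readability.

```python
from typing import Iterable, List, Sequence

# Hypothetical placeholder names; the actual 19 categories are not listed in these slides.
CATEGORIES = ["animal", "person", "building", "vehicle", "plant"]


def category_vector(labels: Iterable[str], categories: Sequence[str] = CATEGORIES) -> List[int]:
    """Binary feature vector: 1 if the image was annotated with that category, else 0."""
    tagged = set(labels)
    unknown = tagged - set(categories)
    if unknown:
        # Reject typos or labels outside the fixed category list.
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return [1 if c in tagged else 0 for c in categories]


print(category_vector({"person", "vehicle"}))  # [0, 1, 0, 1, 0]
```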

12 ENCODING MODEL PERFORMANCE

13 ENCODING MODEL PERFORMANCE Agrawal P, Stansbury D, Malik J, Gallant JL (2014)

14 ENCODING MODEL PERFORMANCE Agrawal P, Stansbury D, Malik J, Gallant JL (2014)

15 INVESTIGATING VOXEL TUNING
Goal: gain a better understanding of human visual representation.
Use the ConvNet model weights to generate predicted responses to a large collection of natural images.
Present the images with the highest and lowest predicted responses for a given voxel.
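Assuming a linear encoding model, a voxel's predicted response to any image is the dot product of the image's features with the voxel's weights, so finding the most and least preferred images is a sort over those scores. The sketch below uses hypothetical names and toy two-dimensional features in place of real ConvNet features.

```python
from typing import List, Sequence, Tuple


def extreme_images(features: Sequence[Sequence[float]], weights: Sequence[float],
                   names: Sequence[str], k: int) -> Tuple[List[str], List[str]]:
    """Images with the k highest and k lowest predicted responses for one voxel."""
    # Predicted response of the voxel to each image under its linear encoding model.
    scores = [sum(f * w for f, w in zip(row, weights)) for row in features]
    order = sorted(range(len(names)), key=lambda i: scores[i], reverse=True)
    top = [names[i] for i in order[:k]]
    bottom = [names[i] for i in order[::-1][:k]]  # lowest response first
    return top, bottom


# Hypothetical 2-D features for four images and one voxel's weights.
feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [3.0, 1.0]]
names = ["img_a", "img_b", "img_c", "img_d"]
print(extreme_images(feats, [1.0, -1.0], names, k=1))  # (['img_d'], ['img_b'])
```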

17 INVESTIGATING VOXEL TUNING
Encoding models provide a means to investigate classical ROIs in detail.
Run k-means clustering on the model weights of all voxels in one ROI to find out whether the ROI contains functional subdivisions.
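Treating each voxel's weight vector as a point, clustering the voxels of one ROI can be sketched with plain Lloyd's k-means: assign every point to its nearest centroid, move each centroid to the mean of its members, and repeat until assignments stop changing. This is a minimal illustration, not necessarily the variant used in the study; the number of clusters `k`, the random initialization, and the toy weight vectors are all assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two weight vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iterations: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's k-means; returns a cluster label per point and the centroids."""
    rng = random.Random(seed)
    # Initialize the centroids with k points sampled without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: Optional[List[int]] = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical 2-D weight vectors for four voxels in one ROI.
roi_weights = [[0.9, 0.1], [1.0, 0.0], [0.1, 0.8], [0.0, 1.1]]
labels, _ = kmeans(roi_weights, k=2)
print(labels)
```

Which integer each cluster receives depends on the initialization, so only the grouping itself is meaningful.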

19 GOALS

20 CONCLUSION
Computer vision and machine learning models provide a powerful framework for predicting human brain activity evoked by complex images.
They replicate the results of multiple functional localizer experiments within a single experiment.
The algorithms learned the features without hand annotations.
The fitted models can be used to visualize the image patterns that are predicted to increase or decrease brain activity in specific regions.
They provide a means to explore classical ROIs in more detail.