Pixels to Voxels: Modeling Visual Representation in the Human Brain

1 Pixels to Voxels: Modeling Visual Representation in the Human Brain Authors: Pulkit Agrawal, Dustin Stansbury, Jitendra Malik, Jack L. Gallant Presenters: JunYoung Gwak, Kuan Fang

2 Outlines Background Motivation Related Works Models Experiments

3 Background Brain areas related to vision Simulation of brain activities

4 Background Functional magnetic resonance imaging (fMRI) Blood oxygenation level-dependent (BOLD)

5 Motivation Human Computer Vision mountain/people/fishes /... Feature Representations mountain/human/fishes /...

6 Related Works T. Naselaris et al., Bayesian reconstruction of natural images from human brain activity, Neuron 2009

7 Related Works S. Nishimoto et al., Reconstructing visual experiences from brain activity evoked by natural movies, Current Biology 2011

8 Model: Ridge Regression
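The slide names ridge regression as the encoding model without giving details. A minimal sketch of how such a model could map image features to voxel responses is shown below. The closed-form solver, the regularization strength `alpha`, the array shapes, and the synthetic data are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y.

    X: (n_images, n_features) feature matrix.
    Y: (n_images, n_voxels) measured responses, one column per voxel.
    """
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)

def predict(X, W):
    """Predicted voxel responses for the images whose features are in X."""
    return X @ W

# Synthetic stand-in data (hypothetical sizes, not from the paper).
rng = np.random.default_rng(0)
X_train = rng.standard_normal((200, 10))
W_true = rng.standard_normal((10, 3))
Y_train = X_train @ W_true + 0.1 * rng.standard_normal((200, 3))

W_hat = fit_ridge(X_train, Y_train, alpha=1.0)
```

Each column of `W_hat` is the weight vector of one voxel, so all voxels are fit in a single solve. In practice `alpha` would be chosen by cross-validation.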

9 19-Category (19-Cat) Feature Representation A 19-dimensional binary vector indicating the presence (1) or absence (0) of each of 19 semantic image categories (e.g. furniture, vehicle, water)

10 ConvNet Feature Representation For each brain voxel, select one of the ConvNet layers as the ConvNet feature: AlexNet pretrained on ImageNet classification; 7 feature spaces (conv-1 to conv-5, fc-6 to fc-7); select the ConvNet layer that maximizes prediction accuracy of voxel activity
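The per-voxel layer selection described on this slide can be sketched as follows. The sketch assumes that one ridge model is fit per layer and that the layer is chosen by correlation on a held-out split; the slide does not say which data the selection uses. The function names, the two-layer toy data, and `alpha` are hypothetical.

```python
import numpy as np

def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression weights, shape (n_features, n_voxels)."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)

def columnwise_corr(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    denom = np.sqrt((A ** 2).sum(axis=0) * (B ** 2).sum(axis=0))
    return (A * B).sum(axis=0) / denom

def select_layer_per_voxel(train_feats, held_feats, Y_train, Y_held, alpha=1.0):
    """Fit one model per layer; return the best-scoring layer for each voxel."""
    names = list(train_feats)
    scores = np.stack([
        columnwise_corr(
            held_feats[n] @ fit_ridge(train_feats[n], Y_train, alpha), Y_held
        )
        for n in names
    ])  # shape: (n_layers, n_voxels)
    best = scores.argmax(axis=0)
    return [names[i] for i in best], scores

# Toy data: voxel 0 is driven by "conv-1" features, voxel 1 by "fc-7" features.
rng = np.random.default_rng(0)
feats = {"conv-1": rng.standard_normal((300, 8)),
         "fc-7": rng.standard_normal((300, 6))}
Y = np.column_stack([feats["conv-1"] @ rng.standard_normal(8),
                     feats["fc-7"] @ rng.standard_normal(6)])
Y += 0.1 * rng.standard_normal(Y.shape)

train = {n: f[:200] for n, f in feats.items()}
held = {n: f[200:] for n, f in feats.items()}
best_layers, scores = select_layer_per_voxel(train, held, Y[:200], Y[200:])
```

With real data, `train_feats` would hold the seven layer outputs (conv-1 to fc-7) for the training images.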

11 Fisher-Vector (FV) Feature Representation Patch features: extracted using SIFT descriptors. Prototypical patch features: a dictionary of 64 modes of patch features, learned using a Gaussian Mixture Model (GMM) on random natural image patches. FV features: reflect the difference between patch features and prototypical patch features.
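To make "the difference between patch features and prototypical patch features" concrete, the sketch below computes only the first-order part of a Fisher vector (the gradient with respect to the GMM means) for a diagonal-covariance GMM. It is a simplified illustration: full Fisher vectors usually also include second-order terms and normalization, the GMM parameters here are random placeholders rather than a learned dictionary, and the function name is hypothetical.

```python
import numpy as np

def fisher_vector_first_order(patches, weights, means, variances):
    """First-order Fisher vector for a diagonal-covariance GMM.

    patches:   (N, D) patch descriptors from one image.
    weights:   (K,)   mixture weights, summing to 1.
    means:     (K, D) component means (the "prototypical" patch features).
    variances: (K, D) per-dimension component variances.
    Returns a vector of length K * D.
    """
    diff = patches[:, None, :] - means[None, :, :]                 # (N, K, D)
    log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_prob                          # (N, K)
    log_post -= log_post.max(axis=1, keepdims=True)                # stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                        # soft assignments
    # Soft-assignment-weighted, variance-normalized deviation from each mean.
    grad = np.einsum("nk,nkd->kd", post, diff / np.sqrt(variances))
    grad /= patches.shape[0] * np.sqrt(weights)[:, None]
    return grad.ravel()

# Placeholder inputs: 64 modes as on the slide, 128-D descriptors as in SIFT.
rng = np.random.default_rng(0)
K, D = 64, 128
patches = rng.standard_normal((50, D))
fv = fisher_vector_first_order(
    patches, np.full(K, 1.0 / K), rng.standard_normal((K, D)), np.ones((K, D))
)
```

An image whose patches sit exactly on a mode contributes nothing for that mode, so the vector encodes how the image's patches deviate from the dictionary.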

12 Encoding model performance How accurately does each model predict brain activity? Measure the performance of feature spaces from computer vision (Fisher-Vector (FV) feature, ConvNet feature) compared to the previously studied 19-Category feature. Accuracy measure: correlation coefficient between predicted and measured responses. To avoid correlations that arise by chance, focus on significant values. Null distribution: correlations obtained by chance, from 1000 permutations of the validation-set responses. Take the (1 - p-value) quantile of the null distribution as the threshold for significant correlation.
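The permutation procedure above can be sketched for a single voxel as follows. The value `p_value=0.01` is a placeholder only, since the slides do not state the p-value used; the function name and the synthetic responses are likewise assumptions.

```python
import numpy as np

def permutation_threshold(pred, resp, n_perm=1000, p_value=0.01, seed=0):
    """Correlation threshold from a permutation null distribution.

    Shuffling the measured responses breaks any true relationship with the
    predictions, so the resulting correlations show what chance produces.
    p_value is a placeholder; the slides do not give the value used.
    """
    rng = np.random.default_rng(seed)
    null = np.array([
        np.corrcoef(pred, rng.permutation(resp))[0, 1] for _ in range(n_perm)
    ])
    return np.percentile(null, 100 * (1 - p_value))

# Synthetic validation data for one voxel (hypothetical).
rng = np.random.default_rng(1)
resp = rng.standard_normal(120)                 # measured responses
pred = resp + 0.5 * rng.standard_normal(120)    # model predictions

thr = permutation_threshold(pred, resp)
observed = np.corrcoef(pred, resp)[0, 1]
is_significant = observed > thr
```

A voxel whose observed correlation falls below `thr` would be treated as indistinguishable from noise.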

13 Encoding model performance

14 Encoding model performance In each plot: each point: prediction accuracy for a single voxel; x-axis: 19-cat accuracy; y-axis: computer vision feature accuracy; gray area: accuracy below significance threshold (p < ). What this means: Red dots: voxels where the computer vision feature outperforms the 19-cat model. Blue dots: voxels where the 19-cat model outperforms the computer vision feature. Gray dots: indistinguishable from noise, discarded.

15 Encoding model performance Plotted according to ROIs, as identified in earlier studies (figure panels: early visual areas, higher visual areas)

16 Encoding model performance Computer vision features clearly outperform 19-cat features in lower visual areas. These areas are known to be selective for structural information in natural images; 19-cat features do not carry any structural information.

17 Encoding model performance Computer vision features show performance comparable to 19-cat features in higher visual areas. These areas are believed to be involved in form processing and object segmentation. In general, the ConvNet feature performs better than the FV feature.

18 Encoding model performance FV feature vs ConvNet feature. Red: ConvNet feature outperforms FV feature. Blue: FV feature outperforms ConvNet feature. The ConvNet feature outperforms the FV feature in early and intermediate visual areas; the two are comparable in higher visual areas.

19 Investigating Voxel Tuning What we have shown: FV and ConvNet features can be used to predict brain activity in many visual areas. What we want to do: gain a better understanding of human visual representation by examining the FV and ConvNet models (in this paper, analysis using ConvNet only). How: visualize the top five images that activate/deactivate each voxel in different ROIs; cluster the model weights of voxels in the same ROI and visualize each cluster.

20 Voxel activation analysis One voxel from each ROI. The top five images that activate/deactivate each voxel, out of 170k images. Top five images that activate this voxel. Top five images that deactivate this voxel.
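One way to find such images is to apply a voxel's fitted weight vector to the features of every image in a large set and rank the predicted responses. The sketch below assumes a linear encoding model and uses a tiny made-up feature matrix in place of the 170k images; the function name is hypothetical.

```python
import numpy as np

def top_and_bottom_images(features, voxel_weights, k=5):
    """Indices of the k images with the highest and lowest predicted response.

    features:      (n_images, n_features) features of the candidate images.
    voxel_weights: (n_features,) encoding-model weights of one voxel.
    """
    predicted = features @ voxel_weights
    order = np.argsort(predicted)          # ascending predicted response
    return order[::-1][:k], order[:k]      # (most activating, most deactivating)

# Toy example: 20 "images" with a single feature equal to the image index.
features = np.arange(20, dtype=float).reshape(20, 1)
activating, deactivating = top_and_bottom_images(features, np.array([1.0]))
```

Because the ranking uses predicted rather than measured responses, the candidate images do not need to have been shown to the subject.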

21 Voxel activation analysis V1: Activity increases with high-frequency texture. Activity decreases with low-frequency texture. Top five images that activate this voxel. Top five images that deactivate this voxel.

22 Voxel activation analysis V4 (an area largely uncharacterized by previous studies): Activity increases with a blob in the center. Activity decreases with large-scale texture. Top five images that activate this voxel. Top five images that deactivate this voxel.

23 Voxel activation analysis EBA: Activity increases with people or animals. Activity decreases with scenes or texture. Top five images that activate this voxel. Top five images that deactivate this voxel.

24 Voxel activation analysis PPA: Activity increases with large scenes. Activity decreases with small, highly textured items. Top five images that activate this voxel. Top five images that deactivate this voxel.

25 Model weight clustering Investigate the fine-grained structure of an ROI. K-means clustering on the ConvNet model weights of EBA voxels with high prediction accuracy. Two subjects (to check reproducibility across people). Visualize the top five images that activate/deactivate the cluster-average model weights.
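The clustering step can be sketched with a plain NumPy implementation of Lloyd's k-means algorithm applied to per-voxel weight vectors. The number of clusters, the initialization, and the synthetic weights below are assumptions; the slides do not specify how k-means was configured.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's k-means. X: (n_voxels, n_features) model weights, one row per voxel."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every voxel to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its voxels; keep it if the cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic weights: two well-separated groups of 30 voxels (hypothetical).
rng = np.random.default_rng(0)
weights = np.vstack([0.5 * rng.standard_normal((30, 5)),
                     10.0 + 0.5 * rng.standard_normal((30, 5))])
labels, cluster_means = kmeans(weights, k=2)
```

Each row of `cluster_means` is a cluster-average weight vector, which can then be used to rank images in the same way as a single voxel's weights.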

26 Model weight clustering C1: Activity increases with groups of people in action. Activity decreases with rounded shapes (s1) or landscapes (s2). Top five images that activate this cluster. Top five images that deactivate this cluster.

27 Model weight clustering C2: Activity increases with a single person. Activity decreases with landscapes. Top five images that activate this cluster. Top five images that deactivate this cluster.

28 Model weight clustering Clusters are spatially coherent Clusters are present at corresponding anatomical location in both subjects

29 Conclusion Shows that features used in computer vision can be used to predict human brain activity. Proposes a new way to investigate visual representation in the human brain by analyzing ConvNet-based encoding models. The analysis results match previous findings about different ROIs. The analysis can be used to explore conventional ROIs in further detail.
