Reconstructing visual experiences from brain activity evoked by natural movies

Shinji Nishimoto, An T. Vu, Thomas Naselaris, Yuval Benjamini, Bin Yu, and Jack L. Gallant, Current Biology, 2011
Presented by Yi Gao, PhD candidate in Neuroscience, 11/19/2018

Brain activity
Functional magnetic resonance imaging (fMRI) measures brain activity by detecting changes associated with blood flow (the blood-oxygen-level-dependent, or BOLD, signal). Its unit of measurement is the voxel, a small volume of brain tissue. The activity of neurons in your brain changes constantly as you engage in different activities (video: https://vimeo.com/143768892, by Dr. Mark Lescroart).

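Overview
Part 1: Encoding (learn a function f(x) that predicts brain activity from a movie stimulus x)
Part 2: Decoding (use f(x) to infer an unknown movie stimulus from brain activity)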

Part 1: Encoding
BOLD signals were recorded from three human subjects while they viewed a series of color natural movies (20° × 20° at 15 Hz). Two separate data sets were obtained from each subject:
Training data: BOLD signals evoked by 7,200 s of color natural movies, used to fit a separate encoding model for each voxel located in posterior and ventral occipitotemporal visual cortex.
Test data: BOLD signals evoked by 540 s of color natural movies, used to assess the accuracy of the encoding models and as targets for movie reconstruction.

Motion-energy encoding model
A two-stage process: (1) motion-energy filters, followed by (2) temporal hemodynamic response filters.

Step 1: Motion-energy filters
All movie frames are spatially down-sampled to 96 × 96 pixels. The RGB pixel values are converted to CIE L*a*b* color space; luminance (L*) is kept and color information is discarded. The luminance patterns then pass through a bank of three-dimensional spatiotemporal Gabor wavelet filters.
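
A minimal NumPy sketch of this preprocessing follows. The slide does not say how frames were resized or which RGB standard the movies used, so the sketch assumes sRGB input scaled to [0, 1] with a D65 white point, and uses nearest-neighbor resampling. `preprocess_frames` and `srgb_to_cie_lightness` are hypothetical names, not the authors' code.

```python
import numpy as np


def srgb_to_cie_lightness(rgb):
    """Map sRGB values in [0, 1] with shape (..., 3) to CIE L* in [0, 100].

    Assumption: sRGB primaries, D65 white point, white luminance Yn = 1.
    Only L* is computed; the color channels (a*, b*) are discarded.
    """
    rgb = np.asarray(rgb, dtype=float)
    # Undo the sRGB transfer curve to get linear RGB.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Relative luminance Y (sRGB / Rec. 709 weights).
    y = linear @ np.array([0.2126, 0.7152, 0.0722])
    # CIE 1976 lightness L* from Y.
    delta = 6.0 / 29.0
    f = np.where(y > delta**3, np.cbrt(y), y / (3 * delta**2) + 4.0 / 29.0)
    return 116.0 * f - 16.0


def preprocess_frames(frames, size=96):
    """(T, H, W, 3) RGB movie -> (T, size, size) luminance (L*) movie.

    Assumption: nearest-neighbor resampling (the slide gives only the target size).
    """
    frames = np.asarray(frames, dtype=float)
    _, h, w, _ = frames.shape
    rows = np.linspace(0, h - 1, size).round().astype(int)
    cols = np.linspace(0, w - 1, size).round().astype(int)
    small = frames[:, rows][:, :, cols]  # (T, size, size, 3)
    return srgb_to_cie_lightness(small)
```

For example, `preprocess_frames(movie)` on a (T, H, W, 3) array returns a (T, 96, 96) array of L* values; a white frame maps to L* = 100 and a black frame to L* = 0.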

Gabor wavelet filters
Each filter is constructed by multiplying a three-dimensional spatiotemporal sinusoid (two dimensions for space, one for time) by a three-dimensional spatiotemporal Gaussian envelope. The bank contains 6,555 separate three-dimensional Gabor filters, at six spatial frequencies (0, 2, 4, 8, 16, and 32 cycles/image), three temporal frequencies (0, 2, and 4 Hz), and eight directions (0, 45, …, 315 degrees) (Nishimoto & Gallant, 2011).
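
The construction of a single filter can be sketched in NumPy as below. The envelope widths, filter size, and frame rate are illustrative assumptions; in a true wavelet bank the envelope would shrink as spatial frequency grows, and filters would also be tiled over image positions. The exact tiling that yields 6,555 filters is not reproduced here, and `gabor_3d` is a hypothetical helper.

```python
import numpy as np


def gabor_3d(size=96, n_frames=15, sf=4.0, tf=2.0, direction_deg=45.0,
             phase_deg=0.0, sigma=0.15, sigma_t=0.15, fps=15.0):
    """One spatiotemporal Gabor filter with shape (n_frames, size, size).

    sf: spatial frequency (cycles/image); tf: temporal frequency (Hz);
    direction_deg: direction of drift; sigma: spatial envelope width
    (fraction of the image); sigma_t: temporal envelope width (s).
    All default values are illustrative assumptions.
    """
    t = (np.arange(n_frames) - (n_frames - 1) / 2) / fps  # time (s), centered
    xy = np.linspace(-0.5, 0.5, size, endpoint=False)      # position (image units)
    tt, yy, xx = np.meshgrid(t, xy, xy, indexing="ij")
    theta = np.deg2rad(direction_deg)
    along = xx * np.cos(theta) + yy * np.sin(theta)        # position along drift axis
    # 3-D sinusoid (2 spatial dimensions + time); for sf, tf > 0 its crests drift along theta.
    carrier = np.cos(2 * np.pi * (sf * along - tf * tt) + np.deg2rad(phase_deg))
    # 3-D Gaussian envelope over space and time.
    envelope = np.exp(-(xx**2 + yy**2) / (2 * sigma**2) - tt**2 / (2 * sigma_t**2))
    return carrier * envelope
```

The default `size=96` matches the down-sampled frames, so even the 32 cycles/image filter stays below the 48 cycles/image sampling limit.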

Step 1: Motion-energy filters
Each filter occurs at two quadrature phases (0 and 90 degrees) to facilitate the motion-energy computation. The outputs of each quadrature pair of filters are squared and summed to yield local motion-energy measurements (Adelson & Bergen, 1985).
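
The quadrature-pair computation can be illustrated with a drifting grating. Because the 0° and 90° filters form a cosine/sine pair, the sum of their squared outputs is nearly independent of the grating's phase, while a grating drifting the opposite way produces almost no energy. The filter parameters and the helper names (`gabor_pair`, `drifting_grating`, `motion_energy`) are hypothetical choices for this sketch.

```python
import numpy as np


def _grid(size, n_frames, fps):
    t = (np.arange(n_frames) - (n_frames - 1) / 2) / fps  # time (s), centered
    xy = np.linspace(-0.5, 0.5, size, endpoint=False)      # position (image units)
    return np.meshgrid(t, xy, xy, indexing="ij")           # tt, yy, xx


def gabor_pair(sf=4.0, tf=2.0, direction_deg=0.0, size=32, n_frames=15,
               fps=15.0, sigma=0.15, sigma_t=0.15):
    """0- and 90-degree phase members of one spatiotemporal Gabor filter."""
    tt, yy, xx = _grid(size, n_frames, fps)
    theta = np.deg2rad(direction_deg)
    arg = 2 * np.pi * (sf * (xx * np.cos(theta) + yy * np.sin(theta)) - tf * tt)
    env = np.exp(-(xx**2 + yy**2) / (2 * sigma**2) - tt**2 / (2 * sigma_t**2))
    return env * np.cos(arg), env * np.sin(arg)


def drifting_grating(phase, sf=4.0, tf=2.0, direction_deg=0.0, size=32,
                     n_frames=15, fps=15.0):
    """Full-field sinusoidal grating drifting along direction_deg."""
    tt, yy, xx = _grid(size, n_frames, fps)
    theta = np.deg2rad(direction_deg)
    along = xx * np.cos(theta) + yy * np.sin(theta)
    return np.cos(2 * np.pi * (sf * along - tf * tt) + phase)


def motion_energy(stimulus, pair):
    """Square the output of each quadrature filter and sum the two."""
    even, odd = pair
    return np.sum(stimulus * even) ** 2 + np.sum(stimulus * odd) ** 2
```

Calling `motion_energy(drifting_grating(p), gabor_pair())` for several phases `p` gives nearly the same value each time, which is why the squared-and-summed pair is used as a phase-invariant measure of local motion.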

Step 1: Motion-energy filters
Compression: the motion-energy outputs are log-transformed to compress their dynamic range.

Step 1: Motion-energy filters
The outputs are temporally down-sampled from 15 Hz to the sampling rate used to measure BOLD signals (1 Hz), then z-scored (mean = 0, standard deviation = 1).
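
The compression, down-sampling, and z-scoring steps can be sketched as one NumPy function. The small offset inside the logarithm and the use of block averaging (15 frames per 1-s sample) are assumptions, since the slides specify neither; `compress_downsample_zscore` is a hypothetical name.

```python
import numpy as np


def compress_downsample_zscore(energy, movie_hz=15, bold_hz=1, eps=1e-6):
    """energy: (n_frames, n_filters) non-negative motion energy sampled at movie_hz.

    Returns an (n_samples, n_filters) array at bold_hz, z-scored per filter.
    """
    energy = np.asarray(energy, dtype=float)
    # 1. Log compression (eps avoids log(0); the offset is an assumption).
    compressed = np.log(energy + eps)
    # 2. Down-sample by averaging blocks of movie_hz // bold_hz frames
    #    (block averaging is an assumption; an incomplete final block is dropped).
    step = movie_hz // bold_hz
    n_out = compressed.shape[0] // step
    down = compressed[: n_out * step].reshape(n_out, step, -1).mean(axis=1)
    # 3. Z-score each filter channel: mean 0, standard deviation 1.
    return (down - down.mean(axis=0)) / down.std(axis=0)
```

With the default rates, 150 movie frames (10 s at 15 Hz) become 10 feature samples, one per BOLD measurement.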

Step 2: Hemodynamic response filters

Step 2: Hemodynamic response filters
To capture the temporal delay of BOLD signals, the model builds a stimulus vector s by concatenating the motion-energy-filtered stimulus vectors at several temporal delays. The temporal window of the hemodynamic response filters was 3-6 s before the BOLD signal (four time samples). The linear weights were obtained by L1-regularized least-squares regression.
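
A sketch of the delayed design matrix and an L1-regularized fit for one voxel follows. The slide does not name a solver or a penalty strength, so this swaps in plain proximal gradient descent (ISTA) with an arbitrary `lam`. `make_delayed` and `lasso_ista` are hypothetical helpers, and delays are counted in samples at 1 Hz (so 3-6 samples correspond to 3-6 s).

```python
import numpy as np


def make_delayed(features, delays=(3, 4, 5, 6)):
    """Concatenate copies of (n_t, n_f) features shifted by each delay (in samples).

    Row t holds features[t - d] for every d in `delays`, so the BOLD sample at
    time t is modeled from the stimulus 3-6 s earlier (at 1 Hz). Samples before
    the start of the movie are zero-filled. Assumes n_t > max(delays).
    """
    features = np.asarray(features, dtype=float)
    n_t, n_f = features.shape
    out = np.zeros((n_t, n_f * len(delays)))
    for i, d in enumerate(delays):
        out[d:, i * n_f:(i + 1) * n_f] = features[: n_t - d]
    return out


def lasso_ista(X, y, lam=0.1, n_iter=2000):
    """Minimize 0.5 * ||y - X w||^2 + lam * ||w||_1 by proximal gradient (ISTA)."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = w - step * (X.T @ (X @ w - y))  # gradient step on the squared error
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return w
```

The L1 penalty drives most weights to exactly zero, so each voxel ends up depending on a small subset of filter-by-delay features.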

Part 1: Encoding

Validity of motion-energy encoding model
Figure: retinotopic angle maps on a flattened cortical map, estimated using (top) the motion-energy encoding model and (bottom) conventional multi-focal mapping.

Movie identification accuracy
Can the model correctly associate an observed BOLD signal pattern with the specific stimulus that evoked it? For one subject, the motion-energy encoding model identified the specific movie stimulus that evoked an observed BOLD signal 95% of the time, and identification accuracy was >75% for all three subjects.
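
Identification can be sketched as a nearest-match search: predict the voxel pattern for each candidate stimulus, then pick the candidate whose prediction correlates best with the observed pattern. Using Pearson correlation across voxels as the match score is an assumption here, and `identify` is a hypothetical name.

```python
import numpy as np


def identify(observed, predicted):
    """Index of the best-matching candidate for each observed pattern.

    observed:  (n_samples, n_voxels) measured BOLD patterns
    predicted: (n_candidates, n_voxels) model-predicted patterns
    Match score (an assumption): Pearson correlation across voxels.
    """
    def normalize_rows(a):
        a = a - a.mean(axis=1, keepdims=True)
        return a / np.linalg.norm(a, axis=1, keepdims=True)

    corr = normalize_rows(observed) @ normalize_rows(predicted).T  # (n_samples, n_candidates)
    return corr.argmax(axis=1)
```

Identification accuracy is then the fraction of samples for which `identify` returns the index of the stimulus that was actually shown.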

Motion-energy encoding model
The model is valid and captures both temporal and spatial information, so it might provide good reconstructions of natural movies from brain activity measurements.

Part 2: Decoding

A Bayesian approach to reconstruct movies from evoked BOLD signals
Goal: use BOLD signals measured from a set of voxels to reconstruct an unknown stimulus (observed signals → unknown movie).
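
Written as Bayes' rule, with s a candidate movie clip and r the vector of BOLD responses across voxels, the idea is the following. The Gaussian form of the likelihood, with encoding-model prediction r̂(s) and noise covariance Σ, is an assumption for illustration rather than a detail given on the slide.

```latex
p(s \mid \mathbf{r}) = \frac{p(\mathbf{r} \mid s)\, p(s)}{p(\mathbf{r})} \propto p(\mathbf{r} \mid s)\, p(s)

p(\mathbf{r} \mid s) \propto \exp\!\left(-\tfrac{1}{2}\,\big(\mathbf{r} - \hat{\mathbf{r}}(s)\big)^{\top} \Sigma^{-1} \big(\mathbf{r} - \hat{\mathbf{r}}(s)\big)\right)
```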

A Bayesian approach to reconstruct movies from evoked BOLD signals
Prior: a sampled natural movie prior (~18 million 1-s movie clips from YouTube).
Posterior:
1. Each clip in the sampled prior is processed by the motion-energy encoding model fit to each voxel, giving a predicted BOLD signal.
2. The predicted signals are compared to the measured signals evoked by the unknown stimulus.
Posterior rank: the likelihood of the observed response given the clip. Because each sampled clip carries equal prior weight, ranking clips by posterior probability is the same as ranking them by this likelihood.
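
The ranking step can be sketched as follows, under the assumption (not stated on the slide) that voxel noise is independent and Gaussian with a known variance per voxel. `rank_clips` is a hypothetical helper; in practice `predicted` would come from running each prior clip through the fitted encoding models.

```python
import numpy as np


def rank_clips(observed, predicted, noise_var):
    """Rank prior clips by Gaussian log-likelihood of one observed response.

    observed:  (n_voxels,) measured BOLD response to the unknown movie
    predicted: (n_clips, n_voxels) encoding-model predictions for each prior clip
    noise_var: (n_voxels,) assumed independent noise variance per voxel
    With equal prior weight on every sampled clip, this is also the posterior rank.
    """
    resid = predicted - observed                            # broadcast over clips
    log_lik = -0.5 * np.sum(resid**2 / noise_var, axis=1)  # up to an additive constant
    order = np.argsort(log_lik)[::-1]                       # most likely clip first
    return order, log_lik[order]
```

Dropping the constant terms of the Gaussian log-density is safe here because they are identical for every clip and do not change the ranking.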

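Maximum a posteriori (MAP) and averaged high posterior (AHP) reconstructions
MAP: the single clip with the highest posterior probability. AHP: the average of the clips with the highest posterior probability.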
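
Given posterior scores for the prior clips, both reconstructions reduce to a sort. The number of clips averaged for the AHP reconstruction (`n_top`) is left as a parameter with an arbitrary default, since the slide does not give it; `map_and_ahp` is a hypothetical name.

```python
import numpy as np


def map_and_ahp(clips, log_post, n_top=10):
    """clips: (n_clips, T, H, W) candidate movies; log_post: (n_clips,) posterior scores.

    n_top (arbitrary default) is how many top-ranked clips the AHP averages.
    """
    order = np.argsort(log_post)[::-1]               # highest posterior first
    map_recon = clips[order[0]]                      # single most probable clip
    ahp_recon = clips[order[:n_top]].mean(axis=0)    # average of the top n_top clips
    return map_recon, ahp_recon
```

Averaging blurs details that differ between the top clips but keeps structure they share, which is the trade-off between the two reconstruction types.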

Quantifying reconstruction quality
Reconstruction accuracy was significantly higher than chance (p < 0.0001).
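
One way to test reconstruction accuracy against chance is a permutation test. The slide names neither the similarity measure nor the test, so this sketch assumes Pearson correlation between each original clip and its reconstruction, and builds the null distribution by shuffling which original each reconstruction is compared with. `pearson` and `permutation_test` are hypothetical names.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation between two arrays of equal size."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def permutation_test(originals, recons, n_perm=1000, seed=0):
    """originals, recons: (n_clips, ...) arrays paired by index.

    Returns the mean matched correlation and a one-sided permutation p-value.
    """
    rng = np.random.default_rng(seed)
    n = len(originals)
    observed = np.mean([pearson(originals[i], recons[i]) for i in range(n)])
    null = np.empty(n_perm)
    for k in range(n_perm):
        perm = rng.permutation(n)  # shuffle the original/reconstruction pairing
        null[k] = np.mean([pearson(originals[perm[i]], recons[i]) for i in range(n)])
    # The +1 terms keep the p-value from ever being exactly 0.
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p
```

With `n_perm` permutations the smallest attainable p-value is 1/(n_perm + 1), so resolving p < 0.0001 needs at least 10,000 permutations.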

Summary
The authors developed an encoding model that predicts BOLD signals in early visual areas with high accuracy. By using this model in a Bayesian framework, they were the first to reconstruct natural movies from human brain activity.