MultiVariate Bayesian (MVB) decoding of brain images

Similar documents
Multivariate analyses & decoding

Bayesian Inference in fmri Will Penny

MULTIVARIATE ANALYSES WITH fmri DATA

Karl Friston, Carlton Chu, Janaina Mourão-Miranda, Oliver Hulme, Geraint Rees, Will Penny, John Ashburner

Pattern Recognition for Neuroimaging Data

Multi-voxel pattern analysis: Decoding Mental States from fmri Activity Patterns

Extending the GLM. Outline. Mixed effects motivation Evaluating mixed effects methods Three methods. Conclusions. Overview

Analysis of Functional MRI Timeseries Data Using Signal Processing Techniques

Correction for multiple comparisons. Cyril Pernet, PhD SBIRC/SINAPSE University of Edinburgh

Multivariate Pattern Classification. Thomas Wolbers Space and Aging Laboratory Centre for Cognitive and Neural Systems

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

Supplementary Figure 1. Decoding results broken down for different ROIs

An Introduction To Automatic Tissue Classification Of Brain MRI. Colm Elliott Mar 2014

Data Visualisation in SPM: An introduction

Multivariate pattern classification

Introduction to Neuroimaging Janaina Mourao-Miranda

Session scaling; global mean scaling; block effect; mean intensity scaling

Data Visualisation in SPM: An introduction

Voxel selection algorithms for fmri

The organization of the human cerebral cortex estimated by intrinsic functional connectivity

Building Classifiers using Bayesian Networks

Representational similarity analysis. Dr Ian Charest,

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1

First-level fmri modeling

INTRODUCTION TO MACHINE LEARNING. Measuring model performance or error

Cognitive States Detection in fmri Data Analysis using incremental PCA

Data mining for neuroimaging data. John Ashburner

Neuroimaging and mathematical modelling Lesson 2: Voxel Based Morphometry

Decoding analyses. Neuroimaging: Pattern Analysis 2017

Graphical Models, Bayesian Method, Sampling, and Variational Inference

CSE 158. Web Mining and Recommender Systems. Midterm recap

Basic Introduction to Data Analysis. Block Design Demonstration. Robert Savoy

the PyHRF package P. Ciuciu1,2 and T. Vincent1,2 Methods meeting at Neurospin 1: CEA/NeuroSpin/LNAO

Interpreting predictive models in terms of anatomically labelled regions

Introductory Concepts for Voxel-Based Statistical Analysis

Statistical Analysis of Neuroimaging Data. Phebe Kemmer BIOS 516 Sept 24, 2015

Machine Learning in Biology

Understanding multivariate pattern analysis for neuroimaging applications

CS 229 Final Project Report Learning to Decode Cognitive States of Rat using Functional Magnetic Resonance Imaging Time Series

CHAPTER 2. Morphometry on rodent brains. A.E.H. Scheenstra J. Dijkstra L. van der Weerd

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

7/15/2016 ARE YOUR ANALYSES TOO WHY IS YOUR ANALYSIS PARAMETRIC? PARAMETRIC? That s not Normal!

Computational Neuroanatomy

Quick Manual for Sparse Logistic Regression ToolBox ver1.2.1

Effect of age and dementia on topology of brain functional networks. Paul McCarthy, Luba Benuskova, Liz Franz University of Otago, New Zealand

Methods for data preprocessing

Medical Image Analysis

Using Machine Learning to Optimize Storage Systems

ICA mixture models for image processing

Learning from High Dimensional fmri Data using Random Projections

08 An Introduction to Dense Continuous Robotic Mapping

Decoding the Human Motor Cortex

Neuroimage Processing

Tutorial MICCAI'04. fmri data analysis: state of the art and future challenges. Jean-Baptiste Poline Philippe Ciuciu Alexis Roche

Gene Clustering & Classification

Sparse & Redundant Representations and Their Applications in Signal and Image Processing

Feature Selection for fmri Classification

Missing Data Analysis for the Employee Dataset

Supplementary Information. Anne Maass, Hartmut Schütze, Oliver Speck, Andrew Yonelinas, Claus Tempelmann,

Topographic Mapping with fmri

Functional MRI data preprocessing. Cyril Pernet, PhD

Multi-Voxel Pattern Analysis (MVPA) for fmri

MR IMAGE SEGMENTATION

Attention modulates spatial priority maps in human occipital, parietal, and frontal cortex

x 1 SpaceNet: Multivariate brain decoding and segmentation Elvis DOHMATOB (Joint work with: M. EICKENBERG, B. THIRION, & G. VAROQUAUX) 75 x y=20

C. Poultney S. Cho pra (NYU Courant Institute) Y. LeCun

Factor Topographic Latent Source Analysis: Factor Analysis for Brain Images

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Quality Checking an fmri Group Result (art_groupcheck)

Lecture 25: Review I

Weka ( )

SPM8 for Basic and Clinical Investigators. Preprocessing. fmri Preprocessing

CS 229 Midterm Review

Linear Models in Medical Imaging. John Kornak MI square February 22, 2011

Journal of Articles in Support of The Null Hypothesis

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Machine Learning / Jan 27, 2010

Evaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München

Understanding multivariate pattern analysis for neuroimaging applications

ECG782: Multidimensional Digital Signal Processing

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1

Generalized Additive Models

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review

CS249: ADVANCED DATA MINING

Evaluating Classifiers

An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework

Basic fmri Design and Analysis. Preprocessing

Classification: Linear Discriminant Functions

Directional Tuning in Single Unit Activity from the Monkey Motor Cortex

Random projection for non-gaussian mixture models

Preprocessing II: Between Subjects John Ashburner

Toward a rigorous statistical framework for brain mapping

Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis

What is machine learning?

STATISTICS (STAT) Statistics (STAT) 1

Functional MRI in Clinical Research and Practice Preprocessing

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

BHMSMAfMRI User Manual

EPI Data Are Acquired Serially. EPI Data Are Acquired Serially 10/23/2011. Functional Connectivity Preprocessing. fmri Preprocessing

Transcription:

MultiVariate Bayesian (MVB) decoding of brain images Alexa Morcom Edinburgh SPM course 2015 With thanks to J. Daunizeau, K. Brodersen for slides

stimulus behaviour encoding of sensorial or cognitive state? experimental variables (design matrix) encoding neuronal measures (fmri data) X v decoding Y n, n v What if neuronal responses are distributed (over space)? Haxby et al., 2001

1 Introduction Overview of the talk 1.1 Lexicon 1.2 Decoding : so what? 1.3 Multivariate: so what? 1.4 Preliminary statistical considerations 2 Multivariate Bayesian decoding 2.1 From classical encoding to Bayesian decoding 2.2 Hierarchical priors on patterns 2.3 Probabilistic inference 3 Example 4 Summary

1 Introduction Overview of the talk 1.1 Lexicon 1.2 Decoding : so what? 1.3 Multivariate: so what? 1.4 Preliminary statistical considerations 2 Multivariate Bayesian decoding 2.1 From classical encoding to Bayesian decoding 2.2 Hierarchical priors on patterns 2.3 Probabilistic inference 3 Example 4 Summary

Lexicon the jargon to swallow 1 Encoding or decoding? An encoding model (or generative model) relates context (independent variable) to brain activity (dependent variable). X Y A decoding model (or recognition model) relates brain activity (independent variable) to context (dependent variable). Y X 2 Univariate or multivariate? In a univariate model, brain activity is the signal measured in one voxel. Y In a multivariate model, brain activity is the signal measured in many voxels (NB: decoding ill-posed problem). Y n, n v 3 Regression or classification? In a regression model, the dependent variable is continuous. In a classification model, the dependent variable is categorical (typically binary). X or X { 1, 1} n Y

Decoding : so what? The seminal approach: classification fmri time series 1 feature extraction e.g., voxels voxels trials classification accuracy e.g., leave-one-out 2 train: learn mapping Y X test: cross-validate Y train Y test X train A A B A B A A B A A A B A A A X test?

Decoding : so what? Reversing the X-Y mapping: target questions (1) X-Y mapping overall reliability (2) X-Y mapping spatial deployment Accuracy [%] 80% 100 % 50 % Left or right button? Truth or lie? Healthy or diseased? Classification task 55% (3) X-Y mapping temporal evolution (4) X-Y mapping: subtle issues Accuracy [%] 100 % Participant indicates decision functionally selective vs segregated representations degenerate (many-to-one) structure-function mappings 50 % Accuracy rises above chance Intra-trial time

Multivariate: so what? Well, we might need it. Multivariate approaches can reveal information jointly encoded by several voxels. This is because multivariate distance measures can take into account correlations among voxels

Multivariate: so what? Why we might need it: subvoxel processing. Multivariate approaches can exploit a sampling bias in voxelized images. Such subvoxel processing is unlikely to be detected by univariate methods. Boynton 2005 Nature Neuroscience

Preliminary statistical considerations lessons from the Neyman-Pearson lemma Do neuronal responses encode some sensorial or cognitive state of the subject? Null assumption: there is no dependency between Y and X So what? Well 0 : H p Y X p Y Neyman-Pearson lemma: the likelihood ratio (or Bayes factor) p Y X p X Y u p Y p X is the most powerful test of size a to test the null choose threshold u such that 1 2 All we have to do is compare a model that links Y to X with a model that does not. The link can be from X to Y or from Y to X. From the point of view of inferring a link exists, its direction is not important (but ).

Preliminary statistical considerations prediction and inference Some confusion about the roles of prediction and inference may arise from the use of classification accuracy to infer a significant relationship between X and Y. This is because «cross-validation» relies on the predictive density: Note: new new,, new new,, p X Y X Y p X Y p X Y d where are unknown parameters of the mapping Y X to check the «generalization error» of the inferred mapping. 1 2 The only situation that legitimately requires us to predict a new target is when we do not know it, e.g.: - brain-computer interface - automated diagnostic classification When used in the context of experimental neuroscience, standard classifiers provide suboptimal inference on the mapping Y X

Preliminary statistical considerations prediction and inference 1 2 * * Although MVB alone does not provide this

1 Introduction Overview of the talk 1.1 Lexicon 1.2 Decoding : so what? 1.3 Multivariate: so what? 1.4 Preliminary statistical considerations 2 Multivariate Bayesian decoding 2.1 From classical encoding to Bayesian decoding 2.2 Hierarchical priors on patterns 2.3 Probabilistic inference 3 Example 4 Summary

From classical encoding to Bayesian decoding MVB: inferring on the multivariate X-Y mapping Multivariate analyses in SPM are not implemented in terms of the classification schemes outlined in the previous section. Instead, SPM brings decoding into the conventional inference framework of hierarchical models and their inversion (c.f. Neyman-Pearson lemma). MVB can be used to address two questions: Overall significance of the X-Y mapping (as with classical SPM or classifiers) using probabilistic inference (model comparison, cross-validation) Inference on the form of the X-Y mapping (no other alternative), e.g. 1 1. Identify the spatial structure of the X-Y mapping (smooth, sparse, etc ) 2 2. Tell whether the X-Y mapping is degenerate (many-to-one regions-to-function). 3. Disambiguate between category-specific representations that are functionally selective (with overlap) and functionally segregated (without). 3

From classical encoding to Bayesian decoding reversing the standard GLM Encoding models X as a cause X: scalar psychological target variable Decoding models X as a consequence X Y: noisy measurements of A A: underlying neuronal activity in n voxels T: BOLD convolution of A/X A X A A X G: confounds (null space of c) scaled by unknown Y TA G Y TA G Y g ( ) : X Y TX G TX g ( ) :Y X X A Y G

Hierarchical priors on patterns spatial deployment of the X-Y mapping Decoding models are typically ill-posed: there is an infinite number of equally likely solutions. We therefore require constraints or priors to estimate the voxel weights. MVB specifies several alternative coding hypotheses in terms of empirical spatial priors on voxel weights. project onto spatial basis function set: U patterns cov( ) Ucov( ) U T null: sparse vectors: smooth vectors: compact vectors: support vectors: U U I U( x, x i UDV T j U RY ) T exp( RY T 1 2 xi x 2 2 ( j) ) (compact: previously singular, now SVD of the smooth vectors)

Hierarchical priors on patterns Expectation-Maximization and the greedy search In pattern space: A nested set of pattern weights with initially equal variance Select partitions with largest weights until no increase in free energy F Simplified EM algorithm (see Friston et al., 2008)

Probabilistic inference classical inference with cross-validation p-values from a standard leave-one-out scheme can t be used for inference (train and test data are not independent) Recall compact form for the decoding model: WX RY W RT R I GG target variable weighting matrix: temporal convolution + confounds removal residual forming matrix: confounds removal Use train/test k-fold data features that are linearly independent: ˆ ( k ) Y( k ) train (identify mapping) test (measure generalization error) Y R Y ( k) ( k) R( k ) I G( k ) G( k ) ( k ) ( k ) R I G G ( k ) ( k ) ( k ) G G I ( k ) Xˆ WX? Xˆ ( k ) ˆ ( k ) R( k ) Y ( k ) G( k ) G I I

1 Introduction Overview of the talk 1.1 Lexicon 1.2 Decoding : so what? 1.3 Multivariate: so what? 1.4 Preliminary statistical considerations 2 Multivariate Bayesian decoding 2.1 From classical encoding to Bayesian decoding 2.2 Hierarchical priors on patterns 2.3 Probabilistic inference 3 Example 4 Summary

Example finger tapping dataset SPM{T} : left > right fixation cross + pace (right) >> record response SPM{T} : right > left 400 events (100 left, 100 right, 100 left & right, 100 null) average ITI = 2 sec block design (10 trials/block) TR = 1.3 sec

Example MVB in SPM: decoding within a search volume target: TX Xc design matrix contrast confounds: G X I cc Gc 0 (confounds = null space of the contrast)

Example predicted responses from left & right motor cortices MVB-based predictions closely match the observed responses. But crucially, they don t perfectly match them. Perfect match would indicate overfitting. target and (MVB) predicted responses over scans target versus (MVB) predicted responses

Example pattern sparsity The highest model evidence is achieved by a model that recruits 7 partitions. The weights attributed to each voxel in the sphere are sparse and bidirectional. This suggests sparse coding (in pattern space) log-evidence of partitions (relative to null) distribution of estimated weights

Example model comparison illustration The best model corresponds to a sparse representation of motion ; as one would expect from functional segregation. Better evidence for spatially sparse than smooth (clustered) coding log-evidence of X-Y bilateral mappings: effect of spatial deployment Sparse spatial model also better than compact (SVD on smooth model) or support models see also Morcom & Friston (2012), Neuroimage

Example model comparison illustration Model comparison between regions (given optimal sparse model of spatial deployment) log-evidence of X-Y sparse mappings: effect of lateralization The right-lateralized model is better than the left-lateralized model The bilateral model is better than the right-lateralized model Consistent with non-redundant (joint) coding of finger tapping

Example cross-validation : k-fold scheme k = 8 p-value < 0.0001 classification accuracy = 65.8% R-squared = 20.7% absolute correlation among k-fold feature weights test predictions versus test k-fold features maximum intensity projection: k P 0 Y( k )

Example episodic memory, MTL Distinct spatial patterns for overlapping memories only in hippocampus CA3, consistent w/ pattern separation Overlapping episodic memories Same events, different contexts (scenes) Use MVB to fit sparse patterns, then measure their correlation over voxels in MTL sub-regions A form of representational similarity analysis Chadwick, Bonnici, Maguire (2014, PNAS)

1 Introduction Overview of the talk 1.1 Lexicon 1.2 Decoding : so what? 1.3 Multivariate: so what? 1.4 Preliminary statistical considerations 2 Multivariate Bayesian decoding 2.1 From classical encoding to Bayesian decoding 2.2 Hierarchical priors on patterns 2.3 Probabilistic inference 3 Example 4 Summary

Summary 11. Inference on the form of the X-Y mapping rests on model comparison, using the marginal likelihood of competing models. The marginal likelihood derives from the specification of a generative model prescribing the form of the joint density over observations (X,Y) and model parameters (). 22. Multivariate models can map from experimental variables (X) to brain responses (Y) or from Y to X. In the latter case (i.e., decoding), identifying the mapping is an ill-posed problem, which is resolved with appropriate constraints or priors on model parameters. These constraints are part of the model and can be evaluated using model comparison. 33. Cross-validation is not necessary for decoding brain activity but generalization error is a proxy for testing whether the observed X-Y mapping is unlikely to have occured by chance. This can be useful when the null distribution of the likelihood ratio (i.e. Bayes factor) is not evaluated easily.

References Chadwick, M. J., Bonnici, H. M. and Maguire, E. a. (2014) CA3 size predicts the precision of memory recall, Proceedings of the National Academy of Sciences, 111(29), pp. 10720 5. doi: 10.1073/pnas.1319641111. Friston, K., Chu, C., Mouruo-Miranda, J., Hulme, O., Rees, G., Penny, W. and Ashburner, J. (2008) Bayesian decoding of brain images, NeuroImage, 39(1), pp. 181 205. doi: 10.1016/j.neuroimage.2007.08.013. Friston, K., Mattout, J., Trujillo-Barreto, N., Ashburner, J. and Penny, W. (2007) Variational free energy and the Laplace approximation, NeuroImage, 34(1), pp. 220 234. doi: 10.1016/j.neuroimage.2006.08.035. Morcom, A. M. and Friston, K. J. (2012) Decoding episodic memory in ageing: A Bayesian analysis of activity patterns predicting memory, NeuroImage, 59(2). doi: 10.1016/j.neuroimage.2011.08.071.