Pattern Recognition for Neuroimaging Data

Pattern Recognition for Neuroimaging Data. Edinburgh SPM course, April 2013. C. Phillips, Cyclotron Research Centre, ULg, Belgium. http://www.cyclotron.ulg.ac.be

Overview: Introduction; Univariate & multivariate approaches; Data representation; Pattern recognition & machine learning; Validation & inference; Weight maps & feature selection; fMRI application; Multiclass problem; Conclusion & PRoNTo

Introduction: an fMRI time series is a 4D image, i.e. a time series of 3D fMRI volumes, or equivalently a 3D array of voxel time series.

Univariate vs. multivariate. Standard univariate approach (SPM), i.e. standard statistical analysis (encoding). Input: the BOLD signal time series at each voxel; voxel-wise GLM model estimation; independent statistical test at each voxel; correction for multiple comparisons. Output: a univariate statistical parametric map. Find the mapping g from the explanatory variables X to the observed data Y, g: X → Y.

Univariate vs. multivariate. Multivariate approach, a.k.a. pattern recognition. Input: volumes from task 1 and volumes from task 2 (training phase). Output: the classifier's weights, i.e. a discrimination map. In the test phase, a new example is given to the classifier, which predicts task 1 or task 2. Find the mapping h from the observed data Y to the explanatory variables X, h: Y → X.

Neuroimaging data as feature vectors or data points. Data dimensions: each 3D brain image is one data point; the dimensionality of a data point = the number of voxels considered; the number of data points = the number of scans/images considered. Note that #voxels >> #scans, so the problem is ill-posed.

Advantages of pattern recognition: it accounts for the spatial correlation of the data (multivariate aspect), since images are multivariate by nature; it can yield greater sensitivity than conventional (univariate) analysis; and it enables classification/prediction for individual subjects, e.g. mind-reading or decoding applications and clinical applications (Haynes & Rees, 2006).

Pattern recognition framework. Input (brain scans) X1, X2, X3, ...; output (control/patient) y1, y2, y3, ... No mathematical model of the mapping is available, so we use machine-learning methodology: computer-based procedures that learn a function from a series of examples. Training examples: (X1, y1), ..., (Xs, ys). Learning/training phase: generate a function or classifier f such that f(Xi) ≈ yi. Testing phase: for a test example X*, output the prediction f(X*).

Overview: Introduction; Univariate & multivariate approaches; Data representation; Pattern recognition & machine learning; Validation & inference; Weight maps & feature selection; fMRI application; Multiclass problem; Conclusion & PRoNTo

Classification example. [Figure: brain volumes acquired at times t1 to t4, labelled alternately task 1 and task 2, plus one volume with an unknown label; each volume is plotted as a point in a 2D feature space (voxel 1 vs. voxel 2), and a hyperplane with weight vector w separates the two classes.] Different classifiers will compute different hyperplanes! Note: task 1 / task 2 could just as well be disease / control.

Neuroimaging data: a feature vector or data point = one brain volume. Problem: 1000s of features vs. 10s of data points. Possible solutions to the dimensionality problem: feature selection strategies (e.g. ROIs, selecting only activated voxels, searchlight) and kernel methods.

Kernel approaches: a mathematical trick! A powerful and unified framework (e.g. for classification & regression). They consist of two parts: build the kernel matrix (the mapping into the feature space), then train using the kernel matrix (with algorithms designed to discover linear patterns in the feature space). Advantages: a computational shortcut that represents linear patterns efficiently in high-dimensional spaces, and the dual representation with proper regularization allows efficient solution of ill-conditioned problems. Examples: Support Vector Machine (SVM), Gaussian Processes (GP), Kernel Ridge Regression (KRR), ...

Kernel matrix = similarity measure. Example: brain scan 2 = [4, 1] and brain scan 4 = [-2, 3] give the dot product (4 × -2) + (1 × 3) = -5. The kernel function maps two patterns x and x* to a real number characterizing their similarity (~ a distance measure). The simplest similarity measure is the dot product, i.e. the linear kernel.
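
As an illustration, here is a minimal NumPy sketch (not PRoNTo code; the first and third scans use made-up values) that builds a linear kernel matrix from flattened brain volumes and reproduces the dot product above.

```python
import numpy as np

# Each row is one flattened brain volume; real scans would have ~10^5 voxels.
X = np.array([[1.0, 2.0],    # brain scan 1 (made-up values)
              [4.0, 1.0],    # brain scan 2 (values from the slide)
              [3.0, 0.0],    # brain scan 3 (made-up values)
              [-2.0, 3.0]])  # brain scan 4 (values from the slide)

K = X @ X.T          # linear kernel matrix: K[i, j] = x_i . x_j
print(K[1, 3])       # similarity of scans 2 and 4 -> -5.0
```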

Linear classifier: a hyperplane through the feature space, parameterized by a weight vector w and a bias term b. The weight vector w is a linear combination of the training examples x_i (where i = 1, ..., N and N is the number of training examples): w = Σ_i α_i x_i. Find the α_i!

Linear classifier prediction. General equation for making a prediction for a test example x* with kernel methods: f(x*) = w · x* + b (primal representation). Substituting w = Σ_i α_i x_i and the kernel definition gives f(x*) = Σ_i α_i (x_i · x*) + b = Σ_i α_i K(x_i, x*) + b (dual representation). f(x*) is the signed distance to the boundary (classification) or the predicted score (regression).
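
A minimal NumPy sketch (assumed toy data and coefficients, not PRoNTo code) showing that, for a linear kernel, the primal and dual predictions coincide because w = Σ_i α_i x_i:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 10))        # 6 training "scans", 10 voxels each (toy data)
alpha = rng.normal(size=6)          # dual coefficients, assumed already learned
b = 0.2
x_star = rng.normal(size=10)        # test example

w = alpha @ X                       # primal weights: w = sum_i alpha_i x_i
f_primal = w @ x_star + b           # f(x*) = w . x* + b
f_dual = alpha @ (X @ x_star) + b   # f(x*) = sum_i alpha_i K(x_i, x*) + b (linear kernel)
assert np.isclose(f_primal, f_dual)
```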

Support Vector Machine: SVM = the maximum-margin classifier. Data: <x_i, y_i>, i = 1, ..., N, with observations x_i ∈ R^d and labels y_i ∈ {-1, +1}. The decision function (w · x + b) is negative on one side of the boundary and positive on the other, with the margin defined by the hyperplanes (w · x_i + b) = -1 and (w · x_i + b) = +1. As before, w = Σ_i α_i x_i, and the support vectors are the training examples with α_i ≠ 0.
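
A minimal scikit-learn sketch (toy 2-voxel data, not PRoNTo code) that fits a linear SVM and exposes the quantities above: the weight vector w, the bias b and the support vectors (the examples with α_i ≠ 0):

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-voxel patterns: three examples of task 1 and three of task 2.
X = np.array([[1.0, 4.0], [2.0, 3.0], [3.0, 2.5],
              [0.5, 0.3], [1.5, 1.0], [2.0, 1.0]])
y = np.array([+1, +1, +1, -1, -1, -1])

svc = SVC(kernel='linear', C=1.0).fit(X, y)
w, b = svc.coef_[0], svc.intercept_[0]        # weight vector and bias
print(w, b, svc.support_)                     # support_: indices with alpha_i != 0
print(svc.decision_function([[2.0, 2.0]]))    # signed distance-like score f(x*)
```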

Illustrative example: classifiers as decision functions. Training: examples of class 1 and examples of class 2 (each a pattern over voxel 1 and voxel 2) yield a weight vector, or discrimination map, with w_1 = +5 and w_2 = -3. Testing: a new example v_1 = 0.5, v_2 = 0.8 gives f(x) = (w_1·v_1 + w_2·v_2) + b = (+5 × 0.5 - 3 × 0.8) + 0 = 0.1; the value is positive, so the prediction is class 1.

SVM vs. GP. SVM: hard binary classification; simple, efficient and quick to compute, but there is no grading in the output {-1, +1}. Gaussian Processes: a probabilistic model; more complicated and slower to compute, but it returns a probability in [0, 1] and can be multiclass.

Overview: Introduction; Univariate & multivariate approaches; Data representation; Pattern recognition & machine learning; Validation & inference; Weight maps & feature selection; fMRI application; Multiclass problem; Conclusion & PRoNTo

Validation principle. [Figure: a data matrix of n samples × m variables (var 1, ..., var m), each sample with a label ±1; samples 1 to i form the training set used to train the classifier, and samples i+1 to n form the test set, whose predicted labels are compared with the true labels to evaluate accuracy.]

M-fold cross-validation: split the data into two sets, train & test, and evaluate on one fold; rotate the partition and repeat so as to obtain evaluations on all M folds. This applies to scans/events/blocks/subjects/... When each fold holds a single sample, this is the leave-one-out (LOO) approach.
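
A minimal scikit-learn sketch (assumed toy data, not PRoNTo code) of 5-fold and leave-one-out cross-validation of a linear SVM:

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 500))      # 40 scans, 500 voxels (toy data)
y = np.repeat([0, 1], 20)           # 20 scans per class

cv = KFold(n_splits=5, shuffle=True, random_state=0)       # M = 5 folds
acc = cross_val_score(SVC(kernel='linear'), X, y, cv=cv)
print(acc.mean())                   # mean accuracy across the 5 folds

loo = cross_val_score(SVC(kernel='linear'), X, y, cv=LeaveOneOut())
print(loo.mean())                   # leave-one-out: one fold per scan
```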

Confusion matrix & accuracy. The confusion matrix is a summary table; for two classes, call its cells A (class 0 predicted as 0), B (class 0 predicted as 1), C (class 1 predicted as 0) and D (class 1 predicted as 1). Accuracy estimation: class 0 accuracy p_0 = A/(A+B); class 1 accuracy p_1 = D/(C+D); overall accuracy p = (A+D)/(A+B+C+D). Other criteria: Positive Predictive Value, PPV = D/(B+D); Negative Predictive Value, NPV = A/(A+C).

Accuracy & dataset balance. Watch out if the numbers of samples per class differ! Example: a good overall accuracy (72%) can hide an excellent accuracy (90%) on the majority class (N_1 = 80) and a poor accuracy (0%) on the minority class (N_2 = 20). Good practice: report the class accuracies [p_1, ..., p_C] and the balanced accuracy p_bal = (p_1 + ... + p_C)/C.
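
A minimal NumPy sketch reproducing the slide's example (the confusion-matrix layout is an assumption consistent with the stated class sizes and accuracies):

```python
import numpy as np

#                 predicted 0  predicted 1
conf = np.array([[72,          8],    # true class 0: majority, N1 = 80, 90% correct
                 [20,          0]])   # true class 1: minority, N2 = 20, 0% correct

per_class = conf.diagonal() / conf.sum(axis=1)   # [0.90, 0.00]
overall   = conf.diagonal().sum() / conf.sum()   # 0.72: looks good but is misleading
balanced  = per_class.mean()                     # 0.45: reveals the imbalance
print(per_class, overall, balanced)
```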

Regression: out-of-sample mean squared error (MSE). The LOO error in one fold is (y_i - ŷ_i)²; across all LOO folds, the out-of-sample MSE is the average (1/n) Σ_i (y_i - ŷ_i)². Other measure: the correlation between the predictions (pooled across folds!) and the true targets.
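
A minimal NumPy sketch (made-up targets and cross-validated predictions) computing both measures:

```python
import numpy as np

y_true = np.array([23.0, 31.0, 27.0, 35.0, 29.0])  # true targets (made-up values)
y_pred = np.array([25.0, 28.0, 30.0, 33.0, 27.0])  # one CV prediction per left-out sample

mse = np.mean((y_true - y_pred) ** 2)              # out-of-sample MSE
r = np.corrcoef(y_true, y_pred)[0, 1]              # correlation across folds
print(mse, r)
```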

Inference by permutation testing. H0: the class labels are non-informative. Test statistic: the CV accuracy. Estimate the distribution of the test statistic under H0: randomly permute the labels, estimate the CV accuracy, and repeat M times. Calculate the p-value as the proportion of permuted accuracies that are at least as high as the observed accuracy.
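
A minimal scikit-learn sketch (toy data, not PRoNTo code; scikit-learn also ships permutation_test_score for the same purpose). The +1 terms are a common convention that keeps the p-value estimate valid even when no permutation beats the observed accuracy:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))                 # toy data: 40 scans, 200 voxels
y = np.repeat([0, 1], 20)

clf = SVC(kernel='linear')
obs_acc = cross_val_score(clf, X, y, cv=5).mean()

M = 100                                        # number of permutations (toy value)
perm_acc = np.array([cross_val_score(clf, X, rng.permutation(y), cv=5).mean()
                     for _ in range(M)])
p_value = (np.sum(perm_acc >= obs_acc) + 1) / (M + 1)
print(obs_acc, p_value)
```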

Overview: Introduction; Univariate & multivariate approaches; Data representation; Pattern recognition & machine learning; Validation & inference; Weight maps & feature selection; fMRI application; Multiclass problem; Conclusion & PRoNTo

Weight vector interpretation: the weight (or discrimination) image shows how important each voxel is and for which class it votes (with mean-centred data and b = 0). [Figure: a 2D example with axes voxel 1 and voxel 2, training points for task 1 and task 2, the separating hyperplane H, and the weight vector w = [0.45, 0.89] with b = -2.8.]

Example of masks, linear machine and weight map.

Feature selection. One sample image gives one predicted value, using ALL the voxels; NO thresholding of the weights is allowed! Feature selection options: an a priori mask, a priori filtering, or recursive feature elimination/addition with nested cross-validation (which MUST be independent from the test data!).
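
A minimal scikit-learn sketch (toy data, not PRoNTo code) where voxel selection is wrapped in a Pipeline, so it is re-fitted on each training fold and never sees the test data:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2000))        # 40 scans, 2000 voxels (toy data)
y = np.repeat([0, 1], 20)

# Keep the 200 most discriminative voxels, chosen on the training folds only.
pipe = make_pipeline(SelectKBest(f_classif, k=200), SVC(kernel='linear'))
acc = cross_val_score(pipe, X, y, cv=5)
print(acc.mean())
```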

Overview: Introduction; Univariate & multivariate approaches; Data representation; Pattern recognition & machine learning; Validation & inference; Weight maps & feature selection; fMRI application; Multiclass problem; Conclusion & PRoNTo

Level of inference for fMRI designs: within subject (cf. FFX with SPM), decode the subject's brain state; between subjects (cf. RFX with SPM), classify groups or regress a subject-level parameter.

Design: between subjects. Either 2 groups (group A vs. group B) or 1 group with 2 conditions per subject. Extract 1 (or 2) summary image(s) per subject, and classify. Leave-one-out (LOO) cross-validation: leave one subject out (LOSO), or leave one subject per group out (LOSGO). Note: this works for any type of image.
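
A minimal scikit-learn sketch (toy data, not PRoNTo code) of leave-one-subject-out cross-validation for the 1-group / 2-conditions design, using a subject-id grouping so both images of a subject stay on the same side of the split:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_subjects = 10
X = rng.normal(size=(2 * n_subjects, 1000))     # 2 summary images per subject (toy data)
y = np.tile([0, 1], n_subjects)                 # condition 1 vs. condition 2
subjects = np.repeat(np.arange(n_subjects), 2)  # subject id for each image

acc = cross_val_score(SVC(kernel='linear'), X, y,
                      groups=subjects, cv=LeaveOneGroupOut())   # LOSO
print(acc.mean())
```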

Design: within subject, block or event-related design, accounting for the haemodynamic response. Option 1: use single scans. [Figure: block sequence C1 C1 C1 / BL BL BL / C2 C2 C2 / BL BL BL; the data matrix has one row per single volume and one column per voxel.]

Design: within subject, block or event-related design, accounting for the haemodynamic response. Option 2: averaging/deconvolution. [Figure: same block sequence C1 C1 C1 / BL BL BL / C2 C2 C2 / BL BL BL; the data matrix has one row per block, i.e. the mean of its volumes or a beta image, and one column per voxel.] How to: average the scans over blocks/events, or take the parameter estimates from a GLM with one regressor per block/event.
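
A minimal NumPy sketch (made-up design, not PRoNTo code) of the averaging option, collapsing single scans into one pattern per block:

```python
import numpy as np

rng = np.random.default_rng(0)
scans = rng.normal(size=(60, 1000))       # 60 single scans, 1000 voxels (toy data)
block_id = np.repeat(np.arange(6), 10)    # 6 blocks of 10 scans each (made-up design)

block_means = np.stack([scans[block_id == b].mean(axis=0)
                        for b in np.unique(block_id)])
print(block_means.shape)                  # (6, 1000): one averaged pattern per block
```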

Design: within subject, block or event-related design, accounting for the haemodynamic response. Leave-one-out (LOO) cross-validation: leave one session/run out, or leave one block/event out (danger of dependent data!).

Overview: Introduction; Univariate & multivariate approaches; Data representation; Pattern recognition & machine learning; Validation & inference; Weight maps & feature selection; fMRI application; Multiclass problem; Conclusion & PRoNTo

Multiclass problem. Options: a genuinely multiclass machine (C1 / C2 / C3), binary machines combined one-vs.-others, or binary machines combined one-vs.-one. Error-Correcting Output Coding (ECOC) approach with SVM codewords (L = distance between each class's codeword and the example's binary outputs):

          C1-C2   C1-C3   C2-C3   L
 C1         1       1       0     3
 C2        -1       0       1     2
 C3         0      -1      -1     1
 Example   -1      -1      -1

The example's outputs (-1, -1, -1) are closest to the codeword of C3, so the predicted class is C3.
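
A minimal scikit-learn sketch (toy data, not PRoNTo code) of the three strategies for turning binary SVMs into a 3-class decision, including an ECOC-style scheme:

```python
import numpy as np
from sklearn.multiclass import (OneVsOneClassifier, OneVsRestClassifier,
                                OutputCodeClassifier)
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 100))            # toy data: 60 scans, 100 voxels
y = np.repeat([0, 1, 2], 20)              # three classes C1, C2, C3

for Wrapper in (OneVsOneClassifier, OneVsRestClassifier, OutputCodeClassifier):
    clf = Wrapper(SVC(kernel='linear')).fit(X, y)   # combine binary machines
    print(Wrapper.__name__, clf.predict(X[:3]))
```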

Overview: Introduction; Univariate & multivariate approaches; Data representation; Pattern recognition & machine learning; Validation & inference; Weight maps & feature selection; fMRI application; Multiclass problem; Conclusion & PRoNTo

Conclusions, key points: more sensitivity (~ like an omnibus test with SPM) but NO local (voxel/blob) inference, so you CANNOT report coordinates or a thresholded weight map; cross-validation (a split into train/test sets) is required, and accuracy/PPV (or MSE) should be reported; the significance of the accuracy MUST be assessed, e.g. with a permutation approach.

PRoNTo Pattern Recognition for Neuroimaging Toolbox, aka. PRoNTo : http://www.mlnl.cs.ucl.ac.uk/pronto/ with references, manual, demo data, course, etc. Paper: http://dx.doi.org/10.1007/s12021-013-9178-1

Thank you for your attention! Any questions? Thanks to the PRoNTo team for the borrowed slides.