Introductory Concepts for Voxel-Based Statistical Analysis

John Kornak
University of California, San Francisco
Department of Radiology and Biomedical Imaging
Department of Epidemiology and Biostatistics
ADVANCED STATISTICAL CONCEPTS FOR MULTIMODAL MRI: THEORY AND APPLICATIONS
June 19, 2010

Acknowledgement / Disclaimer Many of the slides in this lecture have been adapted from presentations available on the SPM web site.

Overview 1. Motivation 2. Linear modeling 3. Multiple comparison correction 4. Multivariate methods

Motivation
Imaging data call for statistical methods to look for regional effects
Tissue differences between groups or over time: VBM, TBM (voxel/tensor-based morphometry)
PET (positron emission tomography), fMRI (functional MRI): determine activation in the brain due to thought, stimulus or task
Diffusion (DWI, DTI, tractography), bone mineral density, etc.

[Figure: analysis pipeline. Image data -> realignment & motion correction -> normalisation (anatomical reference) -> smoothing (kernel) -> linear model (design matrix) -> model fitting -> parameter estimates and statistic image -> Statistical Parametric Map (test statistics) -> thresholding & multiple comparisons -> corrected thresholds & p-values]

Software
SPM: PET, fMRI, VBM and TBM, EEG/MEG (http://www.fil.ion.ucl.ac.uk/spm/; needs Matlab)
FSL: primarily fMRI, plus DTI (http://www.fmrib.ox.ac.uk/fsl/)
R: AnalyzeFMRI package, plus linear models in general (http://www.r-project.org/, then go to your nearest CRAN mirror)

Part 1 Linear Modeling

Definitions
Univariate response variable: $y_{ij}$, for subject $i$ at voxel $j$
Covariates: $x_i^T = (x_{i1}, x_{i2}, \ldots, x_{ik})$ (variables of interest and nuisance variables)
Continuous covariates: e.g. age, blood pressure, stimulus intensity, etc. (random or controlled)
Factors: e.g. diagnosis, gender, experimental condition, drinking level (low, medium, high), etc.
Complete data: $\{y_{ij}, x_i^T;\ i = 1, \ldots, n;\ j = 1, \ldots, m\}$ ($n$ subjects, $m$ voxels)

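The (General) Linear Model
The linear model is applied at each individual voxel. With $k$ covariates and $x_{i1} = 1$ for the intercept, it takes the form:
$$y_{ij} = \beta_{1j} + x_{i2}\beta_{2j} + x_{i3}\beta_{3j} + \cdots + x_{ik}\beta_{kj} + \varepsilon_{ij}$$
e.g.
$$y_{ij} = \beta_{\text{mean},j} + x_{i,\text{age}}\beta_{\text{age},j} + x_{i,\text{gender}}\beta_{\text{gender},j} + \cdots + x_{i,\text{diagnosis}}\beta_{\text{diagnosis},j} + \varepsilon_{ij}$$
$$\varepsilon_{ij} \sim N(0, \sigma^2)\ \text{i.i.d.}, \quad i = 1, \ldots, n, \quad j = 1, \ldots, m$$
i.i.d. = independently and identically distributed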

E.g. Hippocampal Volume
HCV ~ Age + Diagnosis + Age*Diagnosis
Diagnosis can be normal control (NC) or Alzheimer's disease (AD)

E.g. Hippocampal Volume
Structural T1-weighted MRIs
Volume measures at each voxel in the hippocampus (HC) (voxel- or deformation-based morphometry)
Volume measure = response for each subject
Disease status encoded as 1 for AD and 0 for NC (the $x_{i,\text{diag.}}$ term)

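Hippocampal volume model at voxel $j$, shown for four cases of the coefficients:
$$y_{ij} = \beta_{1j} + x_{i,\text{age}}\beta_{\text{age},j} + x_{i,\text{diag.}}\beta_{\text{diag.},j} + x_{i,\text{age}}\,x_{i,\text{diag.}}\beta_{\text{inter},j} + \varepsilon_{ij}$$
Case 1: conditions on $\beta_{\text{age},j}$, $\beta_{\text{diag.},j}$, $\beta_{\text{inter},j}$ relative to 0. [Figure: HCV against age]
Case 2: conditions on $\beta_{\text{age},j}$, $\beta_{\text{diag.},j}$, $\beta_{\text{inter},j}$ relative to 0. [Figure: HCV against age]
Case 3: conditions on $\beta_{\text{age},j}$, $\beta_{\text{diag.},j}$, $\beta_{\text{inter},j}$ relative to 0. [Figure: HCV against age, separate lines for NC and AD]
Case 4: conditions on $\beta_{\text{age},j}$, $\beta_{\text{diag.},j}$, $\beta_{\text{inter},j}$ relative to 0. [Figure: HCV against age, separate lines for NC and AD]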
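The model HCV ~ Age + Diagnosis + Age*Diagnosis can be sketched for a single voxel as below. This is only an illustration: the data are simulated, and the sample size, coefficient values and noise level are assumptions chosen for the example, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data for one voxel (all numbers are illustrative assumptions).
n = 200
age = rng.uniform(60, 90, size=n)
diag = rng.integers(0, 2, size=n)            # 1 = AD, 0 = NC

# Coefficients used only to generate the toy data:
# intercept, age, diagnosis, age-by-diagnosis interaction.
beta_true = np.array([5.0, -0.02, -0.5, -0.01])

# Design matrix with one column per term of the model.
X = np.column_stack([np.ones(n), age, diag, age * diag])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Ordinary least squares estimate of the four coefficients.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Slope of volume on age within each group: the interaction
# coefficient is the difference between the AD and NC slopes.
slope_nc = beta_hat[1]
slope_ad = beta_hat[1] + beta_hat[3]
```

A nonzero diagnosis coefficient shifts the AD line up or down relative to NC, and a nonzero interaction coefficient gives the two groups different slopes against age.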

F-test for General Linear Hypothesis
Goal: detect voxels with significant disease or condition effects
$$y = X\beta + \varepsilon, \qquad \varepsilon \sim N_n(0, \sigma^2 I_n)$$
(row $i$ of $X$ is $x_i^T$), c.f. the simpler model under
$$H_0 : A\beta = c$$
This is the General Linear Hypothesis. It is tested with an F-test based on the ratio of the two models' residual sums of squares.
It can answer, e.g.: Does disease status affect HCV? Does an age-by-disease interaction exist? Is condition A equivalent to condition B in an fMRI experiment?
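A minimal sketch of the F-test for the general linear hypothesis, assuming the standard least-squares form of the statistic. The function name `glh_f_test` and the simulated data are hypothetical. The names X, A and c follow the slide.

```python
import numpy as np
from scipy import stats

def glh_f_test(X, y, A, c):
    """F-test of H0: A @ beta = c in the model y = X @ beta + noise."""
    n, p = X.shape
    q = A.shape[0]                                   # number of constraints
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)             # residual variance estimate
    d = A @ beta_hat - c
    # Increase in residual sum of squares caused by imposing H0.
    extra_ss = d @ np.linalg.solve(A @ XtX_inv @ A.T, d)
    F = (extra_ss / q) / sigma2_hat
    p_value = stats.f.sf(F, q, n - p)                # upper tail of F(q, n - p)
    return F, p_value

# Toy data (assumed values): volume depends on age and diagnosis.
rng = np.random.default_rng(1)
n = 120
age = rng.uniform(60, 90, n)
diag = rng.integers(0, 2, n)
X = np.column_stack([np.ones(n), age, diag, age * diag])
y = 5.0 - 0.02 * age - 0.5 * diag + rng.normal(scale=0.2, size=n)

# H0: no diagnosis main effect and no age-by-diagnosis interaction.
A = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
c = np.zeros(2)
F, p_value = glh_f_test(X, y, A, c)
```

With c = 0 this statistic equals the one obtained by fitting the reduced model (here, intercept and age only) and comparing residual sums of squares.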

Parameter Estimates
Same model for all voxels
Different parameters for each voxel
[Figure: parameter images beta_0001.img, beta_0002.img, beta_0003.img, with example estimates $\hat\beta$ = (0.83, 0.16, 2.98), (0.03, 0.06, 2.04) and (0.68, 0.82, 2.17) at three voxels]

Parameter Estimates
Note: This approach estimates parameters at each voxel for conditions that affect the brain.
But what if the brain affects the condition, e.g. loss of tissue affects cognitive ability? Is this still the right model?
Then there is only one outcome, and the brain voxels form a set of predictors.

Summary of Part 1
A linear model is fitted separately to each voxel to estimate effects of interest
Hypothesis test statistics are obtained at each voxel and combined to form a statistical parametric map (SPM)
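The voxel-by-voxel fitting summarized above can be sketched in a few lines. The two-group design, image sizes and effect size are assumptions for illustration, and images are taken to be already flattened into an n-subjects by m-voxels array.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 40, 1000                          # subjects, voxels (toy sizes)
group = np.repeat([0.0, 1.0], n // 2)    # hypothetical two-group covariate
X = np.column_stack([np.ones(n), group])

# Toy images: noise everywhere, plus a group effect in the first 50 voxels.
Y = rng.normal(size=(n, m))
Y[:, :50] += 2.0 * group[:, None]

# One least-squares solve fits every voxel: column j of B is beta-hat at voxel j.
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ B
dof = n - X.shape[1]
sigma2 = (resid ** 2).sum(axis=0) / dof          # residual variance per voxel

# t statistic for the group coefficient at every voxel.
contrast = np.array([0.0, 1.0])
var_c = contrast @ np.linalg.inv(X.T @ X) @ contrast
t_map = (contrast @ B) / np.sqrt(sigma2 * var_c)
```

Reshaping `t_map` back to the image dimensions would give the statistical parametric map.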

Part 2 Multiple Comparison Correction

Multiple Comparison Problem
Each voxel obtains a test statistic from the linear model, e.g. t or F
Together these form statistical maps of the statistics (statistical parametric maps, SPMs)
E.g., which of 100,000 voxels are significant? Testing each at α = 0.05 gives about 5,000 false positive voxels when no true effect is present
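The 5,000 figure is the expected count when every voxel is null. The short simulation below checks this, on the simplifying assumption that the voxels are independent (real images are spatially correlated).

```python
import numpy as np

n_voxels = 100_000
alpha = 0.05

# Expected number of false positives when no voxel has a true effect.
expected_false_positives = n_voxels * alpha

# Under the null hypothesis p-values are uniform on [0, 1].
rng = np.random.default_rng(3)
p_values = rng.uniform(size=n_voxels)
observed_false_positives = int((p_values < alpha).sum())
```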

Assessing Statistic Images
How can we determine a sensible threshold level? Where's the signal or change?
High threshold (t > 5.5): good specificity, poor power (risk of false negatives)
Medium threshold (t > 3.5)
Low threshold (t > 0.5): poor specificity (risk of false positives), good power

Multiple Comparison Solutions: Measuring False Positives
1. Familywise Error Rate (FWER). Familywise error: existence of one or more false positives
2. False Discovery Rate (FDR). FDR = E(V/R), where R voxels are declared active and V of them falsely so. Realized false discovery rate: V/R
3. Permutation Testing
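Permutation testing is only named on the slide. One common way to use it for FWER control is to build the permutation distribution of the maximum statistic over voxels. The sketch below assumes a two-group design and, for brevity, uses the group mean difference rather than a t statistic. The function name and the toy data are hypothetical.

```python
import numpy as np

def max_stat_threshold(Y, group, n_perm=500, alpha=0.05, seed=0):
    """Threshold from the permutation distribution of the maximum
    absolute group mean difference over all voxels."""
    rng = np.random.default_rng(seed)
    max_stats = np.empty(n_perm)
    for b in range(n_perm):
        g = rng.permutation(group)                   # shuffle group labels
        diff = Y[g == 1].mean(axis=0) - Y[g == 0].mean(axis=0)
        max_stats[b] = np.abs(diff).max()
    return np.quantile(max_stats, 1 - alpha)

# Toy data (assumed): 40 subjects, 500 voxels, effect in the first 10 voxels.
rng = np.random.default_rng(4)
n, m = 40, 500
group = np.repeat([0, 1], n // 2)
Y = rng.normal(size=(n, m))
Y[:, :10] += 4.0 * group[:, None]

observed = Y[group == 1].mean(axis=0) - Y[group == 0].mean(axis=0)
threshold = max_stat_threshold(Y, group)
detected = np.abs(observed) > threshold
```

Because the threshold comes from the maximum over voxels, spatial correlation in the images is accounted for without any distributional assumption.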

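FWER: Bonferroni Correction
The FWE rate α for N independent voxels is approximately α = Nv, and never more than Nv (v = voxelwise error rate)
To control FWE at level α, set v = α / N
[Figure: independent voxels versus spatially correlated voxels]
Bonferroni is too conservative for brain images, because neighboring voxels are spatially correlated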
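As a worked example of the Bonferroni rule, the sketch below computes the voxelwise error rate and the corresponding one-sided z threshold. The values α = 0.05 and N = 100,000 are assumptions taken from the earlier 100,000-voxel illustration.

```python
from statistics import NormalDist

alpha = 0.05           # desired familywise error rate
n_voxels = 100_000     # number of tests

v = alpha / n_voxels   # Bonferroni voxelwise error rate

# One-sided z threshold whose upper-tail probability equals v.
z_threshold = -NormalDist().inv_cdf(v)

# Exact FWER for independent tests; it is bounded above by N * v = alpha.
fwer_independent = 1 - (1 - v) ** n_voxels
```

The resulting z threshold is close to 4.9, which shows how stringent the correction becomes at whole-brain scale.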

FWER: Random Field Theory
Euler characteristic $\chi_u$: a topological measure of the suprathreshold set at threshold $u$, #blobs - #holes
At high thresholds, $\chi_u$ simply counts blobs
[Figure: random field, threshold, suprathreshold sets]
See description at http://imaging.mrc-cbu.cam.ac.uk/imaging/principlesrandomfields
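To make the blob-counting idea concrete, the sketch below counts connected suprathreshold regions in a simulated smooth 2D field. It only counts blobs, which approximates the Euler characteristic at high thresholds. It does not compute the random field theory p-value. The field size and smoothing width are assumptions.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(5)

# Toy smooth random field: white noise blurred with a Gaussian kernel.
field = ndimage.gaussian_filter(rng.normal(size=(128, 128)), sigma=4)
field /= field.std()                       # rescale to unit variance

def count_blobs(field, u):
    """Number of connected suprathreshold regions at threshold u."""
    _, n_blobs = ndimage.label(field > u)
    return n_blobs

blobs_low = count_blobs(field, 0.5)        # low threshold
blobs_high = count_blobs(field, 2.5)       # high threshold
```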

FDR: Benjamini & Hochberg Procedure
Select desired limit $q$ on FDR
Order p-values: $p_{(1)} \le p_{(2)} \le \cdots \le p_{(V)}$
Let $r$ be the largest $i$ such that $p_{(i)} \le \frac{i}{V} \cdot \frac{q}{c(V)}$
Reject all hypotheses corresponding to $p_{(1)}, \ldots, p_{(r)}$
NB: no spatial consideration
Journal of the Royal Statistical Society Series B (1995) 57:289-300
[Figure: ordered p-values $p_{(i)}$ plotted against $i/V$, both axes from 0 to 1, with the line $(i/V) \cdot q/c(V)$]
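The step-up rule above can be sketched as follows. The function name is hypothetical, and the constant c(V) is passed as a parameter `c` that defaults to 1.0. That default is an assumption of this sketch, since the slide does not give a value for c(V).

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05, c=1.0):
    """Boolean mask of hypotheses rejected by the step-up rule
    p_(i) <= (i / V) * q / c, where c stands for c(V)."""
    p = np.asarray(p_values, dtype=float)
    V = p.size
    order = np.argsort(p)                           # indices of sorted p-values
    thresholds = np.arange(1, V + 1) / V * q / c
    below = p[order] <= thresholds
    reject = np.zeros(V, dtype=bool)
    if below.any():
        r = np.nonzero(below)[0].max()              # largest i meeting the rule
        reject[order[: r + 1]] = True               # reject p_(1), ..., p_(r)
    return reject

# Ten illustrative p-values (made up for this example).
p_example = [0.001, 0.008, 0.039, 0.041, 0.042,
             0.06, 0.074, 0.205, 0.212, 0.216]
reject = benjamini_hochberg(p_example, q=0.05)
```

Because r is the largest index meeting the rule, every smaller p-value is rejected too, even one that lies above its own threshold.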

FDR v FWER (GRF) Illustration
[Figure: three panels: Noise, Signal, Signal+Noise]

[Figure: three rows of thresholded images, one row per error criterion]
Control of Per Comparison Rate at 10%. Percentage of null pixels that are false positives: 11.3%, 11.3%, 12.5%, 10.8%, 11.5%, 10.0%, 10.7%, 11.2%, 10.2%, 9.5%
Control of Familywise Error Rate at 10%. Occurrence of familywise error (FWE)
Control of False Discovery Rate at 10%. Percentage of observed above-threshold pixels that are false positives: 6.7%, 10.4%, 14.9%, 9.3%, 16.2%, 13.8%, 14.0%, 10.5%, 12.2%, 8.7%

Summary of Part 2
Massive multiple comparison problem
Spatial correlation
Multiple approaches (FWER, FDR, nonparametric) and levels (voxel, cluster, set) for multiple comparison correction

Part 3 Multivariate Methods

PCA and ICA
Principal Components Analysis and Independent Components Analysis
Methods to succinctly summarize high-dimensional data sets: many variables are summarized in terms of just a few new variables (the components)
Projection of data: high- to low-dimension

PCA
Finds a series of projections, the principal components (PCs), each with maximal variability
The first PC explains as much of the variability in the full data set as possible
The second PC explains as much variability as possible after the variability from the first PC has been removed (it is uncorrelated with/orthogonal to the first)
Constraint: each PC is a linear combination of the original variables
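One standard way to compute principal components is the singular value decomposition of the centered data matrix. The sketch below applies it to simulated two-variable data, where the mixing matrix is an assumption made only to produce correlated variables.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy data: 200 observations of 2 correlated variables.
z = rng.normal(size=(200, 2))
data = z @ np.array([[2.0, 0.0], [1.5, 0.5]])

centered = data - data.mean(axis=0)

# Rows of Vt hold the weights of each linear combination (PC direction).
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

scores = centered @ Vt.T                 # data projected onto the PCs
explained = s ** 2 / np.sum(s ** 2)      # share of total variance per PC
```

The entries of `explained` are in decreasing order, and the columns of `scores` are uncorrelated, matching the two defining properties of PCs.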

Linear Combinations
A linear combination of $N$ variables $x_1, x_2, \ldots, x_N$ is defined as:
$$a_1 x_1 + a_2 x_2 + \cdots + a_N x_N$$
where $x_1, x_2, \ldots, x_N$ could be voxel intensities or points in time

2D Toy Example
[Figure: scatter plot of Var 1 against Var 2, built up over several slides. The 1st PC is the line through the data that minimizes the sum of squares. The 2nd PC is then added.]
Note that there is an issue of scaling: the PCs depend on the units in which Var 1 and Var 2 are measured.

Voxel-Based Analysis Applications
Consider a set of images, e.g. repeated brain images in time for a single subject, or single images of many subjects
The first PC (image) may explain the effect of group or time
The second PC image may explain between-subject variability of brain shape
It is hoped that each of the PCs is meaningful

Independent Components Analysis (ICA)
ICA differs from PCA for technical reasons (but can produce very different results)
In PCA the PCs have to be uncorrelated; however, uncorrelated does not imply independent
Independence is a stronger requirement (if the data are jointly Gaussian, then uncorrelated does imply independent)

Uncorrelated but not independent
[Figure: scatter plot of Var 1 against Var 2]
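A simple constructed example of two variables that are uncorrelated but not independent is a symmetric variable and its square. This is an illustrative choice, not necessarily the data shown in the slide's figure.

```python
import numpy as np

rng = np.random.default_rng(7)
var1 = rng.uniform(-1.0, 1.0, size=100_000)
var2 = var1 ** 2                  # completely determined by var1

# Linear correlation is near zero because var1 is symmetric about 0.
corr_linear = np.corrcoef(var1, var2)[0, 1]

# The dependence shows up once var1 is squared.
corr_squares = np.corrcoef(var1 ** 2, var2)[0, 1]
```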

Independent Components Analysis (ICA)
ICA demands that the components are (maximally) independent of each other
It provides a different decomposition of the full data than does PCA (i.e. it gives a different set of linear combinations/components)
ICA is computationally more challenging and does not have an inherent ordering of components
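The idea can be illustrated with a deliberately small two-variable sketch: whiten the data, then search for the rotation that makes the components as non-Gaussian as possible, measured here by absolute excess kurtosis. This toy grid search is an assumption chosen for clarity and is not how production ICA software works. The function names, sources and mixing matrix are all hypothetical.

```python
import numpy as np

def whiten(X):
    """Center X (samples x variables) and rescale to identity covariance."""
    Xc = X - X.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return Xc @ eigvec / np.sqrt(eigval)

def excess_kurtosis(Z):
    """Column-wise excess kurtosis of whitened (unit-variance) data."""
    return (Z ** 4).mean(axis=0) - 3.0

def rotation(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def ica_2d(X, n_angles=360):
    """Toy 2-variable ICA: rotate whitened data to maximize non-Gaussianity."""
    Z = whiten(X)
    angles = np.linspace(0.0, np.pi / 2, n_angles, endpoint=False)
    scores = [np.abs(excess_kurtosis(Z @ rotation(t))).sum() for t in angles]
    return Z @ rotation(angles[int(np.argmax(scores))])

rng = np.random.default_rng(8)
sources = rng.uniform(-1.0, 1.0, size=(5000, 2))    # independent, non-Gaussian
mixing = np.array([[1.0, 0.6], [0.4, 1.0]])         # hypothetical mixing matrix
observed = sources @ mixing.T
components = ica_2d(observed)
```

The recovered components match the sources only up to order and sign, which reflects the lack of an inherent ordering of components noted above.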

Summary of Part 3
PCA & ICA attempt to summarize high-dimensional datasets in terms of just a few components
PCA seeks linear combinations of the data matrix that are of high variance and uncorrelated
ICA seeks linear combinations that are independent
PCA & ICA both find components of variability that can be explained by linear combinations of input variables