Multiple Testing and Thresholding

Similar documents
Multiple Testing and Thresholding

Multiple Testing and Thresholding

Controlling for multiple comparisons in imaging analysis. Wednesday, Lecture 2 Jeanette Mumford University of Wisconsin - Madison

Controlling for mul2ple comparisons in imaging analysis. Where we re going. Where we re going 8/15/16

Controlling for mul-ple comparisons in imaging analysis. Wednesday, Lecture 2 Jeane:e Mumford University of Wisconsin - Madison

Contents. comparison. Multiple comparison problem. Recap & Introduction Inference & multiple. «Take home» message. DISCOS SPM course, CRC, Liège, 2009

New and best-practice approaches to thresholding

Extending the GLM. Outline. Mixed effects motivation Evaluating mixed effects methods Three methods. Conclusions. Overview

Statistical Methods in functional MRI. Standard Analysis. Data Processing Pipeline. Multiple Comparisons Problem. Multiple Comparisons Problem

Multiple comparisons problem and solutions

Introductory Concepts for Voxel-Based Statistical Analysis

Medical Image Analysis

Correction for multiple comparisons. Cyril Pernet, PhD SBIRC/SINAPSE University of Edinburgh

Linear Models in Medical Imaging. John Kornak MI square February 22, 2011

Linear Models in Medical Imaging. John Kornak MI square February 19, 2013

Linear Models in Medical Imaging. John Kornak MI square February 23, 2010

7/15/2016 ARE YOUR ANALYSES TOO WHY IS YOUR ANALYSIS PARAMETRIC? PARAMETRIC? That s not Normal!

Nonparametric Permutation Tests For Functional Neuroimaging: APrimer with Examples

Linear Models in Medical Imaging. John Kornak MI square February 21, 2012

Statistical inference on images

SnPM is an SPM toolbox developed by Andrew Holmes & Tom Nichols

Advances in FDR for fmri -p.1

Introduction to Neuroimaging Janaina Mourao-Miranda

FMRI Pre-Processing and Model- Based Statistics

Power analysis. Wednesday, Lecture 3 Jeanette Mumford University of Wisconsin - Madison

The Effect of Correlation and Error Rate Specification on Thresholding Methods in fmri Analysis

A Non-Parametric Approach

CHAPTER 2. Morphometry on rodent brains. A.E.H. Scheenstra J. Dijkstra L. van der Weerd

Statistical Methods in functional MRI. False Discovery Rate. Issues with FWER. Lecture 7.2: Multiple Comparisons ( ) 04/25/13

Bias in Resampling-Based Thresholding of Statistical Maps in fmri

Journal of Articles in Support of The Null Hypothesis

HST.583 Functional Magnetic Resonance Imaging: Data Acquisition and Analysis Fall 2006

An improved theoretical P-value for SPMs based on discrete local maxima

Controlling the Familywise Error Rate in Functional Neuroimaging: A Comparative Review

Group Sta*s*cs in MEG/EEG

Introduction to fmri. Pre-processing

First-level fmri modeling

EMPIRICALLY INVESTIGATING THE STATISTICAL VALIDITY OF SPM, FSL AND AFNI FOR SINGLE SUBJECT FMRI ANALYSIS

Fmri Spatial Processing

Multiple Comparisons of Treatments vs. a Control (Simulation)

Network statistics and thresholding

Basic fmri Design and Analysis. Preprocessing

Functional MRI in Clinical Research and Practice Preprocessing

SPM8 for Basic and Clinical Investigators. Preprocessing. fmri Preprocessing

fmri Basics: Single Subject Analysis

Cluster failure: Why fmri inferences for spatial extent have inflated false positive rates

Fields. J-B. Poline A.P. Holmes K.J. Worsley K.J. Friston. 1 Introduction 2. 2 Testing for the intensity of an activation in SPMs 3

5. Feature Extraction from Images

Analysis of Functional MRI Timeseries Data Using Signal Processing Techniques

Chapter 6: Linear Model Selection and Regularization

The organization of the human cerebral cortex estimated by intrinsic functional connectivity

EPI Data Are Acquired Serially. EPI Data Are Acquired Serially 10/23/2011. Functional Connectivity Preprocessing. fmri Preprocessing

Pair-Wise Multiple Comparisons (Simulation)

Robust Shape Retrieval Using Maximum Likelihood Theory

Function-Structure Integration in FreeSurfer

Zurich SPM Course Voxel-Based Morphometry. Ged Ridgway (Oxford & UCL) With thanks to John Ashburner and the FIL Methods Group

R-Square Coeff Var Root MSE y Mean

Bayesian Spherical Wavelet Shrinkage: Applications to Shape Analysis

Basic Introduction to Data Analysis. Block Design Demonstration. Robert Savoy

Statistical Analysis of Neuroimaging Data. Phebe Kemmer BIOS 516 Sept 24, 2015

Detecting Changes In Non-Isotropic Images

Voxel-Based Morphometry & DARTEL. Ged Ridgway, London With thanks to John Ashburner and the FIL Methods Group

Pattern Recognition for Neuroimaging Data

SPM Introduction. SPM : Overview. SPM: Preprocessing SPM! SPM: Preprocessing. Scott Peltier. FMRI Laboratory University of Michigan

SPM Introduction SPM! Scott Peltier. FMRI Laboratory University of Michigan. Software to perform computation, manipulation and display of imaging data

Group (Level 2) fmri Data Analysis - Lab 4

Feature Detectors and Descriptors: Corners, Lines, etc.

arxiv: v1 [stat.ap] 1 Jun 2016

Surface-based Analysis: Inter-subject Registration and Smoothing

Edge Detection (with a sidelight introduction to linear, associative operators). Images

Supplementary methods

SPM8 for Basic and Clinical Investigators. Preprocessing

FUNCTIONAL magnetic resonance imaging (fmri) is

Lab 5: Multi Subject fmri

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

ALE Meta-Analysis: Controlling the False Discovery Rate and Performing Statistical Contrasts

Single Subject Demo Data Instructions 1) click "New" and answer "No" to the "spatially preprocess" question.

NA-MIC National Alliance for Medical Image Computing fmri Data Analysis

Naïve Bayes, Gaussian Distributions, Practical Applications

08 An Introduction to Dense Continuous Robotic Mapping

Digital Image Processing COSC 6380/4393

Wavelet-Based Statistical Analysis in Functional Neuroimaging

Biometrics Technology: Image Processing & Pattern Recognition (by Dr. Dickson Tong)

Bayesian Inference in fmri Will Penny

Cognitive States Detection in fmri Data Analysis using incremental PCA

1.7.1 Laplacian Smoothing

fmri Analysis Sackler Ins2tute 2011

Predictive Analysis: Evaluation and Experimentation. Heejun Kim

EE795: Computer Vision and Intelligent Systems

Figure 1. Comparison of the frequency of centrality values for central London and our region of Soho studied. The comparison shows that Soho falls

Data Visualisation in SPM: An introduction

Data Visualisation in SPM: An introduction

9.2 Types of Errors in Hypothesis testing

Wavelet Applications. Texture analysis&synthesis. Gloria Menegaz 1

the PyHRF package P. Ciuciu1,2 and T. Vincent1,2 Methods meeting at Neurospin 1: CEA/NeuroSpin/LNAO

Improved Non-Local Means Algorithm Based on Dimensionality Reduction

- with application to cluster and significance analysis

Multimedia Computing: Algorithms, Systems, and Applications: Edge Detection

Wavelets and functional MRI

Correction CORRECTION. PNAS August 16, 2016 vol. 113 no. 33 E4929.

Transcription:

Multiple Testing and Thresholding NITP, 2009 Thanks for the slides Tom Nichols!

Overview Multiple Testing Problem Which of my 100,000 voxels are active? Two methods for controlling false positives Familywise Error Rate Controlling the chance of any false positives Bonferroni, Random Field and Nonparametric Methods False Discovery Rate Controlling the fraction of false positives

Overview Multiple Testing Problem Which of my 100,000 voxels are active? Two methods for controlling false positives Familywise Error Rate Controlling the chance of any false positives Bonferroni, Random Field and Nonparametric Methods False Discovery Rate Controlling the fraction of false positives

Hypothesis Testing Review Establish H 0 : no activation in voxel i Establish significance level Derive threshold Calculate test statistic t P-value Decision: Reject or accept H 0

Hypothesis Testing Review Establish H 0 : no activation in voxel i Establish significance level Derive threshold Calculate test statistic t P-value Decision: Reject or accept H 0

Hypothesis Testing in fmri Mass Univariate Modeling Fit a separate model for each voxel Look at images of statistics Apply Threshold

Assessing Statistic Images What threshold will show us signal? High Threshold Med. Threshold Low Threshold t > 5.5 t > 3.5 t > 0.5 Good Specificity Poor Power (risk of false negatives) Poor Specificity (risk of false positives) Good Power

Voxel-level Inference Retain voxels above α-level threshold u α Gives best spatial specificity The null hyp. at a single voxel can be rejected Statistic values space

Voxel-level Inference Retain voxels above α-level threshold u α Gives best spatial specificity The null hyp. at a single voxel can be rejected u α space

Voxel-level Inference Retain voxels above α-level threshold u α Gives best spatial specificity The null hyp. at a single voxel can be rejected u α space Significant Voxels No significant Voxels

Cluster-level Inference Two step-process Define clusters by arbitrary threshold u clus u clus space

Cluster-level Inference Two step-process Define clusters by arbitrary threshold u clus Retain clusters larger than α-level threshold k α u clus space Cluster not significant k α k α Cluster significant

Cluster-level Inference Typically better sensitivity Worse spatial specificity The null hyp. of entire cluster is rejected Only means that one or more of voxels in cluster active u clus space Cluster not significant k α k α Cluster significant

Voxel-wise Inference & Multiple Testing Problem (MTP) Standard Hypothesis Test Controls Type I error of each test, at say 5% But what if I have 100,000 voxels? 5,000 false positives on average! Must control false positive rate What false positive rate? Chance of 1 or more Type I errors? Proportion of Type I errors?

Overview Multiple Testing Problem Which of my 100,000 voxels are active? Two methods for controlling false positives Familywise Error Rate Controlling the chance of any false positives Bonferroni, Random Field and Nonparametric Methods False Discovery Rate Controlling the fraction of false positives

FWER MTP Solutions Bonferroni Maximum Distribution Methods Random Field Theory Permutation

Bonferroni Based on the Bonferroni inequality For uncorrelated events E i If then For 100,000 voxels

Bonferroni Can be too conservative Bonferroni assumes all tests are independent fmri data tend to be spatially correlated # of independent tests < # voxels

Bonferroni Where does spatial correlation come from? How images are constructed from the scanner Physiologic signal Preprocessing steps (realignment, smoothing, etc.)

Why not use a spatial model If we can model temporal correlation, why not spatial? Need an explicit spatial model No routine spatial modeling methods exist High-dimensional mixture modeling problem Activations don t look like Gaussian blobs Need realistic shapes, sparse representation Some work by Hartvig et al., Penny et al.

FWER MTP Solutions: Controlling FWER w/ Max FWER & distribution of maximum FWER = P(FWE) = P(One or more voxels u H o ) = P(Max voxel u H o ) 100(1-α)%ile of max dist n controls FWER FWER = P(Max voxel u α H o ) α u α α

FWER MTP Solutions: Random Field Theory Euler Characteristic χ u Topological Measure No holes Never more than 1 blob #blobs - #holes At high thresholds, just counts blobs FWER= P(Max voxel u H o ) Random Field Threshold = P(One or more blobs H o ) P(χ u 1 H o ) E(χ u H o ) Suprathreshold Sets

RFT Details: Expected Euler Characteristic E(χ u ) V Λ (u 2-1) exp(-u 2 /2) / (2π) 2 V volume Λ roughness, measuring the covariance of the gradient of GRF, G

Random Field Theory Smoothness Parameterization Smoothness parameterized as Full Width at Half Maximum FWHM of Gaussian kernel needed to smooth a white noise random field to roughness Λ FWHM Parameterize Λ in terms of FWHM

Random Field Theory Smoothness Parameterization RESELS Resolution Elements 1 RESEL = FWHM x FWHM y FWHM z RESEL Count R R = V Λ The only data-dependent part of E(χ u ) Volume of search region in units of smoothness Eg: 10 voxels, 2.5 voxel FWHM smoothness, 4 RESELS. RESELs not # of independent things in the image

Random Field Intuition Corrected P-value for voxel value t P c = P(max T > t) E(χ t ) V Λ t 2 exp(-t 2 /2) Statistic value t increases P c decreases (of course!) Search volume V increases P c increases (more severe MCP) Smoothness increases ( Λ smaller) P c decreases (less severe MCP)

Nonparametric Permutation Test Parametric methods Assume distribution of statistic under null hypothesis Nonparametric methods Use data to find distribution of statistic under null hypothesis Any statistic! 5% Parametric Null Distribution 5% Nonparametric Null Distribution

Permutation Test Toy Example Data from voxel in visual stim. experiment A: Active, flashing checkerboard B: Baseline, fixation 6 blocks, ABABAB Just consider block averages... A B A B A B 103.00 90.48 99.93 87.83 99.76 96.06 Null hypothesis H o No experimental effect, A & B labels arbitrary Statistic Mean difference

Permutation Test Toy Example Under H o Consider all equivalent relabelings AAABBB ABABAB BAAABB BABBAA AABABB ABABBA BAABAB BBAAAB AABBAB ABBAAB BAABBA BBAABA AABBBA ABBABA BABAAB BBABAA ABAABB ABBBAA BABABA BBBAAA

Permutation Test Toy Example Under H o Consider all equivalent relabelings Compute all possible statistic values AAABBB 4.82 ABABAB 9.45 BAAABB -1.48 BABBAA -6.86 AABABB -3.25 ABABBA 6.97 BAABAB 1.10 BBAAAB 3.15 AABBAB -0.67 ABBAAB 1.38 BAABBA -1.38 BBAABA 0.67 AABBBA -3.15 ABBABA -1.10 BABAAB -6.97 BBABAA 3.25 ABAABB 6.86 ABBBAA 1.48 BABABA -9.45 BBBAAA -4.82

Permutation Test Toy Example Under H o Consider all equivalent relabelings Compute all possible statistic values Find 95%ile of permutation distribution AAABBB 4.82 ABABAB 9.45 BAAABB -1.48 BABBAA -6.86 AABABB -3.25 ABABBA 6.97 BAABAB 1.10 BBAAAB 3.15 AABBAB -0.67 ABBAAB 1.38 BAABBA -1.38 BBAABA 0.67 AABBBA -3.15 ABBABA -1.10 BABAAB -6.97 BBABAA 3.25 ABAABB 6.86 ABBBAA 1.48 BABABA -9.45 BBBAAA -4.82

Permutation Test Toy Example Under H o Consider all equivalent relabelings Compute all possible statistic values Find 95%ile of permutation distribution AAABBB 4.82 ABABAB 9.45 BAAABB -1.48 BABBAA -6.86 AABABB -3.25 ABABBA 6.97 BAABAB 1.10 BBAAAB 3.15 AABBAB -0.67 ABBAAB 1.38 BAABBA -1.38 BBAABA 0.67 AABBBA -3.15 ABBABA -1.10 BABAAB -6.97 BBABAA 3.25 ABAABB 6.86 ABBBAA 1.48 BABABA -9.45 BBBAAA -4.82

Permutation Test Toy Example Under H o Consider all equivalent relabelings Compute all possible statistic values Find 95%ile of permutation distribution -8-4 0 4 8

Small Sample Sizes Permutation test doesn t work well with small sample sizes Possible p-values for previous example: 0.05, 0.1, 0.15, 0.2, etc Tends to be conservative for small sample sizes

Controlling FWER: Permutation Test Parametric methods Assume distribution of max statistic under null hypothesis Nonparametric methods Use data to find distribution of max statistic under null hypothesis Again, any max statistic! 5% Parametric Null Max Distribution 5% Nonparametric Null Max Distribution

Permutation Test & Exchangeability Exchangeability is fundamental Def: Distribution of the data unperturbed by permutation Under H 0, exchangeability justifies permuting data Allows us to build permutation distribution Subjects are exchangeable Under Ho, each subject s A/B labels can be flipped fmri scans are not exchangeable under Ho If no signal, can we permute over time? No, permuting disrupts order, temporal autocorrelation

Permutation Test & Exchangeability fmri scans are not exchangeable Permuting disrupts order, temporal autocorrelation Intrasubject fmri permutation test Must decorrelate data, model before permuting What is correlation structure? Usually must use parametric model of correlation E.g. Use wavelets to decorrelate Bullmore et al 2001, HBM 12:61-78 Intersubject fmri permutation test Create difference image for each subject For each permutation, flip sign of some subjects

Permutation Test Other Statistics Collect max distribution To find threshold that controls FWER Consider smoothed variance t statistic To regularize low-df variance estimate

Permutation Test Smoothed Variance t Collect max distribution To find threshold that controls FWER Consider smoothed variance t statistic mean difference variance t-statistic

Permutation Test Smoothed Variance t Collect max distribution To find threshold that controls FWER Consider smoothed variance t statistic mean difference smoothed variance Smoothed Variance t-statistic

Permutation Test Example fmri Study of Working Memory Active... 12 subjects, block design Marshuetz et al (2000) Item Recognition Active:View five letters, 2s pause, view probe letter, respond Baseline: View XXXXX, 2s pause, view Y or N, respond Second Level RFX Difference image, A-B constructed for each subject One sample, smoothed variance t test... D UBKDA yes Baseline XXXXX... N no...

Permutation Test Example Permute! 2 12 = 4,096 ways to flip 12 A/B labels For each, note maximum of t image. Permutation Distribution Maximum t Maximum Intensity Projection Thresholded t

Permutation Test Example Compare with Bonferroni α = 0.05/110,776 Compare with parametric RFT 110,776 2 2 2mm voxels 5.1 5.8 6.9mm FWHM smoothness 462.9 RESELs

u Perm = 7.67 58 sig. vox. t 11 Statistic, Nonparametric Threshold u RF = 9.87 u Bonf = 9.80 5 sig. vox. t 11 Statistic, RF & Bonf. Threshold RFT threshold is conservative (not smooth enough, d.f. too small) 378 sig. vox. Smoothed Variance t Statistic, Nonparametric Threshold Permutation test is more efficient than Bonferroni since it accounts for smoothness Smooth variance is more efficient for small d.f.

Does this Generalize? RFT vs Bonf. vs Perm. t Threshold (0.05 Corrected) df RF Bonf Perm Verbal Fluency 4 4701.32 42.59 10.14 Location Switching 9 11.17 9.07 5.83 Task Switching 9 10.79 10.35 5.10 Faces: Main Effect 11 10.43 9.07 7.92 Faces: Interaction 11 10.70 9.07 8.26 Item Recognition 11 9.87 9.80 7.67 Visual Motion 11 11.07 8.92 8.40 Emotional Pictures 12 8.48 8.41 7.15 Pain: Warn ing 22 5.93 6.05 4.99 Pain: Anticipation 22 5.87 6.05 5.05

Does this Generalize? RFT vs Bonf. vs Perm. No. Significant Voxels (0.05 Corrected) t SmVar t df RF Bonf Perm Perm Verbal Fluency 4 0 0 0 0 Location Switching 9 0 0 158 354 Task Switching 9 4 6 2241 3447 Faces: Main Effect 11 127 371 917 4088 Faces: Interaction 11 0 0 0 0 Item Recognition 11 5 5 58 378 Visual Motion 11 626 1260 1480 4064 Emotional Pictures 12 0 0 0 7 Pain: Warning 22 127 116 221 347 Pain: Anticipation 22 74 55 182 402

Overview Multiple Testing Problem Which of my 100,000 voxels are active? Two methods for controlling false positives Familywise Error Rate Controlling the chance of any false positives Bonferroni, Random Field and Nonparametric Methods False Discovery Rate Controlling the fraction of false positives

False Discovery Rate For any threshold, all voxels can be cross-classified: Null True (no effect) Null False (true effect) Accept Null Negative V 0N V 1N m 1 V N V P v False Discovery Proportion FDP = V 0P /V P (FDP=0 if V P =0) But only can observe V P, don t know V 0P We control the expected FDP FDR = E(FDP) Reject Null Positive V 0P V 1P m 0

False Discovery Rate Illustration: Noise Signal Signal+Noise

Control of Per Comparison Rate at 10% 11.3% 11.3% 12.5% 10.8% 11.5% 10.0% 10.7% 11.2% 10.2% 9.5% Percentage of Null Pixels that are False Positives Control of Familywise Error Rate at 10% Occurrence of Familywise Error FWE Control of False Discovery Rate at 10% 6.7% 10.4% 14.9% 9.3% 16.2% 13.8% 14.0% 10.5% 12.2% 8.7% Percentage of Activated Pixels that are False Positives

Benjamini & Hochberg Procedure Select desired limit α on FDR Order p-values, p (1) p (2)... p (v) Let r be largest i such that p (i) i/v α Reject all hypotheses corresponding to p (1),..., p (r). p-value 0 1 i/v p (i) i/v α 0 1

Adaptiveness of Benjamini & Hochberg FDR 1 When no signal: P-value threshold α/v Ordered p-values p (i) 0.8 0.6 0.4 0.2 0 0 0.2 0.4 0.6 0.8 1 Fractional index i/v When all signal: P-value threshold α...fdr adapts to the amount of signal in the data

Benjamini & Hochberg: Key Properties FDR is controlled E(FDP) α m 0 /v Conservative, if large fraction of nulls false Adaptive Threshold depends on amount of signal More signal, More small p-values, More p (i) less than i/v α/c(v)

FDR Example FWER Perm. Thresh. = 7.67 58 voxels FDR Threshold = 3.83 3,073 voxels

Conclusions for voxelwise tests Multiple Testing Problem Choose a MTP metric (FDR, FWE) Use a powerful method that controls the metric Nonparametric Inference More power for small group FWE inferences References GRF: Worsley, et al., J.Cerb.Blood.Flow.Met., 1992: 900-918 Permutation: Nichols & Holmes, HBM, 2001: 1-20 FDR: Nichols & Hayasaka, Stat. Meth. Med. Res., 2003:419-416

Cluster-based inference We use RFT all the time, so it can t be as bad as the RFT results we just saw Use cluster size as the test statistic for RFT Permutation tests use cluster size or cluster mass

Cluster-level Inference Two step-process Define clusters by arbitrary threshold u clus u clus space

Cluster-level Inference Typically better sensitivity Worse spatial specificity The null hyp. of entire cluster is rejected Only means that one or more of voxels in cluster active u clus space Cluster not significant k α k α Cluster significant

Extent vs Mass Cluster extent How many voxels are in cluster Sensitive to spatially extended signals Cluster mass Combines signal extent and intensity Can be done with FSL s randomise and SnPM Generally works better, but RFT-based distribution is difficult

RF vs Perm: cluster mass Most popular threshold Hayasaka, et al, NI 2003

Conclusions Cluster extent RF test Generally conservative (especially for low smoothness) Only close to 0.05 for high threshold (0.0001) and smooth data In some cases extremely anticonservative Results seem to worsen with larger sample sizes (not sure why)

Conclusions Cluster extent permutation test In general works well for smooth data with sufficient DF Generally conservative due to discreteness of the test

What to do? Start with fast RFT-based approaches If you think you have something use longer permutation-based thresholding Also check out new threshold free cluster enhancement (TFCE) option in FSL No need to choose 2 thresholds!