Learning Objectives. Outline. Lung Cancer Workshop VIII 8/2/2012. Nicholas Petrick 1. Methodologies for Evaluation of Effects of CAD On Users

Methodologies for Evaluation of Effects of CAD On Users
Nicholas Petrick
Center for Devices and Radiological Health, U.S. Food and Drug Administration
AAPM - Computer Aided Detection in Diagnostic Imaging (CAD) Subcommittee
8/1/2012 Petrick, AAPM 2012

Learning Objectives
- To review the most common reader performance assessment techniques and associated performance metrics
- To review the AAPM CAD subcommittee's current thinking on best practices for performing and analyzing CAD reader performance assessment studies
- To understand the importance and limitations of various reader performance assessment approaches and metrics

Outline
- Introduce reader performance testing
- Preliminary work before starting a study
- Various types of studies and study designs
- Performance metrics
- Statistical analysis
- Summary

Reader Performance Testing
- Denotes a test (study) designed to estimate the performance of physicians using CAD as part of the diagnostic process
- Evaluating performance of physicians when aided by CAD
*http://www.imagingcentersofamerica.com/services/interpretation.php

Why reader performance testing?
- Reader studies are more indicative of clinical performance than standalone testing

CAD marks obvious lesions
- Marking only large lung nodules may not substantially improve physician performance

CAD marks subtle lesions
- Marking subtle lung nodules may improve physician performance
- Physician must be able to identify CAD mark as a nodule

Reader Performance Testing: Preliminary Work

Before Starting a Study
- Define how CAD should be used
- Conduct pre-study statistical analysis
- Determine reader and case sample sizes
- Identify readers and image case set
- Establish reference standard
- Define mark labeling rules

How will CAD be used?
- Reading paradigm impacts design of reader performance study
CAD Reading Paradigms
- First reader: physician reviews only regions or findings marked by the CAD device
- Sequential reading: physician first conducts a full interpretation without CAD (unaided read), then re-evaluates with CAD aid (aided read)
- Concurrent reading: physician performs full interpretation in presence of CAD; CAD marks are available at any time

Pre-study Statistical Analysis
- Pilot study is critical
  - Estimating effect size and variance components
  - Sizing reader study (no. of readers and no. of cases)
  - Identifying problems in study design & protocol
- Freeware study sizing tools
  - imrmc: http://js.cx/~xin/index.html
  - Obuchowski: http://www.bio.ri.ccf.org/html/rocanalysis.html

Example: imrmc Tool
- Estimates nos. of readers and cases
- imrmc plans to include a database of reader studies for reference

Identify Readers & Cases
Readers
- Identify and sample from distribution of readers
- Physicians expected to utilize device
Cases
- Test case set should match target population
  - General population for intended use
  - Sub-population clearly identified in the study aims
- Inclusion/exclusion criteria clearly defined and justified
- Distribution of known covariates listed and any differences justified (disease stage, lesion type, etc.)

Other Pre-study Definitions
Reference standard
- Patient level: whether or not disease is present
- Lesion level: location and/or extent of the disease
Mark labeling
- Rules for declaring a physician's mark as TP or FP
- Only necessary when localizing lesions is a study endpoint

Reader Performance Testing: MRMC Study Designs

Generalizability
- Generalizability: study results generalize to a wider population
- Ideally, results should generalize to clinical populations of patients, physicians, imaging hardware & protocols, reading environments, etc.
- Controlled study results generalize to specific but limited populations

Multi-Reader Multi-Case (MRMC)
- A controlled reader study where a set of readers interpret a common set of patient images
- Typically under competing reading conditions
  - Readers unaided vs. readers aided by CAD

MRMC Study Designs
- Prospective
- Retrospective

Prospective Reader Studies
- Randomized controlled trial
- CAD performance measured as part of actual clinical practice
- Field testing a CAD device

Prospective Reader Studies
Advantages
- Estimates clinical utility of CAD to readers as device is used in actual practice
- Good generalizability
Issues
- Generally require large patient population
  - Low prevalence of disease in most CAD application areas

Types of Prospective Reader Studies
- Cross-sectional comparison studies
- Historical-control studies
- Double reading comparison studies

Cross-sectional Comparisons
- Physician interprets case without CAD assistance, records findings, then interprets case again with CAD
  - Only possible when CAD is designed for sequential reading
- Variation: each case read independently by two physicians, one reading with CAD, one without
Issues
- Reading without CAD may not match clinical practice

Historical-control Studies
- Compare physicians' interpretations with CAD to their readings without CAD in a different (usually prior) time interval
Advantage
- Can be implemented directly within routine clinical practice
Issues
- Changes over study period confound CAD effect
  - Differences in patients, readers, interpretation process, etc.

Historical-control Studies
Selection of the performance metric*
- Cancer detection rate
  - May not correctly measure impact
  - Introduction of CAD may impact cancer prevalence in population of interest
- Alternate endpoints
  - Change in cancer stage, nodal status, etc.
[Figure labels: CAD introduced; Cancers in screened pop.; Cancers detected]
*Nishikawa, Pesce. Radiology. 2009;251(3):634-6.

Comparisons with Double Reading
- Double reading
  - Cases are read separately by two physicians
  - Increases detection sensitivity
  - Lowers specificity
- Compare physicians' interpretation with CAD to double reading
  - Determine if single reading with CAD is as effective as double reading

Retrospective Reader Studies
- Cases are collected prior to image interpretation and are read offline by the readers
- Most common approach for CAD is an enriched reader study design
  - Population of cases enriched with patients known to be diseased

Retrospective Reader Studies
Advantages
- Substantially reduces no. of cases required to achieve statistically significant results
- Allows for more rigorous study controls
Issues
- Impacts behavior of readers compared with clinical practice
  - Readers know their decisions don't impact patient care
  - Reader may become cognizant of enrichment relative to clinical practice
- Control for reader behavior
  - Compare performance to control modality (reading without CAD)
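Returning to the double-reading comparison above, the claim that double reading increases sensitivity and lowers specificity can be illustrated with a small calculation. This is only a sketch under two assumptions that are not stated in the slides: a case is called positive if either reader calls it positive, and the two readers' decisions are conditionally independent given disease status. The function name and the numbers are hypothetical.

```python
def double_read_either_positive(se1, sp1, se2, sp2):
    """Combined (Se, Sp) when a case is positive if EITHER reader is positive.

    Assumption (not from the slides): the two readers err independently
    given the true disease status. Real readers are usually correlated.
    """
    # A diseased case is missed only if both readers miss it.
    se = 1.0 - (1.0 - se1) * (1.0 - se2)
    # A non-diseased case is cleared only if both readers clear it.
    sp = sp1 * sp2
    return se, sp


# Hypothetical single-reader operating points.
se, sp = double_read_either_positive(0.80, 0.90, 0.70, 0.85)
print(f"double reading: Se={se:.3f}, Sp={sp:.3f}")
```

Because readers tend to make correlated errors, the sensitivity gain in practice would typically be smaller than this independence calculation suggests.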

Retrospective Reader Studies
Warren-Burhenne Study Design*
- Two separate studies
  - Retrospective study of CAD sensitivity to detect abnormalities missed in clinical practice
    - Estimated relative reduction in false negative (FN) rate with CAD
  - Study of the work-up rate of readers with & without CAD
    - Difference in work-up rate is attributed to use of CAD
- Study results may not be statistically interpretable
*Warren Burhenne et al. Radiology. 2000;215:554-562.

Retrospective Reader Studies
Preferred approach
- Evaluate all primary endpoints within a single study with physicians actually using the CAD
  - [Se, Recall Rate] within one study
  - Use ROC analysis

Reader Performance Testing: MRMC Performance Metrics

MRMC Performance Metrics
- Same as those discussed in prior talk on Standalone CAD Assessment
- Performance metrics
  - ROC, LROC, JAFROC
- Figures of merit
  - Area under the curve (AUC)
  - Partial area under the curve
  - Operating points: [Se, Sp], [Se, FPPI], etc.

ROC Curve
[Figure: ROC curve. Vertical axis: Sensitivity (0 to 1). Horizontal axis: False Positive Fraction (FPF) (0 to 1). Top axis: Specificity (1 to 0).]

ROC Curve
Sources of variability
- Cases
- Reader skill
[Figure: ROC curve with variability shown for "Cases" and "Cases + Readers". Axes as in the previous figure.]
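As a concrete illustration of the ROC curve and the AUC figure of merit listed above, the following minimal sketch builds an empirical ROC curve from one reader's confidence ratings and computes the AUC two ways: as the Wilcoxon (Mann-Whitney) statistic and as the trapezoidal area under the empirical curve. The ratings and function names are hypothetical, and the empirical (non-parametric) estimate is only one of several ways to estimate an ROC curve.

```python
def empirical_roc(diseased, non_diseased):
    """Return (FPF, TPF) points, sweeping the threshold from high to low."""
    thresholds = sorted(set(diseased) | set(non_diseased), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpf = sum(s >= t for s in diseased) / len(diseased)          # sensitivity
        fpf = sum(s >= t for s in non_diseased) / len(non_diseased)  # 1 - specificity
        points.append((fpf, tpf))
    return points


def wilcoxon_auc(diseased, non_diseased):
    """Probability a diseased case is rated higher than a non-diseased one (ties count 1/2)."""
    wins = 0.0
    for d in diseased:
        for n in non_diseased:
            if d > n:
                wins += 1.0
            elif d == n:
                wins += 0.5
    return wins / (len(diseased) * len(non_diseased))


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical 5-point confidence ratings from a single reader.
diseased = [4, 5, 3, 5, 2]
non_diseased = [1, 2, 3, 1, 4]

roc = empirical_roc(diseased, non_diseased)
print("ROC points (FPF, TPF):", roc)
print("Wilcoxon AUC:   ", wilcoxon_auc(diseased, non_diseased))
print("Trapezoidal AUC:", trapezoid_area(roc))
```

The two AUC values agree because the trapezoidal area under the empirical ROC curve is equal to the Wilcoxon statistic.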

ROC Curve with Operating Point
[Figure: ROC curve with one operating point marked. Vertical axis: Sensitivity (0 to 1). Horizontal axis: False Positive Fraction (FPF) (0 to 1). Top axis: Specificity (1 to 0).]

ROC Curve with Operating Point
Sources of variability
- Cases
- Reader skill
- Reader mindset
  - Different operating threshold for different readers
[Figure: ROC curve with operating point and variability shown for "Case", "Case + Reader", and "Case + Reader + Threshold". Axes as in the previous figure.]

Operating Point & ROC FOMs
Hologic DBT reader studies*
- 2 reader studies
- Compared 2D interpretation with 2D+3D interpretation
- Study 1
  - 48 cancer cases
  - 264 non-cancer cases
  - 12 readers
- Study 2
  - 51 cancers (48 same as in Study 1)
  - 259 non-cancer cases (171 same as in Study 1)
  - 15 readers (no overlap in readers with Study 1)
*http://www.fda.gov/advisorycommittees/committeesmeetingmaterials/medicaldevices/medicaldevicesadvisorycommittee/radiologicaldevicespanel/ucm226660.htm
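To make the "reader mindset" source of variability above concrete, the sketch below computes an operating point [Se, Sp] by applying a decision threshold to confidence ratings. Two hypothetical readers with identical ratings but different thresholds land on different operating points of the same ROC curve. The ratings, thresholds, and function name are illustrative assumptions and are not taken from the Hologic studies.

```python
def operating_point(diseased, non_diseased, threshold):
    """Return (sensitivity, specificity) when ratings >= threshold are called positive."""
    se = sum(s >= threshold for s in diseased) / len(diseased)
    sp = sum(s < threshold for s in non_diseased) / len(non_diseased)
    return se, sp


# Hypothetical 5-point confidence ratings shared by both readers.
diseased = [4, 5, 3, 5, 2]
non_diseased = [1, 2, 3, 1, 4]

# An "aggressive" reader recalls at rating >= 3, a "conservative" one at >= 4.
for label, threshold in [("aggressive", 3), ("conservative", 4)]:
    se, sp = operating_point(diseased, non_diseased, threshold)
    print(f"{label}: threshold={threshold}, Se={se:.2f}, Sp={sp:.2f}")
```

Both readers have the same underlying ROC curve, so an AUC comparison would treat them as equal, while a comparison of [Se, Sp] would not. This is one reason operating-point endpoints carry the extra threshold variability noted on the slide.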

[Figures: Study 1 and Study 2 ROC curves; Study 1 and Study 2 operating points]

Reader Performance Testing: MRMC Statistical Analysis

MRMC Statistical Analysis
- Sources of variability
  - Readers and cases
- Purpose of analysis
  - Determine statistically significant effects

Patient-Based MRMC Analysis Tools
- Dorfman, Berbaum and Metz (DBM) [1]
  - ANOVA analysis of jackknife pseudovalues
- Obuchowski-Rockette (OR) method [2]
  - ANOVA with correlated error analysis
  - Directly models accuracy of each reader
- One-shot MRMC method [3]
  - Non-parametric approach
  - Based on mechanistic MRMC variance
  - Sum estimable moments to determine total variance
[1] Dorfman, Berbaum, Metz. Invest Radiol. 1992;27(9):723-31.
[2] Obuchowski, Rockette. Communications in Statistics - Simulation and Computation. 1995;24(2):285-308.
[3] Gallas. Acad Radiol. 2006;13(3):353-62.

Location-based MRMC Analysis Tools
- Jackknife AFROC (JAFROC) [1]
  - Discussed in prior talk on Standalone Assessment
- Region-of-Interest (clustered) ROC analysis [2]
  - Divide patient data into regions
[1] Chakraborty, Berbaum. Med Phys. 2004;31(8):2313-30.
[2] Obuchowski. Biometrics. 1997;53(2):567-78.

Region-of-Interest (clustered) ROC Analysis
- Non-clustered
  - Physician scores entire 4-view set
- Clustered analysis
  - Physician scores Right & Left pairs separately
[Figure labels: Patient Score; R Score; L Score. Image source: http://www.wjso.com/content/figures/1477-7819-5-124-4.jpg]
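The jackknife pseudovalues that the DBM method analyzes with ANOVA can be sketched for a single reader and a single reading condition as follows. This is a minimal illustration and not the DBM software: it assumes the Wilcoxon AUC as the figure of merit, uses hypothetical ratings and function names, and omits the ANOVA step across readers, conditions, and cases.

```python
def wilcoxon_auc(diseased, non_diseased):
    """Empirical AUC: fraction of (diseased, non-diseased) pairs ranked correctly."""
    wins = 0.0
    for d in diseased:
        for n in non_diseased:
            wins += 1.0 if d > n else 0.5 if d == n else 0.0
    return wins / (len(diseased) * len(non_diseased))


def jackknife_pseudovalues(diseased, non_diseased):
    """One pseudovalue per case: N*theta - (N-1)*theta_with_case_removed.

    Needs at least two diseased and two non-diseased cases so that the
    AUC is still defined after a case is left out.
    """
    diseased, non_diseased = list(diseased), list(non_diseased)
    n_total = len(diseased) + len(non_diseased)
    theta = wilcoxon_auc(diseased, non_diseased)
    pseudo = []
    for k in range(len(diseased)):  # leave out each diseased case in turn
        theta_k = wilcoxon_auc(diseased[:k] + diseased[k + 1:], non_diseased)
        pseudo.append(n_total * theta - (n_total - 1) * theta_k)
    for k in range(len(non_diseased)):  # then each non-diseased case
        theta_k = wilcoxon_auc(diseased, non_diseased[:k] + non_diseased[k + 1:])
        pseudo.append(n_total * theta - (n_total - 1) * theta_k)
    return pseudo


# Hypothetical ratings for one reader in one reading condition.
diseased = [4, 5, 3, 5, 2]
non_diseased = [1, 2, 3, 1, 4]

pv = jackknife_pseudovalues(diseased, non_diseased)
print("pseudovalues:", [round(v, 3) for v in pv])
print("mean pseudovalue:", sum(pv) / len(pv))
print("AUC:", wilcoxon_auc(diseased, non_diseased))
```

For the Wilcoxon AUC the mean of the pseudovalues equals the AUC itself. In a full MRMC analysis one such set of per-case pseudovalues would be computed for every reader and reading condition before fitting the ANOVA model.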

MRMC Analysis Freeware
- DBM software: http://perception.radiology.uiowa.edu/
- OR software: http://www.bio.ri.ccf.org/html/rocanalysis.html
- One-shot software: http://js.cx/~xin/index.html
- JAFROC software: http://www.devchakraborty.com/

Summary
- Reader performance testing evaluates performance of physicians when aided by CAD
  - Most often compared to unaided reading
- Statistical analysis/sizing freeware is available
- Pilot studies are critical for both
  - Sizing trials
  - Identifying reading protocol issues

Additional Resources
FDA Guidance for Industry and FDA Staff
- Clinical Performance Assessment: Considerations for Computer-Assisted Detection Devices Applied to Radiology Images and Radiology Device Data - Premarket Approval (PMA) and Premarket Notification [510(k)] Submissions
  http://www.fda.gov/medicaldevices/deviceregulationandguidance/guidancedocuments/ucm187277.htm
- Computer-Assisted Detection Devices Applied to Radiology Images and Radiology Device Data - Premarket Notification [510(k)] Submissions
  http://www.fda.gov/medicaldevices/deviceregulationandguidance/guidancedocuments/ucm187249.htm

Acknowledgements
The mention of commercial entities, or commercial products, their sources, or their use in connection with material reported herein is not to be construed as either an actual or implied endorsement of such entities or products by the Department of Health and Human Services.