Modelling Personalized Screening: a Step Forward on Risk Assessment Methods


1 Modelling Personalized Screening: a Step Forward on Risk Assessment Methods
Validating Prediction Models
Inmaculada Arostegui
Universidad del País Vasco UPV/EHU
Red de Investigación en Servicios de Salud en Enfermedades Crónicas - REDISSEC
Basque Center for Applied Mathematics - BCAM
38th Annual Conference of the ISCB, Vigo, 9-13 July 2017
I. Arostegui (UPV/EHU) SY2:Validating Prediction Models 1 / 29

2 Outline
1. Introduction and Motivation
2. CPRs: Validation process
3. Application to eCOPD evolution
4. Discussion

3 Introduction and Motivation - Prediction models and clinical practice
- Predicting the prognosis of a disease is necessary for screening, prevention and choice of treatment.
- The probabilities of diagnostic and prognostic outcomes condition the decision-making process.
- Evidence-based medicine applies the scientific method to medical practice.
- The trend is towards shared decision-making on the choice of diagnostic tests and therapeutic interventions.
- Clinical prediction rules (CPRs) may provide the evidence-based input for shared decision-making in clinical practice.

4 Introduction and Motivation - Motivating data: The IRYSS COPD Study
- COPD is a leading chronic condition in many countries.
- Exacerbation of COPD (eCOPD) often requires assessment in an emergency department (ED) and hospitalization.
- Severe exacerbations lead to death or intubation.
- Moderate exacerbations require an adjustment of the therapy.
- Exacerbations play a major role in the burden of COPD, its evolution, and its cost.
- Physicians must rely largely on their experience and the patient's personal criteria for gauging how an eCOPD will evolve.
- A clinical prediction rule for eCOPD evolution would allow physicians to make better informed decisions about treatment.
Goal: the development of clinical prediction rules (scores) for risk stratification of patients with eCOPD.

5 Introduction and Motivation - Goal
A method for developing validated clinical prediction rules (scores) for risk stratification, and for making them available as easy-to-use tools for the clinical decision-making process.
Key elements: scores, development, validated, easy-to-use tools, stratification.


6 CPRs: Validation process - Step-by-step process: General overview
1. Modeling: Model development and validation
2. Scoring: Score development and validation
3. Stratification: Score categorization and validation

7 CPRs: Validation process - Model development and validation - Modeling: Development
- In general: outcome, k predictors, model.
- In our case: binary outcome, continuous and categorical predictors, logistic regression model.
- Selection of predictors.
- Model discrimination: area under the receiver operating characteristic (ROC) curve (AUC).
- Model calibration: calibration plot and Hosmer-Lemeshow (H-L) test.
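As an illustration of the discrimination measure named on this slide, the AUC can be computed as the proportion of (event, non-event) pairs in which the event receives the higher predicted value, with ties counted as one half. This is a minimal sketch; the function name `auc` and the six data points are made up for the example and are not taken from the study.

```python
def auc(scores, labels):
    """Mann-Whitney estimate of the AUC.

    scores: predicted probabilities (or any risk score), one per subject
    labels: observed binary outcome, 1 = event, 0 = non-event
    """
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                wins += 1.0      # concordant pair
            elif e == n:
                wins += 0.5      # tied pair counts one half
    return wins / (len(events) * len(non_events))


# Hypothetical predicted risks and outcomes for six subjects.
predicted = [0.10, 0.35, 0.40, 0.80, 0.65, 0.20]
observed = [0, 1, 0, 1, 1, 0]
print(auc(predicted, observed))  # 8 of the 9 pairs are concordant
```

The pairwise loop is quadratic in the sample size, which is fine for a sketch; a rank-based formula gives the same value faster on large samples.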

8 CPRs: Validation process - Model development and validation - Modeling: Validation
1. Predictors: relationship predictor-outcome; missing values.
2. Selection of predictors: stability of the predictors, assessed with internal bootstrap validation.
3. Overestimation of the AUC: the same data are used for modeling (logistic regression) and for assessing discrimination (AUC), so the AUC is biased upwards. An optimism correction for the AUC is proposed: the bootstrap bias-correction method (Harrell).
4. Split validation: application to a different sample.

9 CPRs: Validation process - Model development and validation - Predictors
Relationship predictor-outcome (logistic function):
- Linear
- Non-linear: smooth functions (GAM)
- Categorize the predictor: look for the optimal categorization
Missing values:
- Ignore (drop subjects with missing values)
- Imputation techniques
- Consider a separate missing category

10 CPRs: Validation process - Model development and validation - Selection of predictors: Step 1
[Figure: two density plots, N = 1997 and N = 2000; bandwidth values not recoverable]
STEP 1: Variable selection
- Start from the derivation sample and keep the variables with p-value < 0.20: $(X_1, \dots, X_n)$.
- Generate 2000 bootstrap samples* and fit one model per subsample: Subsample 1 gives Model 1 with $(\beta_{1,1}, \dots, \beta_{n,1})$, ..., Subsample 2000 gives Model 2000 with $(\beta_{1,2000}, \dots, \beta_{n,2000})$.
- If $0 \in CI_{80\%}(\beta_i) = (p_{10}, p_{90})$, then $X_i$ is not considered for Step 2.0.
- If $0 \notin CI_{80\%}(\beta_i) = (p_{10}, p_{90})$, then $X_i$ is considered for Step 2.0.
*Bootstrap samples: samples drawn with replacement, of the same size as the derivation sample.
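The selection rule of Step 1 can be sketched as follows: refit on each bootstrap sample, collect the coefficients, and keep a variable only if 0 falls outside the percentile interval (p10, p90). To stay within the standard library, the sketch plugs in a stand-in estimator (difference in means between events and non-events) where the slides use logistic regression coefficients. The names `bootstrap_select` and `mean_difference`, the toy data, and the reduced number of bootstrap samples are all assumptions made for illustration.

```python
import random


def percentile(sorted_vals, q):
    """Percentile q (0-100) of a sorted list, with linear interpolation."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def bootstrap_select(rows, fit, n_boot=2000, level=80, seed=0):
    """Indices i whose bootstrap percentile interval for coefficient i excludes 0.

    rows: list of (x, y) pairs; fit: callable returning one coefficient per variable.
    level=80 gives the interval (p10, p90); level=95 gives (p2.5, p97.5).
    """
    rng = random.Random(seed)
    n = len(rows)
    draws = []
    for _ in range(n_boot):
        # Resample with replacement, same size as the derivation sample.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        draws.append(fit(sample))
    tail = (100 - level) / 2.0
    kept = []
    for i in range(len(draws[0])):
        vals = sorted(d[i] for d in draws)
        low, high = percentile(vals, tail), percentile(vals, 100 - tail)
        if not (low <= 0.0 <= high):
            kept.append(i)
    return kept


def mean_difference(sample):
    """Stand-in for a model fit (NOT logistic regression): per-variable
    mean among events minus mean among non-events."""
    k = len(sample[0][0])
    events = [x for x, y in sample if y == 1]
    others = [x for x, y in sample if y == 0]
    if not events or not others:
        return [0.0] * k
    return [sum(x[i] for x in events) / len(events)
            - sum(x[i] for x in others) / len(others) for i in range(k)]


# Toy data: variable 0 is shifted by the outcome, variable 1 is
# distributed identically in events and non-events.
rng = random.Random(1)
rows = []
for i in range(300):
    y = i % 2
    rows.append(((y + rng.gauss(0, 1), (i // 2) % 5 - 2), y))

print(bootstrap_select(rows, mean_difference, n_boot=500, level=80))
```

Calling the same function with `level=95` reproduces the interval used in Step 2; replacing `mean_difference` with a real logistic regression fit would bring the sketch closer to the procedure on the slide.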

11 CPRs: Validation process - Model development and validation - Selection of predictors: Step 2
STEP 2: Model building
Step 2.j, j = 1, ...:
- Start from the risk factors associated with the outcome in Step 2.(j-1): $(X_{r_j}, \dots, X_{s_j})$, with $1 \le r_j < s_j \le n$.
- Generate 2000 NEW bootstrap samples and fit one model per subsample: Subsample 1 gives Model 1 with $(\beta_{1,1}, \dots, \beta_{n,1})$, ..., Subsample 2000 gives Model 2000 with $(\beta_{1,2000}, \dots, \beta_{n,2000})$.
- If $0 \in CI_{95\%}(\beta_i) = (p_{2.5}, p_{97.5})$, then $X_i$ is not considered for Step 2.(j+1).
- If $0 \notin CI_{95\%}(\beta_i) = (p_{2.5}, p_{97.5})$, then $X_i$ is considered for Step 2.(j+1).
Step 2.j is repeated until all the variables in the model satisfy $0 \notin CI_{95\%}(\beta_i)$ for every $i \in \{r_j, \dots, s_j\}$. The result is the FINAL MODEL.


12 CPRs: Validation process - Model development and validation - AUC correction
Step 1: Fit the logistic regression model on the original sample $\{(x_i, y_i)\}_{i=1}^{N}$ and compute the corresponding apparent AUC, $\widehat{AUC}_{app}$.
Step 2: For $b = 1, \dots, B$, generate the bootstrap resample $\{(x_i^{b}, y_i^{b})\}_{i=1}^{N}$ by drawing a random sample of size $N$ with replacement from the original sample.
Step 3: Fit the logistic regression model to the bootstrap resample and compute the corresponding AUC, $\widehat{AUC}_{boot}^{b}$.
Step 4: Obtain the predicted probabilities for the original sample from the model fitted in Step 3 and compute the AUC, $\widehat{AUC}_{o}^{b}$.
The optimism $O$ of the original AUC is calculated as
$$O = \frac{1}{B} \sum_{b=1}^{B} \left( \widehat{AUC}_{boot}^{b} - \widehat{AUC}_{o}^{b} \right)$$
and the bias-corrected AUC is then computed as $\widehat{AUC}_{app} - O$.
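The four steps can be sketched in plain Python as below. The logistic regression is fitted with a small, untuned gradient-ascent routine so that the example needs no external library; that routine, the function names, the simulated data and the small value of B are assumptions for illustration, not the implementation used in the study. The AUC is computed on the linear predictor, which ranks subjects exactly as the predicted probabilities do.

```python
import math
import random


def auc(scores, labels):
    """Mann-Whitney estimate of the AUC; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def linear_predictor(beta, x):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def fit_logistic(X, y, steps=100, lr=0.5):
    """Logistic regression by plain gradient ascent (illustrative, not tuned)."""
    beta = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for x, yi in zip(X, y):
            z = max(-30.0, min(30.0, linear_predictor(beta, x)))
            err = yi - 1.0 / (1.0 + math.exp(-z))
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def optimism_corrected_auc(X, y, B=30, seed=0):
    rng = random.Random(seed)
    n = len(X)
    beta = fit_logistic(X, y)                                   # Step 1
    auc_app = auc([linear_predictor(beta, x) for x in X], y)
    total = 0.0
    for _ in range(B):
        while True:                                             # Step 2
            idx = [rng.randrange(n) for _ in range(n)]
            yb = [y[i] for i in idx]
            if 0 < sum(yb) < n:  # resample must contain both outcomes
                break
        Xb = [X[i] for i in idx]
        beta_b = fit_logistic(Xb, yb)                           # Step 3
        auc_boot = auc([linear_predictor(beta_b, x) for x in Xb], yb)
        auc_o = auc([linear_predictor(beta_b, x) for x in X], y)  # Step 4
        total += auc_boot - auc_o
    optimism = total / B
    return auc_app, optimism, auc_app - optimism


# Simulated data: one informative predictor and two pure-noise predictors.
rng = random.Random(1)
y = [i % 2 for i in range(80)]
X = [[yi + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for yi in y]
apparent, optimism, corrected = optimism_corrected_auc(X, y)
print(round(apparent, 3), round(optimism, 3), round(corrected, 3))
```

In practice B is usually much larger than 30, and the whole modeling procedure, including any variable selection, would be repeated inside each bootstrap resample.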

13 CPRs: Validation process - Score development and validation - Scoring: Development
Step 1: Estimate the parameters of the model $f(y) = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n$.
Step 2: Determine reference values for each category $j$ of each predictor $X_i$ ($W_{ij}$).
- Dichotomous predictor: reference values are 0/1.
- Continuous predictor ($X_i$): categorize into $k$ contiguous classes $(X_{i1}, X_{i2}, \dots, X_{ik})$.
Step 3: Determine the reference value of the base category for each predictor ($W_{i\,ref}$).
Step 4: Set the number of regression units that corresponds to 1 point in the score ($B$).
Step 5: Weight each category of each predictor by its significance level ($b_{ij}$). The thresholds and weights are only partly legible in the source:
- $p > 0.1$: $b_{ij}$ = [...]
- [...] $< p < 0.1$: $b_{ij}$ = [...]
- [...] $< p < 0.05$: $b_{ij}$ = [...]
- [...] $< p < 0.01$: $b_{ij} = 1.2$
- $p <$ [...]: $b_{ij} = 1.4$
Step 6: Determine the number of points for each category of each predictor ($S_{ij}$):
$$S_{ij} = \frac{b_{ij}\, \beta_i \,(W_{ij} - W_{i\,ref})}{B}$$
Sullivan et al., Statistics in Medicine
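A worked example of Step 6 may help. The sketch below computes points as $b_{ij}\,\beta_i\,(W_{ij} - W_{i\,ref})/B$ and rounds them to integers, as is usual for point scores. The coefficient, the age-category midpoints, the constant B and the use of the weight 1.2 for every category are hypothetical values chosen for the arithmetic; they are not taken from the PrEveCOPD score.

```python
def category_points(b_weight, beta, w, w_ref, B):
    """Points for one category: b_ij * beta_i * (W_ij - W_iref) / B, rounded.

    Note: Python's round() sends exact halves to the nearest even integer.
    """
    return round(b_weight * beta * (w - w_ref) / B)


# Hypothetical predictor: age, with log-odds coefficient 0.05 per year.
beta_age = 0.05
# Hypothetical reference values W_ij: midpoints of four age classes.
age_midpoints = [55, 65, 75, 85]
w_ref = 55          # base category (W_iref)
B = 0.5             # regression units that make up 1 point (assumed)
b_weight = 1.2      # significance weight, assumed equal for all categories

points = [category_points(b_weight, beta_age, w, w_ref, B) for w in age_midpoints]
print(points)  # [0, 1, 2, 4]
```

A patient's total score is the sum of the points of the categories they fall into, one category per predictor.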

14 CPRs: Validation process - Score development and validation - Scoring: Validation
1. Comparing AUC(model) vs. AUC(score): DeLong test (DeLong et al., Biometrics, 1988).
2. Optimism correction for the AUC: bootstrap bias-correction of the overestimation (Harrell).

15 CPRs: Validation process - Score categorization - Stratification: Categorization method
- Let $Y$ be a dichotomous response variable and $X$ the continuous score that we want to categorize.
- Look for the vector of $k$ optimal cut points $v = (x_1, \dots, x_k)$ by using genetic algorithms.
- The aim is to maximize the AUC of the model
$$P(Y = 1 \mid X_{cat_k}) = \frac{\exp\left(\beta_0 + \sum_{l=1}^{k} \beta_l \, 1_{\{X_{cat_k} = l\}}\right)}{1 + \exp\left(\beta_0 + \sum_{l=1}^{k} \beta_l \, 1_{\{X_{cat_k} = l\}}\right)}$$
The arguments used in developing the genetic algorithm:
- AUC: the function to be maximized
- $k$: the number of parameters to be estimated
- The range of the score $X$ in which we look for the cut points
- $X_{cat_k}$: the categorized score, taking $k + 1$ values ($l = 0, \dots, k$)
Barrio et al., Statistical Methods in Medical Research
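The objective being maximized can be illustrated on a small scale. The sketch below replaces the genetic algorithm of the slide with an exhaustive search over all candidate cut-point vectors, which is only feasible for short scores and small k. It also uses the fact that a logistic model with one indicator per category ranks the categories by their observed event rates, so those rates stand in for the fitted probabilities when computing the AUC. Function names and data are assumptions for the example.

```python
from itertools import combinations


def auc(scores, labels):
    """Mann-Whitney estimate of the AUC; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def categorize(x, cuts):
    """Category l in 0..k: the number of cut points strictly below x."""
    return sum(1 for c in cuts if x > c)


def categorical_auc(x, y, cuts):
    """AUC of the categorized score, using per-category event rates as
    the predicted probabilities."""
    cats = [categorize(v, cuts) for v in x]
    totals, events = {}, {}
    for c, yi in zip(cats, y):
        totals[c] = totals.get(c, 0) + 1
        events[c] = events.get(c, 0) + yi
    rate = {c: events[c] / totals[c] for c in totals}
    return auc([rate[c] for c in cats], y)


def best_cut_points(x, y, k):
    """Exhaustive search (NOT a genetic algorithm) for the k cut points
    that maximize the AUC of the categorized score."""
    candidates = sorted(set(x))[:-1]  # cutting at the maximum is pointless
    best_auc, best_cuts = -1.0, None
    for cuts in combinations(candidates, k):
        value = categorical_auc(x, y, cuts)
        if value > best_auc:
            best_auc, best_cuts = value, cuts
    return best_cuts, best_auc


# Toy score in which every subject above 4 has the event.
score = [1, 2, 3, 4, 5, 6, 7, 8]
outcome = [0, 0, 0, 0, 1, 1, 1, 1]
print(best_cut_points(score, outcome, k=1))  # ((4,), 1.0)
```

The number of candidate vectors grows combinatorially with k and with the number of distinct score values, which is one reason a heuristic optimizer such as a genetic algorithm is attractive for realistic scores.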

16 CPRs: Validation process - Score categorization - Risk stratification
- Continuous score: $X$
- After categorization: $X_{cat_k}$ ($k = 4$)
- 4 risk categories: low, moderate, high, very high
- Comparing AUC($X_{cat_4}$) vs. AUC($X$): DeLong test
- Optimism correction for the AUC: modified Harrell's proposal
- Evaluation of the integrated discrimination improvement (IDI)
Steyerberg et al., Epidemiology
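For reference, the IDI mentioned on this slide is commonly computed as the difference between the discrimination slopes of two models, where the discrimination slope is the mean predicted risk among events minus the mean predicted risk among non-events. The sketch below follows that common definition; the function names and the four-subject data set are illustrative assumptions.

```python
def discrimination_slope(probs, labels):
    """Mean predicted risk in events minus mean predicted risk in non-events."""
    events = [p for p, y in zip(probs, labels) if y == 1]
    non_events = [p for p, y in zip(probs, labels) if y == 0]
    return sum(events) / len(events) - sum(non_events) / len(non_events)


def idi(p_new, p_old, labels):
    """Integrated discrimination improvement of the new model over the old one."""
    return discrimination_slope(p_new, labels) - discrimination_slope(p_old, labels)


# Hypothetical predicted risks from two models for four subjects.
labels = [1, 1, 0, 0]
p_old = [0.6, 0.4, 0.4, 0.2]   # discrimination slope 0.5 - 0.3 = 0.2
p_new = [0.8, 0.6, 0.3, 0.1]   # discrimination slope 0.7 - 0.2 = 0.5
print(round(idi(p_new, p_old, labels), 3))  # 0.3
```

When a categorized score is compared with its continuous version, an IDI close to zero suggests that little discrimination is lost by categorizing.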

17 Application to eCOPD evolution - Data: Description of the IRYSS-COPD Study
- Prospective cohort of patients with eCOPD (n = 2487)
- Outcome: short-term mortality
- Potential predictors: 16 clinical variables collected from medical records and direct interview (age, baseline FEV1%, dyspnea, comorbidities, arterial blood gases, ...)
Goal: the development of a clinical prediction rule for short-term mortality of patients with eCOPD.
Quintana et al., BMC Health Services Research

18 Application to eCOPD evolution - Methods
Modeling, Scoring, Stratification, Implementation

19 Application to eCOPD evolution - Results - Model development and validation
- AUC (Model) = 0.85, CI95% = (0.77-0.93)
- H-L test: p = 0.3131

20 Application to eCOPD evolution - Results - Scoring: development and validation
- Score range: 0-27
- AUC (Score) = 0.84, CI95% = ( )
- DeLong test (score vs. model): p =

21 Application to eCOPD evolution - Results - Scoring: development and validation

22 Application to eCOPD evolution - Results - Risk stratification
Subsample 2
- AUC (Score) = 0.84, CI95% = ( ... -0.91)
- AUC (Categorical Score) = 0.84, CI95% = (0.78-0. ... )
- DeLong test (categorical vs. score): p =

23 Application to eCOPD evolution - Results - Risk stratification

24 Application to eCOPD evolution - Computer tool: PrEveCOPD
Implementation: PrEveCOPD App
- Windows (installable version and web application)
- Available at:

25 Application to eCOPD evolution - Computer tool: PrEveCOPD
Implementation: PrEveCOPD App
- Android: available at Google Play

26 Discussion - Validation step-by-step
1. Modeling: Proper validation of a prediction model can lead to better and more stable discrimination ability.
2. Scoring: A prediction model can be summarized into a valid and easy-to-obtain clinical prediction rule (score).
3. Stratification: Categorization of the score allows for valid stratification of patients by risk.
4. Implementation: An easy-to-use computer application can guide the medical decision process in clinical practice.

27 Discussion - Conclusions
1. The proposed methodology as a whole allows for valid stratification of patients with eCOPD by their risk of short-term mortality.
2. The PrEveCOPD computer tool can guide the medical decision process at the patient's arrival at the ED.

28 Discussion - Is it finished?
External validation assesses whether the CPR performs well across samples from different but related source populations (transportability):
1. Relatedness of the original (derivation) and new (validation) samples
2. Assessment of the CPR's performance in the new study
3. Interpretation of the results: correction of poor performance if necessary
External validation is missing! Waiting for a new sample.

29 Discussion - Thank you!
