Model-based Recursive Partitioning for Subgroup Analyses

Size: px
Start display at page:

Download "Model-based Recursive Partitioning for Subgroup Analyses"

Transcription

1 EBPI Epidemiology, Biostatistics and Prevention Institute Model-based Recursive Partitioning for Subgroup Analyses Torsten Hothorn; joint work with Heidi Seibold and Achim Zeileis

2 Subgroup analyses Identifying groups of patients for whom the treatment has a different effect than for others. Effect is: Stronger Lower Contrary than the average treatment effect. Suitable models promise better prediction of treatment effect and thus individualised treatments. University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 2

3 Situations of interest for subgroup analyses EMA (2014, 1): 1. The clinical data presented are overall statistically persuasive with therapeutic efficacy demonstrated globally. It is of interest to verify that the conclusions of therapeutic efficacy (and safety) apply consistently across subgroups of the clinical trial population. 2. The clinical data presented are overall statistically persuasive but with therapeutic efficacy or benefit/risk which is borderline or unconvincing and it is of interest to identify post-hoc a subgroup, where efficacy and risk-benefit is convincing. 3. The clinical data presented fail to establish statistically persuasive evidence but there is interest in identifying a subgroup, where a relevant treatment effect and compelling evidence of a favourable risk-benefit profile can be assessed. University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 3

4 Subgroup analyses Goal: Find predictive factors ˆ= covariate treatment interactions treatment predictive factor prognostic factor primary endpoint primary endpoint University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 4

5 Covariate treatment interactions Subgroups known: incorporate subgroup treatment interaction in linear predictor. (Few) Categorical covariates at few levels: incorporate covariate treatment interaction in linear predictor. Many, potentially numeric covariates: Interactions hard to interpred, difficult to derive subgroups. Common approach: Automated interaction detection. BUT: Ordinary trees don t know treatment effect parameters. We need both parametric models AND trees. Solution: Model-based recursive partitioning (MOB). University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 5

6 MOB Basics MOB: Model-based recursive partitioning (Zeileis, Hothorn, Hornik, JCGS, 2008, 3) Start with model M((Y, X), ϑ) with α ϑ = β γ ν which fits data (Y, X) (Y, X ). intercept(s) treatment effect other parameter(s) of interest nuisance parameter(s), University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 6

7 MOB Basics Estimation ˆϑ = arg min ϑ N Ψ((y, x) i, ϑ) i=1 or equivalently solving the score equation N ψ((y, x) i, ϑ) = 0 i=1 Ψ: objective function ψ: score function; gradient of Ψ University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 7

8 MOB Basics Maybe the treatment effect is not the same for all patients, but depends on their characteristics (covariates). Find partitions {B b } (b = 1,..., B) based on patient characteristics Z = (Z 1,..., Z J ) Z Fit separate models M((Y, X), ϑ(b)) in partitions. M((Y, X), ϑ(1)) M((Y, X), ϑ(2)) ϑ(b) = (α(b), β(b), γ, ν) University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 8

9 MOB Basics If partition {B b } is known, the partitioned model parameters ϑ(b) could be estimated by minimizing the segmented objective function: (ˆϑ(b)) b=1,...,b = arg min ϑ(b) N i=1 b=1 B 1 (z i B b ) Ψ((y, x) i, ϑ(b)) Subgroup-specific intercept and treatment parameters can be written as functions of the partitioning variables α(z) = B 1(z B b ) α(b) and β(z) = b=1 B 1(z B b ) β(b). b=1 University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 9

10 Partitioning How to find the partitions? Test: H α,j 0 : ψ α ((Y, X), ˆϑ) Z j H β,j 0 : ψ β ((Y, X), ˆϑ) Z j, j = 1,..., J ψ α, ψ β partial derivatives of Ψ with respect to α/β. Partition if global permutation p-value smaller than significance level Use as split variable the one with the smallest p-value University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 10

11 Example: Linear model Two-arm trial comparing active (A) to control (C): Y X = x N (α + βx A + γx stratum, σ 2 ). Y X = x, Z = z N (α(z) + β(z)x A + γx stratum, σ 2 ), ψ((y, x), ˆϑ) = Ψ((y,x),θ) α θ=ˆϑ Ψ((y,x),θ) β θ=ˆϑ = 1 σ 2 ( y (ˆα + ˆβx A + ˆγx stratum ) (y (ˆα + ˆβx A + ˆγx stratum )) x A ) University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 11

12 Example: Linear model Prognostic α(1) α(2) ; β(1) = β(2) Predictive α(1) = α(2) ; β(1) β(2) Predictive α(1) α(2) ; β(1) β(2) Both α(1) α(2) ; β(1) β(2) Mean endpoint Subgroup 1 Subgroup 2 C A C A C A C A Treatment University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 12

13 Example: Linear model Linear model: Y X = x N (α + βx B, σ 2 ) Data generating process (Loh, He, Man, 2014, 4): Y X = x, Z = z N ( x A (z 1 < 0) (z 1 > 0) x A, 0.7) 2 2 ψ α 0 2 ψ β 0 2 C A z z 1 University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 13

14 Example: Linear model Linear model: Y X = x N (α + βx B, σ 2 ) Data generating process: Y X = x, Z = z N ( x A (z 1 < 0) (z 1 < 0) x A, 0.7) ψ α ψ β C A z z 1 University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 14

15 Example: Linear model Linear model: Y X = x N (α + βx B, σ 2 ) Data generating process: Y X = x, Z = z N (2 x A + 1(z 1 > 0), 0.7) ψ α 2 0 ψ β 2 0 C A z z 1 University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 15

16 Partitioning effects of Riluzole on ALS patients PRO-ACT database (2) Amyotrophic lateral sclerosis (ALS) patients Data of several clinical trials Treatment of interest: Riluzole Primary endpoints of interest: ALS Functional Rating Scale (ALSFRS) ALSFRS items Survival time University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 16

17 Question Riluzole modestly prolongs life expectancy But: Are there any groups of patients for whom it is better or worse? Subgroup analysis University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 17

18 ALSFRS ALS functional rating scale: Measure of functional status of ALS patients Sum-score of ten items (0 < 1 < 2 < 3 < 4): speech salivation swallowing handwriting cutting food and handling utensils, dressing and hygiene turning in bed and adjusting bed clothes walking climbing stairs breathing University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 18

19 ALSFRS ( ) ALSFRS6 E ALSFRS 0 X = x or equivalently = E(ALSFRS 6 X = x) ALSFRS 0 = exp{α + βx R } E(ALSFRS 6 X = x) = exp{α + βx R } exp{log(alsfrs 0 )} GLM with log-link and offset University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 19

20 ALSFRS Results 1 time_onset_treatment p < > FVC p < phosphorus p < > > 1 3 Value (CI) α ( , ) βr ALSFRS ( , ) n = 625 No Yes Riluzole 4 Value (CI) α ( , ) βr ALSFRS ( , ) n = 378 No Yes Riluzole 6 Value (CI) α ( , ) βr ALSFRS ( , ) n = 292 No Yes Riluzole 7 Value (CI) α ( , ) βr ALSFRS ( , ) n = 1239 No Yes Riluzole University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 20

21 ALSFRS items Unidimensionality of ALSFRS is questionable. Two components to consider: 1. Baseline adjustment Adjust for item score at beginning of treatment compute seperate models 2. Multivariate primary endpoint Look at 10 items simultaneously compute 10 item-models in every node Score matrix of dimension n p 10 University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 21

22 ALSFRS items Each item ordinal with values 0,..., 4 proportional odds model adjusted for baseline: P(Y 6 r Y 0 = k, X = x) = exp(α rk β k x R ) 1 + exp(α rk β k x R ) for k = 0,..., 4 Compute stratified permutation tests treating baseline item values as blocks. University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 22

23 ALSFRS items Results 2 FVC p < > time_onset_treatment p < > lymphocytes p = > n = 348 n = 1005 n = 407 n = 774 University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 23

24 ALSFRS items Results Item No. Start Node 3 Node 4 Node 6 Node 7 Speech Salivation Swallowing Handwriting Cutting University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 23.

25 ALSFRS items Results Item No. Start Node 3 Node 4 Node 6 Node 7 Hygiene Bed Walking Stairs Respiratory University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 23

26 Survival time We present two ways of modeling: Weibull model (parametric survival model) Cox model (semiparametric survival model) University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 24

27 Weibull model ( ) log(y) α1 βx P(Y y X = x) = F R α 2 with F cumulative distribution function of Gompertz distribution ( ) α1 intercept and α = α 2 scale parameter α defines the shape of the baseline hazard use as "intercept" University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 25

28 Weibull model Results 1 age p < Riluzole No Riluzole 55.7 > age p < time_onset_treatment p < > > Value (CI) 4 Value (CI) 6 Value (CI) 7 Value (CI) α ( 6.784, 7.251) α ( , ) α ( , ) α ( 6.565, 6.777) βr ( 0.228, 0.166) βr ( , ) βr ( , ) βr ( 0.090, 0.175) log(α2) ( 1.268, 0.794) log(α2) ( , ) log(α2) ( , ) log(α2) ( 0.615, 0.451) 1.00 n = n = n = n = University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 26

29 Cox model λ(y x) = λ 0 (y) exp(βx R ) Objective function: negative partial likelihood (without baseline hazard) No classical score function Use surrogate score function Martingale residuals (as score with respect to the baseline hazard/"intercept") Score residuals (as score with respect to β) University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 27

30 Cox model Results 1 age p < Riluzole No Riluzole 55.7 > age p < time_onset_treatment p < > > Value (CI) Value (CI) Value (CI) Value (CI) βr 0.11 ( 0.46, 0.68) βr ( 0.537, 0.017) βr ( 0.388, 0.061) βr ( 0.382, 0.065) 1.00 n = n = n = n = University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 28

31 Computational details PRO-ACT data are available at The source code for reading and cleaning the database is provided in the TH.data package. All computations were conducted using partykit (version 0.8-2) in the R system for statistical computing (version 3.1.2). University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 29

32 Simple implementation Weibull model library (" partykit ") ## Function to compute Weibull model and return score matrix mywb <- function ( data, weights, parm ) { mod <- survreg ( Surv ( survival. time, cens ) ~ Riluzole, data = data, subset = weights > 0, dist = " weibull ") ef <- as. matrix ( estfun ( mod )[, parm ]) ret <- matrix (0, nrow = nrow ( data ), ncol = ncol ( ef )) ret [ weights > 0,] <- ef ret } ## Compute tree tree <- ctree ( fm, data = data, ytrafo = my. wb, control = ctree_control ( maxdepth = 2, testtype = " Bonferroni ")) University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 30

33 Summary MOB partitions a large class of models suitable for the treatment effect on the primary endpoint of interest. Score functions capture instabilities and thus help to identify predictive and prognostic variables. It is hard to differentiate between predictive and prognostic variables. Permutation tests suitable for all models (distribution free) with good small sample properties and error control. Multiplicity adjustment for subgroup-specific treatment effects unclear. University of Zurich, EBPI Model-based Recursive Partitioning for Subgroup Analyses Page 31

34 Literature European Medicines Agency EMA Guideline on the investigation of subgroups in confirmatory clinical trials (draft) guideline/2014/02/wc pdf, Massachusetts General Hospital, Neurological Clinical Research Institute Pooled Resource Open-Access ALS Clinical Trials Database A. Zeileis, T. Hothorn, K. Hornik Model-Based Recursive Partitioning Journal of Computational and Graphical Statistics, W. Loh, X. He, M. Man A regression tree approach to identifying subgroups with differential treatment effects

Subgroup identification for dose-finding trials via modelbased recursive partitioning

Subgroup identification for dose-finding trials via modelbased recursive partitioning Subgroup identification for dose-finding trials via modelbased recursive partitioning Marius Thomas, Björn Bornkamp, Heidi Seibold & Torsten Hothorn ISCB 2017, Vigo July 10, 2017 Motivation Characterizing

More information

Model-based Recursive Partitioning

Model-based Recursive Partitioning Model-based Recursive Partitioning Achim Zeileis Torsten Hothorn Kurt Hornik http://statmath.wu-wien.ac.at/ zeileis/ Overview Motivation The recursive partitioning algorithm Model fitting Testing for parameter

More information

Parties, Models, Mobsters A New Implementation of Model-Based Recursive Partitioning in R

Parties, Models, Mobsters A New Implementation of Model-Based Recursive Partitioning in R Parties, Models, Mobsters A New Implementation of Model-Based Recursive Partitioning in R Achim Zeileis, Torsten Hothorn http://eeecon.uibk.ac.at/~zeileis/ Overview Model-based recursive partitioning A

More information

CoxFlexBoost: Fitting Structured Survival Models

CoxFlexBoost: Fitting Structured Survival Models CoxFlexBoost: Fitting Structured Survival Models Benjamin Hofner 1 Institut für Medizininformatik, Biometrie und Epidemiologie (IMBE) Friedrich-Alexander-Universität Erlangen-Nürnberg joint work with Torsten

More information

Model-based Recursive Partitioning

Model-based Recursive Partitioning Model-based Recursive Partitioning Achim Zeileis Wirtschaftsuniversität Wien Torsten Hothorn Ludwig-Maximilians-Universität München Kurt Hornik Wirtschaftsuniversität Wien Abstract Recursive partitioning

More information

Package palmtree. January 16, 2018

Package palmtree. January 16, 2018 Package palmtree January 16, 2018 Title Partially Additive (Generalized) Linear Model Trees Date 2018-01-15 Version 0.9-0 Description This is an implementation of model-based trees with global model parameters

More information

Correctly Compute Complex Samples Statistics

Correctly Compute Complex Samples Statistics PASW Complex Samples 17.0 Specifications Correctly Compute Complex Samples Statistics When you conduct sample surveys, use a statistics package dedicated to producing correct estimates for complex sample

More information

Model-Based Regression Trees in Economics and the Social Sciences

Model-Based Regression Trees in Economics and the Social Sciences Model-Based Regression Trees in Economics and the Social Sciences Achim Zeileis http://statmath.wu.ac.at/~zeileis/ Overview Motivation: Trees and leaves Methodology Model estimation Tests for parameter

More information

Beyond the Assumption of Constant Hazard Rate in Estimating Incidence Rate on Current Status Data with Applications to Phase IV Cancer Trial

Beyond the Assumption of Constant Hazard Rate in Estimating Incidence Rate on Current Status Data with Applications to Phase IV Cancer Trial Beyond the Assumption of Constant Hazard Rate in Estimating Incidence Rate on Current Status Data with Applications to Phase IV Cancer Trial Deokumar Srivastava, Ph.D. Member Department of Biostatistics

More information

Standard error estimation in the EM algorithm when joint modeling of survival and longitudinal data

Standard error estimation in the EM algorithm when joint modeling of survival and longitudinal data Biostatistics (2014), 0, 0, pp. 1 24 doi:10.1093/biostatistics/mainmanuscript Standard error estimation in the EM algorithm when joint modeling of survival and longitudinal data Cong Xu, Paul D. Baines

More information

Package simsurv. May 18, 2018

Package simsurv. May 18, 2018 Type Package Title Simulate Survival Data Version 0.2.2 Date 2018-05-18 Package simsurv May 18, 2018 Maintainer Sam Brilleman Description Simulate survival times from standard

More information

Package DSBayes. February 19, 2015

Package DSBayes. February 19, 2015 Type Package Title Bayesian subgroup analysis in clinical trials Version 1.1 Date 2013-12-28 Copyright Ravi Varadhan Package DSBayes February 19, 2015 URL http: //www.jhsph.edu/agingandhealth/people/faculty_personal_pages/varadhan.html

More information

CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA

CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA Examples: Mixture Modeling With Cross-Sectional Data CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA Mixture modeling refers to modeling with categorical latent variables that represent

More information

Optimal designs for comparing curves

Optimal designs for comparing curves Optimal designs for comparing curves Holger Dette, Ruhr-Universität Bochum Maria Konstantinou, Ruhr-Universität Bochum Kirsten Schorning, Ruhr-Universität Bochum FP7 HEALTH 2013-602552 Outline 1 Motivation

More information

Model-Based Recursive Partitioning

Model-Based Recursive Partitioning Overview Model-Based Recursive Partitioning Achim Zeileis Motivation: Trees and leaves Methodology Model estimation Tests for parameter instability Segmentation Pruning Applications Software Costly journals

More information

Parties, Models, Mobsters: A New Implementation of Model-Based Recursive Partitioning in R

Parties, Models, Mobsters: A New Implementation of Model-Based Recursive Partitioning in R Parties, Models, Mobsters: A New Implementation of Model-Based Recursive Partitioning in R Achim Zeileis Universität Innsbruck Torsten Hothorn Universität Zürich Abstract MOB is a generic algorithm for

More information

Machine Learning Techniques for Detecting Hierarchical Interactions in GLM s for Insurance Premiums

Machine Learning Techniques for Detecting Hierarchical Interactions in GLM s for Insurance Premiums Machine Learning Techniques for Detecting Hierarchical Interactions in GLM s for Insurance Premiums José Garrido Department of Mathematics and Statistics Concordia University, Montreal EAJ 2016 Lyon, September

More information

Modelling Personalized Screening: a Step Forward on Risk Assessment Methods

Modelling Personalized Screening: a Step Forward on Risk Assessment Methods Modelling Personalized Screening: a Step Forward on Risk Assessment Methods Validating Prediction Models Inmaculada Arostegui Universidad del País Vasco UPV/EHU Red de Investigación en Servicios de Salud

More information

Product Catalog. AcaStat. Software

Product Catalog. AcaStat. Software Product Catalog AcaStat Software AcaStat AcaStat is an inexpensive and easy-to-use data analysis tool. Easily create data files or import data from spreadsheets or delimited text files. Run crosstabulations,

More information

A Comparison of Modeling Scales in Flexible Parametric Models. Noori Akhtar-Danesh, PhD McMaster University

A Comparison of Modeling Scales in Flexible Parametric Models. Noori Akhtar-Danesh, PhD McMaster University A Comparison of Modeling Scales in Flexible Parametric Models Noori Akhtar-Danesh, PhD McMaster University Hamilton, Canada daneshn@mcmaster.ca Outline Backgroundg A review of splines Flexible parametric

More information

Decision tree modeling using R

Decision tree modeling using R Big-data Clinical Trial Column Page 1 of Decision tree modeling using R Zhongheng Zhang Department of Critical Care Medicine, Jinhua Municipal Central Hospital, Jinhua Hospital of Zhejiang University,

More information

Variable Selection 6.783, Biomedical Decision Support

Variable Selection 6.783, Biomedical Decision Support 6.783, Biomedical Decision Support (lrosasco@mit.edu) Department of Brain and Cognitive Science- MIT November 2, 2009 About this class Why selecting variables Approaches to variable selection Sparsity-based

More information

The potential of model-based recursive partitioning in the social sciences Revisiting Ockham s Razor

The potential of model-based recursive partitioning in the social sciences Revisiting Ockham s Razor SUPPLEMENT TO The potential of model-based recursive partitioning in the social sciences Revisiting Ockham s Razor Julia Kopf, Thomas Augustin, Carolin Strobl This supplement is designed to illustrate

More information

4.5 The smoothed bootstrap

4.5 The smoothed bootstrap 4.5. THE SMOOTHED BOOTSTRAP 47 F X i X Figure 4.1: Smoothing the empirical distribution function. 4.5 The smoothed bootstrap In the simple nonparametric bootstrap we have assumed that the empirical distribution

More information

JMP Clinical. Release Notes. Version 5.0

JMP Clinical. Release Notes. Version 5.0 JMP Clinical Version 5.0 Release Notes Creativity involves breaking out of established patterns in order to look at things in a different way. Edward de Bono JMP, A Business Unit of SAS SAS Campus Drive

More information

Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

More information

Package quint. R topics documented: September 28, Type Package Title Qualitative Interaction Trees Version 2.0.

Package quint. R topics documented: September 28, Type Package Title Qualitative Interaction Trees Version 2.0. Type Package Title Qualitative Interaction Trees Version 2.0.0 Date 2018-09-25 Package quint September 28, 2018 Maintainer Elise Dusseldorp Grows a qualitative interaction

More information

Poisson Regression and Model Checking

Poisson Regression and Model Checking Poisson Regression and Model Checking Readings GH Chapter 6-8 September 27, 2017 HIV & Risk Behaviour Study The variables couples and women_alone code the intervention: control - no counselling (both 0)

More information

Correctly Compute Complex Samples Statistics

Correctly Compute Complex Samples Statistics SPSS Complex Samples 15.0 Specifications Correctly Compute Complex Samples Statistics When you conduct sample surveys, use a statistics package dedicated to producing correct estimates for complex sample

More information

An Ensemble Method for Interval-Censored Time-to-Event Data

An Ensemble Method for Interval-Censored Time-to-Event Data An Ensemble Method for Interval-Censored Time-to-Event Data Weichi Yao 1, Halina Frydman 2, and Jeffrey S. Simonoff 3 arxiv:1901.04599v1 [stat.me] 14 Jan 2019 1 Stern School of Business, New York University

More information

Package sp23design. R topics documented: February 20, Type Package

Package sp23design. R topics documented: February 20, Type Package Type Package Package sp23design February 20, 2015 Title Design and Simulation of seamless Phase II-III Clinical Trials Version 0.9 Date 2012-01-01 Author Balasubramanian Narasimhan [aut, cre], Mei-Chiung

More information

Preparing for Data Analysis

Preparing for Data Analysis Preparing for Data Analysis Prof. Andrew Stokes March 27, 2018 Managing your data Entering the data into a database Reading the data into a statistical computing package Checking the data for errors and

More information

Bayesian network data imputation with application to survival tree analysis

Bayesian network data imputation with application to survival tree analysis Bayesian network data imputation with application to survival tree analysis Paola M.V. Rancoita a,b,c,, Marco Zaffalon b, Emanuele Zucca d, Francesco Bertoni c,d, Cassio P. de Campos e a University Centre

More information

A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn A Handbook of Statistical Analyses Using R 2nd Edition Brian S. Everitt and Torsten Hothorn CHAPTER 9 Recursive Partitioning: Predicting Body Fat and Glaucoma Diagnosis 9. Introduction 9.2 Recursive Partitioning

More information

Beta Regression: Shaken, Stirred, Mixed, and Partitioned. Achim Zeileis, Francisco Cribari-Neto, Bettina Grün

Beta Regression: Shaken, Stirred, Mixed, and Partitioned. Achim Zeileis, Francisco Cribari-Neto, Bettina Grün Beta Regression: Shaken, Stirred, Mixed, and Partitioned Achim Zeileis, Francisco Cribari-Neto, Bettina Grün Overview Motivation Shaken or stirred: Single or double index beta regression for mean and/or

More information

CHAPTER 5. BASIC STEPS FOR MODEL DEVELOPMENT

CHAPTER 5. BASIC STEPS FOR MODEL DEVELOPMENT CHAPTER 5. BASIC STEPS FOR MODEL DEVELOPMENT This chapter provides step by step instructions on how to define and estimate each of the three types of LC models (Cluster, DFactor or Regression) and also

More information

186 Statistics, Data Analysis and Modeling. Proceedings of MWSUG '95

186 Statistics, Data Analysis and Modeling. Proceedings of MWSUG '95 A Statistical Analysis Macro Library in SAS Carl R. Haske, Ph.D., STATPROBE, nc., Ann Arbor, M Vivienne Ward, M.S., STATPROBE, nc., Ann Arbor, M ABSTRACT Statistical analysis plays a major role in pharmaceutical

More information

Optimal Two-Stage Adaptive Enrichment Designs, using Sparse Linear Programming

Optimal Two-Stage Adaptive Enrichment Designs, using Sparse Linear Programming Optimal Two-Stage Adaptive Enrichment Designs, using Sparse Linear Programming Michael Rosenblum Johns Hopkins Bloomberg School of Public Health Joint work with Xingyuan (Ethan) Fang and Han Liu Princeton

More information

Pooling Clinical Data: Key points and Pitfalls. October 16, 2012 Phuse 2012 conference, Budapest Florence Buchheit

Pooling Clinical Data: Key points and Pitfalls. October 16, 2012 Phuse 2012 conference, Budapest Florence Buchheit Pooling Clinical Data: Key points and Pitfalls October 16, 2012 Phuse 2012 conference, Budapest Florence Buchheit Introduction Are there any pre-defined rules to pool clinical data? Are there any pre-defined

More information

Chapter 6: Linear Model Selection and Regularization

Chapter 6: Linear Model Selection and Regularization Chapter 6: Linear Model Selection and Regularization As p (the number of predictors) comes close to or exceeds n (the sample size) standard linear regression is faced with problems. The variance of the

More information

Conquering Massive Clinical Models with GPU. GPU Parallelized Logistic Regression

Conquering Massive Clinical Models with GPU. GPU Parallelized Logistic Regression Conquering Massive Clinical Models with GPU Parallelized Logistic Regression M.D./Ph.D. candidate in Biomathematics University of California, Los Angeles Joint Statistical Meetings Vancouver, Canada, July

More information

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a Week 9 Based in part on slides from textbook, slides of Susan Holmes Part I December 2, 2012 Hierarchical Clustering 1 / 1 Produces a set of nested clusters organized as a Hierarchical hierarchical clustering

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 2015 MODULE 4 : Modelling experimental data Time allowed: Three hours Candidates should answer FIVE questions. All questions carry equal

More information

Module I: Clinical Trials a Practical Guide to Design, Analysis, and Reporting 1. Fundamentals of Trial Design

Module I: Clinical Trials a Practical Guide to Design, Analysis, and Reporting 1. Fundamentals of Trial Design Module I: Clinical Trials a Practical Guide to Design, Analysis, and Reporting 1. Fundamentals of Trial Design Randomized the Clinical Trails About the Uncontrolled Trails The protocol Development The

More information

LORET: Logistic Regression Trees Mob for Discrete Category Models

LORET: Logistic Regression Trees Mob for Discrete Category Models LORET: Logistic Regression Trees Mob for Discrete Category Models Contents Discrete Category Models Logistic Models Segmentation with Trees Mob Algorithm Technical Aspects of Logistic Regression LORET

More information

Package citools. October 20, 2018

Package citools. October 20, 2018 Type Package Package citools October 20, 2018 Title Confidence or Prediction Intervals, Quantiles, and Probabilities for Statistical Models Version 0.5.0 Maintainer John Haman Functions

More information

Statistical Matching using Fractional Imputation

Statistical Matching using Fractional Imputation Statistical Matching using Fractional Imputation Jae-Kwang Kim 1 Iowa State University 1 Joint work with Emily Berg and Taesung Park 1 Introduction 2 Classical Approaches 3 Proposed method 4 Application:

More information

Gene signature selection to predict survival benefits from adjuvant chemotherapy in NSCLC patients

Gene signature selection to predict survival benefits from adjuvant chemotherapy in NSCLC patients 1 Gene signature selection to predict survival benefits from adjuvant chemotherapy in NSCLC patients 1,2 Keyue Ding, Ph.D. Nov. 8, 2014 1 NCIC Clinical Trials Group, Kingston, Ontario, Canada 2 Dept. Public

More information

Fast or furious? - User analysis of SF Express Inc

Fast or furious? - User analysis of SF Express Inc CS 229 PROJECT, DEC. 2017 1 Fast or furious? - User analysis of SF Express Inc Gege Wen@gegewen, Yiyuan Zhang@yiyuan12, Kezhen Zhao@zkz I. MOTIVATION The motivation of this project is to predict the likelihood

More information

Predictive Checking. Readings GH Chapter 6-8. February 8, 2017

Predictive Checking. Readings GH Chapter 6-8. February 8, 2017 Predictive Checking Readings GH Chapter 6-8 February 8, 2017 Model Choice and Model Checking 2 Questions: 1. Is my Model good enough? (no alternative models in mind) 2. Which Model is best? (comparison

More information

Multiple imputation using chained equations: Issues and guidance for practice

Multiple imputation using chained equations: Issues and guidance for practice Multiple imputation using chained equations: Issues and guidance for practice Ian R. White, Patrick Royston and Angela M. Wood http://onlinelibrary.wiley.com/doi/10.1002/sim.4067/full By Gabrielle Simoneau

More information

Introduction to hypothesis testing

Introduction to hypothesis testing Introduction to hypothesis testing Mark Johnson Macquarie University Sydney, Australia February 27, 2017 1 / 38 Outline Introduction Hypothesis tests and confidence intervals Classical hypothesis tests

More information

Convexization in Markov Chain Monte Carlo

Convexization in Markov Chain Monte Carlo in Markov Chain Monte Carlo 1 IBM T. J. Watson Yorktown Heights, NY 2 Department of Aerospace Engineering Technion, Israel August 23, 2011 Problem Statement MCMC processes in general are governed by non

More information

Lecture 1: Statistical Reasoning 2. Lecture 1. Simple Regression, An Overview, and Simple Linear Regression

Lecture 1: Statistical Reasoning 2. Lecture 1. Simple Regression, An Overview, and Simple Linear Regression Lecture Simple Regression, An Overview, and Simple Linear Regression Learning Objectives In this set of lectures we will develop a framework for simple linear, logistic, and Cox Proportional Hazards Regression

More information

Package sprinter. February 20, 2015

Package sprinter. February 20, 2015 Type Package Package sprinter February 20, 2015 Title Framework for Screening Prognostic Interactions Version 1.1.0 Date 2014-04-11 Author Isabell Hoffmann Maintainer Isabell Hoffmann

More information

Unified Methods for Censored Longitudinal Data and Causality

Unified Methods for Censored Longitudinal Data and Causality Mark J. van der Laan James M. Robins Unified Methods for Censored Longitudinal Data and Causality Springer Preface v Notation 1 1 Introduction 8 1.1 Motivation, Bibliographic History, and an Overview of

More information

Chapter 5. Tree-based Methods

Chapter 5. Tree-based Methods Chapter 5. Tree-based Methods Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Regression

More information

Credit card Fraud Detection using Predictive Modeling: a Review

Credit card Fraud Detection using Predictive Modeling: a Review February 207 IJIRT Volume 3 Issue 9 ISSN: 2396002 Credit card Fraud Detection using Predictive Modeling: a Review Varre.Perantalu, K. BhargavKiran 2 PG Scholar, CSE, Vishnu Institute of Technology, Bhimavaram,

More information

Package ICsurv. February 19, 2015

Package ICsurv. February 19, 2015 Package ICsurv February 19, 2015 Type Package Title A package for semiparametric regression analysis of interval-censored data Version 1.0 Date 2014-6-9 Author Christopher S. McMahan and Lianming Wang

More information

A toolkit for stability assessment of tree-based learners

A toolkit for stability assessment of tree-based learners A toolkit for stability assessment of tree-based learners Michel Philipp, University of Zurich, Michel.Philipp@psychologie.uzh.ch Achim Zeileis, Universität Innsbruck, Achim.Zeileis@R-project.org Carolin

More information

Equivalence of dose-response curves

Equivalence of dose-response curves Equivalence of dose-response curves Holger Dette, Ruhr-Universität Bochum Kathrin Möllenhoff, Ruhr-Universität Bochum Stanislav Volgushev, University of Toronto Frank Bretz, Novartis Basel FP7 HEALTH 2013-602552

More information

Ludwig Fahrmeir Gerhard Tute. Statistical odelling Based on Generalized Linear Model. íecond Edition. . Springer

Ludwig Fahrmeir Gerhard Tute. Statistical odelling Based on Generalized Linear Model. íecond Edition. . Springer Ludwig Fahrmeir Gerhard Tute Statistical odelling Based on Generalized Linear Model íecond Edition. Springer Preface to the Second Edition Preface to the First Edition List of Examples List of Figures

More information

JMP Book Descriptions

JMP Book Descriptions JMP Book Descriptions The collection of JMP documentation is available in the JMP Help > Books menu. This document describes each title to help you decide which book to explore. Each book title is linked

More information

Preparing for Data Analysis

Preparing for Data Analysis Preparing for Data Analysis Prof. Andrew Stokes March 21, 2017 Managing your data Entering the data into a database Reading the data into a statistical computing package Checking the data for errors and

More information

VCE Further Mathematics Units 3&4

VCE Further Mathematics Units 3&4 Trial Examination 06 VCE Further Mathematics Units 3&4 Written Examination Suggested Solutions Neap Trial Exams are licensed to be photocopied or placed on the school intranet and used only within the

More information

in this course) ˆ Y =time to event, follow-up curtailed: covered under ˆ Missing at random (MAR) a

in this course) ˆ Y =time to event, follow-up curtailed: covered under ˆ Missing at random (MAR) a Chapter 3 Missing Data 3.1 Types of Missing Data ˆ Missing completely at random (MCAR) ˆ Missing at random (MAR) a ˆ Informative missing (non-ignorable non-response) See 1, 38, 59 for an introduction to

More information

General Factorial Models

General Factorial Models In Chapter 8 in Oehlert STAT:5201 Week 9 - Lecture 2 1 / 34 It is possible to have many factors in a factorial experiment. In DDD we saw an example of a 3-factor study with ball size, height, and surface

More information

!"# $ # # $ $ % $ &% $ '"# $ ()&*&)+(( )+(( )

!# $ # # $ $ % $ &% $ '# $ ()&*&)+(( )+(( ) !"# # # % &% '"# ) !#, ' "# " "# -. / # 0 0 0 0 0 "0 "# " # 1 #! " " 0 0 0 0 0 0 2# 0 # # 3 ' 4 56 7-56 87 9# 5 6 7 6 & 0 " : 9 ; 4 " #! 0 - '% # % "# " "# " < 4 "! % " % 4 % % 9# 4 56 87 = 4 > 0 " %!#

More information

Gaining Insight with Recursive Partitioning of Generalized Linear Models

Gaining Insight with Recursive Partitioning of Generalized Linear Models Gaining Insight with Recursive Partitioning of Generalized Linear Models Thomas Rusch WU Wirtschaftsuniversität Wien Achim Zeileis Universität Innsbruck Abstract Recursive partitioning algorithms separate

More information

Statistics 202: Data Mining. c Jonathan Taylor. Outliers Based in part on slides from textbook, slides of Susan Holmes.

Statistics 202: Data Mining. c Jonathan Taylor. Outliers Based in part on slides from textbook, slides of Susan Holmes. Outliers Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Concepts What is an outlier? The set of data points that are considerably different than the remainder of the

More information

PSS weighted analysis macro- user guide

PSS weighted analysis macro- user guide Description and citation: This macro performs propensity score (PS) adjusted analysis using stratification for cohort studies from an analytic file containing information on patient identifiers, exposure,

More information

A RECURSIVE PARTITIONING APPROACH TO HOSPITAL CASE MIX CLASSIFICATION

A RECURSIVE PARTITIONING APPROACH TO HOSPITAL CASE MIX CLASSIFICATION Alma Mater Studiorum Università di Bologna DOTTORATO DI RICERCA IN SCIENZE STATISTICHE Ciclo XXIX Settore Concorsuale di afferenza: 13/D1 Settore Scientifico disciplinare: SECS-S/02 A RECURSIVE PARTITIONING

More information

Modelling Proportions and Count Data

Modelling Proportions and Count Data Modelling Proportions and Count Data Rick White May 4, 2016 Outline Analysis of Count Data Binary Data Analysis Categorical Data Analysis Generalized Linear Models Questions Types of Data Continuous data:

More information

Alex Schaefer. April 9, 2017

Alex Schaefer. April 9, 2017 Department of Mathematics Binghamton University April 9, 2017 Outline 1 Introduction 2 Cycle Vector Space 3 Permutability 4 A Characterization Introduction Outline 1 Introduction 2 Cycle Vector Space 3

More information

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes

More information

Random forest methodology for model-based recursive partitioning: the mobforest package for R

Random forest methodology for model-based recursive partitioning: the mobforest package for R Garge et al. BMC Bioinformatics 2013, 14:125 SOFTWARE Open Access Random forest methodology for model-based recursive partitioning: the mobforest package for R Nikhil R Garge *, Georgiy Bobashev and Barry

More information

Fitting latency models using B-splines in EPICURE for DOS

Fitting latency models using B-splines in EPICURE for DOS Fitting latency models using B-splines in EPICURE for DOS Michael Hauptmann, Jay Lubin January 11, 2007 1 Introduction Disease latency refers to the interval between an increment of exposure and a subsequent

More information

Package PTE. October 10, 2017

Package PTE. October 10, 2017 Type Package Title Personalized Treatment Evaluator Version 1.6 Date 2017-10-9 Package PTE October 10, 2017 Author Adam Kapelner, Alina Levine & Justin Bleich Maintainer Adam Kapelner

More information

Regression. Dr. G. Bharadwaja Kumar VIT Chennai

Regression. Dr. G. Bharadwaja Kumar VIT Chennai Regression Dr. G. Bharadwaja Kumar VIT Chennai Introduction Statistical models normally specify how one set of variables, called dependent variables, functionally depend on another set of variables, called

More information

A GENERAL GIBBS SAMPLING ALGORITHM FOR ANALYZING LINEAR MODELS USING THE SAS SYSTEM

A GENERAL GIBBS SAMPLING ALGORITHM FOR ANALYZING LINEAR MODELS USING THE SAS SYSTEM A GENERAL GIBBS SAMPLING ALGORITHM FOR ANALYZING LINEAR MODELS USING THE SAS SYSTEM Jayawant Mandrekar, Daniel J. Sargent, Paul J. Novotny, Jeff A. Sloan Mayo Clinic, Rochester, MN 55905 ABSTRACT A general

More information

DATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R

DATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R DATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R Lee, Rönnegård & Noh LRN@du.se Lee, Rönnegård & Noh HGLM book 1 / 24 Overview 1 Background to the book 2 Crack growth example 3 Contents

More information

PharmaSUG China. model to include all potential prognostic factors and exploratory variables, 2) select covariates which are significant at

PharmaSUG China. model to include all potential prognostic factors and exploratory variables, 2) select covariates which are significant at PharmaSUG China A Macro to Automatically Select Covariates from Prognostic Factors and Exploratory Factors for Multivariate Cox PH Model Yu Cheng, Eli Lilly and Company, Shanghai, China ABSTRACT Multivariate

More information

CART. Classification and Regression Trees. Rebecka Jörnsten. Mathematical Sciences University of Gothenburg and Chalmers University of Technology

CART. Classification and Regression Trees. Rebecka Jörnsten. Mathematical Sciences University of Gothenburg and Chalmers University of Technology CART Classification and Regression Trees Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology CART CART stands for Classification And Regression Trees.

More information

Modelling Proportions and Count Data

Modelling Proportions and Count Data Modelling Proportions and Count Data Rick White May 5, 2015 Outline Analysis of Count Data Binary Data Analysis Categorical Data Analysis Generalized Linear Models Questions Types of Data Continuous data:

More information

Package samplesizelogisticcasecontrol

Package samplesizelogisticcasecontrol Package samplesizelogisticcasecontrol February 4, 2017 Title Sample Size Calculations for Case-Control Studies Version 0.0.6 Date 2017-01-31 Author Mitchell H. Gail To determine sample size for case-control

More information

Voxel selection algorithms for fmri

Voxel selection algorithms for fmri Voxel selection algorithms for fmri Henryk Blasinski December 14, 2012 1 Introduction Functional Magnetic Resonance Imaging (fmri) is a technique to measure and image the Blood- Oxygen Level Dependent

More information

RCASPAR: A package for survival time analysis using high-dimensional data

RCASPAR: A package for survival time analysis using high-dimensional data RCASPAR: A package for survival time analysis using high-dimensional data Douaa AS Mugahid October 30, 2017 1 Introduction RCASPAR allows the utilization of high-dimensional biological data as covariates

More information

A TEXT MINER ANALYSIS TO COMPARE INTERNET AND MEDLINE INFORMATION ABOUT ALLERGY MEDICATIONS Chakib Battioui, University of Louisville, Louisville, KY

A TEXT MINER ANALYSIS TO COMPARE INTERNET AND MEDLINE INFORMATION ABOUT ALLERGY MEDICATIONS Chakib Battioui, University of Louisville, Louisville, KY Paper # DM08 A TEXT MINER ANALYSIS TO COMPARE INTERNET AND MEDLINE INFORMATION ABOUT ALLERGY MEDICATIONS Chakib Battioui, University of Louisville, Louisville, KY ABSTRACT Recently, the internet has become

More information

Score function estimator and variance reduction techniques

Score function estimator and variance reduction techniques and variance reduction techniques Wilker Aziz University of Amsterdam May 24, 2018 Wilker Aziz Discrete variables 1 Outline 1 2 3 Wilker Aziz Discrete variables 1 Variational inference for belief networks

More information

CSE 546 Machine Learning, Autumn 2013 Homework 2

CSE 546 Machine Learning, Autumn 2013 Homework 2 CSE 546 Machine Learning, Autumn 2013 Homework 2 Due: Monday, October 28, beginning of class 1 Boosting [30 Points] We learned about boosting in lecture and the topic is covered in Murphy 16.4. On page

More information

Motivating Example. Missing Data Theory. An Introduction to Multiple Imputation and its Application. Background

Motivating Example. Missing Data Theory. An Introduction to Multiple Imputation and its Application. Background An Introduction to Multiple Imputation and its Application Craig K. Enders University of California - Los Angeles Department of Psychology cenders@psych.ucla.edu Background Work supported by Institute

More information

Assessing the Quality of the Natural Cubic Spline Approximation

Assessing the Quality of the Natural Cubic Spline Approximation Assessing the Quality of the Natural Cubic Spline Approximation AHMET SEZER ANADOLU UNIVERSITY Department of Statisticss Yunus Emre Kampusu Eskisehir TURKEY ahsst12@yahoo.com Abstract: In large samples,

More information

Leveraging multiple endpoints in small clinical trials

Leveraging multiple endpoints in small clinical trials Leveraging multiple endpoints in small clinical trials Robin Ristl Section for Medical Statistics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna EMA FP7

More information

Splitting the follow-up C&H 6

Splitting the follow-up C&H 6 Splitting the follow-up C&H 6 Bendix Carstensen Steno Diabetes Center & Department of Biostatistics, University of Copenhagen bxc@steno.dk www.biostat.ku.dk/~bxc PhD-course in Epidemiology, Department

More information

Frequentist and Bayesian Interim Analysis in Clinical Trials: Group Sequential Testing and Posterior Predictive Probability Monitoring Using SAS

Frequentist and Bayesian Interim Analysis in Clinical Trials: Group Sequential Testing and Posterior Predictive Probability Monitoring Using SAS MWSUG 2016 - Paper PH06 Frequentist and Bayesian Interim Analysis in Clinical Trials: Group Sequential Testing and Posterior Predictive Probability Monitoring Using SAS Kechen Zhao, University of Southern

More information

GAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K.

GAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K. GAMs semi-parametric GLMs Simon Wood Mathematical Sciences, University of Bath, U.K. Generalized linear models, GLM 1. A GLM models a univariate response, y i as g{e(y i )} = X i β where y i Exponential

More information

A SAS and Java Application for Reporting Clinical Trial Data. Kevin Kane MSc Infoworks (Data Handling) Limited

A SAS and Java Application for Reporting Clinical Trial Data. Kevin Kane MSc Infoworks (Data Handling) Limited A SAS and Java Application for Reporting Clinical Trial Data Kevin Kane MSc Infoworks (Data Handling) Limited Reporting Clinical Trials Is Resource Intensive! Reporting a clinical trial program for a new

More information

An imputation approach for analyzing mixed-mode surveys

An imputation approach for analyzing mixed-mode surveys An imputation approach for analyzing mixed-mode surveys Jae-kwang Kim 1 Iowa State University June 4, 2013 1 Joint work with S. Park and S. Kim Ouline Introduction Proposed Methodology Application to Private

More information

Robust Kernel Methods in Clustering and Dimensionality Reduction Problems

Robust Kernel Methods in Clustering and Dimensionality Reduction Problems Robust Kernel Methods in Clustering and Dimensionality Reduction Problems Jian Guo, Debadyuti Roy, Jing Wang University of Michigan, Department of Statistics Introduction In this report we propose robust

More information

Computational Cognitive Science

Computational Cognitive Science Computational Cognitive Science Lecture 5: Maximum Likelihood Estimation; Parameter Uncertainty Chris Lucas (Slides adapted from Frank Keller s) School of Informatics University of Edinburgh clucas2@inf.ed.ac.uk

More information