DATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R

Size: px

Start display at page:

Download "DATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R"

Myron Campbell
5 years ago
Views:

1 DATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R Lee, Rönnegård & Noh LRN@du.se Lee, Rönnegård & Noh HGLM book 1 / 24

2 Overview 1 Background to the book 2 Crack growth example 3 Contents 4 A last example - Street magician Lee, Rönnegård & Noh HGLM book 2 / 24

3 Development of books Lee, Rönnegård & Noh HGLM book 3 / 24

4 Model checking plots in linear regression y = Xβ + e Normal Q Q 12 Standardized residuals Theoretical Quantiles lm(y ~ x) Lee, Rönnegård & Noh HGLM book 4 / 24

5 Regression models using other distributions Gaussian L = n i=1 μ = Xβ 1 e 1 2σ 2 y i μ 2 2πσ Poisson n μ L = y ie μ y i! i=1 log(μ) = Xβ Binomial n n L = y μ y i(1 μ) n y i i i=1 logit(μ) = Xβ Gamma n 1 L = Γ(k) θ ky i k 1 e y i θ i=1 kθ μ; 1/μ = Xβ Lee, Rönnegård & Noh HGLM book 5 / 24

6 Generalized Linear Models Gaussian L = n i=1 μ = Xβ 1 e 1 2σ 2 y i μ 2 2πσ Poisson n μ L = y ie μ y i! i=1 log(μ) = Xβ Binomial n n L = y μ y i(1 μ) n y i i i=1 logit(μ) = Xβ Gamma n 1 L = Γ(k) θ ky i k 1 e y i θ i=1 kθ μ; 1/μ = Xβ GLM Common estimation algorithm using iterative regression - Fast and easy to implement - Linear regression model checking tools!!! Lee, Rönnegård & Noh HGLM book 6 / 24

7 Hierarchical Generalized Linear Models Lee Y. & Nelder J. A. (1996) GLM approach for fitting Linear mixed models Generalized linear mixed models (Laplace approximation) Mixed models with non-gaussian random effects Above models + dispersion modelling Lee, Rönnegård & Noh HGLM book 7 / 24

8 Hierarchical Generalized Linear Models Linear model with random effects y = Xβ + Zu + e u N(0, σu) 2 e N(0, σe) 2 Random effects Repeated measurements (on the same subject) the most common application The parameter is assumed to be sampled from a population of parameters (not fixed) The variance component σ 2 u is of interest Estimating û can be of interest for: prediction (including spatial Kriging) and ranking subjects. Lee, Rönnegård & Noh HGLM book 8 / 24

9 Hierarchical Generalized Linear Models Linear model y = Xβ + e e N(0, σ 2 e) Linear model including a dispersion model y = Xβ + e e N(0, exp(x d β d )) Lee, Rönnegård & Noh HGLM book 9 / 24

10 Hierarchical Generalized Linear Models Linear model Linear mixed model (LMM) Generalized linear model (GLM) Generalized linear mixed model (GLMM) Generalized linear model including Gaussian random effects Joint GLM Generalized linear model including dispersion model with fixed effects Hierarchical GLM (HGLM) Generalized linear model including Gaussian and/or non-gaussian random effects. Dispersion can be modelled using fixed effects. Lee, Rönnegård & Noh HGLM book 10 / 24

11 Hierarchical Generalized Linear Models Linear model Linear mixed model (LMM) Generalized linear model (GLM) Factor analysis Generalized linear mixed model (GLMM) Generalized linear model including Gaussian random effects Joint GLM Generalized linear model including dispersion model with fixed effects Structural Equation Models (SEM) Hierarchical GLM (HGLM) Generalized linear model including Gaussian and/or non-gaussian random effects. Dispersion can be modelled using fixed effects. HGLMs with correlated random effects Including spatial, temporal correlations, splines, GAM. Double HGLM (DHGLM) HGLM including dispersion model with both fixed and random effects Frailty HGLM HGLMs for survival analysis including competing risk models Lee, Rönnegård & Noh HGLM book 11 / 24

12 A motivating example from my own research An important field of application for random effect estimation is animal breeding where the animals are ranked by their genetic potential given by the estimated random effects in a linear mixed model y = Xβ + Za + e The outcome y can for instance be litter size in pigs or milk yield in dairy cows, and the random effects a N(0, σ 2 aa) have a correlation matrix A computed from pedigree information (see e.g., Chapter 17 of Pawitan (2001)). The ranking is based on the best linear unbiased predictor â, referred to as estimated breeding values (for the mean), and by selecting animals with high estimated breeding values the mean of the response variable is increased for each generation of selection. Lee, Rönnegård & Noh HGLM book 12 / 24

13 A motivating example from my own research An important field of application for random effect estimation is animal breeding where the animals are ranked by their genetic potential given by the estimated random effects in a linear mixed model y = Xβ + Za + e The outcome y can for instance be litter size in pigs or milk yield in dairy cows, and the random effects a N(0, σ 2 aa) have a correlation matrix A computed from pedigree information (see e.g., Chapter 17 of Pawitan (2001)). The ranking is based on the best linear unbiased predictor â, referred to as estimated breeding values (for the mean), and by selecting animals with high estimated breeding values the mean of the response variable is increased for each generation of selection. Possible to estimate breeding values for the variance! Lee, Rönnegård & Noh HGLM book 12 / 24

14 12 INTRODUCTION Figure 1.4 Distributions of observed litter sizes and the number of observations per sow.

15 Crack growth data Hudak et al. (1978) crack lengths measured on a compact tension steel. 21 metallic specimens, crack lengths recorded every 104 cycles y = increment of crack length crack0 = covariate for the mean part of the model cycle = covariate for the dispersion part of the model Lee, Rönnegård & Noh HGLM book 13 / 24

16 Crack growth data Lee, Rönnegård & Noh HGLM book 14 / 24

17 Crack growth - Using the hglm package res_glm <- glm(y ~ crack0, family= Gamma(link=log), data= data_crack_growth ) Lee, Rönnegård & Noh HGLM book 15 / 24

18 Crack growth - Using the hglm package res_glm <- glm(y ~ crack0, family= Gamma(link=log), data= data_crack_growth ) library(hglm) ## GLMM ## res_glmm <- hglm2(y ~ crack0 + (1 specimen), family= Gamma(link=log), data= data_crack_growth ) Lee, Rönnegård & Noh HGLM book 15 / 24

19 Crack growth - Using the hglm package res_glm <- glm(y ~ crack0, family= Gamma(link=log), data= data_crack_growth ) library(hglm) ## GLMM ## res_glmm <- hglm2(y ~ crack0 + (1 specimen), family= Gamma(link=log), data= data_crack_growth ) ## GLMM with dispersion model## res_glmm2 <- hglm2(y ~ crack0 + (1 specimen), family= Gamma(link=log), data= data_crack_growth, disp= ~cycle ) Lee, Rönnegård & Noh HGLM book 15 / 24

20 Crack growth - Using the dhglm package library(dhglm) ## HGLM I ## model_mu <- DHGLMMODELING(Model="mean", Link="log", LinPred = y ~ crack0 + (1 specimen), RandDist = "inverse-gamma") model_phi <- DHGLMMODELING(Model="dispersion") res_hglm1 <- dhglmfit(respdist="gamma", DataMain=data_crack_growth, MeanModel=model_mu, DispersionModel=model_phi) Lee, Rönnegård & Noh HGLM book 16 / 24

21 Crack growth - Using the dhglm package library(dhglm) ## HGLM I ## model_mu <- DHGLMMODELING(Model="mean", Link="log", LinPred = y ~ crack0 + (1 specimen), RandDist = "inverse-gamma") model_phi <- DHGLMMODELING(Model="dispersion") res_hglm1 <- dhglmfit(respdist="gamma", DataMain=data_crack_growth, MeanModel=model_mu, DispersionModel=model_phi) ## HGLM II ## model_mu <- DHGLMMODELING(Model="mean", Link="log", LinPred = y ~ crack0 + (1 specimen), RandDist="inverse-gamma") model_phi <- DHGLMMODELING(Model = "dispersion", Link = "log", LinPred = phi ~ cycle) res_hglm2 <- dhglmfit(respdist = "gamma", DataMain = data_crack_growth, MeanModel = model_mu, DispersionModel = model_phi) Lee, Rönnegård & Noh HGLM book 16 / 24

22 Crack growth - Using the dhglm package ## DHGLM I ## model_mu <- DHGLMMODELING(Model="mean", Link="log", LinPred = y ~ crack0 + (1 specimen), RandDist="inverse-gamma") model_phi <- DHGLMMODELING(Model="dispersion", Link="log", LinPred = phi ~ cycle + (1 specimen), RandDist="gaussian") res_dhglm1 <- dhglmfit(respdist = "gamma", DataMain = data_crack_growth, MeanModel = model_mu, DispersionModel = model_phi) Lee, Rönnegård & Noh HGLM book 17 / 24

23 Crack growth data - results HGLM I HGLM II DHGLM I Lee, Rönnegård & Noh HGLM book 18 / 24

24 Crack growth data - estimates Estimates from the model(mu) y ~ crack0 + (1 specimen) [1] "log" Estimate Std. Error t-value (Intercept) crack Estimates for logarithm of lambda=var(u_mu) Estimate Std. Error t-value specimen Estimates from the model(phi) phi ~ cycle + (1 specimen) [1] "log" Estimate Std. Error t-value (Intercept) cycle Estimates for logarithm of var(u_phi) Estimate Std. Error t-value specimen Lee, Rönnegård & Noh HGLM book 19 / 24

25 Contents List of notations ix Preface xi 1 Introduction Motivating examples Regarding criticisms of the h-likelihood R code Exercises 17 2 GLMs via iterative weighted least squares Examples R code Fisher s classical likelihood Iterative weighted least squares Model checking using residual plots Hat values Exercises 34 3 Inference for models with unobservables Examples R code Likelihood inference for random effects 44 v

26 vi CONTENTS 3.4 Extended likelihood principle Laplace approximations for the integrals Street magician H-likelihood and empirical Bayes Exercises 62 4 HGLMs: from method to algorithm Examples R code IWLS algorithm for interconnected GLMs IWLS algorithm for augmented GLM IWLS algorithm for HGLMs Estimation algorithm for a Poisson GLMM Exercises 91 5 HGLMs modeling in R Examples R code Exercises Double HGLMs - using the dhglm package Model description for DHGLMs Examples An extension of linear mixed models via DHGLM Implementation in the dhglm package Exercises Fitting multivariate HGLMs Examples Implementation in the mdhglm package Exercises 220

27 CONTENTS vii 8 Survival analysis Examples Competing risk models Comparison with alternative R procedures H-likelihood theory for the frailty model Running the frailtyhl package Exercises Joint models Examples H-likelihood approach to joint models Exercises Further topics Examples Variable selections Examples Hypothesis testing 291 References 299 Data Index 311 Author Index 313 Subject Index 317

28 Hierarchical Likelihood The hierarchical likelihood for observations y, fixed parameters θ and random effects v is defined as H(θ, v; y) = f θ (y v)f θ (v) Inference of fixed effects: H(θ, v; y)dv Inference of random effects: H(θ, v; y) Inference of dispersion parameters: f θ (y ˆβ), i.e. REML Lee, Rönnegård & Noh HGLM book 20 / 24

29 Street magician Here an example is presented to illustrate the fundamental idea of likelihood inference and how it may differ from Bayesian inference. You walk down the street and come across a street magician. He has a small bag with a number of dice. There are two types of dice in the bag; white ones and blue ones. The white are numbered 1 to 6, while the blue have three sides with 1 s and three sides with 2 s. The magician draws a die at random from the bag without showing it to you and rolls the die. He claims that the number that turns up is a 2. Which type of die would you guess he has rolled, a white or a blue one? Lee, Rönnegård & Noh HGLM book 21 / 24

30 Street magician - extended Instead of drawing a die, the street magician generated a random variable v from N(0, 1). Suppose that he observed v = v 0. Then without telling the value of realized value v 0, he generates random variable Y from N(v 0, 1). Then, he informs us the value y 0. Can we make an inference about v 0 given the observed data y o? Lee, Rönnegård & Noh HGLM book 22 / 24

31 Development of books Lee, Rönnegård & Noh HGLM book 23 / 24

32 Thank you! All R code included in the book available at: Lee, Rönnegård & Noh HGLM book 24 / 24

DATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R

DATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R Lee, Rönnegård & Noh LRN@du.se Lee, Rönnegård & Noh HGLM book 1 / 25 Overview 1 Background to the book 2 A motivating example from my own