SOC6078 Advanced Statistics: 9. Generalized Additive Models
Robert Andersen, Department of Sociology, University of Toronto

Goals of the Lecture
- Introduce additive models
- Explain how they extend simple nonparametric regression (i.e., smoothing splines and local polynomial regression)
- Discuss estimation using backfitting
- Explain how to interpret their results
- Conclude with some examples of additive models applied to real social science data

Limitations of the Multiple Nonparametric Models
The general nonparametric model (whether fitted by the lowess smooth or the smoothing spline) takes the following form:

    Y_i = f(X_i1, X_i2, ..., X_ik) + ε_i

As we see here, the multiple nonparametric model allows all possible interactions between the independent variables in their effects on Y: we specify a jointly conditional functional form. This model is ideal under the following circumstances:
1. There are no more than two predictors
2. The pattern of nonlinearity is complicated and thus cannot be easily modelled with a simple transformation, polynomial regression or cubic spline
3. The sample size is sufficiently large

Limitations of the Multiple Nonparametric Models (2)
The general nonparametric model becomes impossible to interpret and unstable as we add more explanatory variables, however:
1. For example, in the lowess case, as the number of variables increases, the window span must become wider in order to ensure that each local regression has enough cases (the general idea is the same for smoothing splines). This process can create significant bias (the curve becomes too smooth).
2. It is impossible to interpret a general nonparametric regression when there are more than two predictors: there are no coefficients, and we cannot graph effects in more than three dimensions.
These limitations lead us to additive models.

Additive Regression Models
Additive regression models essentially apply local regression to low-dimensional projections of the data. That is, they estimate the regression surface by combining a collection of one-dimensional functions. The nonparametric additive regression model is

    Y_i = α + f_1(X_i1) + f_2(X_i2) + ... + f_k(X_ik) + ε_i

where the f_j are arbitrary functions estimated from the data; the errors ε_i are assumed to have constant variance and a mean of 0. The estimated functions f_j are the analogues of the coefficients in linear regression.

Additive Regression Models (2)
The assumption that the contribution of each covariate is additive is analogous to the assumption in linear regression that each component is estimated separately. Recall that the linear regression model is

    Y_i = α + β_1 X_i1 + β_2 X_i2 + ... + β_k X_ik + ε_i

where the β_j represent linear effects. For the additive model we instead model Y as an additive combination of arbitrary functions of the Xs. The f_j represent arbitrary trends that can be estimated by lowess or smoothing splines.

Additive Regression Models (3)
Now comes the question: how do we find these arbitrary trends? If the Xs were completely independent (which will not be the case) we could simply estimate each functional form using a nonparametric regression of Y on each of the Xs separately. Similarly, in linear regression, when the Xs are completely uncorrelated the partial regression slopes are identical to the marginal regression slopes. Since the Xs are related, however, we need to proceed in another way, in effect removing the effects of the other predictors, which are unknown before we begin. We use a procedure called backfitting to find each curve, controlling for the effects of the others.

Estimation and Backfitting
Suppose that we have a two-predictor additive model:

    Y_i = α + f_1(X_i1) + f_2(X_i2) + ε_i

If we (unrealistically) knew the partial regression function f_2 but not f_1, we could rearrange the equation in order to solve for f_1. In other words, smoothing Y_i - f_2(X_i2) against X_i1 produces an estimate of α + f_1(X_i1). Simply put, knowing one function allows us to find the other. In the real world, however, we don't know either, so we must proceed initially with preliminary estimates.
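A minimal R sketch of this idea on simulated data (the functional forms and all names here are illustrative, not from the lecture):

    # If f2 were known, smoothing Y - f2(x2) against x1 recovers alpha + f1(x1)
    set.seed(123)
    n  <- 500
    x1 <- runif(n, 0, 10)
    x2 <- runif(n, 0, 10)
    f1 <- function(x) sin(x)          # true (unknown) partial function for x1
    f2 <- function(x) 0.1 * x^2       # pretend this one is known
    y  <- 5 + f1(x1) + f2(x2) + rnorm(n, sd = 0.3)

    partial <- y - f2(x2)                                 # remove the known component
    fit1    <- loess(partial ~ x1)                        # smooth partial residuals on x1
    plot(x1, partial, col = "grey")
    lines(sort(x1), predict(fit1)[order(x1)], lwd = 2)    # estimate of alpha + f1(x1)
    curve(5 + sin(x), add = TRUE, lty = 2)                # the truth, for comparison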

Estimation and Backfitting (2)
1. Start by expressing the variables in mean-deviation form so that the partial regressions sum to zero, thus eliminating the individual intercepts.
2. Take preliminary estimates of each function from a least-squares regression of Y on the Xs.
3. These preliminary estimates are used as step (0) in an iterative estimation process.

Estimation and Backfitting (3)
4. Find the partial residuals for X_1. Recall that partial residuals remove from Y its linear relationship to X_2 while retaining the relationship between Y and X_1. The partial residuals for X_1 are then

    e_i^(1) = Y_i - f_2(X_i2)

5. The same procedure as in step 4 is carried out for X_2.
6. Next we smooth these partial residuals against their respective Xs, providing a new estimate of each f_j:

    f_j = S_j e^(j)

where S_j is the (n x n) smoother transformation matrix for X_j that depends only on the configuration of the X_ij for the jth predictor.

Estimation and Backfitting (4)
Either loess or smoothing splines can be used to find the regression curves. If local polynomial regression is used, a decision must be made about the span. If a smoothing spline is used, the degrees of freedom can be specified ahead of time or chosen by cross-validation, with the goal of minimizing the penalized residual sum of squares

    Σ_i [Y_i - f(X_i)]^2 + λ ∫ [f''(x)]^2 dx

Recall that the first term measures closeness to the data; the second term penalizes curvature in the function.

Estimation and Backfitting (5)
This process of finding new estimates of the functions by smoothing the partial residuals is iterated until the partial functions converge. That is, we stop when the estimates of the smooth functions stabilize from one iteration to the next. When this process is done, we obtain estimates of f_j(X_ij) for every value of X_j. More importantly, we will have reduced a multiple regression to a series of two-dimensional partial-regression problems, making interpretation easy: since each partial regression is only two-dimensional, the functional forms can be plotted in two-dimensional plots showing the partial effect of each X_j on Y. In other words, perspective plots are no longer necessary unless we include an interaction between two smoothed terms.
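A compact R sketch of the backfitting loop for two predictors, using smoothing splines for the partial fits (illustrative only; the simulated data and smoothing choices are my own assumptions):

    # Backfitting with two correlated predictors
    set.seed(42)
    n  <- 300
    x1 <- rnorm(n); x2 <- 0.5 * x1 + rnorm(n)           # correlated Xs
    y  <- cos(x1) + 0.2 * x2^2 + rnorm(n, sd = 0.3)
    yc <- y - mean(y)                                    # mean-deviation form

    f1 <- rep(0, n); f2 <- rep(0, n)
    for (iter in 1:20) {
      f1_old <- f1; f2_old <- f2
      # smooth the partial residuals for x1, then centre the estimate
      ss1 <- smooth.spline(x1, yc - f2)
      f1  <- predict(ss1, x1)$y; f1 <- f1 - mean(f1)
      # smooth the partial residuals for x2, then centre the estimate
      ss2 <- smooth.spline(x2, yc - f1)
      f2  <- predict(ss2, x2)$y; f2 <- f2 - mean(f2)
      if (max(abs(f1 - f1_old), abs(f2 - f2_old)) < 1e-6) break   # convergence
    }
    plot(x1, f1)   # estimated (centred) partial effect of x1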

GAMs in R
There are two excellent packages for fitting generalized additive models in R:
- The gam (for generalized additive model) function in the mgcv package (multiple smoothing parameter estimation with generalized cross-validation) fits generalized additive models using smoothing splines. The smoothing parameter can be chosen automatically using cross-validation or manually by specifying the degrees of freedom.
- The gam function in the gam package allows either lowess (lo(x)) or smoothing splines (s(x)) to be specified.
The anova function can be used with both, allowing different models to be easily compared.

Additive Regression Models in R: Example: Canadian prestige data
Here we use the Canadian Prestige data to fit an additive model for prestige regressed on income and education. For this example I use the gam function in the mgcv package. The formula takes the same form as in the glm function, except that we now have the option of mixing parametric terms and smoothed terms. The R script specifying a smooth trend for both income and education is sketched after this block.

Additive Regression Models in R: Example: Canadian prestige data (2)
The summary function returns tests for each smooth, the degrees of freedom for each smooth, and an adjusted R-square for the model. The deviance can be obtained with the deviance(model) command.

Additive Regression Models in R: Example: Canadian prestige data (3)
Again, as with other nonparametric models, we have no slope parameters to investigate (we do have an intercept, however). A plot of the regression surface is necessary.
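A minimal sketch of the specification described above (assuming the Canadian Prestige data frame from the carData package; the slide's original script is not reproduced in the transcription):

    library(mgcv)
    library(carData)                 # provides the Canadian Prestige data

    mod.gam <- gam(prestige ~ s(income) + s(education), data = Prestige)
    summary(mod.gam)                 # tests and edf for each smooth, adjusted R-square
    deviance(mod.gam)                # residual deviance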

Additive Regression Models in R: Example: Canadian prestige data (4)
Additive Model: We can see the nonlinear relationship of both education and income with prestige, but there is no interaction between them, i.e., the slope for income is the same at every value of education. We can compare this model to the general nonparametric regression model.
[Perspective plot of the fitted additive surface: Prestige against Income and Education]

Additive Regression Models in R: Example: Canadian prestige data (5)
General Nonparametric Model: This model is quite similar to the additive model, but there are some nuances, particularly in the midrange of income, that are not picked up by the additive model because the Xs do not interact.
[Perspective plot of the general nonparametric fit: Prestige against Income and Education]

Additive Regression Models in R: Example: Canadian prestige data (6)
Perspective plots can also be made automatically using the persp.gam function. These graphs include a 95% confidence region.
[Perspective plot with confidence surfaces: prestige against income and education]

Additive Regression Models in R: Example: Canadian prestige data (7)
Since the slices of the additive regression surface in the direction of one predictor (holding the other constant) are parallel, we can graph each partial-regression function separately. This is the benefit of the additive model: we can graph as many plots as there are variables, allowing us to easily visualize the relationships. In other words, a multidimensional regression has been reduced to a series of two-dimensional partial-regression plots. To get these in R (red/green lines are +/- 2 standard errors), see the plotting sketch below.
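A hedged sketch of how such plots can be produced with mgcv (these particular calls are my own illustration, not the slide's script; in mgcv the +/- 2 SE bands are drawn as dashed lines rather than red/green):

    # Partial-effect plots for each smooth term, with +/- 2 SE bands
    plot(mod.gam, pages = 1, se = TRUE)

    # Perspective plot of the fitted additive surface
    vis.gam(mod.gam, view = c("income", "education"), theta = 35, type = "response")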

Additive Regression Models in R: Example: Canadian prestige data (8)
[Partial-effect plots: s(income, 3.12) against income and s(education, 3.18) against education]

Interpreting the Effects
A plot of X_j versus s_j(X_j) shows the relationship between X_j and Y holding constant the other variables in the model. Since Y is expressed in mean-deviation form, the smooth term s_j(X_j) is also centred, and thus each plot shows how Y changes relative to its mean with changes in X_j. Interpreting the scale of the graphs then becomes easy:
- The value of 0 on the Y-axis is the mean of Y.
- As the line moves away from 0 in a negative direction, we subtract that distance from the mean when determining the fitted value. For example, if the mean is 45 and at a particular X-value (say X = 15) the curve is at s_j(X_j) = 4, the fitted value of Y, controlling for all other explanatory variables, is 45 + 4 = 49.
- If there are several nonparametric terms, we add together the contributions from the separate graphs for any particular observation to find its fitted value of Y.

Interpreting the Effects (2)
[Partial-effect plots for the prestige model, from the R script shown earlier, marking income = 10,000 (where s(income, 3.08) is about 6) and education = 10 (where s(education, 3) is about -5)]
The mean of prestige is 47.3. Therefore the fitted value for an occupation with an average income of $10,000/year and 10 years of education on average is 47.3 + 6 - 5 = 48.3.
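As a check on this arithmetic, the fitted value can also be obtained directly from the fitted model (a sketch, reusing the mod.gam object from the earlier sketch; exact numbers will differ slightly from the rounded values on the slide):

    # Fitted prestige for income = 10000 and education = 10
    predict(mod.gam, newdata = data.frame(income = 10000, education = 10))

    # Equivalently: intercept (the mean of prestige) plus the two centred smooth terms
    terms10 <- predict(mod.gam, newdata = data.frame(income = 10000, education = 10),
                       type = "terms")
    coef(mod.gam)["(Intercept)"] + sum(terms10)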

Residual Sum of Squares
As was the case for smoothing splines and lowess smooths, statistical inference and hypothesis testing are based on the residual sum of squares (or the deviance, in the case of generalized additive models) and the degrees of freedom. The RSS for an additive model is defined in the usual manner:

    RSS = Σ_i (Y_i - Ŷ_i)^2

The approximate degrees of freedom, however, need to be adjusted from the regular nonparametric case, because we are no longer specifying a jointly conditional functional form.

Degrees of Freedom
Recall that for nonparametric regression, the approximate degrees of freedom are equal to the trace of the smoother matrix (the matrix that projects Y onto Y-hat). We extend this to the additive model:

    df_j = tr(S_j) - 1

1 is subtracted from each df to reflect the constraint that each partial-regression function sums to zero (the individual intercepts have been removed). Parametric terms entered in the model each occupy a single degree of freedom, as in the linear regression case. The individual degrees of freedom are then combined into a single measure:

    df_model = 1 + Σ_j df_j

1 is added to the final degrees of freedom to account for the overall constant in the model.

Specifying Degrees of Freedom
As before, the degrees of freedom (or, equivalently, the smoothing parameter λ) can be specified by the researcher. Also as with smoothing splines, generalized cross-validation can be used to choose the degrees of freedom. Recall that this finds the smoothing parameter that gives the lowest average mean squared error across the cross-validation samples. Cross-validation is implemented in the mgcv package in R.

Cautions about Statistical Tests when the λs are chosen using GCV
If the smoothing parameters λ (or, equivalently, the degrees of freedom) are chosen using generalized cross-validation (GCV), caution must be used in an analysis of deviance. If a variable is added to or removed from the model, the smoothing parameter λ that yields the smallest mean squared error will also change. By implication, the degrees of freedom also change, implying that the equivalent number of parameters used for the model is different. In other words, the test will only be approximate, because the otherwise nested models have different degrees of freedom associated with λ. As a result, it is advisable to fix the degrees of freedom when comparing models.
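A brief sketch of fixing the degrees of freedom in mgcv so that models being compared use the same amount of smoothing (the particular k values are arbitrary illustrations, not the lecture's choices):

    # fx = TRUE gives an unpenalized regression spline with k - 1 degrees of freedom
    mod.fix1 <- gam(prestige ~ s(income, k = 5, fx = TRUE) +
                               s(education, k = 5, fx = TRUE), data = Prestige)
    mod.fix2 <- gam(prestige ~ s(income, k = 5, fx = TRUE), data = Prestige)
    anova(mod.fix2, mod.fix1, test = "F")   # comparison with fixed degrees of freedom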

Testing for Linearity
We can compare the linear model of prestige regressed on income and education with the additive model by carrying out an incremental F-test (see the sketch after this section). The difference between the models is highly statistically significant: the additive model describes the relationship between prestige and education and income much better.

Diagnostic Plots
The gam.check function returns four diagnostic plots:
1. A quantile-comparison plot of the residuals allows us to look for outliers and heavy tails
2. Residuals versus the linear predictor (which, for a Gaussian model with an identity link, is simply the fitted values) helps detect non-constant error variance
3. A histogram of the residuals is good for detecting non-normality
4. Response versus fitted values

Diagnostic Plots (2)
[The four gam.check plots for the prestige model: a normal Q-Q plot of the residuals, residuals versus the linear predictor, a histogram of the residuals, and response versus fitted values]

Interactions between Smoothed Terms
The gam function in the mgcv package allows you to specify an interaction between two or more smoothed terms. In the case of an interaction between two terms, when no other variables are included in the model, we essentially have a multiple nonparametric regression. Once again we need to graph the relationship in a perspective plot. While it is possible to fit a higher-order interaction, once we get past a two-way interaction the graph can no longer be interpreted.
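A hedged sketch of the linearity test, the diagnostics, and a bivariate smooth (reusing objects from the earlier sketches; the bivariate s(income, education) term is one way, though not necessarily the lecture's way, of letting the two predictors interact):

    mod.lin <- gam(prestige ~ income + education, data = Prestige)   # purely linear fit
    anova(mod.lin, mod.gam, test = "F")                              # incremental F-test

    gam.check(mod.gam)   # the four diagnostic plots described above

    # An interaction between two smoothed terms via a single bivariate smooth
    mod.int <- gam(prestige ~ s(income, education), data = Prestige)
    vis.gam(mod.int, theta = 35)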

Semi-Parametric Models
The generality of the additive model makes it very attractive when there is complicated nonlinearity in the multivariate case. Nonetheless, the flexibility of the smooth fit comes at the expense of precision and statistical power. As a result, if a linear trend can be fit, it should be preferred. This leads us to the semi-parametric model, which allows a mixture of linear and nonparametric components.

Semi-Parametric Models (2)
Semi-parametric models also make it possible to add categorical variables; they enter the model in exactly the same way as in linear regression, as a set of dummy regressors. As noted earlier, the gam function in the mgcv package also allows you to specify interaction terms: any interaction that can be specified in a linear model can also be included in an additive model. The same backfitting procedure that is used for the general additive model is used in fitting the semi-parametric model. The last of these specifications is an interaction between a categorical variable X_2 (represented by two dummy regressors, D_1 and D_2) and a quantitative variable X_1 for which a smooth trend is specified; this fits a separate curve for each category.

Interaction for GAMs in R
(A sketch of one way to specify this in R follows this block.)

Interaction for GAMs in R (2)
[Partial-effect plots of s(income, 1) against income, fitted separately for Blue Collar, Professional and White Collar occupations]
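One way to fit a separate income curve within each occupation type in mgcv (a sketch; the 'by' construction is an assumption on my part, since the slide's own script is not preserved in the transcription):

    # Semi-parametric model: parametric dummy effects for type, a separate smooth
    # of income within each level of type, and a linear education term
    Prestige2 <- na.omit(Prestige)       # 'type' contains some missing values
    mod.semi  <- gam(prestige ~ type + s(income, by = type) + education,
                     data = Prestige2)
    summary(mod.semi)
    plot(mod.semi, pages = 1)            # one income curve per occupation type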

Concurvity
Concurvity is the generalized additive model analogue of collinearity in linear models. Two possible problems can arise:
1. A point or group of points that are common outliers on two or more Xs can cause wild tail behaviour.
2. If two Xs are too highly correlated, backfitting may be unable to find a unique curve; in these cases, the initial linear coefficient will be all that is returned. The graph on the previous page is a good example: there, type and income are too closely related (i.e., professional jobs are high paying, blue-collar jobs pay less), and thus we find only linear fits where the lines cross.
As with collinearity, there is no solution to concurvity other than reformulating the research question.

Example 2: Inequality data revisited
Recall that earlier in the course we saw an apparent interaction between gini and democracy in their effects on attitudes towards pay inequality. Thus far we haven't given much effort to determining the functional form of these effects. We now do so using a semi-parametric model that fits a smooth term for gini that interacts with a dummy regressor for democracy. In other words, we fit two separate curves: one for democracies and another for non-democracies.

Example 2: Inequality data revisited (2)
(Model specification and output; see the sketch below for one way to set this up.)

Example 2: Inequality data revisited (3)
[Partial-effect plots of the gini smooth fitted separately for democracies, s(gini, 5.05), and non-democracies, s(gini, 8.77)]
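A hedged sketch of such a specification. The data-frame and variable names (Weakliem, secpay, gini, democracy) are hypothetical stand-ins for the course data, not taken from the slides:

    # Illustrative only: assumes a data frame 'Weakliem' with an attitude measure
    # 'secpay', the 'gini' coefficient, and a factor 'democracy' -- names hypothetical
    mod.ineq     <- gam(secpay ~ democracy + s(gini, by = democracy), data = Weakliem)
    mod.ineq.lin <- gam(secpay ~ democracy * gini, data = Weakliem)

    plot(mod.ineq, pages = 1)                  # one gini smooth per group
    anova(mod.ineq.lin, mod.ineq, test = "F")  # does the additive model improve the fit?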

Example 2: Inequality data revisited (4)
We now proceed to test whether the additive model does significantly better than the linear model. We conclude that the additive model is no more informative.

Cautions about Interpretation
When we fit a linear model we don't believe the model to be correct, but rather that it is a good approximation. The same goes for additive models, except that we hope they are better approximations. Having said that, all the pitfalls possible for linear models are magnified with additive models. Most importantly, we must be careful not to over-interpret fitted curves. An examination of standard-error bands, an analysis of deviance, and residual plots can help determine whether fitted curves are important. We can also add and delete variables in a stepwise manner (removing and then re-adding insignificant terms) to ensure that only important terms remain in the final model; we do not want unimportant terms influencing otherwise important effects.

Generalized Additive Mixed Models
The gamm function in the mgcv package calls on the lme function in the nlme package to fit generalized additive mixed models. Recall that earlier we used the British Election Study data to explore how income affected left-right attitudes. Since observations were clustered within constituencies, we used a mixed model to take this clustering into account, specifying a random intercept. Assume now that we have reason to believe that income had a nonlinear effect on attitudes. We could test this hypothesis by specifying a smooth for the income effect using the gamm function, and comparing this model with another that specifies a simpler linear trend.

Generalized Additive Mixed Models (2)
(Model specification and output; a sketch of one possible specification follows this block.)
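A hedged sketch of the mixed-model comparison. The data-frame and variable names (BES, lrself, income2, constituency) are hypothetical stand-ins for the British Election Study variables used in the lecture:

    library(mgcv); library(nlme)   # gamm() calls lme() internally

    # Smooth income effect with a random intercept for constituency (names hypothetical)
    mod.smooth <- gamm(lrself ~ s(income2), random = list(constituency = ~ 1), data = BES)
    mod.linear <- lme(lrself ~ income2, random = ~ 1 | constituency, data = BES)

    plot(mod.smooth$gam)                 # partial effect of income
    anova(mod.linear, mod.smooth$lme)    # approximate comparison of the two fits
                                         # (a strict likelihood-ratio test needs ML fits)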

Generalized Additive Mixed Models (3)
[Partial-effect plot of s(income2, 1) against INCOME2]
The plot of the income effect indicates that a linear specification is probably best, so we proceed to a formal test using an analysis of deviance.

Generalized Additive Mixed Models (4)
We can also test for linearity using the anova function to compare the fit of the model specifying a smooth term with the model specifying a linear trend. We see here that the difference between the models is not statistically significant, suggesting that the linear specification is best.

Missing Data
Missing data can be a problem for any type of model. It is only seriously problematic, however, if the missing cases have a systematic relationship to the response or the Xs. If the data are missing at random (i.e., the pattern of missingness is not a function of the response), we are less worried about them. If they are not, however, the problem is even more serious for generalized additive models than for linear regression. The backfitting algorithm omits all missing observations, and thus their fitted values are set to 0 when the partial residuals are smoothed against the predictor. Since the fitted curves have a mean of 0, this amounts to assigning the average fitted value to the missing observations. In other words, it is the same as using mean imputation in linear models, and thus results in biased estimates.

Problem of Mean Imputation (1)
The following example randomly generates 100 observations from the normal distribution N(20, 2), with x and y perfectly correlated. The example shows what happens if 20% of the data are randomly removed (and thus are missing completely at random) and mean imputation is used in a regression model.
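A sketch reproducing the flavour of this demonstration (the slide's own script is not preserved; this version simply follows the description above, and the exact numbers will differ from the lecture's random draw):

    # 100 observations from N(20, 2); y is an exact copy of x, so they are perfectly correlated
    set.seed(1)
    x <- rnorm(100, mean = 20, sd = 2)
    y <- x

    # Remove 20 x-values at random and replace them with the mean of the observed x
    miss        <- sample(100, 20)
    x.imp       <- x
    x.imp[miss] <- mean(x[-miss])

    coef(lm(y ~ x))       # benchmark fit with complete data
    coef(lm(y ~ x.imp))   # fit after mean-imputing x for the 20 "missing" cases

    # Mean-imputing y instead, while retaining all x-values
    y.imp       <- y
    y.imp[miss] <- mean(y[-miss])
    coef(lm(y.imp ~ x))   # fit after mean-imputing y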

Problem of Mean Imputation (2): Missing on x

Problem of Mean Imputation (3): Missing on x
I now remove the x-values for 20 observations, replacing them with the mean of x.
[Density plot of x comparing all cases with the mean-imputed data; N = 100, bandwidth = 0.3697]

Problem of Mean Imputation (4): Missing on x
The mean imputation does not affect the slope, but it has pulled the intercept downwards. More importantly, because there is less variation in x, the standard errors will be larger.
[Scatterplot of y against mean-imputed x, labelled "20% Mean imputation (x)"]

Problem of Mean Imputation (5): Missing on y
We now randomly replace 20 y-values with the mean of y but retain all values of x. The mean imputation now affects both the slope and the intercept, resulting in biased estimates.
[Scatterplot of mean-imputed y against x, labelled "20% Mean imputation (y)"]

Summary and Conclusions
Additive models offer a compromise between the ease of interpretation of the linear model and the flexibility of the general nonparametric model. Complicated nonlinearity can be easily accommodated, even in models with many independent variables. The effects in the model represent partial effects in the same way as coefficients do in linear models. These models should be seen as important models in their own right, but they can also play an important role in diagnosing nonlinearity even if the final model chosen is a regular linear model.

Summary and Conclusions (2)
These models are extremely flexible in that both nonparametric and parametric trends can be specified; moreover, even interactions between explanatory variables are possible. Caution: since GAMs effectively use mean imputation for missing data (rather than list-wise deletion as in linear models), we must be especially careful to deal appropriately with missing data before fitting the model; mean imputation can result in biased estimates. Finally, as we shall see tomorrow, these models can be extended to accommodate limited dependent variables in the same way that generalized linear models extend the general linear model.