Statistics & Analysis. A Comparison of PDLREG and GAM Procedures in Measuring Dynamic Effects

Patralekha Bhattacharya, Thinkalytics

The PDLREG procedure in SAS is used to fit a finite distributed lag model to time series data where the coefficients of the lagged terms are assumed to lie on a polynomial curve. This method was suggested by Almon (1965) as a relatively flexible and yet parsimonious method of estimating distributed lags. The polynomial assumption increases the flexibility in the shape of the distributed lag function, while the degree of the polynomial is usually chosen to be less than the number of lags, thereby reducing the number of parameters to be estimated as well. The GAM procedure in SAS is used to fit generalized additive models as outlined by Hastie and Tibshirani (1990). In this paper, we outline the advantages and disadvantages of the two procedures and compare their performance in estimating the duration of lags by conducting some simulations.

In any field involving research and analytics, one often encounters data that is spaced over time. This is known as time series data. One of the inherent characteristics of time series data is that effects are spread out over time, so that the outcome in this period may be affected not only by the events that take place in the current period but also by those that occurred in the past. For instance, this month's Sales may be affected by the advertising that takes place in the current month as well as by all the marketing and advertising the firm did in previous months. Another example is that an individual's consumption in this period may depend on his disposable income in this period as well as on his disposable income in previous periods. Both of the above instances are examples of lagged effects, where the value of the dependent variable depends on lagged values of the independent variable. There might be lead effects as well, so that the outcome variable is affected by perceptions of what may happen in the future. For example, if housing prices are expected to go down in the future, one might defer the purchase of a new house by a few months. Also, consumers can react in anticipation of a marketing stimulus.

Practitioners have always struggled to understand and correctly estimate these dynamic effects that are an inherent part of time series data. One way of modeling delayed effects is to introduce lagged terms as independent variables in the model. Taking the example of Sales and advertising, Sales in period t depends on advertising in period t as well as in previous periods, as shown in equation (1), where S_t represents Sales and A_t represents advertising in period t, and β_0 through β_(s-1) are the lag coefficients:

S_t = α + β_0*A_t + β_1*A_(t-1) + ... + β_(s-1)*A_(t-s+1) + ε_t        (1)

Equation (1) indicates that advertising has an effect up to s-1 periods into the future. However, estimation of equation (1) can become difficult for several reasons. First of all, it is difficult to decide how many lagged terms to include in equation (1): without prior knowledge about how long the effects of advertising last, we cannot choose a value of s. Secondly, with s-1 lagged terms the number of parameters turns out to be s + 1. For large values of s, this can require estimation of a large number of parameters, which may cause problems because of loss of degrees of freedom. Besides, these lagged independent variables may be correlated with the original variable, thus adding to the collinearity in the model. This has led researchers to look into other methods of modeling the carryover effects of advertising.
One of the methods has been to postulate relationships between the different lag parameters in order to reduce the number of parameters in the model. For instance, the geometric distributed lag model assumes that the impact of the lagged terms declines geometrically over time. Therefore, if β_0 is the impact of advertising in period 1, then in period 2 the impact of that advertising would be λ*β_0, where λ is a fraction, i.e. 0 < λ < 1. Therefore, in equation (1), β_1 = λ*β_0, β_2 = λ^2*β_0, and so on. This assumption greatly reduces the number of parameters that have to be estimated in this model. The geometric lag model, however, assumes a monotonically declining lag structure, which may not always be realistic.

For instance, the effect of advertising may be small in the first few periods, then increase, and eventually decline. In order to accommodate this type of lag structure, some authors have used the negative binomial (or Pascal) distribution to model advertising lags. A more flexible method of estimating the effect of advertising lags in equation (1) was postulated by Almon. She suggested a method in which the coefficients of the model are expressed in terms of some function f(k) which can be approximated by a polynomial in k. The PDLREG procedure in SAS is based on this method, and the distribution of the lagged effects is modeled by Almon lag polynomials. This means that the coefficients of the lagged values of the independent variables are assumed to lie on a polynomial curve.

Apart from dynamic effects, another factor that complicates matters is that the relationship between the dependent and independent variable may often be nonlinear. Going back to our Sales and advertising example, it is a well-known fact that the relationship between Sales and advertising is concave for high levels of advertising because of diminishing returns to advertising. In order to model this relationship, marketing practitioners often transform the dependent and independent variables using different functional forms such as log, square root, etc. For more details about the types of functional forms that are usually used, please refer to the NESUG 2010 paper by the same author. However, rarely in the real world does the relationship between Sales and advertising follow a specific mathematical functional form. Moreover, predictor variables usually do not show much variation in the sample, so we may only observe values of advertising within a small range. Sometimes, with only small variation in the sample, several models can be a good fit for the data.

In the next section of the paper we outline the PDLREG procedure in SAS, illustrate its syntax, and explain some of its outputs. The following section is devoted to the GAM procedure. We then move on to a comparison between the two procedures, where we outline the advantages and disadvantages of both. Finally, in the last section of the paper, we compare the performance of the two procedures through some simulation exercises.

THE PDLREG PROCEDURE:

The PDLREG procedure can be invoked by using the following syntax:

proc pdlreg data=test;
   model y = x( n, l );
run;

where y is the dependent variable and x is the independent variable. The parameter n specifies the length of the lag distribution, i.e. the number of lags of the regressor to use in the model, and the parameter l denotes the degree of the distribution polynomial. The above model statement assumes that the relationship between y and x is linear in parameters. However, if there is reason to assume a nonlinear relationship between those variables, either of them can be transformed in any manner and the transformed variable can be used in the equation instead. In other words, if the relationship between the dependent and independent variables is nonlinear, one can specify the nature of that relationship through appropriate transformations of the dependent and independent variables. Suppose we believe that there is a linear relationship between y and log(x).
Then the model statement in the above syntax can be changed to:

model y = z( n, l );

where z = log(x). The PDLREG procedure also allows other covariates to be entered in the model, and distributed lags can be specified for more than one regressor. The procedure prints a table containing the parameter estimates for the polynomial distribution, as shown in Figure 1.1.

This table can be used to determine the correct degree of the distribution polynomial. For instance, if we start off with the assumption of a polynomial of degree 5 and the parameter estimates of the coefficients are significant for the first four terms but insignificant for the fifth term, that may indicate that the true degree of the polynomial should be 4.

Figure 1.1. The PDLREG Procedure, Parameter Estimates: estimates, standard errors, t values, and approximate p-values for the Intercept and the polynomial distribution terms logx**0 through logx**5.

As shown in Figure 1.2, the PDLREG procedure also prints the parameter estimates of the lag distribution coefficients, which are the coefficients of the lagged values of z. The significance of these coefficients can be used to determine the duration of the lags. The PDLREG procedure can support any number of lags.

Figure 1.2. The PDLREG Procedure, Estimate of Lag Distribution: estimates, standard errors, t values, and approximate p-values for logx(0) through logx(10).
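To make the above workflow concrete, the following sketch first computes the log transformation in a DATA step and then fits the Almon distributed lag model with PROC PDLREG. It is only an illustrative sketch: the data set names (test, test2), the variable names (y, x, z), the 10-period lag length, and the degree-5 polynomial are assumptions chosen to mirror Figures 1.1 and 1.2, not values taken from the paper's data.

/* Transform the regressor, then fit an Almon polynomial distributed lag. */
data test2;
   set test;
   z = log(x);            /* semi-log transformation of the regressor */
run;

proc pdlreg data=test2;
   model y = z(10, 5);    /* 10 lags of z, coefficients on a degree-5 polynomial */
run;

The significance of the z(i) rows in the resulting lag distribution table would then be read in the same way as Figure 1.2.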

THE GAM PROCEDURE:

The following statements invoke the GAM procedure:

proc gam data=diabetes;
   model y = spline(x) spline(lag1x) spline(lag2x) ... spline(lagix);
run;

The GAM procedure fits generalized additive models as those models are defined by Hastie and Tibshirani (1990). The procedure is based on nonparametric regression and smoothing techniques, which relax the assumption of linearity and enable us to uncover structure in the relationship between the independent variables and the dependent variable that might otherwise be missed. Multiple lag terms and/or other covariates can be entered in the model by using additional spline functions in the syntax shown above. If multiple lag terms are entered into the model, the number of lag terms that remain significant can be used to understand the duration of the lags.

The procedure prints a table containing parameter estimates for the parametric part of the model, as shown in Figure 2.1. This table looks at the linear relationship between y and each of the independent variables in the model. If the t value is high for an independent variable in this table, that indicates that the linear trend for that specific independent variable is significant.

Figure 2.1. The GAM Procedure, Regression Model Analysis (dependent variable Y; smoothing model components spline(X) through spline(lag10X)): parameter estimates, standard errors, t values, and approximate p-values for the Intercept and Linear(X) through Linear(lag10X).

Another table, the Analysis of Deviance table, is printed for the nonparametric component of the model. This table looks at the significance of the non-linear relationships between y and each of the independent variables. A high chi-square value for one of the independent variables in this table implies that there is a significant non-linear trend for that specific variable. This table can therefore be used to determine the significance of the non-linear trends for the independent variables in the model.

Figure 2.2. The GAM Procedure, Smoothing Model Analysis, Analysis of Deviance: degrees of freedom, sums of squares, chi-square values, and Pr > ChiSq for Spline(X) through Spline(lag10X).

Since we have assumed nonlinear relationships between y and each lag of the independent variable throughout, we focus on this second table to estimate the duration of lags. We use a method of iteration, using a SAS DO loop, to determine the number of lags that remain in the model. We start by using the independent variable along with 20 of its lags as predictor variables in the model. If any of the predictors is insignificant in the Analysis of Deviance table, that term is deleted in the next round of model iteration. This method is continued until all the predictors that remain in the model are significant. The lagged terms that remain in the final model are then used to determine the duration of lagged effects. A sketch of one possible implementation of this backward-elimination scheme is shown below.
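The following macro is a minimal sketch of such an iterative scheme and is not the paper's actual code. The macro name, the data set name (test), the dependent variable name (y), the predictor names (m and lag1m through lag20m), the 0.05 significance cutoff, and the column names of the ANODEV output data set (Source, ProbChiSq) are all assumptions made for illustration.

%macro gam_backward(ds=test, depvar=y, alpha=0.05, maxiter=20);
   %local preds iter i splines n_insig;

   /* Start with the media variable and 20 of its lags. */
   %let preds = m lag1m lag2m lag3m lag4m lag5m lag6m lag7m lag8m lag9m lag10m
                lag11m lag12m lag13m lag14m lag15m lag16m lag17m lag18m lag19m lag20m;

   %do iter = 1 %to &maxiter;

      /* Build the spline() terms for the predictors still in the model. */
      %let splines = ;
      %let i = 1;
      %do %while(%length(%scan(&preds, &i)) > 0);
         %let splines = &splines spline(%scan(&preds, &i));
         %let i = %eval(&i + 1);
      %end;

      /* Fit the GAM and capture the Analysis of Deviance table. */
      proc gam data=&ds;
         model &depvar = &splines / dist=normal;
         ods output ANODEV=anodev_out;
      run;

      /* Count the insignificant smoothing components and keep only the significant ones. */
      proc sql noprint;
         select count(*) into :n_insig trimmed
            from anodev_out where ProbChiSq >= &alpha;
         select compress(tranwrd(upcase(Source), 'SPLINE(', ''), ')')
            into :preds separated by ' '
            from anodev_out where ProbChiSq < &alpha;
      quit;

      /* Stop once every remaining term is significant. */
      %if &n_insig = 0 %then %goto done;
   %end;

%done:
   %put NOTE: Lag terms retained in the final model: &preds;
%mend gam_backward;

%gam_backward();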

COMPARING THE TWO PROCEDURES:

The art of modeling almost always calls for assumptions about the relationships between the dependent and independent variables to simplify the estimation process. For instance, when we know the relationship between the dependent and independent variable is non-linear, we may use a specific functional form (such as logarithmic, square root, or reciprocal) to model that relationship. When it comes to the estimation of dynamic lags, assumptions are also made about the relationships between the parameter estimates of the lagged independent variables. For instance, in the geometric lag model, the parameters of the lagged variables are assumed to decline geometrically over time. Similarly, in the Pascal model, the coefficients of the lagged terms are assumed to follow a negative binomial distribution. To summarize, there are two types of restrictions that can be imposed on a dynamic model:

1. Restrictions on the coefficients of the lagged terms.
2. Restrictions on the shape of the functional form: e.g., fitting a model that is linear in parameters when the correct functional form should be a non-linear model.

Compared to other models such as the geometric lag model and the Pascal model, the Almon lag structure somewhat relaxes the first restriction and allows some degree of flexibility in determining the coefficients of the lagged terms. In spite of that, certain restrictions are still imposed on the lag parameters, and if these conditions are not correct, then the model will be somewhat mis-specified. In that case, incorrect lags may show up as significant in the model. The generalized additive model, however, imposes no restrictions on the coefficients of the lagged terms and allows those coefficients to be determined from the data.

When working with the PDLREG procedure, we also have to make assumptions about the relationship between the dependent and independent variables and use specific transformations of variables such as log, square root, etc. to represent those relationships. The GAM procedure, on the other hand, does not require us to make any presumptions about these relationships or compute any transformations of variables, nor does it impose specific functional forms to model the relationships between these variables. The advantage of the GAM procedure over the PDLREG procedure is that the nature of the non-linear relationship has to be specified in the latter, whereas in the former the relationship is uncovered from the data. Therefore, in the PDLREG procedure, the non-linear relationship between the dependent and independent variable is restricted to a specific functional form such as log, square root, reciprocal, etc., while in the GAM procedure the relationship can follow any pattern that is found in the data. GAM allows complete flexibility in the (non-linear) functional form of the model and imposes no restriction on the parameters. Therefore, when the true model is non-linear, GAM does a better job of fitting the model and estimating the true duration of the lags.

This complete flexibility in choosing the functional form and parameters, however, also comes with its own disadvantages. With a large number of predictor variables, including lagged terms for each of them can lead to a very large number of independent variables in the model, which might cause difficulties in estimation. The PDLREG procedure also has some other advantages compared to the GAM procedure. First, the PDLREG procedure is less computationally intensive, uses fewer resources, and is much faster to run than the GAM procedure. Second, the PDLREG procedure allows for tests of autocorrelation such as the Durbin-Watson test, whereas the GAM procedure does not. The PDLREG procedure also allows autoregressive error terms to be included in the model by using the nlag= option (a brief sketch is shown below); there is no such option in the GAM procedure.

In the next section we describe the method of simulation that we used in the paper. Our method here is very similar to the method used in the NESUG 2010 paper by the same author. At the risk of repetition, we also describe the methodology here for the convenience of the reader.
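As a brief illustration of the autocorrelation-related options mentioned above, the following statement sketch adds a first-order autoregressive error term to the distributed lag model. The data set and variable names (test2, y, z) carry over from the earlier illustrative sketch, and the lag length, polynomial degree, and AR order are assumptions rather than values used in the paper.

proc pdlreg data=test2;
   /* 10 lags of z on a degree-5 Almon polynomial, with AR(1) errors via nlag=. */
   /* As noted above, the Durbin-Watson statistic for the residuals is part of  */
   /* the procedure's standard output.                                          */
   model y = z(10, 5) / nlag=1;
run;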
SIMULATION METHOD

We use a dataset that contains media impressions for magazines. The data are simulated to be as close to real-world data as possible. Actual magazine accumulation curves were obtained from the former MRI website (now GfK MRI), and those were used to create a variable with magazine impressions. For the sake of simplification, throughout this paper we assume that Sales are affected by only one media variable. We assume that carryover effects exist, so that the media variable influences Sales not only in the period in which it is aired but also in future periods. Therefore, Sales in any period is determined by the value of the media variable in that period as well as by lagged values of the media variable. In this paper we assume that there are 3 significant lags, so that in the true model, Sales in the current period is affected by the media variable in the current and 3 preceding periods.

The way we conduct our experiment is as follows. We postulate the true relationship between Sales and the media variable by specifying the model and the values of the parameters. Next, we use Monte Carlo simulation methods to fit several different models (including the true model) to estimate the relationship between Sales and the current and lagged media variables, and see which lags come up as significant in the model.

For example, suppose we assume that the true relationship between Sales and the media variable (M) can be represented by a semi-log model as follows:

S_t = β_1*log(M_t) + β_2*log(M_(t-1)) + β_3*log(M_(t-2)) + β_4*log(M_(t-3))        (2)

Using the current and lagged values of the media variable that we have in our dataset, and a randomly chosen set of parameters (β_1, β_2, β_3, β_4), we calculate the value of Sales using equation (2). This is assumed to be the true relationship between Sales and the media variable, M. Taking this as the true model, we next simulate a number of data sets, each with different random scatter. In order to do this we first create a new variable (called δ, say) whose values are drawn from the standard normal distribution with replacement. We then add this new variable δ to our dependent variable Sales to create a new Sales variable, New_S_t:

New_S_t = S_t + δ_t,   where δ_t ~ N(0,1)        (3)

This new Sales variable, New_S_t, is used to run a regression model using the true (semi-logarithmic) specification, and that exercise gives us an estimate of the standard deviation of the residuals, S(yx). This is an estimate of the variance in Sales that we can observe when the true relationship is given by equation (2).

Next we use Monte Carlo simulations to determine other plausible values that the dependent variable can take assuming that the true relationship is (2). These are the values of Sales that may be observed in practice when the true Sales stream is S_t in equation (2). To obtain these possible values for the Sales stream, we proceed as follows. To each ideal point we add random scatter drawn from a Gaussian distribution with a mean of 0 and a standard deviation equal to the value of S(yx) reported from the regression of our experimental data. This gives us the probable values that the Sales stream can take when the true values are given by equation (2). We repeat this step 50 times to obtain 50 different data sets, each containing a different Sales stream. With each data set and each new Sales stream, we try to fit the simulated data using different model specifications, including the true model specification. For instance, since the true model specification is semi-log, each simulated data set is fitted with the PDLREG procedure using a semi-log model, a reciprocal model, and a square root model, as well as with the generalized additive model.

In the above example, we used the semi-logarithmic model to obtain the ideal data set and then tried to fit other types of models to the simulated data derived from this ideal data. We repeat this exercise for other types of models as well. More specifically, apart from the semi-log model, the above simulations are also performed using the reciprocal model and the square root model as the ideal models. Therefore, in the second phase of the experiment we use the reciprocal model as the true relationship between Sales and advertising and derive a set of simulated data sets from this ideal data set. These simulated data sets are then fitted with the PDLREG procedure assuming the reciprocal model, the semi-log model, and the square root model, as well as with the generalized additive model. In the third phase of the experiment, we assume that the true relationship between Sales and advertising is represented by the square root model, and all of the above steps are repeated assuming that the square root model is the true model.
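The noise-addition step described above can be sketched in a short DATA step. This is an illustrative sketch only: the data set names (ideal, simulated), the variable names (s, sim_s, rep), the macro variable &s_yx holding the estimated residual standard deviation S(yx), and the seed are assumptions, not the paper's actual code.

/* Assumes data set IDEAL holds the ideal Sales stream S computed from equation (2), */
/* and that macro variable &s_yx holds the residual standard deviation S(yx).        */
%let n_reps = 50;

data simulated;
   set ideal;
   do rep = 1 to &n_reps;
      /* Add Gaussian scatter with mean 0 and SD equal to S(yx) to each ideal point. */
      sim_s = s + &s_yx * rannor(12345);
      output;
   end;
run;

Each value of rep then indexes one of the 50 simulated Sales streams, which can be fitted with PROC PDLREG or PROC GAM under the different model specifications.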
Notice that to obtain the ideal relationship between Sales and the media variable (and its lagged values) as shown in equation (2), we need to come up with values for the parameters β_1, β_2, β_3, β_4. This parameter combination is chosen randomly (with certain restrictions) in order to make sure that the choice of parameters does not influence any of the results. In fact, for each model type, 100 different parameter combinations are used to obtain the dependent variable and create 100 ideal data sets. Therefore, for each model type, the method of simulating 50 datasets outlined in the previous paragraph was repeated for each of the 100 different parameter combinations. In total, 15,000 model simulations were run: 50 simulations for each of 3 model types and 100 parameter combinations. The GAM procedure was invoked using the following code:

proc gam data=test;
   model Y = spline(m) spline(lag1m) spline(lag2m) spline(lag3m) spline(lag4m)
             spline(lag5m) spline(lag6m) spline(lag7m) spline(lag8m) spline(lag9m)
             spline(lag10m) spline(lag11m) spline(lag12m) spline(lag13m) spline(lag14m)
             spline(lag15m) spline(lag16m) spline(lag17m) spline(lag18m) spline(lag19m)
             spline(lag20m) / dist = normal;
   ods output ANODEV = Anodev_out;
run;

where lagim represents the variable obtained by taking the ith lag of M.

RESULTS

The tables in this section illustrate the results obtained from the simulation exercises. Tables 1a through 1d show the results when the true model is semi-logarithmic. Recall that we have assumed that the current media stream as well as the 3 lagged terms are significant in the ideal model. Table 1a shows a typical result for one of the parameter combinations when the true model is semi-logarithmic. If a generalized additive model is fitted to these data, then almost 100% of the simulated models show the correct lags as significant. Lag 4 also shows up as significant sometimes, but only in 34% of the models. However, if a PDLREG model is used to fit the data, some irrelevant lag terms show up as significant in the model. Obviously, when a reciprocal or square root transformation is used for the media variable, this result can be expected to occur because of the incorrect model specification (since the true model is semi-logarithmic). Surprisingly though, incorrect lags show up as significant even when the correct transformation of the independent variable is used in the PDLREG procedure. In other words, even if we compute the logarithmic transformation of the media variable and then use the transformed variable on the right-hand side of the PDLREG model equation, we still do not get the correct lags in most of the model simulations.

TRUE MODEL = SEMI-LOGARITHMIC MODEL

                      lag0   lag1   lag2   lag3   lag4   lag5   lag6   lag7   lag8   lag9   lag10
GAM                   100%   100%   100%   100%    34%     2%     6%     0%     0%     0%     0%
PDLREG (SEMI-LOG)     100%   100%   100%   100%   100%   100%   100%   100%     0%   100%   100%
PDLREG (SQR_ROOT)     100%   100%   100%   100%   100%   100%   100%   100%     0%   100%   100%
PDLREG (RECIPROCAL)   100%   100%   100%   100%   100%   100%   100%     0%   100%   100%   100%

Table 1a

Table 1b summarizes the results for all the parameter combinations when the true model is semi-log and the fitted model is the PDLREG model with the square root transformation for the media variable. Recall that the simulation exercise is repeated for 100 different parameter combinations. The leftmost column in Table 1b shows the percentage of simulations for which the corresponding lag shows up as significant. For each of the 100 parameter combinations, 100% of the simulated models pick up lag 0 through lag 7 as significant. For 99 of the 100 parameter combinations, at least one of the far-out lags (lag 8-11) is also found to be significant in 100% of the models.

TRUE MODEL = SEMI-LOGARITHMIC, FITTED MODEL = PDLREG WITH SQUARE ROOT TRANSFORMATION

Table 1b. Rows are bands of the percentage of simulations in which a lag is significant (0%-20%, 20%-40%, 40%-60%, 60%-80%, 80%-100%, 100%); columns are lag 0 through lag 7 and lag 8-11; the cells report the number of parameter combinations falling in each band.

Table 1c summarizes the results for all parameter combinations when the true model is semi-log and the fitted model is the PDLREG model with the semi-log transformation for the media variable. The transformation of the media variable in this case is therefore the true model transformation. In spite of that, we still find that for all 100 of the parameter combinations, 100% of the model simulations pick out lags 4, 5, and 6 as significant along with the relevant lags 0 through 3. Besides, at least one of the lags 8 through 11 always shows up as significant for 99 of the parameter combinations.

TRUE MODEL = SEMI-LOGARITHMIC, FITTED MODEL = PDLREG WITH SEMI-LOG TRANSFORMATION

Table 1c. Rows are bands of the percentage of simulations in which a lag is significant (0%, 0%-20%, 20%-40%, 40%-60%, 60%-80%, 80%-100%, 100%); columns are lag 0 through lag 7 and lag 8-11; the cells report the number of parameter combinations falling in each band.

The results look very similar if the true model is semi-logarithmic and the fitted model is a PDLREG with a reciprocal transformation for the independent variable, and will not be repeated here. If the fitted model is a generalized additive model, the results are strikingly different, as shown in Table 1d. In this case, for most of the parameter combinations, lags 0 through 3 show up as significant in 100% of the model simulations. For about 39 parameter combinations, lag 4 shows up as significant in 80% to 100% of the model simulations. For almost none of the parameter combinations do far-out lags show up as significant in any of the model simulations.

TRUE MODEL = SEMI-LOGARITHMIC, FITTED MODEL = GAM

Table 1d. Rows are bands of the percentage of simulations in which a lag is significant; columns are lag 0 through lag 8 and dum9 through dum11; the cells report the number of parameter combinations falling in each band.

Therefore, when the true model is semi-logarithmic, the generalized additive model does a better job of picking out the true duration of lags than any of the PDLREG models. Based on our simulation exercises, we reach the same conclusions when the true model is reciprocal or square root.
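The percentages reported in these tables can be computed by summarizing the output data sets saved from the 50 fitted models for each parameter combination. The following sketch shows one way to do this for the GAM runs; the stacked data set name (anodev_all), the replicate identifier (rep), the ANODEV column names (Source, ProbChiSq), and the 0.05 cutoff are assumptions made for illustration, not the paper's actual code.

/* ANODEV_ALL is assumed to stack the Analysis of Deviance tables from the 50 */
/* simulated GAM fits for one parameter combination, with REP identifying the */
/* simulation replicate.                                                       */
proc sql;
   create table lag_significance as
   select Source,
          100 * mean(ProbChiSq < 0.05) as pct_significant format=5.1
   from anodev_all
   group by Source
   order by Source;
quit;

proc print data=lag_significance noobs;
   title 'Percentage of simulations in which each spline term is significant';
run;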

CONCLUSION

In this paper, we compare the PDLREG and GAM procedures and look at their effectiveness in estimating dynamic effects. We show that model specification may play an important role in determining which lags show up as significant in the model. There are two ways in which an incorrect model may be specified: assumptions about the specific functional form as well as restrictions on the parameter estimates can both result in a mis-specification of the model. We propose that using a generalized additive model (PROC GAM in SAS) instead of the PDLREG procedure may help to more accurately identify the significant lags in the model. Since the true relationship between Sales and advertising rarely follows a precise functional form, using an explicit function to model the relationship may lead to incorrect estimation of the lagged effects of advertising. A generalized additive model allows greater flexibility in the functional form and helps to obtain more accurate results. Also, the PDLREG model restricts the parameters to lie on a polynomial curve. While this might be a reasonable assumption for some parameter values, it may not be true for other values of the parameters. GAM does not impose any restriction on the parameters of the model and may therefore be more accurate.

Having said that, we would like to emphasize that the PDLREG procedure takes less time to run and may be able to handle a larger number of independent variables than the GAM procedure. Besides, the PDLREG procedure has options available for autoregressive terms to be included in the model and allows for tests of autocorrelation of residuals, whereas none of these options are available in the GAM procedure.

In conclusion, we would like to point out that in this paper we have used a very simplistic model to show the accuracy of GAM vis-a-vis other functional forms. We also restricted our analysis and simulation exercises to one dataset. More research is needed to investigate how well GAM performs with different datasets, as well as when we use more complicated models with multiple media variables.

REFERENCES

Almon, S. (1965), "The Distributed Lag Between Capital Appropriations and Expenditures," Econometrica, Vol. 33, No. 1, pp. 178-196.

Bhattacharya, P. (2010), "Using Generalized Additive Models in Marketing Mix Modeling," NESUG 2010.

Hastie, T. and Tibshirani, R. (1990), Generalized Additive Models, New York: Chapman and Hall.

ACKNOWLEDGMENTS

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Patralekha Bhattacharya
Thinkalytics


More information

1 The Permanent Income Hypothesis

1 The Permanent Income Hypothesis The Permanent Income Hypothesis. A two-period model Consider a two-period model where households choose consumption ( 2 ) to solve + 2 max log + log 2 { 2 } µ + + where isthediscountfactor, theinterestrate.

More information

Digital Image Processing. Prof. P. K. Biswas. Department of Electronic & Electrical Communication Engineering

Digital Image Processing. Prof. P. K. Biswas. Department of Electronic & Electrical Communication Engineering Digital Image Processing Prof. P. K. Biswas Department of Electronic & Electrical Communication Engineering Indian Institute of Technology, Kharagpur Lecture - 21 Image Enhancement Frequency Domain Processing

More information

CCSSM Curriculum Analysis Project Tool 1 Interpreting Functions in Grades 9-12

CCSSM Curriculum Analysis Project Tool 1 Interpreting Functions in Grades 9-12 Tool 1: Standards for Mathematical ent: Interpreting Functions CCSSM Curriculum Analysis Project Tool 1 Interpreting Functions in Grades 9-12 Name of Reviewer School/District Date Name of Curriculum Materials:

More information

GLM II. Basic Modeling Strategy CAS Ratemaking and Product Management Seminar by Paul Bailey. March 10, 2015

GLM II. Basic Modeling Strategy CAS Ratemaking and Product Management Seminar by Paul Bailey. March 10, 2015 GLM II Basic Modeling Strategy 2015 CAS Ratemaking and Product Management Seminar by Paul Bailey March 10, 2015 Building predictive models is a multi-step process Set project goals and review background

More information

Tips and Guidance for Analyzing Data. Executive Summary

Tips and Guidance for Analyzing Data. Executive Summary Tips and Guidance for Analyzing Data Executive Summary This document has information and suggestions about three things: 1) how to quickly do a preliminary analysis of time-series data; 2) key things to

More information

Non-Linearity of Scorecard Log-Odds

Non-Linearity of Scorecard Log-Odds Non-Linearity of Scorecard Log-Odds Ross McDonald, Keith Smith, Matthew Sturgess, Edward Huang Retail Decision Science, Lloyds Banking Group Edinburgh Credit Scoring Conference 6 th August 9 Lloyds Banking

More information

STA 570 Spring Lecture 5 Tuesday, Feb 1

STA 570 Spring Lecture 5 Tuesday, Feb 1 STA 570 Spring 2011 Lecture 5 Tuesday, Feb 1 Descriptive Statistics Summarizing Univariate Data o Standard Deviation, Empirical Rule, IQR o Boxplots Summarizing Bivariate Data o Contingency Tables o Row

More information

Frequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS

Frequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS ABSTRACT Paper 1938-2018 Frequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS Robert M. Lucas, Robert M. Lucas Consulting, Fort Collins, CO, USA There is confusion

More information

Reference

Reference Leaning diary: research methodology 30.11.2017 Name: Juriaan Zandvliet Student number: 291380 (1) a short description of each topic of the course, (2) desciption of possible examples or exercises done

More information

Adaptive osculatory rational interpolation for image processing

Adaptive osculatory rational interpolation for image processing Journal of Computational and Applied Mathematics 195 (2006) 46 53 www.elsevier.com/locate/cam Adaptive osculatory rational interpolation for image processing Min Hu a, Jieqing Tan b, a College of Computer

More information

SAS (Statistical Analysis Software/System)

SAS (Statistical Analysis Software/System) SAS (Statistical Analysis Software/System) SAS Adv. Analytics or Predictive Modelling:- Class Room: Training Fee & Duration : 30K & 3 Months Online Training Fee & Duration : 33K & 3 Months Learning SAS:

More information

Error Analysis, Statistics and Graphing

Error Analysis, Statistics and Graphing Error Analysis, Statistics and Graphing This semester, most of labs we require us to calculate a numerical answer based on the data we obtain. A hard question to answer in most cases is how good is your

More information

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski Data Analysis and Solver Plugins for KSpread USER S MANUAL Tomasz Maliszewski tmaliszewski@wp.pl Table of Content CHAPTER 1: INTRODUCTION... 3 1.1. ABOUT DATA ANALYSIS PLUGIN... 3 1.3. ABOUT SOLVER PLUGIN...

More information

Predict Outcomes and Reveal Relationships in Categorical Data

Predict Outcomes and Reveal Relationships in Categorical Data PASW Categories 18 Specifications Predict Outcomes and Reveal Relationships in Categorical Data Unleash the full potential of your data through predictive analysis, statistical learning, perceptual mapping,

More information

ADMS 3330 FALL 2008 EXAM All Multiple choice Exam (See Answer Key on last page)

ADMS 3330 FALL 2008 EXAM All Multiple choice Exam (See Answer Key on last page) MULTIPLE CHOICE. Choose the letter corresponding to the one alternative that best completes the statement or answers the question. 1. Which of the following are assumptions or requirements of the transportation

More information

Fast or furious? - User analysis of SF Express Inc

Fast or furious? - User analysis of SF Express Inc CS 229 PROJECT, DEC. 2017 1 Fast or furious? - User analysis of SF Express Inc Gege Wen@gegewen, Yiyuan Zhang@yiyuan12, Kezhen Zhao@zkz I. MOTIVATION The motivation of this project is to predict the likelihood

More information

Predictive Analytics: Demystifying Current and Emerging Methodologies. Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA

Predictive Analytics: Demystifying Current and Emerging Methodologies. Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA Predictive Analytics: Demystifying Current and Emerging Methodologies Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA May 18, 2017 About the Presenters Tom Kolde, FCAS, MAAA Consulting Actuary Chicago,

More information