Chapter 9 Building the Regression Model I: Model Selection and Validation


許湘伶, Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li)

9.1 Overview of Model-Building Process
From previous chapters we know how to fit regression models and how to make inferences from them.
Now: selection of the predictor variables for exploratory observational studies.


9.1 Overview of Model-Building Process
Strategy for building a regression model:
1 Data collection and preparation
2 Reduction of the number of explanatory variables
3 Model refinement and selection (tentative)
4 Model validation

9.2 Surgical Unit Example
A study of survival following a particular type of liver operation. Variables:
X1: blood clotting score
X2: prognostic index
X3: enzyme function test score
X4: liver function test score
X5: age, in years
X6: indicator variable for gender (0 = male; 1 = female)
X7 and X8: indicator variables for history of alcohol use

9.2 Surgical Unit Example
Y: response, survival time.
Original data: 108 patients.
Preliminary study: the first 54 patients and the first four explanatory variables.

9.2 Surgical Unit Example
Basic plots and the correlation matrix diagnose several cases as outlying; the influence of these cases should be examined.

9.2 Surgical Unit Example
Residual plots show several outlying cases, curvature and nonconstant error variance, and some departure from normality.

9.2 Surgical Unit Example
To make the distribution of the error terms more nearly normal, and to see whether the same transformation would also reduce the apparent curvature, transform the response: Y' = ln Y.
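A minimal sketch of this step in R, assuming the column layout used in the code later in these slides (y is survival time; ypr is the pre-computed Y' = ln Y supplied with the data):

ex <- read.table("ch09ta01.txt", header=FALSE)
colnames(ex) <- c("x1","x2","x3","x4","x5","x6","x7","x8","y","ypr")
fit.y   <- lm(y ~ x1 + x2 + x3 + x4, data=ex)        # untransformed response
fit.lny <- lm(log(y) ~ x1 + x2 + x3 + x4, data=ex)   # Y' = ln Y (reproduces ypr)
par(mfrow=c(2,2))
plot(fitted(fit.y),   resid(fit.y),   main="Y")      # check curvature / spread
plot(fitted(fit.lny), resid(fit.lny), main="ln Y")   # compare after transforming
qqnorm(resid(fit.y),   main="Y");    qqline(resid(fit.y))
qqnorm(resid(fit.lny), main="ln Y"); qqline(resid(fit.lny))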

9.2 Surgical Unit Example
Figure: JMP scatter plot matrix and correlation matrix when the response variable is Y' (Surgical Unit Example).

9.2 Surgical Unit Example
Each predictor is linearly associated with Y', with X3 and X4 showing the highest degrees of association and X1 the lowest.
There are intercorrelations among the potential predictor variables (notably X4 with X1, X2, and X3).
Basis of the analyses: enter the predictor variables in linear terms and do not include any interaction terms.
To examine whether all of the potential variables are needed or whether a subset of them is adequate: model selection.


9.3 Criteria for Model Selection
From any set of P - 1 potential predictors, 2^(P-1) alternative models can be constructed, including the regression model with no X variables.

9.3 Criteria for Model Selection (cont.)
Model selection procedures have been developed to identify a small group of regression models that are good according to a specified criterion.
A detailed examination can then be made of a limited number of the more promising candidate models, leading to the selection of the final regression model to be employed.
Focus on six criteria: R^2_p, R^2_{a,p}, C_p, AIC_p, SBC_p (also called BIC_p), PRESS_p.

9.3 Criteria for Model Selection (cont.)
Some notation for model selection:
P - 1: the number of potential X variables in the pool.
All regression models contain an intercept term β_0 (a column of 1s in the design matrix X), so the full pool involves P parameters.
p - 1: the number of X variables in a subset, so the regression function for this subset has p parameters.
Assume n > P and 1 <= p <= P.

9.3 Criteria for Model Selection: R^2_p or SSE_p Criterion
Use R^2 to identify subsets for which R^2 is high or, equivalently, for which the error sum of squares SSE_p is small:
R^2_p = 1 - SSE_p / SSTO
SSTO is constant for all possible regression models, so R^2_p can only increase as X variables are added.
The R^2_p criterion is not intended to identify the subset that maximizes the criterion (R^2_p is maximized when all P - 1 potential X variables are included). Instead, find the point where adding more X variables is not worthwhile because it leads to only a very small increase in R^2_p.

9.3 Criteria for Model Selection: R^2_{a,p} or MSE_p Criterion
To take account of the number of parameters, use R^2_{a,p}, the adjusted coefficient of multiple determination:
R^2_{a,p} = 1 - ((n-1)/(n-p)) (SSE_p/SSTO) = 1 - MSE_p / (SSTO/(n-1))
Since SSTO/(n-1) is fixed for the given Y observations, R^2_{a,p} increases if and only if MSE_p decreases.
max(R^2_{a,p}) can decrease as p increases: when the increase in max R^2_p becomes small, it is not sufficient to offset the loss of an additional degree of freedom.
R^2_{a,p} criterion: find a few subsets for which R^2_{a,p} is at or close to the maximum.
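A minimal sketch, assuming the data frame ex from the R code later in this section, verifying the identities above for one hypothetical subset (x1, x2, x3):

sub  <- lm(ypr ~ x1 + x2 + x3, data=ex)
n    <- nrow(ex); p <- length(coef(sub))      # p parameters incl. intercept
SSEp <- sum(resid(sub)^2)
SSTO <- sum((ex$ypr - mean(ex$ypr))^2)        # constant over all subsets
R2p  <- 1 - SSEp/SSTO
R2ap <- 1 - (n-1)/(n-p) * SSEp/SSTO           # = 1 - MSEp/(SSTO/(n-1))
c(R2p,  summary(sub)$r.squared)               # should agree
c(R2ap, summary(sub)$adj.r.squared)           # should agree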

9.3 Criteria for Model Selection: Mallows' C_p Criterion
Concerned with the total mean squared error of the n fitted values. Decompose the error of Ŷ_i:
Ŷ_i - μ_i = (E{Ŷ_i} - μ_i) + (Ŷ_i - E{Ŷ_i})
where the first term is the bias component and the second the random error component, so
E{(Ŷ_i - μ_i)^2} = (E{Ŷ_i} - μ_i)^2 + σ^2{Ŷ_i}
The total mean squared error for all n fitted values Ŷ_i:
sum_{i=1}^n E{(Ŷ_i - μ_i)^2} = sum_{i=1}^n (E{Ŷ_i} - μ_i)^2 + sum_{i=1}^n σ^2{Ŷ_i}

9.3 Criteria for Model Selection: Mallows' C_p Criterion (cont.)
Γ_p: the criterion measure, the total mean squared error divided by σ^2:
Γ_p = (1/σ^2) [ sum_{i=1}^n (E{Ŷ_i} - μ_i)^2 + sum_{i=1}^n σ^2{Ŷ_i} ]
C_p: the estimator of Γ_p:
C_p = SSE_p / MSE(X_1, ..., X_{P-1}) - (n - 2p)

9.3 Criteria for Model Selection: Mallows' C_p Criterion (cont.)
Why C_p is an estimator of Γ_p:
sum_{i=1}^n σ^2{Ŷ_i} = trace(Cov(Xb)) = p σ^2
E{SSE_p} = sum_{i=1}^n (E{Ŷ_i} - μ_i)^2 + (n - p) σ^2
Therefore
Γ_p = (1/σ^2) [ sum_{i=1}^n (E{Ŷ_i} - μ_i)^2 + sum_{i=1}^n σ^2{Ŷ_i} ] = E{SSE_p}/σ^2 - (n - 2p)
Replacing σ^2 by the full-model estimate gives
C_p = SSE_p / MSE(X_1, ..., X_{P-1}) - (n - 2p)

9.3 Criteria for Model Selection: Mallows' C_p Criterion (cont.)
When E{Ŷ_i} = μ_i (no bias with the p - 1 X variables), E{C_p} ≈ p.
For the regression model containing all P - 1 X variables:
C_P = SSE(X_1, ..., X_{P-1}) / [SSE(X_1, ..., X_{P-1}) / (n - P)] - (n - 2P) = P
The C_p measure assumes that MSE(X_1, ..., X_{P-1}) is an unbiased estimator of σ^2.
We would like to find a model for which:
1 C_p is small
2 C_p is near p
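A minimal sketch of C_p for one hypothetical subset, using the x1-x4 pool as in the Table 9.2 code below, so that MSE(x1, ..., x4) plays the role of MSE(X_1, ..., X_{P-1}):

full <- lm(ypr ~ x1 + x2 + x3 + x4, data=ex)   # full pool, P = 5
MSEF <- sum(resid(full)^2) / full$df.residual  # estimates sigma^2
sub  <- lm(ypr ~ x1 + x2 + x3, data=ex)        # candidate subset
SSEp <- sum(resid(sub)^2)
p    <- length(coef(sub))
Cp   <- SSEp/MSEF - (nrow(ex) - 2*p)           # little bias indicated if Cp is near p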

9.3 Criteria for Model Selection: AIC_p and SBC_p Criteria
Criteria that provide penalties for adding predictors:
Akaike's information criterion: AIC_p
Schwarz Bayesian criterion: SBC_p (also known as the Bayesian information criterion, BIC)
AIC_p = n ln SSE_p - n ln n + 2p (penalty 2p)
SBC_p = n ln SSE_p - n ln n + (ln n) p (penalty (ln n) p)
Search for models that have small values of AIC_p or SBC_p.
The SBC_p criterion tends to favor more parsimonious models: its penalty is the larger one whenever n ≥ 8, since then ln n > 2.
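A minimal sketch computing AIC_p and SBC_p from SSE_p exactly as defined above. R's built-in AIC() and BIC() add a constant (the remaining Gaussian log-likelihood terms, plus a parameter count for sigma^2), so absolute values differ but model rankings agree:

n    <- nrow(ex)
sub  <- lm(ypr ~ x1 + x2 + x3, data=ex)
SSEp <- sum(resid(sub)^2)
p    <- length(coef(sub))
AICp <- n*log(SSEp) - n*log(n) + 2*p        # 2p penalty
SBCp <- n*log(SSEp) - n*log(n) + log(n)*p   # (ln n) p penalty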

9.3 Criteria for Model Selection: PRESS_p Criterion
PRESS_p: the prediction sum of squares,
PRESS_p = sum_{i=1}^n (Y_i - Ŷ_{i(i)})^2
a measure of how well the fitted values for a subset model can predict the observed responses Y_i.
Ŷ_{i(i)}: the fitted value for case i from a model in which case i was left out when the regression function was fitted (leave-one-out).
When the prediction errors Y_i - Ŷ_{i(i)} are small, so is PRESS_p.
PRESS_p can be calculated without requiring n separate regression runs, each time deleting one of the n cases (Chap. 10).
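The shortcut is the identity Y_i - Ŷ_{i(i)} = e_i / (1 - h_ii), where h_ii is the i-th diagonal element of the hat matrix (developed in Chap. 10); this is what the Table 9.2 code below exploits. A minimal sketch, with a brute-force check:

sub   <- lm(ypr ~ x1 + x2 + x3, data=ex)
h     <- lm.influence(sub)$hat                 # leverages h_ii
PRESS <- sum((resid(sub)/(1 - h))^2)           # one pass, no refitting

# brute force: n separate fits, each deleting one case
PRESS.loop <- sum(sapply(seq_len(nrow(ex)), function(i) {
  fit.i <- lm(ypr ~ x1 + x2 + x3, data=ex[-i, ])
  (ex$ypr[i] - predict(fit.i, newdata=ex[i, ]))^2
}))
c(PRESS, PRESS.loop)                           # should agree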

9.3 Criteria for Model Selection: Example for the Different Criteria
Surgical unit example. Key R ingredients:
leaps(): performs an exhaustive search for the best subsets of the variables in x
method = c("Cp", "adjr2", "r2"): calculate C_p, adjusted R^2_{a,p}, or R^2
int = TRUE: add an intercept to the model
nbest: number of subsets of each size to report
A for-loop procedure is used for Table 9.2.

9.3 Criteria for Model Selection: Example for the Different Criteria (cont.)

library(leaps)
ex <- read.table("ch09ta01.txt", header=FALSE)
colnames(ex) <- c("x1","x2","x3","x4","x5","x6","x7","x8","y","ypr")
attach(ex)
ex.r2 <- leaps(x=cbind(x1,x2,x3,x4,x5,x6,x7,x8), y=ypr,
               method="r2", nbest=6)
p <- seq(min(ex.r2$size), max(ex.r2$size))
ind <- as.data.frame(ex.r2[c(3:4)])           # the size and r2 components
ind <- ind[with(ind, order(size, r2)), ]
plot(ind[, c(1:2)], ylab=expression(R^2), xlab="p", col="red", pch=16)
Rp2 <- by(data=ex.r2[4], INDICES=factor(ex.r2$size), FUN=max)  # best R^2 per size
lines(Rp2 ~ p, col="blue")

9.3 Criteria for Model Selection: R code for Table 9.2

#### Using a for loop
ex <- read.table("ch09ta01.txt", header=FALSE)
colnames(ex) <- c("x1","x2","x3","x4","x5","x6","x7","x8","y","ypr")
attach(ex)
nn <- length(ex[,1])                          # number of cases
vars <- c("ypr","x1","x2","x3","x4")          # response and candidate pool
N <- list(0,1,2,3,4)                          # subset sizes
COMB <- sapply(N, function(m) combn(x=vars[2:5], m))
COMB2 <- list()
k <- 0
exlmf <- lm(ypr ~ x1+x2+x3+x4)                # full model for the pool
SSEF <- deviance(aov(exlmf))
MSEF <- deviance(aov(exlmf))/(nn-5)           # MSE(x1,...,x4) on nn - 5 df
res.table <- NULL
res.table0 <- NULL
for(i in seq(COMB)) {
  tmp <- COMB[[i]]
  for(j in seq(ncol(tmp))) {

9.3 Criteria for Model Selection: R code for Table 9.2 (cont.)

    k <- k + 1
    if(length(tmp)==0) COMB2[[k]] <- formula(paste("ypr~1"))
    if(length(tmp)>0)  COMB2[[k]] <- formula(paste("ypr", "~",
                                      paste(tmp[,j], collapse=" + ")))
    exlm0 <- lm(COMB2[[k]], data=ex)
    exlm <- summary(exlm0)
    np <- length(tmp[,1]) + 1                 # p = number of parameters
    R2p <- exlm$r.squared
    R2ap <- exlm$adj.r.squared
    SSEp <- deviance(aov(exlm0))
    Cp <- SSEp/MSEF - (nn - 2*np)
    AICp <- nn*log(SSEp) - nn*log(nn) + 2*np
    BICp <- nn*log(SSEp) - nn*log(nn) + log(nn)*np
    PRESSp <- sum((resid(exlm0)/(1 - lm.influence(exlm0)$hat))^2)
    res.table <- rbind(res.table, c(noquote(paste(tmp[,j], collapse=" + ")),
        format(round(c(np,R2p,R2ap,Cp,AICp,BICp,PRESSp), 3), nsmall=3)))
    res.table0 <- rbind(res.table0, c(np,R2p,R2ap,Cp,AICp,BICp,PRESSp))
  }
}
library(xtable)
res.table

9.3 Criteria for Model Selection: R code for Table 9.2 (cont.)
Table 9.2: for each of the 16 subsets of x1, x2, x3, x4 (None; x1; x2; x3; x4; x1+x2; x1+x3; x1+x4; x2+x3; x2+x4; x3+x4; x1+x2+x3; x1+x2+x4; x1+x3+x4; x2+x3+x4; x1+x2+x3+x4), the table reports p, R^2_p, R^2_{a,p}, C_p, AIC_p, SBC_p, and PRESS_p, as computed by res.table above.


9.4 Automatic Search Procedures for Model Selection
The number of possible models, 2^(P-1), grows rapidly with the number of predictors; evaluating all of the possible alternatives can be a daunting endeavor.
Automatic computer-search procedures:
"Best" subsets algorithms
Stepwise regression

9.4 Automatic Search Procedures: "Best" Subsets Algorithms
Time-saving algorithms have been developed to find the best subsets according to a specified criterion (e.g., C_p) without requiring the fitting of all possible subset regression models: only a small fraction of all possible regression models needs to be calculated.
Several regression models can be identified as "good" for final consideration, depending on which criterion is used.

9.4 Automatic Search Procedures: "Best" Subsets Algorithms (cont.)
The surgical unit example: max R^2_{a,p} = 0.823 is attained by seven- and eight-parameter models; C_p and AIC_p are minimized at p = 7, while SBC_p and PRESS_p are minimized at p = 5.
The aim is not to identify a single best model, but a small set of promising models for further study.
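A minimal sketch of a best-subsets search over the full 8-variable pool using regsubsets() from the same leaps package (an alternative interface to the leaps() call shown earlier):

library(leaps)
bs <- regsubsets(ypr ~ x1+x2+x3+x4+x5+x6+x7+x8, data=ex, nbest=1, nvmax=8)
s  <- summary(bs)
cbind(p = rowSums(s$which),                    # parameters incl. intercept
      adjR2 = s$adjr2, Cp = s$cp, BIC = s$bic) # criteria for the best subset of each size
s$which[which.min(s$cp), ]                     # variables in the best-C_p subset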


9.4 Automatic Search Procedures: Stepwise Selection Methods
The forward stepwise regression procedure is probably the most widely used of the automatic search methods.
The search develops a sequence of regression models, at each step adding or deleting an X variable based on t or F statistics.
Stepwise search procedures end with the identification of a single regression model as "best".
Different formats: forward selection; forward stepwise selection; backward elimination; backward stepwise elimination.

9.4 Automatic Search Procedures: Forward Stepwise Selection
The forward stepwise regression search algorithm, based on t statistics and the associated P-values:
1 Fit a simple linear regression model for each of the P - 1 X variables considered for inclusion. For each, compute the t statistic for testing whether or not the slope is zero:
t_k = b_k / s{b_k}
2 Pick the largest of the P - 1 t_k values and include the corresponding X variable in the regression model if t_k exceeds some threshold (e.g., compare the P-value with the limit α_in = 0.10). If no t_k qualifies, stop the procedure.

9.4 Automatic Search Procedures: Forward Stepwise Selection (cont.)
3 Use partial t statistics to add the second variable (e.g., compare the P-value with the limit α_in = 0.10).
4 Test whether any variable already in the model should be dropped, again using partial t statistics (e.g., compare the P-value with the limit α_out = 0.15).
5 Return to Step 3 and continue until no variable can be added or dropped.
Note that an X variable brought into the model at an early stage may be dropped subsequently if it is no longer helpful.
With a small α_in, the selected models are often underspecified.
Require α_in < α_out to avoid cycling (a variable repeatedly entered and removed).

9.4 Automatic Search Procedures: Forward Stepwise Selection (cont.)
Forward selection: forward stepwise without the deletion step.
Backward elimination: start with all P - 1 X variables and delete one at a time.
Backward stepwise elimination.
R codes (see the sketch below):
step(): select a formula-based model by AIC
fastbw(): fast backward variable selection (library(rms))
add1(), drop1(): add or drop all possible single terms to/from a model
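A minimal sketch of these functions on the surgical unit data, searching by AIC between the intercept-only model and the full 8-variable pool:

null <- lm(ypr ~ 1, data=ex)
full <- lm(ypr ~ x1+x2+x3+x4+x5+x6+x7+x8, data=ex)
fwd  <- step(null, scope=formula(full), direction="forward")   # forward selection
both <- step(null, scope=formula(full), direction="both")      # forward stepwise
bwd  <- step(full, direction="backward")                       # backward elimination
add1(null, scope=formula(full), test="F")   # F tests for each candidate single term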

9.5 Model Validation
The validation of the selected regression model: checking a candidate model against independent data.
Three basic ways of validating a regression model:
1 Collection of new data to check the model and its predictive ability
2 Comparison of results with theoretical expectations, earlier empirical results, and simulation results
3 Use of a holdout sample to check the model and its predictive ability

9.5 Model Validation (cont.): Collection of New Data to Check the Model
A means of measuring the actual predictive capability of the selected regression model is to use it to predict each case in the new data set and then to calculate the mean of the squared prediction errors, denoted MSPR (mean squared prediction error):
MSPR = sum_{i=1}^{n*} (Y_i - Ŷ_i)^2 / n*
where Y_i is the response for the i-th case in the new data set, Ŷ_i is its predicted value from the model fitted to the original data, and n* is the number of cases in the new data set.
There are often difficulties in replicating a study.
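A minimal sketch, assuming fit.train is the model selected on the model-building data and ex.new is a hypothetical data frame of new cases with the same columns:

pred <- predict(fit.train, newdata=ex.new)   # predictions for the new cases
MSPR <- mean((ex.new$ypr - pred)^2)          # mean squared prediction error
# MSPR close to the training MSE supports the model's predictive ability;
# MSPR much larger than MSE suggests the selection process overfit.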

9.5 Model Validation (cont.): Comparison with Theory, Empirical Evidence, or Simulation Results
Unfortunately, there is often little theory that can be used to validate regression models.

9.5 Model Validation (cont.): Data Splitting
Often called cross-validation: when the data set is large enough, split the data into two sets:
training sample: the model-building set
prediction set: used to evaluate the reasonableness and predictive ability of the selected model
If the entire data set is not large enough for an equal split, the validation data set will need to be smaller than the model-building data set. Splits of the data can be made at random, as in the sketch below.
Drawback: the variances of the estimated regression coefficients developed from the model-building data set will usually be larger than those that would have been obtained from a fit to the entire data set.
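A minimal sketch of a random split; the subset formula here is a hypothetical stand-in for whichever candidate model is selected on the training half:

set.seed(1)
idx       <- sample(nrow(ex), nrow(ex) %/% 2)              # random half for model building
ex.train  <- ex[idx, ]                                     # model-building set
ex.valid  <- ex[-idx, ]                                    # validation (prediction) set
fit.train <- lm(ypr ~ x1 + x2 + x3 + x8, data=ex.train)    # hypothetical selected subset
MSPR <- mean((ex.valid$ypr - predict(fit.train, ex.valid))^2)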

9.5 Model Validation (cont.)
In any case, once the model has been validated, it is customary practice to use the entire data set for estimating the final regression model.
Comments: double cross-validation procedure; K-fold cross-validation.
