5.5 Regression Estimation


Assume an SRS of n pairs (x_1, y_1), ..., (x_n, y_n) is selected from a population of N pairs of (x, y) data. The goal of regression estimation is to take advantage of a linear relationship between x and y to improve estimation of the total t_y or the mean \bar{y}_U. Unlike ratio estimation, there is no assumption of a zero intercept or a positive slope in the linear relationship.

We assume a linear form for the relationship between y and x:

    y_i = B_0 + B_1 x_i + \epsilon_i                                        (53)

for intercept B_0, slope B_1, and deviation \epsilon_i between y_i and B_0 + B_1 x_i. We assume that (i) the mean of the \epsilon_i's is zero and (ii) the \epsilon_i's are uncorrelated with the x_i's. This implies that there is no systematic relationship between the \epsilon_i's and the x_i's. For example, we do not want the variance to increase (or decrease) with the mean.

5.5.1 Estimating \bar{y}_U and t_y

To estimate \bar{y}_U, we first must get estimates \hat{B}_0 and \hat{B}_1 of the true intercept B_0 and slope B_1. We use the least squares estimates

    \hat{B}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}
              = \frac{n \sum_{i=1}^n x_i y_i - (\sum_{i=1}^n x_i)(\sum_{i=1}^n y_i)}{n \sum_{i=1}^n x_i^2 - (\sum_{i=1}^n x_i)^2}
    \qquad
    \hat{B}_0 = \bar{y} - \hat{B}_1 \bar{x}

When \bar{x}_U is known, our estimate \hat{\bar{y}}_{reg} of the population mean is

    \hat{\bar{y}}_{reg} = \hat{B}_0 + \hat{B}_1 \bar{x}_U                    (54)

When \bar{x}_U is unknown, replace \bar{x}_U with \bar{x} in (54). Then the estimate is \hat{\bar{y}}_{reg} = \bar{y}.

The estimated variance of \hat{\bar{y}}_{reg} in (54) is

    \hat{V}(\hat{\bar{y}}_{reg}) = \frac{N - n}{N n (n - 2)} \sum_{i=1}^n (y_i - \hat{B}_0 - \hat{B}_1 x_i)^2        (55)

If N is unknown, but N is large relative to n (or, n/N is small), then the f.p.c. (N - n)/N \approx 1. Some researchers will replace (N - n)/N with 1 in the variance formula in (55):

    \hat{V}(\hat{\bar{y}}_{reg}) \approx \frac{1}{n(n - 2)} \sum_{i=1}^n (y_i - \hat{B}_0 - \hat{B}_1 x_i)^2        (56)

An alternative formula for calculation is

    \sum_{i=1}^n (y_i - \hat{B}_0 - \hat{B}_1 x_i)^2
      = \Big( \sum_{i=1}^n y_i^2 - n \bar{y}^2 \Big) - \hat{B}_1^2 \Big( \sum_{i=1}^n x_i^2 - n \bar{x}^2 \Big)     (57)

which can be substituted into (55) to get the estimated variance

    \hat{V}(\hat{\bar{y}}_{reg}) = \frac{N - n}{N n (n - 2)} \Big[ \Big( \sum_{i=1}^n y_i^2 - n \bar{y}^2 \Big) - \hat{B}_1^2 \Big( \sum_{i=1}^n x_i^2 - n \bar{x}^2 \Big) \Big]    (58)
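To make the formulas concrete, here is a small R sketch that computes the least squares estimates and the regression estimator (54) with its estimated variance (55). The data, N, and \bar{x}_U below are toy values invented for illustration, not from the examples in these notes.

```r
# Regression estimation of the population mean ybar_U from an SRS.
# Toy data: N and the population mean xbarU are assumed known.
N <- 1000
xbarU <- 10.5
x <- c(8, 9, 10, 11, 12, 9, 10, 11)
y <- c(20, 23, 25, 28, 30, 22, 26, 27)
n <- length(x)

B1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope estimate
B0 <- mean(y) - B1 * mean(x)                                     # intercept estimate

ybar_reg <- B0 + B1 * xbarU                                      # equation (54)

SSE <- sum((y - B0 - B1 * x)^2)                  # sum of squared residuals
v_ybar_reg <- (N - n) / (N * n * (n - 2)) * SSE  # equation (55)

c(estimate = ybar_reg, est.variance = v_ybar_reg)
```

If \bar{x}_U here equaled the sample mean \bar{x}, the estimator would reduce to \bar{y}, as noted above.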

An approximate 100(1 - \alpha)% confidence interval for \bar{y}_U is

    \hat{\bar{y}}_{reg} \pm t^* \sqrt{\hat{V}(\hat{\bar{y}}_{reg})}

where t^* is the upper \alpha/2 critical value from a t-distribution having n - 2 degrees of freedom.

By multiplying \hat{\bar{y}}_{reg} in (54) by N, an estimator \hat{t}_{reg} of the population total t_y is:

    \hat{t}_{reg} = N \hat{\bar{y}}_{reg} = N(\hat{B}_0 + \hat{B}_1 \bar{x}_U) = N(\bar{y} - \hat{B}_1 \bar{x} + \hat{B}_1 \bar{x}_U)      (59)
                 = N\bar{y} + \hat{B}_1 (N\bar{x}_U - N\bar{x}) = N\bar{y} + \hat{B}_1 (t_x - N\bar{x})

Multiplying \hat{V}(\hat{\bar{y}}_{reg}) in (55) by N^2 provides the estimated variance of \hat{t}_{reg}:

    \hat{V}(\hat{t}_{reg}) = \frac{N(N - n)}{n(n - 2)} \sum_{i=1}^n (y_i - \hat{B}_0 - \hat{B}_1 x_i)^2               (60)

An approximate 100(1 - \alpha)% confidence interval for t_y is

    \hat{t}_{reg} \pm t^* \sqrt{\hat{V}(\hat{t}_{reg})}

where t^* is the upper \alpha/2 critical value from a t-distribution having n - 2 degrees of freedom.

Note that \sum_{i=1}^n (y_i - \hat{B}_0 - \hat{B}_1 x_i)^2 is the sum of squared residuals from a simple linear regression. That is, \sum_{i=1}^n (y_i - \hat{B}_0 - \hat{B}_1 x_i)^2 = SSE for the least squares regression line. Therefore, we could fit a regression model using a statistics package and substitute the value of SSE, or MSE = SSE/(n - 2), into (55) and (60) to get

    \hat{V}(\hat{\bar{y}}_{reg}) = \frac{N - n}{Nn}\, MSE  \qquad  \hat{V}(\hat{t}_{reg}) = \frac{N(N - n)}{n}\, MSE    (61)

Example: The Florida Game and Freshwater Fish Commission is interested in estimating the weights of alligators using length measurements, which are easier to observe. The population size N is unknown but is large enough that ignoring the finite population correction will have a negligible effect on estimation. That is, (N - n)/N \approx 1 (see equation (56)). A random sample of n = 22 alligators yielded the following weight (in pounds) and length (in inches) data:

Alligator  Length  Weight     Alligator  Length  Weight
    1        94      130          12        86      83
    2        74       51          13        88      70
    3        82       80          14        72      61
    4        58       28          15        74      54
    5        86       80          16        61      44
    6        94      110          17        90     106
    7        63       33          18        89      84
    8        86       90          19        68      39
    9        69       36          20        76      42
   10        72       38          21        78      57
   11        85       84          22        90     102
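The total estimator (59)-(60) and its t-based confidence interval can be sketched in a few lines of R. The data, N, and \bar{x}_U below are toy values for illustration, not the alligator example.

```r
# Estimating the population total t_y and an approximate 95% CI.
# Toy data; N and the population mean xbarU are assumed known.
N <- 1000
xbarU <- 10.5
x <- c(8, 9, 10, 11, 12, 9, 10, 11)
y <- c(20, 23, 25, 28, 30, 22, 26, 27)
n <- length(x)

B1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
B0 <- mean(y) - B1 * mean(x)
SSE <- sum((y - B0 - B1 * x)^2)               # equals SSE from lm(y ~ x)

t_reg  <- N * (B0 + B1 * xbarU)               # equation (59)
v_treg <- N * (N - n) / (n * (n - 2)) * SSE   # equation (60)
tstar  <- qt(0.975, df = n - 2)               # upper .025 critical value

t_reg + c(-1, 1) * tstar * sqrt(v_treg)       # approximate 95% CI for t_y
```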

Summary values for n = 22:

    \bar{x} = 78.86    \sum x_i = 1735    \sum x_i^2 = 139293
    \bar{y} = 68.27    \sum y_i = 1502    \sum y_i^2 = 119762    \sum x_i y_i = 124433

Because it is much easier to collect data on alligator length, there is a lot of available data on length. Assume that the available data indicate that the mean alligator length is \bar{x}_U = 90 inches. Estimate the mean alligator weight \bar{y}_U using the regression estimator \hat{\bar{y}}_{reg}.

Regression output from MINITAB for the regression of alligator weight vs length (n = 22):

The regression equation is
weight = - 123 + 2.43 length

S = 11.6352   R-Sq = 84.3 %   R-Sq(adj) = 83.5 %

Analysis of Variance

Source       DF       SS       MS        F      P
Regression    1    14509    14509   107.17  0.000
Error        20     2708      135
Total        21    17216

Suppose the sample size was actually n = 25 and included the following 3 measurements. Estimate the mean alligator weight \bar{y}_U using the regression estimator \hat{\bar{y}}_{reg}.

Alligator  Length  Weight
   23        147     640
   24        128     366
   25        114     197

Summary values for n = 25:

    \bar{x} = 84.96    \sum x_i = 2124    \sum x_i^2 = 190282
    \bar{y} = 108.2    \sum y_i = 2705    \sum y_i^2 = 702127    \sum x_i y_i = 287819

Do you have any concerns about using a regression estimator for this data? If so, how can we adjust our analysis to account for it?

Regression output from MINITAB for the regression of alligator weight vs length (n = 25):

The regression equation is
weight = - 393 + 5.90 length

S = 54.01   R-Sq = 83.6 %   R-Sq(adj) = 82.9 %

Analysis of Variance

Source       DF        SS        MS        F      P
Regression    1    342350    342350   117.36  0.000
Error        23     67096      2917
Total        24    409446

5.5.2 Extension to Multiple Regression

We can generalize the simple linear regression model to a multiple linear regression model with

Case 1: k different regression variables x_1, x_2, ..., x_k with model

    y = B_0 + B_1 x_1 + B_2 x_2 + ... + B_k x_k + \epsilon                    (62)

Case 2: a k-th order polynomial in regression variable x with model

    y = B_0 + B_1 x + B_2 x^2 + ... + B_k x^k + \epsilon                      (63)

For Case 1:

1. Find the least-squares estimates (\hat{B}_0, \hat{B}_1, ..., \hat{B}_k). This will produce the prediction model:

    \hat{y} = \hat{B}_0 + \hat{B}_1 x_1 + ... + \hat{B}_k x_k                 (64)

2. To estimate \bar{y}_U, replace each variable with its population mean:

    \hat{\bar{y}}_{reg} = \hat{B}_0 + \hat{B}_1 \bar{x}_{1U} + ... + \hat{B}_k \bar{x}_{kU}    (65)

where \bar{x}_{iU} is the population mean of x_i.

For Case 2:

1. Find the least-squares estimates (\hat{B}_0, \hat{B}_1, ..., \hat{B}_k). This will produce the prediction model:

    \hat{y} = \hat{B}_0 + \hat{B}_1 x + \hat{B}_2 x^2 + ... + \hat{B}_k x^k   (66)

2. To estimate \bar{y}_U, replace x with its mean:

    \hat{\bar{y}}_{reg} = \hat{B}_0 + \hat{B}_1 \bar{x}_U + \hat{B}_2 \bar{x}_U^2 + ... + \hat{B}_k \bar{x}_U^k    (67)

For Case 1 and for Case 2, we use the SSE from the regression output to calculate the estimated variance of \hat{\bar{y}}_{reg}:

    \hat{V}(\hat{\bar{y}}_{reg}) = \frac{N - n}{N n (n - k - 1)}\, SSE = \frac{N - n}{Nn}\, MSE    (68)

where MSE = SSE/(n - k - 1) is the mean squared error from the regression, having n - k - 1 degrees of freedom for the Error term.
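As a sketch of Case 2, the quadratic fit and formula (68) can be carried out with lm(). The data, N, and \bar{x}_U below are toy values for illustration, not the alligator example.

```r
# Regression estimation of ybar_U with a quadratic (Case 2) model via lm().
# Toy data; N and the population mean xbarU are assumed known.
N <- 500
xbarU <- 7
x <- c(2, 4, 5, 6, 8, 9, 11, 3, 7, 10)
y <- c(5, 12, 18, 24, 40, 50, 75, 8, 31, 62)
n <- length(x)
k <- 2                                    # two regression terms: x and x^2

fit <- lm(y ~ x + I(x^2))                 # least squares fit
ybar_reg <- unname(predict(fit, newdata = data.frame(x = xbarU)))  # equation (67)

SSE <- sum(residuals(fit)^2)
MSE <- SSE / (n - k - 1)                  # error df = n - k - 1
v_ybar_reg <- (N - n) / (N * n) * MSE     # equation (68)

c(estimate = ybar_reg, est.variance = v_ybar_reg)
```

A confidence interval then uses a t critical value with n - k - 1 degrees of freedom, exactly as in the simple linear case but with the reduced error df.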

Example: Use the data from the Florida Game and Freshwater Fish Commission example.

- Fit the quadratic regression model y = B_0 + B_1 x + B_2 x^2 + \epsilon.
- Estimate the mean alligator weight \bar{y}_U using the multiple regression estimator \hat{\bar{y}}_{reg}.
- Find a 95% confidence interval for \bar{y}_U.

Regression output from MINITAB for the quadratic regression of alligator weight vs length (n = 25):

The regression equation is
weight = 410 - 11.3 length + 0.0866 length**2

S = 15.48   R-Sq = 98.7 %   R-Sq(adj) = 98.6 %

Analysis of Variance

Source       DF        SS        MS        F      P
Regression    2    404173    202087    843.2  0.000
Error        22      5273       240
Total        24    409446

Source       DF    Seq SS        P
Linear        1    342350    0.000
Quadratic     1     61823    0.000

5.6 Regression Analyses using R and SAS

5.6.1 Example with unknown N

R code for Regression Estimation (N unknown but large)

CASE 1: We will fit the model gatorwgt = B_0 + B_1 gatorlen using the original 22 pairs of alligator length (x) and weight (y) data. We will also estimate the mean weight \bar{y}_U using regression estimation (assuming \bar{x}_U = 90 inches). The df = n - 2 = 20 because the model has 2 parameters.

Unfortunately, the svyglm function in R uses Nn(n - 1) in the estimated variance formula (see equation (55)) instead of Nn(n - 2) for linear regression. If the regression model has p parameters, we want Nn(n - p) in the variance formula. Thus, if we multiply the standard error given in R by sqrt((n - 1)/(n - p)) = sqrt((n - 1)/df), where df = n - p is the residual degrees of freedom, we get the same analysis as SAS.

R code for Regression Estimation: Case 1

library(survey)   # Regression estimation
x <- c(94,74,82,58,86,94,63,86,69,72,85,86,88,72,74,61,90,89,68,76,78,90)
y <- c(130,51,80,28,80,110,33,90,36,38,84,83,70,61,54,44,106,84,39,42,57,102)
ci.level = .95   # confidence level                    <---
p = 2            # number of parameters in the model   <---
n = length(x)
df = n - p
corr.fctr = sqrt((n-1)/df)
regdata <- data.frame(x,y)                             <---
regdsgn <- svydesign(id=~1,data=regdata)
regdsgn
svyreg <- svyglm(y~x,design=regdsgn)                   <-- enter regression model
svyreg
atmean <- c(1,90)                                      <-- enter x mean = 90
# Assign names to each term in the model
names(atmean) <- c("(Intercept)","x")
atmean <- as.data.frame(t(atmean))                     <-- two model terms
meanwgt <- predict(svyreg,newdata=atmean,total=1)
meanwgt
# Enter standard error
se.meanwgt =                                           <-- value output by meanwgt
# Compute confidence interval for mean with correction factor
meanwgt + c(-1,1)*corr.fctr*qt(ci.level+(1-ci.level)/2,df)*se.meanwgt

R output for Regression Estimation: Case 1

Independent Sampling design (with replacement)
svydesign(id = ~1, data = regdata)

Coefficients:
(Intercept)            x
   -123.073        2.426

Degrees of Freedom: 21 Total (i.e. Null); 20 Residual
Null Deviance: 17220
Residual Deviance: 2708    AIC:

> meanwgt
    link  SE
1 95.293

> # Enter standard error
> se.meanwgt =
>
> # Compute confidence interval for mean with correction factor
[1]

CASE 2: We will fit the model gatorwgt = B_0 + B_1 gatorlen using the 25 pairs of alligator length (x) and weight (y) data. We will again estimate the mean weight \bar{y}_U using regression estimation (assuming \bar{x}_U = 90 inches).

R code for Regression Estimation: Case 2

library(survey)   # Regression estimation
x <- c(94,74,82,58,86,94,63,86,69,72,85,86,88,72,74,61,90,89,68,76,78,90,
       147,128,114)
y <- c(130,51,80,28,80,110,33,90,36,38,84,83,70,61,54,44,106,84,39,42,57,102,
       640,366,197)
ci.level = .95   # confidence level                    <--
p = 2            # number of parameters in the model   <--
n = length(x)
df = n - p
corr.fctr = sqrt((n-1)/df)
regdata <- data.frame(x,y)                             <--
regdsgn <- svydesign(id=~1,data=regdata)
regdsgn
svyreg <- svyglm(y~x,design=regdsgn)                   <-- enter regression model
svyreg
atmean <- c(1,90)                                      <-- enter x mean = 90

# Assign names to each term in the model
names(atmean) <- c("(Intercept)","x")
atmean <- as.data.frame(t(atmean))                     <-- two model terms
meanwgt <- predict(svyreg,newdata=atmean,total=1)
meanwgt
# Enter standard error
se.meanwgt =                                           <-- value output by meanwgt
# Compute confidence interval for mean with correction factor
meanwgt + c(-1,1)*corr.fctr*qt(ci.level+(1-ci.level)/2,df)*se.meanwgt

R output for Regression Estimation: Case 2

Independent Sampling design (with replacement)

Coefficients:
(Intercept)            x
   -393.264        5.902

Degrees of Freedom: 24 Total (i.e. Null); 23 Residual
Null Deviance: 409400
Residual Deviance: 67100    AIC:

> meanwgt
    link  SE
1 137.95

> # Enter standard error
> se.meanwgt =
> # Compute confidence interval for mean with correction factor
[1]

CASE 3: We will fit the model gatorwgt = B_0 + B_1 gatorlen + B_2 gatorlen^2 using the original 22 pairs of alligator length (x) and weight (y) data. We will estimate the mean weight \bar{y}_U using quadratic model regression estimation (assuming \bar{x}_U = 90 inches). The df = n - 3 = 19 because the model has 3 parameters.

R code for regression estimation: Case 3

library(survey)   # Regression estimation
x <- c(94,74,82,58,86,94,63,86,69,72,85,86,88,72,74,61,90,89,68,76,78,90)
y <- c(130,51,80,28,80,110,33,90,36,38,84,83,70,61,54,44,106,84,39,42,57,102)

xsq <- x^2                                             <-- create x-squared values for quadratic model
ci.level = .95   # confidence level                    <--
p = 3            # number of parameters in the model   <--
n = length(x)
df = n - p
corr.fctr = sqrt((n-1)/df)
regdata <- data.frame(x,xsq,y)                         <--
regdsgn <- svydesign(id=~1,data=regdata)
regdsgn
svyreg <- svyglm(y~x+xsq,design=regdsgn)               <--
svyreg
atmean <- c(1,90,8100)                                 <--
# Assign names to each term in the model
names(atmean) <- c("(Intercept)","x","xsq")            <--
atmean <- as.data.frame(t(atmean))
meanwgt <- predict(svyreg,newdata=atmean,total=1)
meanwgt
# Enter standard error
se.meanwgt =                                           <--
# Compute confidence interval for mean with correction factor
meanwgt + c(-1,1)*corr.fctr*qt(ci.level+(1-ci.level)/2,df)*se.meanwgt

R output for regression estimation: Case 3

Independent Sampling design (with replacement)

Coefficients:
(Intercept)            x          xsq
    269.715       -7.975       0.0675

Degrees of Freedom: 21 Total (i.e. Null); 19 Residual
Null Deviance: 17220
Residual Deviance: 1628    AIC:

> meanwgt
    link  SE
1 98.868

> # Enter standard error
> se.meanwgt =
> # Compute confidence interval for mean with correction factor
[1]

CASE 4: We will fit the model gatorwgt = B_0 + B_1 gatorlen + B_2 gatorlen^2 using the 25 pairs of alligator length (x) and weight (y) data. We will again estimate the mean weight \bar{y}_U using quadratic model regression estimation (assuming \bar{x}_U = 90 inches).

R code for regression estimation: Case 4

library(survey)   # Regression estimation
x <- c(94,74,82,58,86,94,63,86,69,72,85,86,88,72,74,61,90,89,68,76,78,90,
       147,128,114)
y <- c(130,51,80,28,80,110,33,90,36,38,84,83,70,61,54,44,106,84,39,42,57,102,
       640,366,197)
xsq <- x^2                                             <-- create x-squared values for quadratic model
ci.level = .95   # confidence level                    <--
p = 3            # number of parameters in the model   <--
n = length(x)
df = n - p
corr.fctr = sqrt((n-1)/df)
regdata <- data.frame(x,xsq,y)                         <--
regdsgn <- svydesign(id=~1,data=regdata)
regdsgn
svyreg <- svyglm(y~x+xsq,design=regdsgn)               <--
svyreg
atmean <- c(1,90,8100)                                 <--
# Assign names to each term in the model
names(atmean) <- c("(Intercept)","x","xsq")            <--
atmean <- as.data.frame(t(atmean))
meanwgt <- predict(svyreg,newdata=atmean,total=1)
meanwgt
# Enter standard error
se.meanwgt =                                           <--
# Compute confidence interval for mean with correction factor
meanwgt + c(-1,1)*corr.fctr*qt(ci.level+(1-ci.level)/2,df)*se.meanwgt

R output for regression estimation: Case 4

Independent Sampling design (with replacement)
svydesign(id = ~1, data = regdata)

Coefficients:
(Intercept)            x          xsq
    410.480      -11.318       0.0866

Degrees of Freedom: 24 Total (i.e. Null); 22 Residual
Null Deviance: 409400
Residual Deviance: 5273    AIC:

> meanwgt
    link  SE
1 93.489

> # Enter standard error
> se.meanwgt =
> # Compute confidence interval for mean with correction factor
[1]

SAS code for Regression Estimation (N unknown but large)

We will reproduce the R analyses for Case 1 (linear regression with n = 22 points) and Case 4 (quadratic regression with n = 25 points). For Case 2 and Case 3, you just have to change the data set.

CASE 1: We will fit the model gatorwgt = B_0 + B_1 gatorlen using the original 22 pairs of alligator length (x) and weight (y) data. We will also estimate the mean weight \bar{y}_U using regression estimation (assuming \bar{x}_U = 90 inches). The df = n - 2 = 20 because the model has 2 parameters.

data in;
  input gatorid gatorlen gatorwgt;
  cards;
 1  94 130
 2  74  51
 3  82  80
 4  58  28
 5  86  80
 6  94 110
 7  63  33
 8  86  90
 9  69  36
10  72  38
11  85  84
12  86  83
13  88  70
14  72  61
15  74  54
16  61  44
17  90 106
18  89  84
19  68  39
20  76  42
21  78  57
22  90 102
;
/* Use proc surveyreg to estimate the average alligator weight */
proc surveyreg data=in;
  model gatorwgt = gatorlen / clparm solution df=20;
  /* To estimate a mean: substitute 1 for intercept, mean(x) for gatorlen */
  estimate 'Average gator weight' intercept 1 gatorlen 90;
run;

SAS output for Case 1

The SURVEYREG Procedure
Regression Analysis for Dependent Variable gatorwgt

Data Summary
Number of Observations        22
Mean of gatorwgt        68.27273
Sum of gatorwgt           1502.0

Fit Statistics
R-square          0.8427
Root MSE         11.6352
Denominator DF        20

Tests of Model Effects
Effect       Num DF    F Value    Pr > F
Model             1               <.0001
Intercept         1               <.0001
gatorlen          1               <.0001

NOTE: The denominator degrees of freedom for the F tests is 20.

Estimated Regression Coefficients
                            Standard
Parameter      Estimate        Error    t Value    Pr > |t|    95% Confidence Interval
Intercept     -123.0725                            <.0001
gatorlen         2.4263                            <.0001

Analysis of Estimable Functions
                                       Standard
Parameter                Estimate         Error    t Value    Pr > |t|
Average gator weight      95.2926                             <.0001

Analysis of Estimable Functions
Parameter                95% Confidence Interval
Average gator weight

NOTE: The denominator degrees of freedom for the t tests is 20.

CASE 4: We will fit the model gatorwgt = B_0 + B_1 gatorlen + B_2 gatorlen^2 using the 25 pairs of alligator length (x) and weight (y) data. We will again estimate the mean weight \bar{y}_U using quadratic model regression estimation (assuming \bar{x}_U = 90 inches). This is the data for Case 4. Change df=19 to df=22 in the SAS code.

data in;
  input gatorid gatorlen gatorwgt;
  gatorsq = gatorlen**2;
  cards;
 1  94 130
 2  74  51
 3  82  80
 4  58  28
 5  86  80
 6  94 110
 7  63  33
 8  86  90
 9  69  36
10  72  38
11  85  84
12  86  83
13  88  70
14  72  61
15  74  54
16  61  44
17  90 106
18  89  84
19  68  39
20  76  42
21  78  57
22  90 102
23 147 640
24 128 366
25 114 197
;

SAS output for Case 4

The SURVEYREG Procedure
Regression Analysis for Dependent Variable gatorwgt

Data Summary
Number of Observations        25
Mean of gatorwgt       108.20000
Sum of gatorwgt           2705.0

Fit Statistics
R-square          0.9871
Root MSE         15.4816
Denominator DF        22

Tests of Model Effects
Effect       Num DF    F Value    Pr > F
Model             1               <.0001
Intercept         1               <.0001
gatorlen          1       8.42    <.0001
gatorsq           1               <.0001

NOTE: The denominator degrees of freedom for the F tests is 22.

Estimated Regression Coefficients
                            Standard
Parameter      Estimate        Error    t Value    Pr > |t|    95% Confidence Interval
Intercept      410.4805                            <.0001
gatorlen       -11.3175                            <.0001
gatorsq          0.0866                            <.0001

NOTE: The denominator degrees of freedom for the t tests is 22.

Analysis of Estimable Functions
                                       Standard
Parameter                Estimate         Error    t Value    Pr > |t|
Average gator weight      93.4890                     8.05    <.0001

Analysis of Estimable Functions
Parameter                95% Confidence Interval
Average gator weight

NOTE: The denominator degrees of freedom for the t tests is 22.

5.6.2 Example with known N and t_x

The data is from Lohr Example 4.9 (pages 139-141): To estimate the number of dead trees in a study area, we divide the study area into 100 square plots and count the number of dead trees in a photograph of each plot. Photo counts can be made quickly, but sometimes a tree is misclassified or not detected. An SRS of 25 of the plots is taken for field counts of dead trees (ground truthing). The population mean number of dead trees per plot from the photo counts is \bar{x}_U = 11.3. The data are given in the R code below. Estimate the actual mean number of dead trees per plot and the total number of dead trees in the study area.

R code for regression estimation, t_x and N known

library(survey)   # Regression estimation with known tx and N (Lohr Example 4.9)
x <- c(10,12,7,13,13,6,17,16,15,10,14,12,10,5,12,10,10,9,6,11,7,9,11,10,10)
y <- c(15,14,9,14,8,5,18,15,13,15,11,15,12,8,13,9,11,12,9,12,13,11,10,9,8)
ci.level = .95   # confidence level                    <---
p = 2            # number of parameters in the model   <---
n = length(x)
df = n - p
corr.fctr = sqrt((n-1)/df)
fpc <- c(rep(100,n))                                   <-- 100 = N
regdata <- data.frame(x,y)
regdsgn <- svydesign(id=~1,fpc=~fpc,data=regdata)      <-- include fpc
regdsgn
svyreg <- svyglm(y~x,design=regdsgn)
svyreg
atmean <- c(1,11.3)                                    <-- evaluate at x-mean
# Assign names to each term in the model
names(atmean) <- c("(Intercept)","x")
atmean <- as.data.frame(t(atmean))
meanwgt <- predict(svyreg,newdata=atmean,total=1)
meanwgt
# Enter standard error
se.meanwgt = .418                                      <-- input s.e. from R
# Compute confidence interval for mean with correction factor
meanwgt + c(-1,1)*corr.fctr*qt(ci.level+(1-ci.level)/2,df)*se.meanwgt
# Compute confidence interval for a total with correction factor
ttl = 100   # ttl is the total number of units
sumwgt <- svycontrast(meanwgt,ttl)
sumwgt + c(-1,1)*corr.fctr*qt(ci.level+(1-ci.level)/2,df)*se.meanwgt*ttl

R output for regression estimation, t_x and N known

Coefficients:
(Intercept)            x
     5.0593       0.6133

Degrees of Freedom: 24 Total (i.e. Null); 23 Residual
Null Deviance: 218.2
Residual Deviance: 133.2    AIC:

> meanwgt
    link  SE
1 11.989

> # Enter standard error
> se.meanwgt = .418
> # Compute confidence interval for mean with correction factor
[1]
> # Compute confidence interval for a total with correction factor
[1]

SAS code for regression estimation, t_x and N known (supplemental)

DATA trees;
  INPUT photo field;
  N = 100;
  DATALINES;
10 15
12 14
 7  9
13 14
13  8
 6  5
17 18
16 15
15 13
10 15
14 11
12 15
10 12
 5  8
12 13
10  9
10 11
 9 12
 6  9
11 12
 7 13
 9 11
11 10
10  9
10  8
;
/* Use proc surveyreg to estimate the total number of trees */
PROC SURVEYREG DATA=trees TOTAL=100;
  MODEL field = photo / clparm solution df=23;
  ESTIMATE 'Total field trees' intercept 100 photo 1130;
    /* substitute N for intercept, t_x for photo */
  ESTIMATE 'Mean field trees' intercept 100 photo 1130 / divisor=100;
run;

SAS output for regression estimation, t_x and N known

The SURVEYREG Procedure
Regression Analysis for Dependent Variable field

Data Summary
Number of Observations        25
Sum of Weights             100.0
Weighted Mean of field  11.56000
Weighted Sum of field     1156.0

Fit Statistics
R-square          0.3896
Root MSE          2.4062
Denominator DF        23

Tests of Model Effects
Effect       Num DF    F Value    Pr > F
Model             1               <.0001
Intercept         1
photo             1               <.0001

NOTE: The denominator degrees of freedom for the F tests is 23.

Estimated Regression Coefficients
                            Standard
Parameter      Estimate        Error    t Value    Pr > |t|    95% Confidence Interval
Intercept        5.0593
photo            0.6133                            <.0001

NOTE: The denominator degrees of freedom for the t tests is 23.

Analysis of Estimable Functions
                                       Standard
Parameter                Estimate         Error    t Value    Pr > |t|
Total field trees          1198.9                             <.0001
Mean field trees           11.989                             <.0001

Analysis of Estimable Functions
Parameter                95% Confidence Interval
Total field trees
Mean field trees

NOTE: The denominator degrees of freedom for the t tests is 23.

5.7 \bar{y} vs \hat{\bar{y}}_{reg}, or \hat{t} vs \hat{t}_{reg}: Which is better, SRS or Regression Estimation?

When does regression estimation provide better estimates of \bar{y}_U or t_y than the simple random sample (SRS) estimator? Recall that for least squares regression we have SS(total) = SS(regression) + SSE. Then:

- If x and y are strongly correlated, most of the variability in the y values can be explained by the regression, and there will be very little random variability about the regression line. That is, MS(regression) is large relative to MSE.

- If x and y are weakly correlated, very little of the variability in the y values can be explained by the regression, and most of the variability is random variability about the regression line. That is, MS(regression) is small relative to MSE.

Thus, for a moderate to strong linear relationship between x and y, regression estimation is recommended over SRS estimation.

5.8 Sample Size Estimation for Ratio and Regression Estimation

To determine the sample size formulas for ratio and regression estimation, we will use the same approach that was used for determining a sample size for simple random sampling:

1. Specify a maximum allowable difference d for the parameter we want to estimate. This is equivalent to stating the largest margin of error the researcher would want for a confidence interval.

2. Specify \alpha (where 100(1 - \alpha)% is the confidence level for the confidence interval).

3. Specify a prior estimate of the variance \hat{V}(\hat{\theta}), where \hat{\theta} is the estimator of parameter \theta. \theta could be \bar{y}_U, t_y, or the population ratio B = \bar{y}_U / \bar{x}_U (in ratio estimation).

4. Set the margin of error formula equal to d, and solve for n.

Sample Size Determination for Ratio Estimation: From equations (50), (47), (48), and (49), the margin of error formulas for parameters B, \bar{y}_U, and t_y with estimated variance s_e^2 are

For B:          z_{\alpha/2} \sqrt{\hat{V}(\hat{B})} = z_{\alpha/2} \sqrt{ \frac{N - n}{N} \cdot \frac{s_e^2}{\bar{x}_U^2\, n} }

For \bar{y}_U:  z_{\alpha/2} \sqrt{\hat{V}(\hat{\bar{y}}_r)} = z_{\alpha/2} \sqrt{ \frac{N - n}{N} \cdot \frac{s_e^2}{n} }

For t_y:        z_{\alpha/2} \sqrt{\hat{V}(\hat{t}_r)} = z_{\alpha/2} \sqrt{ N(N - n) \cdot \frac{s_e^2}{n} }

After setting the margin of error equal to d, solving for n yields:

For B:          n = \frac{n_0}{1 + n_0/N}  where  n_0 = \frac{z_{\alpha/2}^2\, s_e^2}{d^2\, \bar{x}_U^2},

and d is the maximum allowable difference for estimating the ratio B. It is assumed that you know \bar{x}_U. If you do not know \bar{x}_U, you must provide an estimate of \bar{x}_U.

For \bar{y}_U:  n = \frac{n_0}{1 + n_0/N}  where  n_0 = \frac{z_{\alpha/2}^2\, s_e^2}{d^2},

and d is the maximum allowable difference for estimating the mean \bar{y}_U.

For t_y:        n = \frac{n_0}{1 + n_0/N}  where  n_0 = \frac{N^2 z_{\alpha/2}^2\, s_e^2}{d^2},

and d is the maximum allowable difference for estimating the total t_y.

Here s_e^2 is an estimate of the variability of the (x, y) points about a line having a zero intercept. You can use s_e^2 from a prior study, a pilot study, or double sampling.

Example of sample size determination: Using the information from the pulpwood and dry wood example, estimate the sample size required so that \hat{t}_y is within 200 kg of t_y with probability 0.95. Assume the new truckloads contain 1000 bundles of wood.

Sample Size Estimation for Regression Estimation: From equations (61) and (47) and the confidence interval formulas for \bar{y}_U and t_y, the margin of error formulas for parameters \bar{y}_U and t_y with estimated variance MSE are

For \bar{y}_U:  z_{\alpha/2} \sqrt{\hat{V}(\hat{\bar{y}}_{reg})} = z_{\alpha/2} \sqrt{ \frac{N - n}{Nn}\, MSE }

For t_y:        z_{\alpha/2} \sqrt{\hat{V}(\hat{t}_{reg})} = z_{\alpha/2} \sqrt{ \frac{N(N - n)}{n}\, MSE }

After setting the margin of error equal to d, solving for n yields:

For \bar{y}_U:  n = \frac{n_0}{1 + n_0/N}  where  n_0 = \frac{z_{\alpha/2}^2\, MSE}{d^2},

and d is the maximum allowable difference for estimating the mean \bar{y}_U.

For t_y:        n = \frac{n_0}{1 + n_0/N}  where  n_0 = \frac{N^2 z_{\alpha/2}^2\, MSE}{d^2},

and d is the maximum allowable difference for estimating the total t_y.

The MSE for the regression is an estimate of the variability of the (x, y) points about the regression line. You can use MSE from a prior study, a pilot study, or double sampling.
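The sample size calculation for estimating \bar{y}_U by regression estimation can be sketched in R. All planning values below (N, d, MSE) are made-up assumptions for illustration, with MSE standing in for a pilot-study estimate:

```r
# Sample size for estimating ybar_U by regression estimation:
# n = n0 / (1 + n0 / N), with n0 = z^2 * MSE / d^2.
# Planning values are illustrative assumptions, not from the text.
N   <- 1000           # population size
d   <- 2              # maximum allowable difference (margin of error)
MSE <- 135.4          # variability about the regression line (pilot study)
z   <- qnorm(0.975)   # z for a 95% confidence level

n0 <- z^2 * MSE / d^2
n  <- ceiling(n0 / (1 + n0 / N))   # round up to the next whole unit
n
```

For the total t_y, the same code applies with n0 <- N^2 * z^2 * MSE / d^2, where d is now the allowable difference on the scale of the total.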


Resources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes. Resources for statistical assistance Quantitative covariates and regression analysis Carolyn Taylor Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC January 24, 2017 Department

More information

Two-Stage Least Squares

Two-Stage Least Squares Chapter 316 Two-Stage Least Squares Introduction This procedure calculates the two-stage least squares (2SLS) estimate. This method is used fit models that include instrumental variables. 2SLS includes

More information

Instruction on JMP IN of Chapter 19

Instruction on JMP IN of Chapter 19 Instruction on JMP IN of Chapter 19 Example 19.2 (1). Download the dataset xm19-02.jmp from the website for this course and open it. (2). Go to the Analyze menu and select Fit Model. Click on "REVENUE"

More information

Lab #9: ANOVA and TUKEY tests

Lab #9: ANOVA and TUKEY tests Lab #9: ANOVA and TUKEY tests Objectives: 1. Column manipulation in SAS 2. Analysis of variance 3. Tukey test 4. Least Significant Difference test 5. Analysis of variance with PROC GLM 6. Levene test for

More information

Outline. Topic 16 - Other Remedies. Ridge Regression. Ridge Regression. Ridge Regression. Robust Regression. Regression Trees. Piecewise Linear Model

Outline. Topic 16 - Other Remedies. Ridge Regression. Ridge Regression. Ridge Regression. Robust Regression. Regression Trees. Piecewise Linear Model Topic 16 - Other Remedies Ridge Regression Robust Regression Regression Trees Outline - Fall 2013 Piecewise Linear Model Bootstrapping Topic 16 2 Ridge Regression Modification of least squares that addresses

More information

Recall the expression for the minimum significant difference (w) used in the Tukey fixed-range method for means separation:

Recall the expression for the minimum significant difference (w) used in the Tukey fixed-range method for means separation: Topic 11. Unbalanced Designs [ST&D section 9.6, page 219; chapter 18] 11.1 Definition of missing data Accidents often result in loss of data. Crops are destroyed in some plots, plants and animals die,

More information

Section 3.4: Diagnostics and Transformations. Jared S. Murray The University of Texas at Austin McCombs School of Business

Section 3.4: Diagnostics and Transformations. Jared S. Murray The University of Texas at Austin McCombs School of Business Section 3.4: Diagnostics and Transformations Jared S. Murray The University of Texas at Austin McCombs School of Business 1 Regression Model Assumptions Y i = β 0 + β 1 X i + ɛ Recall the key assumptions

More information

UNIT 1: NUMBER LINES, INTERVALS, AND SETS

UNIT 1: NUMBER LINES, INTERVALS, AND SETS ALGEBRA II CURRICULUM OUTLINE 2011-2012 OVERVIEW: 1. Numbers, Lines, Intervals and Sets 2. Algebraic Manipulation: Rational Expressions and Exponents 3. Radicals and Radical Equations 4. Function Basics

More information

Evaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München

Evaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München Evaluation Measures Sebastian Pölsterl Computer Aided Medical Procedures Technische Universität München April 28, 2015 Outline 1 Classification 1. Confusion Matrix 2. Receiver operating characteristics

More information

Cell means coding and effect coding

Cell means coding and effect coding Cell means coding and effect coding /* mathregr_3.sas */ %include 'readmath.sas'; title2 ''; /* The data step continues */ if ethnic ne 6; /* Otherwise, throw the case out */ /* Indicator dummy variables

More information

Model Selection and Inference

Model Selection and Inference Model Selection and Inference Merlise Clyde January 29, 2017 Last Class Model for brain weight as a function of body weight In the model with both response and predictor log transformed, are dinosaurs

More information

14.2 The Regression Equation

14.2 The Regression Equation 14.2 The Regression Equation Tom Lewis Fall Term 2009 Tom Lewis () 14.2 The Regression Equation Fall Term 2009 1 / 12 Outline 1 Exact and inexact linear relationships 2 Fitting lines to data 3 Formulas

More information

A. Incorrect! This would be the negative of the range. B. Correct! The range is the maximum data value minus the minimum data value.

A. Incorrect! This would be the negative of the range. B. Correct! The range is the maximum data value minus the minimum data value. AP Statistics - Problem Drill 05: Measures of Variation No. 1 of 10 1. The range is calculated as. (A) The minimum data value minus the maximum data value. (B) The maximum data value minus the minimum

More information

CH5: CORR & SIMPLE LINEAR REFRESSION =======================================

CH5: CORR & SIMPLE LINEAR REFRESSION ======================================= STAT 430 SAS Examples SAS5 ===================== ssh xyz@glue.umd.edu, tap sas913 (old sas82), sas https://www.statlab.umd.edu/sasdoc/sashtml/onldoc.htm CH5: CORR & SIMPLE LINEAR REFRESSION =======================================

More information

Stat 5100 Handout #6 SAS: Linear Regression Remedial Measures

Stat 5100 Handout #6 SAS: Linear Regression Remedial Measures Stat 5100 Handout #6 SAS: Linear Regression Remedial Measures Example: Age and plasma level for 25 healthy children in a study are reported. Of interest is how plasma level depends on age. (Text Table

More information

Unit 5: Estimating with Confidence

Unit 5: Estimating with Confidence Unit 5: Estimating with Confidence Section 8.3 The Practice of Statistics, 4 th edition For AP* STARNES, YATES, MOORE Unit 5 Estimating with Confidence 8.1 8.2 8.3 Confidence Intervals: The Basics Estimating

More information

Bivariate Linear Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017

Bivariate Linear Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 Bivariate Linear Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 4, 217 PDF file location: http://www.murraylax.org/rtutorials/regression_intro.pdf HTML file location:

More information

9.1 Random coefficients models Constructed data Consumer preference mapping of carrots... 10

9.1 Random coefficients models Constructed data Consumer preference mapping of carrots... 10 St@tmaster 02429/MIXED LINEAR MODELS PREPARED BY THE STATISTICS GROUPS AT IMM, DTU AND KU-LIFE Module 9: R 9.1 Random coefficients models...................... 1 9.1.1 Constructed data........................

More information

Getting Correct Results from PROC REG

Getting Correct Results from PROC REG Getting Correct Results from PROC REG Nate Derby Stakana Analytics Seattle, WA, USA SUCCESS 3/12/15 Nate Derby Getting Correct Results from PROC REG 1 / 29 Outline PROC REG 1 PROC REG 2 Nate Derby Getting

More information

Concept of Curve Fitting Difference with Interpolation

Concept of Curve Fitting Difference with Interpolation Curve Fitting Content Concept of Curve Fitting Difference with Interpolation Estimation of Linear Parameters by Least Squares Curve Fitting by Polynomial Least Squares Estimation of Non-linear Parameters

More information

Centering and Interactions: The Training Data

Centering and Interactions: The Training Data Centering and Interactions: The Training Data A random sample of 150 technical support workers were first given a test of their technical skill and knowledge, and then randomly assigned to one of three

More information

Normal Plot of the Effects (response is Mean free height, Alpha = 0.05)

Normal Plot of the Effects (response is Mean free height, Alpha = 0.05) Percent Normal Plot of the Effects (response is Mean free height, lpha = 5) 99 95 Effect Type Not Significant Significant 90 80 70 60 50 40 30 20 E F actor C D E Name C D E 0 5 E -0.3-0. Effect 0. 0.3

More information

Today is the last day to register for CU Succeed account AND claim your account. Tuesday is the last day to register for my class

Today is the last day to register for CU Succeed account AND claim your account. Tuesday is the last day to register for my class Today is the last day to register for CU Succeed account AND claim your account. Tuesday is the last day to register for my class Back board says your name if you are on my roster. I need parent financial

More information

1. What specialist uses information obtained from bones to help police solve crimes?

1. What specialist uses information obtained from bones to help police solve crimes? Mathematics: Modeling Our World Unit 4: PREDICTION HANDOUT VIDEO VIEWING GUIDE H4.1 1. What specialist uses information obtained from bones to help police solve crimes? 2.What are some things that can

More information

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown Z-TEST / Z-STATISTIC: used to test hypotheses about µ when the population standard deviation is known and population distribution is normal or sample size is large T-TEST / T-STATISTIC: used to test hypotheses

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Rebecca C. Steorts, Duke University STA 325, Chapter 3 ISL 1 / 49 Agenda How to extend beyond a SLR Multiple Linear Regression (MLR) Relationship Between the Response and Predictors

More information

1. Answer: x or x. Explanation Set up the two equations, then solve each equation. x. Check

1. Answer: x or x. Explanation Set up the two equations, then solve each equation. x. Check Thinkwell s Placement Test 5 Answer Key If you answered 7 or more Test 5 questions correctly, we recommend Thinkwell's Algebra. If you answered fewer than 7 Test 5 questions correctly, we recommend Thinkwell's

More information

Unit 8 SUPPLEMENT Normal, T, Chi Square, F, and Sums of Normals

Unit 8 SUPPLEMENT Normal, T, Chi Square, F, and Sums of Normals BIOSTATS 540 Fall 017 8. SUPPLEMENT Normal, T, Chi Square, F and Sums of Normals Page 1 of Unit 8 SUPPLEMENT Normal, T, Chi Square, F, and Sums of Normals Topic 1. Normal Distribution.. a. Definition..

More information

Graphs and Linear Functions

Graphs and Linear Functions Graphs and Linear Functions A -dimensional graph is a visual representation of a relationship between two variables given by an equation or an inequality. Graphs help us solve algebraic problems by analysing

More information

A Multiple-Line Fitting Algorithm Without Initialization Yan Guo

A Multiple-Line Fitting Algorithm Without Initialization Yan Guo A Multiple-Line Fitting Algorithm Without Initialization Yan Guo Abstract: The commonest way to fit multiple lines is to use methods incorporate the EM algorithm. However, the EM algorithm dose not guarantee

More information

1. Solve the following system of equations below. What does the solution represent? 5x + 2y = 10 3x + 5y = 2

1. Solve the following system of equations below. What does the solution represent? 5x + 2y = 10 3x + 5y = 2 1. Solve the following system of equations below. What does the solution represent? 5x + 2y = 10 3x + 5y = 2 2. Given the function: f(x) = a. Find f (6) b. State the domain of this function in interval

More information

Regression. Dr. G. Bharadwaja Kumar VIT Chennai

Regression. Dr. G. Bharadwaja Kumar VIT Chennai Regression Dr. G. Bharadwaja Kumar VIT Chennai Introduction Statistical models normally specify how one set of variables, called dependent variables, functionally depend on another set of variables, called

More information

Section 3.2: Multiple Linear Regression II. Jared S. Murray The University of Texas at Austin McCombs School of Business

Section 3.2: Multiple Linear Regression II. Jared S. Murray The University of Texas at Austin McCombs School of Business Section 3.2: Multiple Linear Regression II Jared S. Murray The University of Texas at Austin McCombs School of Business 1 Multiple Linear Regression: Inference and Understanding We can answer new questions

More information

STAT:5201 Applied Statistic II

STAT:5201 Applied Statistic II STAT:5201 Applied Statistic II Two-Factor Experiment (one fixed blocking factor, one fixed factor of interest) Randomized complete block design (RCBD) Primary Factor: Day length (short or long) Blocking

More information

Linear Methods for Regression and Shrinkage Methods

Linear Methods for Regression and Shrinkage Methods Linear Methods for Regression and Shrinkage Methods Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Linear Regression Models Least Squares Input vectors

More information

Cross-validation and the Bootstrap

Cross-validation and the Bootstrap Cross-validation and the Bootstrap In the section we discuss two resampling methods: cross-validation and the bootstrap. 1/44 Cross-validation and the Bootstrap In the section we discuss two resampling

More information

Estimating R 0 : Solutions

Estimating R 0 : Solutions Estimating R 0 : Solutions John M. Drake and Pejman Rohani Exercise 1. Show how this result could have been obtained graphically without the rearranged equation. Here we use the influenza data discussed

More information

CSC 328/428 Summer Session I 2002 Data Analysis for the Experimenter FINAL EXAM

CSC 328/428 Summer Session I 2002 Data Analysis for the Experimenter FINAL EXAM options pagesize=53 linesize=76 pageno=1 nodate; proc format; value $stcktyp "1"="Growth" "2"="Combined" "3"="Income"; data invstmnt; input stcktyp $ perform; label stkctyp="type of Stock" perform="overall

More information

Voluntary State Curriculum Algebra II

Voluntary State Curriculum Algebra II Algebra II Goal 1: Integration into Broader Knowledge The student will develop, analyze, communicate, and apply models to real-world situations using the language of mathematics and appropriate technology.

More information

Section 2.1: Intro to Simple Linear Regression & Least Squares

Section 2.1: Intro to Simple Linear Regression & Least Squares Section 2.1: Intro to Simple Linear Regression & Least Squares Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.1, 7.2 1 Regression:

More information

Assignment 4 (Sol.) Introduction to Data Analytics Prof. Nandan Sudarsanam & Prof. B. Ravindran

Assignment 4 (Sol.) Introduction to Data Analytics Prof. Nandan Sudarsanam & Prof. B. Ravindran Assignment 4 (Sol.) Introduction to Data Analytics Prof. andan Sudarsanam & Prof. B. Ravindran 1. Which among the following techniques can be used to aid decision making when those decisions depend upon

More information

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data Spatial Patterns We will examine methods that are used to analyze patterns in two sorts of spatial data: Point Pattern Analysis - These methods concern themselves with the location information associated

More information

MAC 1105 Fall Term 2018

MAC 1105 Fall Term 2018 MAC 1105 Fall Term 2018 Each objective covered in MAC 1105 is listed below. Along with each objective is the homework list used with MyMathLab (MML) and a list to use with the text (if you want to use

More information

Week 5: Multiple Linear Regression II

Week 5: Multiple Linear Regression II Week 5: Multiple Linear Regression II Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Adjusted R

More information

CREATING THE ANALYSIS

CREATING THE ANALYSIS Chapter 14 Multiple Regression Chapter Table of Contents CREATING THE ANALYSIS...214 ModelInformation...217 SummaryofFit...217 AnalysisofVariance...217 TypeIIITests...218 ParameterEstimates...218 Residuals-by-PredictedPlot...219

More information

Statistics Lab #7 ANOVA Part 2 & ANCOVA

Statistics Lab #7 ANOVA Part 2 & ANCOVA Statistics Lab #7 ANOVA Part 2 & ANCOVA PSYCH 710 7 Initialize R Initialize R by entering the following commands at the prompt. You must type the commands exactly as shown. options(contrasts=c("contr.sum","contr.poly")

More information

One Factor Experiments

One Factor Experiments One Factor Experiments 20-1 Overview Computation of Effects Estimating Experimental Errors Allocation of Variation ANOVA Table and F-Test Visual Diagnostic Tests Confidence Intervals For Effects Unequal

More information

Stat 8053, Fall 2013: Additive Models

Stat 8053, Fall 2013: Additive Models Stat 853, Fall 213: Additive Models We will only use the package mgcv for fitting additive and later generalized additive models. The best reference is S. N. Wood (26), Generalized Additive Models, An

More information

Economics Nonparametric Econometrics

Economics Nonparametric Econometrics Economics 217 - Nonparametric Econometrics Topics covered in this lecture Introduction to the nonparametric model The role of bandwidth Choice of smoothing function R commands for nonparametric models

More information

R-Square Coeff Var Root MSE y Mean

R-Square Coeff Var Root MSE y Mean STAT:50 Applied Statistics II Exam - Practice 00 possible points. Consider a -factor study where each of the factors has 3 levels. The factors are Diet (,,3) and Drug (A,B,C) and there are n = 3 observations

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 327 Geometric Regression Introduction Geometric regression is a special case of negative binomial regression in which the dispersion parameter is set to one. It is similar to regular multiple regression

More information

( ) = Y ˆ. Calibration Definition A model is calibrated if its predictions are right on average: ave(response Predicted value) = Predicted value.

( ) = Y ˆ. Calibration Definition A model is calibrated if its predictions are right on average: ave(response Predicted value) = Predicted value. Calibration OVERVIEW... 2 INTRODUCTION... 2 CALIBRATION... 3 ANOTHER REASON FOR CALIBRATION... 4 CHECKING THE CALIBRATION OF A REGRESSION... 5 CALIBRATION IN SIMPLE REGRESSION (DISPLAY.JMP)... 5 TESTING

More information

An introduction to SPSS

An introduction to SPSS An introduction to SPSS To open the SPSS software using U of Iowa Virtual Desktop... Go to https://virtualdesktop.uiowa.edu and choose SPSS 24. Contents NOTE: Save data files in a drive that is accessible

More information

College Prep Algebra II Summer Packet

College Prep Algebra II Summer Packet Name: College Prep Algebra II Summer Packet This packet is an optional review which is highly recommended before entering CP Algebra II. It provides practice for necessary Algebra I topics. Remember: When

More information

Integrated Math I. IM1.1.3 Understand and use the distributive, associative, and commutative properties.

Integrated Math I. IM1.1.3 Understand and use the distributive, associative, and commutative properties. Standard 1: Number Sense and Computation Students simplify and compare expressions. They use rational exponents and simplify square roots. IM1.1.1 Compare real number expressions. IM1.1.2 Simplify square

More information

3. Data Analysis and Statistics

3. Data Analysis and Statistics 3. Data Analysis and Statistics 3.1 Visual Analysis of Data 3.2.1 Basic Statistics Examples 3.2.2 Basic Statistical Theory 3.3 Normal Distributions 3.4 Bivariate Data 3.1 Visual Analysis of Data Visual

More information

Introduction to Hierarchical Linear Model. Hsueh-Sheng Wu CFDR Workshop Series January 30, 2017

Introduction to Hierarchical Linear Model. Hsueh-Sheng Wu CFDR Workshop Series January 30, 2017 Introduction to Hierarchical Linear Model Hsueh-Sheng Wu CFDR Workshop Series January 30, 2017 1 Outline What is Hierarchical Linear Model? Why do nested data create analytic problems? Graphic presentation

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 13: The bootstrap (v3) Ramesh Johari ramesh.johari@stanford.edu 1 / 30 Resampling 2 / 30 Sampling distribution of a statistic For this lecture: There is a population model

More information

CHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY

CHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY 23 CHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY 3.1 DESIGN OF EXPERIMENTS Design of experiments is a systematic approach for investigation of a system or process. A series

More information

Assignment 2. with (a) (10 pts) naive Gauss elimination, (b) (10 pts) Gauss with partial pivoting

Assignment 2. with (a) (10 pts) naive Gauss elimination, (b) (10 pts) Gauss with partial pivoting Assignment (Be sure to observe the rules about handing in homework). Solve: with (a) ( pts) naive Gauss elimination, (b) ( pts) Gauss with partial pivoting *You need to show all of the steps manually.

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 2015 MODULE 4 : Modelling experimental data Time allowed: Three hours Candidates should answer FIVE questions. All questions carry equal

More information

Poisson Regression and Model Checking

Poisson Regression and Model Checking Poisson Regression and Model Checking Readings GH Chapter 6-8 September 27, 2017 HIV & Risk Behaviour Study The variables couples and women_alone code the intervention: control - no counselling (both 0)

More information

For our example, we will look at the following factors and factor levels.

For our example, we will look at the following factors and factor levels. In order to review the calculations that are used to generate the Analysis of Variance, we will use the statapult example. By adjusting various settings on the statapult, you are able to throw the ball

More information

SAS/STAT 13.1 User s Guide. The NESTED Procedure

SAS/STAT 13.1 User s Guide. The NESTED Procedure SAS/STAT 13.1 User s Guide The NESTED Procedure This document is an individual chapter from SAS/STAT 13.1 User s Guide. The correct bibliographic citation for the complete manual is as follows: SAS Institute

More information

Doing Math: Scientific Notation Scientific notation is a way to show (and do math with) very small or very large numbers b 10 c

Doing Math: Scientific Notation Scientific notation is a way to show (and do math with) very small or very large numbers b 10 c Doing Math: Scientific Notation Scientific notation is a way to show (and do math with) very small or very large numbers b 10 c b = a decimal number between 1.0 and 9.99... c =anexponentof10 Example: 1.53

More information

Psychology 282 Lecture #21 Outline Categorical IVs in MLR: Effects Coding and Contrast Coding

Psychology 282 Lecture #21 Outline Categorical IVs in MLR: Effects Coding and Contrast Coding Psychology 282 Lecture #21 Outline Categorical IVs in MLR: Effects Coding and Contrast Coding In the previous lecture we learned how to incorporate a categorical research factor into a MLR model by using

More information

Section 1.5. Finding Linear Equations

Section 1.5. Finding Linear Equations Section 1.5 Finding Linear Equations Using Slope and a Point to Find an Equation of a Line Example Find an equation of a line that has slope m = 3 and contains the point (2, 5). Solution Substitute m =

More information

Descriptive Statistics, Standard Deviation and Standard Error

Descriptive Statistics, Standard Deviation and Standard Error AP Biology Calculations: Descriptive Statistics, Standard Deviation and Standard Error SBI4UP The Scientific Method & Experimental Design Scientific method is used to explore observations and answer questions.

More information

Applied Statistics and Econometrics Lecture 6

Applied Statistics and Econometrics Lecture 6 Applied Statistics and Econometrics Lecture 6 Giuseppe Ragusa Luiss University gragusa@luiss.it http://gragusa.org/ March 6, 2017 Luiss University Empirical application. Data Italian Labour Force Survey,

More information

Simulating Multivariate Normal Data

Simulating Multivariate Normal Data Simulating Multivariate Normal Data You have a population correlation matrix and wish to simulate a set of data randomly sampled from a population with that structure. I shall present here code and examples

More information

Compare Linear Regression Lines for the HP-41C

Compare Linear Regression Lines for the HP-41C Compare Linear Regression Lines for the HP-41C by Namir Shammas This article presents an HP-41C program that calculates the linear regression statistics for two data sets and then compares their slopes

More information

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)

More information

Spline Models. Introduction to CS and NCS. Regression splines. Smoothing splines

Spline Models. Introduction to CS and NCS. Regression splines. Smoothing splines Spline Models Introduction to CS and NCS Regression splines Smoothing splines 3 Cubic Splines a knots: a< 1 < 2 < < m

More information