5.5 Regression Estimation


Assume an SRS of n pairs (x_1, y_1), ..., (x_n, y_n) is selected from a population of N pairs of (x, y) data. The goal of regression estimation is to take advantage of a linear relationship between x and y to improve estimation of the total t_y or the mean \bar{y}_U. Unlike ratio estimation, there is no assumption of a zero intercept or a positive slope in the linear relationship.

We assume a linear form for the relationship between y and x:

    y_i = B_0 + B_1 x_i + \epsilon_i                                        (53)

for intercept B_0, slope B_1, and deviation \epsilon_i between y_i and B_0 + B_1 x_i. We assume that (i) the mean of the \epsilon_i's is zero and (ii) the \epsilon_i's are uncorrelated with the x_i's. This implies that there is no systematic relationship between the \epsilon_i's and the x_i's. For example, we do not want the variance to increase (or decrease) with the mean.

5.5.1 Estimating \bar{y}_U and t_y

To estimate \bar{y}_U, we first must get estimates \hat{B}_0 and \hat{B}_1 of the true intercept B_0 and slope B_1. We use the least squares estimates

    \hat{B}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}
              = \frac{n \sum_{i=1}^n x_i y_i - (\sum_{i=1}^n x_i)(\sum_{i=1}^n y_i)}{n \sum_{i=1}^n x_i^2 - (\sum_{i=1}^n x_i)^2}
    \qquad
    \hat{B}_0 = \bar{y} - \hat{B}_1 \bar{x}

When \bar{x}_U is known, our estimate \hat{\bar{y}}_{reg} of the population mean is

    \hat{\bar{y}}_{reg} = \hat{B}_0 + \hat{B}_1 \bar{x}_U                    (54)

When \bar{x}_U is unknown, replace \bar{x}_U with \bar{x} in (54). Then the estimate is \hat{\bar{y}}_{reg} = \bar{y}.

The estimated variance of \hat{\bar{y}}_{reg} in (54) is

    \hat{V}(\hat{\bar{y}}_{reg}) = \frac{N - n}{N n (n - 2)} \sum_{i=1}^n (y_i - \hat{B}_0 - \hat{B}_1 x_i)^2        (55)

If N is unknown, but N is large relative to n (or, n/N is small), then the f.p.c. (N - n)/N \approx 1. Some researchers will replace (N - n)/N with 1 in the variance formula in (55):

    \hat{V}(\hat{\bar{y}}_{reg}) \approx \frac{1}{n(n - 2)} \sum_{i=1}^n (y_i - \hat{B}_0 - \hat{B}_1 x_i)^2        (56)

An alternative formula for calculation is

    \sum_{i=1}^n (y_i - \hat{B}_0 - \hat{B}_1 x_i)^2
      = \Big( \sum_{i=1}^n y_i^2 - n \bar{y}^2 \Big) - \hat{B}_1^2 \Big( \sum_{i=1}^n x_i^2 - n \bar{x}^2 \Big)     (57)

which can be substituted into (55) to get the estimated variance

    \hat{V}(\hat{\bar{y}}_{reg}) = \frac{N - n}{N n (n - 2)} \Big[ \Big( \sum_{i=1}^n y_i^2 - n \bar{y}^2 \Big) - \hat{B}_1^2 \Big( \sum_{i=1}^n x_i^2 - n \bar{x}^2 \Big) \Big]    (58)
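To make the formulas concrete, here is a small R sketch that computes the least squares estimates and the regression estimator (54) with its estimated variance (55). The data, N, and \bar{x}_U below are toy values invented for illustration, not from the examples in these notes.

```r
# Regression estimation of the population mean ybar_U from an SRS.
# Toy data: N and the population mean xbarU are assumed known.
N <- 1000
xbarU <- 10.5
x <- c(8, 9, 10, 11, 12, 9, 10, 11)
y <- c(20, 23, 25, 28, 30, 22, 26, 27)
n <- length(x)

B1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope estimate
B0 <- mean(y) - B1 * mean(x)                                     # intercept estimate

ybar_reg <- B0 + B1 * xbarU                                      # equation (54)

SSE <- sum((y - B0 - B1 * x)^2)                  # sum of squared residuals
v_ybar_reg <- (N - n) / (N * n * (n - 2)) * SSE  # equation (55)

c(estimate = ybar_reg, est.variance = v_ybar_reg)
```

If \bar{x}_U here equaled the sample mean \bar{x}, the estimator would reduce to \bar{y}, as noted above.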

An approximate 100(1 - \alpha)% confidence interval for \bar{y}_U is

    \hat{\bar{y}}_{reg} \pm t^* \sqrt{\hat{V}(\hat{\bar{y}}_{reg})}

where t^* is the upper \alpha/2 critical value from a t-distribution having n - 2 degrees of freedom.

By multiplying \hat{\bar{y}}_{reg} in (54) by N, an estimator \hat{t}_{reg} of the population total t_y is:

    \hat{t}_{reg} = N \hat{\bar{y}}_{reg} = N(\hat{B}_0 + \hat{B}_1 \bar{x}_U) = N(\bar{y} - \hat{B}_1 \bar{x} + \hat{B}_1 \bar{x}_U)      (59)
                 = N\bar{y} + \hat{B}_1 (N\bar{x}_U - N\bar{x}) = N\bar{y} + \hat{B}_1 (t_x - N\bar{x})

Multiplying \hat{V}(\hat{\bar{y}}_{reg}) in (55) by N^2 provides the estimated variance of \hat{t}_{reg}:

    \hat{V}(\hat{t}_{reg}) = \frac{N(N - n)}{n(n - 2)} \sum_{i=1}^n (y_i - \hat{B}_0 - \hat{B}_1 x_i)^2               (60)

An approximate 100(1 - \alpha)% confidence interval for t_y is

    \hat{t}_{reg} \pm t^* \sqrt{\hat{V}(\hat{t}_{reg})}

where t^* is the upper \alpha/2 critical value from a t-distribution having n - 2 degrees of freedom.

Note that \sum_{i=1}^n (y_i - \hat{B}_0 - \hat{B}_1 x_i)^2 is the sum of squared residuals from a simple linear regression. That is, \sum_{i=1}^n (y_i - \hat{B}_0 - \hat{B}_1 x_i)^2 = SSE for the least squares regression line. Therefore, we could fit a regression model using a statistics package and substitute the value of SSE, or MSE = SSE/(n - 2), into (55) and (60) to get

    \hat{V}(\hat{\bar{y}}_{reg}) = \frac{N - n}{Nn}\, MSE  \qquad  \hat{V}(\hat{t}_{reg}) = \frac{N(N - n)}{n}\, MSE    (61)

Example: The Florida Game and Freshwater Fish Commission is interested in estimating the weights of alligators using length measurements, which are easier to observe. The population size N is unknown but is large enough that ignoring the finite population correction will have a negligible effect on estimation. That is, (N - n)/N \approx 1 (see equation (56)). A random sample of n = 22 alligators yielded the following weight (in pounds) and length (in inches) data:

Alligator  Length  Weight     Alligator  Length  Weight
    1        94      130          12        86      83
    2        74       51          13        88      70
    3        82       80          14        72      61
    4        58       28          15        74      54
    5        86       80          16        61      44
    6        94      110          17        90     106
    7        63       33          18        89      84
    8        86       90          19        68      39
    9        69       36          20        76      42
   10        72       38          21        78      57
   11        85       84          22        90     102
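The total estimator (59)-(60) and its t-based confidence interval can be sketched in a few lines of R. The data, N, and \bar{x}_U below are toy values for illustration, not the alligator example.

```r
# Estimating the population total t_y and an approximate 95% CI.
# Toy data; N and the population mean xbarU are assumed known.
N <- 1000
xbarU <- 10.5
x <- c(8, 9, 10, 11, 12, 9, 10, 11)
y <- c(20, 23, 25, 28, 30, 22, 26, 27)
n <- length(x)

B1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
B0 <- mean(y) - B1 * mean(x)
SSE <- sum((y - B0 - B1 * x)^2)               # equals SSE from lm(y ~ x)

t_reg  <- N * (B0 + B1 * xbarU)               # equation (59)
v_treg <- N * (N - n) / (n * (n - 2)) * SSE   # equation (60)
tstar  <- qt(0.975, df = n - 2)               # upper .025 critical value

t_reg + c(-1, 1) * tstar * sqrt(v_treg)       # approximate 95% CI for t_y
```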

Summary values for n = 22:

    \bar{x} = 78.86    \sum x_i = 1735    \sum x_i^2 = 139293
    \bar{y} = 68.27    \sum y_i = 1502    \sum y_i^2 = 119762    \sum x_i y_i = 124433

Because it is much easier to collect data on alligator length, there is a lot of available data on length. Assume that the available data indicate that the mean alligator length is \bar{x}_U = 90 inches. Estimate the mean alligator weight \bar{y}_U using the regression estimator \hat{\bar{y}}_{reg}.

Regression output from MINITAB for the regression of alligator weight vs length (n = 22):

The regression equation is
weight = - 123 + 2.43 length

S = 11.6352   R-Sq = 84.3 %   R-Sq(adj) = 83.5 %

Analysis of Variance

Source       DF       SS       MS        F      P
Regression    1    14509    14509   107.17  0.000
Error        20     2708      135
Total        21    17216

Suppose the sample size was actually n = 25 and included the following 3 measurements. Estimate the mean alligator weight \bar{y}_U using the regression estimator \hat{\bar{y}}_{reg}.

Alligator  Length  Weight
   23        147     640
   24        128     366
   25        114     197

Summary values for n = 25:

    \bar{x} = 84.96    \sum x_i = 2124    \sum x_i^2 = 190282
    \bar{y} = 108.2    \sum y_i = 2705    \sum y_i^2 = 702127    \sum x_i y_i = 287819

Do you have any concerns about using a regression estimator for this data? If so, how can we adjust our analysis to account for it?

Regression output from MINITAB for the regression of alligator weight vs length (n = 25):

The regression equation is
weight = - 393 + 5.90 length

S = 54.01   R-Sq = 83.6 %   R-Sq(adj) = 82.9 %

Analysis of Variance

Source       DF        SS        MS        F      P
Regression    1    342350    342350   117.36  0.000
Error        23     67096      2917
Total        24    409446

5.5.2 Extension to Multiple Regression

We can generalize the simple linear regression model to a multiple linear regression model with

Case 1: k different regression variables x_1, x_2, ..., x_k with model

    y = B_0 + B_1 x_1 + B_2 x_2 + ... + B_k x_k + \epsilon                    (62)

Case 2: a k-th order polynomial in regression variable x with model

    y = B_0 + B_1 x + B_2 x^2 + ... + B_k x^k + \epsilon                      (63)

For Case 1:

1. Find the least-squares estimates (\hat{B}_0, \hat{B}_1, ..., \hat{B}_k). This will produce the prediction model:

    \hat{y} = \hat{B}_0 + \hat{B}_1 x_1 + ... + \hat{B}_k x_k                 (64)

2. To estimate \bar{y}_U, replace each variable with its population mean:

    \hat{\bar{y}}_{reg} = \hat{B}_0 + \hat{B}_1 \bar{x}_{1U} + ... + \hat{B}_k \bar{x}_{kU}    (65)

where \bar{x}_{iU} is the population mean of x_i.

For Case 2:

1. Find the least-squares estimates (\hat{B}_0, \hat{B}_1, ..., \hat{B}_k). This will produce the prediction model:

    \hat{y} = \hat{B}_0 + \hat{B}_1 x + \hat{B}_2 x^2 + ... + \hat{B}_k x^k   (66)

2. To estimate \bar{y}_U, replace x with its mean:

    \hat{\bar{y}}_{reg} = \hat{B}_0 + \hat{B}_1 \bar{x}_U + \hat{B}_2 \bar{x}_U^2 + ... + \hat{B}_k \bar{x}_U^k    (67)

For Case 1 and for Case 2, we use the SSE from the regression output to calculate the estimated variance of \hat{\bar{y}}_{reg}:

    \hat{V}(\hat{\bar{y}}_{reg}) = \frac{N - n}{N n (n - k - 1)}\, SSE = \frac{N - n}{Nn}\, MSE    (68)

where MSE = SSE/(n - k - 1) is the mean squared error from the regression, having n - k - 1 degrees of freedom for the Error term.
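As a sketch of Case 2, the quadratic fit and formula (68) can be carried out with lm(). The data, N, and \bar{x}_U below are toy values for illustration, not the alligator example.

```r
# Regression estimation of ybar_U with a quadratic (Case 2) model via lm().
# Toy data; N and the population mean xbarU are assumed known.
N <- 500
xbarU <- 7
x <- c(2, 4, 5, 6, 8, 9, 11, 3, 7, 10)
y <- c(5, 12, 18, 24, 40, 50, 75, 8, 31, 62)
n <- length(x)
k <- 2                                    # two regression terms: x and x^2

fit <- lm(y ~ x + I(x^2))                 # least squares fit
ybar_reg <- unname(predict(fit, newdata = data.frame(x = xbarU)))  # equation (67)

SSE <- sum(residuals(fit)^2)
MSE <- SSE / (n - k - 1)                  # error df = n - k - 1
v_ybar_reg <- (N - n) / (N * n) * MSE     # equation (68)

c(estimate = ybar_reg, est.variance = v_ybar_reg)
```

A confidence interval then uses a t critical value with n - k - 1 degrees of freedom, exactly as in the simple linear case but with the reduced error df.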

Example: Use the data from the Florida Game and Freshwater Fish Commission example.

- Fit the quadratic regression model y = B_0 + B_1 x + B_2 x^2 + \epsilon.
- Estimate the mean alligator weight \bar{y}_U using the multiple regression estimator \hat{\bar{y}}_{reg}.
- Find a 95% confidence interval for \bar{y}_U.

Regression output from MINITAB for the quadratic regression of alligator weight vs length (n = 25):

The regression equation is
weight = 410 - 11.3 length + 0.0866 length**2

S = 15.48   R-Sq = 98.7 %   R-Sq(adj) = 98.6 %

Analysis of Variance

Source       DF        SS        MS        F      P
Regression    2    404173    202087    843.2  0.000
Error        22      5273       240
Total        24    409446

Source       DF    Seq SS        P
Linear        1    342350    0.000
Quadratic     1     61823    0.000

5.6 Regression Analyses using R and SAS

5.6.1 Example with unknown N

R code for Regression Estimation (N unknown but large)

CASE 1: We will fit the model gatorwgt = B_0 + B_1 gatorlen using the original 22 pairs of alligator length (x) and weight (y) data. We will also estimate the mean weight \bar{y}_U using regression estimation (assuming \bar{x}_U = 90 inches). The df = n - 2 = 20 because the model has 2 parameters.

Unfortunately, the svyglm function in R uses Nn(n - 1) in the estimated variance formula (see equation (55)) instead of Nn(n - 2) for linear regression. If the regression model has p parameters, we want Nn(n - p) in the variance formula. Thus, if we multiply the standard error given in R by sqrt((n - 1)/(n - p)) = sqrt((n - 1)/df), where df = n - p is the residual degrees of freedom, we get the same analysis as SAS.

R code for Regression Estimation: Case 1

library(survey)   # Regression estimation
x <- c(94,74,82,58,86,94,63,86,69,72,85,86,88,72,74,61,90,89,68,76,78,90)
y <- c(130,51,80,28,80,110,33,90,36,38,84,83,70,61,54,44,106,84,39,42,57,102)
ci.level = .95   # confidence level                    <---
p = 2            # number of parameters in the model   <---
n = length(x)
df = n - p
corr.fctr = sqrt((n-1)/df)
regdata <- data.frame(x,y)                             <---
regdsgn <- svydesign(id=~1,data=regdata)
regdsgn
svyreg <- svyglm(y~x,design=regdsgn)                   <-- enter regression model
svyreg
atmean <- c(1,90)                                      <-- enter x mean = 90
# Assign names to each term in the model
names(atmean) <- c("(Intercept)","x")
atmean <- as.data.frame(t(atmean))                     <-- two model terms
meanwgt <- predict(svyreg,newdata=atmean,total=1)
meanwgt
# Enter standard error
se.meanwgt =                                           <-- value output by meanwgt
# Compute confidence interval for mean with correction factor
meanwgt + c(-1,1)*corr.fctr*qt(ci.level+(1-ci.level)/2,df)*se.meanwgt

R output for Regression Estimation: Case 1

Independent Sampling design (with replacement)
svydesign(id = ~1, data = regdata)

Coefficients:
(Intercept)            x
   -123.073        2.426

Degrees of Freedom: 21 Total (i.e. Null); 20 Residual
Null Deviance: 17220
Residual Deviance: 2708    AIC:

> meanwgt
    link  SE
1 95.293

> # Enter standard error
> se.meanwgt =
>
> # Compute confidence interval for mean with correction factor
[1]

CASE 2: We will fit the model gatorwgt = B_0 + B_1 gatorlen using the 25 pairs of alligator length (x) and weight (y) data. We will again estimate the mean weight \bar{y}_U using regression estimation (assuming \bar{x}_U = 90 inches).

R code for Regression Estimation: Case 2

library(survey)   # Regression estimation
x <- c(94,74,82,58,86,94,63,86,69,72,85,86,88,72,74,61,90,89,68,76,78,90,
       147,128,114)
y <- c(130,51,80,28,80,110,33,90,36,38,84,83,70,61,54,44,106,84,39,42,57,102,
       640,366,197)
ci.level = .95   # confidence level                    <--
p = 2            # number of parameters in the model   <--
n = length(x)
df = n - p
corr.fctr = sqrt((n-1)/df)
regdata <- data.frame(x,y)                             <--
regdsgn <- svydesign(id=~1,data=regdata)
regdsgn
svyreg <- svyglm(y~x,design=regdsgn)                   <-- enter regression model
svyreg
atmean <- c(1,90)                                      <-- enter x mean = 90

# Assign names to each term in the model
names(atmean) <- c("(Intercept)","x")
atmean <- as.data.frame(t(atmean))                     <-- two model terms
meanwgt <- predict(svyreg,newdata=atmean,total=1)
meanwgt
# Enter standard error
se.meanwgt =                                           <-- value output by meanwgt
# Compute confidence interval for mean with correction factor
meanwgt + c(-1,1)*corr.fctr*qt(ci.level+(1-ci.level)/2,df)*se.meanwgt

R output for Regression Estimation: Case 2

Independent Sampling design (with replacement)

Coefficients:
(Intercept)            x
   -393.264        5.902

Degrees of Freedom: 24 Total (i.e. Null); 23 Residual
Null Deviance: 409400
Residual Deviance: 67100    AIC:

> meanwgt
    link  SE
1 137.95

> # Enter standard error
> se.meanwgt =
> # Compute confidence interval for mean with correction factor
[1]

CASE 3: We will fit the model gatorwgt = B_0 + B_1 gatorlen + B_2 gatorlen^2 using the original 22 pairs of alligator length (x) and weight (y) data. We will estimate the mean weight \bar{y}_U using quadratic model regression estimation (assuming \bar{x}_U = 90 inches). The df = n - 3 = 19 because the model has 3 parameters.

R code for regression estimation: Case 3

library(survey)   # Regression estimation
x <- c(94,74,82,58,86,94,63,86,69,72,85,86,88,72,74,61,90,89,68,76,78,90)
y <- c(130,51,80,28,80,110,33,90,36,38,84,83,70,61,54,44,106,84,39,42,57,102)

xsq <- x^2                                             <-- create x-squared values for quadratic model
ci.level = .95   # confidence level                    <--
p = 3            # number of parameters in the model   <--
n = length(x)
df = n - p
corr.fctr = sqrt((n-1)/df)
regdata <- data.frame(x,xsq,y)                         <--
regdsgn <- svydesign(id=~1,data=regdata)
regdsgn
svyreg <- svyglm(y~x+xsq,design=regdsgn)               <--
svyreg
atmean <- c(1,90,8100)                                 <--
# Assign names to each term in the model
names(atmean) <- c("(Intercept)","x","xsq")            <--
atmean <- as.data.frame(t(atmean))
meanwgt <- predict(svyreg,newdata=atmean,total=1)
meanwgt
# Enter standard error
se.meanwgt =                                           <--
# Compute confidence interval for mean with correction factor
meanwgt + c(-1,1)*corr.fctr*qt(ci.level+(1-ci.level)/2,df)*se.meanwgt

R output for regression estimation: Case 3

Independent Sampling design (with replacement)

Coefficients:
(Intercept)            x          xsq
    269.715       -7.975       0.0675

Degrees of Freedom: 21 Total (i.e. Null); 19 Residual
Null Deviance: 17220
Residual Deviance: 1628    AIC:

> meanwgt
    link  SE
1 98.868

> # Enter standard error
> se.meanwgt =
> # Compute confidence interval for mean with correction factor
[1]

CASE 4: We will fit the model gatorwgt = B_0 + B_1 gatorlen + B_2 gatorlen^2 using the 25 pairs of alligator length (x) and weight (y) data. We will again estimate the mean weight \bar{y}_U using quadratic model regression estimation (assuming \bar{x}_U = 90 inches).

R code for regression estimation: Case 4

library(survey)   # Regression estimation
x <- c(94,74,82,58,86,94,63,86,69,72,85,86,88,72,74,61,90,89,68,76,78,90,
       147,128,114)
y <- c(130,51,80,28,80,110,33,90,36,38,84,83,70,61,54,44,106,84,39,42,57,102,
       640,366,197)
xsq <- x^2                                             <-- create x-squared values for quadratic model
ci.level = .95   # confidence level                    <--
p = 3            # number of parameters in the model   <--
n = length(x)
df = n - p
corr.fctr = sqrt((n-1)/df)
regdata <- data.frame(x,xsq,y)                         <--
regdsgn <- svydesign(id=~1,data=regdata)
regdsgn
svyreg <- svyglm(y~x+xsq,design=regdsgn)               <--
svyreg
atmean <- c(1,90,8100)                                 <--
# Assign names to each term in the model
names(atmean) <- c("(Intercept)","x","xsq")            <--
atmean <- as.data.frame(t(atmean))
meanwgt <- predict(svyreg,newdata=atmean,total=1)
meanwgt
# Enter standard error
se.meanwgt =                                           <--
# Compute confidence interval for mean with correction factor
meanwgt + c(-1,1)*corr.fctr*qt(ci.level+(1-ci.level)/2,df)*se.meanwgt

R output for regression estimation: Case 4

Independent Sampling design (with replacement)
svydesign(id = ~1, data = regdata)

Coefficients:
(Intercept)            x          xsq
    410.480      -11.318       0.0866

Degrees of Freedom: 24 Total (i.e. Null); 22 Residual
Null Deviance: 409400
Residual Deviance: 5273    AIC:

> meanwgt
    link  SE
1 93.489

> # Enter standard error
> se.meanwgt =
> # Compute confidence interval for mean with correction factor
[1]

SAS code for Regression Estimation (N unknown but large)

We will reproduce the R analyses for Case 1 (linear regression with n = 22 points) and Case 4 (quadratic regression with n = 25 points). For Case 2 and Case 3, you just have to change the data set.

CASE 1: We will fit the model gatorwgt = B_0 + B_1 gatorlen using the original 22 pairs of alligator length (x) and weight (y) data. We will also estimate the mean weight \bar{y}_U using regression estimation (assuming \bar{x}_U = 90 inches). The df = n - 2 = 20 because the model has 2 parameters.

data in;
  input gatorid gatorlen gatorwgt;
  cards;
 1  94 130
 2  74  51
 3  82  80
 4  58  28
 5  86  80
 6  94 110
 7  63  33
 8  86  90
 9  69  36
10  72  38
11  85  84
12  86  83
13  88  70
14  72  61
15  74  54
16  61  44
17  90 106
18  89  84
19  68  39
20  76  42
21  78  57
22  90 102
;
/* Use proc surveyreg to estimate the average alligator weight */
proc surveyreg data=in;
  model gatorwgt = gatorlen / clparm solution df=20;
  /* To estimate a mean: substitute 1 for intercept, mean(x) for gatorlen */
  estimate 'Average gator weight' intercept 1 gatorlen 90;
run;

SAS output for Case 1

The SURVEYREG Procedure
Regression Analysis for Dependent Variable gatorwgt

Data Summary
Number of Observations        22
Mean of gatorwgt        68.27273
Sum of gatorwgt           1502.0

Fit Statistics
R-square          0.8427
Root MSE         11.6352
Denominator DF        20

Tests of Model Effects
Effect       Num DF    F Value    Pr > F
Model             1               <.0001
Intercept         1               <.0001
gatorlen          1               <.0001

NOTE: The denominator degrees of freedom for the F tests is 20.

Estimated Regression Coefficients
                            Standard
Parameter      Estimate        Error    t Value    Pr > |t|    95% Confidence Interval
Intercept     -123.0725                            <.0001
gatorlen         2.4263                            <.0001

Analysis of Estimable Functions
                                       Standard
Parameter                Estimate         Error    t Value    Pr > |t|
Average gator weight      95.2926                             <.0001

Analysis of Estimable Functions
Parameter                95% Confidence Interval
Average gator weight

NOTE: The denominator degrees of freedom for the t tests is 20.

CASE 4: We will fit the model gatorwgt = B_0 + B_1 gatorlen + B_2 gatorlen^2 using the 25 pairs of alligator length (x) and weight (y) data. We will again estimate the mean weight \bar{y}_U using quadratic model regression estimation (assuming \bar{x}_U = 90 inches). This is the data for Case 4. Change df=19 to df=22 in the SAS code.

data in;
  input gatorid gatorlen gatorwgt;
  gatorsq = gatorlen**2;
  cards;
 1  94 130
 2  74  51
 3  82  80
 4  58  28
 5  86  80
 6  94 110
 7  63  33
 8  86  90
 9  69  36
10  72  38
11  85  84
12  86  83
13  88  70
14  72  61
15  74  54
16  61  44
17  90 106
18  89  84
19  68  39
20  76  42
21  78  57
22  90 102
23 147 640
24 128 366
25 114 197
;

SAS output for Case 4

The SURVEYREG Procedure
Regression Analysis for Dependent Variable gatorwgt

Data Summary
Number of Observations        25
Mean of gatorwgt       108.20000
Sum of gatorwgt           2705.0

Fit Statistics
R-square          0.9871
Root MSE         15.4816
Denominator DF        22

Tests of Model Effects
Effect       Num DF    F Value    Pr > F
Model             1               <.0001
Intercept         1               <.0001
gatorlen          1       8.42    <.0001
gatorsq           1               <.0001

NOTE: The denominator degrees of freedom for the F tests is 22.

Estimated Regression Coefficients
                            Standard
Parameter      Estimate        Error    t Value    Pr > |t|    95% Confidence Interval
Intercept      410.4805                            <.0001
gatorlen       -11.3175                            <.0001
gatorsq          0.0866                            <.0001

NOTE: The denominator degrees of freedom for the t tests is 22.

Analysis of Estimable Functions
                                       Standard
Parameter                Estimate         Error    t Value    Pr > |t|
Average gator weight      93.4890                     8.05    <.0001

Analysis of Estimable Functions
Parameter                95% Confidence Interval
Average gator weight

NOTE: The denominator degrees of freedom for the t tests is 22.

5.6.2 Example with known N and t_x

The data is from Lohr Example 4.9 (pages 139-141): To estimate the number of dead trees in a study area, we divide the study area into 100 square plots and count the number of dead trees in a photograph of each plot. Photo counts can be made quickly, but sometimes a tree is misclassified or not detected. An SRS of 25 of the plots is taken for field counts of dead trees (ground truthing). The population mean number of dead trees per plot from the photo counts is \bar{x}_U = 11.3. The data are given in the R code below. Estimate the actual mean number of dead trees per plot and the total number of dead trees in the study area.

R code for regression estimation, t_x and N known

library(survey)   # Regression estimation with known tx and N (Lohr Example 4.9)
x <- c(10,12,7,13,13,6,17,16,15,10,14,12,10,5,12,10,10,9,6,11,7,9,11,10,10)
y <- c(15,14,9,14,8,5,18,15,13,15,11,15,12,8,13,9,11,12,9,12,13,11,10,9,8)
ci.level = .95   # confidence level                    <---
p = 2            # number of parameters in the model   <---
n = length(x)
df = n - p
corr.fctr = sqrt((n-1)/df)
fpc <- c(rep(100,n))                                   <-- 100 = N
regdata <- data.frame(x,y)
regdsgn <- svydesign(id=~1,fpc=~fpc,data=regdata)      <-- include fpc
regdsgn
svyreg <- svyglm(y~x,design=regdsgn)
svyreg
atmean <- c(1,11.3)                                    <-- evaluate at x-mean
# Assign names to each term in the model
names(atmean) <- c("(Intercept)","x")
atmean <- as.data.frame(t(atmean))
meanwgt <- predict(svyreg,newdata=atmean,total=1)
meanwgt
# Enter standard error
se.meanwgt = .418                                      <-- input s.e. from R
# Compute confidence interval for mean with correction factor
meanwgt + c(-1,1)*corr.fctr*qt(ci.level+(1-ci.level)/2,df)*se.meanwgt
# Compute confidence interval for a total with correction factor
ttl = 100   # ttl is the total number of units
sumwgt <- svycontrast(meanwgt,ttl)
sumwgt + c(-1,1)*corr.fctr*qt(ci.level+(1-ci.level)/2,df)*se.meanwgt*ttl

R output for regression estimation, t_x and N known

Coefficients:
(Intercept)            x
     5.0593       0.6133

Degrees of Freedom: 24 Total (i.e. Null); 23 Residual
Null Deviance: 218.2
Residual Deviance: 133.2    AIC:

> meanwgt
    link  SE
1 11.989

> # Enter standard error
> se.meanwgt = .418
> # Compute confidence interval for mean with correction factor
[1]
> # Compute confidence interval for a total with correction factor
[1]

SAS code for regression estimation, t_x and N known (supplemental)

DATA trees;
  INPUT photo field;
  N = 100;
  DATALINES;
10 15
12 14
 7  9
13 14
13  8
 6  5
17 18
16 15
15 13
10 15
14 11
12 15
10 12
 5  8
12 13
10  9
10 11
 9 12
 6  9
11 12
 7 13
 9 11
11 10
10  9
10  8
;
/* Use proc surveyreg to estimate the total number of trees */
PROC SURVEYREG DATA=trees TOTAL=100;
  MODEL field = photo / clparm solution df=23;
  ESTIMATE 'Total field trees' intercept 100 photo 1130;
    /* substitute N for intercept, t_x for photo */
  ESTIMATE 'Mean field trees' intercept 100 photo 1130 / divisor=100;
run;

SAS output for regression estimation, t_x and N known

The SURVEYREG Procedure
Regression Analysis for Dependent Variable field

Data Summary
Number of Observations        25
Sum of Weights             100.0
Weighted Mean of field  11.56000
Weighted Sum of field     1156.0

Fit Statistics
R-square          0.3896
Root MSE          2.4062
Denominator DF        23

Tests of Model Effects
Effect       Num DF    F Value    Pr > F
Model             1               <.0001
Intercept         1
photo             1               <.0001

NOTE: The denominator degrees of freedom for the F tests is 23.

Estimated Regression Coefficients
                            Standard
Parameter      Estimate        Error    t Value    Pr > |t|    95% Confidence Interval
Intercept        5.0593
photo            0.6133                            <.0001

NOTE: The denominator degrees of freedom for the t tests is 23.

Analysis of Estimable Functions
                                       Standard
Parameter                Estimate         Error    t Value    Pr > |t|
Total field trees          1198.9                             <.0001
Mean field trees           11.989                             <.0001

Analysis of Estimable Functions
Parameter                95% Confidence Interval
Total field trees
Mean field trees

NOTE: The denominator degrees of freedom for the t tests is 23.

5.7 \bar{y} vs \hat{\bar{y}}_{reg}, or \hat{t} vs \hat{t}_{reg}: Which is better, SRS or Regression Estimation?

When does regression estimation provide better estimates of \bar{y}_U or t_y than the simple random sample (SRS) estimator? Recall that for least squares regression we have SS(total) = SS(regression) + SSE. Then:

- If x and y are strongly correlated, most of the variability in the y values can be explained by the regression, and there will be very little random variability about the regression line. That is, MS(regression) is large relative to MSE.

- If x and y are weakly correlated, very little of the variability in the y values can be explained by the regression, and most of the variability is random variability about the regression line. That is, MS(regression) is small relative to MSE.

Thus, for a moderate to strong linear relationship between x and y, regression estimation is recommended over SRS estimation.

5.8 Sample Size Estimation for Ratio and Regression Estimation

To determine the sample size formulas for ratio and regression estimation, we will use the same approach that was used for determining a sample size for simple random sampling:

1. Specify a maximum allowable difference d for the parameter we want to estimate. This is equivalent to stating the largest margin of error the researcher would want for a confidence interval.

2. Specify \alpha (where 100(1 - \alpha)% is the confidence level for the confidence interval).

3. Specify a prior estimate of the variance \hat{V}(\hat{\theta}), where \hat{\theta} is the estimator of parameter \theta. \theta could be \bar{y}_U, t_y, or the population ratio B = \bar{y}_U / \bar{x}_U (in ratio estimation).

4. Set the margin of error formula equal to d, and solve for n.

Sample Size Determination for Ratio Estimation: From equations (50), (47), (48), and (49), the margin of error formulas for parameters B, \bar{y}_U, and t_y with estimated variance s_e^2 are

For B:          z_{\alpha/2} \sqrt{\hat{V}(\hat{B})} = z_{\alpha/2} \sqrt{ \frac{N - n}{N} \cdot \frac{s_e^2}{\bar{x}_U^2\, n} }

For \bar{y}_U:  z_{\alpha/2} \sqrt{\hat{V}(\hat{\bar{y}}_r)} = z_{\alpha/2} \sqrt{ \frac{N - n}{N} \cdot \frac{s_e^2}{n} }

For t_y:        z_{\alpha/2} \sqrt{\hat{V}(\hat{t}_r)} = z_{\alpha/2} \sqrt{ N(N - n) \cdot \frac{s_e^2}{n} }

After setting the margin of error equal to d, solving for n yields:

For B:          n = \frac{n_0}{1 + n_0/N}  where  n_0 = \frac{z_{\alpha/2}^2\, s_e^2}{d^2\, \bar{x}_U^2},

and d is the maximum allowable difference for estimating the ratio B. It is assumed that you know \bar{x}_U. If you do not know \bar{x}_U, you must provide an estimate of \bar{x}_U.

For \bar{y}_U:  n = \frac{n_0}{1 + n_0/N}  where  n_0 = \frac{z_{\alpha/2}^2\, s_e^2}{d^2},

and d is the maximum allowable difference for estimating the mean \bar{y}_U.

For t_y:        n = \frac{n_0}{1 + n_0/N}  where  n_0 = \frac{N^2 z_{\alpha/2}^2\, s_e^2}{d^2},

and d is the maximum allowable difference for estimating the total t_y.

Here s_e^2 is an estimate of the variability of the (x, y) points about a line having a zero intercept. You can use s_e^2 from a prior study, a pilot study, or double sampling.

Example of sample size determination: Using the information from the pulpwood and dry wood example, estimate the sample size required so that \hat{t}_y is within 200 kg of t_y with probability 0.95. Assume the new truckloads contain 1000 bundles of wood.

Sample Size Estimation for Regression Estimation: From equations (61) and (47) and the confidence interval formulas for \bar{y}_U and t_y, the margin of error formulas for parameters \bar{y}_U and t_y with estimated variance MSE are

For \bar{y}_U:  z_{\alpha/2} \sqrt{\hat{V}(\hat{\bar{y}}_{reg})} = z_{\alpha/2} \sqrt{ \frac{N - n}{Nn}\, MSE }

For t_y:        z_{\alpha/2} \sqrt{\hat{V}(\hat{t}_{reg})} = z_{\alpha/2} \sqrt{ \frac{N(N - n)}{n}\, MSE }

After setting the margin of error equal to d, solving for n yields:

For \bar{y}_U:  n = \frac{n_0}{1 + n_0/N}  where  n_0 = \frac{z_{\alpha/2}^2\, MSE}{d^2},

and d is the maximum allowable difference for estimating the mean \bar{y}_U.

For t_y:        n = \frac{n_0}{1 + n_0/N}  where  n_0 = \frac{N^2 z_{\alpha/2}^2\, MSE}{d^2},

and d is the maximum allowable difference for estimating the total t_y.

The MSE for the regression is an estimate of the variability of the (x, y) points about the regression line. You can use MSE from a prior study, a pilot study, or double sampling.
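The sample size calculation for estimating \bar{y}_U by regression estimation can be sketched in R. All planning values below (N, d, MSE) are made-up assumptions for illustration, with MSE standing in for a pilot-study estimate:

```r
# Sample size for estimating ybar_U by regression estimation:
# n = n0 / (1 + n0 / N), with n0 = z^2 * MSE / d^2.
# Planning values are illustrative assumptions, not from the text.
N   <- 1000           # population size
d   <- 2              # maximum allowable difference (margin of error)
MSE <- 135.4          # variability about the regression line (pilot study)
z   <- qnorm(0.975)   # z for a 95% confidence level

n0 <- z^2 * MSE / d^2
n  <- ceiling(n0 / (1 + n0 / N))   # round up to the next whole unit
n
```

For the total t_y, the same code applies with n0 <- N^2 * z^2 * MSE / d^2, where d is now the allowable difference on the scale of the total.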


Resources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes. Resources for statistical assistance Quantitative covariates and regression analysis Carolyn Taylor Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC January 24, 2017 Department

More information

Two-Stage Least Squares

Two-Stage Least Squares Chapter 316 Two-Stage Least Squares Introduction This procedure calculates the two-stage least squares (2SLS) estimate. This method is used fit models that include instrumental variables. 2SLS includes

More information

Instruction on JMP IN of Chapter 19

Instruction on JMP IN of Chapter 19 Instruction on JMP IN of Chapter 19 Example 19.2 (1). Download the dataset xm19-02.jmp from the website for this course and open it. (2). Go to the Analyze menu and select Fit Model. Click on "REVENUE"

More information

Lab #9: ANOVA and TUKEY tests

Lab #9: ANOVA and TUKEY tests Lab #9: ANOVA and TUKEY tests Objectives: 1. Column manipulation in SAS 2. Analysis of variance 3. Tukey test 4. Least Significant Difference test 5. Analysis of variance with PROC GLM 6. Levene test for

More information

Outline. Topic 16 - Other Remedies. Ridge Regression. Ridge Regression. Ridge Regression. Robust Regression. Regression Trees. Piecewise Linear Model

Outline. Topic 16 - Other Remedies. Ridge Regression. Ridge Regression. Ridge Regression. Robust Regression. Regression Trees. Piecewise Linear Model Topic 16 - Other Remedies Ridge Regression Robust Regression Regression Trees Outline - Fall 2013 Piecewise Linear Model Bootstrapping Topic 16 2 Ridge Regression Modification of least squares that addresses

More information

Recall the expression for the minimum significant difference (w) used in the Tukey fixed-range method for means separation:

Recall the expression for the minimum significant difference (w) used in the Tukey fixed-range method for means separation: Topic 11. Unbalanced Designs [ST&D section 9.6, page 219; chapter 18] 11.1 Definition of missing data Accidents often result in loss of data. Crops are destroyed in some plots, plants and animals die,

More information

Section 3.4: Diagnostics and Transformations. Jared S. Murray The University of Texas at Austin McCombs School of Business

Section 3.4: Diagnostics and Transformations. Jared S. Murray The University of Texas at Austin McCombs School of Business Section 3.4: Diagnostics and Transformations Jared S. Murray The University of Texas at Austin McCombs School of Business 1 Regression Model Assumptions Y i = β 0 + β 1 X i + ɛ Recall the key assumptions

More information

UNIT 1: NUMBER LINES, INTERVALS, AND SETS

UNIT 1: NUMBER LINES, INTERVALS, AND SETS ALGEBRA II CURRICULUM OUTLINE 2011-2012 OVERVIEW: 1. Numbers, Lines, Intervals and Sets 2. Algebraic Manipulation: Rational Expressions and Exponents 3. Radicals and Radical Equations 4. Function Basics

More information

Evaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München

Evaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München Evaluation Measures Sebastian Pölsterl Computer Aided Medical Procedures Technische Universität München April 28, 2015 Outline 1 Classification 1. Confusion Matrix 2. Receiver operating characteristics

More information

Cell means coding and effect coding

Cell means coding and effect coding Cell means coding and effect coding /* mathregr_3.sas */ %include 'readmath.sas'; title2 ''; /* The data step continues */ if ethnic ne 6; /* Otherwise, throw the case out */ /* Indicator dummy variables

More information

Model Selection and Inference

Model Selection and Inference Model Selection and Inference Merlise Clyde January 29, 2017 Last Class Model for brain weight as a function of body weight In the model with both response and predictor log transformed, are dinosaurs

More information

14.2 The Regression Equation

14.2 The Regression Equation 14.2 The Regression Equation Tom Lewis Fall Term 2009 Tom Lewis () 14.2 The Regression Equation Fall Term 2009 1 / 12 Outline 1 Exact and inexact linear relationships 2 Fitting lines to data 3 Formulas

More information

A. Incorrect! This would be the negative of the range. B. Correct! The range is the maximum data value minus the minimum data value.

A. Incorrect! This would be the negative of the range. B. Correct! The range is the maximum data value minus the minimum data value. AP Statistics - Problem Drill 05: Measures of Variation No. 1 of 10 1. The range is calculated as. (A) The minimum data value minus the maximum data value. (B) The maximum data value minus the minimum

More information

CH5: CORR & SIMPLE LINEAR REFRESSION =======================================

CH5: CORR & SIMPLE LINEAR REFRESSION ======================================= STAT 430 SAS Examples SAS5 ===================== ssh xyz@glue.umd.edu, tap sas913 (old sas82), sas https://www.statlab.umd.edu/sasdoc/sashtml/onldoc.htm CH5: CORR & SIMPLE LINEAR REFRESSION =======================================

More information

Stat 5100 Handout #6 SAS: Linear Regression Remedial Measures

Stat 5100 Handout #6 SAS: Linear Regression Remedial Measures Stat 5100 Handout #6 SAS: Linear Regression Remedial Measures Example: Age and plasma level for 25 healthy children in a study are reported. Of interest is how plasma level depends on age. (Text Table

More information

Unit 5: Estimating with Confidence

Unit 5: Estimating with Confidence Unit 5: Estimating with Confidence Section 8.3 The Practice of Statistics, 4 th edition For AP* STARNES, YATES, MOORE Unit 5 Estimating with Confidence 8.1 8.2 8.3 Confidence Intervals: The Basics Estimating

More information

Bivariate Linear Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017

Bivariate Linear Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 Bivariate Linear Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 4, 217 PDF file location: http://www.murraylax.org/rtutorials/regression_intro.pdf HTML file location:

More information

9.1 Random coefficients models Constructed data Consumer preference mapping of carrots... 10

9.1 Random coefficients models Constructed data Consumer preference mapping of carrots... 10 St@tmaster 02429/MIXED LINEAR MODELS PREPARED BY THE STATISTICS GROUPS AT IMM, DTU AND KU-LIFE Module 9: R 9.1 Random coefficients models...................... 1 9.1.1 Constructed data........................

More information

Getting Correct Results from PROC REG

Getting Correct Results from PROC REG Getting Correct Results from PROC REG Nate Derby Stakana Analytics Seattle, WA, USA SUCCESS 3/12/15 Nate Derby Getting Correct Results from PROC REG 1 / 29 Outline PROC REG 1 PROC REG 2 Nate Derby Getting

More information

Concept of Curve Fitting Difference with Interpolation

Concept of Curve Fitting Difference with Interpolation Curve Fitting Content Concept of Curve Fitting Difference with Interpolation Estimation of Linear Parameters by Least Squares Curve Fitting by Polynomial Least Squares Estimation of Non-linear Parameters

More information

Centering and Interactions: The Training Data

Centering and Interactions: The Training Data Centering and Interactions: The Training Data A random sample of 150 technical support workers were first given a test of their technical skill and knowledge, and then randomly assigned to one of three

More information

Normal Plot of the Effects (response is Mean free height, Alpha = 0.05)

Normal Plot of the Effects (response is Mean free height, Alpha = 0.05) Percent Normal Plot of the Effects (response is Mean free height, lpha = 5) 99 95 Effect Type Not Significant Significant 90 80 70 60 50 40 30 20 E F actor C D E Name C D E 0 5 E -0.3-0. Effect 0. 0.3

More information

Today is the last day to register for CU Succeed account AND claim your account. Tuesday is the last day to register for my class

Today is the last day to register for CU Succeed account AND claim your account. Tuesday is the last day to register for my class Today is the last day to register for CU Succeed account AND claim your account. Tuesday is the last day to register for my class Back board says your name if you are on my roster. I need parent financial

More information

1. What specialist uses information obtained from bones to help police solve crimes?

1. What specialist uses information obtained from bones to help police solve crimes? Mathematics: Modeling Our World Unit 4: PREDICTION HANDOUT VIDEO VIEWING GUIDE H4.1 1. What specialist uses information obtained from bones to help police solve crimes? 2.What are some things that can

More information

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown Z-TEST / Z-STATISTIC: used to test hypotheses about µ when the population standard deviation is known and population distribution is normal or sample size is large T-TEST / T-STATISTIC: used to test hypotheses

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Rebecca C. Steorts, Duke University STA 325, Chapter 3 ISL 1 / 49 Agenda How to extend beyond a SLR Multiple Linear Regression (MLR) Relationship Between the Response and Predictors

More information

1. Answer: x or x. Explanation Set up the two equations, then solve each equation. x. Check

1. Answer: x or x. Explanation Set up the two equations, then solve each equation. x. Check Thinkwell s Placement Test 5 Answer Key If you answered 7 or more Test 5 questions correctly, we recommend Thinkwell's Algebra. If you answered fewer than 7 Test 5 questions correctly, we recommend Thinkwell's

More information

Unit 8 SUPPLEMENT Normal, T, Chi Square, F, and Sums of Normals

Unit 8 SUPPLEMENT Normal, T, Chi Square, F, and Sums of Normals BIOSTATS 540 Fall 017 8. SUPPLEMENT Normal, T, Chi Square, F and Sums of Normals Page 1 of Unit 8 SUPPLEMENT Normal, T, Chi Square, F, and Sums of Normals Topic 1. Normal Distribution.. a. Definition..

More information

Graphs and Linear Functions

Graphs and Linear Functions Graphs and Linear Functions A -dimensional graph is a visual representation of a relationship between two variables given by an equation or an inequality. Graphs help us solve algebraic problems by analysing

More information

A Multiple-Line Fitting Algorithm Without Initialization Yan Guo

A Multiple-Line Fitting Algorithm Without Initialization Yan Guo A Multiple-Line Fitting Algorithm Without Initialization Yan Guo Abstract: The commonest way to fit multiple lines is to use methods incorporate the EM algorithm. However, the EM algorithm dose not guarantee

More information

1. Solve the following system of equations below. What does the solution represent? 5x + 2y = 10 3x + 5y = 2

1. Solve the following system of equations below. What does the solution represent? 5x + 2y = 10 3x + 5y = 2 1. Solve the following system of equations below. What does the solution represent? 5x + 2y = 10 3x + 5y = 2 2. Given the function: f(x) = a. Find f (6) b. State the domain of this function in interval

More information

Regression. Dr. G. Bharadwaja Kumar VIT Chennai

Regression. Dr. G. Bharadwaja Kumar VIT Chennai Regression Dr. G. Bharadwaja Kumar VIT Chennai Introduction Statistical models normally specify how one set of variables, called dependent variables, functionally depend on another set of variables, called

More information

Section 3.2: Multiple Linear Regression II. Jared S. Murray The University of Texas at Austin McCombs School of Business

Section 3.2: Multiple Linear Regression II. Jared S. Murray The University of Texas at Austin McCombs School of Business Section 3.2: Multiple Linear Regression II Jared S. Murray The University of Texas at Austin McCombs School of Business 1 Multiple Linear Regression: Inference and Understanding We can answer new questions

More information

STAT:5201 Applied Statistic II

STAT:5201 Applied Statistic II STAT:5201 Applied Statistic II Two-Factor Experiment (one fixed blocking factor, one fixed factor of interest) Randomized complete block design (RCBD) Primary Factor: Day length (short or long) Blocking

More information

Linear Methods for Regression and Shrinkage Methods

Linear Methods for Regression and Shrinkage Methods Linear Methods for Regression and Shrinkage Methods Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Linear Regression Models Least Squares Input vectors

More information

Cross-validation and the Bootstrap

Cross-validation and the Bootstrap Cross-validation and the Bootstrap In the section we discuss two resampling methods: cross-validation and the bootstrap. 1/44 Cross-validation and the Bootstrap In the section we discuss two resampling

More information

Estimating R 0 : Solutions

Estimating R 0 : Solutions Estimating R 0 : Solutions John M. Drake and Pejman Rohani Exercise 1. Show how this result could have been obtained graphically without the rearranged equation. Here we use the influenza data discussed

More information

CSC 328/428 Summer Session I 2002 Data Analysis for the Experimenter FINAL EXAM

CSC 328/428 Summer Session I 2002 Data Analysis for the Experimenter FINAL EXAM options pagesize=53 linesize=76 pageno=1 nodate; proc format; value $stcktyp "1"="Growth" "2"="Combined" "3"="Income"; data invstmnt; input stcktyp $ perform; label stkctyp="type of Stock" perform="overall

More information

Voluntary State Curriculum Algebra II

Voluntary State Curriculum Algebra II Algebra II Goal 1: Integration into Broader Knowledge The student will develop, analyze, communicate, and apply models to real-world situations using the language of mathematics and appropriate technology.

More information

Section 2.1: Intro to Simple Linear Regression & Least Squares

Section 2.1: Intro to Simple Linear Regression & Least Squares Section 2.1: Intro to Simple Linear Regression & Least Squares Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.1, 7.2 1 Regression:

More information

Assignment 4 (Sol.) Introduction to Data Analytics Prof. Nandan Sudarsanam & Prof. B. Ravindran

Assignment 4 (Sol.) Introduction to Data Analytics Prof. Nandan Sudarsanam & Prof. B. Ravindran Assignment 4 (Sol.) Introduction to Data Analytics Prof. andan Sudarsanam & Prof. B. Ravindran 1. Which among the following techniques can be used to aid decision making when those decisions depend upon

More information

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data Spatial Patterns We will examine methods that are used to analyze patterns in two sorts of spatial data: Point Pattern Analysis - These methods concern themselves with the location information associated

More information

MAC 1105 Fall Term 2018

MAC 1105 Fall Term 2018 MAC 1105 Fall Term 2018 Each objective covered in MAC 1105 is listed below. Along with each objective is the homework list used with MyMathLab (MML) and a list to use with the text (if you want to use

More information

Week 5: Multiple Linear Regression II

Week 5: Multiple Linear Regression II Week 5: Multiple Linear Regression II Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Adjusted R

More information

CREATING THE ANALYSIS

CREATING THE ANALYSIS Chapter 14 Multiple Regression Chapter Table of Contents CREATING THE ANALYSIS...214 ModelInformation...217 SummaryofFit...217 AnalysisofVariance...217 TypeIIITests...218 ParameterEstimates...218 Residuals-by-PredictedPlot...219

More information

Statistics Lab #7 ANOVA Part 2 & ANCOVA

Statistics Lab #7 ANOVA Part 2 & ANCOVA Statistics Lab #7 ANOVA Part 2 & ANCOVA PSYCH 710 7 Initialize R Initialize R by entering the following commands at the prompt. You must type the commands exactly as shown. options(contrasts=c("contr.sum","contr.poly")

More information

One Factor Experiments

One Factor Experiments One Factor Experiments 20-1 Overview Computation of Effects Estimating Experimental Errors Allocation of Variation ANOVA Table and F-Test Visual Diagnostic Tests Confidence Intervals For Effects Unequal

More information

Stat 8053, Fall 2013: Additive Models

Stat 8053, Fall 2013: Additive Models Stat 853, Fall 213: Additive Models We will only use the package mgcv for fitting additive and later generalized additive models. The best reference is S. N. Wood (26), Generalized Additive Models, An

More information

Economics Nonparametric Econometrics

Economics Nonparametric Econometrics Economics 217 - Nonparametric Econometrics Topics covered in this lecture Introduction to the nonparametric model The role of bandwidth Choice of smoothing function R commands for nonparametric models

More information

R-Square Coeff Var Root MSE y Mean

R-Square Coeff Var Root MSE y Mean STAT:50 Applied Statistics II Exam - Practice 00 possible points. Consider a -factor study where each of the factors has 3 levels. The factors are Diet (,,3) and Drug (A,B,C) and there are n = 3 observations

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 327 Geometric Regression Introduction Geometric regression is a special case of negative binomial regression in which the dispersion parameter is set to one. It is similar to regular multiple regression

More information

( ) = Y ˆ. Calibration Definition A model is calibrated if its predictions are right on average: ave(response Predicted value) = Predicted value.

( ) = Y ˆ. Calibration Definition A model is calibrated if its predictions are right on average: ave(response Predicted value) = Predicted value. Calibration OVERVIEW... 2 INTRODUCTION... 2 CALIBRATION... 3 ANOTHER REASON FOR CALIBRATION... 4 CHECKING THE CALIBRATION OF A REGRESSION... 5 CALIBRATION IN SIMPLE REGRESSION (DISPLAY.JMP)... 5 TESTING

More information

An introduction to SPSS

An introduction to SPSS An introduction to SPSS To open the SPSS software using U of Iowa Virtual Desktop... Go to https://virtualdesktop.uiowa.edu and choose SPSS 24. Contents NOTE: Save data files in a drive that is accessible

More information

College Prep Algebra II Summer Packet

College Prep Algebra II Summer Packet Name: College Prep Algebra II Summer Packet This packet is an optional review which is highly recommended before entering CP Algebra II. It provides practice for necessary Algebra I topics. Remember: When

More information

Integrated Math I. IM1.1.3 Understand and use the distributive, associative, and commutative properties.

Integrated Math I. IM1.1.3 Understand and use the distributive, associative, and commutative properties. Standard 1: Number Sense and Computation Students simplify and compare expressions. They use rational exponents and simplify square roots. IM1.1.1 Compare real number expressions. IM1.1.2 Simplify square

More information

3. Data Analysis and Statistics

3. Data Analysis and Statistics 3. Data Analysis and Statistics 3.1 Visual Analysis of Data 3.2.1 Basic Statistics Examples 3.2.2 Basic Statistical Theory 3.3 Normal Distributions 3.4 Bivariate Data 3.1 Visual Analysis of Data Visual

More information

Introduction to Hierarchical Linear Model. Hsueh-Sheng Wu CFDR Workshop Series January 30, 2017

Introduction to Hierarchical Linear Model. Hsueh-Sheng Wu CFDR Workshop Series January 30, 2017 Introduction to Hierarchical Linear Model Hsueh-Sheng Wu CFDR Workshop Series January 30, 2017 1 Outline What is Hierarchical Linear Model? Why do nested data create analytic problems? Graphic presentation

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 13: The bootstrap (v3) Ramesh Johari ramesh.johari@stanford.edu 1 / 30 Resampling 2 / 30 Sampling distribution of a statistic For this lecture: There is a population model

More information

CHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY

CHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY 23 CHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY 3.1 DESIGN OF EXPERIMENTS Design of experiments is a systematic approach for investigation of a system or process. A series

More information

Assignment 2. with (a) (10 pts) naive Gauss elimination, (b) (10 pts) Gauss with partial pivoting

Assignment 2. with (a) (10 pts) naive Gauss elimination, (b) (10 pts) Gauss with partial pivoting Assignment (Be sure to observe the rules about handing in homework). Solve: with (a) ( pts) naive Gauss elimination, (b) ( pts) Gauss with partial pivoting *You need to show all of the steps manually.

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 2015 MODULE 4 : Modelling experimental data Time allowed: Three hours Candidates should answer FIVE questions. All questions carry equal

More information

Poisson Regression and Model Checking

Poisson Regression and Model Checking Poisson Regression and Model Checking Readings GH Chapter 6-8 September 27, 2017 HIV & Risk Behaviour Study The variables couples and women_alone code the intervention: control - no counselling (both 0)

More information

For our example, we will look at the following factors and factor levels.

For our example, we will look at the following factors and factor levels. In order to review the calculations that are used to generate the Analysis of Variance, we will use the statapult example. By adjusting various settings on the statapult, you are able to throw the ball

More information

SAS/STAT 13.1 User s Guide. The NESTED Procedure

SAS/STAT 13.1 User s Guide. The NESTED Procedure SAS/STAT 13.1 User s Guide The NESTED Procedure This document is an individual chapter from SAS/STAT 13.1 User s Guide. The correct bibliographic citation for the complete manual is as follows: SAS Institute

More information

Doing Math: Scientific Notation Scientific notation is a way to show (and do math with) very small or very large numbers b 10 c

Doing Math: Scientific Notation Scientific notation is a way to show (and do math with) very small or very large numbers b 10 c Doing Math: Scientific Notation Scientific notation is a way to show (and do math with) very small or very large numbers b 10 c b = a decimal number between 1.0 and 9.99... c =anexponentof10 Example: 1.53

More information

Psychology 282 Lecture #21 Outline Categorical IVs in MLR: Effects Coding and Contrast Coding

Psychology 282 Lecture #21 Outline Categorical IVs in MLR: Effects Coding and Contrast Coding Psychology 282 Lecture #21 Outline Categorical IVs in MLR: Effects Coding and Contrast Coding In the previous lecture we learned how to incorporate a categorical research factor into a MLR model by using

More information

Section 1.5. Finding Linear Equations

Section 1.5. Finding Linear Equations Section 1.5 Finding Linear Equations Using Slope and a Point to Find an Equation of a Line Example Find an equation of a line that has slope m = 3 and contains the point (2, 5). Solution Substitute m =

More information

Descriptive Statistics, Standard Deviation and Standard Error

Descriptive Statistics, Standard Deviation and Standard Error AP Biology Calculations: Descriptive Statistics, Standard Deviation and Standard Error SBI4UP The Scientific Method & Experimental Design Scientific method is used to explore observations and answer questions.

More information

Applied Statistics and Econometrics Lecture 6

Applied Statistics and Econometrics Lecture 6 Applied Statistics and Econometrics Lecture 6 Giuseppe Ragusa Luiss University gragusa@luiss.it http://gragusa.org/ March 6, 2017 Luiss University Empirical application. Data Italian Labour Force Survey,

More information

Simulating Multivariate Normal Data

Simulating Multivariate Normal Data Simulating Multivariate Normal Data You have a population correlation matrix and wish to simulate a set of data randomly sampled from a population with that structure. I shall present here code and examples

More information

Compare Linear Regression Lines for the HP-41C

Compare Linear Regression Lines for the HP-41C Compare Linear Regression Lines for the HP-41C by Namir Shammas This article presents an HP-41C program that calculates the linear regression statistics for two data sets and then compares their slopes

More information

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)

More information

Spline Models. Introduction to CS and NCS. Regression splines. Smoothing splines

Spline Models. Introduction to CS and NCS. Regression splines. Smoothing splines Spline Models Introduction to CS and NCS Regression splines Smoothing splines 3 Cubic Splines a knots: a< 1 < 2 < < m

More information