5.5 Regression Estimation
Assume an SRS of $n$ pairs $(x_1, y_1), \ldots, (x_n, y_n)$ is selected from a population of $N$ pairs of $(x, y)$ data. The goal of regression estimation is to take advantage of a linear relationship between $x$ and $y$ to improve estimation of the total $t_y$ or the mean $\bar{y}_U$. Unlike ratio estimation, there is no assumption of a zero intercept or a positive slope in the linear relationship.

We assume a linear form for the relationship between $y$ and $x$:

    $$ y_i = B_0 + B_1 x_i + \epsilon_i \qquad (53) $$

for intercept $B_0$, slope $B_1$, and deviation $\epsilon_i$ between $y_i$ and $B_0 + B_1 x_i$. We assume that (i) the mean of the $\epsilon_i$'s is zero and (ii) the $\epsilon_i$'s are uncorrelated with the $x_i$'s. This implies that there is no systematic relationship between the $\epsilon_i$'s and the $x_i$'s. For example, we do not want the variance to increase (or decrease) with the mean.

5.5.1 Estimating $\bar{y}_U$ and $t_y$

To estimate $\bar{y}_U$, we first must get estimates $\hat{B}_0$ and $\hat{B}_1$ of the true intercept $B_0$ and slope $B_1$. We use the least squares estimates

    $$ \hat{B}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{n \sum_{i=1}^n x_i y_i - \left(\sum_{i=1}^n x_i\right)\left(\sum_{i=1}^n y_i\right)}{n \sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2} \qquad\qquad \hat{B}_0 = \bar{y} - \hat{B}_1 \bar{x} $$

When $\bar{x}_U$ is known, our estimate $\hat{y}_{reg}$ of the population mean is:

    $$ \hat{y}_{reg} = \hat{B}_0 + \hat{B}_1 \bar{x}_U \qquad (54) $$

When $\bar{x}_U$ is unknown, replace $\bar{x}_U$ with $\bar{x}$ in (54). Then the estimate is $\hat{y}_{reg} = \bar{y}$.

The estimated variance of $\hat{y}_{reg}$ in (54) is

    $$ \hat{V}(\hat{y}_{reg}) = \frac{N-n}{Nn(n-2)} \sum_{i=1}^n (y_i - \hat{B}_0 - \hat{B}_1 x_i)^2 \qquad (55) $$

If $N$ is unknown, but $N$ is large relative to $n$ (that is, $n/N$ is small), then the f.p.c. $(N-n)/N \approx 1$. Some researchers will replace $(N-n)/N$ with 1 in the variance formula in (55):

    $$ \hat{V}(\hat{y}_{reg}) \approx \frac{1}{n(n-2)} \sum_{i=1}^n (y_i - \hat{B}_0 - \hat{B}_1 x_i)^2 \qquad (56) $$

An alternative formula for calculation:

    $$ \sum_{i=1}^n (y_i - \hat{B}_0 - \hat{B}_1 x_i)^2 = \left( \sum_{i=1}^n y_i^2 - n\bar{y}^2 \right) - \hat{B}_1^2 \left( \sum_{i=1}^n x_i^2 - n\bar{x}^2 \right) \qquad (57) $$

which can be substituted into (55) to get the estimated variance

    $$ \hat{V}(\hat{y}_{reg}) = \frac{N-n}{Nn(n-2)} \left[ \left( \sum_{i=1}^n y_i^2 - n\bar{y}^2 \right) - \hat{B}_1^2 \left( \sum_{i=1}^n x_i^2 - n\bar{x}^2 \right) \right] \qquad (58) $$
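As a numerical sketch of these formulas (in Python rather than the R used later in these notes; the data and population values here are made up purely for illustration):

```python
# Regression estimator of a population mean from an SRS (made-up data).
x = [4.0, 7.0, 5.0, 9.0, 6.0]          # auxiliary variable, sampled
y = [10.0, 17.0, 13.0, 20.0, 14.0]     # response, sampled
n = len(x)
N = 50          # population size (assumed known for this sketch)
xbar_U = 6.5    # assumed known population mean of x

xbar = sum(x) / n
ybar = sum(y) / n
# Least squares slope and intercept for the model (53)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

# Regression estimator of the mean, equation (54)
yhat_reg = b0 + b1 * xbar_U

# Estimated variance, equation (55), using the residual sum of squares
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
v_hat = (N - n) / (N * n * (n - 2)) * sse
```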
An approximate $100(1-\alpha)\%$ confidence interval for $\bar{y}_U$ is

    $$ \hat{y}_{reg} \pm t^* \sqrt{\hat{V}(\hat{y}_{reg})} $$

where $t^*$ is the upper $\alpha/2$ critical value from a $t$-distribution having $n-2$ degrees of freedom.

By multiplying $\hat{y}_{reg}$ in (54) by $N$, an estimator $\hat{t}_{reg}$ of the population total $t_y$ is:

    $$ \hat{t}_{reg} = N\hat{y}_{reg} = N(\hat{B}_0 + \hat{B}_1 \bar{x}_U) = N(\bar{y} - \hat{B}_1 \bar{x} + \hat{B}_1 \bar{x}_U) = N\bar{y} + \hat{B}_1(N\bar{x}_U - N\bar{x}) = N\bar{y} + \hat{B}_1(t_x - N\bar{x}) \qquad (59) $$

Multiplying $\hat{V}(\hat{y}_{reg})$ in (55) by $N^2$ provides the estimated variance of $\hat{t}_{reg}$:

    $$ \hat{V}(\hat{t}_{reg}) = \frac{N(N-n)}{n(n-2)} \sum_{i=1}^n (y_i - \hat{B}_0 - \hat{B}_1 x_i)^2 \qquad (60) $$

An approximate $100(1-\alpha)\%$ confidence interval for $t_y$ is $\hat{t}_{reg} \pm t^* \sqrt{\hat{V}(\hat{t}_{reg})}$ where $t^*$ is the upper $\alpha/2$ critical value from a $t$-distribution having $n-2$ degrees of freedom.

Note that $\sum_{i=1}^n (y_i - \hat{B}_0 - \hat{B}_1 x_i)^2$ is the sum of squared residuals from a simple linear regression. That is, $\sum_{i=1}^n (y_i - \hat{B}_0 - \hat{B}_1 x_i)^2 = SSE$ for the least squares regression line. Therefore, we could fit a regression model using a statistics package and substitute the value of SSE or MSE into (55) to get

    $$ \hat{V}(\hat{y}_{reg}) = \frac{N-n}{Nn}\,MSE \qquad\qquad \hat{V}(\hat{t}_{reg}) = \frac{N(N-n)}{n}\,MSE \qquad (61) $$

Example: The Florida Game and Freshwater Fish Commission is interested in estimating the weights of alligators using length measurements, which are easier to observe. The population size $N$ is unknown but is large enough that ignoring the finite population correction will have a negligible effect on estimation. That is, $(N-n)/N \approx 1$ (see Equation (56)). A random sample of $n = 22 \ (\ll N)$ alligators yielded weight (in pounds) and length (in inches) data.

[Table of the 22 sampled alligators' lengths and weights omitted here; the same values appear in the R code of Section 5.6.]
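Continuing the same made-up Python sketch, the total estimator (59) and its t-based confidence interval follow directly from the quantities already computed; the t critical value is taken from a table rather than computed:

```python
import math

# SRS of n pairs from a population of N units (made-up data).
x = [4.0, 7.0, 5.0, 9.0, 6.0]
y = [10.0, 17.0, 13.0, 20.0, 14.0]
n, N = len(x), 50
t_x = 325.0            # assumed known population total of x (so xbar_U = 6.5)

xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

# Total estimator, equation (59): N*ybar + B1*(t_x - N*xbar)
t_reg = N * ybar + b1 * (t_x - N * xbar)

# Estimated variance (60) via the SSE, then a t-based CI
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
v_t = N * (N - n) / (n * (n - 2)) * sse
tstar = 3.182          # upper .025 critical value, t with n-2 = 3 df (table)
ci = (t_reg - tstar * math.sqrt(v_t), t_reg + tstar * math.sqrt(v_t))
```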
Summary values for $n = 22$:

    $\bar{x} = 78.86$    $\sum x_i = 1735$    $\sum x_i^2 = 139293$
    $\bar{y} = 68.27$    $\sum y_i = 1502$    $\sum y_i^2 = 119762$    $\sum x_i y_i = 124433$

Because it is much easier to collect data on alligator length, there is a lot of available data on length. Assume that the available data indicates that the mean alligator length is $\bar{x}_U = 90$ inches. Estimate the mean alligator weight $\bar{y}_U$ using the regression estimator $\hat{y}_{reg}$.

Regression output from MINITAB for the regression of alligator weight vs length (n = 22):

    The regression equation is
    weight = -123 + 2.43 length

    S = 11.6352    R-Sq = 84.3 %    R-Sq(adj) = 83.5 %

    Analysis of Variance

    Source       DF        SS       MS        F       P
    Regression    1     14508    14508    107.2   0.000
    Error        20      2708      135
    Total        21     17216
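As a quick check of this example in Python, using the 22 (length, weight) pairs listed in the Section 5.6 R code; the regression estimate of the mean weight at $\bar{x}_U = 90$ inches lands near 95 pounds:

```python
# The 22 alligator (length, weight) pairs from the Florida example.
x = [94, 74, 82, 58, 86, 94, 63, 86, 69, 72, 85, 86, 88, 72, 74,
     61, 90, 89, 68, 76, 78, 90]
y = [130, 51, 80, 28, 80, 110, 33, 90, 36, 38, 84, 83, 70, 61, 54,
     44, 106, 84, 39, 42, 57, 102]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
# Least squares slope and intercept
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

# Regression estimate of the mean weight at xbar_U = 90 inches, equation (54)
yhat_reg = b0 + b1 * 90
```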
Suppose the sample size was actually $n = 25$ and included the following 3 additional measurements. Estimate the mean alligator weight $\bar{y}_U$ using the regression estimator $\hat{y}_{reg}$.

    Alligator   Length   Weight
       23         147      640
       24         128      366
       25         114      197

Summary values for $n = 25$:

    $\bar{x} = 84.96$    $\sum x_i = 2124$    $\sum x_i^2 = 190282$
    $\bar{y} = 108.2$    $\sum y_i = 2705$    $\sum y_i^2 = 702127$    $\sum x_i y_i = 287819$

Do you have any concerns about using a regression estimator for this data? If so, how can we adjust our analysis to account for it?

Regression output from MINITAB for the regression of alligator weight vs length (n = 25):

    The regression equation is
    weight = _____ + _____ length

    S = _____    R-Sq = _____ %    R-Sq(adj) = _____ %

    Analysis of Variance

    Source       DF        SS       MS        F       P
    Regression    1
    Error        23
    Total        24
5.5.2 Extension to Multiple Regression

We can generalize the simple linear regression model to a multiple linear regression model with

Case 1: $k$ different regression variables $x_1, x_2, \ldots, x_k$ with model

    $$ y = B_0 + \underline{\hspace{2in}} \qquad (62) $$

Case 2: A $k$-th order polynomial in regression variable $x$ with model

    $$ y = B_0 + B_1 x + B_2 x^2 + \cdots + B_k x^k + \epsilon \qquad (63) $$

For Case 1:

1. Find the least-squares estimates $(\hat{B}_0, \hat{B}_1, \ldots, \hat{B}_k)$. This will produce the prediction model:

    $$ \hat{y} = \hat{B}_0 + \underline{\hspace{2in}} \qquad (64) $$

2. To estimate $\bar{y}_U$, replace each variable with its mean:

    $$ \hat{y}_{reg} = \hat{B}_0 + \underline{\hspace{2in}} \qquad (65) $$

where $\bar{x}_{Ui}$ is the mean of $x_i$.

For Case 2:

1. Find the least-squares estimates $(\hat{B}_0, \hat{B}_1, \ldots, \hat{B}_k)$. This will produce the prediction model:

    $$ \hat{y} = \hat{B}_0 + \hat{B}_1 x + \hat{B}_2 x^2 + \cdots + \hat{B}_k x^k \qquad (66) $$

2. To estimate $\bar{y}_U$, replace $x$ with its mean:

    $$ \hat{y}_{reg} = \hat{B}_0 + \hat{B}_1 \bar{x}_U + \hat{B}_2 \bar{x}_U^2 + \cdots + \hat{B}_k \bar{x}_U^k \qquad (67) $$

For Case 1 and for Case 2, we use the SSE from the regression output to calculate the estimated variance of $\hat{y}_{reg}$:

    $$ \hat{V}(\hat{y}_{reg}) = \frac{N-n}{Nn(n-k-1)}\,SSE = \underline{\hspace{2in}} \qquad (68) $$

where $MSE = SSE/(n-k-1)$ is the mean squared error from the regression having $n-k-1$ degrees of freedom for the Error term.
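A sketch of Case 2 with $k = 2$ in Python (made-up data; the least-squares fit is done via the normal equations with a small hand-rolled Gaussian elimination, standing in for a statistics package):

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))   # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    out = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):   # back substitution
        out[r] = (M[r][3] - sum(M[r][c2] * out[c2] for c2 in range(r + 1, 3))) / M[r][r]
    return out

# Made-up SRS; fit y = B0 + B1*x + B2*x^2 by least squares (normal equations).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 5.0, 10.2, 16.9, 26.1, 37.0]   # roughly x^2 + 1 with small noise
n = len(x)
S = lambda p: sum(xi ** p for xi in x)                       # sum of x^p
Sy = lambda p: sum((xi ** p) * yi for xi, yi in zip(x, y))   # sum of x^p * y
A = [[n, S(1), S(2)], [S(1), S(2), S(3)], [S(2), S(3), S(4)]]
b0, b1, b2 = solve3(A, [Sy(0), Sy(1), Sy(2)])

# Equation (67) with k = 2: evaluate the fitted polynomial at xbar_U.
xbar_U = 3.5   # assumed known population mean of x
yhat_reg = b0 + b1 * xbar_U + b2 * xbar_U ** 2
```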
Example: Use the data from the Florida Game and Freshwater Fish Commission example.

- Fit the quadratic regression model $y = B_0 + B_1 x + B_2 x^2 + \epsilon$.
- Estimate the mean alligator weight $\bar{y}_U$ using the multiple regression estimator $\hat{y}_{reg}$.
- Find a 95% confidence interval for $\bar{y}_U$.

Regression output from MINITAB for the quadratic regression of alligator weight vs length (n = 25):

    The regression equation is
    weight = _____ + _____ length + _____ length**2

    S = _____    R-Sq = 98.7 %    R-Sq(adj) = 98.6 %

    Analysis of Variance

    Source       DF        SS       MS        F       P
    Regression    2
    Error        22
    Total        24

    Source      DF    Seq SS        F       P
    Linear       1
    Quadratic    1
5.6 Regression Analyses using R and SAS

5.6.1 Example with unknown N

R code for Regression Estimation (N unknown but large)

CASE 1: We will fit the model gatorwgt $= B_0 + B_1$ gatorlen using the original 22 pairs of alligator length (x) and weight (y) data. We will also estimate the mean weight $\bar{y}_U$ using regression estimation (assuming $\bar{x}_U = 90$ inches). The df $= n - 2 = 20$ because the model has 2 parameters.

Unfortunately, the svyglm function in R uses $Nn(n-1)$ in the estimated variance formula (see equation (55)) instead of $Nn(n-2)$ for linear regression. If the regression model has $p$ parameters, we want $Nn(n-p)$ in the variance formula. Thus, if we multiply the standard error given in R by $\sqrt{(n-1)/(n-p)} = \sqrt{(n-1)/df}$, where df $= n - p$ is the error degrees of freedom, we get the same analysis as SAS.

R code for Regression Estimation: Case 1

library(survey)    # Regression estimation

x <- c(94,74,82,58,86,94,63,86,69,72,85,86,88,72,74,61,90,89,68,76,78,90)
y <- c(130,51,80,28,80,110,33,90,36,38,84,83,70,61,54,44,106,84,39,42,57,102)

ci.level = .95     # confidence level                     <--
p = 2              # number of parameters in the model    <--
n = length(x)
df = n - p
corr.fctr = sqrt((n-1)/df)

regdata <- data.frame(x,y)                                <--
regdsgn <- svydesign(id=~1,data=regdata)
regdsgn
svyreg <- svyglm(y~x,design=regdsgn)                      <-- enter regression model
svyreg
atmean <- c(1,90)                                         <-- enter x mean = 90

# Assign names to each term in the model
names(atmean) <- c("(Intercept)","x")
atmean <- as.data.frame(t(atmean))                        <-- two model terms
meanwgt <- predict(svyreg,newdata=atmean,total=1)
meanwgt

# Enter standard error
se.meanwgt =                                              <-- value output by meanwgt

# Compute confidence interval for mean with correction factor
meanwgt + c(-1,1)*corr.fctr*qt(ci.level+(1-ci.level)/2,df)*se.meanwgt
R output for Regression Estimation: Case 1

Independent Sampling design (with replacement)
svydesign(id = ~1, data = regdata)

Coefficients:
(Intercept)            x

Degrees of Freedom: 21 Total (i.e. Null);  20 Residual
Null Deviance:      17220
Residual Deviance:  2708     AIC:

> meanwgt
   link   SE
1

> # Enter standard error
> se.meanwgt =

> # Compute confidence interval for mean with correction factor
[1]

CASE 2: We will fit the model gatorwgt $= B_0 + B_1$ gatorlen using the 25 pairs of alligator length (x) and weight (y) data. We will again estimate the mean weight $\bar{y}_U$ using regression estimation (assuming $\bar{x}_U = 90$ inches).

R code for Regression Estimation: Case 2

library(survey)    # Regression estimation

x <- c(94,74,82,58,86,94,63,86,69,72,85,86,88,72,74,61,90,89,68,76,78,90,
       147,128,114)
y <- c(130,51,80,28,80,110,33,90,36,38,84,83,70,61,54,44,106,84,39,42,57,102,
       640,366,197)

ci.level = .95     # confidence level                     <--
p = 2              # number of parameters in the model    <--
n = length(x)
df = n - p
corr.fctr = sqrt((n-1)/df)

regdata <- data.frame(x,y)                                <--
regdsgn <- svydesign(id=~1,data=regdata)
regdsgn
svyreg <- svyglm(y~x,design=regdsgn)                      <-- enter regression model
svyreg
atmean <- c(1,90)                                         <-- enter x mean = 90
# Assign names to each term in the model
names(atmean) <- c("(Intercept)","x")
atmean <- as.data.frame(t(atmean))                        <-- two model terms
meanwgt <- predict(svyreg,newdata=atmean,total=1)
meanwgt

# Enter standard error
se.meanwgt =                                              <-- value output by meanwgt

# Compute confidence interval for mean with correction factor
meanwgt + c(-1,1)*corr.fctr*qt(ci.level+(1-ci.level)/2,df)*se.meanwgt

R output for Regression Estimation: Case 2

Independent Sampling design (with replacement)

Coefficients:
(Intercept)            x

Degrees of Freedom: 24 Total (i.e. Null);  23 Residual
Null Deviance:      409400
Residual Deviance:  67100    AIC:

> meanwgt
   link   SE
1

> # Enter standard error
> se.meanwgt =

> # Compute confidence interval for mean with correction factor
[1]

CASE 3: We will fit the model gatorwgt $= B_0 + B_1$ gatorlen $+ B_2$ gatorlen$^2$ using the original 22 pairs of alligator length (x) and weight (y) data. We will estimate the mean weight $\bar{y}_U$ using quadratic model regression estimation (assuming $\bar{x}_U = 90$ inches). The df $= n - 3 = 19$ because the model has 3 parameters.

R code for regression estimation: Case 3

library(survey)    # Regression estimation

x <- c(94,74,82,58,86,94,63,86,69,72,85,86,88,72,74,61,90,89,68,76,78,90)
y <- c(130,51,80,28,80,110,33,90,36,38,84,83,70,61,54,44,106,84,39,42,57,102)
xsq <- x^2                                                <-- create x-squared values for quadratic model

ci.level = .95     # confidence level                     <--
p = 3              # number of parameters in the model    <--
n = length(x)
df = n - p
corr.fctr = sqrt((n-1)/df)

regdata <- data.frame(x,xsq,y)                            <--
regdsgn <- svydesign(id=~1,data=regdata)
regdsgn
svyreg <- svyglm(y~x+xsq,design=regdsgn)                  <--
svyreg
atmean <- c(1,90,8100)                                    <--

# Assign names to each term in the model
names(atmean) <- c("(Intercept)","x","xsq")               <--
atmean <- as.data.frame(t(atmean))
meanwgt <- predict(svyreg,newdata=atmean,total=1)
meanwgt

# Enter standard error
se.meanwgt =                                              <--

# Compute confidence interval for mean with correction factor
meanwgt + c(-1,1)*corr.fctr*qt(ci.level+(1-ci.level)/2,df)*se.meanwgt

R output for regression estimation: Case 3

Independent Sampling design (with replacement)

Coefficients:
(Intercept)            x          xsq

Degrees of Freedom: 21 Total (i.e. Null);  19 Residual
Null Deviance:      17220
Residual Deviance:  628      AIC:

> meanwgt
   link   SE
1

> # Enter standard error
> se.meanwgt =

> # Compute confidence interval for mean with correction factor
[1]
CASE 4: We will fit the model gatorwgt $= B_0 + B_1$ gatorlen $+ B_2$ gatorlen$^2$ using the 25 pairs of alligator length (x) and weight (y) data. We will again estimate the mean weight $\bar{y}_U$ using quadratic model regression estimation (assuming $\bar{x}_U = 90$ inches).

R code for regression estimation: Case 4

library(survey)    # Regression estimation

x <- c(94,74,82,58,86,94,63,86,69,72,85,86,88,72,74,61,90,89,68,76,78,90,
       147,128,114)
y <- c(130,51,80,28,80,110,33,90,36,38,84,83,70,61,54,44,106,84,39,42,57,102,
       640,366,197)

xsq <- x^2                                                <-- create x-squared values for quadratic model

ci.level = .95     # confidence level                     <--
p = 3              # number of parameters in the model    <--
n = length(x)
df = n - p
corr.fctr = sqrt((n-1)/df)

regdata <- data.frame(x,xsq,y)                            <--
regdsgn <- svydesign(id=~1,data=regdata)
regdsgn
svyreg <- svyglm(y~x+xsq,design=regdsgn)                  <--
svyreg
atmean <- c(1,90,8100)                                    <--

# Assign names to each term in the model
names(atmean) <- c("(Intercept)","x","xsq")               <--
atmean <- as.data.frame(t(atmean))
meanwgt <- predict(svyreg,newdata=atmean,total=1)
meanwgt

# Enter standard error
se.meanwgt =                                              <--

# Compute confidence interval for mean with correction factor
meanwgt + c(-1,1)*corr.fctr*qt(ci.level+(1-ci.level)/2,df)*se.meanwgt
R output for regression estimation: Case 4

Independent Sampling design (with replacement)
svydesign(id = ~1, data = regdata)

Coefficients:
(Intercept)            x          xsq

Degrees of Freedom: 24 Total (i.e. Null);  22 Residual
Null Deviance:      409400
Residual Deviance:  5273     AIC:

> meanwgt
   link   SE
1

> # Enter standard error
> se.meanwgt =

> # Compute confidence interval for mean with correction factor
[1]

SAS code for Regression Estimation (N unknown but large)

We will reproduce the R analyses for Case 1 (linear regression with n = 22 points) and Case 4 (quadratic regression with n = 25 points). For Case 2 and Case 3, you just have to change the data set.

CASE 1: We will fit the model gatorwgt $= B_0 + B_1$ gatorlen using the original 22 pairs of alligator length (x) and weight (y) data. We will also estimate the mean weight $\bar{y}_U$ using regression estimation (assuming $\bar{x}_U = 90$ inches). The df $= n - 2 = 20$ because the model has 2 parameters.

data in;
  input gatorid gatorlen gatorwgt;
cards;
;

/* Use proc surveyreg to estimate the average alligator weight */
proc surveyreg data=in;
  model gatorwgt = gatorlen / clparm solution df=20;
  /* To estimate a mean, substitute 1 for intercept, mean(x) for gatorlen */
  estimate 'Average gator weight' intercept 1 gatorlen 90;
run;
SAS output for Case 1

    The SURVEYREG Procedure
    Regression Analysis for Dependent Variable gatorwgt

    Data Summary
    Number of Observations      22
    Mean of gatorwgt         68.27
    Sum of gatorwgt         1502.0

    Fit Statistics
    R-square            0.8427
    Root MSE           11.6352
    Denominator DF          20

    Tests of Model Effects
    Effect       Num DF    F Value    Pr > F
    Model             1               <.0001
    Intercept         1               <.0001
    gatorlen          1               <.0001

    NOTE: The denominator degrees of freedom for the F tests is 20.

    Estimated Regression Coefficients
                            Standard                         95% Confidence
    Parameter    Estimate      Error    t Value    Pr > |t|       Interval
    Intercept    -123.074                          <.0001
    gatorlen       2.4263                          <.0001

    Analysis of Estimable Functions
                                        Standard
    Parameter               Estimate       Error    t Value    Pr > |t|
    Average gator weight       95.29                           <.0001

    Analysis of Estimable Functions
    Parameter               95% Confidence Interval
    Average gator weight

    NOTE: The denominator degrees of freedom for the t tests is 20.
CASE 4: We will fit the model gatorwgt $= B_0 + B_1$ gatorlen $+ B_2$ gatorlen$^2$ using the 25 pairs of alligator length (x) and weight (y) data. We will again estimate the mean weight $\bar{y}_U$ using quadratic model regression estimation (assuming $\bar{x}_U = 90$ inches). This is the data step for Case 4; also change df=20 to df=22 in the SAS code.

data in;
  input gatorid gatorlen gatorwgt;
  gatorsq = gatorlen**2;
cards;
;

SAS output for Case 4

    The SURVEYREG Procedure
    Regression Analysis for Dependent Variable gatorwgt

    Data Summary
    Number of Observations      25
    Mean of gatorwgt         108.2
    Sum of gatorwgt         2705.0

    Fit Statistics
    R-square
    Root MSE
    Denominator DF          22

    Tests of Model Effects
    Effect       Num DF    F Value    Pr > F
    Model             2               <.0001
    Intercept         1               <.0001
    gatorlen          1       8.42    <.0001
    gatorsq           1               <.0001

    NOTE: The denominator degrees of freedom for the F tests is 22.

    Estimated Regression Coefficients
                            Standard                         95% Confidence
    Parameter    Estimate      Error    t Value    Pr > |t|       Interval
    Intercept                                      <.0001
    gatorlen                                       <.0001
    gatorsq                                        <.0001

    NOTE: The denominator degrees of freedom for the t tests is 22.

    Analysis of Estimable Functions
                                        Standard
    Parameter               Estimate       Error    t Value    Pr > |t|
    Average gator weight                              8.05     <.0001

    Analysis of Estimable Functions
    Parameter               95% Confidence Interval
    Average gator weight

    NOTE: The denominator degrees of freedom for the t tests is 22.
5.6.2 Example with known N and t_x

The data is from Lohr, Example 4.9 (pages 139-141): To estimate the number of dead trees in a study area, we divide the study area into 100 square plots and count the number of dead trees in a photograph of each plot. Photo counts can be made quickly, but sometimes a tree is misclassified or not detected. An SRS of 25 of the plots is taken for field counts of dead trees (ground truthing). The population mean number of dead trees per plot from the photo counts is 11.3. The data is given in the R code below. Estimate the actual mean number of dead trees per plot and the total number of dead trees in the study area.

R code for regression estimation, t_x and N known

library(survey)    # Regression estimation with known tx and N (Lohr Example 4.9)

x <- c(10,12,7,13,13,6,17,16,15,10,14,12,10,5,12,10,10,9,6,11,7,9,11,10,10)
y <- c(15,14,9,14,8,5,18,15,13,15,11,15,12,8,13,9,11,12,9,12,13,11,10,9,8)

ci.level = .95     # confidence level                     <--
p = 2              # number of parameters in the model    <--
n = length(x)
df = n - p
corr.fctr = sqrt((n-1)/df)

fpc <- c(rep(100,n))                                      <-- 100 = N
regdata <- data.frame(x,y)
regdsgn <- svydesign(id=~1,fpc=~fpc,data=regdata)         <-- include fpc
regdsgn
svyreg <- svyglm(y~x,design=regdsgn)
svyreg
atmean <- c(1,11.3)                                       <-- evaluate at x-mean

# Assign names to each term in the model
names(atmean) <- c("(Intercept)","x")
atmean <- as.data.frame(t(atmean))
meanwgt <- predict(svyreg,newdata=atmean,total=1)
meanwgt

# Enter standard error
se.meanwgt = .48                                          <-- input s.e. from R

# Compute confidence interval for mean with correction factor
meanwgt + c(-1,1)*corr.fctr*qt(ci.level+(1-ci.level)/2,df)*se.meanwgt

# Compute confidence interval for a total with correction factor
ttl = 100    # ttl is the total number of units
sumwgt <- svycontrast(meanwgt,ttl)
sumwgt + c(-1,1)*corr.fctr*qt(ci.level+(1-ci.level)/2,df)*se.meanwgt*ttl
R output for regression estimation, t_x and N known

Coefficients:
(Intercept)            x
     5.0593       0.6133

Degrees of Freedom: 24 Total (i.e. Null);  23 Residual
Null Deviance:      218.2
Residual Deviance:  133.2    AIC:

> meanwgt
   link   SE
1

> # Enter standard error
> se.meanwgt = .48

> # Compute confidence interval for mean with correction factor
[1]

> # Compute confidence interval for a total with correction factor
[1]

SAS code for regression estimation, t_x and N known (supplemental)

DATA trees;
  INPUT photo field @@;
  N = 100;
DATALINES;
;

/* Use proc surveyreg to estimate the total number of trees */
PROC SURVEYREG DATA=trees TOTAL=100;
  MODEL field = photo / clparm solution df=23;
  ESTIMATE 'Total field trees' intercept 100 photo 1130;
    /* substitute N for intercept, t_x for photo */
  ESTIMATE 'Mean field trees' intercept 100 photo 1130 / divisor=100;
run;
SAS output for regression estimation, t_x and N known

    The SURVEYREG Procedure
    Regression Analysis for Dependent Variable field

    Data Summary
    Number of Observations      25
    Sum of Weights             100
    Weighted Mean of field   11.56
    Weighted Sum of field   1156.0

    Fit Statistics
    R-square
    Root MSE
    Denominator DF          23

    Tests of Model Effects
    Effect       Num DF    F Value    Pr > F
    Model             1               <.0001
    Intercept         1
    photo             1               <.0001

    NOTE: The denominator degrees of freedom for the F tests is 23.

    Estimated Regression Coefficients
                            Standard                         95% Confidence
    Parameter    Estimate      Error    t Value    Pr > |t|       Interval
    Intercept      5.0593
    photo          0.6133                          <.0001

    NOTE: The denominator degrees of freedom for the t tests is 23.

    Analysis of Estimable Functions
                                        Standard
    Parameter             Estimate       Error    t Value    Pr > |t|
    Total field trees       1198.9                           <.0001
    Mean field trees         11.99                           <.0001

    Regression Analysis for Dependent Variable field

    Analysis of Estimable Functions
    Parameter            95% Confidence Interval
    Total field trees
    Mean field trees

    NOTE: The denominator degrees of freedom for the t tests is 23.
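A cross-check of the dead-trees example in Python, using the photo (x) and field (y) counts from the R code above; the estimate of the total comes out near 1199 trees:

```python
# Lohr Example 4.9: photo = aerial photo counts (x), field = ground-truth
# counts (y), SRS of n = 25 plots from N = 100 plots.
photo = [10, 12, 7, 13, 13, 6, 17, 16, 15, 10, 14, 12, 10, 5, 12,
         10, 10, 9, 6, 11, 7, 9, 11, 10, 10]
field = [15, 14, 9, 14, 8, 5, 18, 15, 13, 15, 11, 15, 12, 8, 13,
         9, 11, 12, 9, 12, 13, 11, 10, 9, 8]
n, N = len(photo), 100
xbar_U = 11.3                 # known population mean photo count

xbar = sum(photo) / n
ybar = sum(field) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(photo, field)) / \
     sum((xi - xbar) ** 2 for xi in photo)

# Regression estimate of the mean field count per plot, and of the total.
yhat_reg = ybar + b1 * (xbar_U - xbar)
that_reg = N * yhat_reg       # estimated total number of dead trees
```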
5.7 $\bar{y}$ vs $\hat{y}_{reg}$ or $\hat{t}$ vs $\hat{t}_{reg}$: Which is better, SRS or Regression Estimation?

When does regression estimation provide better estimates of $\bar{y}_U$ or $t_y$ than the simple random sample (SRS) estimator? Recall that for least squares regression we have SS(total) = SS(regression) + SSE. Then:

- If $x$ and $y$ are strongly correlated, most of the variability in the $y$ values can be explained by the regression, and there will be very little random variability about the regression line. That is, MS(regression) is large relative to MSE.

- If $x$ and $y$ are weakly correlated, very little of the variability in the $y$ values can be explained by the regression, and most of the variability is random variability about the regression line. That is, MS(regression) is small relative to MSE.

Thus, for a moderate to strong linear relationship between $x$ and $y$, regression estimation is recommended over SRS estimation.

5.8 Sample Size Estimation for Ratio and Regression Estimation

To determine the sample size formulas for ratio and regression estimation, we will use the same approach that was used for determining a sample size for simple random sampling:

1. Specify a maximum allowable difference $d$ for the parameter we want to estimate. This is equivalent to stating the largest margin of error the researcher would want for a confidence interval.

2. Specify $\alpha$ (where $100(1-\alpha)\%$ is the confidence level for the confidence interval).

3. Specify a prior estimate of the variance $\hat{V}(\hat{\theta})$, where $\hat{\theta}$ is the estimator of parameter $\theta$. Here $\theta$ could be $\bar{y}_U$ or $t_y$ or the population ratio $B = \bar{y}_U/\bar{x}_U$ (in ratio estimation).

4. Set the margin of error formula equal to $d$, and solve for $n$.

Sample Size Determination for Ratio Estimation: From equations (50), (47), (48), and (49), the margin of error formulas for parameters $B$, $\bar{y}_U$, and $t_y$ with estimated variance $s_e^2$ are

    For $B$:          $z_{\alpha/2} \sqrt{\hat{V}(r)} = z_{\alpha/2} \sqrt{\dfrac{N-n}{N} \cdot \dfrac{s_e^2}{n \bar{x}_U^2}}$

    For $\bar{y}_U$:  $z_{\alpha/2} \sqrt{\hat{V}(\hat{\mu}_r)} = z_{\alpha/2} \sqrt{\dfrac{N-n}{N} \cdot \dfrac{s_e^2}{n}}$

    For $t_y$:        $z_{\alpha/2} \sqrt{\hat{V}(\hat{t}_r)} = z_{\alpha/2} \sqrt{\dfrac{N(N-n)\, s_e^2}{n}}$
After setting the margin of error equal to $d$ and solving for $n$, we get:

    For $B$:         $n = \dfrac{n_0}{1 + n_0/N}$ where $n_0 = \underline{\hspace{1in}}$, and $d$ is the maximum allowable difference for estimating the ratio $B$. It is assumed that you know $\bar{x}_U$. If you do not know $\bar{x}_U$, you must provide an estimate of $\bar{x}_U$.

    For $\bar{y}_U$: $n = \dfrac{n_0}{1 + n_0/N}$ where $n_0 = \underline{\hspace{1in}}$, and $d$ is the maximum allowable difference for estimating the mean $\bar{y}_U$.

    For $t_y$:       $n = \dfrac{n_0}{1 + n_0/N}$ where $n_0 = \underline{\hspace{1in}}$, and $d$ is the maximum allowable difference for estimating the total $t_y$.

Here $s_e^2$ is an estimate of the variability of the $(x, y)$ points about a line having a zero intercept. You can use $s_e^2$ from a prior study, a pilot study, or double sampling.

Example of sample size determination: Using the information from the pulpwood and dry wood example, estimate the sample size required so that $\hat{t}_r$ is within 200 kg of $t_y$ with a probability of .95. Assume the new truckloads contain 1000 bundles of wood.

Sample Size Estimation for Regression Estimation: From equations (61) and (47) and the confidence interval formulas for $\bar{y}_U$ and $t_y$, the margin of error formulas for parameters $\bar{y}_U$ and $t_y$ with estimated variance MSE are

    For $\bar{y}_U$: $z_{\alpha/2} \sqrt{\hat{V}(\hat{y}_{reg})} = z_{\alpha/2} \sqrt{\dfrac{N-n}{Nn}\, MSE}$

    For $t_y$:       $z_{\alpha/2} \sqrt{\hat{V}(\hat{t}_{reg})} = z_{\alpha/2} \sqrt{\dfrac{N(N-n)}{n}\, MSE}$

After setting the margin of error equal to $d$ and solving for $n$, we get:

    For $\bar{y}_U$: $n = \dfrac{n_0}{1 + n_0/N}$ where $n_0 = \underline{\hspace{1in}}$, and $d$ is the maximum allowable difference for estimating the mean $\bar{y}_U$.

    For $t_y$:       $n = \dfrac{n_0}{1 + n_0/N}$ where $n_0 = \underline{\hspace{1in}}$, and $d$ is the maximum allowable difference for estimating the total $t_y$.

The MSE for the regression is an estimate of the variability of the $(x, y)$ points about the regression line. You can use the MSE from a prior study, a pilot study, or double sampling.
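For step 4 under regression estimation of the mean, solving $z_{\alpha/2}\sqrt{\frac{N-n}{Nn}\,MSE} = d$ for $n$ gives the standard form $n_0 = z_{\alpha/2}^2\,MSE/d^2$ with $n = n_0/(1 + n_0/N)$. A small Python helper (the numeric inputs below are illustrative, not from the notes):

```python
import math

def n_regression_mean(N, mse, d, z=1.96):
    """Sample size so the regression estimator of the mean has margin of
    error d: solve z*sqrt((N-n)/(N*n) * MSE) = d for n, rounding up."""
    n0 = z ** 2 * mse / d ** 2          # sample size ignoring the fpc
    return math.ceil(n0 / (1 + n0 / N))

# Illustration: population of N = 1000 units, prior MSE = 135 (e.g., from
# a pilot regression), margin of error d = 5 at 95% confidence.
n = n_regression_mean(1000, 135.0, 5.0)
```

Tightening the margin of error (smaller d) drives the required sample size up quickly, since n0 grows with 1/d^2.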
More informationUNIT 1: NUMBER LINES, INTERVALS, AND SETS
ALGEBRA II CURRICULUM OUTLINE 2011-2012 OVERVIEW: 1. Numbers, Lines, Intervals and Sets 2. Algebraic Manipulation: Rational Expressions and Exponents 3. Radicals and Radical Equations 4. Function Basics
More informationEvaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München
Evaluation Measures Sebastian Pölsterl Computer Aided Medical Procedures Technische Universität München April 28, 2015 Outline 1 Classification 1. Confusion Matrix 2. Receiver operating characteristics
More informationCell means coding and effect coding
Cell means coding and effect coding /* mathregr_3.sas */ %include 'readmath.sas'; title2 ''; /* The data step continues */ if ethnic ne 6; /* Otherwise, throw the case out */ /* Indicator dummy variables
More informationModel Selection and Inference
Model Selection and Inference Merlise Clyde January 29, 2017 Last Class Model for brain weight as a function of body weight In the model with both response and predictor log transformed, are dinosaurs
More information14.2 The Regression Equation
14.2 The Regression Equation Tom Lewis Fall Term 2009 Tom Lewis () 14.2 The Regression Equation Fall Term 2009 1 / 12 Outline 1 Exact and inexact linear relationships 2 Fitting lines to data 3 Formulas
More informationA. Incorrect! This would be the negative of the range. B. Correct! The range is the maximum data value minus the minimum data value.
AP Statistics - Problem Drill 05: Measures of Variation No. 1 of 10 1. The range is calculated as. (A) The minimum data value minus the maximum data value. (B) The maximum data value minus the minimum
More informationCH5: CORR & SIMPLE LINEAR REFRESSION =======================================
STAT 430 SAS Examples SAS5 ===================== ssh xyz@glue.umd.edu, tap sas913 (old sas82), sas https://www.statlab.umd.edu/sasdoc/sashtml/onldoc.htm CH5: CORR & SIMPLE LINEAR REFRESSION =======================================
More informationStat 5100 Handout #6 SAS: Linear Regression Remedial Measures
Stat 5100 Handout #6 SAS: Linear Regression Remedial Measures Example: Age and plasma level for 25 healthy children in a study are reported. Of interest is how plasma level depends on age. (Text Table
More informationUnit 5: Estimating with Confidence
Unit 5: Estimating with Confidence Section 8.3 The Practice of Statistics, 4 th edition For AP* STARNES, YATES, MOORE Unit 5 Estimating with Confidence 8.1 8.2 8.3 Confidence Intervals: The Basics Estimating
More informationBivariate Linear Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017
Bivariate Linear Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 4, 217 PDF file location: http://www.murraylax.org/rtutorials/regression_intro.pdf HTML file location:
More information9.1 Random coefficients models Constructed data Consumer preference mapping of carrots... 10
St@tmaster 02429/MIXED LINEAR MODELS PREPARED BY THE STATISTICS GROUPS AT IMM, DTU AND KU-LIFE Module 9: R 9.1 Random coefficients models...................... 1 9.1.1 Constructed data........................
More informationGetting Correct Results from PROC REG
Getting Correct Results from PROC REG Nate Derby Stakana Analytics Seattle, WA, USA SUCCESS 3/12/15 Nate Derby Getting Correct Results from PROC REG 1 / 29 Outline PROC REG 1 PROC REG 2 Nate Derby Getting
More informationConcept of Curve Fitting Difference with Interpolation
Curve Fitting Content Concept of Curve Fitting Difference with Interpolation Estimation of Linear Parameters by Least Squares Curve Fitting by Polynomial Least Squares Estimation of Non-linear Parameters
More informationCentering and Interactions: The Training Data
Centering and Interactions: The Training Data A random sample of 150 technical support workers were first given a test of their technical skill and knowledge, and then randomly assigned to one of three
More informationNormal Plot of the Effects (response is Mean free height, Alpha = 0.05)
Percent Normal Plot of the Effects (response is Mean free height, lpha = 5) 99 95 Effect Type Not Significant Significant 90 80 70 60 50 40 30 20 E F actor C D E Name C D E 0 5 E -0.3-0. Effect 0. 0.3
More informationToday is the last day to register for CU Succeed account AND claim your account. Tuesday is the last day to register for my class
Today is the last day to register for CU Succeed account AND claim your account. Tuesday is the last day to register for my class Back board says your name if you are on my roster. I need parent financial
More information1. What specialist uses information obtained from bones to help police solve crimes?
Mathematics: Modeling Our World Unit 4: PREDICTION HANDOUT VIDEO VIEWING GUIDE H4.1 1. What specialist uses information obtained from bones to help police solve crimes? 2.What are some things that can
More informationZ-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown
Z-TEST / Z-STATISTIC: used to test hypotheses about µ when the population standard deviation is known and population distribution is normal or sample size is large T-TEST / T-STATISTIC: used to test hypotheses
More informationMultiple Linear Regression
Multiple Linear Regression Rebecca C. Steorts, Duke University STA 325, Chapter 3 ISL 1 / 49 Agenda How to extend beyond a SLR Multiple Linear Regression (MLR) Relationship Between the Response and Predictors
More information1. Answer: x or x. Explanation Set up the two equations, then solve each equation. x. Check
Thinkwell s Placement Test 5 Answer Key If you answered 7 or more Test 5 questions correctly, we recommend Thinkwell's Algebra. If you answered fewer than 7 Test 5 questions correctly, we recommend Thinkwell's
More informationUnit 8 SUPPLEMENT Normal, T, Chi Square, F, and Sums of Normals
BIOSTATS 540 Fall 017 8. SUPPLEMENT Normal, T, Chi Square, F and Sums of Normals Page 1 of Unit 8 SUPPLEMENT Normal, T, Chi Square, F, and Sums of Normals Topic 1. Normal Distribution.. a. Definition..
More informationGraphs and Linear Functions
Graphs and Linear Functions A -dimensional graph is a visual representation of a relationship between two variables given by an equation or an inequality. Graphs help us solve algebraic problems by analysing
More informationA Multiple-Line Fitting Algorithm Without Initialization Yan Guo
A Multiple-Line Fitting Algorithm Without Initialization Yan Guo Abstract: The commonest way to fit multiple lines is to use methods incorporate the EM algorithm. However, the EM algorithm dose not guarantee
More information1. Solve the following system of equations below. What does the solution represent? 5x + 2y = 10 3x + 5y = 2
1. Solve the following system of equations below. What does the solution represent? 5x + 2y = 10 3x + 5y = 2 2. Given the function: f(x) = a. Find f (6) b. State the domain of this function in interval
More informationRegression. Dr. G. Bharadwaja Kumar VIT Chennai
Regression Dr. G. Bharadwaja Kumar VIT Chennai Introduction Statistical models normally specify how one set of variables, called dependent variables, functionally depend on another set of variables, called
More informationSection 3.2: Multiple Linear Regression II. Jared S. Murray The University of Texas at Austin McCombs School of Business
Section 3.2: Multiple Linear Regression II Jared S. Murray The University of Texas at Austin McCombs School of Business 1 Multiple Linear Regression: Inference and Understanding We can answer new questions
More informationSTAT:5201 Applied Statistic II
STAT:5201 Applied Statistic II Two-Factor Experiment (one fixed blocking factor, one fixed factor of interest) Randomized complete block design (RCBD) Primary Factor: Day length (short or long) Blocking
More informationLinear Methods for Regression and Shrinkage Methods
Linear Methods for Regression and Shrinkage Methods Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Linear Regression Models Least Squares Input vectors
More informationCross-validation and the Bootstrap
Cross-validation and the Bootstrap In the section we discuss two resampling methods: cross-validation and the bootstrap. 1/44 Cross-validation and the Bootstrap In the section we discuss two resampling
More informationEstimating R 0 : Solutions
Estimating R 0 : Solutions John M. Drake and Pejman Rohani Exercise 1. Show how this result could have been obtained graphically without the rearranged equation. Here we use the influenza data discussed
More informationCSC 328/428 Summer Session I 2002 Data Analysis for the Experimenter FINAL EXAM
options pagesize=53 linesize=76 pageno=1 nodate; proc format; value $stcktyp "1"="Growth" "2"="Combined" "3"="Income"; data invstmnt; input stcktyp $ perform; label stkctyp="type of Stock" perform="overall
More informationVoluntary State Curriculum Algebra II
Algebra II Goal 1: Integration into Broader Knowledge The student will develop, analyze, communicate, and apply models to real-world situations using the language of mathematics and appropriate technology.
More informationSection 2.1: Intro to Simple Linear Regression & Least Squares
Section 2.1: Intro to Simple Linear Regression & Least Squares Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.1, 7.2 1 Regression:
More informationAssignment 4 (Sol.) Introduction to Data Analytics Prof. Nandan Sudarsanam & Prof. B. Ravindran
Assignment 4 (Sol.) Introduction to Data Analytics Prof. andan Sudarsanam & Prof. B. Ravindran 1. Which among the following techniques can be used to aid decision making when those decisions depend upon
More informationSpatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data
Spatial Patterns We will examine methods that are used to analyze patterns in two sorts of spatial data: Point Pattern Analysis - These methods concern themselves with the location information associated
More informationMAC 1105 Fall Term 2018
MAC 1105 Fall Term 2018 Each objective covered in MAC 1105 is listed below. Along with each objective is the homework list used with MyMathLab (MML) and a list to use with the text (if you want to use
More informationWeek 5: Multiple Linear Regression II
Week 5: Multiple Linear Regression II Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Adjusted R
More informationCREATING THE ANALYSIS
Chapter 14 Multiple Regression Chapter Table of Contents CREATING THE ANALYSIS...214 ModelInformation...217 SummaryofFit...217 AnalysisofVariance...217 TypeIIITests...218 ParameterEstimates...218 Residuals-by-PredictedPlot...219
More informationStatistics Lab #7 ANOVA Part 2 & ANCOVA
Statistics Lab #7 ANOVA Part 2 & ANCOVA PSYCH 710 7 Initialize R Initialize R by entering the following commands at the prompt. You must type the commands exactly as shown. options(contrasts=c("contr.sum","contr.poly")
More informationOne Factor Experiments
One Factor Experiments 20-1 Overview Computation of Effects Estimating Experimental Errors Allocation of Variation ANOVA Table and F-Test Visual Diagnostic Tests Confidence Intervals For Effects Unequal
More informationStat 8053, Fall 2013: Additive Models
Stat 853, Fall 213: Additive Models We will only use the package mgcv for fitting additive and later generalized additive models. The best reference is S. N. Wood (26), Generalized Additive Models, An
More informationEconomics Nonparametric Econometrics
Economics 217 - Nonparametric Econometrics Topics covered in this lecture Introduction to the nonparametric model The role of bandwidth Choice of smoothing function R commands for nonparametric models
More informationR-Square Coeff Var Root MSE y Mean
STAT:50 Applied Statistics II Exam - Practice 00 possible points. Consider a -factor study where each of the factors has 3 levels. The factors are Diet (,,3) and Drug (A,B,C) and there are n = 3 observations
More informationNCSS Statistical Software
Chapter 327 Geometric Regression Introduction Geometric regression is a special case of negative binomial regression in which the dispersion parameter is set to one. It is similar to regular multiple regression
More information( ) = Y ˆ. Calibration Definition A model is calibrated if its predictions are right on average: ave(response Predicted value) = Predicted value.
Calibration OVERVIEW... 2 INTRODUCTION... 2 CALIBRATION... 3 ANOTHER REASON FOR CALIBRATION... 4 CHECKING THE CALIBRATION OF A REGRESSION... 5 CALIBRATION IN SIMPLE REGRESSION (DISPLAY.JMP)... 5 TESTING
More informationAn introduction to SPSS
An introduction to SPSS To open the SPSS software using U of Iowa Virtual Desktop... Go to https://virtualdesktop.uiowa.edu and choose SPSS 24. Contents NOTE: Save data files in a drive that is accessible
More informationCollege Prep Algebra II Summer Packet
Name: College Prep Algebra II Summer Packet This packet is an optional review which is highly recommended before entering CP Algebra II. It provides practice for necessary Algebra I topics. Remember: When
More informationIntegrated Math I. IM1.1.3 Understand and use the distributive, associative, and commutative properties.
Standard 1: Number Sense and Computation Students simplify and compare expressions. They use rational exponents and simplify square roots. IM1.1.1 Compare real number expressions. IM1.1.2 Simplify square
More information3. Data Analysis and Statistics
3. Data Analysis and Statistics 3.1 Visual Analysis of Data 3.2.1 Basic Statistics Examples 3.2.2 Basic Statistical Theory 3.3 Normal Distributions 3.4 Bivariate Data 3.1 Visual Analysis of Data Visual
More informationIntroduction to Hierarchical Linear Model. Hsueh-Sheng Wu CFDR Workshop Series January 30, 2017
Introduction to Hierarchical Linear Model Hsueh-Sheng Wu CFDR Workshop Series January 30, 2017 1 Outline What is Hierarchical Linear Model? Why do nested data create analytic problems? Graphic presentation
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 13: The bootstrap (v3) Ramesh Johari ramesh.johari@stanford.edu 1 / 30 Resampling 2 / 30 Sampling distribution of a statistic For this lecture: There is a population model
More informationCHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY
23 CHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY 3.1 DESIGN OF EXPERIMENTS Design of experiments is a systematic approach for investigation of a system or process. A series
More informationAssignment 2. with (a) (10 pts) naive Gauss elimination, (b) (10 pts) Gauss with partial pivoting
Assignment (Be sure to observe the rules about handing in homework). Solve: with (a) ( pts) naive Gauss elimination, (b) ( pts) Gauss with partial pivoting *You need to show all of the steps manually.
More informationEXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY
EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 2015 MODULE 4 : Modelling experimental data Time allowed: Three hours Candidates should answer FIVE questions. All questions carry equal
More informationPoisson Regression and Model Checking
Poisson Regression and Model Checking Readings GH Chapter 6-8 September 27, 2017 HIV & Risk Behaviour Study The variables couples and women_alone code the intervention: control - no counselling (both 0)
More informationFor our example, we will look at the following factors and factor levels.
In order to review the calculations that are used to generate the Analysis of Variance, we will use the statapult example. By adjusting various settings on the statapult, you are able to throw the ball
More informationSAS/STAT 13.1 User s Guide. The NESTED Procedure
SAS/STAT 13.1 User s Guide The NESTED Procedure This document is an individual chapter from SAS/STAT 13.1 User s Guide. The correct bibliographic citation for the complete manual is as follows: SAS Institute
More informationDoing Math: Scientific Notation Scientific notation is a way to show (and do math with) very small or very large numbers b 10 c
Doing Math: Scientific Notation Scientific notation is a way to show (and do math with) very small or very large numbers b 10 c b = a decimal number between 1.0 and 9.99... c =anexponentof10 Example: 1.53
More informationPsychology 282 Lecture #21 Outline Categorical IVs in MLR: Effects Coding and Contrast Coding
Psychology 282 Lecture #21 Outline Categorical IVs in MLR: Effects Coding and Contrast Coding In the previous lecture we learned how to incorporate a categorical research factor into a MLR model by using
More informationSection 1.5. Finding Linear Equations
Section 1.5 Finding Linear Equations Using Slope and a Point to Find an Equation of a Line Example Find an equation of a line that has slope m = 3 and contains the point (2, 5). Solution Substitute m =
More informationDescriptive Statistics, Standard Deviation and Standard Error
AP Biology Calculations: Descriptive Statistics, Standard Deviation and Standard Error SBI4UP The Scientific Method & Experimental Design Scientific method is used to explore observations and answer questions.
More informationApplied Statistics and Econometrics Lecture 6
Applied Statistics and Econometrics Lecture 6 Giuseppe Ragusa Luiss University gragusa@luiss.it http://gragusa.org/ March 6, 2017 Luiss University Empirical application. Data Italian Labour Force Survey,
More informationSimulating Multivariate Normal Data
Simulating Multivariate Normal Data You have a population correlation matrix and wish to simulate a set of data randomly sampled from a population with that structure. I shall present here code and examples
More informationCompare Linear Regression Lines for the HP-41C
Compare Linear Regression Lines for the HP-41C by Namir Shammas This article presents an HP-41C program that calculates the linear regression statistics for two data sets and then compares their slopes
More informationFMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu
FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)
More informationSpline Models. Introduction to CS and NCS. Regression splines. Smoothing splines
Spline Models Introduction to CS and NCS Regression splines Smoothing splines 3 Cubic Splines a knots: a< 1 < 2 < < m
More information