Solution to Bonus Questions

Q2:

(a) The histograms of the 1000 sample means and the 1000 sample variances are plotted below. Both histograms are roughly symmetric and centered around the true lambda value of 20, but the sample variances have a much larger spread and range than the sample means. Hence, as an estimator of lambda, the sample mean is better than the sample variance because it is more precise.

[Figure: two histograms, "Histogram of Sample Means" (x-axis: xbar, y-axis: Frequency) and "Histograms of Sample Variances" (x-axis: s2, y-axis: Frequency).]

(b) The average of the 1000 sample means is [value missing] and their variance is [value missing]. This leads to the simulated MSE of the sample mean: [value missing].

(c) The average of the 1000 sample variances is [value missing] and their variance is [value missing]. This leads to the simulated MSE of the sample variance: [value missing].

(d) Based on parts (b) and (c), the sample mean is a better estimator of lambda than the sample variance, since it has a smaller variance and a smaller MSE. (The biases of the sample mean and of the sample variance are about the same.) The result is the same regardless of the choice of lambda. The following table compares the two estimators for several choices of lambda.

[Table: rows Bias:xbar, Bias:s2, Var:xbar, Var:s2, MSE:xbar, MSE:s2, with one column per value of lambda; the numerical entries are missing.]
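As a cross-check on the simulation, the theoretical values can be worked out by hand. The derivation below assumes i.i.d. Poisson(lambda) samples of size n = 30, as in the appendix code, and uses the standard formula for the variance of the sample variance together with the Poisson moments sigma^2 = lambda and mu_4 = lambda + 3 lambda^2.

```latex
\begin{align*}
\mathrm{MSE}(\hat\lambda) &= \mathrm{Bias}(\hat\lambda)^2 + \mathrm{Var}(\hat\lambda), \\
E[\bar X] &= \lambda, \qquad \mathrm{Var}(\bar X) = \frac{\lambda}{n}, \\
E[S^2] &= \lambda, \qquad
\mathrm{Var}(S^2) = \frac{\mu_4}{n} - \frac{(n-3)\,\sigma^4}{n(n-1)}
                  = \frac{\lambda + 3\lambda^2}{n} - \frac{(n-3)\,\lambda^2}{n(n-1)}
                  = \frac{\lambda}{n} + \frac{2\lambda^2}{n-1}.
\end{align*}
```

Both estimators are unbiased, so each MSE equals the corresponding variance. With lambda = 20 and n = 30 this gives Var(xbar) = 20/30, about 0.67, and Var(s2) = 20/30 + 800/29, about 28.25, so the sample variance is expected to be roughly 42 times more variable than the sample mean. Because the extra term grows like lambda^2, the gap widens as lambda increases.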

(e) The first few confidence intervals are presented below as an illustration. In total, 53 of the 1000 simulated confidence intervals do not contain the true lambda of 20, a miss rate of 5.3%. This is close to the 5% we expect, since we constructed 95% confidence intervals.

[R output: rows [1,] to [6,] of the interval matrix, with columns [,1] (lower limit) and [,2] (upper limit); the numerical values are missing.]

(f) The first few confidence intervals are presented below as an illustration. In total, 51 of the 1000 simulated confidence intervals do not contain the true lambda of 20, a miss rate of 5.1%. This is again close to the 5% we expect, since we constructed 95% confidence intervals.

[R output: rows [1,] to [6,] of the interval matrix, with columns [,1] (lower limit) and [,2] (upper limit); the numerical values are missing.]

(g) Comparing the results from (e) and (f), both confidence intervals have similar coverage (a similar chance of containing the true lambda). The average lengths of the confidence intervals in parts (e) and (f) are 3.15 and 3.20, which are similar too. However, if we also check the standard deviation of the interval lengths, we find that the length in part (f) is much more stable than that in part (e). The standard deviation of the length of the CIs in part (e) is 0.43, while it is only 0.07 in part (f), which is roughly 15% of the former. In other words, the confidence intervals in part (f) are much more stable than those in part (e).

In the following, we plot the upper limit against the lower limit of each confidence interval. The black points are the CIs from part (e) and the blue points are those from part (f). The blue points are clearly more tightly clustered than the black points, which implies that the CIs in part (f) are more stable than those in part (e). The two red lines mark lambda on each axis: points to the right of the vertical line (lower limit above lambda) or below the horizontal line (upper limit below lambda) are the CIs that do not contain the true lambda.

In summary, we conclude that the confidence interval in part (f) is better, because it uses extra distributional information. The CI for the population mean in part (e) is approximately valid for large samples no matter what the distribution of the data is. The CI in part (f) uses the additional information that the mean and the variance of Poisson data are equal, and hence it is more efficient and achieves better precision.

[Figure: "Confidence Intervals for lambda". Scatter plot of upper limit (y-axis) against lower limit (x-axis); legend: CIs in (e), CIs in (f).]
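The comparison in part (g) can be reproduced with a short simulation. The sketch below is a minimal version, not the code used for the reported numbers: the function name compare_poisson_cis and its arguments are hypothetical, the seed is arbitrary, and it uses qnorm(0.975) where the appendix uses 1.96, so the results will differ slightly from those quoted above.

```r
# Sketch: compare two 95% intervals for a Poisson mean by simulation.
# compare_poisson_cis() and its arguments are hypothetical names.
compare_poisson_cis <- function(lambda = 20, n = 30, reps = 1000, seed = 1) {
  set.seed(seed)
  z <- qnorm(0.975)

  # Each column is one simulated sample of size n.
  x <- matrix(rpois(n * reps, lambda), nrow = n, ncol = reps)
  xbar <- colMeans(x)
  s2 <- apply(x, 2, var)

  # Half-widths: part (e) plugs in the sample variance,
  # part (f) uses the Poisson fact that variance equals mean.
  half_e <- z * sqrt(s2 / n)
  half_f <- z * sqrt(xbar / n)

  # Fraction of intervals that contain the true lambda.
  covers <- function(half) mean(xbar - half <= lambda & lambda <= xbar + half)

  data.frame(
    interval    = c("e", "f"),
    coverage    = c(covers(half_e), covers(half_f)),
    mean_length = c(mean(2 * half_e), mean(2 * half_f)),
    sd_length   = c(sd(2 * half_e), sd(2 * half_f))
  )
}

res <- compare_poisson_cis()
print(res)
```

The length of interval (f) depends on the data only through sqrt(xbar), while the length of interval (e) depends on sqrt(s2). Since s2 is far more variable than xbar for Poisson data, the sd_length column should be much smaller for (f), in line with the conclusion of part (g).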

Q3:

(a) The regression output from R is attached below. About 29.49% of the variability in PIQ is accounted for by a person's brain size, height and weight.

> summary(regfit)
Call:
lm(formula = PIQ ~ MRI + Height + Weight, data = piq)

Residuals: Min, 1Q, Median, 3Q, Max [values missing]

Coefficients (columns: Estimate, Std. Error, t value, Pr(>|t|); only fragments of the values survive):
(Intercept)  1.114e[...]
MRI          2.060e[...]   ***
Height       [missing]     *
Weight       5.599e[...]
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: [missing] on 34 degrees of freedom
Multiple R-squared: 0.2949, Adjusted R-squared: [missing]
F-statistic: [missing] on 3 and 34 DF, p-value: [missing]

(b) A person's brain size is significantly associated with PIQ: holding height and weight fixed, a 1-unit increase in brain size (MRI count/10000) is associated with a 2.06-unit increase in PIQ. Height is negatively associated with PIQ: holding the other predictors fixed, each additional inch of height is associated with a 2.73-unit decrease in PIQ. Weight is not significantly associated with PIQ given a person's brain size and height.

(c) The correlation matrix of brain size, height and weight is given below, together with their correlations with PIQ. Height and weight are the most strongly correlated pair, with a correlation of 0.7.

[Table: correlation matrix with rows and columns PIQ, MRI, Height, Weight; the numerical entries are missing.]

The variance inflation factors (VIF) of brain size, height and weight are 1.58, 2.28, and 2.02, respectively. These are all well below 10 and are acceptable. There does not appear to be serious multicollinearity.

(d) The correlations between PIQ and brain size, height, and weight are [values missing], respectively. The partial correlation between PIQ and height given brain size is about -0.42 (from the SSE values in the appendix; p-value = 0.02), and the partial correlation between PIQ and weight given brain size is about -0.24 (p-value = 0.16). Hence, height is the better predictor to add, given that brain size is already included in the model.
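The VIF and partial-correlation calculations in parts (c) and (d) can be illustrated without any add-on package. The sketch below is an assumption-laden example, not the analysis itself: it runs on simulated stand-in data (the data frame sim and every coefficient used to generate it are hypothetical, and the PIQ data set is not used), so its numbers are unrelated to those reported above. It computes each VIF as 1/(1 - R_j^2), checks it against the diagonal of the inverse correlation matrix, and obtains the partial correlation both from the reduction in SSE and from the correlation of residuals.

```r
# Simulated stand-in data (NOT the PIQ data set); all names and
# generating coefficients below are hypothetical.
set.seed(42)
n <- 38
sim <- data.frame(height = rnorm(n, mean = 68, sd = 4))
sim$weight <- 150 + 4 * (sim$height - 68) + rnorm(n, sd = 15)
sim$mri    <- 90 + 0.5 * (sim$height - 68) + rnorm(n, sd = 7)
sim$piq    <- 110 + 1.5 * (sim$mri - 90) - 2 * (sim$height - 68) + rnorm(n, sd = 20)

# VIF of predictor j is 1 / (1 - R_j^2), where R_j^2 comes from
# regressing predictor j on the remaining predictors.
X <- as.matrix(sim[, c("mri", "height", "weight")])
vif_regression <- sapply(colnames(X), function(j) {
  r2 <- summary(lm(X[, j] ~ X[, colnames(X) != j]))$r.squared
  1 / (1 - r2)
})
# Equivalent shortcut: diagonal of the inverse correlation matrix.
vif_inverse <- diag(solve(cor(X)))

# Partial correlation of piq and height given mri, from the SSE reduction.
fit1  <- lm(piq ~ mri, data = sim)
fit12 <- lm(piq ~ mri + height, data = sim)
sse1  <- sum(resid(fit1)^2)
sse12 <- sum(resid(fit12)^2)
r_partial <- sign(coef(fit12)[["height"]]) * sqrt((sse1 - sse12) / sse1)

# The same quantity as a correlation between two sets of residuals.
r_resid <- cor(resid(fit1), resid(lm(height ~ mri, data = sim)))

# Partial F test for adding height: the denominator is the error mean
# square of the larger model, and F equals the squared t statistic.
f_partial <- (sse1 - sse12) / (sse12 / (n - 3))
p_value   <- pf(f_partial, df1 = 1, df2 = n - 3, lower.tail = FALSE)
t_height  <- summary(fit12)$coefficients["height", "t value"]

print(round(rbind(vif_regression, vif_inverse), 3))
print(c(r_partial = r_partial, r_resid = r_resid))
print(c(f_partial = f_partial, t_squared = t_height^2, p_value = p_value))
```

The two VIF rows should agree, the two partial correlations should agree, and the partial F statistic should equal the squared t statistic for height in the two-predictor model.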

Appendix:

###### Bonus Question 2
## Part (a)
lambda=20
## simulate 1000 Poisson samples of size 30, one sample per column
xn=matrix(rpois(30*1000,lambda),ncol=1000,nrow=30)
xbar=apply(xn,2,mean)
s2=apply(xn,2,var)
par(mfrow=c(1,2))
hist(xbar,main="Histogram of Sample Means")
abline(v=lambda,col='red')
hist(s2,main="Histograms of Sample Variances")
abline(v=lambda,col='red')

## Part (b)
mean(xbar)
var(xbar)
MSExbar=(mean(xbar)-lambda)^2+var(xbar)
MSExbar

## Part (c)
mean(s2)
var(s2)
MSEs2=(mean(s2)-lambda)^2+var(s2)
MSEs2

## Part (d)
## The comparison under different lambda is done below, after part (g).

## Part (e): CI based on the sample variance
CIe=cbind(xbar-1.96*sqrt(s2)/sqrt(30),xbar+1.96*sqrt(s2)/sqrt(30))
head(CIe)
## number of intervals that miss the true lambda
sum(CIe[,1]>lambda)+sum(CIe[,2]<lambda)

## Part (f): CI using the Poisson property variance = mean
CIf=cbind(xbar-1.96*sqrt(xbar)/sqrt(30),xbar+1.96*sqrt(xbar)/sqrt(30))
head(CIf)
sum(CIf[,1]>lambda)+sum(CIf[,2]<lambda)

## Part (g): mean and standard deviation of the interval lengths
mean(CIe[,2]-CIe[,1])
sd(CIe[,2]-CIe[,1])
mean(CIf[,2]-CIf[,1])
sd(CIf[,2]-CIf[,1])
par(mfrow=c(1,1),pch=19)
plot(CIe[,1],CIe[,2],xlab="Lower limit",ylab="Upper limit",
     main="Confidence Intervals for lambda")
points(CIf[,1],CIf[,2],col="blue")
abline(v=lambda,col='red',lty=2)
abline(h=lambda,col='red',lty=2)
legend("topleft",c("CIs in (e)","CIs in (f)"),col=c("black","blue"),pch=19)

## Part (d)
bias=c()
variance=c()
mse=c()
for (lambda in c(20,30,50,100)){
  ## simulate 1000 samples
  xn=matrix(rpois(30*1000,lambda),ncol=1000,nrow=30)
  ## record sample mean and sample variance of each sample for all 1000 samples
  xbar=apply(xn,2,mean)
  s2=apply(xn,2,var)
  bias=cbind(bias,c(mean(xbar)-lambda,mean(s2)-lambda))
  variance=cbind(variance,c(var(xbar),var(s2)))
  ## the MSE must be computed inside the loop, once per lambda
  MSExbar=(mean(xbar)-lambda)^2+var(xbar)
  MSEs2=(mean(s2)-lambda)^2+var(s2)
  mse=cbind(mse,c(MSExbar,MSEs2))
}
mysummary=rbind(bias,variance,mse)
colnames(mysummary)=c(20,30,50,100)
rownames(mysummary)=c("Bias:xbar","Bias:s2","Var:xbar","Var:s2","MSE:xbar","MSE:s2")
print(mysummary,digits=2)

###### Bonus Question 3
## Part (a)
piq=read.table("c:/data/work/teaching/math3200/tamhane_data/tamhane_data/ascii/Chapt11/Ex11_3.txt",header=T)
head(piq)
regfit=lm(PIQ~MRI+Height+Weight,data=piq)
summary(regfit)

## Part (c)
print(cor(piq),digits=3)
library(car)   # provides vif()
vif(regfit)

## Part (d)
## x1 = brain size (MRI), x2 = height, x3 = weight
SSEx1=16198
SSEx1x2=13322
SSEx1x3=15258
n=dim(piq)[1]
## Note: the standard partial F statistic divides by the error mean square of
## the larger model, e.g. SSEx1x2/(n-3). The expressions below divide by
## SSEx1/(n-3) instead, which gives a smaller F and a larger p-value.
(r12=-sqrt((SSEx1-SSEx1x2)/SSEx1))
(f2=(SSEx1-SSEx1x2)/(SSEx1/(n-3)))
1-pf(f2,df1=1,df2=n-3)
(r13=-sqrt((SSEx1-SSEx1x3)/SSEx1))
(f3=(SSEx1-SSEx1x3)/(SSEx1/(n-3)))
1-pf(f3,df1=1,df2=n-3)
