Lab 2 - Simulation Study
Matthew T. Pratola 1, Jervyn Ang 1 and Andrew MacDougal 2

February 5,

1 Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC, Canada
2 Department of Earth Sciences, Simon Fraser University, Burnaby, BC, Canada
Introduction

In this lab, we investigate the behaviour of parameter estimation for spatial data modeled with Exponential and Gaussian variogram models. Two realizations of such random fields are shown in Figure 1 and Figure 2. The n = 143 data locations were the same as the parana dataset analyzed in Lab 1. Unless otherwise indicated, the parameters used in this report were σ² = 0.9, τ² = 0.1, µ = 0 and φ = 50.

Figure 1: Realization of a random field from Exponential covariance model with σ² = 0.9, τ² = 0.1, µ = 0 and φ = 50.

Figure 2: Realization of a random field from Gaussian covariance model with σ² = 0.9, τ² = 0.1, µ = 0 and φ = 50.

Effect of Sample Size

We investigated the effects of 4 different sample sizes on parameter estimates by performing a simulation study. In the study, the 4 sample size settings correspond to 100%, 80%, 60% and 40% of the 143 data locations. The smaller samples were randomly selected from the full dataset to form four observed datasets of size n = 143, n80 = 114, n60 = 86 and n40 = 57 respectively. Then, for each dataset, the Exponential and Gaussian models were fit using maximum likelihood (ML), with the true parameter settings as the starting values. This entire procedure was repeated 1,000 times to form the empirical distributions shown in the following boxplots.

Figure 3 shows the empirical distributions of the parameter estimates when the Exponential model is fit to simulated Exponential data, while Figure 4 shows the empirical distributions of the parameter estimates when the Gaussian model is fit to simulated Gaussian data. These results seem to indicate that the estimate of µ is well determined even at smaller sample sizes for both models. The distributions of τ², σ² and φ show an increase in variance as the sample size decreases. There is also some skewness, most notably for the τ² distribution in the Exponential case shown in Figure 3.

The effect on prediction is summarized by the first two lines of Table 1, which lists the squared prediction errors for the sites removed from the original data set. These results show that the Exponential model, which does not have the added constraint of smoothness, shows larger prediction errors as compared to the Gaussian model. However, the relative difference in prediction error between both models decreased as the sample size decreased.
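For reference, the semivariograms compared in this study can be written as γ(h) = τ² + σ²(1 − ρ(h)), where ρ(h) = exp(−h/φ) for the Exponential model and ρ(h) = exp(−(h/φ)²) for the Gaussian model (the correlation functions geoR uses for cov.model="exp" and "gaussian"). A small base-R sketch with our own illustrative function names, independent of geoR:

```r
# Semivariogram gamma(h) = tau2 + sigma2*(1 - rho(h)) for h > 0, with gamma(0) = 0.
rho.exp   <- function(h, phi) exp(-h/phi)
rho.gauss <- function(h, phi) exp(-(h/phi)^2)

semivar <- function(h, sigma2, tau2, phi, rho) {
  ifelse(h == 0, 0, tau2 + sigma2*(1 - rho(h, phi)))
}

# Report parameters: sigma2 = 0.9, tau2 = 0.1, phi = 50.
h <- c(5, 25, 50, 100, 300)
semivar(h, 0.9, 0.1, 50, rho.exp)   # rises quickly away from the nugget
semivar(h, 0.9, 0.1, 50, rho.gauss) # nearly flat near the origin: smoother fields
# Both approach the sill tau2 + sigma2 = 1 as h grows.
```

The flat near-origin behaviour of the Gaussian semivariogram is the smoothness constraint referred to in the prediction-error comparison above.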
Figure 3: Summary of parameter estimates for fitting Exponential model to Exponential data. True parameter values shown by dashed line.

Figure 4: Summary of parameter estimates for fitting Gaussian model to Gaussian data. True parameter values shown by dashed line.

Model                      80%     60%     40%
Exp. data, Exp. model
Gauss data, Gauss model
Exp. data, Gauss model
Gauss data, Exp. model
Gauss data, τ² = 0.4
Gauss data, τ² = 0.1

Table 1: Average squared prediction errors estimated by cross-validation using the hold-out set for the three reduced sample sizes and averaged over 1,000 simulations. The first two lines summarize the error when fitting the correct model, while lines 3-4 summarize model mis-specification results. The final two lines summarize the error for a large nugget (τ² = 0.4) and small nugget (τ² = 0.1) for the Gaussian data/model case.

Fitting the wrong variogram model

The next experiment involved fitting the Exponential model to data simulated from a Gaussian model, and fitting the Gaussian model to data simulated from an Exponential model. The results are shown in Figure 5 and Figure 6 respectively. Here again µ is well estimated for both mis-specified models and at all the sample sizes investigated. When the Exponential model was mis-specified as Gaussian, Figure 5 shows the τ² parameter was overestimated and the σ² parameter underestimated at all sample sizes. The φ parameter did not appear to be affected by this mis-specification. In contrast, when the Gaussian model was mis-specified as Exponential, Figure 6 shows the τ² parameter was underestimated while σ² was not seriously affected. Instead, the φ estimate showed a tendency to be over-estimated at large sample sizes with this mis-specification, although this effect is reduced as the sample size decreases. In terms of prediction, comparing lines 1,3 and 2,4 of Table 1 shows that fitting the Gaussian model to Exponential data only moderately affected the prediction errors for the hold-out set, while fitting
the Exponential model to Gaussian data led to prediction errors that were around 2 times greater than when fitting the Gaussian model. We also investigated model mis-specification when the range parameter was increased to φ = 100, resulting in smoother realized fields. The results of this analysis showed the same patterns just described, although they became more pronounced (see plots attached in Appendix A). In summary, the results in this section suggest the Gaussian model is more robust when fit to data that may have been generated by other models.

Figure 5: Summary of parameter estimates for fitting Exponential and Gaussian model to Exponential data. True parameter values shown by dashed line.

Figure 6: Summary of parameter estimates for fitting Gaussian and Exponential model to Gaussian data. True parameter values shown by dashed line.

Effect of Nugget

Our final experiment investigated two settings of the nugget parameter. We compared our original setting of τ² = 0.1 to a larger nugget setting given by τ² = 0.4 for the Gaussian case. The empirical distributions of the simulation study are in Figure 7. The larger nugget did not noticeably impact the empirical distributions of the µ parameter. The distributions of the τ² parameter showed the greatest impact, especially at smaller sample sizes where the variance of the estimate is noticeably larger when estimating the big nugget (0.4) as opposed to the small nugget (0.1). The empirical distributions of σ² and φ suggest greater parameter estimate uncertainty when modeling with the big nugget, but the effect is not very pronounced, nor conclusive.

A possible interpretation for the increased uncertainty in τ² is that as the nugget increases, it's harder to determine what is the functional signal versus noise, and this will in turn affect the range parameter because either we have a really wavy surface with short range and smaller observational error or a smooth surface with long range and large observational error. It would seem plausible that this tradeoff becomes more problematic for smaller n.

Figure 7: Summary of parameter estimates for fitting Gaussian model to Gaussian-generated random field with small nugget (τ² = 0.1) and big nugget (τ² = 0.4). True parameter values shown by dashed line. Large nugget value shown by dotted line in τ² panel.
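The two settings can also be compared through the relative nugget effect τ²/(τ² + σ²), the fraction of total variance attributed to observational noise. A quick base-R check (rel.nugget is our own illustrative helper, not a geoR function):

```r
# Relative nugget effect: fraction of the total variance (nugget + partial sill)
# that is attributed to the nugget, i.e. to observational noise.
rel.nugget <- function(tau2, sigma2) tau2 / (tau2 + sigma2)

rel.nugget(0.1, 0.9) # small-nugget setting: 0.1 (10% noise)
rel.nugget(0.4, 0.9) # big-nugget setting: about 0.31 (31% noise)
```

Under the big-nugget setting roughly three times as much of the total variance is noise, consistent with the harder signal-versus-noise attribution described above.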
Appendix A: Model Mis-specification with larger range parameter (φ = 100)

Figure 8: Realization of a random field from Exponential covariance model with σ² = 0.9, τ² = 0.1, µ = 0 and φ = 100.

Figure 9: Realization of a random field from Gaussian covariance model with σ² = 0.9, τ² = 0.1, µ = 0 and φ = 100.

Figure 10: Summary of parameter estimates for fitting Exponential and Gaussian model to Exponential data with φ = 100. True parameter values shown by dashed line.

Figure 11: Summary of parameter estimates for fitting Gaussian and Exponential model to Gaussian data with φ = 100. True parameter values shown by dashed line.
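The average squared prediction error summarized in Table 1 is the mean squared difference between the kriged predictions and the simulated values at the held-out sites, averaged over simulation runs. A minimal base-R sketch with illustrative names (avg.sq.pred.error is our own helper, not part of geoR, and the vectors below are toy data, not the report's):

```r
# Average squared prediction error over a hold-out set.
# 'pred' holds kriged predictions at the held-out sites and 'truth'
# the simulated values there.
avg.sq.pred.error <- function(pred, truth) mean((pred - truth)^2)

# Hold-out sizes in this study: 143 - c(114, 86, 57) gives 29, 57 and 86 sites.
avg.sq.pred.error(c(1.0, 2.0, 3.0), c(1.0, 1.5, 3.5)) # toy example
```

In the cross-validation code below this quantity is computed per simulation run and then averaged over the 1,000 runs to produce each table entry.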
Code for Part A

library(geoR,lib.loc="/home/mtpratol/rlibs")
data(parana)
predgrid=expand.grid(seq(0,800,l=51),seq(0,600,l=51))
N=1000

# (a) Trying different covariance models. We'll do gaussian and exponential.
n=c(143,114,86,57) # corresponds to all the data, 80%, 60% and 40% of the data.
params.exp=list(all=matrix(nrow=N,ncol=4),eighty=matrix(nrow=N,ncol=4),sixty=matrix(nrow=N,ncol=4),fourty=matrix(nrow=N,ncol=4))
params.gauss=list(all=matrix(nrow=N,ncol=4),eighty=matrix(nrow=N,ncol=4),sixty=matrix(nrow=N,ncol=4),fourty=matrix(nrow=N,ncol=4))

# Simulation using exponential model
simulated.values.exp<-matrix(0,143,N) # store all simulated values for cross validation after fitting models
pred.indices.exp<-array(0,c(143,4,N)) # store the subsample indices
for(i in 1:N) {
  cat("simulation run ",i,"\r")
  # First, create our new dataset: the true params are mean=0, tausq=.1, sigmasq=.9, phi=50
  sim=grf(143, parana$coords, borders=parana$borders, cov.pars=c(.9,50), nug=.1, cov.model="exp", messages=FALSE)
  parana.new=parana
  parana.new$data=sim$data
  simulated.values.exp[,i]=sim$data
  for(j in 1:4) {
    subsamp=sample(1:143,n[j])
    parana.sub=parana.new
    parana.sub$coords=parana.sub$coords[subsamp,]
    parana.sub$data=parana.sub$data[subsamp]
    ml.exp=likfit(parana.sub, ini=c(.9,50), nug=.1, cov.model="exp", messages=FALSE)
    params.exp[[j]][i,]=ml.exp$parameters.summary$values[1:4] # copy the beta, tausq, sigmasq and phi estimated params.
    pred.indices.exp[1:n[j],j,i]<-subsamp # store the subsample indices
    kcg=krige.conv(parana.sub, locations=predgrid, krige=krige.control(obj.m=ml.exp))
  }
}
save(params.exp,file="params.exp.dat")
save(pred.indices.exp,file="pred.indices.exp.dat")
save(simulated.values.exp,file="sim.vals.exp.dat")

# Simulation using gaussian model
simulated.values.gauss<-matrix(0,143,N) # store all simulated values for cross validation after fitting models
pred.indices.gauss<-array(0,c(143,4,N)) # store the subsample indices
for(i in 1:N) {
  cat("simulation run ",i,"\r")
  # First, create our new dataset: the true params are mean=0, tausq=.1, sigmasq=.9, phi=50
  sim=grf(143, parana$coords, borders=parana$borders, cov.pars=c(.9,50), nug=.1, cov.model="gaussian", messages=FALSE)
  parana.new=parana
  parana.new$data=sim$data
  simulated.values.gauss[,i]<-sim$data
  for(j in 1:4) {
    subsamp=sample(1:143,n[j])
    parana.sub=parana.new
    parana.sub$coords=parana.sub$coords[subsamp,]
    parana.sub$data=parana.sub$data[subsamp]
    ml.exp=likfit(parana.sub, ini=c(.9,50), nug=.1, cov.model="gaussian", messages=FALSE)
    params.gauss[[j]][i,]=ml.exp$parameters.summary$values[1:4] # copy the beta, tausq, sigmasq and phi estimated params.
    pred.indices.gauss[1:n[j],j,i]<-subsamp # store the subsample indices
    kcg=krige.conv(parana.sub, locations=predgrid, krige=krige.control(obj.m=ml.exp))
  }
}
save(params.gauss,file="params.gauss.dat")
save(pred.indices.gauss,file="pred.indices.gauss.dat")
save(simulated.values.gauss,file="sim.vals.gauss.dat")

load(file="params.exp.dat")
load(file="pred.indices.exp.dat")
load(file="sim.vals.exp.dat")
load(file="params.gauss.dat")
load(file="pred.indices.gauss.dat")
load(file="sim.vals.gauss.dat")

# Boxplot for fitting Exponential model to Exponential GRF over 1000 simulations with varying sample sizes.
boxplot(params.exp$all[,1],params.exp$eighty[,1],params.exp$sixty[,1],params.exp$fourty[,1],varwidth=TRUE,notch=TRUE,names=c("100%","80%","60%","40%"),main="mu estimate for Exponential")
abline(h=0,lty=2,col="grey")
boxplot(params.exp$all[,2],params.exp$eighty[,2],params.exp$sixty[,2],params.exp$fourty[,2],varwidth=TRUE,notch=TRUE,names=c("100%","80%","60%","40%"),main="tau2 estimate for Exponential")
abline(h=0.1,lty=2,col="grey")
boxplot(params.exp$all[,3],params.exp$eighty[,3],params.exp$sixty[,3],params.exp$fourty[,3],varwidth=TRUE,notch=TRUE,names=c("100%","80%","60%","40%"),main="s2 estimate for Exponential")
abline(h=0.9,lty=2,col="grey")
boxplot(params.exp$all[,4],params.exp$eighty[,4],params.exp$sixty[,4],params.exp$fourty[,4],varwidth=TRUE,notch=TRUE,names=c("100%","80%","60%","40%"),main="phi estimate for Exponential")
abline(h=50,lty=2,col="grey")

# Boxplot for fitting Gaussian model to Gaussian GRF over 1000 simulations with varying sample sizes.
boxplot(params.gauss$all[,1],params.gauss$eighty[,1],params.gauss$sixty[,1],params.gauss$fourty[,1],varwidth=TRUE,notch=TRUE,names=c("100%","80%","60%","40%"),main="mu estimate for Gaussian")
abline(h=0,lty=2,col="grey")
boxplot(params.gauss$all[,2],params.gauss$eighty[,2],params.gauss$sixty[,2],params.gauss$fourty[,2],varwidth=TRUE,notch=TRUE,names=c("100%","80%","60%","40%"),main="tau2 estimate for Gaussian")
abline(h=.1,lty=2,col="grey")
boxplot(params.gauss$all[,3],params.gauss$eighty[,3],params.gauss$sixty[,3],params.gauss$fourty[,3],varwidth=TRUE,notch=TRUE,names=c("100%","80%","60%","40%"),main="s2 estimate for Gaussian")
abline(h=.9,lty=2,col="grey")
boxplot(params.gauss$all[,4],params.gauss$eighty[,4],params.gauss$sixty[,4],params.gauss$fourty[,4],varwidth=TRUE,notch=TRUE,names=c("100%","80%","60%","40%"),main="phi estimate for Gaussian")
abline(h=50,lty=2,col="grey")

Code for Part B

library(geoR,lib.loc="/home/mtpratol/rlibs")
data(parana)
predgrid=expand.grid(seq(0,800,l=51),seq(0,600,l=51))
N=1000
n=c(143,114,86,57) # corresponds to all the data, 80%, 60% and 40% of the data.
# (b) Now fit the _wrong_ covariance model. We'll do gaussian and exponential.
params.exp.fit.gauss=list(all=matrix(nrow=N,ncol=4),eighty=matrix(nrow=N,ncol=4),sixty=matrix(nrow=N,ncol=4),fourty=matrix(nrow=N,ncol=4))
params.gauss.fit.exp=list(all=matrix(nrow=N,ncol=4),eighty=matrix(nrow=N,ncol=4),sixty=matrix(nrow=N,ncol=4),fourty=matrix(nrow=N,ncol=4))

# Simulation using exponential model BUT fit gaussian
simulated.values.exp.fit.gauss<-matrix(0,143,N) # store all simulated values for cross validation after fitting models
pred.indices.exp.fit.gauss<-array(0,c(143,4,N)) # store the subsample indices
i=1
while(i <= N) {
  cat("simulation run ",i,"\r")
  # First, create our new dataset: the true params are mean=0, tausq=.1, sigmasq=.9, phi=50
  sim=grf(143, parana$coords, borders=parana$borders, cov.pars=c(.9,50), nug=.1, cov.model="exp", messages=FALSE)
  parana.new=parana
  parana.new$data=sim$data
  simulated.values.exp.fit.gauss[,i]<-sim$data
  ok=TRUE
  for(j in 1:4) {
    subsamp=sample(1:143,n[j])
    parana.sub=parana.new
    parana.sub$coords=parana.sub$coords[subsamp,]
    parana.sub$data=parana.sub$data[subsamp]
    ml.exp=try(likfit(parana.sub, ini=c(.9,50), nug=.1, cov.model="gaussian", messages=FALSE)) # capture singularities, repeat...
    if(inherits(ml.exp,"try-error")) {
      ok=FALSE # singular fit: discard this run and repeat it
      break
    }
    params.exp.fit.gauss[[j]][i,]=ml.exp$parameters.summary$values[1:4] # copy the beta, tausq, sigmasq and phi estimated params.
    pred.indices.exp.fit.gauss[1:n[j],j,i]<-subsamp # store the subsample indices
    kcg=krige.conv(parana.sub, locations=predgrid, krige=krige.control(obj.m=ml.exp))
  }
  if(ok) i=i+1
}
save(params.exp.fit.gauss,file="params.exp.fit.gauss.dat")
save(pred.indices.exp.fit.gauss,file="pred.indices.exp.fit.gauss.dat")
save(simulated.values.exp.fit.gauss,file="sim.vals.exp.fit.gauss.dat")

# Simulation using gaussian model BUT fit exponential
simulated.values.gauss.fit.exp<-matrix(0,143,N) # store all simulated values for cross validation after fitting models
pred.indices.gauss.fit.exp<-array(0,c(143,4,N)) # store the subsample indices
for(i in 1:N) {
  cat("simulation run ",i,"\r")
  # First, create our new dataset: the true params are mean=0, tausq=.1, sigmasq=.9, phi=50
  sim=grf(143, parana$coords, borders=parana$borders, cov.pars=c(.9,50), nug=.1, cov.model="gaussian", messages=FALSE)
  parana.new=parana
  parana.new$data=sim$data
  simulated.values.gauss.fit.exp[,i]<-sim$data # store all simulated values
  for(j in 1:4) {
    subsamp=sample(1:143,n[j])
    parana.sub=parana.new
    parana.sub$coords=parana.sub$coords[subsamp,]
    parana.sub$data=parana.sub$data[subsamp]
    ml.exp=likfit(parana.sub, ini=c(.9,50), nug=.1, cov.model="exp", messages=FALSE)
    params.gauss.fit.exp[[j]][i,]=ml.exp$parameters.summary$values[1:4] # copy the beta, tausq, sigmasq and phi estimated params.
    pred.indices.gauss.fit.exp[1:n[j],j,i]<-subsamp # store the subsample indices
    kcg=krige.conv(parana.sub, locations=predgrid, krige=krige.control(obj.m=ml.exp))
  }
}
save(params.gauss.fit.exp,file="params.gauss.fit.exp.dat")
save(pred.indices.gauss.fit.exp,file="pred.indices.gauss.fit.exp.dat")
save(simulated.values.gauss.fit.exp,file="sim.vals.gauss.fit.exp.dat")

load(file="params.exp.fit.gauss.dat")
load(file="pred.indices.exp.fit.gauss.dat")
load(file="sim.vals.exp.fit.gauss.dat")
load(file="params.gauss.fit.exp.dat")
load(file="pred.indices.gauss.fit.exp.dat")
load(file="sim.vals.gauss.fit.exp.dat")

# Boxplot for fitting Gaussian model to Exponential GRF over 1000 simulations with varying sample sizes.
boxplot(params.exp.fit.gauss$all[,1],params.exp.fit.gauss$eighty[,1],params.exp.fit.gauss$sixty[,1],params.exp.fit.gauss$fourty[,1],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=0,lty=2,col="grey")
boxplot(params.exp.fit.gauss$all[,2],params.exp.fit.gauss$eighty[,2],params.exp.fit.gauss$sixty[,2],params.exp.fit.gauss$fourty[,2],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=0.1,lty=2,col="grey")
boxplot(params.exp.fit.gauss$all[,3],params.exp.fit.gauss$eighty[,3],params.exp.fit.gauss$sixty[,3],params.exp.fit.gauss$fourty[,3],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=0.9,lty=2,col="grey")
boxplot(params.exp.fit.gauss$all[,4],params.exp.fit.gauss$eighty[,4],params.exp.fit.gauss$sixty[,4],params.exp.fit.gauss$fourty[,4],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=50,lty=2,col="grey")

# Boxplot for fitting Exponential model to Gaussian GRF over 1000 simulations with varying sample sizes.
boxplot(params.gauss.fit.exp$all[,1],params.gauss.fit.exp$eighty[,1],params.gauss.fit.exp$sixty[,1],params.gauss.fit.exp$fourty[,1],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=0,lty=2,col="grey")
boxplot(params.gauss.fit.exp$all[,2],params.gauss.fit.exp$eighty[,2],params.gauss.fit.exp$sixty[,2],params.gauss.fit.exp$fourty[,2],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=.1,lty=2,col="grey")
boxplot(params.gauss.fit.exp$all[,3],params.gauss.fit.exp$eighty[,3],params.gauss.fit.exp$sixty[,3],params.gauss.fit.exp$fourty[,3],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=.9,lty=2,col="grey")
boxplot(params.gauss.fit.exp$all[,4],params.gauss.fit.exp$eighty[,4],params.gauss.fit.exp$sixty[,4],params.gauss.fit.exp$fourty[,4],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=50,lty=2,col="grey")

# Combined boxplots for Gaussian generated field
par(omd=c(0.05,1,0,1)) # add 0.05 offset to left margin to fit horizontal labels
boxplot(params.gauss$all[,1],params.gauss.fit.exp$all[,1],params.gauss$eighty[,1],params.gauss.fit.exp$eighty[,1],params.gauss$sixty[,1],params.gauss.fit.exp$sixty[,1],params.gauss$fourty[,1],params.gauss.fit.exp$fourty[,1],horizontal=TRUE)
abline(v=0,lty=2,col="grey")
par(omd=c(0.05,1,0,1)) # add 0.05 offset to left margin to fit horizontal labels
boxplot(params.gauss$all[,2],params.gauss.fit.exp$all[,2],params.gauss$eighty[,2],params.gauss.fit.exp$eighty[,2],params.gauss$sixty[,2],params.gauss.fit.exp$sixty[,2],params.gauss$fourty[,2],params.gauss.fit.exp$fourty[,2],horizontal=TRUE)
abline(v=0.1,lty=2,col="grey")
par(omd=c(0.05,1,0,1)) # add 0.05 offset to left margin to fit horizontal labels
boxplot(params.gauss$all[,3],params.gauss.fit.exp$all[,3],params.gauss$eighty[,3],params.gauss.fit.exp$eighty[,3],params.gauss$sixty[,3],params.gauss.fit.exp$sixty[,3],params.gauss$fourty[,3],params.gauss.fit.exp$fourty[,3],horizontal=TRUE)
abline(v=0.9,lty=2,col="grey")
par(omd=c(0.05,1,0,1)) # add 0.05 offset to left margin to fit horizontal labels
boxplot(params.gauss$all[,4],params.gauss.fit.exp$all[,4],params.gauss$eighty[,4],params.gauss.fit.exp$eighty[,4],params.gauss$sixty[,4],params.gauss.fit.exp$sixty[,4],params.gauss$fourty[,4],params.gauss.fit.exp$fourty[,4],horizontal=TRUE)
abline(v=50,lty=2,col="grey")

# Combined boxplots for Exponential generated field
par(omd=c(0.05,1,0,1)) # add 0.05 offset to left margin to fit horizontal labels
boxplot(params.exp$all[,1],params.exp.fit.gauss$all[,1],params.exp$eighty[,1],params.exp.fit.gauss$eighty[,1],params.exp$sixty[,1],params.exp.fit.gauss$sixty[,1],params.exp$fourty[,1],params.exp.fit.gauss$fourty[,1],horizontal=TRUE)
abline(v=0,lty=2,col="grey")
par(omd=c(0.05,1,0,1)) # add 0.05 offset to left margin to fit horizontal labels
boxplot(params.exp$all[,2],params.exp.fit.gauss$all[,2],params.exp$eighty[,2],params.exp.fit.gauss$eighty[,2],params.exp$sixty[,2],params.exp.fit.gauss$sixty[,2],params.exp$fourty[,2],params.exp.fit.gauss$fourty[,2],horizontal=TRUE)
abline(v=0.1,lty=2,col="grey")
par(omd=c(0.05,1,0,1)) # add 0.05 offset to left margin to fit horizontal labels
boxplot(params.exp$all[,3],params.exp.fit.gauss$all[,3],params.exp$eighty[,3],params.exp.fit.gauss$eighty[,3],params.exp$sixty[,3],params.exp.fit.gauss$sixty[,3],params.exp$fourty[,3],params.exp.fit.gauss$fourty[,3],horizontal=TRUE)
abline(v=0.9,lty=2,col="grey")
par(omd=c(0.05,1,0,1)) # add 0.05 offset to left margin to fit horizontal labels
boxplot(params.exp$all[,4],params.exp.fit.gauss$all[,4],params.exp$eighty[,4],params.exp.fit.gauss$eighty[,4],params.exp$sixty[,4],params.exp.fit.gauss$sixty[,4],params.exp$fourty[,4],params.exp.fit.gauss$fourty[,4],horizontal=TRUE)
abline(v=50,lty=2,col="grey")

Code for Part C

library(geoR,lib.loc="/home/mtpratol/rlibs")
data(parana)
predgrid=expand.grid(seq(0,800,l=51),seq(0,600,l=51))
N=1000
n=c(143,114,86,57) # corresponds to all the data, 80%, 60% and 40% of the data.
# (c) Now play with the nugget, we'll use gaussian model
params.gauss.smallnugget=list(all=matrix(nrow=N,ncol=4),eighty=matrix(nrow=N,ncol=4),sixty=matrix(nrow=N,ncol=4),fourty=matrix(nrow=N,ncol=4))
params.gauss.bignugget=list(all=matrix(nrow=N,ncol=4),eighty=matrix(nrow=N,ncol=4),sixty=matrix(nrow=N,ncol=4),fourty=matrix(nrow=N,ncol=4))

# Simulation using gaussian, small nugget
simulated.values.gauss.small.nugget<-matrix(0,143,N) # store all simulated values for cross validation after fitting models
pred.indices.gauss.small.nugget<-array(0,c(143,4,N)) # store the subsample indices
for(i in 1:N) {
  cat("simulation run ",i,"\r")
  # First, create our new dataset: the true params are mean=0, tausq=.1, sigmasq=.9, phi=50
  sim=grf(143, parana$coords, borders=parana$borders, cov.pars=c(.9,50), nug=.1, cov.model="gaussian", messages=FALSE)
  parana.new=parana
  parana.new$data=sim$data
  simulated.values.gauss.small.nugget[,i]<-sim$data # store all simulated values
  for(j in 1:4) {
    subsamp=sample(1:143,n[j])
    parana.sub=parana.new
    parana.sub$coords=parana.sub$coords[subsamp,]
    parana.sub$data=parana.sub$data[subsamp]
    ml.exp=likfit(parana.sub, ini=c(.9,50), nug=.1, cov.model="gaussian", messages=FALSE)
    params.gauss.smallnugget[[j]][i,]=ml.exp$parameters.summary$values[1:4] # copy the beta, tausq, sigmasq and phi estimated params.
    pred.indices.gauss.small.nugget[1:n[j],j,i]<-subsamp # store the subsample indices
    kcg=krige.conv(parana.sub, locations=predgrid, krige=krige.control(obj.m=ml.exp))
  }
}
save(params.gauss.smallnugget,file="params.gauss.smallnugget.dat")
save(pred.indices.gauss.small.nugget,file="pred.indices.gauss.smallnugget.dat")
save(simulated.values.gauss.small.nugget,file="sim.vals.gauss.smallnugget.dat")

# Simulation using gaussian model, fit with big nugget
simulated.values.gauss.big.nugget<-matrix(0,143,N) # store all simulated values for cross validation after fitting models
pred.indices.gauss.big.nugget<-array(0,c(143,4,N)) # store the subsample indices
for(i in 1:N) {
  cat("simulation run ",i,"\r")
  # First, create our new dataset: the true params are mean=0, tausq=.4, sigmasq=.9, phi=50
  sim=grf(143, parana$coords, borders=parana$borders, cov.pars=c(.9,50), nug=.4, cov.model="gaussian", messages=FALSE)
  parana.new=parana
  parana.new$data=sim$data
  simulated.values.gauss.big.nugget[,i]<-sim$data # store all simulated values
  for(j in 1:4) {
    subsamp=sample(1:143,n[j])
    parana.sub=parana.new
    parana.sub$coords=parana.sub$coords[subsamp,]
    parana.sub$data=parana.sub$data[subsamp]
    ml.exp=likfit(parana.sub, ini=c(.9,50), nug=.6, cov.model="gaussian", messages=FALSE) # nug=.6 starting value to avoid singularities
    params.gauss.bignugget[[j]][i,]=ml.exp$parameters.summary$values[1:4] # copy the beta, tausq, sigmasq and phi estimated params.
    pred.indices.gauss.big.nugget[1:n[j],j,i]<-subsamp # store the subsample indices
    kcg=krige.conv(parana.sub, locations=predgrid, krige=krige.control(obj.m=ml.exp))
  }
}
save(params.gauss.bignugget,file="params.gauss.bignugget.dat")
save(pred.indices.gauss.big.nugget,file="pred.indices.gauss.bignugget.dat")
save(simulated.values.gauss.big.nugget,file="sim.vals.gauss.bignugget.dat")

load(file="params.gauss.smallnugget.dat")
load(file="pred.indices.gauss.smallnugget.dat")
load(file="sim.vals.gauss.smallnugget.dat")
load(file="params.gauss.bignugget.dat")
load(file="pred.indices.gauss.bignugget.dat")
load(file="sim.vals.gauss.bignugget.dat")

# Boxplot for fitting Gaussian model w/ big nugget to Gaussian GRF over 1000 simulations with varying sample sizes.
boxplot(params.gauss.bignugget$all[,1],params.gauss.bignugget$eighty[,1],params.gauss.bignugget$sixty[,1],params.gauss.bignugget$fourty[,1],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=0,lty=2,col="grey")
boxplot(params.gauss.bignugget$all[,2],params.gauss.bignugget$eighty[,2],params.gauss.bignugget$sixty[,2],params.gauss.bignugget$fourty[,2],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=0.4,lty=2,col="grey")
boxplot(params.gauss.bignugget$all[,3],params.gauss.bignugget$eighty[,3],params.gauss.bignugget$sixty[,3],params.gauss.bignugget$fourty[,3],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=0.9,lty=2,col="grey")
boxplot(params.gauss.bignugget$all[,4],params.gauss.bignugget$eighty[,4],params.gauss.bignugget$sixty[,4],params.gauss.bignugget$fourty[,4],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=50,lty=2,col="grey")

# Boxplot for fitting Gaussian model w/ small nugget to Gaussian GRF over 1000 simulations with varying sample sizes.
11 boxplot(params.gauss.smallnugget$all[,1],params.gauss.smallnugget$eighty[,1],params.gauss.smallnugget$sixty[,1],params.gauss.smallnugget$fourty[,1],varwidth=t,notches=t,names=c abline(h=0,lty=2,col="grey") boxplot(params.gauss.smallnugget$all[,2],params.gauss.smallnugget$eighty[,2],params.gauss.smallnugget$sixty[,2],params.gauss.smallnugget$fourty[,2],varwidth=t,notches=t,names=c abline(h=.1,lty=2,col="grey") boxplot(params.gauss.smallnugget$all[,3],params.gauss.smallnugget$eighty[,3],params.gauss.smallnugget$sixty[,3],params.gauss.smallnugget$fourty[,3],varwidth=t,notches=t,names=c abline(h=.9,lty=2,col="grey") boxplot(params.gauss.smallnugget$all[,4],params.gauss.smallnugget$eighty[,4],params.gauss.smallnugget$sixty[,4],params.gauss.smallnugget$fourty[,4],varwidth=t,notches=t,names=c abline(h=50,lty=2,col="grey") Combined boxplots par(omd=c(0.05,1,0,1)) add 0.05 offset to left margin to fit horizon. labels boxplot(params.gauss.bignugget$all[,1],params.gauss.smallnugget$all[,1],params.gauss.bignugget$eighty[,1],params.gauss.smallnugget$eighty[,1],params.gauss.bignugget$sixty[,1],p abline(v=0,lty=2,col="grey") par(omd=c(0.05,1,0,1)) add 0.05 offset to left margin to fit horizon. labels boxplot(params.gauss.bignugget$all[,2],params.gauss.smallnugget$all[,2],params.gauss.bignugget$eighty[,2],params.gauss.smallnugget$eighty[,2],params.gauss.bignugget$sixty[,2],p abline(v=0.1,lty=2,col="grey") abline(v=0.4,lty=3,col="blue") par(omd=c(0.05,1,0,1)) add 0.05 offset to left margin to fit horizon. labels boxplot(params.gauss.bignugget$all[,3],params.gauss.smallnugget$all[,3],params.gauss.bignugget$eighty[,3],params.gauss.smallnugget$eighty[,3],params.gauss.bignugget$sixty[,3],p abline(v=0.9,lty=2,col="grey") par(omd=c(0.05,1,0,1)) add 0.05 offset to left margin to fit horizon. 
labels boxplot(params.gauss.bignugget$all[,4],params.gauss.smallnugget$all[,4],params.gauss.bignugget$eighty[,4],params.gauss.smallnugget$eighty[,4],params.gauss.bignugget$sixty[,4],p abline(v=50,lty=2,col="grey") Code for Cross-validation results Predicted Values at sites left out of parameter optimization: library(geor) data(parana) N= n=c(143,114,86,57) crds=parana$coords (a) Try same covariance models expenential. load("/library/notes/stats/lab2/outputs/sim.vals.exp.dat") load("/library/notes/stats/lab2/outputs/params.exp.dat") load("/library/notes/stats/lab2/outputs/pred.indices.exp.dat") krg.values.exp.fit.exp<-array(0,c(143,3,n)) matrix to hold kriged values dta.exp=simulated.values.exp[,i] for(j in 2:4) smp.exp=pred.indices.exp[,j,i] if(j==2) prm.exp=params.exp$eighty[i,] else if(j==3) prm.exp=params.exp$sixty[i,] else if(j==4) prm.exp=params.exp$fourty[i,] Krige at parana coords for subsampled varogram fits dta.exp.sub=dta.exp[smp.exp] crds.sub=crds[smp.exp,] ep<- krige.conv(coords=crds.sub, data=dta.exp.sub, locations=crds, krige = krige.control(type.krige="ok",cov.model="exp",cov.par=c(prm.exp[3],prm.exp[4]),nug=prm.exp[2],)) krg.values.exp.fit.exp[,j-1,i]=ep$predict save(krg.values.exp.fit.exp,file="krg.values.exp.fit.exp.dat") (aii) Try same covariance models gausian. load("/library/notes/stats/lab2/outputs/sim.vals.gauss.dat") load("/library/notes/stats/lab2/outputs/params.gauss.dat") load("/library/notes/stats/lab2/outputs/pred.indices.gauss.dat") krg.values.gauss.fit.gauss<-array(0,c(143,3,n)) matrix to hold kriged values dta.gauss=simulated.values.gauss[,i] for(j in 2:4) smp.gauss=pred.indices.gauss[,j,i] if(j==2) prm.gauss=params.gauss$eighty[i,] else if(j==3) prm.gauss=params.gauss$sixty[i,] else if(j==4) prm.gauss=params.gauss$fourty[i,] Krige at parana coords for subsampled varogram fits 11
12 dta.gauss.sub=dta.gauss[smp.gauss] crds.sub=crds[smp.gauss,] gp<- krige.conv(coords=crds.sub, data=dta.gauss.sub, locations=crds, krige = krige.control(type.krige="ok",cov.model="gaussian",cov.par=c(prm.gauss[3],prm.gauss[4]),nug=prm.gauss[2],)) krg.values.gauss.fit.gauss[,j-1,i]=gp$predict save(krg.values.gauss.fit.gauss,file="krg.values.gauss.fit.gauss.dat") (b) Now fit the _wrong_ covariance model. We ll do gaussian and exponential. load("/library/notes/stats/lab2/outputs/sim.vals.exp.fit.gauss.dat") load("/library/notes/stats/lab2/outputs/params.exp.fit.gauss.dat") load("/library/notes/stats/lab2/outputs/pred.indices.exp.fit.gauss.dat") krg.values.exp.fit.gauss<-array(0,c(143,3,n)) matrix to hold kriged values dta.exp.fit.gauss=simulated.values.exp.fit.gauss[,i] for(j in 2:4) smp.exp.fit.gauss=pred.indices.exp.fit.gauss[,j,i] if(j==2) prm.exp.fit.gauss=params.exp.fit.gauss$eighty[i,] else if(j==3) prm.exp.fit.gauss=params.exp.fit.gauss$sixty[i,] else if(j==4) prm.exp.fit.gauss=params.exp.fit.gauss$fourty[i,] Krige at parana coords for subsampled varogram fits dta.exp.fit.gauss.sub=dta.exp.fit.gauss[smp.exp.fit.gauss] crds.sub=crds[smp.exp.fit.gauss,] epg<- krige.conv(coords=crds.sub, data=dta.exp.fit.gauss.sub, locations=crds, krige = krige.control(type.krige="ok",cov.model="gaussian",cov.par=c(prm.exp.fit.gauss[3],prm.exp.fit.gauss[4]),nug=prm.exp.fit.gauss[ 2],)) krg.values.exp.fit.gauss[,j-1,i]=epg$predict save(krg.values.exp.fit.gauss,file="krg.values.exp.fit.gauss.dat") Simulation using gaussian model BUT fit exponential load("/library/notes/stats/lab2/outputs/sim.vals.gauss.fit.exp.dat") load("/library/notes/stats/lab2/outputs/params.gauss.fit.exp.dat") load("/library/notes/stats/lab2/outputs/pred.indices.gauss.fit.exp.dat") krg.values.gauss.fit.exp<-array(0,c(143,3,n)) matrix to hold kriged values dta.gauss.fit.exp=simulated.values.gauss.fit.exp[,i] for(j in 2:4) smp.gauss.fit.exp=pred.indices.gauss.fit.exp[,j,i] if(j==2) 
    prm.gauss.fit.exp=params.gauss.fit.exp$eighty[i,]
    else if(j==3) prm.gauss.fit.exp=params.gauss.fit.exp$sixty[i,]
    else if(j==4) prm.gauss.fit.exp=params.gauss.fit.exp$fourty[i,]

    # Krige at parana coords for subsampled variogram fits
    dta.gauss.fit.exp.sub=dta.gauss.fit.exp[smp.gauss.fit.exp]
    crds.sub=crds[smp.gauss.fit.exp,]
    epg<-krige.conv(coords=crds.sub, data=dta.gauss.fit.exp.sub, locations=crds,
        krige=krige.control(type.krige="ok", cov.model="exp",
            cov.pars=c(prm.gauss.fit.exp[3],prm.gauss.fit.exp[4]),
            nugget=prm.gauss.fit.exp[2]))
    krg.values.gauss.fit.exp[,j-1,i]=epg$predict
  }
}
save(krg.values.gauss.fit.exp,file="krg.values.gauss.fit.exp.dat")

# (c) Now play with the nugget; we'll use the Gaussian model
load("/library/notes/stats/lab2/outputs/sim.vals.gauss.smallnugget.dat")
load("/library/notes/stats/lab2/outputs/params.gauss.smallnugget.dat")
load("/library/notes/stats/lab2/outputs/pred.indices.gauss.smallnugget.dat")
krg.values.gauss.smallnugget<-array(0,c(143,3,n))  # array to hold kriged values
for(i in 1:n) {
  dta.gauss.small.nugget=simulated.values.gauss.small.nugget[,i]
  for(j in 2:4) {
    smp.gauss.small.nugget=pred.indices.gauss.small.nugget[,j,i]
    if(j==2) prm.gauss.small.nugget=params.gauss.smallnugget$eighty[i,]
    else if(j==3) prm.gauss.small.nugget=params.gauss.smallnugget$sixty[i,]
    else if(j==4) prm.gauss.small.nugget=params.gauss.smallnugget$fourty[i,]

    # Krige at parana coords for subsampled variogram fits
    dta.gauss.small.nugget.sub=dta.gauss.small.nugget[smp.gauss.small.nugget]
    crds.sub=crds[smp.gauss.small.nugget,]
    gsn<-krige.conv(coords=crds.sub, data=dta.gauss.small.nugget.sub, locations=crds,
        krige=krige.control(type.krige="ok", cov.model="gaussian",
            cov.pars=c(prm.gauss.small.nugget[3],prm.gauss.small.nugget[4]),
            nugget=prm.gauss.small.nugget[2]))
    krg.values.gauss.smallnugget[,j-1,i]=gsn$predict
  }
}
save(krg.values.gauss.smallnugget,file="krg.values.gauss.smallnugget.dat")

# Simulation using the Gaussian model, fit with a big nugget
load("/library/notes/stats/lab2/outputs/sim.vals.gauss.bignugget.dat")
load("/library/notes/stats/lab2/outputs/params.gauss.bignugget.dat")
load("/library/notes/stats/lab2/outputs/pred.indices.gauss.bignugget.dat")
krg.values.gauss.bignugget<-array(0,c(143,3,n))  # array to hold kriged values
for(i in 1:n) {
  dta.gauss.big.nugget=simulated.values.gauss.big.nugget[,i]
  for(j in 2:4) {
    smp.gauss.big.nugget=pred.indices.gauss.big.nugget[,j,i]
    if(j==2) prm.gauss.big.nugget=params.gauss.bignugget$eighty[i,]
    else if(j==3) prm.gauss.big.nugget=params.gauss.bignugget$sixty[i,]
    else if(j==4) prm.gauss.big.nugget=params.gauss.bignugget$fourty[i,]

    # Krige at parana coords for subsampled variogram fits
    dta.gauss.big.nugget.sub=dta.gauss.big.nugget[smp.gauss.big.nugget]
    crds.sub=crds[smp.gauss.big.nugget,]
    gbn<-krige.conv(coords=crds.sub, data=dta.gauss.big.nugget.sub, locations=crds,
        krige=krige.control(type.krige="ok", cov.model="gaussian",
            cov.pars=c(prm.gauss.big.nugget[3],prm.gauss.big.nugget[4]),
            nugget=prm.gauss.big.nugget[2]))
    krg.values.gauss.bignugget[,j-1,i]=gbn$predict
  }
}
save(krg.values.gauss.bignugget,file="krg.values.gauss.bignugget.dat")

# Finding the difference between the known site values and the kriged estimates:
n<-1000
load("krg.values.exp.fit.exp.dat")
load("sim.vals.exp.dat")
load("krg.values.gauss.fit.gauss.dat")
load("sim.vals.gauss.dat")
load("sim.vals.exp.fit.gauss.dat")
load("sim.vals.gauss.fit.exp.dat")
load("sim.vals.gauss.bignugget.dat")
load("sim.vals.gauss.smallnugget.dat")
load("krg.values.exp.fit.gauss.dat")
load("krg.values.gauss.fit.exp.dat")
load("krg.values.gauss.bignugget.dat")
load("krg.values.gauss.smallnugget.dat")

# Exponential data, Exponential fit
pred.values<-krg.values.exp.fit.exp
sim.values.array<-array(0,c(143,3,n))
sim.values.array[,1,]<-simulated.values.exp
sim.values.array[,2,]<-simulated.values.exp
sim.values.array[,3,]<-simulated.values.exp
pred.dev.array<-(pred.values-sim.values.array)
pred.sq.dev<-pred.dev.array^2
pred.sum.sq.dev<-apply(pred.sq.dev,c(2,3),sum)
mean.pred.error.exp<-pred.sum.sq.dev/c(29,57,86)  # held-out sites: 143-114, 143-86, 143-57
cross.valid.exp<-apply(mean.pred.error.exp,1,mean)

# Gaussian data, Gaussian fit
pred.values<-krg.values.gauss.fit.gauss
sim.values.array<-array(0,c(143,3,n))
sim.values.array[,1,]<-simulated.values.gauss
sim.values.array[,2,]<-simulated.values.gauss
sim.values.array[,3,]<-simulated.values.gauss
pred.dev.array<-(pred.values-sim.values.array)
pred.sq.dev<-pred.dev.array^2
pred.sum.sq.dev<-apply(pred.sq.dev,c(2,3),sum)
mean.pred.error.gauss<-pred.sum.sq.dev/c(29,57,86)
cross.valid.gauss<-apply(mean.pred.error.gauss,1,mean)

# Gaussian data, big nugget
pred.values<-krg.values.gauss.bignugget
sim.values.array<-array(0,c(143,3,n))
sim.values.array[,1,]<-simulated.values.gauss.big.nugget
sim.values.array[,2,]<-simulated.values.gauss.big.nugget
sim.values.array[,3,]<-simulated.values.gauss.big.nugget
pred.dev.array<-(pred.values-sim.values.array)
pred.sq.dev<-pred.dev.array^2
pred.sum.sq.dev<-apply(pred.sq.dev,c(2,3),sum)
mean.pred.error.gauss.bignugget<-pred.sum.sq.dev/c(29,57,86)
cross.valid.gauss.bignugget<-apply(mean.pred.error.gauss.bignugget,1,mean)

# Gaussian data, small nugget
pred.values<-krg.values.gauss.smallnugget
sim.values.array<-array(0,c(143,3,n))
sim.values.array[,1,]<-simulated.values.gauss.small.nugget
sim.values.array[,2,]<-simulated.values.gauss.small.nugget
sim.values.array[,3,]<-simulated.values.gauss.small.nugget
pred.dev.array<-(pred.values-sim.values.array)
pred.sq.dev<-pred.dev.array^2
pred.sum.sq.dev<-apply(pred.sq.dev,c(2,3),sum)
mean.pred.error.gauss.smallnugget<-pred.sum.sq.dev/c(29,57,86)
cross.valid.gauss.smallnugget<-apply(mean.pred.error.gauss.smallnugget,1,mean)

# Exponential data, Gaussian fit
pred.values<-krg.values.exp.fit.gauss
sim.values.array<-array(0,c(143,3,n))
sim.values.array[,1,]<-simulated.values.exp.fit.gauss
sim.values.array[,2,]<-simulated.values.exp.fit.gauss
sim.values.array[,3,]<-simulated.values.exp.fit.gauss
pred.dev.array<-(pred.values-sim.values.array)
pred.sq.dev<-pred.dev.array^2
pred.sum.sq.dev<-apply(pred.sq.dev,c(2,3),sum)
mean.pred.error.exp.fit.gauss<-pred.sum.sq.dev/c(29,57,86)
cross.valid.exp.fit.gauss<-apply(mean.pred.error.exp.fit.gauss,1,mean)

# Gaussian data, Exponential fit
pred.values<-krg.values.gauss.fit.exp
sim.values.array<-array(0,c(143,3,n))
sim.values.array[,1,]<-simulated.values.gauss.fit.exp
sim.values.array[,2,]<-simulated.values.gauss.fit.exp
sim.values.array[,3,]<-simulated.values.gauss.fit.exp
pred.dev.array<-(pred.values-sim.values.array)
pred.sq.dev<-pred.dev.array^2
pred.sum.sq.dev<-apply(pred.sq.dev,c(2,3),sum)
mean.pred.error.gauss.fit.exp<-pred.sum.sq.dev/c(29,57,86)
cross.valid.gauss.fit.exp<-apply(mean.pred.error.gauss.fit.exp,1,mean)
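The summary step above sums the squared deviations over sites with apply(), divides each row by its number of held-out sites (143-114=29, 143-86=57 and 143-57=86), and then averages over replicates. A minimal sketch of the same pattern on a toy array (all sizes and names here are illustrative, not the lab's data):

```r
# Toy version of the prediction-error summary: 5 sites, 3 subsample
# settings, 2 simulation replicates.
pred.values <- array(1, c(5, 3, 2))        # pretend kriged predictions
sim.values.array <- array(0, c(5, 3, 2))   # pretend simulated truth

pred.dev.array <- pred.values - sim.values.array
pred.sq.dev <- pred.dev.array^2
# Sum over sites (dimension 1), keeping settings x replicates: a 3 x 2 matrix
pred.sum.sq.dev <- apply(pred.sq.dev, c(2, 3), sum)
# Divide each row by its (toy) number of held-out sites; the vector
# recycles down the columns, matching one divisor per setting
mean.pred.error <- pred.sum.sq.dev / c(1, 2, 4)
# Average over replicates: one cross-validation score per setting
cross.valid <- apply(mean.pred.error, 1, mean)

stopifnot(all(pred.sum.sq.dev == 5))
stopifnot(isTRUE(all.equal(as.numeric(cross.valid), c(5, 2.5, 1.25))))
```

The recycling of c(29,57,86) over a 3-row matrix is what makes each subsample setting get its own divisor.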
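For reference, the ordinary-kriging call used throughout has the following shape. This is a minimal self-contained sketch assuming the geoR package is installed; the simulated field stands in for the lab's datasets, and the parameter values simply echo the report's settings:

```r
library(geoR)
set.seed(1)
# Simulate a small Gaussian random field (placeholder for the lab's data):
# cov.pars = c(sigma^2, phi), nugget = tau^2
sim <- grf(50, cov.model = "exponential", cov.pars = c(0.9, 50), nugget = 0.1)
# Ordinary kriging back onto the observed coordinates
kc <- krige.conv(coords = sim$coords, data = sim$data, locations = sim$coords,
                 krige = krige.control(type.krige = "ok", cov.model = "exp",
                                       cov.pars = c(0.9, 50), nugget = 0.1))
length(kc$predict)  # one prediction per requested location
```

Note that krige.control's argument names are cov.pars and nugget; the abbreviated cov.par and nug in the code above only work through R's partial argument matching.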