Lab 2 - Simulation Study
Matthew T. Pratola 1, Jervyn Ang 1 and Andrew MacDougal 2

February 5,

1 Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC, Canada
2 Department of Earth Sciences, Simon Fraser University, Burnaby, BC, Canada
Introduction

In this lab, we investigate the behaviour of parameter estimation for spatial data modeled with Exponential and Gaussian variogram models. Two realizations of such random fields are shown in Figure 1 and Figure 2. The n = 143 data locations were the same as the parana dataset analyzed in Lab 1. Unless otherwise indicated, the parameters used in this report were σ² = 0.9, τ² = 0.1, µ = 0 and φ = 50.

Figure 1: Realization of a random field from Exponential covariance model with σ² = 0.9, τ² = 0.1, µ = 0 and φ = 50.

Figure 2: Realization of a random field from Gaussian covariance model with σ² = 0.9, τ² = 0.1, µ = 0 and φ = 50.

Effect of Sample Size

We investigated the effects of 4 different sample sizes on parameter estimates by performing a simulation study. In the study, the 4 sample size settings correspond to 100%, 80%, 60% and 40% of the 143 data locations. The smaller samples were randomly selected from the full dataset to form four observed datasets of size n = 143, n80 = 114, n60 = 86 and n40 = 57 respectively. Then, for each dataset, the Exponential and Gaussian models were fit using maximum likelihood (ML), with the true parameter settings as the starting values. This entire procedure was repeated 1,000 times to form the empirical distributions shown in the following boxplots.

Figure 3 shows the empirical distributions of the parameter estimates when the Exponential model is fit to simulated Exponential data, while Figure 4 shows the empirical distributions of the parameter estimates when the Gaussian model is fit to simulated Gaussian data. These results seem to indicate that the estimate of µ is well determined even at smaller sample sizes for both models. The distributions of τ², σ² and φ show an increase in variance as the sample size decreases. There is also some skewness, most notably for the τ² distribution in the Exponential case shown in Figure 3.

The effect on prediction is summarized by the first two lines of Table 1, which lists the squared prediction errors for the sites removed from the original data set. These results show that the Exponential model, which does not have the added constraint of smoothness, shows larger prediction errors as compared to the Gaussian model. However, the relative difference in prediction error between both models decreased as the sample size decreased.
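For reference, the semivariograms compared in this study can be written as γ(h) = τ² + σ²(1 − ρ(h)), where ρ(h) = exp(−h/φ) for the Exponential model and ρ(h) = exp(−(h/φ)²) for the Gaussian model (the correlation functions geoR uses for cov.model="exp" and "gaussian"). A small base-R sketch with our own illustrative function names, independent of geoR:

```r
# Semivariogram gamma(h) = tau2 + sigma2*(1 - rho(h)) for h > 0, with gamma(0) = 0.
rho.exp   <- function(h, phi) exp(-h/phi)
rho.gauss <- function(h, phi) exp(-(h/phi)^2)

semivar <- function(h, sigma2, tau2, phi, rho) {
  ifelse(h == 0, 0, tau2 + sigma2*(1 - rho(h, phi)))
}

# Report parameters: sigma2 = 0.9, tau2 = 0.1, phi = 50.
h <- c(5, 25, 50, 100, 300)
semivar(h, 0.9, 0.1, 50, rho.exp)   # rises quickly away from the nugget
semivar(h, 0.9, 0.1, 50, rho.gauss) # nearly flat near the origin: smoother fields
# Both approach the sill tau2 + sigma2 = 1 as h grows.
```

The flat near-origin behaviour of the Gaussian semivariogram is the smoothness constraint referred to in the prediction-error comparison above.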
Figure 3: Summary of parameter estimates for fitting Exponential model to Exponential data. True parameter values shown by dashed line.

Figure 4: Summary of parameter estimates for fitting Gaussian model to Gaussian data. True parameter values shown by dashed line.

Model                      80%     60%     40%
Exp. data, Exp. model
Gauss data, Gauss model
Exp. data, Gauss model
Gauss data, Exp. model
Gauss data, τ² = 0.4
Gauss data, τ² = 0.1

Table 1: Average squared prediction errors estimated by cross-validation using the hold-out set for the three reduced sample sizes and averaged over 1,000 simulations. The first two lines summarize the error when fitting the correct model, while lines 3-4 summarize model mis-specification results. The final two lines summarize the error for a large nugget (τ² = 0.4) and small nugget (τ² = 0.1) for the Gaussian data/model case.

Fitting the wrong variogram model

The next experiment involved fitting the Exponential model to data simulated from a Gaussian model, and fitting the Gaussian model to data simulated from an Exponential model. The results are shown in Figure 5 and Figure 6 respectively. Here again µ is well estimated for both mis-specified models and at all the sample sizes investigated. When the Exponential model was mis-specified as Gaussian, Figure 5 shows the τ² parameter was overestimated and the σ² parameter underestimated at all sample sizes. The φ parameter did not appear to be affected by this mis-specification. In contrast, when the Gaussian model was mis-specified as Exponential, Figure 6 shows the τ² parameter was underestimated while σ² was not seriously affected. Instead, the φ estimate showed a tendency to be over-estimated at large sample sizes with this mis-specification, although this effect is reduced as the sample size decreases. In terms of prediction, comparing lines 1,3 and 2,4 of Table 1 shows that fitting the Gaussian model to Exponential data only moderately affected the prediction errors for the hold-out set, while fitting
the Exponential model to Gaussian data led to prediction errors that were around 2 times greater than when fitting the Gaussian model. We also investigated model mis-specification when the range parameter was increased to φ = 100, resulting in smoother realized fields. The results of this analysis showed the same patterns just described, although they became more pronounced (see plots attached in Appendix A). In summary, the results in this section suggest the Gaussian model is more robust when fit to data that may have been generated by other models.

Figure 5: Summary of parameter estimates for fitting Exponential and Gaussian model to Exponential data. True parameter values shown by dashed line.

Figure 6: Summary of parameter estimates for fitting Gaussian and Exponential model to Gaussian data. True parameter values shown by dashed line.

Effect of Nugget

Our final experiment investigated two settings of the nugget parameter. We compared our original setting of τ² = 0.1 to a larger nugget setting given by τ² = 0.4 for the Gaussian case. The empirical distributions of the simulation study are in Figure 7. The larger nugget did not noticeably impact the empirical distributions of the µ parameter. The distributions of the τ² parameter showed the greatest impact, especially at smaller sample sizes where the variance of the estimate is noticeably larger when estimating the big nugget (0.4) as opposed to the small nugget (0.1). The empirical distributions of σ² and φ suggest greater parameter estimate uncertainty when modeling with the big nugget, but the effect is not very pronounced, nor conclusive.

A possible interpretation for the increased uncertainty in τ² is that as the nugget increases, it's harder to determine what is the functional signal versus noise, and this will in turn affect the range parameter because either we have a really wavy surface with short range and smaller observational error or a smooth surface with long range and large observational error. It would seem plausible that this tradeoff becomes more problematic for smaller n.

Figure 7: Summary of parameter estimates for fitting Gaussian model to Gaussian-generated random field with small nugget (τ² = 0.1) and big nugget (τ² = 0.4). True parameter values shown by dashed line. Large nugget value shown by dotted line in τ² panel.
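The two settings can also be compared through the relative nugget effect τ²/(τ² + σ²), the fraction of total variance attributed to observational noise. A quick base-R check (rel.nugget is our own illustrative helper, not a geoR function):

```r
# Relative nugget effect: fraction of the total variance (nugget + partial sill)
# that is attributed to the nugget, i.e. to observational noise.
rel.nugget <- function(tau2, sigma2) tau2 / (tau2 + sigma2)

rel.nugget(0.1, 0.9) # small-nugget setting: 0.1 (10% noise)
rel.nugget(0.4, 0.9) # big-nugget setting: about 0.31 (31% noise)
```

Under the big-nugget setting roughly three times as much of the total variance is noise, consistent with the harder signal-versus-noise attribution described above.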
Appendix A: Model Mis-specification with larger range parameter (φ = 100)

Figure 8: Realization of a random field from Exponential covariance model with σ² = 0.9, τ² = 0.1, µ = 0 and φ = 100.

Figure 9: Realization of a random field from Gaussian covariance model with σ² = 0.9, τ² = 0.1, µ = 0 and φ = 100.

Figure 10: Summary of parameter estimates for fitting Exponential and Gaussian model to Exponential data with φ = 100. True parameter values shown by dashed line.

Figure 11: Summary of parameter estimates for fitting Gaussian and Exponential model to Gaussian data with φ = 100. True parameter values shown by dashed line.
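The average squared prediction error summarized in Table 1 is the mean squared difference between the kriged predictions and the simulated values at the held-out sites, averaged over simulation runs. A minimal base-R sketch with illustrative names (avg.sq.pred.error is our own helper, not part of geoR, and the vectors below are toy data, not the report's):

```r
# Average squared prediction error over a hold-out set.
# 'pred' holds kriged predictions at the held-out sites and 'truth'
# the simulated values there.
avg.sq.pred.error <- function(pred, truth) mean((pred - truth)^2)

# Hold-out sizes in this study: 143 - c(114, 86, 57) gives 29, 57 and 86 sites.
avg.sq.pred.error(c(1.0, 2.0, 3.0), c(1.0, 1.5, 3.5)) # toy example
```

In the cross-validation code below this quantity is computed per simulation run and then averaged over the 1,000 runs to produce each table entry.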
Code for Part A

library(geoR,lib.loc="/home/mtpratol/rlibs")
data(parana)
predgrid=expand.grid(seq(0,800,l=51),seq(0,600,l=51))
N=1000

# (a) Trying different covariance models. We'll do gaussian and exponential.
n=c(143,114,86,57) # corresponds to all the data, 80%, 60% and 40% of the data.
params.exp=list(all=matrix(nrow=N,ncol=4),eighty=matrix(nrow=N,ncol=4),sixty=matrix(nrow=N,ncol=4),fourty=matrix(nrow=N,ncol=4))
params.gauss=list(all=matrix(nrow=N,ncol=4),eighty=matrix(nrow=N,ncol=4),sixty=matrix(nrow=N,ncol=4),fourty=matrix(nrow=N,ncol=4))

# Simulation using exponential model
simulated.values.exp<-matrix(0,143,N) # store all simulated values for cross validation after fitting models
pred.indices.exp<-array(0,c(143,4,N)) # store the subsample indices
for(i in 1:N) {
  cat("simulation run ",i,"\r")
  # First, create our new dataset: the true params are mean=0, tausq=.1, sigmasq=.9, phi=50
  sim=grf(143, parana$coords, borders=parana$borders, cov.pars=c(.9,50), nug=.1, cov.model="exp", messages=FALSE)
  parana.new=parana
  parana.new$data=sim$data
  simulated.values.exp[,i]=sim$data
  for(j in 1:4) {
    subsamp=sample(1:143,n[j])
    parana.sub=parana.new
    parana.sub$coords=parana.sub$coords[subsamp,]
    parana.sub$data=parana.sub$data[subsamp]
    ml.exp=likfit(parana.sub, ini=c(.9,50), nug=.1, cov.model="exp", messages=FALSE)
    params.exp[[j]][i,]=ml.exp$parameters.summary$values[1:4] # copy the beta, tausq, sigmasq and phi estimated params.
    pred.indices.exp[1:n[j],j,i]<-subsamp # store the subsample indices
    kcg=krige.conv(parana.sub, locations=predgrid, krige=krige.control(obj.m=ml.exp))
  }
}
save(params.exp,file="params.exp.dat")
save(pred.indices.exp,file="pred.indices.exp.dat")
save(simulated.values.exp,file="sim.vals.exp.dat")

# Simulation using gaussian model
simulated.values.gauss<-matrix(0,143,N) # store all simulated values for cross validation after fitting models
pred.indices.gauss<-array(0,c(143,4,N)) # store the subsample indices
for(i in 1:N) {
  cat("simulation run ",i,"\r")
  # First, create our new dataset: the true params are mean=0, tausq=.1, sigmasq=.9, phi=50
  sim=grf(143, parana$coords, borders=parana$borders, cov.pars=c(.9,50), nug=.1, cov.model="gaussian", messages=FALSE)
  parana.new=parana
  parana.new$data=sim$data
  simulated.values.gauss[,i]<-sim$data
  for(j in 1:4) {
    subsamp=sample(1:143,n[j])
    parana.sub=parana.new
    parana.sub$coords=parana.sub$coords[subsamp,]
    parana.sub$data=parana.sub$data[subsamp]
    ml.exp=likfit(parana.sub, ini=c(.9,50), nug=.1, cov.model="gaussian", messages=FALSE)
    params.gauss[[j]][i,]=ml.exp$parameters.summary$values[1:4] # copy the beta, tausq, sigmasq and phi estimated params.
    pred.indices.gauss[1:n[j],j,i]<-subsamp # store the subsample indices
    kcg=krige.conv(parana.sub, locations=predgrid, krige=krige.control(obj.m=ml.exp))
  }
}
save(params.gauss,file="params.gauss.dat")
save(pred.indices.gauss,file="pred.indices.gauss.dat")
save(simulated.values.gauss,file="sim.vals.gauss.dat")

load(file="params.exp.dat")
load(file="pred.indices.exp.dat")
load(file="sim.vals.exp.dat")
load(file="params.gauss.dat")
load(file="pred.indices.gauss.dat")
load(file="sim.vals.gauss.dat")

# Boxplot for fitting Exponential model to Exponential GRF over 1000 simulations with varying sample sizes.
boxplot(params.exp$all[,1],params.exp$eighty[,1],params.exp$sixty[,1],params.exp$fourty[,1],varwidth=TRUE,notch=TRUE,names=c("100%","80%","60%","40%"),main="mu estimate for Exponential")
abline(h=0,lty=2,col="grey")
boxplot(params.exp$all[,2],params.exp$eighty[,2],params.exp$sixty[,2],params.exp$fourty[,2],varwidth=TRUE,notch=TRUE,names=c("100%","80%","60%","40%"),main="tau2 estimate for Exponential")
abline(h=0.1,lty=2,col="grey")
boxplot(params.exp$all[,3],params.exp$eighty[,3],params.exp$sixty[,3],params.exp$fourty[,3],varwidth=TRUE,notch=TRUE,names=c("100%","80%","60%","40%"),main="s2 estimate for Exponential")
abline(h=0.9,lty=2,col="grey")
boxplot(params.exp$all[,4],params.exp$eighty[,4],params.exp$sixty[,4],params.exp$fourty[,4],varwidth=TRUE,notch=TRUE,names=c("100%","80%","60%","40%"),main="phi estimate for Exponential")
abline(h=50,lty=2,col="grey")

# Boxplot for fitting Gaussian model to Gaussian GRF over 1000 simulations with varying sample sizes.
boxplot(params.gauss$all[,1],params.gauss$eighty[,1],params.gauss$sixty[,1],params.gauss$fourty[,1],varwidth=TRUE,notch=TRUE,names=c("100%","80%","60%","40%"),main="mu estimate for Gaussian")
abline(h=0,lty=2,col="grey")
boxplot(params.gauss$all[,2],params.gauss$eighty[,2],params.gauss$sixty[,2],params.gauss$fourty[,2],varwidth=TRUE,notch=TRUE,names=c("100%","80%","60%","40%"),main="tau2 estimate for Gaussian")
abline(h=.1,lty=2,col="grey")
boxplot(params.gauss$all[,3],params.gauss$eighty[,3],params.gauss$sixty[,3],params.gauss$fourty[,3],varwidth=TRUE,notch=TRUE,names=c("100%","80%","60%","40%"),main="s2 estimate for Gaussian")
abline(h=.9,lty=2,col="grey")
boxplot(params.gauss$all[,4],params.gauss$eighty[,4],params.gauss$sixty[,4],params.gauss$fourty[,4],varwidth=TRUE,notch=TRUE,names=c("100%","80%","60%","40%"),main="phi estimate for Gaussian")
abline(h=50,lty=2,col="grey")

Code for Part B

library(geoR,lib.loc="/home/mtpratol/rlibs")
data(parana)
predgrid=expand.grid(seq(0,800,l=51),seq(0,600,l=51))
N=1000
n=c(143,114,86,57) # corresponds to all the data, 80%, 60% and 40% of the data.
# (b) Now fit the _wrong_ covariance model. We'll do gaussian and exponential.
params.exp.fit.gauss=list(all=matrix(nrow=N,ncol=4),eighty=matrix(nrow=N,ncol=4),sixty=matrix(nrow=N,ncol=4),fourty=matrix(nrow=N,ncol=4))
params.gauss.fit.exp=list(all=matrix(nrow=N,ncol=4),eighty=matrix(nrow=N,ncol=4),sixty=matrix(nrow=N,ncol=4),fourty=matrix(nrow=N,ncol=4))

# Simulation using exponential model BUT fit gaussian
simulated.values.exp.fit.gauss<-matrix(0,143,N) # store all simulated values for cross validation after fitting models
pred.indices.exp.fit.gauss<-array(0,c(143,4,N)) # store the subsample indices
i=1
while(i <= N) {
  cat("simulation run ",i,"\r")
  # First, create our new dataset: the true params are mean=0, tausq=.1, sigmasq=.9, phi=50
  sim=grf(143, parana$coords, borders=parana$borders, cov.pars=c(.9,50), nug=.1, cov.model="exp", messages=FALSE)
  parana.new=parana
  parana.new$data=sim$data
  simulated.values.exp.fit.gauss[,i]<-sim$data
  ok=TRUE
  for(j in 1:4) {
    subsamp=sample(1:143,n[j])
    parana.sub=parana.new
    parana.sub$coords=parana.sub$coords[subsamp,]
    parana.sub$data=parana.sub$data[subsamp]
    ml.exp=try(likfit(parana.sub, ini=c(.9,50), nug=.1, cov.model="gaussian", messages=FALSE)) # capture singularities, repeat...
    if(inherits(ml.exp,"try-error")) {
      ok=FALSE # singular fit: discard this run and repeat it
      break
    }
    params.exp.fit.gauss[[j]][i,]=ml.exp$parameters.summary$values[1:4] # copy the beta, tausq, sigmasq and phi estimated params.
    pred.indices.exp.fit.gauss[1:n[j],j,i]<-subsamp # store the subsample indices
    kcg=krige.conv(parana.sub, locations=predgrid, krige=krige.control(obj.m=ml.exp))
  }
  if(ok) i=i+1
}
save(params.exp.fit.gauss,file="params.exp.fit.gauss.dat")
save(pred.indices.exp.fit.gauss,file="pred.indices.exp.fit.gauss.dat")
save(simulated.values.exp.fit.gauss,file="sim.vals.exp.fit.gauss.dat")

# Simulation using gaussian model BUT fit exponential
simulated.values.gauss.fit.exp<-matrix(0,143,N) # store all simulated values for cross validation after fitting models
pred.indices.gauss.fit.exp<-array(0,c(143,4,N)) # store the subsample indices
for(i in 1:N) {
  cat("simulation run ",i,"\r")
  # First, create our new dataset: the true params are mean=0, tausq=.1, sigmasq=.9, phi=50
  sim=grf(143, parana$coords, borders=parana$borders, cov.pars=c(.9,50), nug=.1, cov.model="gaussian", messages=FALSE)
  parana.new=parana
  parana.new$data=sim$data
  simulated.values.gauss.fit.exp[,i]<-sim$data # store all simulated values
  for(j in 1:4) {
    subsamp=sample(1:143,n[j])
    parana.sub=parana.new
    parana.sub$coords=parana.sub$coords[subsamp,]
    parana.sub$data=parana.sub$data[subsamp]
    ml.exp=likfit(parana.sub, ini=c(.9,50), nug=.1, cov.model="exp", messages=FALSE)
    params.gauss.fit.exp[[j]][i,]=ml.exp$parameters.summary$values[1:4] # copy the beta, tausq, sigmasq and phi estimated params.
    pred.indices.gauss.fit.exp[1:n[j],j,i]<-subsamp # store the subsample indices
    kcg=krige.conv(parana.sub, locations=predgrid, krige=krige.control(obj.m=ml.exp))
  }
}
save(params.gauss.fit.exp,file="params.gauss.fit.exp.dat")
save(pred.indices.gauss.fit.exp,file="pred.indices.gauss.fit.exp.dat")
save(simulated.values.gauss.fit.exp,file="sim.vals.gauss.fit.exp.dat")

load(file="params.exp.fit.gauss.dat")
load(file="pred.indices.exp.fit.gauss.dat")
load(file="sim.vals.exp.fit.gauss.dat")
load(file="params.gauss.fit.exp.dat")
load(file="pred.indices.gauss.fit.exp.dat")
load(file="sim.vals.gauss.fit.exp.dat")

# Boxplot for fitting Gaussian model to Exponential GRF over 1000 simulations with varying sample sizes.
boxplot(params.exp.fit.gauss$all[,1],params.exp.fit.gauss$eighty[,1],params.exp.fit.gauss$sixty[,1],params.exp.fit.gauss$fourty[,1],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=0,lty=2,col="grey")
boxplot(params.exp.fit.gauss$all[,2],params.exp.fit.gauss$eighty[,2],params.exp.fit.gauss$sixty[,2],params.exp.fit.gauss$fourty[,2],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=0.1,lty=2,col="grey")
boxplot(params.exp.fit.gauss$all[,3],params.exp.fit.gauss$eighty[,3],params.exp.fit.gauss$sixty[,3],params.exp.fit.gauss$fourty[,3],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=0.9,lty=2,col="grey")
boxplot(params.exp.fit.gauss$all[,4],params.exp.fit.gauss$eighty[,4],params.exp.fit.gauss$sixty[,4],params.exp.fit.gauss$fourty[,4],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=50,lty=2,col="grey")

# Boxplot for fitting Exponential model to Gaussian GRF over 1000 simulations with varying sample sizes.
boxplot(params.gauss.fit.exp$all[,1],params.gauss.fit.exp$eighty[,1],params.gauss.fit.exp$sixty[,1],params.gauss.fit.exp$fourty[,1],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=0,lty=2,col="grey")
boxplot(params.gauss.fit.exp$all[,2],params.gauss.fit.exp$eighty[,2],params.gauss.fit.exp$sixty[,2],params.gauss.fit.exp$fourty[,2],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=.1,lty=2,col="grey")
boxplot(params.gauss.fit.exp$all[,3],params.gauss.fit.exp$eighty[,3],params.gauss.fit.exp$sixty[,3],params.gauss.fit.exp$fourty[,3],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=.9,lty=2,col="grey")
boxplot(params.gauss.fit.exp$all[,4],params.gauss.fit.exp$eighty[,4],params.gauss.fit.exp$sixty[,4],params.gauss.fit.exp$fourty[,4],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=50,lty=2,col="grey")

# Combined boxplots for Gaussian generated field
par(omd=c(0.05,1,0,1)) # add 0.05 offset to left margin to fit horizontal labels
boxplot(params.gauss$all[,1],params.gauss.fit.exp$all[,1],params.gauss$eighty[,1],params.gauss.fit.exp$eighty[,1],params.gauss$sixty[,1],params.gauss.fit.exp$sixty[,1],params.gauss$fourty[,1],params.gauss.fit.exp$fourty[,1],horizontal=TRUE)
abline(v=0,lty=2,col="grey")
par(omd=c(0.05,1,0,1)) # add 0.05 offset to left margin to fit horizontal labels
boxplot(params.gauss$all[,2],params.gauss.fit.exp$all[,2],params.gauss$eighty[,2],params.gauss.fit.exp$eighty[,2],params.gauss$sixty[,2],params.gauss.fit.exp$sixty[,2],params.gauss$fourty[,2],params.gauss.fit.exp$fourty[,2],horizontal=TRUE)
abline(v=0.1,lty=2,col="grey")
par(omd=c(0.05,1,0,1)) # add 0.05 offset to left margin to fit horizontal labels
boxplot(params.gauss$all[,3],params.gauss.fit.exp$all[,3],params.gauss$eighty[,3],params.gauss.fit.exp$eighty[,3],params.gauss$sixty[,3],params.gauss.fit.exp$sixty[,3],params.gauss$fourty[,3],params.gauss.fit.exp$fourty[,3],horizontal=TRUE)
abline(v=0.9,lty=2,col="grey")
par(omd=c(0.05,1,0,1)) # add 0.05 offset to left margin to fit horizontal labels
boxplot(params.gauss$all[,4],params.gauss.fit.exp$all[,4],params.gauss$eighty[,4],params.gauss.fit.exp$eighty[,4],params.gauss$sixty[,4],params.gauss.fit.exp$sixty[,4],params.gauss$fourty[,4],params.gauss.fit.exp$fourty[,4],horizontal=TRUE)
abline(v=50,lty=2,col="grey")

# Combined boxplots for Exponential generated field
par(omd=c(0.05,1,0,1)) # add 0.05 offset to left margin to fit horizontal labels
boxplot(params.exp$all[,1],params.exp.fit.gauss$all[,1],params.exp$eighty[,1],params.exp.fit.gauss$eighty[,1],params.exp$sixty[,1],params.exp.fit.gauss$sixty[,1],params.exp$fourty[,1],params.exp.fit.gauss$fourty[,1],horizontal=TRUE)
abline(v=0,lty=2,col="grey")
par(omd=c(0.05,1,0,1)) # add 0.05 offset to left margin to fit horizontal labels
boxplot(params.exp$all[,2],params.exp.fit.gauss$all[,2],params.exp$eighty[,2],params.exp.fit.gauss$eighty[,2],params.exp$sixty[,2],params.exp.fit.gauss$sixty[,2],params.exp$fourty[,2],params.exp.fit.gauss$fourty[,2],horizontal=TRUE)
abline(v=0.1,lty=2,col="grey")
par(omd=c(0.05,1,0,1)) # add 0.05 offset to left margin to fit horizontal labels
boxplot(params.exp$all[,3],params.exp.fit.gauss$all[,3],params.exp$eighty[,3],params.exp.fit.gauss$eighty[,3],params.exp$sixty[,3],params.exp.fit.gauss$sixty[,3],params.exp$fourty[,3],params.exp.fit.gauss$fourty[,3],horizontal=TRUE)
abline(v=0.9,lty=2,col="grey")
par(omd=c(0.05,1,0,1)) # add 0.05 offset to left margin to fit horizontal labels
boxplot(params.exp$all[,4],params.exp.fit.gauss$all[,4],params.exp$eighty[,4],params.exp.fit.gauss$eighty[,4],params.exp$sixty[,4],params.exp.fit.gauss$sixty[,4],params.exp$fourty[,4],params.exp.fit.gauss$fourty[,4],horizontal=TRUE)
abline(v=50,lty=2,col="grey")

Code for Part C

library(geoR,lib.loc="/home/mtpratol/rlibs")
data(parana)
predgrid=expand.grid(seq(0,800,l=51),seq(0,600,l=51))
N=1000
n=c(143,114,86,57) # corresponds to all the data, 80%, 60% and 40% of the data.
# (c) Now play with the nugget, we'll use gaussian model
params.gauss.smallnugget=list(all=matrix(nrow=N,ncol=4),eighty=matrix(nrow=N,ncol=4),sixty=matrix(nrow=N,ncol=4),fourty=matrix(nrow=N,ncol=4))
params.gauss.bignugget=list(all=matrix(nrow=N,ncol=4),eighty=matrix(nrow=N,ncol=4),sixty=matrix(nrow=N,ncol=4),fourty=matrix(nrow=N,ncol=4))

# Simulation using gaussian, small nugget
simulated.values.gauss.small.nugget<-matrix(0,143,N) # store all simulated values for cross validation after fitting models
pred.indices.gauss.small.nugget<-array(0,c(143,4,N)) # store the subsample indices
for(i in 1:N) {
  cat("simulation run ",i,"\r")
  # First, create our new dataset: the true params are mean=0, tausq=.1, sigmasq=.9, phi=50
  sim=grf(143, parana$coords, borders=parana$borders, cov.pars=c(.9,50), nug=.1, cov.model="gaussian", messages=FALSE)
  parana.new=parana
  parana.new$data=sim$data
  simulated.values.gauss.small.nugget[,i]<-sim$data # store all simulated values
  for(j in 1:4) {
    subsamp=sample(1:143,n[j])
    parana.sub=parana.new
    parana.sub$coords=parana.sub$coords[subsamp,]
    parana.sub$data=parana.sub$data[subsamp]
    ml.exp=likfit(parana.sub, ini=c(.9,50), nug=.1, cov.model="gaussian", messages=FALSE)
    params.gauss.smallnugget[[j]][i,]=ml.exp$parameters.summary$values[1:4] # copy the beta, tausq, sigmasq and phi estimated params.
    pred.indices.gauss.small.nugget[1:n[j],j,i]<-subsamp # store the subsample indices
    kcg=krige.conv(parana.sub, locations=predgrid, krige=krige.control(obj.m=ml.exp))
  }
}
save(params.gauss.smallnugget,file="params.gauss.smallnugget.dat")
save(pred.indices.gauss.small.nugget,file="pred.indices.gauss.smallnugget.dat")
save(simulated.values.gauss.small.nugget,file="sim.vals.gauss.smallnugget.dat")

# Simulation using gaussian model, fit with big nugget
simulated.values.gauss.big.nugget<-matrix(0,143,N) # store all simulated values for cross validation after fitting models
pred.indices.gauss.big.nugget<-array(0,c(143,4,N)) # store the subsample indices
for(i in 1:N) {
  cat("simulation run ",i,"\r")
  # First, create our new dataset: the true params are mean=0, tausq=.4, sigmasq=.9, phi=50
  sim=grf(143, parana$coords, borders=parana$borders, cov.pars=c(.9,50), nug=.4, cov.model="gaussian", messages=FALSE)
  parana.new=parana
  parana.new$data=sim$data
  simulated.values.gauss.big.nugget[,i]<-sim$data # store all simulated values
  for(j in 1:4) {
    subsamp=sample(1:143,n[j])
    parana.sub=parana.new
    parana.sub$coords=parana.sub$coords[subsamp,]
    parana.sub$data=parana.sub$data[subsamp]
    ml.exp=likfit(parana.sub, ini=c(.9,50), nug=.6, cov.model="gaussian", messages=FALSE) # nug=.6 starting value to avoid singularities
    params.gauss.bignugget[[j]][i,]=ml.exp$parameters.summary$values[1:4] # copy the beta, tausq, sigmasq and phi estimated params.
    pred.indices.gauss.big.nugget[1:n[j],j,i]<-subsamp # store the subsample indices
    kcg=krige.conv(parana.sub, locations=predgrid, krige=krige.control(obj.m=ml.exp))
  }
}
save(params.gauss.bignugget,file="params.gauss.bignugget.dat")
save(pred.indices.gauss.big.nugget,file="pred.indices.gauss.bignugget.dat")
save(simulated.values.gauss.big.nugget,file="sim.vals.gauss.bignugget.dat")

load(file="params.gauss.smallnugget.dat")
load(file="pred.indices.gauss.smallnugget.dat")
load(file="sim.vals.gauss.smallnugget.dat")
load(file="params.gauss.bignugget.dat")
load(file="pred.indices.gauss.bignugget.dat")
load(file="sim.vals.gauss.bignugget.dat")

# Boxplot for fitting Gaussian model w/ big nugget to Gaussian GRF over 1000 simulations with varying sample sizes.
boxplot(params.gauss.bignugget$all[,1],params.gauss.bignugget$eighty[,1],params.gauss.bignugget$sixty[,1],params.gauss.bignugget$fourty[,1],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=0,lty=2,col="grey")
boxplot(params.gauss.bignugget$all[,2],params.gauss.bignugget$eighty[,2],params.gauss.bignugget$sixty[,2],params.gauss.bignugget$fourty[,2],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=0.4,lty=2,col="grey")
boxplot(params.gauss.bignugget$all[,3],params.gauss.bignugget$eighty[,3],params.gauss.bignugget$sixty[,3],params.gauss.bignugget$fourty[,3],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=0.9,lty=2,col="grey")
boxplot(params.gauss.bignugget$all[,4],params.gauss.bignugget$eighty[,4],params.gauss.bignugget$sixty[,4],params.gauss.bignugget$fourty[,4],varwidth=TRUE,notch=TRUE,names=c("all","eighty","sixty","fourty"))
abline(h=50,lty=2,col="grey")

# Boxplot for fitting Gaussian model w/ small nugget to Gaussian GRF over 1000 simulations with varying sample sizes.
11 boxplot(params.gauss.smallnugget$all[,1],params.gauss.smallnugget$eighty[,1],params.gauss.smallnugget$sixty[,1],params.gauss.smallnugget$fourty[,1],varwidth=t,notches=t,names=c abline(h=0,lty=2,col="grey") boxplot(params.gauss.smallnugget$all[,2],params.gauss.smallnugget$eighty[,2],params.gauss.smallnugget$sixty[,2],params.gauss.smallnugget$fourty[,2],varwidth=t,notches=t,names=c abline(h=.1,lty=2,col="grey") boxplot(params.gauss.smallnugget$all[,3],params.gauss.smallnugget$eighty[,3],params.gauss.smallnugget$sixty[,3],params.gauss.smallnugget$fourty[,3],varwidth=t,notches=t,names=c abline(h=.9,lty=2,col="grey") boxplot(params.gauss.smallnugget$all[,4],params.gauss.smallnugget$eighty[,4],params.gauss.smallnugget$sixty[,4],params.gauss.smallnugget$fourty[,4],varwidth=t,notches=t,names=c abline(h=50,lty=2,col="grey") Combined boxplots par(omd=c(0.05,1,0,1)) add 0.05 offset to left margin to fit horizon. labels boxplot(params.gauss.bignugget$all[,1],params.gauss.smallnugget$all[,1],params.gauss.bignugget$eighty[,1],params.gauss.smallnugget$eighty[,1],params.gauss.bignugget$sixty[,1],p abline(v=0,lty=2,col="grey") par(omd=c(0.05,1,0,1)) add 0.05 offset to left margin to fit horizon. labels boxplot(params.gauss.bignugget$all[,2],params.gauss.smallnugget$all[,2],params.gauss.bignugget$eighty[,2],params.gauss.smallnugget$eighty[,2],params.gauss.bignugget$sixty[,2],p abline(v=0.1,lty=2,col="grey") abline(v=0.4,lty=3,col="blue") par(omd=c(0.05,1,0,1)) add 0.05 offset to left margin to fit horizon. labels boxplot(params.gauss.bignugget$all[,3],params.gauss.smallnugget$all[,3],params.gauss.bignugget$eighty[,3],params.gauss.smallnugget$eighty[,3],params.gauss.bignugget$sixty[,3],p abline(v=0.9,lty=2,col="grey") par(omd=c(0.05,1,0,1)) add 0.05 offset to left margin to fit horizon. 
labels boxplot(params.gauss.bignugget$all[,4],params.gauss.smallnugget$all[,4],params.gauss.bignugget$eighty[,4],params.gauss.smallnugget$eighty[,4],params.gauss.bignugget$sixty[,4],p abline(v=50,lty=2,col="grey") Code for Cross-validation results Predicted Values at sites left out of parameter optimization: library(geor) data(parana) N= n=c(143,114,86,57) crds=parana$coords (a) Try same covariance models expenential. load("/library/notes/stats/lab2/outputs/sim.vals.exp.dat") load("/library/notes/stats/lab2/outputs/params.exp.dat") load("/library/notes/stats/lab2/outputs/pred.indices.exp.dat") krg.values.exp.fit.exp<-array(0,c(143,3,n)) matrix to hold kriged values dta.exp=simulated.values.exp[,i] for(j in 2:4) smp.exp=pred.indices.exp[,j,i] if(j==2) prm.exp=params.exp$eighty[i,] else if(j==3) prm.exp=params.exp$sixty[i,] else if(j==4) prm.exp=params.exp$fourty[i,] Krige at parana coords for subsampled varogram fits dta.exp.sub=dta.exp[smp.exp] crds.sub=crds[smp.exp,] ep<- krige.conv(coords=crds.sub, data=dta.exp.sub, locations=crds, krige = krige.control(type.krige="ok",cov.model="exp",cov.par=c(prm.exp[3],prm.exp[4]),nug=prm.exp[2],)) krg.values.exp.fit.exp[,j-1,i]=ep$predict save(krg.values.exp.fit.exp,file="krg.values.exp.fit.exp.dat") (aii) Try same covariance models gausian. load("/library/notes/stats/lab2/outputs/sim.vals.gauss.dat") load("/library/notes/stats/lab2/outputs/params.gauss.dat") load("/library/notes/stats/lab2/outputs/pred.indices.gauss.dat") krg.values.gauss.fit.gauss<-array(0,c(143,3,n)) matrix to hold kriged values dta.gauss=simulated.values.gauss[,i] for(j in 2:4) smp.gauss=pred.indices.gauss[,j,i] if(j==2) prm.gauss=params.gauss$eighty[i,] else if(j==3) prm.gauss=params.gauss$sixty[i,] else if(j==4) prm.gauss=params.gauss$fourty[i,] Krige at parana coords for subsampled varogram fits 11
12 dta.gauss.sub=dta.gauss[smp.gauss] crds.sub=crds[smp.gauss,] gp<- krige.conv(coords=crds.sub, data=dta.gauss.sub, locations=crds, krige = krige.control(type.krige="ok",cov.model="gaussian",cov.par=c(prm.gauss[3],prm.gauss[4]),nug=prm.gauss[2],)) krg.values.gauss.fit.gauss[,j-1,i]=gp$predict save(krg.values.gauss.fit.gauss,file="krg.values.gauss.fit.gauss.dat") (b) Now fit the _wrong_ covariance model. We ll do gaussian and exponential. load("/library/notes/stats/lab2/outputs/sim.vals.exp.fit.gauss.dat") load("/library/notes/stats/lab2/outputs/params.exp.fit.gauss.dat") load("/library/notes/stats/lab2/outputs/pred.indices.exp.fit.gauss.dat") krg.values.exp.fit.gauss<-array(0,c(143,3,n)) matrix to hold kriged values dta.exp.fit.gauss=simulated.values.exp.fit.gauss[,i] for(j in 2:4) smp.exp.fit.gauss=pred.indices.exp.fit.gauss[,j,i] if(j==2) prm.exp.fit.gauss=params.exp.fit.gauss$eighty[i,] else if(j==3) prm.exp.fit.gauss=params.exp.fit.gauss$sixty[i,] else if(j==4) prm.exp.fit.gauss=params.exp.fit.gauss$fourty[i,] Krige at parana coords for subsampled varogram fits dta.exp.fit.gauss.sub=dta.exp.fit.gauss[smp.exp.fit.gauss] crds.sub=crds[smp.exp.fit.gauss,] epg<- krige.conv(coords=crds.sub, data=dta.exp.fit.gauss.sub, locations=crds, krige = krige.control(type.krige="ok",cov.model="gaussian",cov.par=c(prm.exp.fit.gauss[3],prm.exp.fit.gauss[4]),nug=prm.exp.fit.gauss[ 2],)) krg.values.exp.fit.gauss[,j-1,i]=epg$predict save(krg.values.exp.fit.gauss,file="krg.values.exp.fit.gauss.dat") Simulation using gaussian model BUT fit exponential load("/library/notes/stats/lab2/outputs/sim.vals.gauss.fit.exp.dat") load("/library/notes/stats/lab2/outputs/params.gauss.fit.exp.dat") load("/library/notes/stats/lab2/outputs/pred.indices.gauss.fit.exp.dat") krg.values.gauss.fit.exp<-array(0,c(143,3,n)) matrix to hold kriged values dta.gauss.fit.exp=simulated.values.gauss.fit.exp[,i] for(j in 2:4) smp.gauss.fit.exp=pred.indices.gauss.fit.exp[,j,i] if(j==2) 
    prm.gauss.fit.exp=params.gauss.fit.exp$eighty[i,]
    else if(j==3) prm.gauss.fit.exp=params.gauss.fit.exp$sixty[i,]
    else if(j==4) prm.gauss.fit.exp=params.gauss.fit.exp$fourty[i,]

    # Krige at parana coords for subsampled variogram fits
    dta.gauss.fit.exp.sub=dta.gauss.fit.exp[smp.gauss.fit.exp]
    crds.sub=crds[smp.gauss.fit.exp,]
    epg<-krige.conv(coords=crds.sub, data=dta.gauss.fit.exp.sub, locations=crds,
        krige=krige.control(type.krige="ok", cov.model="exp",
            cov.pars=c(prm.gauss.fit.exp[3],prm.gauss.fit.exp[4]),
            nugget=prm.gauss.fit.exp[2]))
    krg.values.gauss.fit.exp[,j-1,i]=epg$predict
  }
}
save(krg.values.gauss.fit.exp,file="krg.values.gauss.fit.exp.dat")

# (c) Now play with the nugget; we'll use the Gaussian model
load("/library/notes/stats/lab2/outputs/sim.vals.gauss.smallnugget.dat")
load("/library/notes/stats/lab2/outputs/params.gauss.smallnugget.dat")
load("/library/notes/stats/lab2/outputs/pred.indices.gauss.smallnugget.dat")
krg.values.gauss.smallnugget<-array(0,c(143,3,n))  # array to hold kriged values
for(i in 1:n) {
  dta.gauss.small.nugget=simulated.values.gauss.small.nugget[,i]
  for(j in 2:4) {
    smp.gauss.small.nugget=pred.indices.gauss.small.nugget[,j,i]
    if(j==2) prm.gauss.small.nugget=params.gauss.smallnugget$eighty[i,]
    else if(j==3) prm.gauss.small.nugget=params.gauss.smallnugget$sixty[i,]
    else if(j==4) prm.gauss.small.nugget=params.gauss.smallnugget$fourty[i,]

    # Krige at parana coords for subsampled variogram fits
    dta.gauss.small.nugget.sub=dta.gauss.small.nugget[smp.gauss.small.nugget]
    crds.sub=crds[smp.gauss.small.nugget,]
    gsn<-krige.conv(coords=crds.sub, data=dta.gauss.small.nugget.sub, locations=crds,
        krige=krige.control(type.krige="ok", cov.model="gaussian",
            cov.pars=c(prm.gauss.small.nugget[3],prm.gauss.small.nugget[4]),
            nugget=prm.gauss.small.nugget[2]))
    krg.values.gauss.smallnugget[,j-1,i]=gsn$predict
  }
}
save(krg.values.gauss.smallnugget,file="krg.values.gauss.smallnugget.dat")

# Simulation using the Gaussian model, fit with a big nugget
load("/library/notes/stats/lab2/outputs/sim.vals.gauss.bignugget.dat")
load("/library/notes/stats/lab2/outputs/params.gauss.bignugget.dat")
load("/library/notes/stats/lab2/outputs/pred.indices.gauss.bignugget.dat")
krg.values.gauss.bignugget<-array(0,c(143,3,n))  # array to hold kriged values
for(i in 1:n) {
  dta.gauss.big.nugget=simulated.values.gauss.big.nugget[,i]
  for(j in 2:4) {
    smp.gauss.big.nugget=pred.indices.gauss.big.nugget[,j,i]
    if(j==2) prm.gauss.big.nugget=params.gauss.bignugget$eighty[i,]
    else if(j==3) prm.gauss.big.nugget=params.gauss.bignugget$sixty[i,]
    else if(j==4) prm.gauss.big.nugget=params.gauss.bignugget$fourty[i,]

    # Krige at parana coords for subsampled variogram fits
    dta.gauss.big.nugget.sub=dta.gauss.big.nugget[smp.gauss.big.nugget]
    crds.sub=crds[smp.gauss.big.nugget,]
    gbn<-krige.conv(coords=crds.sub, data=dta.gauss.big.nugget.sub, locations=crds,
        krige=krige.control(type.krige="ok", cov.model="gaussian",
            cov.pars=c(prm.gauss.big.nugget[3],prm.gauss.big.nugget[4]),
            nugget=prm.gauss.big.nugget[2]))
    krg.values.gauss.bignugget[,j-1,i]=gbn$predict
  }
}
save(krg.values.gauss.bignugget,file="krg.values.gauss.bignugget.dat")

# Finding the difference between the known site values and the kriged estimates:
n<-1000
load("krg.values.exp.fit.exp.dat")
load("sim.vals.exp.dat")
load("krg.values.gauss.fit.gauss.dat")
load("sim.vals.gauss.dat")
load("sim.vals.exp.fit.gauss.dat")
load("sim.vals.gauss.fit.exp.dat")
load("sim.vals.gauss.bignugget.dat")
load("sim.vals.gauss.smallnugget.dat")
load("krg.values.exp.fit.gauss.dat")
load("krg.values.gauss.fit.exp.dat")
load("krg.values.gauss.bignugget.dat")
load("krg.values.gauss.smallnugget.dat")

# Exponential data, Exponential fit
pred.values<-krg.values.exp.fit.exp
sim.values.array<-array(0,c(143,3,n))
sim.values.array[,1,]<-simulated.values.exp
sim.values.array[,2,]<-simulated.values.exp
sim.values.array[,3,]<-simulated.values.exp
pred.dev.array<-(pred.values-sim.values.array)
pred.sq.dev<-pred.dev.array^2
pred.sum.sq.dev<-apply(pred.sq.dev,c(2,3),sum)
mean.pred.error.exp<-pred.sum.sq.dev/c(29,57,86)  # held-out sites: 143-114, 143-86, 143-57
cross.valid.exp<-apply(mean.pred.error.exp,1,mean)

# Gaussian data, Gaussian fit
pred.values<-krg.values.gauss.fit.gauss
sim.values.array<-array(0,c(143,3,n))
sim.values.array[,1,]<-simulated.values.gauss
sim.values.array[,2,]<-simulated.values.gauss
sim.values.array[,3,]<-simulated.values.gauss
pred.dev.array<-(pred.values-sim.values.array)
pred.sq.dev<-pred.dev.array^2
pred.sum.sq.dev<-apply(pred.sq.dev,c(2,3),sum)
mean.pred.error.gauss<-pred.sum.sq.dev/c(29,57,86)
cross.valid.gauss<-apply(mean.pred.error.gauss,1,mean)

# Gaussian data, big nugget
pred.values<-krg.values.gauss.bignugget
sim.values.array<-array(0,c(143,3,n))
sim.values.array[,1,]<-simulated.values.gauss.big.nugget
sim.values.array[,2,]<-simulated.values.gauss.big.nugget
sim.values.array[,3,]<-simulated.values.gauss.big.nugget
pred.dev.array<-(pred.values-sim.values.array)
pred.sq.dev<-pred.dev.array^2
pred.sum.sq.dev<-apply(pred.sq.dev,c(2,3),sum)
mean.pred.error.gauss.bignugget<-pred.sum.sq.dev/c(29,57,86)
cross.valid.gauss.bignugget<-apply(mean.pred.error.gauss.bignugget,1,mean)

# Gaussian data, small nugget
pred.values<-krg.values.gauss.smallnugget
sim.values.array<-array(0,c(143,3,n))
sim.values.array[,1,]<-simulated.values.gauss.small.nugget
sim.values.array[,2,]<-simulated.values.gauss.small.nugget
sim.values.array[,3,]<-simulated.values.gauss.small.nugget
pred.dev.array<-(pred.values-sim.values.array)
pred.sq.dev<-pred.dev.array^2
pred.sum.sq.dev<-apply(pred.sq.dev,c(2,3),sum)
mean.pred.error.gauss.smallnugget<-pred.sum.sq.dev/c(29,57,86)
cross.valid.gauss.smallnugget<-apply(mean.pred.error.gauss.smallnugget,1,mean)

# Exponential data, Gaussian fit
pred.values<-krg.values.exp.fit.gauss
sim.values.array<-array(0,c(143,3,n))
sim.values.array[,1,]<-simulated.values.exp.fit.gauss
sim.values.array[,2,]<-simulated.values.exp.fit.gauss
sim.values.array[,3,]<-simulated.values.exp.fit.gauss
pred.dev.array<-(pred.values-sim.values.array)
pred.sq.dev<-pred.dev.array^2
pred.sum.sq.dev<-apply(pred.sq.dev,c(2,3),sum)
mean.pred.error.exp.fit.gauss<-pred.sum.sq.dev/c(29,57,86)
cross.valid.exp.fit.gauss<-apply(mean.pred.error.exp.fit.gauss,1,mean)

# Gaussian data, Exponential fit
pred.values<-krg.values.gauss.fit.exp
sim.values.array<-array(0,c(143,3,n))
sim.values.array[,1,]<-simulated.values.gauss.fit.exp
sim.values.array[,2,]<-simulated.values.gauss.fit.exp
sim.values.array[,3,]<-simulated.values.gauss.fit.exp
pred.dev.array<-(pred.values-sim.values.array)
pred.sq.dev<-pred.dev.array^2
pred.sum.sq.dev<-apply(pred.sq.dev,c(2,3),sum)
mean.pred.error.gauss.fit.exp<-pred.sum.sq.dev/c(29,57,86)
cross.valid.gauss.fit.exp<-apply(mean.pred.error.gauss.fit.exp,1,mean)
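The summary step above sums the squared deviations over sites with apply(), divides each row by its number of held-out sites (143-114=29, 143-86=57 and 143-57=86), and then averages over replicates. A minimal sketch of the same pattern on a toy array (all sizes and names here are illustrative, not the lab's data):

```r
# Toy version of the prediction-error summary: 5 sites, 3 subsample
# settings, 2 simulation replicates.
pred.values <- array(1, c(5, 3, 2))        # pretend kriged predictions
sim.values.array <- array(0, c(5, 3, 2))   # pretend simulated truth

pred.dev.array <- pred.values - sim.values.array
pred.sq.dev <- pred.dev.array^2
# Sum over sites (dimension 1), keeping settings x replicates: a 3 x 2 matrix
pred.sum.sq.dev <- apply(pred.sq.dev, c(2, 3), sum)
# Divide each row by its (toy) number of held-out sites; the vector
# recycles down the columns, matching one divisor per setting
mean.pred.error <- pred.sum.sq.dev / c(1, 2, 4)
# Average over replicates: one cross-validation score per setting
cross.valid <- apply(mean.pred.error, 1, mean)

stopifnot(all(pred.sum.sq.dev == 5))
stopifnot(isTRUE(all.equal(as.numeric(cross.valid), c(5, 2.5, 1.25))))
```

The recycling of c(29,57,86) over a 3-row matrix is what makes each subsample setting get its own divisor.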
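For reference, the ordinary-kriging call used throughout has the following shape. This is a minimal self-contained sketch assuming the geoR package is installed; the simulated field stands in for the lab's datasets, and the parameter values simply echo the report's settings:

```r
library(geoR)
set.seed(1)
# Simulate a small Gaussian random field (placeholder for the lab's data):
# cov.pars = c(sigma^2, phi), nugget = tau^2
sim <- grf(50, cov.model = "exponential", cov.pars = c(0.9, 50), nugget = 0.1)
# Ordinary kriging back onto the observed coordinates
kc <- krige.conv(coords = sim$coords, data = sim$data, locations = sim$coords,
                 krige = krige.control(type.krige = "ok", cov.model = "exp",
                                       cov.pars = c(0.9, 50), nugget = 0.1))
length(kc$predict)  # one prediction per requested location
```

Note that krige.control's argument names are cov.pars and nugget; the abbreviated cov.par and nug in the code above only work through R's partial argument matching.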