Hierarchical Modelling for Large Spatial Datasets

Size: px
Start display at page:

Download "Hierarchical Modelling for Large Spatial Datasets"

Transcription

1 Hierarchical Modelling for Large Spatial Datasets Sudipto Banerjee 1 and Andrew O. Finley 2 1 Department of Forestry & Department of Geography, Michigan State University, Lansing Michigan, U.S.A. 2 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. July 2,

2 Big N problem Introduction The Big n issue Univariate spatial regression Y = Xβ + w + ɛ, Estimation involves (σ 2 R(φ) + τ 2 I) 1, which is n n. Matrix computations occur in each MCMC iteration. Known as the Big-N problem in geostatistics. Approach: Use a model Y = Xβ + Zw + ɛ. But what Z? 2 TIES 2009 Hierarchical Modeling and Analysis

3 Big N problem Predictive Process Models Consider knots S = {s 1,..., s n } with n << n. Let w = {w(s i )}n i=1 Z(θ) = {cov(w(s i ), w(s j ))} {var(w )} 1 is n n. Predictive process regression model Y = Xβ + Z(θ)w + ɛ, Fitting requires only n n matrix computations (n << n). Key attraction: The above arises as a process model: w(s) GP (0, σ 2 w ρ( ; φ)) instead of w(s). ρ(s 1, s 2 ; φ) = cov(w(s 1 ), w )var(w ) 1 cov(w, w(s 2 )) 3 TIES 2009 Hierarchical Modeling and Analysis

4 Big N problem Selection of knots Knots: A Knotty problem?? Knot selection: Regular grid? More knots near locations we have sampled more? Formal spatial design paradigm: maximize information metrics (Zhu and Stein, 2006; Diggle & Lophaven, 2006) Geometric considerations: space-filling designs (Royle & Nychka, 1998); various clustering algorithms Compare performance of estimation of range and smoothness by varying knot size. Stein (2007, 2008): method may not work for fine-scale spatial data Still a popular choice seamlessly adapts to multivariate and spatiotemporal settings. 4 TIES 2009 Hierarchical Modeling and Analysis

5 Big N problem Selection of knots tauˆ knots 5 TIES 2009 Hierarchical Modeling and Analysis

6 Big N problem Comparisons: Unrectified VS Rectified A rectified predictive process is defined as ɛ(s) indep w ɛ (s) = w(s) + ɛ(s), where N(0, σ 2 w(1 r(s, φ) R 1 (φ)r(s, φ))). Maximum likelihood estimates of τ 2 : # of Knots Predictive Process Rectified Predictive Process exact TIES 2009 Hierarchical Modeling and Analysis

7 from: Banerjee, S., A.O. Finley, P. Waldmann, and T. Ericsson. (2009). Hierarchical multivariate modeling of large spatially referenced genetic trial datasets. Journal of the American Statistical Association. Finley, A.O., S. Banerjee, P. Waldmann, and T. Ericsson. (2008) Hierarchical spatial modeling of additive and dominance genetic variance for large spatial trial datasets. Biometrics. DOI: /j x 7 TIES 2009 Hierarchical Modeling and Analysis

8 Modeling genetic variation in Scots pine (Pinus sylvestris L.), long-term progeny study in northern Sweden. Quantitative genetics: studies the inheritance of polygenic traits, focusing upon estimation of additive genetic variance, σa, 2 and the heritability h 2 = σa/σ 2 T 2 ot, where the σ2 T ot represents the total genetic and unexplained variation. A high heritability, h 2, should result in a larger selection response (i.e., a higher probability for genetic gain in future generations). 8 TIES 2009 Hierarchical Modeling and Analysis

9 Observed trees Data overview: established in 1971 (by Skogforsk) partial diallel design of 52 parent trees Observed height 8,160 planted randomly on 2.2m squares 1997 reinventory of 4,970 surviving trees, height, DBH, branch angle, etc. 9 TIES 2009 Hierarchical Modeling and Analysis

10 Genetic effects model: Y i = x T i β + a i + d i + ɛ i, a = [a i ] n i=1 MV N(0, σ2 aa) d = [d i ] n i=1 MV N(0, σ2 d D) ɛ = [ɛ i ] n i=1 N(0, τ 2 I n ) A and D are fixed relationship matrices (See e.g., Henderson, 1985; Lynch and Walsh,1998) Note, genetic variance is further partitioned into additive and the non-additive dominance component σ 2 d 10 TIES 2009 Hierarchical Modeling and Analysis

11 Genetic effects model: Y i = x T i β + a i + d i + ɛ i, Common feature is systematic heterogeneity among observational units (i.e., violation of ɛ N(0, τ 2 I n )) Spatial heterogeneity arises from: soil characteristics micro-climates light availability Residual correlation among units as a function of distance and/or direction = erroneous parameter estimates (e.g., biased h 2 ) 11 TIES 2009 Hierarchical Modeling and Analysis

12 Genetic model results Parameter credible intervals, 50% (2.5%, 97.5%) for the non-spatial models Scots pine trial. Non-spatial Parameter Add. Add. Dom. β (69.66, 75.08) (70.04, 74.57) σa (18.30, 49.85) (14.12, 43.96) σd (11.24, 40.11) τ (121.18, ) (100.51, ) h (0.12, 0.28) 0.15 (0.09, 0.26) 12 TIES 2009 Hierarchical Modeling and Analysis

13 Genetic model results, cont d. Residual surface Residual semivariogram So, ɛ N(0, τ 2 I n ). Consider a spatial model. 13 TIES 2009 Hierarchical Modeling and Analysis

14 Previous approaches to accommodating residual spatial dependence: Manipulate the mean function constructing covariates using residuals from neighboring units (see e.g., Wilkinson et al., 1983; Besag and Kempton, 1986; Williams, 1986) Geostatitical spatial process formed AR(1) col AR(1) row (Martin, 1990; Cullis et al., 1998) classical geostatistical method (Zimmerman and Harville, 1991) All are computationally feasible, but ad hoc and/or restrictive from a modeling perspective. 14 TIES 2009 Hierarchical Modeling and Analysis

15 Spatial model for genetic trials: Y (s i ) = x T (s i )β + a i + d i + w(s i ) + ɛ i, a = [a i ] n i=1 MV N(0, σ2 aa) d = [d i ] n i=1 MV N(0, σ2 d D) w = [w(s i )] n i=1 MV N(0, σ2 wc(θ)) ɛ = [ɛ i ] n i=1 N(0, τ 2 I n ) Tools used to estimate parameters: Markov chain Monte Carlo (MCMC) - iterative Gibbs sampler (β, a, d, w) Metropolis-Hastings and Slice samplers (θ) Here MCMC is computationally infeasible because of Big-N! 15 TIES 2009 Hierarchical Modeling and Analysis

16 Trick to sample genetic effects: Gibbs draw for random effects, e.g., a MV N(µ a, Σ a ), [ ] 1 where calculating Σ a = 1 A 1 + In σa 2 τ is computationally 2 expensive! However A and D are known, so use initial spectral decomposition i.e., A 1 = P T Λ 1 P. ( 1 Thus, Σ a = P T 1 Λ 1 + I) 1 σa 2 τ P to achieve computational 2 benefits. 16 TIES 2009 Hierarchical Modeling and Analysis

17 Unfortunately, this trick does not work for w. Rather, we proposed the knot-based predictive process. Corresponding predictive process model: Y (s i ) = x T (s i )β + a i + d i + w(s i ) + ɛ i, w(s i) = c(s i; θ) T C(θ) 1 (θ)w where, w = [w(s i )] m i=1 MV N(0, C (θ)) and C (θ) = [C(s i, s j ; θ)] m i,j=1 Projection = 17 TIES 2009 Hierarchical Modeling and Analysis

18 w can accomodate complex spatial dependence structures, E.g., anisotropic Matérn correlation function: ρ(s i, s j ; θ) = ( 1/Γ(ν)2 ν 1) ( 2 νd ij ) ν κ ν (2 νd ij ), where d ij = (s i s j ) T Σ 1 (s i s j ), Σ = G(ψ)Λ 2 G T (ψ). Thus, θ = (ν, ψ, Λ). Simulated Predicted 18 TIES 2009 Hierarchical Modeling and Analysis

19 Genetic + spatial effects models Candidate spatial models (i.e., specifications of C (θ)): 1 AR(1) col AR(1) row 2 isotropic Matérn 3 anisotropic Matérn Each model evaluated using 64, 144, and 256 knot grids. Model choice using Deviance Information Criterion (DIC) (Spiegelhalter et al., 2002) 19 TIES 2009 Hierarchical Modeling and Analysis

20 Table: Model comparisons using the DIC criterion for the Scots pine dataset. Model p D DIC Non-spatial Add , Add. Dom , Spatial Isotropic 64 Knots , Knots , Knots , Spatial Anisotropic 64 Knots , Knots , Knots , TIES 2009 Hierarchical Modeling and Analysis

21 Genetic + spatial effects models results Parameter credible intervals, 50% (2.5%, 97.5%) for the isotropic Matérn and 64 and 256 knots Scots pine trial. Spatial Parameter 64 Knots 256 Knots β (69.00, 76.05) (69.66, 79.66) σa (17.14, 41.82) (18.19, 53.69) σd (6.00, 34.27) (7.65, 27.05) σw (23.71, 73.34) (30.24, 88.10) τ (72.11, 99.65) (67.90, 96.16) ν 0.83 ( 0.31, 1.46) 0.47 (0.26, 1.28) φ 0.05 (0.02, 0.09) 0.04 (0.02, 0.09) Eff. Range (44.66, ) (45.22, ) h (0.13, 0.31) 0.25 (0.15, 0.39) Decrease in τ 2 due to removal of spatial variation, results in increase in h 2 (i.e., 0.25 vs with confounding). 21 TIES 2009 Hierarchical Modeling and Analysis

22 Genetic + spatial effects models results, cont d. Genetic model residuals w(s), 64 knots w(s), 256 knots Predictive process balance model richness with computational feasibility (e.g., 4,970 4,970 vs ). 22 TIES 2009 Hierarchical Modeling and Analysis

23 Summary Summary Challenge - to meet modeling needs: ensure computationally feasible reduce algorithmic complexity = cheap tricks (e.g., spectral decomp. of A prior to MCMC) reduce dimensionality = predictive process maintain richness and flexibility focus on the model not how to estimate the parameters = embrace new tools (MCMC) for estimating highly flexible hierarchical models truly acknowledge sources of uncertainty propagate uncertainty through hierarchical structures (e.g., recognize uncertainty in C(θ)) 23 TIES 2009 Hierarchical Modeling and Analysis

Monte Carlo for Spatial Models

Monte Carlo for Spatial Models Monte Carlo for Spatial Models Murali Haran Department of Statistics Penn State University Penn State Computational Science Lectures April 2007 Spatial Models Lots of scientific questions involve analyzing

More information

A GENERAL GIBBS SAMPLING ALGORITHM FOR ANALYZING LINEAR MODELS USING THE SAS SYSTEM

A GENERAL GIBBS SAMPLING ALGORITHM FOR ANALYZING LINEAR MODELS USING THE SAS SYSTEM A GENERAL GIBBS SAMPLING ALGORITHM FOR ANALYZING LINEAR MODELS USING THE SAS SYSTEM Jayawant Mandrekar, Daniel J. Sargent, Paul J. Novotny, Jeff A. Sloan Mayo Clinic, Rochester, MN 55905 ABSTRACT A general

More information

Bayesian Spatiotemporal Modeling with Hierarchical Spatial Priors for fmri

Bayesian Spatiotemporal Modeling with Hierarchical Spatial Priors for fmri Bayesian Spatiotemporal Modeling with Hierarchical Spatial Priors for fmri Galin L. Jones 1 School of Statistics University of Minnesota March 2015 1 Joint with Martin Bezener and John Hughes Experiment

More information

Markov chain Monte Carlo methods

Markov chain Monte Carlo methods Markov chain Monte Carlo methods (supplementary material) see also the applet http://www.lbreyer.com/classic.html February 9 6 Independent Hastings Metropolis Sampler Outline Independent Hastings Metropolis

More information

Modeling Criminal Careers as Departures From a Unimodal Population Age-Crime Curve: The Case of Marijuana Use

Modeling Criminal Careers as Departures From a Unimodal Population Age-Crime Curve: The Case of Marijuana Use Modeling Criminal Careers as Departures From a Unimodal Population Curve: The Case of Marijuana Use Donatello Telesca, Elena A. Erosheva, Derek A. Kreader, & Ross Matsueda April 15, 2014 extends Telesca

More information

Variability in Annual Temperature Profiles

Variability in Annual Temperature Profiles Variability in Annual Temperature Profiles A Multivariate Spatial Analysis of Regional Climate Model Output Tamara Greasby, Stephan Sain Institute for Mathematics Applied to Geosciences, National Center

More information

1 Methods for Posterior Simulation

1 Methods for Posterior Simulation 1 Methods for Posterior Simulation Let p(θ y) be the posterior. simulation. Koop presents four methods for (posterior) 1. Monte Carlo integration: draw from p(θ y). 2. Gibbs sampler: sequentially drawing

More information

MCMC Methods for data modeling

MCMC Methods for data modeling MCMC Methods for data modeling Kenneth Scerri Department of Automatic Control and Systems Engineering Introduction 1. Symposium on Data Modelling 2. Outline: a. Definition and uses of MCMC b. MCMC algorithms

More information

Statistical Matching using Fractional Imputation

Statistical Matching using Fractional Imputation Statistical Matching using Fractional Imputation Jae-Kwang Kim 1 Iowa State University 1 Joint work with Emily Berg and Taesung Park 1 Introduction 2 Classical Approaches 3 Proposed method 4 Application:

More information

Parallel Multivariate Slice Sampling

Parallel Multivariate Slice Sampling Parallel Multivariate Slice Sampling Matthew M. Tibbits Department of Statistics Pennsylvania State University mmt143@stat.psu.edu Murali Haran Department of Statistics Pennsylvania State University mharan@stat.psu.edu

More information

GiRaF: a toolbox for Gibbs Random Fields analysis

GiRaF: a toolbox for Gibbs Random Fields analysis GiRaF: a toolbox for Gibbs Random Fields analysis Julien Stoehr *1, Pierre Pudlo 2, and Nial Friel 1 1 University College Dublin 2 Aix-Marseille Université February 24, 2016 Abstract GiRaF package offers

More information

Level-set MCMC Curve Sampling and Geometric Conditional Simulation

Level-set MCMC Curve Sampling and Geometric Conditional Simulation Level-set MCMC Curve Sampling and Geometric Conditional Simulation Ayres Fan John W. Fisher III Alan S. Willsky February 16, 2007 Outline 1. Overview 2. Curve evolution 3. Markov chain Monte Carlo 4. Curve

More information

Approximate Bayesian Computation. Alireza Shafaei - April 2016

Approximate Bayesian Computation. Alireza Shafaei - April 2016 Approximate Bayesian Computation Alireza Shafaei - April 2016 The Problem Given a dataset, we are interested in. The Problem Given a dataset, we are interested in. The Problem Given a dataset, we are interested

More information

A spatio-temporal model for extreme precipitation simulated by a climate model.

A spatio-temporal model for extreme precipitation simulated by a climate model. A spatio-temporal model for extreme precipitation simulated by a climate model. Jonathan Jalbert Joint work with Anne-Catherine Favre, Claude Bélisle and Jean-François Angers STATMOS Workshop: Climate

More information

Predictor Selection Algorithm for Bayesian Lasso

Predictor Selection Algorithm for Bayesian Lasso Predictor Selection Algorithm for Baesian Lasso Quan Zhang Ma 16, 2014 1 Introduction The Lasso [1] is a method in regression model for coefficients shrinkage and model selection. It is often used in the

More information

Integrating auxiliary data in optimal spatial design for species distribution mapping

Integrating auxiliary data in optimal spatial design for species distribution mapping Integrating auxiliary data in optimal spatial design for species distribution mapping Brian Reich, Krishna Pacifici and Jon Stallings North Carolina State University Reich + Pacifici + Stallings Optimal

More information

arxiv: v1 [stat.ap] 24 May 2016

arxiv: v1 [stat.ap] 24 May 2016 Submitted to the Annals of Applied Statistics arxiv: arxiv:. BAYESIAN PRINCIPAL COMPONENT REGRESSION MODEL WITH SPATIAL EFFECTS FOR FOREST INVENTORY UNDER SMALL FIELD SAMPLE SIZE arxiv:165.7439v1 [stat.ap]

More information

We deliver Global Engineering Solutions. Efficiently. This page contains no technical data Subject to the EAR or the ITAR

We deliver Global Engineering Solutions. Efficiently. This page contains no technical data Subject to the EAR or the ITAR Numerical Computation, Statistical analysis and Visualization Using MATLAB and Tools Authors: Jamuna Konda, Jyothi Bonthu, Harpitha Joginipally Infotech Enterprises Ltd, Hyderabad, India August 8, 2013

More information

Asreml-R: an R package for mixed models using residual maximum likelihood

Asreml-R: an R package for mixed models using residual maximum likelihood Asreml-R: an R package for mixed models using residual maximum likelihood David Butler 1 Brian Cullis 2 Arthur Gilmour 3 1 Queensland Department of Primary Industries Toowoomba 2 NSW Department of Primary

More information

Graphical Models, Bayesian Method, Sampling, and Variational Inference

Graphical Models, Bayesian Method, Sampling, and Variational Inference Graphical Models, Bayesian Method, Sampling, and Variational Inference With Application in Function MRI Analysis and Other Imaging Problems Wei Liu Scientific Computing and Imaging Institute University

More information

Discovery of the Source of Contaminant Release

Discovery of the Source of Contaminant Release Discovery of the Source of Contaminant Release Devina Sanjaya 1 Henry Qin Introduction Computer ability to model contaminant release events and predict the source of release in real time is crucial in

More information

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 1. Introduction Reddit is one of the most popular online social news websites with millions

More information

Liangjie Hong*, Dawei Yin*, Jian Guo, Brian D. Davison*

Liangjie Hong*, Dawei Yin*, Jian Guo, Brian D. Davison* Tracking Trends: Incorporating Term Volume into Temporal Topic Models Liangjie Hong*, Dawei Yin*, Jian Guo, Brian D. Davison* Dept. of Computer Science and Engineering, Lehigh University, Bethlehem, PA,

More information

Overview. Monte Carlo Methods. Statistics & Bayesian Inference Lecture 3. Situation At End Of Last Week

Overview. Monte Carlo Methods. Statistics & Bayesian Inference Lecture 3. Situation At End Of Last Week Statistics & Bayesian Inference Lecture 3 Joe Zuntz Overview Overview & Motivation Metropolis Hastings Monte Carlo Methods Importance sampling Direct sampling Gibbs sampling Monte-Carlo Markov Chains Emcee

More information

Rolling Markov Chain Monte Carlo

Rolling Markov Chain Monte Carlo Rolling Markov Chain Monte Carlo Din-Houn Lau Imperial College London Joint work with Axel Gandy 4 th July 2013 Predict final ranks of the each team. Updates quick update of predictions. Accuracy control

More information

An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework

An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework IEEE SIGNAL PROCESSING LETTERS, VOL. XX, NO. XX, XXX 23 An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework Ji Won Yoon arxiv:37.99v [cs.lg] 3 Jul 23 Abstract In order to cluster

More information

08 An Introduction to Dense Continuous Robotic Mapping

08 An Introduction to Dense Continuous Robotic Mapping NAVARCH/EECS 568, ROB 530 - Winter 2018 08 An Introduction to Dense Continuous Robotic Mapping Maani Ghaffari March 14, 2018 Previously: Occupancy Grid Maps Pose SLAM graph and its associated dense occupancy

More information

MRF-based Algorithms for Segmentation of SAR Images

MRF-based Algorithms for Segmentation of SAR Images This paper originally appeared in the Proceedings of the 998 International Conference on Image Processing, v. 3, pp. 770-774, IEEE, Chicago, (998) MRF-based Algorithms for Segmentation of SAR Images Robert

More information

CS Introduction to Data Mining Instructor: Abdullah Mueen

CS Introduction to Data Mining Instructor: Abdullah Mueen CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 8: ADVANCED CLUSTERING (FUZZY AND CO -CLUSTERING) Review: Basic Cluster Analysis Methods (Chap. 10) Cluster Analysis: Basic Concepts

More information

The Pennsylvania State University The Graduate School PARALLEL MULTIVARIATE SLICE SAMPLING. A Thesis in Statistics by Matthew M.

The Pennsylvania State University The Graduate School PARALLEL MULTIVARIATE SLICE SAMPLING. A Thesis in Statistics by Matthew M. The Pennsylvania State University The Graduate School PARALLEL MULTIVARIATE SLICE SAMPLING A Thesis in Statistics by Matthew M. Tibbits c 2009 Matthew M. Tibbits Submitted in Partial Fulfillment of the

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Basis Functions Tom Kelsey School of Computer Science University of St Andrews http://www.cs.st-andrews.ac.uk/~tom/ tom@cs.st-andrews.ac.uk Tom Kelsey ID5059-02-BF 2015-02-04

More information

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask Machine Learning and Data Mining Clustering (1): Basics Kalev Kask Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand patterns of

More information

Image analysis. Computer Vision and Classification Image Segmentation. 7 Image analysis

Image analysis. Computer Vision and Classification Image Segmentation. 7 Image analysis 7 Computer Vision and Classification 413 / 458 Computer Vision and Classification The k-nearest-neighbor method The k-nearest-neighbor (knn) procedure has been used in data analysis and machine learning

More information

ASReml: AN OVERVIEW. Rajender Parsad 1, Jose Crossa 2 and Juan Burgueno 2

ASReml: AN OVERVIEW. Rajender Parsad 1, Jose Crossa 2 and Juan Burgueno 2 ASReml: AN OVERVIEW Rajender Parsad 1, Jose Crossa 2 and Juan Burgueno 2 1 I.A.S.R.I., Library Avenue, New Delhi - 110 012, India 2 Biometrics and statistics Unit, CIMMYT, Mexico ASReml is a statistical

More information

Spatial Statistical Downscaling for Constructing High-Resolution Nature Runs in Global Observing System Simulation Experiments

Spatial Statistical Downscaling for Constructing High-Resolution Nature Runs in Global Observing System Simulation Experiments Spatial Statistical Downscaling for Constructing High-Resolution Nature Runs in Global Observing System Simulation Experiments arxiv:1711.00484v1 [stat.me] 1 Nov 017 Pulong Ma and Emily L. Kang Amy Braverman

More information

Linear Modeling with Bayesian Statistics

Linear Modeling with Bayesian Statistics Linear Modeling with Bayesian Statistics Bayesian Approach I I I I I Estimate probability of a parameter State degree of believe in specific parameter values Evaluate probability of hypothesis given the

More information

Expectation-Maximization Methods in Population Analysis. Robert J. Bauer, Ph.D. ICON plc.

Expectation-Maximization Methods in Population Analysis. Robert J. Bauer, Ph.D. ICON plc. Expectation-Maximization Methods in Population Analysis Robert J. Bauer, Ph.D. ICON plc. 1 Objective The objective of this tutorial is to briefly describe the statistical basis of Expectation-Maximization

More information

Collective classification in network data

Collective classification in network data 1 / 50 Collective classification in network data Seminar on graphs, UCSB 2009 Outline 2 / 50 1 Problem 2 Methods Local methods Global methods 3 Experiments Outline 3 / 50 1 Problem 2 Methods Local methods

More information

GAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K.

GAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K. GAMs semi-parametric GLMs Simon Wood Mathematical Sciences, University of Bath, U.K. Generalized linear models, GLM 1. A GLM models a univariate response, y i as g{e(y i )} = X i β where y i Exponential

More information

Clustering Relational Data using the Infinite Relational Model

Clustering Relational Data using the Infinite Relational Model Clustering Relational Data using the Infinite Relational Model Ana Daglis Supervised by: Matthew Ludkin September 4, 2015 Ana Daglis Clustering Data using the Infinite Relational Model September 4, 2015

More information

Statistical techniques for data analysis in Cosmology

Statistical techniques for data analysis in Cosmology Statistical techniques for data analysis in Cosmology arxiv:0712.3028; arxiv:0911.3105 Numerical recipes (the bible ) Licia Verde ICREA & ICC UB-IEEC http://icc.ub.edu/~liciaverde outline Lecture 1: Introduction

More information

Intelligent Compaction and Quality Assurance of Roller Measurement Values utilizing Backfitting and Multiresolution Scale Space Analysis

Intelligent Compaction and Quality Assurance of Roller Measurement Values utilizing Backfitting and Multiresolution Scale Space Analysis Intelligent Compaction and Quality Assurance of Roller Measurement Values utilizing Backfitting and Multiresolution Scale Space Analysis Daniel K. Heersink 1, Reinhard Furrer 1, and Mike A. Mooney 2 arxiv:1302.4631v3

More information

INLA: an introduction

INLA: an introduction INLA: an introduction Håvard Rue 1 Norwegian University of Science and Technology Trondheim, Norway May 2009 1 Joint work with S.Martino (Trondheim) and N.Chopin (Paris) Latent Gaussian models Background

More information

Lecture on Modeling Tools for Clustering & Regression

Lecture on Modeling Tools for Clustering & Regression Lecture on Modeling Tools for Clustering & Regression CS 590.21 Analysis and Modeling of Brain Networks Department of Computer Science University of Crete Data Clustering Overview Organizing data into

More information

PREDICTION AND CALIBRATION USING OUTPUTS FROM MULTIPLE COMPUTER SIMULATORS

PREDICTION AND CALIBRATION USING OUTPUTS FROM MULTIPLE COMPUTER SIMULATORS PREDICTION AND CALIBRATION USING OUTPUTS FROM MULTIPLE COMPUTER SIMULATORS by Joslin Goh M.Sc., Simon Fraser University, 29 B.Math., University of Waterloo, 27 a thesis submitted in partial fulfillment

More information

Detection of Smoke in Satellite Images

Detection of Smoke in Satellite Images Detection of Smoke in Satellite Images Mark Wolters Charmaine Dean Shanghai Center for Mathematical Sciences Western University December 15, 2014 TIES 2014, Guangzhou Summary Application Smoke identification

More information

STATISTICS (STAT) Statistics (STAT) 1

STATISTICS (STAT) Statistics (STAT) 1 Statistics (STAT) 1 STATISTICS (STAT) STAT 2013 Elementary Statistics (A) Prerequisites: MATH 1483 or MATH 1513, each with a grade of "C" or better; or an acceptable placement score (see placement.okstate.edu).

More information

Statistics (STAT) Statistics (STAT) 1. Prerequisites: grade in C- or higher in STAT 1200 or STAT 1300 or STAT 1400

Statistics (STAT) Statistics (STAT) 1. Prerequisites: grade in C- or higher in STAT 1200 or STAT 1300 or STAT 1400 Statistics (STAT) 1 Statistics (STAT) STAT 1200: Introductory Statistical Reasoning Statistical concepts for critically evaluation quantitative information. Descriptive statistics, probability, estimation,

More information

From Bayesian Analysis of Item Response Theory Models Using SAS. Full book available for purchase here.

From Bayesian Analysis of Item Response Theory Models Using SAS. Full book available for purchase here. From Bayesian Analysis of Item Response Theory Models Using SAS. Full book available for purchase here. Contents About this Book...ix About the Authors... xiii Acknowledgments... xv Chapter 1: Item Response

More information

Data Preprocessing. Javier Béjar. URL - Spring 2018 CS - MAI 1/78 BY: $\

Data Preprocessing. Javier Béjar. URL - Spring 2018 CS - MAI 1/78 BY: $\ Data Preprocessing Javier Béjar BY: $\ URL - Spring 2018 C CS - MAI 1/78 Introduction Data representation Unstructured datasets: Examples described by a flat set of attributes: attribute-value matrix Structured

More information

Short-Cut MCMC: An Alternative to Adaptation

Short-Cut MCMC: An Alternative to Adaptation Short-Cut MCMC: An Alternative to Adaptation Radford M. Neal Dept. of Statistics and Dept. of Computer Science University of Toronto http://www.cs.utoronto.ca/ radford/ Third Workshop on Monte Carlo Methods,

More information

Calibration and emulation of TIE-GCM

Calibration and emulation of TIE-GCM Calibration and emulation of TIE-GCM Serge Guillas School of Mathematics Georgia Institute of Technology Jonathan Rougier University of Bristol Big Thanks to Crystal Linkletter (SFU-SAMSI summer school)

More information

Discussion on Bayesian Model Selection and Parameter Estimation in Extragalactic Astronomy by Martin Weinberg

Discussion on Bayesian Model Selection and Parameter Estimation in Extragalactic Astronomy by Martin Weinberg Discussion on Bayesian Model Selection and Parameter Estimation in Extragalactic Astronomy by Martin Weinberg Phil Gregory Physics and Astronomy Univ. of British Columbia Introduction Martin Weinberg reported

More information

An Introduction to Markov Chain Monte Carlo

An Introduction to Markov Chain Monte Carlo An Introduction to Markov Chain Monte Carlo Markov Chain Monte Carlo (MCMC) refers to a suite of processes for simulating a posterior distribution based on a random (ie. monte carlo) process. In other

More information

A GENETIC ALGORITHM APPROACH TO OPTIMAL TOPOLOGICAL DESIGN OF ALL TERMINAL NETWORKS

A GENETIC ALGORITHM APPROACH TO OPTIMAL TOPOLOGICAL DESIGN OF ALL TERMINAL NETWORKS A GENETIC ALGORITHM APPROACH TO OPTIMAL TOPOLOGICAL DESIGN OF ALL TERMINAL NETWORKS BERNA DENGIZ AND FULYA ALTIPARMAK Department of Industrial Engineering Gazi University, Ankara, TURKEY 06570 ALICE E.

More information

Bayesian Robust Inference of Differential Gene Expression The bridge package

Bayesian Robust Inference of Differential Gene Expression The bridge package Bayesian Robust Inference of Differential Gene Expression The bridge package Raphael Gottardo October 30, 2017 Contents Department Statistics, University of Washington http://www.rglab.org raph@stat.washington.edu

More information

Missing Data Analysis for the Employee Dataset

Missing Data Analysis for the Employee Dataset Missing Data Analysis for the Employee Dataset 67% of the observations have missing values! Modeling Setup Random Variables: Y i =(Y i1,...,y ip ) 0 =(Y i,obs, Y i,miss ) 0 R i =(R i1,...,r ip ) 0 ( 1

More information

CSCI 599 Class Presenta/on. Zach Levine. Markov Chain Monte Carlo (MCMC) HMM Parameter Es/mates

CSCI 599 Class Presenta/on. Zach Levine. Markov Chain Monte Carlo (MCMC) HMM Parameter Es/mates CSCI 599 Class Presenta/on Zach Levine Markov Chain Monte Carlo (MCMC) HMM Parameter Es/mates April 26 th, 2012 Topics Covered in this Presenta2on A (Brief) Review of HMMs HMM Parameter Learning Expecta2on-

More information

Bayesian Modelling with JAGS and R

Bayesian Modelling with JAGS and R Bayesian Modelling with JAGS and R Martyn Plummer International Agency for Research on Cancer Rencontres R, 3 July 2012 CRAN Task View Bayesian Inference The CRAN Task View Bayesian Inference is maintained

More information

MCMC Diagnostics. Yingbo Li MATH Clemson University. Yingbo Li (Clemson) MCMC Diagnostics MATH / 24

MCMC Diagnostics. Yingbo Li MATH Clemson University. Yingbo Li (Clemson) MCMC Diagnostics MATH / 24 MCMC Diagnostics Yingbo Li Clemson University MATH 9810 Yingbo Li (Clemson) MCMC Diagnostics MATH 9810 1 / 24 Convergence to Posterior Distribution Theory proves that if a Gibbs sampler iterates enough,

More information

10-701/15-781, Fall 2006, Final

10-701/15-781, Fall 2006, Final -7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly

More information

Sensor Tasking and Control

Sensor Tasking and Control Sensor Tasking and Control Outline Task-Driven Sensing Roles of Sensor Nodes and Utilities Information-Based Sensor Tasking Joint Routing and Information Aggregation Summary Introduction To efficiently

More information

Curve Sampling and Geometric Conditional Simulation

Curve Sampling and Geometric Conditional Simulation Curve Sampling and Geometric Conditional Simulation Ayres Fan Joint work with John Fisher, William Wells, Jonathan Kane, and Alan Willsky S S G Massachusetts Institute of Technology September 19, 2007

More information

Simplicial Global Optimization

Simplicial Global Optimization Simplicial Global Optimization Julius Žilinskas Vilnius University, Lithuania September, 7 http://web.vu.lt/mii/j.zilinskas Global optimization Find f = min x A f (x) and x A, f (x ) = f, where A R n.

More information

Performance of Sequential Imputation Method in Multilevel Applications

Performance of Sequential Imputation Method in Multilevel Applications Section on Survey Research Methods JSM 9 Performance of Sequential Imputation Method in Multilevel Applications Enxu Zhao, Recai M. Yucel New York State Department of Health, 8 N. Pearl St., Albany, NY

More information

Multiple Imputation with Mplus

Multiple Imputation with Mplus Multiple Imputation with Mplus Tihomir Asparouhov and Bengt Muthén Version 2 September 29, 2010 1 1 Introduction Conducting multiple imputation (MI) can sometimes be quite intricate. In this note we provide

More information

Infectious Disease Models. Angie Dobbs. A Thesis. Presented to. The University of Guelph. In partial fulfilment of requirements.

Infectious Disease Models. Angie Dobbs. A Thesis. Presented to. The University of Guelph. In partial fulfilment of requirements. Issues of Computational Efficiency and Model Approximation for Individual-Level Infectious Disease Models by Angie Dobbs A Thesis Presented to The University of Guelph In partial fulfilment of requirements

More information

Divide and Conquer Kernel Ridge Regression

Divide and Conquer Kernel Ridge Regression Divide and Conquer Kernel Ridge Regression Yuchen Zhang John Duchi Martin Wainwright University of California, Berkeley COLT 2013 Yuchen Zhang (UC Berkeley) Divide and Conquer KRR COLT 2013 1 / 15 Problem

More information

Issues in MCMC use for Bayesian model fitting. Practical Considerations for WinBUGS Users

Issues in MCMC use for Bayesian model fitting. Practical Considerations for WinBUGS Users Practical Considerations for WinBUGS Users Kate Cowles, Ph.D. Department of Statistics and Actuarial Science University of Iowa 22S:138 Lecture 12 Oct. 3, 2003 Issues in MCMC use for Bayesian model fitting

More information

Local spatial-predictor selection

Local spatial-predictor selection University of Wollongong Research Online Centre for Statistical & Survey Methodology Working Paper Series Faculty of Engineering and Information Sciences 2013 Local spatial-predictor selection Jonathan

More information

for Images A Bayesian Deformation Model

for Images A Bayesian Deformation Model Statistics in Imaging Workshop July 8, 2004 A Bayesian Deformation Model for Images Sining Chen Postdoctoral Fellow Biostatistics Division, Dept. of Oncology, School of Medicine Outline 1. Introducing

More information

Stochastic Simulation: Algorithms and Analysis

Stochastic Simulation: Algorithms and Analysis Soren Asmussen Peter W. Glynn Stochastic Simulation: Algorithms and Analysis et Springer Contents Preface Notation v xii I What This Book Is About 1 1 An Illustrative Example: The Single-Server Queue 1

More information

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany Syllabus Fri. 27.10. (1) 0. Introduction A. Supervised Learning: Linear Models & Fundamentals Fri. 3.11. (2) A.1 Linear Regression Fri. 10.11. (3) A.2 Linear Classification Fri. 17.11. (4) A.3 Regularization

More information

Bipartite Edge Prediction via Transductive Learning over Product Graphs

Bipartite Edge Prediction via Transductive Learning over Product Graphs Bipartite Edge Prediction via Transductive Learning over Product Graphs Hanxiao Liu, Yiming Yang School of Computer Science, Carnegie Mellon University July 8, 2015 ICML 2015 Bipartite Edge Prediction

More information

On Integration of Multi-Point Improvements

On Integration of Multi-Point Improvements On Integration of Multi-Point Improvements R. Girdziušas 1, J. Janusevskis 1,2, and R. Le Riche 1,3 1 École Nationale Supérieure des Mines de Saint-Étienne 2 Riga Technical University, Latvia 3 CNRS UMR

More information

Rolling Markov Chain Monte Carlo

Rolling Markov Chain Monte Carlo Rolling Markov Chain Monte Carlo Din-Houn Lau Imperial College London Joint work with Axel Gandy 4 th September 2013 RSS Conference 2013: Newcastle Output predicted final ranks of the each team. Updates

More information

Chapter 11. Network Community Detection

Chapter 11. Network Community Detection Chapter 11. Network Community Detection Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Outline

More information

Semiparametric Mixed Effecs with Hierarchical DP Mixture

Semiparametric Mixed Effecs with Hierarchical DP Mixture Semiparametric Mixed Effecs with Hierarchical DP Mixture R topics documented: April 21, 2007 hdpm-package........................................ 1 hdpm............................................ 2 hdpmfitsetup........................................

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction A Monte Carlo method is a compuational method that uses random numbers to compute (estimate) some quantity of interest. Very often the quantity we want to compute is the mean of

More information

Parametric Response Surface Models for Analysis of Multi-Site fmri Data

Parametric Response Surface Models for Analysis of Multi-Site fmri Data Parametric Response Surface Models for Analysis of Multi-Site fmri Data Seyoung Kim 1, Padhraic Smyth 1, Hal Stern 1, Jessica Turner 2, and FIRST BIRN 1 Bren School of Information and Computer Sciences,

More information

Lecture 25: Review I

Lecture 25: Review I Lecture 25: Review I Reading: Up to chapter 5 in ISLR. STATS 202: Data mining and analysis Jonathan Taylor 1 / 18 Unsupervised learning In unsupervised learning, all the variables are on equal standing,

More information

Organizing data in R. Fitting Mixed-Effects Models Using the lme4 Package in R. R packages. Accessing documentation. The Dyestuff data set

Organizing data in R. Fitting Mixed-Effects Models Using the lme4 Package in R. R packages. Accessing documentation. The Dyestuff data set Fitting Mixed-Effects Models Using the lme4 Package in R Deepayan Sarkar Fred Hutchinson Cancer Research Center 18 September 2008 Organizing data in R Standard rectangular data sets (columns are variables,

More information

Mixed Effects Models. Biljana Jonoska Stojkova Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC.

Mixed Effects Models. Biljana Jonoska Stojkova Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC. Mixed Effects Models Biljana Jonoska Stojkova Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC March 6, 2018 Resources for statistical assistance Department of Statistics

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Overview of Part Two Probabilistic Graphical Models Part Two: Inference and Learning Christopher M. Bishop Exact inference and the junction tree MCMC Variational methods and EM Example General variational

More information

Nearest Neighbor Methods for Imputing Missing Data Within and Across Scales

Nearest Neighbor Methods for Imputing Missing Data Within and Across Scales Nearest Neighbor Methods for Imputing Missing Data Within and Across Scales Valerie LeMay University of British Columbia, Canada and H. Temesgen, Oregon State University Presented at the Evaluation of

More information

Convexization in Markov Chain Monte Carlo

Convexization in Markov Chain Monte Carlo in Markov Chain Monte Carlo 1 IBM T. J. Watson Yorktown Heights, NY 2 Department of Aerospace Engineering Technion, Israel August 23, 2011 Problem Statement MCMC processes in general are governed by non

More information

Machine Learning. A. Supervised Learning A.7. Decision Trees. Lars Schmidt-Thieme

Machine Learning. A. Supervised Learning A.7. Decision Trees. Lars Schmidt-Thieme Machine Learning A. Supervised Learning A.7. Decision Trees Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany 1 /

More information

Dimension reduction : PCA and Clustering

Dimension reduction : PCA and Clustering Dimension reduction : PCA and Clustering By Hanne Jarmer Slides by Christopher Workman Center for Biological Sequence Analysis DTU The DNA Array Analysis Pipeline Array design Probe design Question Experimental

More information

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate

More information

Non-Stationary Covariance Models for Discontinuous Functions as Applied to Aircraft Design Problems

Non-Stationary Covariance Models for Discontinuous Functions as Applied to Aircraft Design Problems Non-Stationary Covariance Models for Discontinuous Functions as Applied to Aircraft Design Problems Trent Lukaczyk December 1, 2012 1 Introduction 1.1 Application The NASA N+2 Supersonic Aircraft Project

More information

Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

More information

Spatial Outlier Detection

Spatial Outlier Detection Spatial Outlier Detection Chang-Tien Lu Department of Computer Science Northern Virginia Center Virginia Tech Joint work with Dechang Chen, Yufeng Kou, Jiang Zhao 1 Spatial Outlier A spatial data point

More information

Overview. Background. Locating quantitative trait loci (QTL)

Overview. Background. Locating quantitative trait loci (QTL) Overview Implementation of robust methods for locating quantitative trait loci in R Introduction to QTL mapping Andreas Baierl and Andreas Futschik Institute of Statistics and Decision Support Systems

More information

MCMC algorithms for structured multivariate normal models

MCMC algorithms for structured multivariate normal models MCMC algorithms for structured multivariate normal models William Browne*, Suna Akkol 1 and Harvey Goldstein* *University of Bristol, University of Yuzuncu Yil, Turkey Summary In this paper we look at

More information

5. Key ingredients for programming a random walk

5. Key ingredients for programming a random walk 1. Random Walk = Brownian Motion = Markov Chain (with caveat that each of these is a class of different models, not all of which are exactly equal) 2. A random walk is a specific kind of time series process

More information

This is called a linear basis expansion, and h m is the mth basis function For example if X is one-dimensional: f (X) = β 0 + β 1 X + β 2 X 2, or

This is called a linear basis expansion, and h m is the mth basis function For example if X is one-dimensional: f (X) = β 0 + β 1 X + β 2 X 2, or STA 450/4000 S: February 2 2005 Flexible modelling using basis expansions (Chapter 5) Linear regression: y = Xβ + ɛ, ɛ (0, σ 2 ) Smooth regression: y = f (X) + ɛ: f (X) = E(Y X) to be specified Flexible

More information

Subtleties in the Monte Carlo simulation of lattice polymers

Subtleties in the Monte Carlo simulation of lattice polymers Subtleties in the Monte Carlo simulation of lattice polymers Nathan Clisby MASCOS, University of Melbourne July 9, 2010 Outline Critical exponents for SAWs. The pivot algorithm. Improving implementation

More information

Markov Random Fields and Gibbs Sampling for Image Denoising

Markov Random Fields and Gibbs Sampling for Image Denoising Markov Random Fields and Gibbs Sampling for Image Denoising Chang Yue Electrical Engineering Stanford University changyue@stanfoed.edu Abstract This project applies Gibbs Sampling based on different Markov

More information

A noninformative Bayesian approach to small area estimation

A noninformative Bayesian approach to small area estimation A noninformative Bayesian approach to small area estimation Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 glen@stat.umn.edu September 2001 Revised May 2002 Research supported

More information

CS281 Section 9: Graph Models and Practical MCMC

CS281 Section 9: Graph Models and Practical MCMC CS281 Section 9: Graph Models and Practical MCMC Scott Linderman November 11, 213 Now that we have a few MCMC inference algorithms in our toolbox, let s try them out on some random graph models. Graphs

More information