Species distribution modelling for combined data sources


1 Species distribution modelling for combined data sources Ian Renner and Olivier Gimenez. oaggimenez oliviergimenez.github.io

2 Ian Renner - Australia 1

3 Outline: Background (Species Distribution Models); Combining Data Sources; LASSO Regularisation; More to Explore! Ian W. Renner SDM with combined data sources EURING / 48

4 Species Distribution Models: Species Data. E.g. reported locations of Eurasian lynx near the Jura mountains in France.

5 Species Distribution Models: Species Distribution Modelling.

6 Species Distribution Models: SDM methods. Different species distribution modelling methods are appropriate for different sources of species data:

Data source          SDM method
Presence-only        point process model (PPM)
Systematic survey    logistic regression
Repeated surveys     occupancy modelling

7 Species Distribution Models: Poisson point process models. Simplest useful model: an inhomogeneous Poisson point process model with intensity $\mu(s)$ defined over region $A$, fitted to presence locations $s_P$. The intensity is modelled as a log-linear function of environmental variables:

$\ln \mu(s) = \beta_0 + \beta_1\,\mathrm{rain}(s) + \beta_2\,\mathrm{temp}(s) + \alpha_1\,\mathrm{dist\_road}(s) + \dots$

Maximise the log-likelihood (using GLM software):

$l_{\mathrm{ppm}}(\beta, \alpha; s_P) = \sum_{i=1}^{m} \ln \mu(s_i) - \int_{s \in A} \mu(s)\,ds$
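As a minimal sketch of the log-likelihood on this slide, the integral over the region can be approximated by a weighted sum over quadrature points. The function names, the quadrature scheme, and the toy data below are assumptions for illustration only; they are not taken from the talk, which mentions only that GLM software is used.

```python
import math


def log_intensity(coefs, covariates):
    """Return ln mu(s) = coefs[0] + sum_k coefs[k] * covariates[k-1]."""
    intercept, *slopes = coefs
    return intercept + sum(b * x for b, x in zip(slopes, covariates))


def ppm_log_likelihood(coefs, presence_covs, quad_covs, quad_weights):
    """Approximate sum_i ln mu(s_i) - integral over A of mu(s) ds.

    The integral is replaced by a weighted sum over quadrature points,
    where quad_weights are the areas assigned to each point.
    """
    point_term = sum(log_intensity(coefs, x) for x in presence_covs)
    integral_term = sum(
        w * math.exp(log_intensity(coefs, x))
        for x, w in zip(quad_covs, quad_weights)
    )
    return point_term - integral_term


# Toy data (made up): one covariate, three presence points, and four
# quadrature points that each represent a quarter of a unit-area region.
presence_covs = [[0.2], [0.5], [0.9]]
quad_covs = [[0.1], [0.4], [0.6], [0.8]]
quad_weights = [0.25, 0.25, 0.25, 0.25]

# With all coefficients zero the intensity is 1 everywhere, so the
# log-likelihood reduces to minus the area of the region.
flat_ll = ppm_log_likelihood([0.0, 0.0], presence_covs, quad_covs, quad_weights)
```

In practice the coefficients would be found by passing this function to a numerical optimiser; the sketch only evaluates the objective.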

8 Species Distribution Models: What is the intensity measuring? Intensity is not a probability but is related to abundance. But abundance of what? (Figures: what we want vs. what we get.)

9 Species Distribution Models: Occupancy Modelling. Occupancy models have been developed to account for imperfect detection. They rely on repeated visits to a set of sites, with presence/non-detection recorded at each site on each visit.

10 Species Distribution Models: Occupancy Data. Detection of species across all sites during Visit 1 (figure).

11 Species Distribution Models: Occupancy Data. Detection of species across all sites during Visit 2 (figure).

12 Species Distribution Models: Occupancy Data. Detection of species across all sites during Visit 3 (figure).

13 Species Distribution Models: Occupancy Data. Detection of species across all sites during Visit 4 (figure).

14 Species Distribution Models: Occupancy Data. Detection of species across all sites during Visit 5 (figure).

15 Species Distribution Models: Occupancy Data. Total detections of species across all sites during all visits (figure). Problem: we don't know whether a site with 0 detections means the species is absent or that it was present but undetected.

16 Species Distribution Models: Occupancy Model. Fit by maximizing

$l_{\mathrm{occ}}(\alpha_O, \beta) = \ln \prod_{i=1}^{N} P(Y_i = y_i)$

(Figures: what we want vs. what we get, more or less.)
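The slide does not spell out $P(Y_i = y_i)$. A minimal sketch, assuming the standard single-season occupancy form with a constant occupancy probability `psi` and a constant per-visit detection probability `p` (both names and the toy detection histories are hypothetical), could look like this:

```python
import math


def site_probability(history, psi, p):
    """P(Y_i = y_i) for one site's 0/1 detection history.

    An occupied site produces the history with probability
    p^detections * (1 - p)^misses; an unoccupied site can only
    produce an all-zero history.
    """
    detections = sum(history)
    misses = len(history) - detections
    prob_if_occupied = p ** detections * (1.0 - p) ** misses
    prob_if_empty = 1.0 if detections == 0 else 0.0
    return psi * prob_if_occupied + (1.0 - psi) * prob_if_empty


def occupancy_log_likelihood(histories, psi, p):
    """ln prod_i P(Y_i = y_i), computed as a sum of logs over sites."""
    return sum(math.log(site_probability(h, psi, p)) for h in histories)


# Toy data (made up): two sites, two visits each.
histories = [[0, 0], [1, 0]]
toy_ll = occupancy_log_likelihood(histories, psi=0.5, p=0.5)
```

The all-zero history is where the ambiguity from the previous slide enters: its probability mixes "present but undetected" with "absent". In the talk's model the probabilities depend on covariates through $\beta$ and $\alpha_O$; constants are used here only to keep the sketch short.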

17 Combining Data Sources: Multiple sources. In many situations, there is more than one source of data: 364 sightings in the wild ($s_W$); 242 domestic interferences ($s_D$); 73 camera traps ($y_O$).

18 Combining Data Sources: One-source models. Common approach: choose only one set of data. Available covariates: altitude (alt); forest cover (fc%); distance to nearest water source (d.wat); distance to nearest urban area (d.urb); distance to nearest road (d.rd); distance to nearest farm (d.farm); human population density (h.dens).

19 Combining Data Sources: Point process model for wild sightings. Maximise $l_{\mathrm{ppm}}(\alpha_W, \beta; s_W)$ using: $\beta$: linear, quadratic, and interaction terms of {alt, fc%, d.wat, d.urb}; $\alpha_W$ = d.rd. Output $\mu_W$: intensity of wild reportings per unit area.

20 Combining Data Sources: Point process model for domestic sightings. Maximise $l_{\mathrm{ppm}}(\alpha_D, \beta; s_D)$ using: $\beta$: linear, quadratic, and interaction terms of {alt, fc%, d.wat, d.urb}; $\alpha_D$ = d.farm. Output $\mu_D$: intensity of domestic reportings per unit area.

21 Combining Data Sources: Occupancy model for camera traps. Maximise $l_{\mathrm{occ}}(\alpha_O, \beta; y_O)$ using: $\beta$: linear, quadratic, and interaction terms of {alt, fc%, d.wat, d.urb}; $\alpha_O$ = h.dens. Output $\mu_{\mathrm{occ}}$: intensity of species per unit area.

22 Combining Data Sources: Combined Approach. How might we build a model using multiple sources of data?

Presence-only and presence-absence: $l(\alpha, \beta, \gamma, \delta) = l_{\mathrm{ppm}}(\alpha, \beta, \gamma, \delta) + l_{\mathrm{PA}}(\beta, \gamma)$

Presence-only and occupancy: $l(\alpha_{PO}, \alpha_{Occ}, \beta, \gamma) = l_{\mathrm{ppm}}(\alpha_{PO}, \beta) + l_{\mathrm{Occ}}(\alpha_{Occ}, \beta)$

Fithian, W., Elith, J., Hastie, T., & Keith, D.A. (2015) Bias correction in species distribution models: pooling survey and collection data for multiple species. Methods in Ecology and Evolution 6.

Dorazio, R.M. (2014) Accounting for imperfect detection and survey bias in statistical analysis of presence-only data. Global Ecology and Biogeography 23.

23 Combining Data Sources: Combined model. Maximise $l_{\mathrm{ppm}}(\alpha_W, \beta; s_W) + l_{\mathrm{ppm}}(\alpha_D, \beta; s_D) + l_{\mathrm{occ}}(\alpha_O, \beta; y_O)$ using: $\beta$: linear, quadratic, and interaction terms of {alt, fc%, d.wat, d.urb}; $\alpha_W$ = d.rd; $\alpha_D$ = d.farm; $\alpha_O$ = h.dens. Output $\mu_{\mathrm{combined}}$: ?
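The structure of the combined objective, one shared $\beta$ and a separate $\alpha$ per data source, can be sketched as follows. The three component functions are stand-in toy quadratics rather than real point process or occupancy likelihoods, and every name here is hypothetical; the point is only how the parameters are shared across the sum.

```python
def combined_log_likelihood(beta, alphas, components):
    """Sum the per-source log-likelihoods, all evaluated at the same beta.

    alphas maps a source name to its source-specific parameter;
    components maps the same names to functions loglik(alpha, beta).
    """
    return sum(loglik(alphas[name], beta) for name, loglik in components.items())


# Stand-in components (made up): each is a concave function that
# prefers a different value of the shared parameter beta.
def wild_loglik(alpha, beta):
    return -(beta - 1.0) ** 2 - (alpha - 0.5) ** 2


def domestic_loglik(alpha, beta):
    return -(beta - 2.0) ** 2 - alpha ** 2


def occupancy_loglik(alpha, beta):
    return -(beta - 3.0) ** 2 - (alpha + 0.5) ** 2


components = {
    "wild": wild_loglik,
    "domestic": domestic_loglik,
    "occupancy": occupancy_loglik,
}
alphas = {"wild": 0.5, "domestic": 0.0, "occupancy": -0.5}

combined_at_two = combined_log_likelihood(2.0, alphas, components)
```

Because $\beta$ appears in every term, the combined maximiser is a compromise between the values each source would choose on its own, while each $\alpha$ is informed only by its own data source.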

24 Combining Data Sources: Comparing models (figure).

25 LASSO Regularisation: Regularisation with the LASSO. LASSO: Least Absolute Shrinkage and Selection Operator.

$\hat{\beta} = \arg\max_{\beta} \left\{ l(\beta) - \lambda \sum_{j=1}^{p} |\beta_j| \right\}$
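A minimal sketch of the penalised objective follows, assuming the intercept is stored at index 0 and left unpenalised, in line with the sum starting at $j = 1$. The soft-thresholding helper is the well-known closed-form maximiser for a one-parameter quadratic log-likelihood and is included only to show how the penalty can set a coefficient exactly to zero; the function names and toy values are hypothetical.

```python
import math


def lasso_objective(log_likelihood, beta, lam):
    """Return l(beta) - lam * sum_{j>=1} |beta_j|; beta[0] is the intercept."""
    penalty = lam * sum(abs(b) for b in beta[1:])
    return log_likelihood(beta) - penalty


def soft_threshold(z, lam):
    """Maximiser over b of -0.5 * (b - z)**2 - lam * |b|."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


# Stand-in log-likelihood (made up): peaks when every coefficient is 1.
def toy_log_likelihood(beta):
    return -sum((b - 1.0) ** 2 for b in beta)


penalised = lasso_objective(toy_log_likelihood, [1.0, 1.0, -2.0], lam=0.5)

shrunk = soft_threshold(2.0, 0.5)    # large effect: shrunk towards zero
dropped = soft_threshold(-0.3, 0.5)  # small effect: set exactly to zero
```

Larger values of `lam` zero out more coefficients, which is what the regularisation paths later in the talk trace out.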

26 Lasso vs. ridge regression, graphically (figure).

27 LASSO Regularisation: The LASSO in Action: Regularization Paths. Regularization paths for the three individual models (figure). The occupancy model appears to be greatly overfitted with 15 covariates.

28 LASSO Regularisation: Regularized Individual Models (figure).

29 LASSO Regularisation: Regularized Combined Model (figure).

30 Future Work: Weighted Likelihood. The combined model puts presence-only and survey data on equal footing. One way to acknowledge the superior quality of survey data: weighted likelihood. (Table: RSS on survey data for the Occupancy, Wild P-O, and Domestic P-O models; values not preserved in the transcript.)

31 Future Work: Residual-weighted Combined Model. Maximise $w_W\, l_{\mathrm{ppm}}(\alpha_W, \beta; s_W) + w_D\, l_{\mathrm{ppm}}(\alpha_D, \beta; s_D) + w_O\, l_{\mathrm{occ}}(\alpha_O, \beta; y_O)$.
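The transcript does not say how the weights are derived from the residual sums of squares. As one possible sketch, assume each weight is proportional to the inverse of that model's RSS on the survey data, normalised so the weights sum to one. That rule, the function names, and all numbers below are assumptions for illustration and are not the values or the method from the talk.

```python
def inverse_rss_weights(rss_by_model):
    """Weights proportional to 1 / RSS, normalised to sum to one."""
    inverse = {name: 1.0 / rss for name, rss in rss_by_model.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def weighted_log_likelihood(weights, logliks):
    """Return sum over sources of w_source * l_source."""
    return sum(weights[name] * logliks[name] for name in weights)


# Toy values (made up): a smaller RSS means a better fit to the survey data.
toy_rss = {"occupancy": 1.0, "wild": 2.0, "domestic": 4.0}
toy_logliks = {"occupancy": -10.0, "wild": -20.0, "domestic": -40.0}

weights = inverse_rss_weights(toy_rss)
weighted_total = weighted_log_likelihood(weights, toy_logliks)
```

Under this assumed rule the source that best predicts the survey data contributes most to the combined objective; other weighting schemes would slot into the same structure.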

32 Future Work: Model checking. There are many tools for diagnostics of point process models. K-envelopes, to diagnose conditional independence of point locations (figure).

33 Future Work: Model checking. Spatial residual plots (figure).

34 Future Work: More to explore. Some next steps: other weighting approaches; developing diagnostic tools; combinations involving non-Poisson PPMs. Please come see me if you are interested in contributing!


This is called a linear basis expansion, and h m is the mth basis function For example if X is one-dimensional: f (X) = β 0 + β 1 X + β 2 X 2, or STA 450/4000 S: February 2 2005 Flexible modelling using basis expansions (Chapter 5) Linear regression: y = Xβ + ɛ, ɛ (0, σ 2 ) Smooth regression: y = f (X) + ɛ: f (X) = E(Y X) to be specified Flexible

More information

Computer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models

Computer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models Prof. Daniel Cremers 4. Probabilistic Graphical Models Directed Models The Bayes Filter (Rep.) (Bayes) (Markov) (Tot. prob.) (Markov) (Markov) 2 Graphical Representation (Rep.) We can describe the overall

More information

Unsupervised: no target value to predict

Unsupervised: no target value to predict Clustering Unsupervised: no target value to predict Differences between models/algorithms: Exclusive vs. overlapping Deterministic vs. probabilistic Hierarchical vs. flat Incremental vs. batch learning

More information

CSSS 510: Lab 2. Introduction to Maximum Likelihood Estimation

CSSS 510: Lab 2. Introduction to Maximum Likelihood Estimation CSSS 510: Lab 2 Introduction to Maximum Likelihood Estimation 2018-10-12 0. Agenda 1. Housekeeping: simcf, tile 2. Questions about Homework 1 or lecture 3. Simulating heteroskedastic normal data 4. Fitting

More information

arxiv: v1 [stat.me] 29 May 2015

arxiv: v1 [stat.me] 29 May 2015 MIMCA: Multiple imputation for categorical variables with multiple correspondence analysis Vincent Audigier 1, François Husson 2 and Julie Josse 2 arxiv:1505.08116v1 [stat.me] 29 May 2015 Applied Mathematics

More information

Nina Zumel and John Mount Win-Vector LLC

Nina Zumel and John Mount Win-Vector LLC SUPERVISED LEARNING IN R: REGRESSION Logistic regression to predict probabilities Nina Zumel and John Mount Win-Vector LLC Predicting Probabilities Predicting whether an event occurs (yes/no): classification

More information

Modeling and Monitoring Crop Disease in Developing Countries

Modeling and Monitoring Crop Disease in Developing Countries Modeling and Monitoring Crop Disease in Developing Countries John Quinn 1, Kevin Leyton-Brown 2, Ernest Mwebaze 1 1 Department of Computer Science 2 Department of Computer Science Makerere University,

More information

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1 Preface to the Second Edition Preface to the First Edition vii xi 1 Introduction 1 2 Overview of Supervised Learning 9 2.1 Introduction... 9 2.2 Variable Types and Terminology... 9 2.3 Two Simple Approaches

More information

Multi-label classification using rule-based classifier systems

Multi-label classification using rule-based classifier systems Multi-label classification using rule-based classifier systems Shabnam Nazmi (PhD candidate) Department of electrical and computer engineering North Carolina A&T state university Advisor: Dr. A. Homaifar

More information

Analysis of Different Reference Plane Setups for the Calibration of a Mobile Laser Scanning System

Analysis of Different Reference Plane Setups for the Calibration of a Mobile Laser Scanning System Analysis of Different Reference Plane Setups for the Calibration of a Mobile Laser Scanning System 18. Internationaler Ingenieurvermessungskurs Graz, Austria, 25-29 th April 2017 Erik Heinz, Christian

More information

Parthy. A test of neutrality using species abundance evenness, and parameter inference by Approximate Bayesian Computation

Parthy. A test of neutrality using species abundance evenness, and parameter inference by Approximate Bayesian Computation Parthy A test of neutrality using species abundance evenness, and parameter inference by Approximate Bayesian Computation http://www.edb.ups tlse.fr/equipe1/tetame.htm Franck Jabot Jérôme Chave Laboratoire

More information

Model Assessment and Selection. Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer

Model Assessment and Selection. Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer Model Assessment and Selection Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Model Training data Testing data Model Testing error rate Training error

More information

Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models

Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models DB Tsai Steven Hillion Outline Introduction Linear / Nonlinear Classification Feature Engineering - Polynomial Expansion Big-data

More information

Slides modified from: PATTERN RECOGNITION AND MACHINE LEARNING CHRISTOPHER M. BISHOP

Slides modified from: PATTERN RECOGNITION AND MACHINE LEARNING CHRISTOPHER M. BISHOP Slides modified from: PATTERN RECOGNITION AND MACHINE LEARNING CHRISTOPHER M. BISHOP Linear regression Linear Basis FuncDon Models (1) Example: Polynomial Curve FiLng Linear Basis FuncDon Models (2) Generally

More information

A Versatile Dependent Model for Heterogeneous Cellular Networks

A Versatile Dependent Model for Heterogeneous Cellular Networks 1 A Versatile Dependent Model for Heterogeneous Cellular Networks Martin Haenggi University of Notre Dame July 7, 1 Abstract arxiv:135.97v [cs.ni] 7 May 13 We propose a new model for heterogeneous cellular

More information

The exam is closed book, closed notes except your one-page cheat sheet.

The exam is closed book, closed notes except your one-page cheat sheet. CS 189 Fall 2015 Introduction to Machine Learning Final Please do not turn over the page before you are instructed to do so. You have 2 hours and 50 minutes. Please write your initials on the top-right

More information

LECTURE 12: LINEAR MODEL SELECTION PT. 3. October 23, 2017 SDS 293: Machine Learning

LECTURE 12: LINEAR MODEL SELECTION PT. 3. October 23, 2017 SDS 293: Machine Learning LECTURE 12: LINEAR MODEL SELECTION PT. 3 October 23, 2017 SDS 293: Machine Learning Announcements 1/2 Presentation of the CS Major & Minors TODAY @ lunch Ford 240 FREE FOOD! Announcements 2/2 CS Internship

More information

Image Registration + Other Stuff

Image Registration + Other Stuff Image Registration + Other Stuff John Ashburner Pre-processing Overview fmri time-series Motion Correct Anatomical MRI Coregister m11 m 21 m 31 m12 m13 m14 m 22 m 23 m 24 m 32 m 33 m 34 1 Template Estimate

More information

Monocular Human Motion Capture with a Mixture of Regressors. Ankur Agarwal and Bill Triggs GRAVIR-INRIA-CNRS, Grenoble, France

Monocular Human Motion Capture with a Mixture of Regressors. Ankur Agarwal and Bill Triggs GRAVIR-INRIA-CNRS, Grenoble, France Monocular Human Motion Capture with a Mixture of Regressors Ankur Agarwal and Bill Triggs GRAVIR-INRIA-CNRS, Grenoble, France IEEE Workshop on Vision for Human-Computer Interaction, 21 June 2005 Visual

More information

How to carry out secondary validation of climatic data

How to carry out secondary validation of climatic data World Bank & Government of The Netherlands funded Training module # SWDP -17 How to carry out secondary validation of climatic data New Delhi, November 1999 CSMRS Building, 4th Floor, Olof Palme Marg,

More information

Model selection Outline for today

Model selection Outline for today Model selection Outline for today The problem of model selection Choose among models by a criterion rather than significance testing Criteria: Mallow s C p and AIC Search strategies: All subsets; stepaic

More information

Chapter 7: Numerical Prediction

Chapter 7: Numerical Prediction Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases SS 2016 Chapter 7: Numerical Prediction Lecture: Prof. Dr.

More information

Collaborative Filtering Applied to Educational Data Mining

Collaborative Filtering Applied to Educational Data Mining Collaborative Filtering Applied to Educational Data Mining KDD Cup 200 July 25 th, 200 BigChaos @ KDD Team Dataset Solution Overview Michael Jahrer, Andreas Töscher from commendo research Dataset Team

More information

Last time... Bias-Variance decomposition. This week

Last time... Bias-Variance decomposition. This week Machine learning, pattern recognition and statistical data modelling Lecture 4. Going nonlinear: basis expansions and splines Last time... Coryn Bailer-Jones linear regression methods for high dimensional

More information

MEDICAL IMAGE COMPUTING (CAP 5937) LECTURE 4: Pre-Processing Medical Images (II)

MEDICAL IMAGE COMPUTING (CAP 5937) LECTURE 4: Pre-Processing Medical Images (II) SPRING 2016 1 MEDICAL IMAGE COMPUTING (CAP 5937) LECTURE 4: Pre-Processing Medical Images (II) Dr. Ulas Bagci HEC 221, Center for Research in Computer Vision (CRCV), University of Central Florida (UCF),

More information

Classification and Detection in Images. D.A. Forsyth

Classification and Detection in Images. D.A. Forsyth Classification and Detection in Images D.A. Forsyth Classifying Images Motivating problems detecting explicit images classifying materials classifying scenes Strategy build appropriate image features train

More information

Conquering Massive Clinical Models with GPU. GPU Parallelized Logistic Regression

Conquering Massive Clinical Models with GPU. GPU Parallelized Logistic Regression Conquering Massive Clinical Models with GPU Parallelized Logistic Regression M.D./Ph.D. candidate in Biomathematics University of California, Los Angeles Joint Statistical Meetings Vancouver, Canada, July

More information

Optimization Models for Machine Learning: A Survey

Optimization Models for Machine Learning: A Survey Optimization Models for Machine Learning: A Survey arxiv:1901.05331v1 [math.oc] 16 Jan 2019 Claudio Gambella 1 Bissan Ghaddar 2 Joe Naoum-Sawaya 2 1 IBM Research Ireland, Mulhuddart, Dublin 15, Ireland

More information