Species distribution modelling for combined data sources

1 Species distribution modelling for combined data sources Ian Renner and Olivier Gimenez. oaggimenez oliviergimenez.github.io

2 Ian Renner - Australia 1

3 Outline
Background (Species Distribution Models)
Combining Data Sources
LASSO Regularisation
More to Explore!
Ian W. Renner, SDM with combined data sources, EURING / 48

4 Species Distribution Models: Species Data
E.g. reported locations of Eurasian lynx near the Jura mountains in France.

5 Species Distribution Models: Species Distribution Modelling

6 Species Distribution Models: SDM methods
Different species distribution modelling methods are appropriate for different sources of species data:
Data source | SDM method
Presence-only | point process model (PPM)
Systematic survey | logistic regression
Repeated surveys | occupancy modelling

7 Species Distribution Models: Poisson point process models
Simplest useful model: an inhomogeneous Poisson point process model with intensity $\mu(s)$ defined over region $A$, fitted to presence locations $s_P$.
Intensity is modelled as a log-linear function of environmental variables:
$\ln \mu(s) = \beta_0 + \beta_1\,\mathrm{rain}(s) + \beta_2\,\mathrm{temp}(s) - \alpha_1\,\mathrm{dist\_road}(s) + \ldots$
Maximise the log-likelihood (using GLM software):
$\ell_{ppm}(\beta, \alpha; s_P) = \sum_{i=1}^{m} \ln \mu(s_i) - \int_{s \in A} \mu(s)\,ds$
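The log-likelihood above can be approximated numerically by replacing the integral over $A$ with a weighted sum over quadrature cells. The following is a minimal sketch of that idea, not the authors' implementation: the function name `ppm_loglik`, the argument layout, and the toy data are all assumptions made for illustration.

```python
import math

def ppm_loglik(coefs, presence_covs, quad_covs, quad_weights):
    """Approximate l_ppm = sum_i ln mu(s_i) - integral_A mu(s) ds.

    coefs: [intercept, slope_1, ..., slope_k] of the log-linear intensity.
    presence_covs: covariate vectors at the m presence locations.
    quad_covs, quad_weights: covariate vectors and areas of the quadrature
    cells that tile region A (hypothetical discretisation).
    """
    def log_mu(x):
        return coefs[0] + sum(b * v for b, v in zip(coefs[1:], x))

    # First term: log-intensity summed over the presence points.
    point_term = sum(log_mu(x) for x in presence_covs)
    # Second term: quadrature approximation of the integral of mu over A.
    integral_term = sum(w * math.exp(log_mu(x))
                        for x, w in zip(quad_covs, quad_weights))
    return point_term - integral_term

# Toy example (made-up data): 3 presence points in a region of area 4,
# with a single covariate that is zero everywhere.
presence = [[0.0], [0.0], [0.0]]
quad_covs = [[0.0], [0.0], [0.0], [0.0]]
quad_weights = [1.0, 1.0, 1.0, 1.0]
ll = ppm_loglik([math.log(0.75), 0.0], presence, quad_covs, quad_weights)
```

With a constant intensity the log-likelihood is maximised at (number of points) / (area), here 3/4, which gives a quick sanity check on the sketch.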

8 Species Distribution Models: What is the intensity measuring?
Intensity is not a probability. It is related to abundance, but abundance of what?
What we want: [figure]
What we get: [figure]

9 Species Distribution Models: Occupancy Modelling
Occupancy models have been developed to account for imperfect detection. They rely on repeated visits to a set of sites, with presence or non-detection recorded at each site on each visit.

10 Species Distribution Models: Occupancy Data
Detection of species across all sites during Visit 1:

11 Species Distribution Models: Occupancy Data
Detection of species across all sites during Visit 2:

12 Species Distribution Models: Occupancy Data
Detection of species across all sites during Visit 3:

13 Species Distribution Models: Occupancy Data
Detection of species across all sites during Visit 4:

14 Species Distribution Models: Occupancy Data
Detection of species across all sites during Visit 5:

15 Species Distribution Models: Occupancy Data
Total detections of species across all sites during all visits:
Problem: We don't know whether sites with 0 detections indicate the species is absent or whether it was present but undetected.

16 Species Distribution Models: Occupancy Model
Fit by maximizing $\ell_{occ}(\alpha_O, \beta) = \ln \prod_{i=1}^{N} P(Y_i = y_i)$
What we want: [figure]
What we get (more or less): [figure]
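To make the site-level probability $P(Y_i = y_i)$ concrete, here is a minimal sketch of the standard single-season occupancy likelihood. It assumes a constant occupancy probability `psi` and a constant per-visit detection probability `p`, which is a simplification: the slides model these through covariates. The function name and the detection histories are hypothetical.

```python
import math

def occupancy_loglik(psi, p, histories):
    """Log-likelihood of detection histories under a constant-psi, constant-p
    occupancy model (an assumed simplification of the covariate model).

    histories: one list of 0/1 detections per site, one entry per visit.
    """
    total = 0.0
    for h in histories:
        d = sum(h)        # number of visits with a detection
        n_visits = len(h)
        # Site is occupied and produced exactly this detection history.
        prob = psi * p ** d * (1 - p) ** (n_visits - d)
        # A site with no detections may also simply be unoccupied.
        if d == 0:
            prob += 1 - psi
        total += math.log(prob)
    return total

# Made-up histories for three sites visited three times each.
histories = [[1, 0, 1], [0, 0, 0], [0, 1, 0]]
ll = occupancy_loglik(0.6, 0.5, histories)
```

The extra `1 - psi` term for all-zero histories is what lets the model separate true absence from presence without detection.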

17 Combining Data Sources: Multiple sources
In many situations, there is more than one source of data.
364 sightings in the wild ($s_W$)
242 domestic interferences ($s_D$)
73 camera traps ($y_O$)

18 Combining Data Sources: One-source models
Common approach: choose only one set of data.
Available covariates:
Altitude (alt)
Forest cover (fc%)
Distance to nearest water source (d.wat)
Distance to nearest urban area (d.urb)
Distance to nearest road (d.rd)
Distance to nearest farm (d.farm)
Human population density (h.dens)

19 Combining Data Sources: Point process model for wild sightings
Maximise $\ell_{ppm}(\alpha_W, \beta; s_W)$ using:
$\beta$: linear, quadratic, and interaction terms of {alt, fc%, d.wat, d.urb}
$\alpha_W$ = d.rd
Output $\mu_W$: intensity of wild reportings per unit area

20 Combining Data Sources: Point process model for domestic sightings
Maximise $\ell_{ppm}(\alpha_D, \beta; s_D)$ using:
$\beta$: linear, quadratic, and interaction terms of {alt, fc%, d.wat, d.urb}
$\alpha_D$ = d.farm
Output $\mu_D$: intensity of domestic reportings per unit area

21 Combining Data Sources: Occupancy model for camera traps
Maximise $\ell_{occ}(\alpha_O, \beta; y_O)$ using:
$\beta$: linear, quadratic, and interaction terms of {alt, fc%, d.wat, d.urb}
$\alpha_O$ = h.dens
Output $\mu_{occ}$: intensity of species per unit area

22 Combining Data Sources: Combined Approach
How might we build a model using multiple sources of data?
Presence-only and presence-absence: $\ell(\alpha, \beta, \gamma, \delta) = \ell_{ppm}(\alpha, \beta, \gamma, \delta) + \ell_{PA}(\beta, \gamma)$
Presence-only and occupancy: $\ell(\alpha_{PO}, \alpha_{Occ}, \beta, \gamma) = \ell_{ppm}(\alpha_{PO}, \beta) + \ell_{Occ}(\alpha_{Occ}, \beta)$
Fithian, W., Elith, J., Hastie, T., & Keith, D.A. (2015) Bias correction in species distribution models: pooling survey and collection data for multiple species. Methods in Ecology and Evolution 6.
Dorazio, R.M. (2014) Accounting for imperfect detection and survey bias in statistical analysis of presence-only data. Global Ecology and Biogeography 23.

23 Combining Data Sources: Combined model
Maximise $\ell_{ppm}(\alpha_W, \beta; s_W) + \ell_{ppm}(\alpha_D, \beta; s_D) + \ell_{occ}(\alpha_O, \beta; y_O)$ using:
$\beta$: linear, quadratic, and interaction terms of {alt, fc%, d.wat, d.urb}
$\alpha_W$ = d.rd
$\alpha_D$ = d.farm
$\alpha_O$ = h.dens
Output $\mu_{combined}$: ?
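The key feature of the combined objective is that every component shares the same environmental coefficients $\beta$ while keeping its own source-specific coefficient $\alpha$. The sketch below illustrates this for the two presence-only components only; the occupancy term would be added to the sum in the same way. It assumes one environmental covariate, one bias covariate per source, a shared intercept, and a quadrature approximation of the integral. All names and data are hypothetical.

```python
import math

def ppm_loglik(beta, alpha, pres, quad):
    """Point process log-likelihood with ln mu = beta[0] + beta[1]*env + alpha*bias.

    pres: (env, bias) covariate pairs at the presence points.
    quad: (env, bias, weight) triples for quadrature cells tiling the region.
    """
    def log_mu(env, bias):
        return beta[0] + beta[1] * env + alpha * bias

    points = sum(log_mu(e, b) for e, b in pres)
    integral = sum(w * math.exp(log_mu(e, b)) for e, b, w in quad)
    return points - integral

def combined_loglik(beta, alpha_w, alpha_d, wild, domestic):
    """Sum of the two PPM log-likelihoods: beta is shared, alpha is per source."""
    return ppm_loglik(beta, alpha_w, *wild) + ppm_loglik(beta, alpha_d, *domestic)

# Made-up data: each source is a (presence points, quadrature cells) pair.
quad = [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
wild = ([(1.0, 0.0), (0.5, 1.0)], quad)
domestic = ([(0.0, 1.0)], quad)

baseline = combined_loglik([0.0, 0.0], 0.0, 0.0, wild, domestic)
```

Because `beta` enters both terms, a single optimiser pass over (`beta`, `alpha_w`, `alpha_d`) lets both data sources inform the environmental response, while each `alpha` absorbs only its own source's observer bias.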

24 Combining Data Sources: Comparing models

25 LASSO Regularisation: Regularisation with the LASSO
LASSO: Least Absolute Shrinkage and Selection Operator
$\hat{\beta} = \arg\max_{\beta} \left\{ \ell(\beta) - \lambda \sum_{j=1}^{p} |\beta_j| \right\}$
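One generic way to maximise an L1-penalised log-likelihood is proximal gradient ascent: take a gradient step on $\ell(\beta)$, then apply soft-thresholding, which is what sets small coefficients exactly to zero. The sketch below shows that technique on a toy quadratic log-likelihood. It is not necessarily the algorithm used for the models in these slides, and the function names, step size, and data are assumptions.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t, returning exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_prox_ascent(grad, beta0, lam, step, n_iter=200):
    """Maximise l(beta) - lam * sum_j |beta_j| by proximal gradient ascent.

    grad: function returning the gradient of l at beta (a list of floats).
    step: fixed step size, assumed small enough for convergence.
    """
    beta = list(beta0)
    for _ in range(n_iter):
        g = grad(beta)
        beta = [soft_threshold(b + step * gj, step * lam)
                for b, gj in zip(beta, g)]
    return beta

# Toy log-likelihood l(beta) = -0.5 * sum_j (z_j - beta_j)^2 with made-up z.
z = [3.0, -0.5, 1.2]

def grad(beta):
    return [zj - bj for zj, bj in zip(z, beta)]

beta_hat = lasso_prox_ascent(grad, [0.0, 0.0, 0.0], lam=1.0, step=0.5)
```

For this toy problem the penalised maximiser is known in closed form (each `z_j` soft-thresholded at `lam`), so the middle coefficient, whose unpenalised estimate is smaller than `lam` in absolute value, is dropped from the model entirely.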

26 Lasso vs. ridge regression, graphically

27 LASSO Regularisation: The LASSO in Action: Regularization Paths
Regularization paths for the three individual models:
The occupancy model appears to be greatly overfitted with 15 covariates.

28 LASSO Regularisation: Regularized Individual Models

29 LASSO Regularisation: Regularized Combined Model

30 Future Work: Weighted Likelihood
The combined model puts presence-only and survey data on equal footing. One way to acknowledge the superior quality of survey data: weighted likelihood.
Model | RSS (survey data)
Occupancy |
Wild P-O |
Domestic P-O |

31 Future Work: Residual-weighted Combined Model
Maximise $w_W\,\ell_{ppm}(\alpha_W, \beta; s_W) + w_D\,\ell_{ppm}(\alpha_D, \beta; s_D) + w_O\,\ell_{occ}(\alpha_O, \beta; y_O)$.
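The slides do not say how the weights $w_W$, $w_D$, $w_O$ are derived from the residuals. As one possible illustration, the sketch below assumes weights inversely proportional to each model's RSS on the survey data, normalised to sum to one. Both the weighting scheme and every number here are hypothetical.

```python
def inverse_rss_weights(rss_by_source):
    """Weights proportional to 1/RSS, normalised to sum to 1 (assumed scheme)."""
    inv = {name: 1.0 / rss for name, rss in rss_by_source.items()}
    total = sum(inv.values())
    return {name: v / total for name, v in inv.items()}

def weighted_loglik(logliks, weights):
    """Weighted sum of per-source log-likelihood values."""
    return sum(weights[name] * ll for name, ll in logliks.items())

# Hypothetical RSS values and log-likelihoods, for illustration only.
rss = {"occupancy": 2.0, "wild": 4.0, "domestic": 4.0}
weights = inverse_rss_weights(rss)

logliks = {"occupancy": -10.0, "wild": -20.0, "domestic": -30.0}
objective = weighted_loglik(logliks, weights)
```

Under this assumed scheme the source with the smallest residuals on the survey data gets the largest say in the fit; setting all weights equal recovers the unweighted combined model.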

32 Future Work: Model checking
There are many tools for diagnostics of point process models.
K-envelopes (to diagnose conditional independence of point locations):

33 Future Work: Model checking
Spatial residual plots:

34 Future Work: More to explore
Some next steps:
Other weighting approaches
Developing diagnostic tools
Combinations involving non-Poisson PPMs
Please come see me if you are interested in contributing!
