

Web-based Supplementary Materials for Line Transect Methods for Plant Surveys by S.T. Buckland, D.L. Borchers, A. Johnston, P.A. Henrys and T.A. Marques

Web Appendix A

1. Introduction

In this on-line appendix, we provide:

1. Estimation equations for strip transect sampling, using notation consistent with that used to develop the line transect methods of the main paper.
2. Comments on how a full likelihood approach can be implemented, and on why we would not usually wish to do this.
3. A simulation study to explore the performance of the proposed bootstrapping method.
4. Application of the methods to a population of known size.

2. Strip transect sampling

For strip transect sampling, and considering E/W data (indicated by subscript E) only, plant abundance is estimated as

$$\hat{N}_E = \frac{A\, n_E}{2 w L_E}$$

where $n_E$ is the total number of plants detected in the E/W strips of half-width $w$ and total length $L_E$, and $A$ is the size of the survey region. We can analyse N/S data (indicated by subscript N) similarly, to give $\hat{N}_N$. Corresponding variance estimates would typically be obtained by assuming that the systematic sample of strips in each direction is a simple random sample. If there is a strong trend in density through the region, then the stratification methods considered by Fewster et al. (in prep.) may be used to reduce the resulting upward bias in variance estimates.

We can combine the two data sets by including the squares at which two strips intersect just once, along with the number of unique plants detected in these squares. To estimate variance, systematically-spaced sampling units can be defined as shown in Fig. 3 of the main paper, and a simple random sampling variance estimate calculated. The finite population correction is $1 - a/A$, where $a$ is the size of the covered region (the total area covered by at least one strip).

3. A full likelihood approach

Using any of the methods outlined in the main paper or above, we could add a further component to our likelihoods, corresponding to inference about population size $N$, given the data in our survey strips. For example, if we believed that all plants within the survey strips are detected, and conducted a strip transect analysis based on total number of plants detected within the combined area A + B + C (Fig. ), then the following binomial likelihood allows us to draw inference on abundance $N$:

$$L(N; n) = \binom{N}{n} P_c^{\,n} (1 - P_c)^{N - n}$$

where $n = n_A + n_B + n_C$ and $P_c$ is the (known) proportion of the survey region covered by the survey strips. This approach relies on the assumption that plants are uniformly and independently distributed through the survey region (beyond as well as within the searched strips). Inference is not robust to failures of this assumption, with variances typically underestimated; hence we rely on more robust design-based extrapolation from the covered region to the entire survey region (Borchers and Burnham, 2004). An alternative approach to the methods of this paper would be to formulate spatial models for plant density, for example extending the methods of Hedley et al. (2004).
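As a rough illustration of how the strip transect estimator of Section 2 relates to the binomial likelihood of Section 3, the following Python sketch evaluates both for one set of counts. The function names and all numerical inputs are hypothetical and are not taken from the paper; the sketch assumes certain detection within the strips and that the covered proportion is $P_c = 2wL/A$, in which case the integer value of $N$ that maximises the binomial likelihood is the strip transect estimate rounded down.

```python
import math

def strip_transect_estimate(area, n, half_width, total_length):
    """Design-based strip transect estimate: N_hat = A * n / (2 * w * L)."""
    return area * n / (2.0 * half_width * total_length)

def binomial_loglik(N, n, p_c):
    """Log of C(N, n) * p_c**n * (1 - p_c)**(N - n)."""
    if N < n:
        return float("-inf")
    log_choose = math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)
    return log_choose + n * math.log(p_c) + (N - n) * math.log(1.0 - p_c)

def binomial_mle(n, p_c):
    """Integer N maximising the binomial likelihood, found by direct search."""
    upper = int(math.ceil(n / p_c)) + 10  # the maximum lies at or below n / p_c
    return max(range(n, upper + 1), key=lambda N: binomial_loglik(N, n, p_c))

# Hypothetical survey: region of 10000 m^2, strips of half-width 2.5 m
# and total length 600 m, 50 plants counted in the strips.
area, half_width, total_length, n = 10000.0, 2.5, 600.0, 50
p_c = 2.0 * half_width * total_length / area   # covered proportion
n_hat_strip = strip_transect_estimate(area, n, half_width, total_length)
n_hat_mle = binomial_mle(n, p_c)
print(n_hat_strip, n_hat_mle)
```

The two point estimates agree up to rounding; the approaches differ in how variance is obtained, which is where the uniformity and independence assumption of the likelihood approach matters.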

4. Application to population of known size

Yellow split peas were used to simulate a population of plants on a square of mown grass with sides 25 m long. They were placed within the survey region in clusters, where cluster sizes were drawn from a Poisson distribution with mean 3. True number of individuals was 4 in 67 clusters. Here, we estimate cluster abundance only. The clusters were distributed through the survey region so as to have a markedly non-uniform distribution with respect to distance from the closest transect line. A crossed design was used, with five lines running in the N/S direction, and five lines E/W. Truncation distance $w$ was set at 2.5 m, so that each strip was 5 m wide. Thus strips were contiguous, with five strips spanning the entire survey region. After pooling across each set of five strips, the actual distribution of split peas is shown in Web Fig. 1, together with the distribution of detected peas.

A half-normal model was found to provide a good fit to the distance data (Kolmogorov-Smirnov and Cramér-von Mises tests, p > 0.7). AIC indicated that data could be pooled across the two sets of lines (AIC=87.06, compared with summed AIC=87.9 for separate fits; ΔAIC=0.86). For each set of lines, the standard analytic variance was corrected for a finite population correction as described by Buckland et al. (2001: 87), to allow for the fact that the whole survey region coincided with the covered area. Because, for the two sets of lines combined, $q_E = q_N = A/a = 1$, estimators $\hat{N}_1$ and $\hat{N}_2$ of the main paper are equivalent. Further, because we assume $P_E = P_N = P$, say, we obtain

$$\hat{N} = \frac{n_E + n_N}{2\hat{P}}$$

and the approximate result (using the delta method)

$$[\mathrm{cv}(\hat{N})]^2 = [\mathrm{cv}(n_E + n_N)]^2 + [\mathrm{cv}(\hat{P})]^2 .$$

To evaluate $\mathrm{cv}(n_E + n_N)$, the variances of $n_E$ and $n_N$ were again evaluated using a finite population correction.
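The pooled estimator and its delta-method coefficient of variation can be illustrated with a short Python sketch. The function names and the input values below are hypothetical, chosen only to show the arithmetic; they are not the split pea results. The squared coefficients of variation are simply added, which treats the count $n_E + n_N$ and the estimate $\hat{P}$ as independent.

```python
import math

def pooled_abundance(n_e, n_n, p_hat):
    """N_hat = (n_E + n_N) / (2 * P_hat), the form used when q_E = q_N = 1."""
    return (n_e + n_n) / (2.0 * p_hat)

def cv_pooled(cv_n_total, cv_p_hat):
    """Delta-method cv of N_hat from cv(n_E + n_N) and cv(P_hat)."""
    return math.sqrt(cv_n_total ** 2 + cv_p_hat ** 2)

# Hypothetical inputs, chosen only to show the arithmetic.
n_hat = pooled_abundance(n_e=30, n_n=34, p_hat=0.5)
cv_n_hat = cv_pooled(cv_n_total=0.03, cv_p_hat=0.04)
se_n_hat = cv_n_hat * n_hat   # standard error implied by the cv
print(n_hat, round(cv_n_hat, 4), round(se_n_hat, 2))
```

Because the components combine on the squared scale, the larger of the two coefficients of variation dominates the result.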
The above analyses gave an estimate of 63 clusters using the data recorded from the N/S lines, with 95% confidence interval (48,83), compared with true abundance of 67. The E/W data gave an estimate of 56, with 95% confidence interval (4,76). The above estimator for the pooled data yielded an estimate of 59 clusters with 95% confidence interval (47,74).

The distance of objects from the nearest line did not differ significantly from a uniform distribution for either set of lines (Kolmogorov-Smirnov, Cramér-von Mises and χ² tests, p > 0.8 in each case). This is despite the fact that the split peas were positioned within strips markedly non-uniformly; pooling of data across five strips in each direction was sufficient to remove evidence of this non-uniformity. Similarly, data were consistent with the assumption that detection on the line was certain ($p(0) = 1$), so results are not given for these cases.

5. Simulation study to assess the performance of the proposed bootstrap method

The purpose of this simulation study is to assess the validity of the proposed bootstrapping method. We concentrate therefore on one estimation method: conventional distance sampling as defined in Section .3 of the main paper. We assume that the bootstrap sampling units are as shown in the left-hand side of Fig. 3. These units are systematically spaced through the study region. If objects are uniformly and independently distributed through the region, this should be indistinguishable from simple random sampling, and we explore this scenario first. We then consider the case of a population in which objects occur in clusters, for which the spatial scale of autocorrelation is smaller than the size of a single bootstrap unit. The assumption of independence between units is thus likely to be only slightly compromised, so that a priori, we would still expect the method to do well. The third scenario considers a more extreme case of clustering, with fewer, larger clusters, and with the spatial correlation occurring over a greater distance. The final scenario is where there is a linear trend in object density, with zero density along one edge.
In this case, systematic sampling gives greater precision than simple random sampling, but by using simple bootstrapping, we fail to exploit this better precision, so that we expect the bootstrap variance estimates to be biased high. Fewster et al. (in prep.) explore the issue of variance estimation when transects are laid down according to a systematic design.

We thus investigate the following four scenarios, which are designed to be broadly comparable with the Fleecefaulds study area:

1. A population of N = 1000 objects is randomly distributed with constant rate through a square area of side 108 m. A systematic grid of lines is positioned over the survey area at random. Line separation is 9 m in both the E/W and the N/S directions, and the half-width of each strip centred on each line is w = 1.5 m. Sampling extends into a buffer zone, extending a distance w = 1.5 m beyond the survey region boundaries, to avoid bias due to an edge effect (Strindberg et al., 2004). Defining the Cartesian coordinates of the corners of the study region to be (0,0), (0,108), (108,0) and (108,108), then whatever distribution we give the population of objects, we can ensure a uniform distribution of distances of objects from both the E/W and the N/S lines in a design-based sense by starting the first N/S line at random in the interval (-4.5, 4.5) along the x-axis, and the first E/W line at random in the interval (-4.5, 4.5) along the y-axis. Note that we do not satisfy this assumption in general if strips are given fixed positions, even if we then select a simple random sample of these strips. Detections within the strip are simulated from a half-normal detection function with scale parameter σ = w/2 = 0.75, so that expected sample size from the E/W lines is approximately 200, and similarly for the N/S lines. For each simulated population, the parameter σ is estimated by maximum likelihood, assuming a truncated half-normal. (Note that, by generating the position of the first line at random in (-4.5, 4.5), and continuing sampling up to 4.5 m outside the opposite boundary, we generate 13 lines. If one of these is more than 1.5 m outside the study region, it is not surveyed. If line 1, say, is inside the buffer zone, then part of the strip of half-width 1.5 m will be inside the study region, and by surveying this, we exactly compensate for a corresponding part of the strip centred on line 13, which falls outside the study region, given the above set-up. The reason for extending out to 4.5 m from the study region boundary is to ensure that partial bootstrap sampling units at the edge are all identified.)

2. As for scenario 1, except that the population of N = 1000 objects is now in 40 clusters, each of 25 objects. The 40 cluster locations are randomly distributed with constant rate through the study region, and the 25 objects associated with each location are given normally distributed x and y coordinates, centred on the cluster location and with standard deviation of m. Objects falling outside the study region are wrapped around to the opposite side.

3. As for scenario 2, except that there are now just 8 clusters, with 125 objects per cluster, and the standard deviation of each normal distribution is now m.

4. As for scenario 1, except that the population of N = 1000 objects is independently distributed, but with linearly increasing rate with distance along the y-axis, starting at zero density along the x-axis.

A single simulated population corresponding to each scenario is shown in Web Fig. 2. We summarise in Web Table 1 results obtained from 300 simulations of each scenario, using 399 bootstrap replications for each of the 300 populations (Buckland, 1984). The following two estimators of population size N were considered:

$$\hat{N}_1 = \frac{A}{2a}\left(\frac{n_E}{\hat{P}_E} + \frac{n_N}{\hat{P}_N}\right) \quad \text{(eqn (3))}$$

and

$$\hat{N}_2 = 0.5\left(\frac{n_E}{q_E \hat{P}_E} + \frac{n_N}{q_N \hat{P}_N}\right) \quad \text{(eqn (4))}.$$

Two implementations of the bootstrap were considered: one in which sampling units were resampled without regard for their size, and one in which the resamples were constrained so that each edge configuration appeared in the resample the same number of times as in the original sample. There are two consequences of this second strategy, one beneficial and one detrimental. The beneficial consequence is that the size of the study region, the size of the covered area, and the total line length in each direction are all constant across bootstrap resamples. The detrimental consequence is that each of the four corner configurations is unique, so each appears exactly once in each bootstrap resample, which can be expected to generate bias in the variance estimates.

Results suggest that the constrained bootstrap performs less well than the unconstrained method for clustered data, and especially poorly for the extreme clustering of scenario 3. Also, estimator $\hat{N}$ is more precise than $\hat{N}$ for all but scenario 3, and in that case, the bootstrap variance of $\hat{N}$ is biased high, so that the improved precision would not be apparent. Using either $\hat{N}_1$ or $\hat{N}_2$ together with a finite population correction, the means of the unconstrained bootstrap standard errors differ only by small amounts, explicable by Monte Carlo variation, from the standard deviations of estimated abundances for scenarios 1, 2 and 4, confirming that the proposed bootstrap method works well. For these three cases, incorporating covariance into the variance estimator for $\hat{N}$ makes little difference. For scenario 3, in which object distribution is very clustered, bootstrap variances are biased high, especially when covariance is allowed for; this possibly reflects instability in the bootstrap estimator for such extreme object distributions (see Web Fig. 2), which might be made worse by estimating additional covariance terms.
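The set-up of scenario 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code: the function names are hypothetical, the truncated half-normal likelihood is maximised by a simple golden-section search rather than by any particular distance sampling software, $q$ is taken to be the proportion of the region covered by the strips in one direction ($2w$ divided by the line separation), and the bootstrap itself is not shown because it depends on the sampling units of Fig. 3 of the main paper.

```python
import math
import random

W = 1.5          # strip half-width (m)
SPACING = 9.0    # line separation (m)
SIDE = 108.0     # side of the square study region (m)

def distances_to_nearest_line(coords, start):
    """Perpendicular distance from each coordinate to the nearest line of a
    systematic grid with lines at start, start + SPACING, ..."""
    out = []
    for c in coords:
        r = (c - start) % SPACING
        out.append(min(r, SPACING - r))
    return out

def simulate_detections(coords, sigma, rng):
    """Distances of detected objects for one set of parallel lines."""
    start = rng.uniform(-SPACING / 2, SPACING / 2)   # random grid start
    detected = []
    for d in distances_to_nearest_line(coords, start):
        if d <= W and rng.random() < math.exp(-d * d / (2 * sigma ** 2)):
            detected.append(d)
    return detected

def mean_detection_prob(sigma):
    """Average detection probability within the strip for a half-normal."""
    integral = sigma * math.sqrt(math.pi / 2) * math.erf(W / (sigma * math.sqrt(2)))
    return integral / W

def fit_sigma(dists, lo=0.05, hi=20.0, iters=80):
    """Golden-section search for the truncated half-normal MLE of sigma."""
    n, ss = len(dists), sum(d * d for d in dists)

    def negloglik(s):
        return ss / (2 * s * s) + n * math.log(mean_detection_prob(s) * W)

    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if negloglik(c) < negloglik(d):
            b = d
        else:
            a = c
    return (a + b) / 2

def estimate_abundance(n_objects=1000, sigma=0.75, seed=1):
    """Simulate one population and average the two single-direction estimates."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, SIDE) for _ in range(n_objects)]
    ys = [rng.uniform(0, SIDE) for _ in range(n_objects)]
    q = 2 * W / SPACING                      # assumed coverage per direction
    parts = []
    for coords in (xs, ys):                  # N/S lines use x, E/W lines use y
        dists = simulate_detections(coords, sigma, rng)
        p_hat = mean_detection_prob(fit_sigma(dists))
        parts.append(len(dists) / (q * p_hat))
    return 0.5 * sum(parts)

print(round(mean_detection_prob(0.75), 3), round(estimate_abundance(), 1))
```

With σ = 0.75 and w = 1.5 m the average detection probability within a strip is about 0.6, and one direction of strips covers one third of the region, which is consistent with the expected sample size of approximately 200 per direction quoted for N = 1000.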

References

Borchers, D.L. and Burnham, K.P. (2004). General formulation for distance sampling. In Advanced Distance Sampling, pp. 6-30. S.T. Buckland, D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers and L. Thomas (eds). Oxford University Press, Oxford.

Buckland, S.T. (1984). Monte Carlo confidence intervals. Biometrics 40.

Fewster, R.M., Buckland, S.T., Burnham, K.P., Borchers, D.L., Laake, J.L. and Thomas, L. (in prep.). Estimating the encounter rate variance in distance sampling.

Hedley, S.L., Buckland, S.T. and Borchers, D.L. (2004). Spatial distance sampling models. In Advanced Distance Sampling. S.T. Buckland, D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers and L. Thomas (eds). Oxford University Press, Oxford.

Strindberg, S., Buckland, S.T. and Thomas, L. (2004). Design of distance sampling surveys and Geographic Information Systems. In Advanced Distance Sampling, pp. 190-228. S.T. Buckland, D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers and L. Thomas (eds). Oxford University Press, Oxford.

Web Table 1. Mean bootstrap estimates of standard error for estimators $\hat{N}_1$ and $\hat{N}_2$, using both a standard and a constrained bootstrap, averaged across 300 simulations, with a finite population correction ($\mathrm{se}_{BF}(\hat{N}_1)$ and $\mathrm{se}_{BF}(\hat{N}_2)$) and without ($\mathrm{se}_{B}(\hat{N}_1)$ and $\mathrm{se}_{B}(\hat{N}_2)$). For estimator $\hat{N}$, we also show $\mathrm{se}_{BFC}(\hat{N})$, the bootstrap estimate of standard error with finite population correction and incorporating covariance between $n_E$ and $n_N$ and between $\hat{P}_E$ and $\hat{P}_N$ (see eqn (7)). Also shown is the sample standard deviation of the estimates of N corresponding to each estimator from the 300 simulations of each scenario ($\mathrm{sd}(\hat{N}_{1,\mathrm{sim}})$ and $\mathrm{sd}(\hat{N}_{2,\mathrm{sim}})$).

Columns: Scenario 1, Scenario 2, Scenario 3, Scenario 4.

Rows, standard bootstrap: $\mathrm{se}_{B}(\hat{N})$, $\mathrm{se}_{BF}(\hat{N})$, $\mathrm{se}_{BFC}(\hat{N})$, $\mathrm{sd}(\hat{N}_{\mathrm{sim}})$; then $\mathrm{se}_{B}(\hat{N})$, $\mathrm{se}_{BF}(\hat{N})$, $\mathrm{sd}(\hat{N}_{\mathrm{sim}})$.

Rows, constrained bootstrap: $\mathrm{se}_{B}(\hat{N})$, $\mathrm{se}_{BF}(\hat{N})$, $\mathrm{sd}(\hat{N}_{\mathrm{sim}})$; then $\mathrm{se}_{B}(\hat{N})$, $\mathrm{se}_{BF}(\hat{N})$, $\mathrm{sd}(\hat{N}_{\mathrm{sim}})$.

Web Fig. 1. Histograms of numbers of split peas by distance from the nearest line. The shaded bars correspond to detected peas. Left-hand side: N/S lines. Right-hand side: E/W lines. (Axes: frequency against distance from line.)

Web Fig. 2. A single simulated population is shown corresponding to each of scenarios 1-4. (Panels: Scenario 1, Scenario 2, Scenario 3, Scenario 4.)

Performance Estimation and Regularization. Kasthuri Kannan, PhD. Machine Learning, Spring 2018

Performance Estimation and Regularization. Kasthuri Kannan, PhD. Machine Learning, Spring 2018 Performance Estimation and Regularization Kasthuri Kannan, PhD. Machine Learning, Spring 2018 Bias- Variance Tradeoff Fundamental to machine learning approaches Bias- Variance Tradeoff Error due to Bias:

More information

Package distance.sample.size

Package distance.sample.size Type Package Package distance.sample.size January 26, 2016 Title Calculates Study Size Required for Distance Sampling Version 0.0 Date 2015-12-17 Author Robert Clark Maintainer Robert Clark

More information

Bootstrap Confidence Interval of the Difference Between Two Process Capability Indices

Bootstrap Confidence Interval of the Difference Between Two Process Capability Indices Int J Adv Manuf Technol (2003) 21:249 256 Ownership and Copyright 2003 Springer-Verlag London Limited Bootstrap Confidence Interval of the Difference Between Two Process Capability Indices J.-P. Chen 1

More information

Chapter 3. Bootstrap. 3.1 Introduction. 3.2 The general idea

Chapter 3. Bootstrap. 3.1 Introduction. 3.2 The general idea Chapter 3 Bootstrap 3.1 Introduction The estimation of parameters in probability distributions is a basic problem in statistics that one tends to encounter already during the very first course on the subject.

More information

Fast Automated Estimation of Variance in Discrete Quantitative Stochastic Simulation

Fast Automated Estimation of Variance in Discrete Quantitative Stochastic Simulation Fast Automated Estimation of Variance in Discrete Quantitative Stochastic Simulation November 2010 Nelson Shaw njd50@uclive.ac.nz Department of Computer Science and Software Engineering University of Canterbury,

More information

A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection (Kohavi, 1995)

A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection (Kohavi, 1995) A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection (Kohavi, 1995) Department of Information, Operations and Management Sciences Stern School of Business, NYU padamopo@stern.nyu.edu

More information

Estimation bias under model selection for distance sampling detection functions

Estimation bias under model selection for distance sampling detection functions Environ Ecol Stat (2017) 24:399 414 DOI 10.1007/s10651-017-0376-0 Estimation bias under model selection for distance sampling detection functions Rocio Prieto Gonzalez 1 Len Thomas 1 Tiago A. Marques 1,2

More information

Model Assessment and Selection. Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer

Model Assessment and Selection. Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer Model Assessment and Selection Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Model Training data Testing data Model Testing error rate Training error

More information

Driven Cavity Example

Driven Cavity Example BMAppendixI.qxd 11/14/12 6:55 PM Page I-1 I CFD Driven Cavity Example I.1 Problem One of the classic benchmarks in CFD is the driven cavity problem. Consider steady, incompressible, viscous flow in a square

More information

Robotics. Lecture 5: Monte Carlo Localisation. See course website for up to date information.

Robotics. Lecture 5: Monte Carlo Localisation. See course website  for up to date information. Robotics Lecture 5: Monte Carlo Localisation See course website http://www.doc.ic.ac.uk/~ajd/robotics/ for up to date information. Andrew Davison Department of Computing Imperial College London Review:

More information

Cross-validation. Cross-validation is a resampling method.

Cross-validation. Cross-validation is a resampling method. Cross-validation Cross-validation is a resampling method. It refits a model of interest to samples formed from the training set, in order to obtain additional information about the fitted model. For example,

More information

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data Spatial Patterns We will examine methods that are used to analyze patterns in two sorts of spatial data: Point Pattern Analysis - These methods concern themselves with the location information associated

More information

The Bootstrap and Jackknife

The Bootstrap and Jackknife The Bootstrap and Jackknife Summer 2017 Summer Institutes 249 Bootstrap & Jackknife Motivation In scientific research Interest often focuses upon the estimation of some unknown parameter, θ. The parameter

More information

Resampling Methods. Levi Waldron, CUNY School of Public Health. July 13, 2016

Resampling Methods. Levi Waldron, CUNY School of Public Health. July 13, 2016 Resampling Methods Levi Waldron, CUNY School of Public Health July 13, 2016 Outline and introduction Objectives: prediction or inference? Cross-validation Bootstrap Permutation Test Monte Carlo Simulation

More information

Dealing with Categorical Data Types in a Designed Experiment

Dealing with Categorical Data Types in a Designed Experiment Dealing with Categorical Data Types in a Designed Experiment Part II: Sizing a Designed Experiment When Using a Binary Response Best Practice Authored by: Francisco Ortiz, PhD STAT T&E COE The goal of

More information

Response to API 1163 and Its Impact on Pipeline Integrity Management

Response to API 1163 and Its Impact on Pipeline Integrity Management ECNDT 2 - Tu.2.7.1 Response to API 3 and Its Impact on Pipeline Integrity Management Munendra S TOMAR, Martin FINGERHUT; RTD Quality Services, USA Abstract. Knowing the accuracy and reliability of ILI

More information

CHAPTER 4. Numerical Models. descriptions of the boundary conditions, element types, validation, and the force

CHAPTER 4. Numerical Models. descriptions of the boundary conditions, element types, validation, and the force CHAPTER 4 Numerical Models This chapter presents the development of numerical models for sandwich beams/plates subjected to four-point bending and the hydromat test system. Detailed descriptions of the

More information

Machine Learning: An Applied Econometric Approach Online Appendix

Machine Learning: An Applied Econometric Approach Online Appendix Machine Learning: An Applied Econometric Approach Online Appendix Sendhil Mullainathan mullain@fas.harvard.edu Jann Spiess jspiess@fas.harvard.edu April 2017 A How We Predict In this section, we detail

More information

Chapter 6 Normal Probability Distributions

Chapter 6 Normal Probability Distributions Chapter 6 Normal Probability Distributions 6-1 Review and Preview 6-2 The Standard Normal Distribution 6-3 Applications of Normal Distributions 6-4 Sampling Distributions and Estimators 6-5 The Central

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION Introduction CHAPTER 1 INTRODUCTION Mplus is a statistical modeling program that provides researchers with a flexible tool to analyze their data. Mplus offers researchers a wide choice of models, estimators,

More information

Descriptive Statistics, Standard Deviation and Standard Error

Descriptive Statistics, Standard Deviation and Standard Error AP Biology Calculations: Descriptive Statistics, Standard Deviation and Standard Error SBI4UP The Scientific Method & Experimental Design Scientific method is used to explore observations and answer questions.

More information

INVESTIGATIONS OF CROSS-CORRELATION AND EUCLIDEAN DISTANCE TARGET MATCHING TECHNIQUES IN THE MPEF ENVIRONMENT. Greg And Ken Holmlund # ABSTRACT

INVESTIGATIONS OF CROSS-CORRELATION AND EUCLIDEAN DISTANCE TARGET MATCHING TECHNIQUES IN THE MPEF ENVIRONMENT. Greg And Ken Holmlund # ABSTRACT INVESTIGATIONS OF CROSS-CORRELATION AND EUCLIDEAN DISTANCE TARGET MATCHING TECHNIQUES IN THE MPEF ENVIRONME Greg Dew @ And Ken Holmlund # @ Logica # EUMETSAT ABSTRACT Cross-Correlation and Euclidean Distance

More information

Table of Contents (As covered from textbook)

Table of Contents (As covered from textbook) Table of Contents (As covered from textbook) Ch 1 Data and Decisions Ch 2 Displaying and Describing Categorical Data Ch 3 Displaying and Describing Quantitative Data Ch 4 Correlation and Linear Regression

More information

CHAPTER 2 Modeling Distributions of Data

CHAPTER 2 Modeling Distributions of Data CHAPTER 2 Modeling Distributions of Data 2.2 Density Curves and Normal Distributions The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers Density Curves

More information

Introduction to Mplus

Introduction to Mplus Introduction to Mplus May 12, 2010 SPONSORED BY: Research Data Centre Population and Life Course Studies PLCS Interdisciplinary Development Initiative Piotr Wilk piotr.wilk@schulich.uwo.ca OVERVIEW Mplus

More information

STATS PAD USER MANUAL

STATS PAD USER MANUAL STATS PAD USER MANUAL For Version 2.0 Manual Version 2.0 1 Table of Contents Basic Navigation! 3 Settings! 7 Entering Data! 7 Sharing Data! 8 Managing Files! 10 Running Tests! 11 Interpreting Output! 11

More information

Lab 5 - Risk Analysis, Robustness, and Power

Lab 5 - Risk Analysis, Robustness, and Power Type equation here.biology 458 Biometry Lab 5 - Risk Analysis, Robustness, and Power I. Risk Analysis The process of statistical hypothesis testing involves estimating the probability of making errors

More information

Fathom Dynamic Data TM Version 2 Specifications

Fathom Dynamic Data TM Version 2 Specifications Data Sources Fathom Dynamic Data TM Version 2 Specifications Use data from one of the many sample documents that come with Fathom. Enter your own data by typing into a case table. Paste data from other

More information

EE 8591 Homework 4 (10 pts) Fall 2018 SOLUTIONS Topic: SVM classification and regression GRADING: Problems 1,2,4 3pts each, Problem 3 1 point.

EE 8591 Homework 4 (10 pts) Fall 2018 SOLUTIONS Topic: SVM classification and regression GRADING: Problems 1,2,4 3pts each, Problem 3 1 point. 1 EE 8591 Homework 4 (10 pts) Fall 2018 SOLUTIONS Topic: SVM classification and regression GRADING: Problems 1,2,4 3pts each, Problem 3 1 point. Problem 1 (problem 7.6 from textbook) C=10e- 4 C=10e- 3

More information

Bootstrapping Method for 14 June 2016 R. Russell Rhinehart. Bootstrapping

Bootstrapping Method for  14 June 2016 R. Russell Rhinehart. Bootstrapping Bootstrapping Method for www.r3eda.com 14 June 2016 R. Russell Rhinehart Bootstrapping This is extracted from the book, Nonlinear Regression Modeling for Engineering Applications: Modeling, Model Validation,

More information

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display CURRICULUM MAP TEMPLATE Priority Standards = Approximately 70% Supporting Standards = Approximately 20% Additional Standards = Approximately 10% HONORS PROBABILITY AND STATISTICS Essential Questions &

More information

Modelling and Quantitative Methods in Fisheries

Modelling and Quantitative Methods in Fisheries SUB Hamburg A/553843 Modelling and Quantitative Methods in Fisheries Second Edition Malcolm Haddon ( r oc) CRC Press \ y* J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of

More information

Generating random samples from user-defined distributions

Generating random samples from user-defined distributions The Stata Journal (2011) 11, Number 2, pp. 299 304 Generating random samples from user-defined distributions Katarína Lukácsy Central European University Budapest, Hungary lukacsy katarina@phd.ceu.hu Abstract.

More information

BIOL Gradation of a histogram (a) into the normal curve (b)

BIOL Gradation of a histogram (a) into the normal curve (b) (التوزيع الطبيعي ( Distribution Normal (Gaussian) One of the most important distributions in statistics is a continuous distribution called the normal distribution or Gaussian distribution. Consider the

More information

Predictive Interpolation for Registration

Predictive Interpolation for Registration Predictive Interpolation for Registration D.G. Bailey Institute of Information Sciences and Technology, Massey University, Private bag 11222, Palmerston North D.G.Bailey@massey.ac.nz Abstract Predictive

More information

Optimizing Speech Recognition Evaluation Using Stratified Sampling

Optimizing Speech Recognition Evaluation Using Stratified Sampling INTERSPEECH 01 September 1, 01, San Francisco, USA Optimizing Speech Recognition Evaluation Using Stratified Sampling Janne Pylkkönen, Thomas Drugman, Max Bisani Amazon {jannepyl, drugman, bisani}@amazon.com

More information

Measures of Dispersion

Measures of Dispersion Lesson 7.6 Objectives Find the variance of a set of data. Calculate standard deviation for a set of data. Read data from a normal curve. Estimate the area under a curve. Variance Measures of Dispersion

More information

Part 2 Introductory guides to the FMSP stock assessment software

Part 2 Introductory guides to the FMSP stock assessment software Part 2 Introductory guides to the FMSP stock assessment software 127 6. LFDA software Length Frequency Data Analysis G.P. Kirkwood and D.D. Hoggarth The LFDA (Length Frequency Data Analysis) package was

More information

Quasi-Monte Carlo Methods Combating Complexity in Cost Risk Analysis

Quasi-Monte Carlo Methods Combating Complexity in Cost Risk Analysis Quasi-Monte Carlo Methods Combating Complexity in Cost Risk Analysis Blake Boswell Booz Allen Hamilton ISPA / SCEA Conference Albuquerque, NM June 2011 1 Table Of Contents Introduction Monte Carlo Methods

More information

Estimating Map Accuracy without a Spatially Representative Training Sample

Estimating Map Accuracy without a Spatially Representative Training Sample Estimating Map Accuracy without a Spatially Representative Training Sample David A. Patterson, Mathematical Sciences, The University of Montana-Missoula Brian M. Steele, Mathematical Sciences, The University

More information

Further Simulation Results on Resampling Confidence Intervals for Empirical Variograms

Further Simulation Results on Resampling Confidence Intervals for Empirical Variograms University of Wollongong Research Online Centre for Statistical & Survey Methodology Working Paper Series Faculty of Engineering and Information Sciences 2010 Further Simulation Results on Resampling Confidence

More information
