Spatial Patterns: Point Pattern Analysis and Geographic Patterns in Areal Data


1 Spatial Patterns
We will examine methods that are used to analyze patterns in two sorts of spatial data:
- Point Pattern Analysis: these methods concern themselves with the location information associated with point data (not the attributes associated with those locations, just where they are found).
- Geographic Patterns in Areal Data: these methods are used to examine the pattern of attribute values associated with polygon representations of geographic phenomena (i.e. is there a pattern in the attributes of a set of adjacent polygons?).

2 Geographic Patterns in Areal Data
Given a set of geographic areas that have some accompanying variable or attribute information, whether they are represented using vector polygons or collections of raster cells, we can ask questions like: Does the pattern of values show a spatial organization that differs from what we would expect if the values were distributed randomly?
In general, we will answer this sort of question by forming a descriptive statistic that compares the observed pattern to some expected pattern.

3 Geographic Patterns in Areal Data
We can assess this sort of thing in different ways, depending on the type of data we have:
- If we can count occurrences of a nominal variable per area, we can form a contingency table and use a χ² test to compare the observed values to those expected, OR
- We can compare pairs of polygons that share a common boundary, computing the joint count statistic (often called the join count statistic) for binary nominal data, or Moran's I statistic when we have interval or ratio data that we want to examine for pattern.

4 Contingency Tables and the χ² Test
This method can be applied any time we have a pair of nominal variables that can be cross-tabulated.
E.g. suppose we conducted a survey of all students taking a Geography course, and asked them to indicate their year {freshman, sophomore, junior, senior} and what county they live in {Orange, Durham, Chatham, Alamance}.
We can use this data to form a 4x4 table, where each cell indicates the count of students in a particular year that live in a particular county.
This sort of table is called a contingency table, and it can be applied to spatial patterns if one of our nominal variables represents location information (e.g. county).

5 County vs. Year Example Table
[Table: observed student counts, with rows Alamance, Chatham, Durham, Orange and Totals, and columns Freshman, Sophomore, Junior, Senior and Totals; the cell values were not preserved.]
Contingency tables can be built for any data set where we have two nominal variables that we can use to categorize the values into the cells of the table. The application does not have to be spatial, but membership in a particular spatial unit (i.e. inside of a certain polygon) is a convenient approach for spatial analysis.

6 Contingency Tables and the χ² Test
Furthermore, we can use the data in a contingency table to assess the presence of a spatial pattern by first forming an expectation of how values of one of the nominal variables should be distributed with respect to the other.
E.g. if our hypothesis is that the distribution of years of geography students shouldn't vary according to their county of residence, then the relative proportions of freshmen : sophomores : juniors : seniors should be the same for each of our four counties (even if the total number of students per county is different).
We can use the observed frequency counts in each cell of our contingency table to generate expected frequency counts, based on the rule suggested above.

7 County vs. Year Example Table
[Table: the same county vs. year table of observed counts with row and column totals; the cell values were not preserved.]
The expected value for each cell is calculated by multiplying its row total by its column total and dividing by the grand total, e.g. for the Freshmen in Alamance County 45 * 28 / 200 = 6.3, and so on for all the cells.
This creates expected frequencies that are proportionate to one another across rows and columns.
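
The expected-frequency rule above can be sketched in a few lines of Python. The observed counts here are hypothetical (the slide's own cell values are missing); they are chosen only so that the Alamance row total is 45, the Freshman column total is 28 and the grand total is 200, as in the worked cell. The function name `expected_counts` is likewise just an illustration.

```python
# Hypothetical observed counts (rows: counties; columns: Freshman, Sophomore,
# Junior, Senior). Only the margins used in the worked cell (45, 28, 200)
# come from the text; every individual cell value is made up.
observed = {
    "Alamance": [4, 12, 14, 15],
    "Chatham": [5, 8, 10, 7],
    "Durham": [9, 15, 20, 11],
    "Orange": [10, 20, 22, 18],
}


def expected_counts(table):
    """Return expected counts: row total * column total / grand total."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    return {
        name: [row_total * col_total / grand_total for col_total in col_totals]
        for name, row_total in zip(table, row_totals)
    }


expected = expected_counts(observed)
# Freshmen in Alamance: 45 * 28 / 200
print(round(expected["Alamance"][0], 2))
```

Because every expected cell is built from the margins alone, each expected row sums to the same total as the corresponding observed row.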

8 Contingency Tables and the χ² Test
Once we have observed and expected frequencies for each cell in the contingency table, we can use those values to calculate the χ² test statistic:
$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$
where $O_i$ is the observed frequency in cell i, $E_i$ is the expected frequency in cell i, and n is the number of cells.
This χ² test statistic has (r - 1) * (c - 1) degrees of freedom, where r and c are the number of rows and columns in the contingency table.
If the observed frequencies are very different from the expected frequencies, the χ² test statistic will be larger than the one-tailed critical value it is compared to, thus detecting the presence of a spatial pattern.
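
As a minimal sketch of the statistic described above, the following Python function sums (O - E)² / E over all cells and returns the degrees of freedom alongside it. The function name `chi_square` and the 2 x 2 example table are hypothetical and are not taken from the slides.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count for this cell under the "no pattern" hypothesis.
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e

    # (r - 1) * (c - 1) degrees of freedom.
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Hypothetical 2 x 2 table, used only to exercise the function.
stat, df = chi_square([[10, 20], [20, 10]])
print(stat, df)
```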

9 Contingency Table χ² Test Example
Research question: Is there a spatial pattern in the distribution of student years in counties of residence?
1. H_0: O ~ E (Frequencies are the same, no pattern)
2. H_A: O ≠ E (Frequencies different, pattern present)
3. Select α = 0.05, one-tailed because of how the χ² test is used here
4. We calculate the χ² test statistic using the formula
$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = \frac{(4 - 6.3)^2}{6.3} + \frac{(9 - 11.4)^2}{11.4} + \frac{(7 - 6.4)^2}{6.4} + \dots$
(the remaining terms and the resulting value were not preserved)

10 Contingency Table χ² Test Example
5. We now need to find the critical χ² value, first calculating the degrees of freedom:
df = (r - 1) * (c - 1) = (4 - 1) * (4 - 1) = 3 * 3 = 9
We can now look up the critical value for our α = 0.05, which we apply here in a one-tailed fashion. Thus we look in the χ² table for p = 0.05 and df = 9, which gives the critical value χ²_crit = 16.92.
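
The table lookup in step 5 can be sketched as follows. The dictionary holds upper-tail critical values for α = 0.05 as printed in standard χ² tables; the names `CHI2_CRIT_05`, `degrees_of_freedom` and `reject_null` are illustrative assumptions, not part of the original slides.

```python
# Upper-tail chi-square critical values for alpha = 0.05, df = 1..10,
# as found in a standard chi-square table.
CHI2_CRIT_05 = {
    1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
    6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307,
}


def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom of an r x c contingency table."""
    return (n_rows - 1) * (n_cols - 1)


def reject_null(chi2_test, n_rows, n_cols):
    """True when the test statistic exceeds the alpha = 0.05 critical value."""
    return chi2_test > CHI2_CRIT_05[degrees_of_freedom(n_rows, n_cols)]


df = degrees_of_freedom(4, 4)  # the 4 x 4 county-by-year table
crit = CHI2_CRIT_05[df]
print(df, crit)
```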

11 Contingency Table χ² Test Example
6. Finally, we must compare the χ² test value to the χ² critical value. We find that χ²_test > χ²_crit, therefore we reject H_0 and accept H_A: based on the comparison between the expected and observed frequencies, there appears to be some pattern in which counties geography students in different years reside.
Notably, this test cannot tell us anything about the pattern's nature, only that the distribution is significantly different from the expected (null) even distribution. There is thus evidence of a spatial pattern, meaning that geography students in certain years tend to live in certain counties.

12 The Joint Count Statistic
The contingency table approach, while it can be applied to spatial analyses in the fashion described, does not actually include any spatial relationship information in its formulation, beyond encoding the coincidence of two nominal variables (when one of those variables represents location information).
We can also formulate descriptive statistics that do include spatial relationships, specifically by finding all the regions that share a boundary in a set of polygons, and then comparing attribute values from the pairs to assess the pattern of that attribute.

13 The Joint Count Statistic
The first step in this method is to enumerate all of the pairs of polygons that share a boundary by creating a binary connectivity table (a.k.a. a spatial matrix). For example, using the following five region system:
[Figure: map of five regions labelled A, B, C, D and E]
1. Label the regions
2. Create a table with the same row & column labels (A, B, C, D, E)
3. Fill in the table with 1s and 0s to indicate which regions share a boundary

14 The Joint Count Statistic
We can now take the sum of all the 1s in the binary connectivity table and divide by 2 to calculate the total number of shared boundaries in the system (J):
$J = \frac{1}{2} \sum_{i=1}^{n} x_i$
where the $x_i$ are the entries of the connectivity table. We divide by 2 because each shared boundary is recorded twice, once in each direction.
Next, we are ready to look at the attribute information associated with the polygons to determine if each pair of polygons that shares a boundary has the same values or different values.
The joint count statistic is designed to be used with binary nominal attributes, i.e. the attribute values need to be reduced to some 2 class description for use in this statistic.
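
Counting J from a connectivity table can be sketched in Python as below. The matrix is an assumption: the slide's table entries are missing, so this adjacency was inferred from the seven neighbor pairs that appear later in the Moran's I worked example (A-B, A-C, B-C, B-D, C-D, C-E, D-E). It may not match the original figure exactly.

```python
REGIONS = ["A", "B", "C", "D", "E"]

# Assumed binary connectivity table (1 = the two regions share a boundary).
CONNECTIVITY = [
    # A  B  C  D  E
    [0, 1, 1, 0, 0],  # A
    [1, 0, 1, 1, 0],  # B
    [1, 1, 0, 1, 1],  # C
    [0, 1, 1, 0, 1],  # D
    [0, 0, 1, 1, 0],  # E
]


def count_shared_boundaries(matrix):
    """Sum all entries and halve: each boundary appears once per direction."""
    n = len(matrix)
    # A valid connectivity table is symmetric with a zero diagonal.
    assert all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))
    assert all(matrix[i][i] == 0 for i in range(n))
    return sum(sum(row) for row in matrix) // 2


J = count_shared_boundaries(CONNECTIVITY)
print(J)
```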

15 The Joint Count Statistic
The binary attributes in question can be any number of possible representations:
- The example in the text uses positive or negative residuals in polygons from spatially-mapped regression results
- It could be any sort of presence/absence data
- Another possibility is a reclassification of other sorts of data (e.g. nominal or ordinal schemes reclassified to two classes, or interval/ratio data transformed to binary data in any number of ways, such as above and below the mean)
- It can be any scheme in which each polygon is assigned either attribute A or attribute B

16 The Joint Count Statistic
We will use the suggested example in the text, where each of our five regions is assigned either a + attribute or a - attribute (possibly describing regression residuals):
[Figure: the five region map with a + or - label in each region]
We now have three types of boundaries:
- ++ boundaries (2)
- +- boundaries (5)
- -- boundaries (0)
The joint count statistic compares the observed number of +- boundaries (where the values on either side of the boundary are different) to the number that we would expect to find if the values in the polygons did not exhibit any spatial autocorrelation.
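
Tallying the three boundary types can be sketched as follows. Both inputs are assumptions, since the figure is missing: the adjacency is the one implied by the neighbor pairs in the later Moran's I example, and the +/- labels (A and D negative) are one assignment that reproduces the counts quoted above.

```python
from itertools import combinations

# Assumed adjacency for regions A..E (not recoverable from the slide itself).
CONNECTIVITY = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0],
]
# Hypothetical labels for A, B, C, D, E.
LABELS = ["-", "+", "+", "-", "+"]


def join_counts(matrix, labels):
    """Count ++, +- and -- boundaries, visiting each pair of regions once."""
    counts = {"++": 0, "+-": 0, "--": 0}
    for i, j in combinations(range(len(labels)), 2):
        if not matrix[i][j]:
            continue  # the two regions do not share a boundary
        if labels[i] != labels[j]:
            counts["+-"] += 1
        elif labels[i] == "+":
            counts["++"] += 1
        else:
            counts["--"] += 1
    return counts


counts = join_counts(CONNECTIVITY, LABELS)
print(counts)
```

Iterating over unordered pairs (rather than over the full matrix) is what avoids double counting each boundary.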

17 The Joint Count Statistic
The expected number of +- boundaries is calculated as:
$E[+-] = \frac{2JPM}{N(N-1)}$
where J is the total number of shared boundaries, P is the number of + polygons, M is the number of - polygons, and N is the total number of polygons.
For our example, E[+-] is calculated as:
$E[+-] = \frac{2JPM}{N(N-1)} = \frac{2 \cdot 7 \cdot 3 \cdot 2}{5(5-1)} = \frac{84}{20} = 4.2$
We will form a statistic by comparing the expected number of +- boundaries to the observed number of +- boundaries, which we obtain by simply counting the number of shared boundaries with this characteristic (being careful not to double count).
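
The expected count is a one-line formula; a small Python sketch, using the example's values J = 7, P = 3, M = 2 and N = 5, might look like this (the function name is hypothetical):

```python
def expected_unlike_joins(J, P, M, N):
    """Expected number of +- boundaries: 2JPM / (N(N - 1))."""
    return 2 * J * P * M / (N * (N - 1))


E = expected_unlike_joins(J=7, P=3, M=2, N=5)
print(round(E, 2))
```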

18 The Joint Count Statistic
For our example five region system, the observed number of shared +- boundaries is 5.
The last ingredient we need to be able to build a test statistic is an estimate of the variance, V[+-], and unfortunately, calculating this quantity requires a somewhat involved expression:
$V[+-] = E[+-] + \frac{\sum L_i(L_i - 1)\,PM}{N(N-1)} + \frac{4\left[J(J-1) - \sum L_i(L_i - 1)\right] P(P-1)M(M-1)}{N(N-1)(N-2)(N-3)} - E[+-]^2$
where $L_i$ is the total number of boundaries shared by region i.
In our example V[+-] = 0.56.
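
A sketch of the variance calculation follows, assuming the usual form of the expression: E plus two correction terms minus E squared. The per-region boundary counts L = [2, 3, 4, 3, 2] are an assumption derived from the neighbor pairs listed in the later Moran's I example; with them, this form reproduces the 0.56 quoted in the text.

```python
def variance_unlike_joins(J, P, M, N, L):
    """Variance of the +- boundary count; requires N >= 4."""
    E = 2 * J * P * M / (N * (N - 1))
    sum_l = sum(l * (l - 1) for l in L)  # sum of L_i (L_i - 1)
    term2 = sum_l * P * M / (N * (N - 1))
    term3 = (
        4 * (J * (J - 1) - sum_l) * P * (P - 1) * M * (M - 1)
        / (N * (N - 1) * (N - 2) * (N - 3))
    )
    return E + term2 + term3 - E ** 2


# Assumed boundaries per region for A, B, C, D, E (they sum to 2J = 14).
L = [2, 3, 4, 3, 2]
V = variance_unlike_joins(J=7, P=3, M=2, N=5, L=L)
print(round(V, 2))
```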

19 The Joint Count Statistic
We can now calculate a test statistic to compare the observed number of +- boundaries to the expected number of +- boundaries as a Z-statistic:
$Z_{test} = \frac{\text{Obs}[+-] - E[+-]}{\sqrt{V[+-]}}$
This test statistic is approximately normally distributed with mean 0 and variance 1, thus we can use the standard normal distribution to assess its significance.
An exceptional Z-statistic value would indicate a level of spatial autocorrelation that exceeds the expected amount for our system.

20 Z-test for the Joint Count Statistic Example
Research question: Is the areal pattern of + and - values randomly distributed amongst the polygons?
1. H_0: O[+-] ~ E[+-] (Areal pattern is random)
2. H_A: O[+-] ≠ E[+-] (Pattern is spatially autocorrelated)
3. Select α = 0.05, two-tailed because of H_0
4. We will calculate the test statistic using:
$Z_{test} = \frac{\text{Obs}[+-] - E[+-]}{\sqrt{V[+-]}} = \frac{5 - 4.2}{\sqrt{0.56}} = 1.07$

21 Z-test for the Joint Count Statistic Example
5. For α = 0.05 and a two-tailed test, Z_crit = 1.96
6. |Z_test| < Z_crit, therefore we do not reject H_0, finding that the areal pattern of + and - values in the polygons is not significantly different from a random areal pattern. There is no evidence of spatial autocorrelation in this system that exceeds what we would normally expect were the values of + and - simply assigned randomly to polygons.
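
The Z-test for this example can be sketched in a few lines of Python, using the observed count (5), expected count (4.2) and variance (0.56) from the worked example and the standard two-tailed critical value of 1.96 for α = 0.05. The function name is hypothetical.

```python
import math

Z_CRIT_TWO_TAILED_05 = 1.96  # standard normal, alpha = 0.05, two-tailed


def join_count_z(observed, expected, variance):
    """Z-statistic: (observed - expected) / sqrt(variance)."""
    return (observed - expected) / math.sqrt(variance)


z = join_count_z(observed=5, expected=4.2, variance=0.56)
significant = abs(z) > Z_CRIT_TWO_TAILED_05
print(round(z, 2), significant)
```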

22 Moran's I Statistic
While the joint count statistic does include spatial information (shared boundaries between polygons) in its assessment of autocorrelation, it does so for very limited sorts of attribute data.
We can use the joint count statistic only with binary nominal information, whereas in many situations we have measurements that are considerably more detailed (i.e. interval or ratio data).
We may want to assess spatial patterns of interval or ratio data in a fashion that allows us to take full advantage of the detail inherent in those sorts of measurements, checking to see if the pattern of those values exhibits spatial autocorrelation.

23 Moran's I Statistic
For this purpose we can make use of Moran's I statistic, which we can view as an expansion of the ideas implemented in the joint count statistic.
Moran's I statistic considers the spatial relationships between each pair of polygons in an areal data set, and encodes the relationships in a connectivity table, just as is done for the joint count statistic.
However, there is much greater flexibility in how neighborhood information is included in the Moran's I statistic:

24 Moran's I Statistic
The computation of Moran's I statistic includes a weight term, where the weights express the degree to which any two elements of the polygon coverage are considered to be spatially related or proximal:
- In the simplest case, two polygons that share a boundary have a weight of 1, and polygons that do not share a boundary have a weight of 0 (binary connectivity case)
- However, we can imagine all sorts of other schemes: we might weight by the length of boundary that is shared, as a function of the distance between the polygons, or using an expression that indicates how many neighbors apart they are (i.e. 1st order neighbors are adjacent, 2nd order neighbors are separated by one other polygon, etc.)

25 Moran's I Statistic
Thus, for each and every pair of polygons in the system, a weight expresses the degree to which they are spatially related (close to each other, connected, etc.).
This weight term is multiplied by an expression that compares the attribute values of each and every pair of polygons. We calculate the mean and standard deviation for the whole data set, convert each polygon's value to a z-score, and multiply the z-scores of the two polygons in each pair:
$I = \frac{n \sum_i \sum_j w_{ij} z_i z_j}{(n-1) \sum_i \sum_j w_{ij}}$
where n is the number of polygons, $w_{ij}$ is the weight for the combination of the polygon in column i and the polygon in row j of the connectivity matrix, and $z_i$ and $z_j$ are z-scores.

26 Moran's I Statistic
Moran's I statistic is a normalized statistic that can be interpreted much like a correlation coefficient:
- Values near +1 indicate a very strong spatial pattern, in which nearby values are similar.
- Values near -1 indicate strong negative spatial autocorrelation. These are extremely rare in real data: we can certainly produce simulated patterns that exhibit strong negative autocorrelation, but finding such things in nature is all but unheard of, which is more or less what Tobler's Law predicts.
- Values around 0 indicate an absence of spatial pattern, showing neither organization where nearby values are similar, nor the ultra-rare opposite of that condition.

27 Moran's I Statistic
The value of a Moran's I statistic depends strongly on the particular weighting method used. Given the same data, one can produce Moran's I values of varying magnitude depending on how the spatial relationships between pairs of polygons are encoded, despite the fact that the underlying data and pattern are the same.
This shows how strongly the conceptual choice of how to describe spatial relationships influences the results.
For conceptual ease, we will use the same definition we used in the joint count example: if two polygons share a boundary, they will be assigned a weight of 1 in the binary connectivity table; otherwise they will be given a value of 0, indicating that the comparison of their values has no impact on the statistic because they are not adjacent.
David Tenenbaum GEOG 090 UNC-CH Spring 2005

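28 Moran's I Statistic Example
[Figure: map of the five regions A, B, C, D and E]
Binary connectivity matrix W = {w_ij} (i columns, j rows), with entries as implied by the neighbor pairs used in the calculation that follows:

      A  B  C  D  E
  A   0  1  1  0  0
  B   1  0  1  1  0
  C   1  1  0  1  1
  D   0  1  1  0  1
  E   0  0  1  1  0

Polygon z-scores (the raw values and the standard deviation were not preserved; the mean is 14):

  Polygon   Z-Score
  A          1.33
  B         -0.88
  C          0.22
  D          0.44
  E         -1.11

$I = \frac{n \sum_i \sum_j w_{ij} z_i z_j}{(n-1) \sum_i \sum_j w_{ij}}$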

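29 Moran's I Statistic Example
To calculate the statistic, substitute the appropriate values into the equation:
$I = \frac{n \sum_i \sum_j w_{ij} z_i z_j}{(n-1) \sum_i \sum_j w_{ij}} = \frac{5 \sum_i \sum_j w_{ij} z_i z_j}{(5-1) \cdot 14}$
Each shared boundary appears twice in the double sum (once as $w_{ij}$ and once as $w_{ji}$), so:
$\sum_i \sum_j w_{ij} z_i z_j = 2\,[(1.33)(-0.88) + (1.33)(0.22) + (-0.88)(0.22) + (-0.88)(0.44) + (0.22)(0.44) + (0.22)(-1.11) + (0.44)(-1.11)] = 2(-2.09) = -4.19$
$I = \frac{5(-4.19)}{(5-1) \cdot 14} \approx -0.37$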
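
The substitution can be checked numerically with a short Python sketch. The z-scores are the ones that appear in the sum above. The connectivity matrix is an assumption inferred from the seven neighbor pairs in that sum (A-B, A-C, B-C, B-D, C-D, C-E, D-E), and the function name `morans_i` is hypothetical.

```python
def morans_i(weights, z):
    """Moran's I from a weight matrix and z-scores:
    n * sum_ij(w_ij * z_i * z_j) / ((n - 1) * sum_ij(w_ij))."""
    n = len(z)
    cross = sum(
        weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n)
    )
    total_weight = sum(sum(row) for row in weights)
    return n * cross / ((n - 1) * total_weight)


# Assumed binary connectivity for regions A..E (1 = shared boundary).
W = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0],
]
# z-scores for A, B, C, D, E as they appear in the worked sum.
Z = [1.33, -0.88, 0.22, 0.44, -1.11]

I = morans_i(W, Z)
print(round(I, 2))
```

Summing over the full matrix, rather than over unordered pairs, counts each boundary twice, which is the factor of 2 in the hand calculation.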


More information

Watershed Sciences 4930 & 6920 GEOGRAPHIC INFORMATION SYSTEMS

Watershed Sciences 4930 & 6920 GEOGRAPHIC INFORMATION SYSTEMS Watershed Sciences 4930 & 6920 GEOGRAPHIC INFORMATION SYSTEMS WATS 4930/6920 WHERE WE RE GOING WATS 6915 welcome to tag along for any, all or none WEEK FIVE Lecture VECTOR ANALYSES Joe Wheaton HOUSEKEEPING

More information

STAT 113: Lab 9. Colin Reimer Dawson. Last revised November 10, 2015

STAT 113: Lab 9. Colin Reimer Dawson. Last revised November 10, 2015 STAT 113: Lab 9 Colin Reimer Dawson Last revised November 10, 2015 We will do some of the following together. The exercises with a (*) should be done and turned in as part of HW9. Before we start, let

More information

1. What specialist uses information obtained from bones to help police solve crimes?

1. What specialist uses information obtained from bones to help police solve crimes? Mathematics: Modeling Our World Unit 4: PREDICTION HANDOUT VIDEO VIEWING GUIDE H4.1 1. What specialist uses information obtained from bones to help police solve crimes? 2.What are some things that can

More information

8 th Grade Pre Algebra Pacing Guide 1 st Nine Weeks

8 th Grade Pre Algebra Pacing Guide 1 st Nine Weeks 8 th Grade Pre Algebra Pacing Guide 1 st Nine Weeks MS Objective CCSS Standard I Can Statements Included in MS Framework + Included in Phase 1 infusion Included in Phase 2 infusion 1a. Define, classify,

More information

Applied Multivariate Analysis

Applied Multivariate Analysis Department of Mathematics and Statistics, University of Vaasa, Finland Spring 2017 Choosing Statistical Method 1 Choice an appropriate method 2 Cross-tabulation More advance analysis of frequency tables

More information

Interactive Math Glossary Terms and Definitions

Interactive Math Glossary Terms and Definitions Terms and Definitions Absolute Value the magnitude of a number, or the distance from 0 on a real number line Addend any number or quantity being added addend + addend = sum Additive Property of Area the

More information

ASSIGNMENT 6 Final_Tracts.shp Phil_Housing.mat lnmv %Vac %NW Final_Tracts.shp Philadelphia Housing Phil_Housing_ Using Matlab Eire

ASSIGNMENT 6 Final_Tracts.shp Phil_Housing.mat lnmv %Vac %NW Final_Tracts.shp Philadelphia Housing Phil_Housing_ Using Matlab Eire ESE 502 Tony E. Smith ASSIGNMENT 6 This final study is a continuation of the analysis in Assignment 5, and will use the results of that analysis. It is assumed that you have constructed the shapefile,

More information

A toolbox for analyzing the effect of infra-structural facilities on the distribution of activity points

A toolbox for analyzing the effect of infra-structural facilities on the distribution of activity points A toolbox for analyzing the effect of infra-structural facilities on the distribution of activity points Railway Station : Infra-structural Facilities Tohru YOSHIKAWA Department of Architecture Tokyo Metropolitan

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 3. Chapter 3: Data Preprocessing. Major Tasks in Data Preprocessing

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 3. Chapter 3: Data Preprocessing. Major Tasks in Data Preprocessing Data Mining: Concepts and Techniques (3 rd ed.) Chapter 3 1 Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major Tasks in Data Preprocessing Data Cleaning Data Integration Data

More information

Vocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable.

Vocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable. 5-number summary 68-95-99.7 Rule Area principle Bar chart Bimodal Boxplot Case Categorical data Categorical variable Center Changing center and spread Conditional distribution Context Contingency table

More information

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)

More information

Exploratory model analysis

Exploratory model analysis Exploratory model analysis with R and GGobi Hadley Wickham 6--8 Introduction Why do we build models? There are two basic reasons: explanation or prediction [Ripley, 4]. Using large ensembles of models

More information

1. Estimation equations for strip transect sampling, using notation consistent with that used to

1. Estimation equations for strip transect sampling, using notation consistent with that used to Web-based Supplementary Materials for Line Transect Methods for Plant Surveys by S.T. Buckland, D.L. Borchers, A. Johnston, P.A. Henrys and T.A. Marques Web Appendix A. Introduction In this on-line appendix,

More information

Getting to Know Your Data

Getting to Know Your Data Chapter 2 Getting to Know Your Data 2.1 Exercises 1. Give three additional commonly used statistical measures (i.e., not illustrated in this chapter) for the characterization of data dispersion, and discuss

More information

CS229 Lecture notes. Raphael John Lamarre Townshend

CS229 Lecture notes. Raphael John Lamarre Townshend CS229 Lecture notes Raphael John Lamarre Townshend Decision Trees We now turn our attention to decision trees, a simple yet flexible class of algorithms. We will first consider the non-linear, region-based

More information

Two-Stage Least Squares

Two-Stage Least Squares Chapter 316 Two-Stage Least Squares Introduction This procedure calculates the two-stage least squares (2SLS) estimate. This method is used fit models that include instrumental variables. 2SLS includes

More information

SPSS QM II. SPSS Manual Quantitative methods II (7.5hp) SHORT INSTRUCTIONS BE CAREFUL

SPSS QM II. SPSS Manual Quantitative methods II (7.5hp) SHORT INSTRUCTIONS BE CAREFUL SPSS QM II SHORT INSTRUCTIONS This presentation contains only relatively short instructions on how to perform some statistical analyses in SPSS. Details around a certain function/analysis method not covered

More information

Multivariate Capability Analysis

Multivariate Capability Analysis Multivariate Capability Analysis Summary... 1 Data Input... 3 Analysis Summary... 4 Capability Plot... 5 Capability Indices... 6 Capability Ellipse... 7 Correlation Matrix... 8 Tests for Normality... 8

More information

Nonparametric Testing

Nonparametric Testing Nonparametric Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

More information

WELCOME! Lecture 3 Thommy Perlinger

WELCOME! Lecture 3 Thommy Perlinger Quantitative Methods II WELCOME! Lecture 3 Thommy Perlinger Program Lecture 3 Cleaning and transforming data Graphical examination of the data Missing Values Graphical examination of the data It is important

More information

TELCOM2125: Network Science and Analysis

TELCOM2125: Network Science and Analysis School of Information Sciences University of Pittsburgh TELCOM2125: Network Science and Analysis Konstantinos Pelechrinis Spring 2015 2 Part 4: Dividing Networks into Clusters The problem l Graph partitioning

More information

- 1 - Fig. A5.1 Missing value analysis dialog box

- 1 - Fig. A5.1 Missing value analysis dialog box WEB APPENDIX Sarstedt, M. & Mooi, E. (2019). A concise guide to market research. The process, data, and methods using SPSS (3 rd ed.). Heidelberg: Springer. Missing Value Analysis and Multiple Imputation

More information

Example 1 of panel data : Data for 6 airlines (groups) over 15 years (time periods) Example 1

Example 1 of panel data : Data for 6 airlines (groups) over 15 years (time periods) Example 1 Panel data set Consists of n entities or subjects (e.g., firms and states), each of which includes T observations measured at 1 through t time period. total number of observations : nt Panel data have

More information

Product Catalog. AcaStat. Software

Product Catalog. AcaStat. Software Product Catalog AcaStat Software AcaStat AcaStat is an inexpensive and easy-to-use data analysis tool. Easily create data files or import data from spreadsheets or delimited text files. Run crosstabulations,

More information

Linear Methods for Regression and Shrinkage Methods

Linear Methods for Regression and Shrinkage Methods Linear Methods for Regression and Shrinkage Methods Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Linear Regression Models Least Squares Input vectors

More information

Spatial Statistics With R: Getting Started

Spatial Statistics With R: Getting Started Spatial Statistics With R: Getting Started Introduction In the last practical, you saw how to handle geographical data in R, and how to carry out some basic, and more advanced statistical analysis on the

More information

SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS

SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS Cognitive Robotics Original: David G. Lowe, 004 Summary: Coen van Leeuwen, s1460919 Abstract: This article presents a method to extract

More information

The basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student

The basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student Organizing data Learning Outcome 1. make an array 2. divide the array into class intervals 3. describe the characteristics of a table 4. construct a frequency distribution table 5. constructing a composite

More information

SPSS TRAINING SPSS VIEWS

SPSS TRAINING SPSS VIEWS SPSS TRAINING SPSS VIEWS Dataset Data file Data View o Full data set, structured same as excel (variable = column name, row = record) Variable View o Provides details for each variable (column in Data

More information

Modelling Proportions and Count Data

Modelling Proportions and Count Data Modelling Proportions and Count Data Rick White May 4, 2016 Outline Analysis of Count Data Binary Data Analysis Categorical Data Analysis Generalized Linear Models Questions Types of Data Continuous data:

More information

ECLT 5810 Data Preprocessing. Prof. Wai Lam

ECLT 5810 Data Preprocessing. Prof. Wai Lam ECLT 5810 Data Preprocessing Prof. Wai Lam Why Data Preprocessing? Data in the real world is imperfect incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate

More information

Influence Maximization in Location-Based Social Networks Ivan Suarez, Sudarshan Seshadri, Patrick Cho CS224W Final Project Report

Influence Maximization in Location-Based Social Networks Ivan Suarez, Sudarshan Seshadri, Patrick Cho CS224W Final Project Report Influence Maximization in Location-Based Social Networks Ivan Suarez, Sudarshan Seshadri, Patrick Cho CS224W Final Project Report Abstract The goal of influence maximization has led to research into different

More information

1 Homophily and assortative mixing

1 Homophily and assortative mixing 1 Homophily and assortative mixing Networks, and particularly social networks, often exhibit a property called homophily or assortative mixing, which simply means that the attributes of vertices correlate

More information

Neighbourhood Operations Specific Theory

Neighbourhood Operations Specific Theory Neighbourhood Operations Specific Theory Neighbourhood operations are a method of analysing data in a GIS environment. They are especially important when a situation requires the analysis of relationships

More information

Frequency Distributions

Frequency Distributions Displaying Data Frequency Distributions After collecting data, the first task for a researcher is to organize and summarize the data so that it is possible to get a general overview of the results. Remember,

More information

SPSS INSTRUCTION CHAPTER 9

SPSS INSTRUCTION CHAPTER 9 SPSS INSTRUCTION CHAPTER 9 Chapter 9 does no more than introduce the repeated-measures ANOVA, the MANOVA, and the ANCOVA, and discriminant analysis. But, you can likely envision how complicated it can

More information

Genotype x Environmental Analysis with R for Windows

Genotype x Environmental Analysis with R for Windows Genotype x Environmental Analysis with R for Windows Biometrics and Statistics Unit Angela Pacheco CIMMYT,Int. 23-24 Junio 2015 About GEI In agricultural experimentation, a large number of genotypes are

More information

Box-Cox Transformation for Simple Linear Regression

Box-Cox Transformation for Simple Linear Regression Chapter 192 Box-Cox Transformation for Simple Linear Regression Introduction This procedure finds the appropriate Box-Cox power transformation (1964) for a dataset containing a pair of variables that are

More information

Predict Outcomes and Reveal Relationships in Categorical Data

Predict Outcomes and Reveal Relationships in Categorical Data PASW Categories 18 Specifications Predict Outcomes and Reveal Relationships in Categorical Data Unleash the full potential of your data through predictive analysis, statistical learning, perceptual mapping,

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2012 http://ce.sharif.edu/courses/90-91/2/ce725-1/ Agenda Features and Patterns The Curse of Size and

More information

Serial Correlation and Heteroscedasticity in Time series Regressions. Econometric (EC3090) - Week 11 Agustín Bénétrix

Serial Correlation and Heteroscedasticity in Time series Regressions. Econometric (EC3090) - Week 11 Agustín Bénétrix Serial Correlation and Heteroscedasticity in Time series Regressions Econometric (EC3090) - Week 11 Agustín Bénétrix 1 Properties of OLS with serially correlated errors OLS still unbiased and consistent

More information

Minitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D.

Minitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D. Minitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D. Introduction to Minitab The interface for Minitab is very user-friendly, with a spreadsheet orientation. When you first launch Minitab, you will see

More information

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables Further Maths Notes Common Mistakes Read the bold words in the exam! Always check data entry Remember to interpret data with the multipliers specified (e.g. in thousands) Write equations in terms of variables

More information

Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates?

Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates? Model Evaluation Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates? Methods for Model Comparison How to

More information

Modelling Proportions and Count Data

Modelling Proportions and Count Data Modelling Proportions and Count Data Rick White May 5, 2015 Outline Analysis of Count Data Binary Data Analysis Categorical Data Analysis Generalized Linear Models Questions Types of Data Continuous data:

More information

Frequency Tables. Chapter 500. Introduction. Frequency Tables. Types of Categorical Variables. Data Structure. Missing Values

Frequency Tables. Chapter 500. Introduction. Frequency Tables. Types of Categorical Variables. Data Structure. Missing Values Chapter 500 Introduction This procedure produces tables of frequency counts and percentages for categorical and continuous variables. This procedure serves as a summary reporting tool and is often used

More information

Differentiation of Cognitive Abilities across the Lifespan. Online Supplement. Elliot M. Tucker-Drob

Differentiation of Cognitive Abilities across the Lifespan. Online Supplement. Elliot M. Tucker-Drob 1 Differentiation of Cognitive Abilities across the Lifespan Online Supplement Elliot M. Tucker-Drob This online supplement reports the results of an alternative set of analyses performed on a single sample

More information

Erdős-Rényi Model for network formation

Erdős-Rényi Model for network formation Network Science: Erdős-Rényi Model for network formation Ozalp Babaoglu Dipartimento di Informatica Scienza e Ingegneria Università di Bologna www.cs.unibo.it/babaoglu/ Why model? Simpler representation

More information

Grade 7 Mathematics Performance Level Descriptors

Grade 7 Mathematics Performance Level Descriptors Limited A student performing at the Limited Level demonstrates a minimal command of Ohio s Learning Standards for Grade 7 Mathematics. A student at this level has an emerging ability to work with expressions

More information

2003/2010 ACOS MATHEMATICS CONTENT CORRELATION GRADE ACOS 2010 ACOS

2003/2010 ACOS MATHEMATICS CONTENT CORRELATION GRADE ACOS 2010 ACOS CURRENT ALABAMA CONTENT PLACEMENT 5.1 Demonstrate number sense by comparing, ordering, rounding, and expanding whole numbers through millions and decimals to thousandths. 5.1.B.1 2003/2010 ACOS MATHEMATICS

More information