Spatial Statistics With R: Getting Started
- Rosa Rosanna Howard
Introduction

In the last practical, you saw how to handle geographical data in R, and how to carry out some basic and some more advanced statistical analysis on the data. However, even the more advanced Poisson modelling did not take into consideration any spatial dependencies in the data. The breach of peace counts in the census blocks were modelled as independent Poisson counts, and the count in each block was considered only in terms of other properties of that block, ignoring anything happening in surrounding blocks. There is, however, a large area of statistical analysis devoted to processes in which events in nearby areas are related. In this practical you will learn how to use R libraries devoted to this kind of analysis, in particular spdep.

The spdep Package

The name of this package is a shortened form of 'spatial dependencies'. It contains a number of statistical routines for testing for spatial dependencies in random variables, as well as other routines that allow for such dependencies when fitting models. To begin this practical, start up R by opening your working folder and clicking on the pract.rdata file, and then load the packages GISTools and spdep. Enter

    library(GISTools)
    library(spdep)

to load these, and then enter

    data(newhaven)

which will make the New Haven data visible again. Note that when you loaded the spdep package the printout showed that a number of 'helper' libraries were also loaded. You will see something like:

    Loading required package: tripack
    Loading required package: sp
    Loading required package: maptools
    Loading required package: foreign
    Loading required package: boot
    Loading required package: spam

Topology is the term usually used to describe the spatial arrangement of the geographical items within a spatial data set. In particular, for a polygon data set the topology is a list of polygons that touch one another.
Here, touching can mean the sharing of a common edge, and in some cases it can also mean the sharing of a single common point (for example, when two census block areas are joined only at their corners). spdep has a function to extract the topology information from a polygon object, called poly2nb. The nb here stands for neighbours, since the result is basically a list of which polygons neighbour which others. Enter

    blocks.nb = poly2nb(blocks)

to store this information in a variable called blocks.nb. It is possible to plot this information as a kind of network. The nodes of the network are the so-called label points for the polygon file. Each polygon has a label point: a point somewhere inside the polygon where any text used to label the polygon may be placed. Label points are useful as node points on a network representing polygon neighbours. To extract the label points, as a point object, enter

    blocks.labs = poly.labels(blocks)

and then it is possible to plot the neighbour information. Here, this is done on a backdrop of the census block polygons:

    plot(blocks,col='grey')
    plot(blocks.nb,coordinates(blocks.labs),col='red',add=TRUE)

The default for poly2nb is to define neighbours as having points, as well as edges, in common. This is sometimes called queen's case topology, because connection at edges and corners corresponds to the legal moves of the queen in chess. It is also possible to extract rook's case topology, where only common edges define neighbours. This corresponds to the legal moves of the rook in chess. To extract rook's case topology, add the argument queen=FALSE to the poly2nb function:

    blocks.nb = poly2nb(blocks,queen=FALSE)
    plot(blocks,col='grey')
    plot(blocks.nb,coordinates(blocks.labs),col='red',add=TRUE)

This repeats the network map from before, but now only polygons with common edges are connected. In this case, as few polygon pairs are connected only at their corners, the result is fairly similar.

An alternative definition of topology (based on nearness of polygons rather than contiguity) is to define two polygons as neighbours if their label points are within some distance d of one another. R can define these kinds of neighbours using the dnearneigh function.
For example, to define census blocks as being neighbours if their label points are within 1.2 miles of one another, enter the following:

    blocks.nb2 = dnearneigh(poly.labels(blocks),0,miles2ft(1.2))

It is then possible to plot the neighbour network under this definition:

    plot(blocks,col='grey')
    plot(blocks.nb2,coordinates(blocks.labs),col='red',add=TRUE)

Note that this demonstrates that, under different definitions of neighbour, quite different patterns of network can occur.

Computing and Testing Moran's I

Having defined contiguity for this census block example, it is now possible to investigate the degree of spatial dependency in the attribute data. A typical way of doing this is to compute the Moran's I coefficient. Moran's I is defined as

$$I = \frac{N}{\sum_i \sum_j w_{ij}} \cdot \frac{\sum_i \sum_j w_{ij} (X_i - \bar{X})(X_j - \bar{X})}{\sum_i (X_i - \bar{X})^2}$$

where:

    $X_i$ is the attribute attached to polygon i
    $N$ is the number of polygons
    $w_{ij}$ indicates whether polygons i and j are neighbours
    $\bar{X}$ is the average polygon attribute value

The formula may seem complex, but essentially it measures the degree to which similar-valued attributes occur near to each other. If above-average attribute values tend to be near other above-average values, this gives a positive value of Moran's I. If, on the other hand, above-average values tend to occur near to below-average values, in a checker-board pattern, this gives a negative Moran's I. Moran's I is typically between -1 and 1, and in some ways is similar to a correlation coefficient. A value of zero suggests no spatial dependency. It is sometimes referred to as a measure of autocorrelation, as it measures the variable X's correlation with itself, in a geographical sense. To illustrate this, choropleth maps corresponding to four values of Moran's I are given below:

[Figure: choropleth maps corresponding to four values of Moran's I]
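
To make the formula concrete, here is a minimal sketch in base R that evaluates it directly on a toy example. Everything in it is an assumption made for illustration: the 4 x 4 grid, the rook's case weight matrix w, the helper moran.i and the two test patterns are not part of the New Haven data or of spdep.

```r
# Hypothetical toy data: a 4 x 4 grid of cells, numbered row by row.
n.side <- 4
N <- n.side^2
rows <- rep(1:n.side, each = n.side)
cols <- rep(1:n.side, times = n.side)

# Rook's case weights: w[i, j] = 1 when cells i and j share an edge.
w <- matrix(0, N, N)
for (i in 1:N) {
  for (j in 1:N) {
    if (abs(rows[i] - rows[j]) + abs(cols[i] - cols[j]) == 1) w[i, j] <- 1
  }
}

# Moran's I, term by term as in the formula above (illustrative helper).
moran.i <- function(x, w) {
  dev <- x - mean(x)                        # X_i - X-bar
  (length(x) / sum(w)) * sum(w * outer(dev, dev)) / sum(dev^2)
}

x.clustered <- as.numeric(rows <= 2)          # high values in the top half
x.checker   <- as.numeric((rows + cols) %% 2) # checker-board pattern

i.clustered <- moran.i(x.clustered, w)
i.checker   <- moran.i(x.checker, w)
print(c(clustered = i.clustered, checker = i.checker))
```

On this grid the clustered pattern gives I = 2/3 and the checker-board pattern gives I = -1, matching the interpretation given above. Note that the sketch uses plain 0/1 weights, whereas nb2listw row-standardises the weights by default, so values computed by spdep need not match a hand calculation of this kind exactly.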
R can compute Moran's I. To do this, it needs to convert a neighbourhood list to a w-list. This is really just another way of storing the polygon adjacency data. The conversion is done with the nb2listw function. Enter

    blocks.lw = nb2listw(blocks.nb)

which stores the w-list in blocks.lw. Having done this, it is possible to investigate the spatial dependency of some of the New Haven data. To test whether the percentage of vacant properties, the variable P_VACANT, exhibits spatial dependency, we first attach the data frame from the blocks object:

    attach(data.frame(blocks))

To compute the Moran's I statistic, now enter:

    moran.test(P_VACANT,blocks.lw)

which produces the following output:

    Moran's I test under randomisation

    data: P_VACANT
    weights: blocks.lw

    Moran I statistic standard deviate = [...], p-value = [...]
    alternative hypothesis: greater
    sample estimates:
    Moran I statistic    Expectation    Variance
                [...]          [...]       [...]

This needs some explanation. The first number of the last line printed gives the Moran's I statistic itself - about [...]. The other information relates to a statistical test of whether Moran's I is equal to zero. If this is the case, then the theoretical values for the expected value of Moran's I and its sample variance are given by the following formulae:

$$E(I) = \frac{-1}{N - 1}$$

$$Var(I) = \frac{ND - 6EC^2}{(N + 1)(N - 1)C^2}$$

where:

$$A = \frac{1}{2} \sum_i \sum_j (w_{ij} + w_{ji})^2, \quad i \neq j$$

$$B = \sum_k \Big( \sum_j w_{jk} + \sum_i w_{ik} \Big)^2$$

$$C = \sum_i \sum_j w_{ij}, \quad i \neq j$$

$$D = (N^2 - 3N + 3)A - NB + 3C^2$$

$$E = \frac{\sum_i (X_i - \bar{X})^4 / N}{\big( \sum_i (X_i - \bar{X})^2 / N \big)^2}$$

These (very) complicated formulae can be used to create a test statistic

$$z = \frac{I - E(I)}{\{Var(I)\}^{1/2}}$$

which is approximately Normally distributed, and from which a p-value can be obtained. The last line of the printout from moran.test tells you that the value of E(I) for P_VACANT is about [...] (labelled 'Expectation') and that Var(I) is about [...] (labelled 'Variance'). These can be used to compute z in the formula above, which is then used to test the hypothesis that I=0. In the printout from moran.test this is labelled 'Moran I statistic standard deviate' and takes a value of around [...]. Finally, the p-value for the statistic is computed, and shown in the printout to be about [...]. Recall that the p-value is the probability of obtaining a value at least as extreme as the one observed from the data, given that the null hypothesis is true. Thus, the lower the value, the more evidence against the null hypothesis. Here the smallness of the p-value suggests strong evidence against the null hypothesis, i.e. we should reject the hypothesis that I=0, implying that some degree of spatial dependency is present.

We can now do the same test in terms of the density of breach of peace events. Firstly, compute the density values in events per square mile:

    density = poly.counts(breach,blocks)/
      ft2miles(ft2miles(poly.areas(blocks)))

and then carry out the Moran's I test:

    moran.test(density,blocks.lw)

This gives a printout similar to that before. In this case the Moran's I statistic is [...]. As a self-test, you should be able to find the p-value for this and decide whether Moran's I differs significantly from zero.

Simulation-Based Tests

The basis for the significance tests in the last section was to compute the expected value and variance of the Moran's I statistic under the assumption that there is no spatial dependency in the attribute X.
Here, this is done by assuming that if there were no spatial dependency, then any of the observed X-values could have occurred with equal chance at any of the polygons. In other words, any permutation of the attribute values among the polygons is equally likely. The formulae for E(I) and Var(I) were theoretically derived given
this assumption. However, the assumption that Moran's I is normally distributed in this case is only approximate. In times when computers were a lot slower than they are now, this approach was probably the most appropriate, but now there is an alternative. This is simply to permute the attributes randomly amongst the polygons a large number of times, and note the value of Moran's I each time. By comparing the actual Moran's I against these, we can see how extreme the true value is compared to those generated under the assumption that any permutation is equally likely. If there are n simulations, and m of these have a larger value than the true Moran's I, then the experimental p-value is m/(n+1).

The theoretical approach of the previous section is relatively easy to compute (although seven formulae may seem complex to a human, they can be calculated in a fraction of a second by a computer) but it is only approximate. The simulation approach outlined here, also called the Monte-Carlo approach, requires more computer time (usually n should be around 10,000) but the simulations are of the true model. R can carry out simulation-based tests with the moran.mc function:

    moran.mc(P_VACANT,blocks.lw,nsim=10000)

The extra argument nsim tells the function how many simulations to carry out, that is, the number n mentioned above. The result will be something like:

    Monte-Carlo simulation of Moran's I

    data: P_VACANT
    weights: blocks.lw
    number of simulations + 1: [...]

    statistic = [...], observed rank = 9909, p-value = [...]
    alternative hypothesis: greater

Note that the p-value here, although slightly different from that obtained from moran.test, still suggests that the hypothesis of zero Moran's I should be rejected. Also note that when you run moran.mc you may well obtain slightly different results, as this approach is based on random simulation, and so no two runs of the function will have identical outcomes. As another self-test, try running moran.mc on the density variable.
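
The permutation idea behind this kind of test can be sketched in a few lines of base R. This is an illustration under stated assumptions rather than a description of how moran.mc is implemented: the 4 x 4 grid, the weight matrix w, the helper moran.i and the attribute values x are all made up for the example.

```r
set.seed(1)  # only so that the sketch is repeatable

# Hypothetical 4 x 4 grid with rook's case 0/1 weights.
n.side <- 4
rows <- rep(1:n.side, each = n.side)
cols <- rep(1:n.side, times = n.side)
w <- 1 * (outer(rows, rows, function(a, b) abs(a - b)) +
          outer(cols, cols, function(a, b) abs(a - b)) == 1)

# Moran's I computed directly from its definition (illustrative helper).
moran.i <- function(x, w) {
  dev <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(dev, dev)) / sum(dev^2)
}

# Made-up attribute values with no deliberate spatial pattern.
x <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3)
i.obs <- moran.i(x, w)

# Permute the attribute values amongst the cells nsim times,
# recording Moran's I for each permutation.
nsim <- 999
i.sim <- replicate(nsim, moran.i(sample(x), w))

# m simulations exceed the observed value; p-value as described above.
m <- sum(i.sim > i.obs)
p.value <- m / (nsim + 1)
print(c(observed = i.obs, m = m, p.value = p.value))
```

Because x here has no deliberate pattern, the resulting p-value should be unremarkable. Replacing x with a strongly clustered pattern such as as.numeric(rows <= 2) pushes m to zero, since no rearrangement of those values on this grid is more clustered than the original.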
Regression Models with Spatial Autoregression

In this section the idea of spatial dependency is taken a step further, by considering its effect when calibrating regression models. A standard bivariate regression model has the form

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$$

where the Y variable is to be predicted by the X variable. The beta values $\{\beta_0, \beta_1\}$ are the regression coefficients (intercept and slope respectively) and the final epsilon term $\{\epsilon_i\}$ is an error term. In a standard model it is assumed that the errors are normally distributed, with a mean of zero. It is also assumed that all errors have the same standard deviation, and that they are independent. However, in many geographical situations, the last assumption is dubious. The error term in a model is essentially related to factors influencing the Y
variable that are not reflected in the predictor variable X. If such factors relate to a geographical phenomenon, it is possible that their effects might spill over, so that error terms in adjacent regions will depend on one another. In this case, the model above will be inappropriate, and models allowing for dependency in the epsilons should be considered instead.

To consider this kind of model, we will look at two new New Haven crime variables related to residential burglaries. These are both point objects, called burgres.f and burgres.n. burgres.f is a list of burglaries occurring between 1st August 2007 and 31st January 2008 where entry was forced into the property, and burgres.n is a list of burglaries from the same time period where no entry was forced. Non-forced entry suggests that the property was left insecure, perhaps by leaving a door or window open.

One interesting question is whether both kinds of residential burglary occur in the same places, that is, if a place is a high risk area for non-forced entry, does it imply that it is also a high risk area for forced entry? To investigate this, we will use a bivariate regression model that attempts to predict the rate of forced burglaries from the rate of non-forced ones. The indicators needed for this are the rates of burglary given the number of properties at risk. Here we use the variable OCCUPIED from the data frame in the census blocks object to estimate the number of properties at risk. If we were to compute rates per 1,000 households, this would be

    1000*(number of burglaries in block)/OCCUPIED

and since this is over a six-month period, doubling this quantity gives the number of burglaries per 1,000 households per year. However, typing in OCCUPIED shows that some blocks have no occupied housing, so the above quantity is not defined for them.
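
The arithmetic, and the way it fails for blocks with no occupied housing, can be seen on a few made-up numbers. The vectors burglaries and occupied below are hypothetical values for five imaginary blocks, not New Haven data.

```r
# Made-up numbers for five hypothetical blocks (not the New Haven data).
burglaries <- c(3, 0, 1, 2, 0)        # burglaries recorded in six months
occupied   <- c(150, 80, 0, 400, 0)   # occupied dwellings in each block

# Six-month rate per 1,000 households, doubled to give a yearly rate.
yearly.rate <- 2 * 1000 * burglaries / occupied
print(yearly.rate)

# Blocks 3 and 5 have no occupied dwellings: dividing by zero gives
# Inf (1/0) and NaN (0/0), so the rate is not usable for those blocks.
usable <- is.finite(yearly.rate)
print(usable)
```

Blocks 1, 2 and 4 give yearly rates of 40, 0 and 10 burglaries per 1,000 households, while blocks 3 and 5 give Inf and NaN.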
To overcome this problem we select a subset of the blocks object consisting only of blocks with more than zero occupied dwellings. For polygon spatial objects, each individual polygon can be treated like a row in a data frame for purposes of subset selection. Thus, to select only the blocks where the variable OCCUPIED is greater than zero, enter

    blocks2 = blocks[OCCUPIED > 0,]

to store the subset of the census block data in the object blocks2. We can now compute the burglary rates for forced and non-forced entries by first counting the burglaries in each block in blocks2 (with the poly.counts function), dividing these numbers by the OCCUPIED counts and then multiplying by 2,000 (to get yearly rates per 1,000 households). However, before we do this, remember that we need the OCCUPIED column from blocks2 and not blocks, but at the moment the one from blocks is attached. To sort this out, firstly detach the data frame associated with blocks and then attach the one associated with blocks2:

    detach(data.frame(blocks))
    attach(data.frame(blocks2))

Now the two rate variables can be calculated:

    forced.rate = 2000*poly.counts(burgres.f,blocks2)/OCCUPIED
    notforced.rate = 2000*poly.counts(burgres.n,blocks2)/OCCUPIED

so we now have the two rates stored in forced.rate and notforced.rate. A first attempt at modelling the relationship between the two rates could be via simple bivariate regression, ignoring any spatial dependencies in the error term. This is done using the lm function, which creates a simple regression model object:

    model1 = lm(forced.rate~notforced.rate)

This stores the basic model in model1. To see the regression coefficients, enter

    summary(model1)

which produces the following output:

    Call:
    lm(formula = forced.rate ~ notforced.rate)

    Residuals:
    Min 1Q Median 3Q Max
    [...]

    Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
    (Intercept)    [...] e-10 ***
    notforced.rate [...] *
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: [...] on 125 degrees of freedom
    Multiple R-squared: [...], Adjusted R-squared: [...]
    F-statistic: [...] on 1 and 125 DF, p-value: [...]

The key things to note here are that the forced rate is related to the not-forced rate by the formula

    expected(forced rate) = [...] + [...]*(not forced rate)

and that the coefficient for the not-forced rate is statistically different from zero, so there is evidence that the two rates are related. One possible explanation is that if a burglar is active in an area, they will only use force to enter dwellings when it is necessary, making use of an insecure window or door if they spot the opportunity. Thus in areas where burglars are active, both kinds of burglary could potentially occur, whereas in areas where they are less active it is less likely for either kind of burglary to occur.

However, this regression model could possibly be improved if, instead of assuming that the error terms are independent, we assume a spatial dependency. This can be done in a number of ways, but the approach we will use here is the spatially autocorrelated regression (SAR) model:

$$y_i = \rho \sum_{j \neq i} w_{ij} y_j + \beta_0 + \beta_1 x_i + \epsilon_i$$

The difference between this and the standard model is the first term on the right-hand side. Here, $w_{ij}$ is equal to 1 if polygons i and j are neighbours and zero otherwise. The coefficient $\rho$ controls the degree of spatial dependency. Effectively, the variable y for a given polygon is predicted not just by x but also by the y-values of the neighbouring polygons. Calibrating a SAR model involves estimating the regression coefficients, as before, but also involves estimating $\rho$. In R, SAR models can be calibrated using the function spautolm (in recent releases this function is provided by the spatialreg package rather than by spdep). This works in a similar way to lm, but also needs the contiguity information in listw form. Since we are now working with blocks2 rather than blocks, we need to extract the information for the newer object:

    blocks2.nb = poly2nb(blocks2)
    blocks2.lw = nb2listw(blocks2.nb)

Now it is possible to fit the SAR model:

    model2 = spautolm(forced.rate~notforced.rate,listw=blocks2.lw)

This stores the result in model2. More information can be found by entering

    summary(model2)

giving the following output:

    Call:
    spautolm(formula = forced.rate ~ notforced.rate, listw = blocks2.lw)

    Residuals:
    Min 1Q Median 3Q Max
    [...]

    Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
    (Intercept)    [...] e-10
    notforced.rate [...]

    Lambda: [...] LR test value: [...] p-value: [...]

    Log likelihood: [...]
    ML residual variance (sigma squared): [...], (sigma: [...])
    Number of observations: 127
    Number of parameters estimated: 4
    AIC: [...]

This shows that calibrating in this way gives the model

    expected(forced rate) = [...] + [...]*(not forced rate)

which differs only very slightly from the model obtained with a standard regression. The section marked 'Lambda' in the output shows that the estimated value of the dependency coefficient is 0.139, but the test of a null hypothesis of zero dependency has
a p-value of around [...], so we fail to reject the null hypothesis. This suggests that, in this case, one does not need to allow for spatial dependency of the error term.

The Modifiable Areal Unit Problem

In the previous section, data were summarised and then analysed at the US Census block level. One important issue with spatial analytical models of this kind is their dependency on the set of areal units used. For example, if we were to work with US Census tracts instead of blocks, would we obtain similar results? With the data set here, it is possible to test this. Firstly, included in the library is an object called tracts, which consists of the polygon outlines of the US Census tracts for New Haven. To see the relationship between the tracts and the blocks, enter:

    plot(blocks,border='red')
    plot(tracts,lwd=2,add=TRUE)

The parameter lwd controls the width of the lines being drawn. The Census blocks are nested within the tracts. Next, compute the burglary rates for the tracts. First, detach the data frame associated with blocks2 and then attach the one for tracts:

    detach(data.frame(blocks2))
    attach(data.frame(tracts))

Now compute burglary rates for the tracts:

    forced.rate.t = 2000*poly.counts(burgres.f,tracts)/OCCUPIED
    notforced.rate.t = 2000*poly.counts(burgres.n,tracts)/OCCUPIED

and run a basic model:

    model1.t=lm(forced.rate.t~notforced.rate.t)
    summary(model1.t)

You should now be familiar with the format of the output. Working with data based on census tracts, we obtain the model

    expected(forced rate) = [...] + [...]*(not forced rate)

Notice that the difference in the calibrated model brought about by altering the areal units used for the analysis is notably larger than the difference made by including a spatial dependency term in the error model. This is referred to as the Modifiable Areal Unit Problem, first identified in the 1930s and extensively researched by Stan Openshaw in the 1970s and beyond.
This variability in results is often the case, and illustrates the importance of the Modifiable Areal Unit Problem as an issue in spatial analysis.

A Zone-Free Approach

An alternative approach to mapping these crime patterns is to use kernel density estimation. Here we model the relative density of the points as a density surface: essentially a function of location (x,y) representing the relative likelihood of occurrence of an event at that point. If we think of locations in space as a very fine pixel grid, then summing the values of the pixels making up an arbitrary region on the map gives the probability that an event occurs in that area.
For the more mathematically minded, if f(x,y) is the density function, then the probability that an event occurs in an area A is:

$$\iint_{(x,y) \in A} f(x, y) \, dy \, dx$$

Kernel density estimators operate by averaging a small 'bump' (a probability distribution in 2D, in fact) centred on each observed point. Thus, the approximation to f is given by:

$$\hat{f}(x, y) = \frac{1}{n h_1 h_2} \sum_i k\left( \frac{x - x_i}{h_1}, \frac{y - y_i}{h_2} \right)$$

in mathematical terms, where n is the number of observed points. The function k is the kernel function, that is, the 'bump' described earlier. The h parameters control the smoothness of the estimate. Very small values give rise to very 'spiky' surfaces, and large values to very flat ones. Typically, they are chosen automatically, from the distribution of the points.

Here, the function to compute a kernel density estimate is kde.points. This estimates the value of the density over a grid of points, and returns the result as a grid object. It can take two arguments: the set of points to use, and another geographical object, whose bounding box will be the extent of the grid object to be created. The points object breach will be used to produce a kernel surface:

    breach.dens = kde.points(breach,lims=tracts)

This stores the kernel density estimate of breach of peace in a grid object called breach.dens. A quick way of drawing the density is to use the level.plot function:

    level.plot(breach.dens)

This draws a shaded contour plot of the density function. One thing to notice is that this covers a rectangular area, but to give context it would be helpful to add a map of New Haven. For example, to add the Census tracts, type

    plot(tracts,add=TRUE)

Another approach might be to mask out the information outside of the study area. The kde.points function always computes values on a rectangular grid, but part of the grid lies outside of the New Haven area. To overcome this, it is possible to create a mask polygon object.
This is simply a normal polygon object, shaped like the rectangle that kde.points produces, but with a hole in it the shape of the study area. In this case the hole is shaped like New Haven. If the mask polygon is plotted over the level plot of the grid data, with both its edges and fill colour being white, the effect is to erase the parts of the density surface lying outside of the study area. This can be achieved using the poly.outer function:

    masker = poly.outer(breach.dens,tracts,extend=100)

The first two parameters give the outer rectangle and the hole shape, respectively. The third parameter causes the outer rectangle to extend by a small amount in each direction. Sometimes this is useful, since occasionally there is a very slight mismatch between the coordinates of the outer edge of the grid and the outer edge of the mask
polygon. The erasing technique set out above might then fail to erase a small amount of information on the edge of the grid. The extend parameter avoids this by making the mask polygon's outer edges slightly exceed those of the grid. Here, we extend the edges by 100 feet.

Now that we have a masking polygon, called masker, we can plot it on the map. The quickest way to do this is to use the add.masking command. This is more or less the same as the plot command, but defaults to drawing white-filled polygons with white boundaries. Enter

    add.masking(masker)

This erases the part of the density map outside of New Haven. However, it has also partly erased the external boundaries of the census tracts. It would probably have been more sensible to draw the tracts after the mask polygon was drawn. A better map can be achieved by entering the commands in this order:

    level.plot(breach.dens)
    add.masking(masker)
    plot(tracts,add=TRUE)

Finally, it is also possible to use shading schemes (as seen in practical 2) to draw level plots with different intervals or colours. To do this, the auto.shading function is used as before. The variable used to define the shading scheme is the kernel density estimate of the breach.dens object, accessed by breach.dens$kde. The following gives a level plot with 7 levels, drawn as shades of green:

    breach.dens.shades = auto.shading(breach.dens$kde,
      n=7,cols=brewer.pal(7,"Greens"),cutter=range.cuts)
    level.plot(breach.dens,shades=breach.dens.shades)
    add.masking(masker)
    plot(tracts,add=TRUE)

Note the first command is split over two lines.

End of Practical

At this stage, the practical has finished. To exit R, enter

    save.image(file='rpract.rdata')
    detach(data.frame(blocks))
    q()

which will save your current variables into a file in your working folder, undo the attach command entered earlier, and quit R.
Section 3.4: Diagnostics and Transformations Jared S. Murray The University of Texas at Austin McCombs School of Business 1 Regression Model Assumptions Y i = β 0 + β 1 X i + ɛ Recall the key assumptions
More informationGeostatistics 2D GMS 7.0 TUTORIALS. 1 Introduction. 1.1 Contents
GMS 7.0 TUTORIALS 1 Introduction Two-dimensional geostatistics (interpolation) can be performed in GMS using the 2D Scatter Point module. The module is used to interpolate from sets of 2D scatter points
More informationSpatial Data Analysis
Spatial Data Analysis Laboratory Exercises by Luc Anselin University of Illinois, Urbana-Champaign anselin@uiuc.edu Prepared for the Workshop in Spatial Analysis Wharton School University of Pennsylvania
More information1. Estimation equations for strip transect sampling, using notation consistent with that used to
Web-based Supplementary Materials for Line Transect Methods for Plant Surveys by S.T. Buckland, D.L. Borchers, A. Johnston, P.A. Henrys and T.A. Marques Web Appendix A. Introduction In this on-line appendix,
More informationIntroduction. Product List. Design and Functionality 1/10/2013. GIS Seminar Series 2012 Division of Spatial Information Science
Introduction Open GEODA GIS Seminar Series 2012 Division of Spatial Information Science University of Tsukuba H.Malinda Siriwardana The GeoDa Center for Geospatial Analysis and Computation develops state
More informationVOID FILL ACCURACY MEASUREMENT AND PREDICTION USING LINEAR REGRESSION VOID FILLING METHOD
VOID FILL ACCURACY MEASUREMENT AND PREDICTION USING LINEAR REGRESSION J. Harlan Yates, Mark Rahmes, Patrick Kelley, Jay Hackett Harris Corporation Government Communications Systems Division Melbourne,
More informationFUNCTIONS AND MODELS
1 FUNCTIONS AND MODELS FUNCTIONS AND MODELS In this section, we assume that you have access to a graphing calculator or a computer with graphing software. FUNCTIONS AND MODELS 1.4 Graphing Calculators
More informationD-Optimal Designs. Chapter 888. Introduction. D-Optimal Design Overview
Chapter 888 Introduction This procedure generates D-optimal designs for multi-factor experiments with both quantitative and qualitative factors. The factors can have a mixed number of levels. For example,
More informationLinear Methods for Regression and Shrinkage Methods
Linear Methods for Regression and Shrinkage Methods Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Linear Regression Models Least Squares Input vectors
More informationA quick introduction to First Bayes
A quick introduction to First Bayes Lawrence Joseph October 1, 2003 1 Introduction This document very briefly reviews the main features of the First Bayes statistical teaching package. For full details,
More informationExcel Primer CH141 Fall, 2017
Excel Primer CH141 Fall, 2017 To Start Excel : Click on the Excel icon found in the lower menu dock. Once Excel Workbook Gallery opens double click on Excel Workbook. A blank workbook page should appear
More informationMath 7 Glossary Terms
Math 7 Glossary Terms Absolute Value Absolute value is the distance, or number of units, a number is from zero. Distance is always a positive value; therefore, absolute value is always a positive value.
More informationMiddle School Summer Review Packet for Abbott and Orchard Lake Middle School Grade 7
Middle School Summer Review Packet for Abbott and Orchard Lake Middle School Grade 7 Page 1 6/3/2014 Area and Perimeter of Polygons Area is the number of square units in a flat region. The formulas to
More informationMiddle School Summer Review Packet for Abbott and Orchard Lake Middle School Grade 7
Middle School Summer Review Packet for Abbott and Orchard Lake Middle School Grade 7 Page 1 6/3/2014 Area and Perimeter of Polygons Area is the number of square units in a flat region. The formulas to
More informationWeek 4: Simple Linear Regression III
Week 4: Simple Linear Regression III Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Goodness of
More informationSurvey of Math: Excel Spreadsheet Guide (for Excel 2016) Page 1 of 9
Survey of Math: Excel Spreadsheet Guide (for Excel 2016) Page 1 of 9 Contents 1 Introduction to Using Excel Spreadsheets 2 1.1 A Serious Note About Data Security.................................... 2 1.2
More informationSection 2.1: Intro to Simple Linear Regression & Least Squares
Section 2.1: Intro to Simple Linear Regression & Least Squares Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.1, 7.2 1 Regression:
More informationSalary 9 mo : 9 month salary for faculty member for 2004
22s:52 Applied Linear Regression DeCook Fall 2008 Lab 3 Friday October 3. The data Set In 2004, a study was done to examine if gender, after controlling for other variables, was a significant predictor
More informationGraph Structure Over Time
Graph Structure Over Time Observing how time alters the structure of the IEEE data set Priti Kumar Computer Science Rensselaer Polytechnic Institute Troy, NY Kumarp3@rpi.edu Abstract This paper examines
More informationTable of Contents (As covered from textbook)
Table of Contents (As covered from textbook) Ch 1 Data and Decisions Ch 2 Displaying and Describing Categorical Data Ch 3 Displaying and Describing Quantitative Data Ch 4 Correlation and Linear Regression
More informationDifference Between Dates Case Study 2002 M. J. Clancy and M. C. Linn
Difference Between Dates Case Study 2002 M. J. Clancy and M. C. Linn Problem Write and test a Scheme program to compute how many days are spanned by two given days. The program will include a procedure
More informationStatistical Good Practice Guidelines. 1. Introduction. Contents. SSC home Using Excel for Statistics - Tips and Warnings
Statistical Good Practice Guidelines SSC home Using Excel for Statistics - Tips and Warnings On-line version 2 - March 2001 This is one in a series of guides for research and support staff involved in
More informationSTATS PAD USER MANUAL
STATS PAD USER MANUAL For Version 2.0 Manual Version 2.0 1 Table of Contents Basic Navigation! 3 Settings! 7 Entering Data! 7 Sharing Data! 8 Managing Files! 10 Running Tests! 11 Interpreting Output! 11
More informationSTA 570 Spring Lecture 5 Tuesday, Feb 1
STA 570 Spring 2011 Lecture 5 Tuesday, Feb 1 Descriptive Statistics Summarizing Univariate Data o Standard Deviation, Empirical Rule, IQR o Boxplots Summarizing Bivariate Data o Contingency Tables o Row
More informationExploratory model analysis
Exploratory model analysis with R and GGobi Hadley Wickham 6--8 Introduction Why do we build models? There are two basic reasons: explanation or prediction [Ripley, 4]. Using large ensembles of models
More informationSection 18-1: Graphical Representation of Linear Equations and Functions
Section 18-1: Graphical Representation of Linear Equations and Functions Prepare a table of solutions and locate the solutions on a coordinate system: f(x) = 2x 5 Learning Outcome 2 Write x + 3 = 5 as
More informationMean Tests & X 2 Parametric vs Nonparametric Errors Selection of a Statistical Test SW242
Mean Tests & X 2 Parametric vs Nonparametric Errors Selection of a Statistical Test SW242 Creation & Description of a Data Set * 4 Levels of Measurement * Nominal, ordinal, interval, ratio * Variable Types
More informationBivariate Linear Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017
Bivariate Linear Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 4, 217 PDF file location: http://www.murraylax.org/rtutorials/regression_intro.pdf HTML file location:
More informationExample 1 of panel data : Data for 6 airlines (groups) over 15 years (time periods) Example 1
Panel data set Consists of n entities or subjects (e.g., firms and states), each of which includes T observations measured at 1 through t time period. total number of observations : nt Panel data have
More informationBeyond The Vector Data Model - Part Two
Beyond The Vector Data Model - Part Two Introduction Spatial Analyst Extension (Spatial Analysis) What is your question? Selecting a method of analysis Map Algebra Who is the audience? What is Spatial
More informationMath 227 EXCEL / MEGASTAT Guide
Math 227 EXCEL / MEGASTAT Guide Introduction Introduction: Ch2: Frequency Distributions and Graphs Construct Frequency Distributions and various types of graphs: Histograms, Polygons, Pie Charts, Stem-and-Leaf
More informationLecture 2 Map design. Dr. Zhang Spring, 2017
Lecture 2 Map design Dr. Zhang Spring, 2017 Model of the course Using and making maps Navigating GIS maps Map design Working with spatial data Geoprocessing Spatial data infrastructure Digitizing File
More informationWeek 7 Picturing Network. Vahe and Bethany
Week 7 Picturing Network Vahe and Bethany Freeman (2005) - Graphic Techniques for Exploring Social Network Data The two main goals of analyzing social network data are identification of cohesive groups
More informationStatistics Lab #7 ANOVA Part 2 & ANCOVA
Statistics Lab #7 ANOVA Part 2 & ANCOVA PSYCH 710 7 Initialize R Initialize R by entering the following commands at the prompt. You must type the commands exactly as shown. options(contrasts=c("contr.sum","contr.poly")
More informationST512. Fall Quarter, Exam 1. Directions: Answer questions as directed. Please show work. For true/false questions, circle either true or false.
ST512 Fall Quarter, 2005 Exam 1 Name: Directions: Answer questions as directed. Please show work. For true/false questions, circle either true or false. 1. (42 points) A random sample of n = 30 NBA basketball
More informationStatistical Analysis of MRI Data
Statistical Analysis of MRI Data Shelby Cummings August 1, 2012 Abstract Every day, numerous people around the country go under medical testing with the use of MRI technology. Developed in the late twentieth
More informationGlossary Common Core Curriculum Maps Math/Grade 6 Grade 8
Glossary Common Core Curriculum Maps Math/Grade 6 Grade 8 Grade 6 Grade 8 absolute value Distance of a number (x) from zero on a number line. Because absolute value represents distance, the absolute value
More informationdemonstrate an understanding of the exponent rules of multiplication and division, and apply them to simplify expressions Number Sense and Algebra
MPM 1D - Grade Nine Academic Mathematics This guide has been organized in alignment with the 2005 Ontario Mathematics Curriculum. Each of the specific curriculum expectations are cross-referenced to the
More informationQuadratic Functions CHAPTER. 1.1 Lots and Projectiles Introduction to Quadratic Functions p. 31
CHAPTER Quadratic Functions Arches are used to support the weight of walls and ceilings in buildings. Arches were first used in architecture by the Mesopotamians over 4000 years ago. Later, the Romans
More informationComputer Graphics Prof. Sukhendu Das Dept. of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 14
Computer Graphics Prof. Sukhendu Das Dept. of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 14 Scan Converting Lines, Circles and Ellipses Hello everybody, welcome again
More informationCatalan Numbers. Table 1: Balanced Parentheses
Catalan Numbers Tom Davis tomrdavis@earthlink.net http://www.geometer.org/mathcircles November, 00 We begin with a set of problems that will be shown to be completely equivalent. The solution to each problem
More informationAnima-LP. Version 2.1alpha. User's Manual. August 10, 1992
Anima-LP Version 2.1alpha User's Manual August 10, 1992 Christopher V. Jones Faculty of Business Administration Simon Fraser University Burnaby, BC V5A 1S6 CANADA chris_jones@sfu.ca 1992 Christopher V.
More informationAn Experiment in Visual Clustering Using Star Glyph Displays
An Experiment in Visual Clustering Using Star Glyph Displays by Hanna Kazhamiaka A Research Paper presented to the University of Waterloo in partial fulfillment of the requirements for the degree of Master
More informationChapter 4: Analyzing Bivariate Data with Fathom
Chapter 4: Analyzing Bivariate Data with Fathom Summary: Building from ideas introduced in Chapter 3, teachers continue to analyze automobile data using Fathom to look for relationships between two quantitative
More informationExercise 2.23 Villanova MAT 8406 September 7, 2015
Exercise 2.23 Villanova MAT 8406 September 7, 2015 Step 1: Understand the Question Consider the simple linear regression model y = 50 + 10x + ε where ε is NID(0, 16). Suppose that n = 20 pairs of observations
More informationDecimals should be spoken digit by digit eg 0.34 is Zero (or nought) point three four (NOT thirty four).
Numeracy Essentials Section 1 Number Skills Reading and writing numbers All numbers should be written correctly. Most pupils are able to read, write and say numbers up to a thousand, but often have difficulty
More informationDesign of Experiments
Seite 1 von 1 Design of Experiments Module Overview In this module, you learn how to create design matrices, screen factors, and perform regression analysis and Monte Carlo simulation using Mathcad. Objectives
More informationYear 5 Maths Overview. Autumn Spring Summer
Known Facts Year 5 Maths Overview End of Y 4 Use knowledge of addition and subtraction facts and place value to derive sums and differences of pairs of multiples of 10,100 or 1000 Identify the doubles
More informationYour Name: Section: INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression
Your Name: Section: 36-201 INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression Objectives: 1. To learn how to interpret scatterplots. Specifically you will investigate, using
More informationLinear Modeling with Bayesian Statistics
Linear Modeling with Bayesian Statistics Bayesian Approach I I I I I Estimate probability of a parameter State degree of believe in specific parameter values Evaluate probability of hypothesis given the
More informationCHAPTER 2 TEXTURE CLASSIFICATION METHODS GRAY LEVEL CO-OCCURRENCE MATRIX AND TEXTURE UNIT
CHAPTER 2 TEXTURE CLASSIFICATION METHODS GRAY LEVEL CO-OCCURRENCE MATRIX AND TEXTURE UNIT 2.1 BRIEF OUTLINE The classification of digital imagery is to extract useful thematic information which is one
More informationRSM Split-Plot Designs & Diagnostics Solve Real-World Problems
RSM Split-Plot Designs & Diagnostics Solve Real-World Problems Shari Kraber Pat Whitcomb Martin Bezener Stat-Ease, Inc. Stat-Ease, Inc. Stat-Ease, Inc. 221 E. Hennepin Ave. 221 E. Hennepin Ave. 221 E.
More informationDownloaded from
UNIT 2 WHAT IS STATISTICS? Researchers deal with a large amount of data and have to draw dependable conclusions on the basis of data collected for the purpose. Statistics help the researchers in making
More informationArcView QuickStart Guide. Contents. The ArcView Screen. Elements of an ArcView Project. Creating an ArcView Project. Adding Themes to Views
ArcView QuickStart Guide Page 1 ArcView QuickStart Guide Contents The ArcView Screen Elements of an ArcView Project Creating an ArcView Project Adding Themes to Views Zoom and Pan Tools Querying Themes
More informationPractical 2: Using Minitab (not assessed, for practice only!)
Practical 2: Using Minitab (not assessed, for practice only!) Instructions 1. Read through the instructions below for Accessing Minitab. 2. Work through all of the exercises on this handout. If you need
More informationLearner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display
CURRICULUM MAP TEMPLATE Priority Standards = Approximately 70% Supporting Standards = Approximately 20% Additional Standards = Approximately 10% HONORS PROBABILITY AND STATISTICS Essential Questions &
More information36-402/608 HW #1 Solutions 1/21/2010
36-402/608 HW #1 Solutions 1/21/2010 1. t-test (20 points) Use fullbumpus.r to set up the data from fullbumpus.txt (both at Blackboard/Assignments). For this problem, analyze the full dataset together
More informationLab 12: Sampling and Interpolation
Lab 12: Sampling and Interpolation What You ll Learn: -Systematic and random sampling -Majority filtering -Stratified sampling -A few basic interpolation methods Videos that show how to copy/paste data
More informationDATA MODELS IN GIS. Prachi Misra Sahoo I.A.S.R.I., New Delhi
DATA MODELS IN GIS Prachi Misra Sahoo I.A.S.R.I., New Delhi -110012 1. Introduction GIS depicts the real world through models involving geometry, attributes, relations, and data quality. Here the realization
More informationSchool of Computer Science CPS109 Course Notes Set 7 Alexander Ferworn Updated Fall 15 CPS109 Course Notes 7
CPS109 Course Notes 7 Alexander Ferworn Unrelated Facts Worth Remembering The most successful people in any business are usually the most interesting. Don t confuse extensive documentation of a situation
More informationnumber Understand the equivalence between recurring decimals and fractions
number Understand the equivalence between recurring decimals and fractions Using and Applying Algebra Calculating Shape, Space and Measure Handling Data Use fractions or percentages to solve problems involving
More informationPoints Lines Connected points X-Y Scatter. X-Y Matrix Star Plot Histogram Box Plot. Bar Group Bar Stacked H-Bar Grouped H-Bar Stacked
Plotting Menu: QCExpert Plotting Module graphs offers various tools for visualization of uni- and multivariate data. Settings and options in different types of graphs allow for modifications and customizations
More information2017 Summer Review for Students Entering Pre-Algebra 7 & Pre-Algebra 8
1. Area and Perimeter of Polygons 2. Multiple Representations of Portions 3. Multiplying Fractions and Decimals 4. Order of Operations 5. Writing and Evaluating Algebraic Expressions 6. Simplifying Expressions
More informationChapter 1. Math review. 1.1 Some sets
Chapter 1 Math review This book assumes that you understood precalculus when you took it. So you used to know how to do things like factoring polynomials, solving high school geometry problems, using trigonometric
More information(Refer Slide Time: 00:02:00)
Computer Graphics Prof. Sukhendu Das Dept. of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 18 Polyfill - Scan Conversion of a Polygon Today we will discuss the concepts
More informationResampling Methods. Levi Waldron, CUNY School of Public Health. July 13, 2016
Resampling Methods Levi Waldron, CUNY School of Public Health July 13, 2016 Outline and introduction Objectives: prediction or inference? Cross-validation Bootstrap Permutation Test Monte Carlo Simulation
More informationEnter your UID and password. Make sure you have popups allowed for this site.
Log onto: https://apps.csbs.utah.edu/ Enter your UID and password. Make sure you have popups allowed for this site. You may need to go to preferences (right most tab) and change your client to Java. I
More informationBIOL 458 BIOMETRY Lab 10 - Multiple Regression
BIOL 458 BIOMETRY Lab 0 - Multiple Regression Many problems in biology science involve the analysis of multivariate data sets. For data sets in which there is a single continuous dependent variable, but
More information(Refer Slide Time: 00:03:51)
Computer Graphics Prof. Sukhendu Das Dept. of Computer Science and Engineering Indian Institute of Technology, Madras Lecture 17 Scan Converting Lines, Circles and Ellipses Hello and welcome everybody
More information. predict mod1. graph mod1 ed, connect(l) xlabel ylabel l1(model1 predicted income) b1(years of education)
DUMMY VARIABLES AND INTERACTIONS Let's start with an example in which we are interested in discrimination in income. We have a dataset that includes information for about 16 people on their income, their
More informationSlide Copyright 2005 Pearson Education, Inc. SEVENTH EDITION and EXPANDED SEVENTH EDITION. Chapter 13. Statistics Sampling Techniques
SEVENTH EDITION and EXPANDED SEVENTH EDITION Slide - Chapter Statistics. Sampling Techniques Statistics Statistics is the art and science of gathering, analyzing, and making inferences from numerical information
More informationChapter 8: Implementation- Clipping and Rasterization
Chapter 8: Implementation- Clipping and Rasterization Clipping Fundamentals Cohen-Sutherland Parametric Polygons Circles and Curves Text Basic Concepts: The purpose of clipping is to remove objects or
More informationPoisson Regression and Model Checking
Poisson Regression and Model Checking Readings GH Chapter 6-8 September 27, 2017 HIV & Risk Behaviour Study The variables couples and women_alone code the intervention: control - no counselling (both 0)
More informationMeasurement and Geometry (M&G3)
MPM1DE Measurement and Geometry (M&G3) Please do not write in this package. Record your answers to the questions on lined paper. Make notes on new definitions such as midpoint, median, midsegment and any
More information