Getting Started
1. Create a folder on the desktop and call it your last name.
2. Copy and paste the data you will need to your folder from the folder specified by the instructor.

Exercise 1: Explore the location of your data
In this exercise, you will begin to explore the characteristics of your spatial data. You are a developer interested in building a long term care facility. You want to learn more about the locations and sizes (measured by number of beds) of the existing facilities. You will begin by exploring the locations of the long term care facilities in Massachusetts using descriptive statistics.

Step 1: Open your data
1. Start ArcMap.
2. Indicate that you want to open a new blank map.
3. Use the Add Data button to add longtermcarebeds.shp to your map. If you don't see your folder on the desktop, use the Connect To button to navigate to the desktop.
4. Click the drop-down arrow next to the Add Data button and select a basemap. If it doesn't appear when you first add it, try zooming in or out.
This is a map of long term care facilities in Massachusetts as of 2007.

Step 2: Calculate the mean center of your data
Now you will calculate the mean center, or average location, of your spatial data.
1. Open the Mean Center tool by clicking the toolbox icon. If you don't see the toolbox icon, click the Geoprocessing drop-down menu and select ArcToolbox.
2. Expand Spatial Statistics > Measuring Geographic Distributions.
3. Double-click Mean Center.
4. For the input feature class, select the longtermcarebeds layer.
5. For the output feature class, click Browse. Navigate to your folder on the desktop.
6. Name your output feature class MeanCenter. Click Save.
7. Leave the optional fields blank.

8. Click OK to run the tool. When the tool finishes running, the new MeanCenter layer will appear in your table of contents.
9. Change the color and size of your mean center so it stands out on your map by double-clicking the symbol in your table of contents window.

Q: Where is the mean center?
Near Framingham.
Q: What does this tell you about the spatial distribution of your dataset?
While there are clusters of long term care facilities near cities in western and central Massachusetts, there are more facilities in eastern Massachusetts, causing the mean center to be located in the eastern part of the state.
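The mean center computed in Step 2 is the average of the x-coordinates paired with the average of the y-coordinates. The sketch below illustrates that calculation outside ArcMap. The function name `mean_center` and the coordinates are hypothetical and are not taken from longtermcarebeds.shp; the sketch assumes projected coordinates and no weight field.

```python
def mean_center(points):
    """Return the (x, y) mean center of a list of (x, y) coordinate pairs."""
    n = len(points)
    if n == 0:
        raise ValueError("at least one point is required")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return (mean_x, mean_y)


# Hypothetical projected coordinates (not the lab data): three points
# clustered in the "west" and one far to the "east".
facilities = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (100.0, 10.0)]
print(mean_center(facilities))
```

Note how the single distant point pulls the mean center toward it, which is the same effect the eastern facilities have on the mean center in the lab.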

Step 3: Calculate the median center of your data
Now that you have calculated the mean center of your data, you will calculate the median center, or middle location.
1. Double-click Median Center in the toolbox.
2. For the input feature class, select your longtermcarebeds layer.
3. For the output feature class, navigate to your desktop folder. Name your output feature class MedianCenter. Click Save.
4. Leave the optional fields blank and click OK to run the tool. When the tool finishes running, the new MedianCenter layer will appear in your table of contents.
5. Change the color and size of your median center so it stands out in your map.

Q: Where is the median center located?
Near Newton, Needham, and Wellesley.
Q: How does this compare with your mean center?
The median center is farther east.
Q: What does this tell you about the spatial distribution of your dataset?
The long term care facilities in the far western part of the state have less influence on the median than on the mean.
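One common way to compute a median center is to search for the location that minimizes the total straight-line distance to all points. The sketch below uses Weiszfeld's iterative method for this; it is an illustration under that assumption and is not claimed to be the exact procedure the Median Center tool runs. The function name, tolerance, and coordinates are hypothetical.

```python
import math


def median_center(points, tol=1e-9, max_iter=1000):
    """Approximate the point minimizing total Euclidean distance to `points`."""
    n = len(points)
    # Start the search at the mean center.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    for _ in range(max_iter):
        weight_sum = wx = wy = 0.0
        for x, y in points:
            d = math.hypot(x - cx, y - cy)
            if d < 1e-12:
                # Simplification: skip a point the estimate sits exactly on,
                # to avoid dividing by zero.
                continue
            w = 1.0 / d
            weight_sum += w
            wx += w * x
            wy += w * y
        if weight_sum == 0.0:
            break
        new_cx, new_cy = wx / weight_sum, wy / weight_sum
        moved = math.hypot(new_cx - cx, new_cy - cy)
        cx, cy = new_cx, new_cy
        if moved < tol:
            break
    return (cx, cy)


# Hypothetical data: a square of four points plus one distant point.
pts = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (50.0, 1.0)]
mean_x = sum(x for x, _ in pts) / len(pts)
med_x, med_y = median_center(pts)
print(mean_x, med_x)
```

In this example the distant point drags the mean x-coordinate well outside the square, while the median center stays inside it. This mirrors the lab result, where the far western facilities influence the mean more than the median.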

Step 4: Calculate the directional distribution of your data
To see if any directional trends are apparent, you will look at how your data are distributed across Massachusetts.
1. Double-click the Directional Distribution (Standard Deviational Ellipse) tool.
2. For the input feature class, click the arrow to open the drop-down menu and select the longtermcarebeds layer.
3. For the output ellipse feature class, click Browse. Navigate to your folder on the desktop.
4. Name your output feature class DirectionalDistribution. Click Save.
5. Verify that the Ellipse Size is set to 1_STANDARD_DEVIATION.
6. Leave the optional fields blank and click OK to run the tool. When the tool finishes running, the new DirectionalDistribution layer will appear in your table of contents.

1. Double-click DirectionalDistribution to open your layer properties.
2. On the Display tab, set the Transparent value to 50%.
3. Click OK.

4. Right-click DirectionalDistribution and select Open Attribute Table. Use the Rotation value to answer the next question.

Q: What is the orientation of your ellipse?
The rotation is 99.7, which means the ellipse runs roughly east-west. This means that there is more variation in the x-values than in the y-values of your data. This makes sense, since your data stretch farther west than they stretch to the north or south.

5. Close your attribute table.
The directional distribution ellipse is quite large, covering a great deal of your study area. This represents a large variation between locations.

1. Now that you have finished this stage of your data exploration, you will save your map.
2. From the File menu, select Save As.
3. Navigate to your desktop folder. Name your map SpatialPatterns.
4. You can now turn off the mean, median, and directional ellipse layers. Keep your map open for the next exercise.

In this exercise, you explored the spatial distribution of your data to answer the question, "What is the spatial/geographical location of my data?" When analyzing spatial data, remember that it is important to evaluate the characteristics of your data locations as well as your data values.
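The Rotation value interpreted in Step 4 can be approximated from the variances and covariance of the point coordinates. The sketch below assumes the rotation is the angle of the ellipse's long axis measured clockwise from north, which is consistent with a value near 90 meaning east-west as described above. It computes only the orientation, not the axis lengths, and the function name and test coordinates are hypothetical.

```python
import math


def ellipse_rotation(points):
    """Angle of the long axis in degrees, clockwise from north, in [0, 180)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Direction of greatest spread, counterclockwise from the +x (east) axis.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    # Convert to a compass-style angle: clockwise from north.
    return (90.0 - math.degrees(theta)) % 180.0


# Hypothetical points stretched along the x-axis (east-west).
east_west = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (10.0, 1.0), (10.0, -1.0)]
print(ellipse_rotation(east_west))
```

A result near 90 indicates an east-west trend, a result near 0 a north-south trend, and a result near 45 a southwest-northeast trend.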

Exercise 2: Explore your data distribution
Now you want to examine the different sizes of the facilities using the number of beds. This will help you determine the range of sizes and the size of a typical facility, which will inform your development plans.

Step 1: Explore your feature attributes
In this step, you will look at the descriptive statistics for your data values. This simple step provides a foundational context for exploring your data values.
1. Move longtermcarebeds to the top of your table of contents if it is not already there.
2. In your table of contents, right-click longtermcarebeds and select Open Attribute Table.
3. Right-click the TOTAL_BEDS field name and select Statistics.
The Statistics window provides descriptive information about your dataset. In this case, the Statistics window describes the overall number of long term care facilities and information on the number of beds in each.

Q: How many facilities are in your dataset?
551
Q: What is the highest number of beds reported in a facility?
366
Q: What is the lowest number of beds reported in a facility?
5

4. Close the Statistics window.
5. Close the Table window.

Step 2: Display a histogram
1. From the Customize menu, select Extensions.
2. In the Extensions dialog box, check Geostatistical Analyst.
3. Click Close.
4. From the Customize menu, select Toolbars > Geostatistical Analyst.
5. From the Geostatistical Analyst menu, select Explore Data > Histogram.

The Histogram window opens. Make sure longtermcarebeds is selected in the Layer drop-down menu. Because there is more than one point at one location, ArcMap asks you how the value at that location should be interpreted. Select Use Mean and click OK.
The histogram will display the frequency of your data values across the range of your data. By default it displays the first column in your data table, which is OBJECTID. From the Attribute window, select TOTAL_BEDS.
The number of bars you should use in a histogram is based on the number of data points you are examining. For this dataset, you will keep the default setting of 10 bars.

Step 3: Interpret your histogram
In this step, you will compare the values provided by your histogram against the characteristics of normally distributed data. Using your histogram, answer the following questions.

Q: What is the mean value of your data?
96.975
Q: What is the median value of your data?
100. Half of the facilities have more beds than this, while half have fewer beds.
Q: Are your mean and median values similar or not similar?
Similar.
Q: What is your skewness value?
0.53251
Q: What does this skewness value mean?
The data are not perfectly symmetrical.
Q: Does the pattern of the histogram follow a bell-shaped (Gaussian) curve?
Somewhat, although the first bar is a bit high and there are outliers all the way to the right, indicating a slight right skew.
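The mean, median, and skewness reported by the Histogram window can be reproduced with a few lines of code. The sketch below uses the moment coefficient of skewness (third central moment divided by the 1.5 power of the second); the Histogram window may use a slightly different estimator, so small differences from its output would not be surprising. The bed counts are hypothetical and are not the lab data.

```python
import statistics


def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical bed counts with one very large facility.
beds = [5, 60, 80, 100, 100, 120, 140, 366]
print("mean:", statistics.fmean(beds))
print("median:", statistics.median(beds))
print("skewness:", skewness(beds))
```

A skewness of zero indicates a symmetric distribution. A positive value, as in this example, indicates a longer tail toward large values (a right skew).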

Step 4: Look for outliers
Data values that are much smaller or much larger compared to the rest of your data are candidates to be outliers. These values can skew your analysis results.
1. In your histogram graph, click on the data values to the right to select them. This selects the feature in both the histogram window and in your map. This is known as brushing. The selected point will remain selected in the map and in your exploratory analysis windows until you clear the selection. This is known as linking.
2. Move the Histogram window aside to view your map.
3. Use the zoom tools to zoom into the facilities you selected.
4. Identify the two selected facilities.

Q: What are the names of your selected long term care facilities?
St Patricks Manor and Marian Manor.

5. Close the Identify window.
6. Close your Histogram window.
7. From the Tools toolbar, select Clear Selected Features.
Outliers are valuable within context. They can often give you additional information about a dataset that you may otherwise overlook.

Step 5: Create and interpret a normal QQ plot
1. From the Geostatistical Analyst menu, select Explore Data > Normal QQPlot.
2. Again, select Use Mean.
3. Select TOTAL_BEDS from the Attribute drop-down menu.

The normal QQ plot will allow you to compare the distribution of your data against the normal distribution. Using your normal QQ plot, answer the following questions.

Q: Does the center of the plot fall close to the reference line?
Most of the plot falls close to the reference line. There is more variation from the reference line to the right of the median (standard normal value of zero) than to the left.
Q: Are there any values at the tails of the plot that do not fall close to the reference line?
The left tail falls close to the reference line. The right tail has a few values that do not fall near the reference line.

1. Draw a box on the QQ plot to select the three data points at the right tail of the plot, in the upper right corner.
2. Close the Normal QQPlot window.
3. Identify the selected facilities on your map.

Q: What are the names of your selected long term care facilities?
St Patricks Manor, Marian Manor, and Catholic Memorial Home.

This is an example of using the normal QQ plot to identify outliers in your data. Potential outliers are values at the tails that do not fall near the reference line. In this case, they are long term care facilities that have many more beds than the average facility.

4. Close the Identify window.
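The points on a normal QQ plot pair each sorted data value with the standard normal value expected at the same rank. The sketch below builds those pairs using the plotting position (i - 0.5) / n, which is one common convention and is an assumption here; ArcMap may use a different one. The function name and bed counts are hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Return (standard normal quantile, data value) pairs for a QQ plot."""
    data = sorted(values)
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (std_normal.inv_cdf((i - 0.5) / n), value)
        for i, value in enumerate(data, start=1)
    ]


# Hypothetical bed counts; the last value is unusually large.
beds = [60, 80, 100, 120, 300]
for theoretical, observed in normal_qq_points(beds):
    print(round(theoretical, 3), observed)
```

If the data were normally distributed, these pairs would fall close to a straight line. A value such as the last one here would sit well above the line at the right tail, which is how the three large facilities appear in the lab.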

Step 6: Examine outliers in the attribute table
1. In the table of contents, right-click longtermcarebeds and select Open Attribute Table.
2. Right-click the TOTAL_BEDS field name and select Sort Descending.

Q: What do you notice about the top three facilities?
There is about a 30-bed difference between each of them and a 35-bed difference between the 3rd and 4th facilities. Starting with the 4th facility, the numbers of beds are more similar and there is less of a difference between facilities.

3. Close the Table window.
4. Click Clear Selected Features.
Next you will explore the data in GeoDa. You will come back to ArcMap, so leave it open.

Step 7: Create a box plot in GeoDa to look for outliers
1. Open GeoDa.
2. Select File > Open Shapefile and choose longtermcarebeds.shp.
Notice that GeoDa does not display background maps or layer data. Because of this it runs much faster than ArcMap.
3. Select Explore > Box Plot and choose TOTAL_BEDS as the variable. A box plot with hinge 1.5 will be displayed.
4. Draw a box over the 3 points above the top black horizontal line, which marks the upper fence (1.5 times the interquartile range above the third quartile).

5. Click the open table icon to open the attribute table.
6. Right-click the top of the FAC_NAME column and choose Move Selected to Top.

Q: Are these the same facilities you identified as outliers in ArcMap?
Yes!

Step 8: Examine maps in GeoDa to look for outliers
1. Close the table. Click in the Box Plot window to deselect the outliers.
2. Select Map > New Percentile Map and choose TOTAL_BEDS as the variable.
3. Repeat this procedure for both types of box map and the standard deviation map.
4. Arrange your maps and your box plot window so that you can see all of them at once.
The data in all the windows are connected. If you click on the category for upper outliers in the hinge=1.5 box plot map, it will select the points on this map, in all other maps, in the table, and in the box plot window. Spend some time examining your data. The selected points turn red, which can be difficult to see. You can right-click individual colors in the legend to change them. You can also experiment with the Options. Select Zooming Mode to zoom in and Panning Mode to exit Zooming Mode. You can also change the background color.

Q: What are some things you notice?
- There are outliers at hinge=1.5, but not at hinge=3.0.
- 10 facilities fall 3 standard deviations above the mean. Outliers may have influenced the number of facilities in the top category.
- 6 facilities are in the top 1% of facilities, based on the number of beds.
- 4 facilities are in the bottom 1% of facilities based on the number of beds, but there are no low outliers in the box plot map, standard deviation map, or on the box plot graph.
- There may be numerous other things you have discovered about this data.

Q: Do you think your data are normally distributed or not?
- The mean and median of your data are similar.
- The data are slightly skewed.
- There are outliers in the data.
- Apart from the outliers, your histogram displays a pattern similar to a bell-shaped (Gaussian) curve.
- Most of your data fall close to the reference line in your normal QQ plot.
Based on these characteristics, your data have many of the characteristics of normally distributed data. It is unusual for a dataset to exhibit all of the theoretical characteristics of normally distributed data. Part of the data exploration process is determining how closely your data fit this pattern.
You can close the maps you just created, but keep GeoDa open as we will use it again.
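The hinge setting used by the box plot in Step 7 and the box maps in Step 8 can be sketched as a fence rule: a value is flagged when it lies more than hinge times the interquartile range beyond the first or third quartile. The example below is a minimal illustration of that rule. The quartile method ("inclusive") is an assumption and may differ from the one GeoDa uses, and the bed counts are hypothetical.

```python
import statistics


def fences(values, hinge=1.5):
    """Return the (lower, upper) outlier fences for the given hinge."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return (q1 - hinge * iqr, q3 + hinge * iqr)


def outliers(values, hinge=1.5):
    """Return the values that fall outside the fences."""
    lower, upper = fences(values, hinge)
    return [v for v in values if v < lower or v > upper]


# Hypothetical bed counts with one large facility.
beds = [40, 60, 80, 90, 100, 100, 110, 120, 140, 160, 250]
print(fences(beds, 1.5), outliers(beds, 1.5))
print(fences(beds, 3.0), outliers(beds, 3.0))
```

In this example the largest value is flagged at hinge 1.5 but not at hinge 3.0, which is the same pattern noted for the lab data above: a wider hinge only flags more extreme values.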

Exercise 3: Explore the variation in your data
In this exercise, you will investigate the variation in your long term care facilities data using visual techniques and Voronoi maps.

Step 1: Symbolize your features using class breaks
In this step, you will visually explore the relationship between your data values and your data locations by re-symbolizing your long term care facilities.
1. Double-click longtermcarebeds to open the Layer Properties dialog box.
2. Click the Symbology tab.
3. In the Show column, select Quantities.
4. In the Fields section, change the Value to TOTAL_BEDS.
5. In the Classification section, verify that the classification method is Natural Breaks (Jenks).
The natural breaks classification method will classify the number-of-beds values based on clusters in the data values.

1. Right-click the Color Ramp and uncheck Graphic View.
2. Change the color ramp to Slope. This is a green-to-red color ramp.
3. Click the Symbol heading and select Properties for All Symbols.
4. Change the symbol size to 10.
5. Click OK twice.

Your long term care facilities are now symbolized based on the reported number of beds.

Q: Are there any patterns in the data?
Facilities of different sizes seem to be distributed throughout Massachusetts. The facilities with the highest number of beds (in red) tend to be near larger cities, such as Pittsfield, Springfield, and Boston.

Step 2: Explore maps in GeoDa
1. Select Map > Quantile Map and choose TOTAL_BEDS as the variable.
2. Repeat this procedure to make equal interval and natural breaks maps.
Notice how different categories are made for each type of map. It is more difficult to see where the different types of facilities are located within the state since there is no map layer in the background; having one is an advantage of using ArcMap.
3. Close your maps and maximize ArcMap.

Step 3: Create your Voronoi map in ArcMap
A Voronoi map is a tool that will help you determine how much variation exists in your data. Some analysis tools require that your data be stationary, meaning that values a certain distance apart should have a similar difference in values. For data to be stationary, the variation in the data should be consistent across the study area.
1. From the Geostatistical Analyst menu, select Explore Data > Voronoi Map.
2. Select TOTAL_BEDS from the Attribute drop-down menu.

3. Right-click the Color Ramp and uncheck Graphic View.
4. Change the color ramp to Slope.
5. Change the Type to Entropy.
The entropy Voronoi map displays variation in your data. This variation helps you determine if your data are stationary or not.

Q: Based on this map, how would you describe the local variation in your data?
There is moderate variation in the data, as most areas are light green, yellow, or orange.
Q: Would you consider your data stationary or non-stationary?
It's hard to say. They may or may not be stationary.

6. Close the Voronoi map window.
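The idea behind an entropy map can be sketched with a small calculation. The example below assumes that each Voronoi cell's value is the Shannon entropy (base 2) of the size classes found in that cell and its neighbors; this reading of the Entropy type is an assumption, and the sketch omits how the cells, neighbors, and classes are constructed. The function name and class labels are hypothetical.

```python
import math
from collections import Counter


def class_entropy(class_labels):
    """Shannon entropy (base 2) of the class labels in a neighborhood."""
    n = len(class_labels)
    counts = Counter(class_labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical neighborhoods: each list holds the size class (1 to 5) of a
# cell and of its neighboring cells.
uniform_area = [3, 3, 3, 3, 3]   # every facility falls in the same class
mixed_area = [1, 2, 3, 4, 5]     # every facility falls in a different class

print(class_entropy(uniform_area) == 0)
print(class_entropy(mixed_area))
```

Low entropy means neighboring facilities are similar in size, and high entropy means they differ. If the entropy values are broadly similar across the map, the local variation is consistent, which is what stationarity requires.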

Exercise 4: Explore spatial patterns in your data
In this exercise, you will examine your long term care data for spatial autocorrelation using a semivariogram cloud.

Step 1: Create a semivariogram cloud
1. From the Geostatistical Analyst menu, select Explore Data > Semivariogram/Covariance Cloud.
2. Indicate that you want to use the mean and click OK/Yes when the warning messages appear.
3. Select TOTAL_BEDS from the Attribute drop-down list.
4. If necessary, drag the corners of the Semivariogram/Covariance Cloud window to resize the graph.
Now you will change the lag size and the number of lags in your semivariogram cloud. The number of lags defines the distance you will investigate in the semivariogram cloud. This allows you to zoom in on the range of the semivariogram. You will change the lag size to the average distance between neighboring long term care facilities. This allows you to zoom the semivariogram into the region of the semivariogram cloud where spatial autocorrelation is most likely to exist.

1. Change the lag size to 74000. (This is in meters, calculated using the Point Distance tool, which calculates the distance from all points to all other points.)
2. Change the number of lags to 8.
Use the semivariogram cloud to answer the following questions.

Q: Based on distance, do the data appear to show variation?
Yes, as an inverse relationship. As the distance (the values on the x-axis) increases, the semivariance (the values on the y-axis) decreases.
Q: Are your data spatially autocorrelated?
No, it is the opposite. The data show more variation at shorter distances and less variation at longer distances. This may be because long term care facilities tend to be clustered around cities, so there are many facilities in close proximity to each other. With more facilities near one another, there is a greater chance that they will be of different sizes.

Step 2: Explore possible outliers in your data
In this step, you will examine a group of values that appear to be out of place. Several points in the middle of your semivariogram cloud are relatively close together but have a high variation in the number of beds in their facilities.
1. In the semivariogram cloud, select the points shown below. Tip: Your selection does not need to be exact and may be slightly different from the result graphic. Remember, each point in the semivariogram cloud actually represents the relationship between two long term care facilities on your map.
2. Move the Semivariogram/Covariance Cloud window so you can see your map in ArcMap.

There appear to be three long term care facilities linked to the high variation.
3. Identify the facilities that seem to be linked to the variation. You will notice that these were the three facilities with the highest number of beds that we identified as outliers in the previous exercises. Knowing that large long term care facilities aren't typical will help with your development decisions.
4. Clear your selections on the map and close the semivariogram cloud window.

What did you learn?
The purpose of this exercise was to learn more about the locations and sizes of the existing long term care facilities to help you identify potential areas for building a new facility.
1. The mean center of the facilities is near Framingham. The median center is slightly farther east, near Newton, Needham, and Wellesley. The majority of long term care facilities are roughly in an east-west orientation stretching from Boston to just past Worcester.
2. The data are slightly skewed to the right and include some outliers with unusually high numbers of long term care beds. Otherwise the data fall close to the reference line on the normal QQ plot and display a pattern similar to a normal curve.
3. The number of long term care beds is moderately varied based on the Voronoi map.
4. There does not appear to be spatial autocorrelation between the points.
While there are obviously many factors that would go into the decision to build a new long term care facility, this initial examination has told you some important information about existing facilities. We know where the majority of the facilities are located within the state and that the size of the facilities varies across the state. Facilities of the same size are not necessarily located in similar locations. We also know the average size of the facilities and the sizes of the largest and smallest facilities, which could help us determine an appropriate size for our facility.
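For reference, the semivariogram cloud examined in Exercise 4 can be sketched as one point per pair of facilities, with the distance between the pair on the x-axis and half the squared difference in their values on the y-axis. The example below assumes that definition of the y-value and ignores lag size, number of lags, and the handling of coincident points. The function name and sample data are hypothetical.

```python
import math
from itertools import combinations


def semivariogram_cloud(samples):
    """Return a (distance, semivariance) tuple for every pair of samples.

    Each sample is an (x, y, value) tuple in projected coordinates.
    """
    cloud = []
    for (x1, y1, v1), (x2, y2, v2) in combinations(samples, 2):
        distance = math.hypot(x2 - x1, y2 - y1)
        semivariance = 0.5 * (v1 - v2) ** 2
        cloud.append((distance, semivariance))
    return cloud


# Hypothetical facilities: (x, y, number of beds).
samples = [(0.0, 0.0, 100), (3.0, 4.0, 120), (0.0, 10.0, 100)]
for distance, semivariance in semivariogram_cloud(samples):
    print(round(distance, 2), semivariance)
```

Because every point in the cloud comes from a pair, a single facility with an unusually high bed count produces a high y-value in every pair it belongs to. This is why the same three large facilities account for the cluster of high points selected in Step 2.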