Put Your Data on the Map: Using the GEOCODE and GMAP Procedures to Create Bubble Maps in SAS

Similar documents
Custom Map Displays Created with SAS/GRAPH Procedures and the Annotate Facility Debra Miller, National Park Service, Denver, CO

INTRODUCTION TO THE SAS ANNOTATE FACILITY

SAS/GRAPH and ANNOTATE Facility More Than Just a Bunch of Labels and Lines

Data Explore Matrix Quality Control by Exploring and Mining Data in Clinical Study

The Basics of Map Creation with SAS/GRAPH. Mike Zdeb School of Public Health Rensselaer, NY

Annotate Dictionary CHAPTER 11

Creating Regional Maps with Drill-Down Capabilities Deb Cassidy Cardinal Distribution, Dublin, OH

Using ANNOTATE MACROS as Shortcuts

Fly over, drill down, and explore

Geospatial Analysis with PROC GMAP

GREMOVE, Reassign, and let s GMAP! A SAS Trick for Generating Contiguous Map Boundaries for Market-Level Research

Creating Maps in SAS/GRAPH

How to Make an Impressive Map of the United States with SAS/Graph for Beginners Sharon Avrunin-Becker, Westat, Rockville, MD

Creating Forest Plots Using SAS/GRAPH and the Annotate Facility

A Dynamic Imagemap Generator Carol Martell, Highway Safety Research Center, Chapel Hill, NC

It s Not All Relative: SAS/Graph Annotate Coordinate Systems

Working With PROC GMAP and a Note About Census ZCTAs

USING PROC GMAP AND DRILL-DOWN GRAPHICS FOR DATA QUALITY ASSURANCE Meghan Arbogast, Computer Sciences Corporation, Corvallis, OR

The GREMOVE Procedure

Multiple Forest Plots and the SAS System

Controlling Titles. Purpose: This chapter demonstrates how to control various characteristics of the titles in your graphs.

Generating Participant Specific Figures Using SAS Graphic Procedures Carry Croghan and Marsha Morgan, EPA, Research Triangle Park, NC

Using SAS to map petroleum distributions. Saskatoon SUCCESS SAS User Group October 18 th, 2012

MWSUG Paper DV04 SAS/GRAPH and GfK Maps: a Subject Matter Expert Winning Combination Louise S. Hadden, Abt Associates Inc.

TDWI strives to provide course books that are contentrich and that serve as useful reference documents after a class has ended.

SAS Visual Analytics 8.2: Getting Started with Reports

Leveraging SAS Visualization Technologies to Increase the Global Competency of the U.S. Workforce

Interactive Graphs from the SAS System

Creating and Modifying Charts

Using SAS Graphics Capabilities to Display Air Quality Data Debbie Miller, National Park Service, Denver, CO

Creating a Basic Chart in Excel 2007

Spreadsheet definition: Starting a New Excel Worksheet: Navigating Through an Excel Worksheet

Yandex.Maps API Background theory

Using Annotate Datasets to Enhance Charts of Data with Confidence Intervals: Data-Driven Graphical Presentation

The GANNO Procedure. Overview CHAPTER 12

Converting Annotate to ODS Graphics. Is It Possible?

Leverage custom geographical polygons in SAS Visual Analytics

Data Annotations in Clinical Trial Graphs Sudhir Singh, i3 Statprobe, Cary, NC

Effective Forecast Visualization With SAS/GRAPH Samuel T. Croker, Lexington, SC

Abstract. Introduction

Do SAS users read books? Using SAS graphics to enhance survey research

ArcGIS Online (AGOL) Quick Start Guide Fall 2018

Market Insight Geo Mapping User Guide v1.0

Full Search Map Tab. This map is the result of selecting the Map tab within Full Search.

The GSLIDE Procedure. Overview. About Text Slides CHAPTER 27

L2 VoterMapping - Display Options

Paper Mapping Roanoke Island Revisited: An OpenStreetMap (OSM) Solution. Barbara Okerson, Anthem Inc.

Middle School Math Course 3 Correlation of the ALEKS course Middle School Math 3 to the Illinois Assessment Framework for Grade 8

Customized Flowcharts Using SAS Annotation Abhinav Srivastva, PaxVax Inc., Redwood City, CA

PharmaSUG Paper TT10 Creating a Customized Graph for Adverse Event Incidence and Duration Sanjiv Ramalingam, Octagon Research Solutions Inc.

Understand the concept of volume M.TE Build solids with unit cubes and state their volumes.

Outbreak Maps: Visual Discovery in Your Data. Jeff Phillips, Data Visualization R&D

7 th Grade STAAR Crunch March 30, 2016

Graphing with Microsoft Excel

Figure 1. Paper Ring Charts. David Corliss, Marketing Associates, Bloomfield Hills, MI

Mathematics - LV 5 (with QuickTables)

Create Flowcharts Using Annotate Facility. Priya Saradha & Gurubaran Veeravel

CURVES OF CONSTANT WIDTH AND THEIR SHADOWS. Have you ever wondered why a manhole cover is in the shape of a circle? This

Mathematics GCSE 9 1 Higher Syllabus. Yes. Does the subject set according to ability? Skills Covered. Unit

RAPIDMAP Geocortex HTML5 Viewer Manual

Day 5: Inscribing and Circumscribing Getting Closer to π: Inscribing and Circumscribing Polygons - Archimedes Method. Goals:

Coders' Corner. Paper ABSTRACT GLOBAL STATEMENTS INTRODUCTION

A SAS Macro to Generate Caterpillar Plots. Guochen Song, i3 Statprobe, Cary, NC

HOUR 12. Adding a Chart

Creating Charts in Office 2007 Table of Contents

Key Terms. Symbology. Categorical attributes. Style. Layer file

CHAPTER 1 Introduction to SAS/GRAPH Software

YEAR 10- Mathematics Term 1 plan

1. The Square limit language

MCAS/DCCAS Mathematics Correlation Chart Grade 6

The Plot Thickens from PLOT to GPLOT

Getting Started with MapInfo Professional Hands On Session 1D

Ohio Tutorials are designed specifically for the Ohio Learning Standards to prepare students for the Ohio State Tests and end-ofcourse

... WHERE. AnnotaI8 Data.S... XSYS & YSYS. Harie Annotate: How Not to Lose Your Head When Enhancing BAS/GRAPH output

Eighth Grade Math Assessment Framework Standard 6A Representations and Ordering

New SAS/GRAPH Features. Jack Bulkley, SAS Institute Inc. GPLOT INTRODUCTION

Developing a Dashboard to Aid in Effective Project Management

ADOBE ILLUSTRATOR CS3

TIMSS 2011 Fourth Grade Mathematics Item Descriptions developed during the TIMSS 2011 Benchmarking

Using Big Data to Visualize People Movement Using SAS Basics

Chapter 2: Descriptive Statistics (Part 1)

Tips for Producing Customized Graphs with SAS/GRAPH Software. Perry Watts, Fox Chase Cancer Center, Philadelphia, PA

Mathematics Year 9-11 Skills and Knowledge Checklist. Name: Class: Set : Premier Date Year 9 MEG :

Posters 417. NESUG '92 Proceedings. usinq Annotate Data sets to Enhance Contour Graphics output. Shi Tao Yeh, Environmental Resources Kanaqement, ~nc.

Econ 2148, spring 2019 Data visualization

GEOMETRY CCR MATH STANDARDS

Beal High School. Mathematics Department. Scheme of Work for years 7 and 8

the undefined notions of point, line, distance along a line, and distance around a circular arc.

Macro Method to use Google Maps and SAS to Geocode a Location by Name or Address

Year 8 Set 2 : Unit 1 : Number 1

Getting Started. What is SAS/SPECTRAVIEW Software? CHAPTER 1

8 TH GRADE MATHEMATICS CHECKLIST Goals 6 10 Illinois Learning Standards A-D Assessment Frameworks Calculators Allowed on ISAT

round decimals to the nearest decimal place and order negative numbers in context

Producing Publication Quality Business Graphs with SASIGRAPH Software

Mathematics - LV 6 Correlation of the ALEKS course Mathematics MS/LV 6 to the Massachusetts Curriculum Framework Learning Standards for Grade 5-6

The GEOCODE Procedure and SAS Visual Analytics

Spreadsheet Warm Up for SSAC Geology of National Parks Modules, 2: Elementary Spreadsheet Manipulations and Graphing Tasks

Year 11 Key Performance Indicators Maths (Number)

While You Were Sleeping - Scheduling SAS Jobs to Run Automatically Faron Kincheloe, Baylor University, Waco, TX

Plot Your Custom Regions on SAS Visual Analytics Geo Maps

Transcription:

Paper 10404-2016 Put Your Data on the Map: Using the GEOCODE and GMAP Procedures to Create Bubble Maps in SAS ABSTRACT Caroline Walker, Warren Rogers Associates A bubble map is a useful tool for identifying trends and visualizing the geographic proximity and intensity of events. This paper describes how to use readily available map data sets in SAS along with PROC GEOCODE and PROC GMAP to turn a data set of addresses and events into a bubble map of the United States with scaled bubbles depicting the location and intensity of events. INTRODUCTION A bubble map, also called a cartogram, is a specialized bubble chart useful for displaying threedimensional data in which the first two dimensions of the data represent the geographic location of an event, and the third dimension is any other quantitative variable of interest. The example illustrated in this paper concerns gasoline theft data. In this context, a geographic bubble map of the theft events is useful for highlighting theft hotspots and providing insights into emerging theft trends. When implemented on a large data set, say all gasoline thefts during a time period of interest, an example bubble map might look something like the example in Figure 1. This paper will describe how to create a similar bubble map using a smaller sample data set. Figure 1. A bubble map of fuel theft locations, created using PROC GMAP. 1

PART 1: CREATING THE BACKGROUND MAP THE BASIC MAP The first step in creating the bubble map is to generate a background map, this is the map of the United States onto which the theft bubbles will later be plotted. The MAPS library in SAS provides the data sets (in map data set format) needed to create maps for a variety of geographic regions. You can use the data set MAPS.STATES to create a map of the contiguous 48 U.S. states. A couple of important things to know about this data set before beginning: The variables X and Y in the data set represent longitude and latitude coordinates, respectively, for the points in the map. The coordinates are given in radians, not degrees, and they represent unprojected spherical coordinates. We will need to convert these into Cartesian coordinates in a later step, so that they can be plotted in two-dimensions with minimal distortions. Additionally, the longitude coordinates given by X increase from east to west. Each latitude and longitude coordinate in the data set has a corresponding DENSITY value between zero and four. You can control the level of detail in the map you create by filtering out points with higher density values. The variable STATE gives the two-digit numeric code identifying the state associated with each set of latitude and longitude coordinates. These codes are the Federal Information Processing Standard (FIPS) state codes, and can be converted to more familiar two letter state abbreviations using the SAS function FIPSTATE(). The variable SEGMENT groups coordinates that should be plotted together as contiguous polygons when the map is being drawn. If you wish to create a map showing only the 48 contiguous U.S. states, with the same level of detail as in Figure 1, then filter out all geographical coordinates associated with Alaska, Hawaii, and Puerto Rico, as well as any points with density values greater than 2 when reading in the MAPS.STATES data set: data MyMap; set maps.states ; if fipstate(state) not in ('AK' 'HI' 'PR'); if density <= 2; To take a look at the plot of the data set you ve just created, you can use the GMAP procedure, but you ll need to make one adjustment first. Add in a dummy response variable (for example, DUMMY = 1) so PROC GMAP has something to plot in the choropleth map it will construct. Setting all values of DUMMY to the same value means all states will be filled with the same color in the resulting map. Then use PROC GMAP to plot the map: data MyMap; set MyMap; dummy = 1; proc gmap data = MyMap map = MyMap; choro dummy / statistic = first nolegend; Unfortunately, the resulting map, shown in Figure 2, leaves something to be desired. It is flipped from east-to-west and some geographical areas appear distorted, a result of plotting the longitude/latitude coordinates in a two-dimensional plane. 2

Figure 2. A plot of data from maps.states, using unprojected coordinates. Figure 3. A plot of data from maps.states, after projecting the coordinates with PROC GPROJECT. The reverse orientation in Figure 2 occurred because the longitude coordinates in the MYMAP data set are increasing from east to west. You can correct this by using the GPROJECT procedure with the WESTLONG option. The GPROJECT procedure will also correct the distortion of geographical areas by applying an Albers projection to the X and Y coordinates. This is the default projection applied unless a different option is specified, for clarity it is shown explicitly in the code below: proc gproject data = mymap out = mymap2 westlong project = albers; id; proc gmap data = mymap2 map = mymap2; choro dummy / statistic = first nolegend; Running these statements generates the map shown in Figure 3. GETTING FANCY: COLORS AND SHADING The background map is looking good, but why not make it a little better? For starters, you can change the fill color used for the states. This is done by including a PATTERN statement prior to the GMAP procedure: pattern1 color = gray; proc gmap data = mymap2 map = mymap2; choro dummy / statistic = first nolegend; There are over 16 million colors available in SAS, if you re short on ideas, check out the SAS Knowledge Base paper TS-688 Defining Colors using Hex Values (web address given in references) for inspiration. Next you might consider changing the color of the state outlines. You can do this by adding a COUTLINE argument when calling the GMAP procedure: proc gmap data = mymap2 map = mymap2; choro dummy / statistic = first nolegend coutline = graycc; Finally, you can add a second, shifted, outline around the outside perimeter of the map to create a slight 3D effect. This is a technique described by Mike Zdeb and Robert Allison in their paper Stretching the Bounds of SAS/Graph Software from SUGI-30. It takes a little more code to do this. You ll need to create an annotation data set with instructions for outlining the boundary lines of the map in a different color (gray, in this example). First, take the data set you ve been using for the background map, MYMAP2, and shift the X and Y coordinates of the points just slightly: 3

data mapshift; set mymap2; x = x + 0.001; y = y - 0.001; Next add information about how PROC GMAP should use these shifted coordinates. For each set of coordinates, specify XSYS = 2 and YSYS = 2 indicating that the X and Y variables in this annotation data set should be treated as part of the same coordinate system (same units) as the data values in the MYMAP2 data set. Also specify WHEN = b, this will cause these shifted perimeter values to be drawn first, before the actual map. Many of these shifted points will fall underneath actual map data points, and so won t be seen after the final map is drawn, creating a shadow effect. Finally assign FUNCTION = poly to the first segment of each state, to specify the beginning point of a polygon. For all other segments in the state assign FUNCTION = polycont, instructing the annotation to continue drawing the polygon that was begun in the first segment for that state. data perim; set mapshift; length function $10; by state segment notsorted; xsys='2'; ysys='2'; when='b'; style='solid'; color='gray33'; if first.segment then function='poly'; else function='polycont'; You can now use this data set as an annotation data set in the CHORO statement of PROC GMAP: pattern1 color = gray; proc gmap data = mymap2 map = mymap2; choro dummy / statistic = first nolegend coutline = graycc anno = perim; This produces the map shown in Figure 4. Figure 4. Background map after formatting. 4

PART 2: ANNOTATE THE MAP WITH BUBBLES Now that the background map is ready, you can add bubbles showing where the thefts occurred. This is done by creating another annotation data set giving the theft locations, to be plotted over the background map. Most data sets typically do not arrive on your desk with latitude and longitude coordinates included. More likely, you might begin with something like the example THEFTS data set below, listing the zip code in which each theft occurred, and the amount of that theft, in gallons. ; data thefts; input zip amount; datalines; 89109 278 80110 861 67212 450 34745 85 75202 1471 The GEOCODE procedure can be used to obtain the longitude and latitude coordinates associated with each of these zip codes: proc geocode data = thefts out = thefts2; The output data set, THEFTS2, will contain variables X and Y, giving the respective longitude and latitude coordinates associated with the geographical center of each zip code. Note however, that the longitude coordinates in this data set increase from west to east, you will need to switch them to match the increasing east to west convention used in the map data set MYMAP2. This is done by multiplying each X value by -1. Also note that the X and Y coordinates in THEFTS2 are given in degrees. You will need to convert these to radians for compatibility with the MYMAP2 data set. Recalling that 360 degrees = 2π radians, you can make these conversions using the SAS code: data thefts3; set thefts2; x=constant('pi')/180 * (-1)*x; y=constant('pi')/180 * y; Before you can add these theft locations to the map, you need to make sure they are projected with the same projection that was applied to the original MYMAP data set in Part 1. To do this combine both data sets and then apply the projection. When combining, add an indicator variable distinguishing the observations that came from the THEFTS3 data set, so that they can easily be separated back out again: data both; set mymap thefts3 (in = A); theftdata = A; proc gproject data = both out = bothproj westlong project = albers; id; data theftsproj mapproj; set bothproj; if theftdata = 1 then output theftsproj; else output mapproj; You now have two data sets: MAPPROJ which contains the same background map you created in Part 1, and THEFTPROJ which contains theft information in location coordinates that are compatible with the 5

background map. Now you need to add some annotation directions to the THEFTPROJ data set, indicating what should be plotted on the map at each of those theft locations. To start, specify XSYS = 2 and YSYS = 2 for each observation, just as was done in the annotation data set in Part 1. Also specify HSYS = 1 indicating that any size coordinates associated with the annotations for these observations should be interpreted as an overall percent of the area spanned by the data. You would like the bubbles created at each of these locations to appear on top of the background map, so specify WHEN = a indicating that these annotations should be applied after the background map is drawn. To draw circles at each location specify FUNCTION = pie, ROTATE = 360, and STYLE = psolid. To start with, you can make uniform red circles at each location, with a radius equal to 1% of the total area spanned by the data (SIZE = 1, COLOR = red ): data theftsproj1; set theftsproj; xsys='2'; ysys='2'; hsys='1'; when = 'A'; function = 'pie'; rotate = 360; size = 1; style='psolid'; color = 'red'; To take a look at the bubbles you just made, add THEFTSPROJ1 as an annotation data set in the PROC GMAP statement: pattern1 color =gray; proc gmap data = mapproj map = mapproj anno = theftsproj1; choro dummy / statistic = first nolegend coutline = graycc anno = perim; This creates the map shown in Figure 5. Figure 5. Map with annotated dots indicating theft locations. This bubble map will be more useful if the size and color of each of the bubbles relates to the volume of the associated theft. Modify the annotation data set by adding IF statements specifying which colors to use for thefts of varying magnitudes. Also make the SIZE variable a calculated value based on the theft AMOUNT. When doing this, it is good practice to make SIZE proportional to the square root of AMOUNT, so that the area of the plotted bubble is proportional to the actual theft amount: 6

data theftsproj2; set theftsproj; format color $10.; xsys='2'; ysys='2'; hsys='1'; when = 'A'; function = 'pie'; rotate = 360; style='psolid'; size = 0.115*sqrt(amount); if amount le 500 then color = 'cxe6e600'; else if amount le 1000 then color = 'cxf08229'; else if amount le 2000 then color = 'cxfa0000'; else color = 'cxbe0000'; Finally, you can make the map a little more polished by adding solid outlines around each of the bubbles. This is done by replicating each observation in the annotation data set, then modifying those replicate observation so that the second circle drawn on top of the original circle uses a darker color (COLOR = gray50 ) and is drawn only in outline, with no fill (STYLE = pempty ): data theftsproj3; set theftsproj; format color $10.; xsys='2'; ysys='2'; hsys='1'; when = 'A'; function = 'pie'; rotate = 360; style='psolid'; size = 0.115*sqrt(amount); if amount le 500 then color = 'cxe6e600'; else if amount le 1000 then color = 'cxf08229'; else if amount le 2000 then color = 'cxfa0000'; else color = 'cxbe0000'; style = 'pempty'; color = 'gray50'; pattern1 color =gray; proc gmap data = mapproj map = mapproj anno = theftsproj3; choro dummy / statistic = first nolegend coutline = graycc anno = perim; This gives the map in Figure 6. Figure 6. Map with bubbles scaled in proportion to theft volume. 7

PART 3: ANNOTATING THE LEGEND No good statistical graphic is complete without a legend. To add a legend to this bubble plot, create an annotation data set with the instructions for drawing the legend. Specify XSYS = 3 and YSYS = 3 indicating that the X and Y coordinates for these observations correspond to percentages of the area spanned by the data (so an observation with coordinates x = 50, y = 50 would be plotted in the center of the graph). Assign HSYS = 1 just as when the bubbles were plotted, so that the radii of the legend bubbles will be plotted using the same scale as the bubbles on the map. Then add one observation to the data set for each bubble you d like to display in the legend. It will likely take some experimenting to get the X and Y coordinates exactly where you d like. data legend; format color function $10. text $50.; xsys='3'; ysys='3'; hsys='1'; when='a'; function='pie'; rotate=360; ; style='psolid'; y =14; x=7; color='cxbe0000'; size = 0.115*sqrt(2000); style='psolid'; y = 13; x=16; color= 'cxfa0000'; size = 0.115*sqrt(1000);style='psolid'; y = 12; x=24; color='cxf08229'; size = 0.115*sqrt(500); style='psolid'; y = 11; x=31; color='cxe6e600'; size = 0.115*sqrt(400); style='psolid'; Next add a second set of observations to the LEGEND data set, describing the text you d like to display beneath each bubble. For each of these observations assign FUNCTION = label. To center the text specify POSITION = 5. Set the variables SIZE, COLOR, and STYLE as desired. Again it may take some experimenting to determine which values of X and Y appropriately position the text beneath each bubble. data legend; format color function $10. text $50.; xsys='3'; ysys='3'; hsys='1'; when='a'; function='pie'; rotate=360; ; style='psolid'; y =14; x=7; color='cxbe0000'; size = 0.115*sqrt(2000); style='psolid'; y = 13; x=16; color= 'cxfa0000'; size = 0.115*sqrt(1000);style='psolid'; y = 12; x=24; color='cxf08229'; size = 0.115*sqrt(500); style='psolid'; y = 11; x=31; color='cxe6e600'; size = 0.115*sqrt(400); style='psolid'; function='label'; position='5'; size=2; color = "black"; style='times amt/bold'; y=8; x=7; text="2,000+"; x=16; text="1,000+"; x=24; text="500+"; x=31; text="under 500"; y = 5.5; x=7; text="gallons"; x=16; text="gallons"; x=24; text="gallons"; x=31; text="gallons"; To display the legend on the map, combine it with the existing annotation data set THEFTSPROJ3 created in Part 2, and use the new combined data set as the annotation data set in the GMAP procedure: 8

data thefts_and_legend; set legend theftsproj3; pattern1 color =gray; proc gmap data = mapproj map = mapproj anno = thefts_and_legend; choro dummy / statistic = first nolegend coutline = graycc anno = perim; This generates the map shown in Figure 7. Figure 7. Bubble map with annotated legend. CONCLUSION The steps outlined above can be used to create bubble maps for data from a variety of contexts. Begin by using PROC GMAP to create a background map for the geographical region of interest. Next, use PROC GEOCODE and PROC GPROJECT to convert the locations of the bubble events into coordinates compatible with the background map. Then add instructions for drawing the bubbles as annotations. Finally, add in the instructions needed to draw the legend, and plot that as an annotation data set when drawing the map. 9

REFERENCES AND RECOMMENDED READING Allison, R. 2012. SAS/GRAPH : BEYOND THE BASICS. Cary, NC: SAS Institute Inc. SAS Knowledge Base / Papers. TS-688 Defining Colors using Hex Values. Available at: http://support.sas.com/techsup/technote/ts688/ts688.html SAS Institute Inc 2012. SAS/GRAPH 9.3: Reference, Third Edition. Cary, NC: SAS Institute Inc. SAS Institute Inc 2015. The SAS Training Post Blog. Cary, NC: SAS Institute Inc. Available at: http://blogs.sas.com/content/sastraining/ Zdeb, M. and R. Allison. Stretching the Bounds of SAS/Graph Software. Proceedings of the 30 th Annual SAS Users Group International Conference, Philadelphia, PA. April 2005. Available at: http://www2.sas.com/proceedings/sugi30/137-30.pdf CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Caroline Walker Warren Rogers Associates 401-846-4747 x180 cwalker@warrenrogers.com www.warrenrogers.com SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. 10