Introduction to district compactness using QGIS
Mira Bernstein, Metric Geometry and Gerrymandering Group
Designed for MIT Day of Engagement, April 18, 2017

1) First things first: before the session

Download QGIS (an open-source Geographic Information System): http://www.qgis.org/en/site/forusers/download.html

Install QGIS. It will ask if you want to install the NC, SD, and AK sample data files; say no.

Download some census data (.zip files):
https://www.census.gov/geo/maps-data/data/cbf/cbf_cds.html (1st and 3rd files only)
https://www.census.gov/geo/maps-data/data/cbf/cbf_ua.html

Extract the three ZIP files into three separate folders and remember where you put them, so that you can find them later.

2) Launch QGIS and load the data

The most common way to store spatial data in any GIS (including QGIS) is as vector data, with points, lines, and shapes defined in terms of (x,y) coordinates. QGIS lets you overlay several layers on top of each other and decide how you want to display the data from each layer.

Open QGIS Desktop (not QGIS Browser).

Click on the Add vector layer icon at the top of the left-hand toolbar. Browse to one of the folders you got from the ZIP files. You'll see a bunch of files with the same name.[1] Select the one of type SHP file, with extension .shp, then click Open. The name of the file will appear in the Layers panel at the bottom left of your QGIS window.

Repeat with the other two folders, so that you now have three layers in your QGIS project.

Save your work by choosing Project > Save As from the top menu. Keep saving regularly as you work through this tutorial. (QGIS is great, but it does crash once in a while.)

3) Explore what you've got

Move: You can move the map around by clicking and dragging in the usual way. You can center it by clicking on the point you want to place at the center.

Zoom:[2] You can zoom by pressing the zoom icons in the toolbar at the top.
If you ever lose your place and find yourself staring at empty space, click on the toolbar button at the top that zooms out to the full extent. This will center the map and zoom out to the full layer. To go back to moving the map around, click on the Pan Map button.

[1] There will be another file in the folder with .shp in its extension, but of type XML file. If QGIS is giving you error messages, be sure you are opening the right file. SHP stands for "shapefile"; this is a standard format that works with all GIS software, not just QGIS.

[2] I suggest not touching the scale or magnifier on the toolbar at the bottom of the map.
Reorder and deactivate layers: Two of the layers you have loaded (cb_2016_us_cd115_***) are maps of US congressional districts[3] at different resolutions (1:20,000,000 and 1:500,000). To a first approximation, these look very similar and cover each other up completely. The third layer (cb_2016_us_ua10_500k) has maps of US urban areas. Depending on the order in which you loaded the data, this layer may be at the top and visible as a bunch of splotches on the map, or it may be below the other layers and invisible. If it's invisible, drag it to the top of the list in the Layers panel at the bottom left. You know you've done it right if you can see the splotches. If you ever want to make a layer invisible, just uncheck the box next to it. Try it now.

Identify specific cities (optional): This layer shows US urban areas as delineated for the 2010 Census. (The "500k" in the file name refers to the 1:500,000 resolution, just as for the district layer, not to population.) If you're curious which city a particular splotch corresponds to, click on the Identify Features icon, then click on an urban area of your choice, and you'll see its name (and other data) displayed. (You can also ask QGIS to label all or some of the cities; we'll see later how to do this.)

Why does the map look weird? (optional): Every map is a projection from a sphere (the Earth) to a plane. There are many possible projections, each with its pros and cons. Every shapefile comes equipped with a default projection (aka CRS, "coordinate reference system"), and the US Census Bureau, for whatever reason, doesn't use the standard Mercator projection that we're all used to. If you want a more familiar-looking map:

Click on the current CRS button in the bottom right corner.
Be sure the box for Enable "on the fly" CRS transformation is checked.
Under Filter, enter Mercator to find all CRSs with the word Mercator in their name.
Scroll to the top of the list below and click on Popular Visualisation CRS / Mercator.

The map might look better to you, but fundamentally nothing has changed.
QGIS still knows the actual latitude and longitude coordinates of every point and does all its calculations (area, perimeter, etc.) on the sphere. Note that as you move around on the map, the scale on the bottom toolbar changes. That's because the Mercator projection is not area-preserving. (For example, Alaska, being far north, looks much larger than it should.)

4) Let's make the map display what we want, the way we want it!

Everything we do in this section is cosmetic, but it's still very important: the whole point of GIS is to allow you to access data visually.

The cb_2016_us_ua10_500k layer: We want this layer at the top, so that it's visible, but we don't want the cities to cover up the district boundaries. The solution is to make the layer partly transparent. Double-click on the layer name in the Layers panel to open the Layer Properties dialog box.

[3] "cd115" in the file name stands for "congressional districts for the 115th Congress".
On the left sidebar of the dialog box, go to Style. This is where you go whenever you want to make a layer look different: change colors, outlines, etc. At the bottom of the box, you'll see the header Layer rendering and a slider for layer transparency. Slide it about halfway to the right, to about 40 or 50. (If you slide it all the way to 100, the layer will be completely invisible.)

While we're here, let's also change the color of the urban splotches to be minimally distracting. In the fill box, double-click on Simple fill. Some new fields will appear below: Fill, Outline, Fill style, and Outline style. Change Fill to black. (Because of the transparency, it will look gray.) Change Outline style to No pen. (We don't need an outline for the urban areas; the outlines we care about are the boundaries of the congressional districts.) Click OK. Hopefully the urban splotches are still visible, but you can also see the district boundaries that go through them. Note that urban areas often lie in small congressional districts, just as you would expect: districts have roughly equal populations, so densely populated areas get geographically small districts.

The cb_2016_us_cd115_500k layer: All we want from this layer are the district boundaries, so that we can compare them to the boundaries in the ..._cd115_20m layer. Move this layer to be second in the list in the Layers panel, below the urban area layer but above ..._cd115_20m. As before, open the Layer Properties dialog box (by double-clicking on the layer in the Layers panel) and go to Style. Double-click on Simple fill. This time change Outline to red and change Fill style to No brush (i.e., transparent). Click OK. Zoom to a twisty district boundary or coastline and compare the red outline (from this layer) to the black outline (from the 20m layer). Since this layer is at a higher resolution, it has a lot more detail.

The cb_2016_us_cd115_20m layer: At this point our map is looking pretty good. But wouldn't it be nice if we could color the states different colors, as we usually do in maps?
As usual, double-click on the layer and go to Style. At the very top of the dialog box, switch from Single symbol to Categorized. Under Column, choose STATEFP. (That's the field in the data table that tells you which state each district is in.) Under Color ramp, select Random colors. Below the next window, click Classify, and you'll see a lot of colors appear. Click OK. What a nice-looking map! Don't forget to save it.

5) Computing the Polsby-Popper compactness score

To make the map less cluttered for this step, I suggest unchecking the urban area layer to make it invisible. Select the cd115_500k layer by clicking (not double-clicking) on it in the Layers panel. Click the Open Attribute Table button on the toolbar at the top of the main QGIS window.
The attribute table has all the non-spatial data associated with each congressional district: its state, its number, its area, etc. We will now add one more attribute: the district's Polsby-Popper compactness score.

Click on the Edit button at the top left of the attribute table. Then click on the Open field calculator button to create your new field (= column).

In the dialog box that opens, set Output field name to PP_500k, and leave Output field type as integer. (The formula below multiplies the usual Polsby-Popper score 4πA/P², which runs from 0 to 1, by 100, so the result is a real number from 0 to 100; we might as well round it to the nearest integer.) In the box at the bottom right, type the PP formula:

400*pi()*$area/($perimeter^2)

Note that QGIS computes the area ($area) and perimeter ($perimeter) of each district for you![4]

Click OK. Your new column should now appear as the last column in the table. If you did something wrong and want to delete a column, click on the Delete field button. When you're done editing the attribute table, click on the Edit button again and save your changes. (If you forget to do this before closing the table, your map will suddenly become covered with red X's! If that happens, just reopen the attribute table and save it.)

Let's display the PP_500k score for each district: Double-click on the cd115_500k layer, then click on Labels in the dialog box menu. At the top, change the drop-down from No labels to Show labels for this layer. For Label with, choose PP_500k. Under Placement in the bottom menu, check Force point inside polygon. Click OK.

Do the compactness scores match your intuition? Higher scores should correspond to nicer (more compact) districts. If you see a district with a lower score than you expect, zoom in on its boundary: there may be something driving up the perimeter (like a winding river?) that you can only see if you look closely.
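To see what the field-calculator expression computes, here is a minimal plain-Python sketch (not PyQGIS) of the same quantity, 400π·A/P², for a single polygon given as a list of planar (x, y) vertices. The helper names (shoelace_area, perimeter, polsby_popper_100) and the toy shapes are hypothetical illustrations. QGIS's own $area and $perimeter also handle multipart districts, holes, and the layer's CRS, and this sketch ignores all of those.

```python
import math


def shoelace_area(ring):
    """Area of a simple polygon given as a list of (x, y) vertices (shoelace formula)."""
    n = len(ring)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around so the ring is closed
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def perimeter(ring):
    """Total length of the closed boundary, including the closing edge."""
    n = len(ring)
    return sum(math.dist(ring[i], ring[(i + 1) % n]) for i in range(n))


def polsby_popper_100(ring):
    """Polsby-Popper on a 0-100 scale, the same quantity as 400*pi()*$area/($perimeter^2)."""
    return 400 * math.pi * shoelace_area(ring) / perimeter(ring) ** 2


# Hypothetical toy "districts" in arbitrary planar units.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
sliver = [(0, 0), (10, 0), (10, 1), (0, 1)]  # long, thin 10 x 1 rectangle
scaled_square = [(1000 * x, 1000 * y) for x, y in square]  # same shape, different units
circle_like = [(math.cos(2 * math.pi * k / 360), math.sin(2 * math.pi * k / 360))
               for k in range(360)]  # regular 360-gon, very close to a circle

# round() stands in for storing the score in an integer field.
print(round(polsby_popper_100(square)))         # 79
print(round(polsby_popper_100(sliver)))         # 26
print(round(polsby_popper_100(scaled_square)))  # 79
print(round(polsby_popper_100(circle_like)))    # 100
```

A near-circle scores close to 100 and the thin sliver scores low. Because A/P² has no units, the scaled square scores exactly the same as the original. That is why it does not matter if QGIS measures area and perimeter in degrees rather than meters.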
6) Comparing the Polsby-Popper score at different resolutions

One problem with compactness measures that rely on perimeter is that many natural boundaries, like coastlines and rivers, have fractal properties: their length can vary widely depending on the scale at which you measure it. We'll see this by comparing the Polsby-Popper scores for congressional districts at two different resolutions.

Do the same thing as in Section 5 to create the Polsby-Popper score for the cd115_20m layer; call your new field PP_20m.

To compare the two scores, we need to have them in the same table. We transfer PP_500k to the attribute table of the ..._cd115_20m layer using the Join command.[5]

Save your file!!! The join command can be buggy.

[4] Most likely, the area computed by QGIS will be completely different from what's given in the ALAND field. For most CRSs, QGIS will be computing the area in units of latitude and longitude (i.e., spherical angles), not square meters. But that's OK, because the perimeter will also be in those units, and PP itself is dimensionless.

[5] Don't try to go the other way, transferring PP_20m to the attribute table of cd115_500k; for me, this caused QGIS to stall and crash. The 500k table has more entries than the 20m table, and the extra entries are all for US territories. A silly bug!
Double-click on the cd115_20m layer, and choose Join from the menu on the left.

Click on the green + at the bottom left to create a new join.
The join layer (from which you want to import an attribute) is ..._cd115_500k.
The join field and target field are both GEOID. (This is the unique identifier for each district; it tells QGIS which rows to match up in the two tables.)
Check the box next to Choose which fields are joined and check the field PP_500k.
Check the box next to Custom field name prefix and erase all the text. (Since we gave the PP fields in the two layers different names, we don't need a prefix.)
Click OK in the two open dialog boxes.

Open the attribute table for cd115_20m and check that both PP_20m and PP_500k are there. Which one is bigger? Why?

Using the Open field calculator button as before, create a new field in the attribute table of cd115_20m to hold the ratio PP_20m / PP_500k. This time, make the output type Decimal number (real) and change precision to 2 (i.e., two digits after the decimal point). Call your field something like PP_RATIO.

Let's display the PP_RATIO for those districts where it exceeds some threshold, say 1.5. First, remove the labels for the cd115_500k layer: double-click on the layer, go to Labels, and change the top drop-down to No labels. Now double-click on the cd115_20m layer, go to Labels, and from the drop-down at the top choose Rule-based labeling. Click on the green + at the bottom to create a labeling rule. In the dialog that appears:

For Filter, enter "PP_RATIO" > 1.5 (the field name must be in double quotes).
For Label with, choose PP_RATIO.
Under Placement, check the Force point inside polygon box.

Click OK in both open dialogs.

Take a look at the districts with high PP_RATIO. The very highest values are on the coasts, where the 20m map misses all sorts of features. (Check out, for instance, the coast of SC, where the 20m map includes a huge amount of water!)
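The join, ratio, and labeling-rule steps can be mimicked in plain Python on toy data. This makes it easier to see why one district can get two quite different scores. Everything here is hypothetical. The GEOIDs (XX01, XX02, XX98) are made up, and the "districts" are hand-built shapes: one coastal district drawn with a zigzag coastline ("500k") and with that coastline smoothed away ("20m"), plus one inland rectangle. The join is an ordinary left join on GEOID, assumed to behave like QGIS's Join for this purpose.

```python
import math


def shoelace_area(ring):
    n = len(ring)
    s = sum(ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
            for i in range(n))
    return abs(s) / 2.0


def perimeter(ring):
    n = len(ring)
    return sum(math.dist(ring[i], ring[(i + 1) % n]) for i in range(n))


def pp_score(ring):
    """Polsby-Popper on a 0-100 scale: 400*pi*area/perimeter^2."""
    return 400 * math.pi * shoelace_area(ring) / perimeter(ring) ** 2


# One hypothetical coastal district drawn at two resolutions.
# Detailed version: a 10 x 10 square whose top edge is a zigzag "coastline".
coastline = [(10 - 0.5 * k, 10 if k % 2 == 0 else 9) for k in range(21)]
coastal_detailed = [(0, 0), (10, 0)] + coastline
# Coarse version: the zigzag is smoothed away into a straight edge.
coastal_coarse = [(0, 0), (10, 0), (10, 10), (0, 10)]
# A hypothetical inland district whose boundary is straight at both resolutions.
inland = [(0, 0), (20, 0), (20, 10), (0, 10)]

# Toy attribute tables keyed by made-up GEOIDs, with integer PP fields as in Section 5.
table_500k = {
    "XX01": {"PP_500k": round(pp_score(coastal_detailed))},
    "XX02": {"PP_500k": round(pp_score(inland))},
    "XX98": {"PP_500k": round(pp_score(inland))},  # extra row with no 20m match
}
table_20m = {
    "XX01": {"PP_20m": round(pp_score(coastal_coarse))},
    "XX02": {"PP_20m": round(pp_score(inland))},
}


def join_on_geoid(target, join_layer, fields):
    """Copy `fields` from join_layer into every target row with the same GEOID.
    Target rows with no match get None; unmatched join-layer rows are dropped."""
    joined = {}
    for geoid, row in target.items():
        match = join_layer.get(geoid, {})
        joined[geoid] = {**row, **{f: match.get(f) for f in fields}}
    return joined


joined = join_on_geoid(table_20m, table_500k, ["PP_500k"])

# PP_RATIO = PP_20m / PP_500k, kept to two decimal places.
for row in joined.values():
    row["PP_RATIO"] = round(row["PP_20m"] / row["PP_500k"], 2) if row["PP_500k"] else None

# Equivalent of the labeling rule "PP_RATIO" > 1.5.
flagged = {g: r["PP_RATIO"] for g, r in joined.items()
           if r["PP_RATIO"] is not None and r["PP_RATIO"] > 1.5}
print(joined["XX01"])  # {'PP_20m': 79, 'PP_500k': 44, 'PP_RATIO': 1.8}
print(flagged)         # {'XX01': 1.8}
```

In this toy case the coarse version scores much higher. Smoothing the zigzag shortens the perimeter from about 52.4 to 40 units, but changes the area only from 95 to 100.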
Beyond the coasts, there are also lots of inland districts whose scores are thrown off by a meandering river on the state boundary. The point here is not that 500k is a high enough resolution while 20m is not. The point is that if we went down to 1:50,000 or 1:10,000, we might get yet a different score for some of these districts. We don't want changes of resolution to make such a big difference in the compactness score.

7) More things to try

Instead of (or in addition to) displaying PP_RATIO or PP_500k as numerical labels, we can color the districts in a gradient based on the values of those fields. Double-click on a layer, go to Style, and change the top drop-down to Graduated. See if you can make a map where problematic districts (high PP_RATIO or low PP_500k) are darker while unproblematic districts are white.

Try creating a new field for the convex hull measure of compactness (use Decimal number (real) as the output type, since this ratio lies between 0 and 1):

area($geometry)/area(convex_hull($geometry))

(Here $geometry just stands for "the current district".) Compare this measure to PP. Does it flag the same bad districts? Is the measure as sensitive as PP to the resolution of the map?
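For experimenting with the convex hull measure outside QGIS, here is a plain-Python sketch for a single planar polygon. It assumes Andrew's monotone-chain algorithm for the hull. QGIS's convex_hull() may compute the hull differently internally, and it also handles multipart districts. The shapes are hypothetical: the same zigzag "coastline" district at two resolutions, plus a U-shaped district with a notch.

```python
import math


def shoelace_area(ring):
    n = len(ring)
    s = sum(ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
            for i in range(n))
    return abs(s) / 2.0


def perimeter(ring):
    n = len(ring)
    return sum(math.dist(ring[i], ring[(i + 1) % n]) for i in range(n))


def pp_score(ring):
    """Polsby-Popper on a 0-100 scale: 400*pi*area/perimeter^2."""
    return 400 * math.pi * shoelace_area(ring) / perimeter(ring) ** 2


def convex_hull(points):
    """Convex hull via Andrew's monotone chain; returns vertices counterclockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns left (counterclockwise).
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop endpoints shared by the two chains


def hull_ratio(ring):
    """area($geometry)/area(convex_hull($geometry)) for a single planar ring."""
    return shoelace_area(ring) / shoelace_area(convex_hull(ring))


coastline = [(10 - 0.5 * k, 10 if k % 2 == 0 else 9) for k in range(21)]
coastal_detailed = [(0, 0), (10, 0)] + coastline        # zigzag top edge
coastal_coarse = [(0, 0), (10, 0), (10, 10), (0, 10)]   # zigzag smoothed away
u_shape = [(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]

print(round(hull_ratio(coastal_detailed), 2))  # 0.95
print(round(hull_ratio(coastal_coarse), 2))    # 1.0
print(round(hull_ratio(u_shape), 2))           # 0.78
# How much each score changes between the two resolutions of the same district:
print(round(pp_score(coastal_coarse) / pp_score(coastal_detailed), 2))      # 1.8
print(round(hull_ratio(coastal_coarse) / hull_ratio(coastal_detailed), 2))  # 1.05
```

On this one toy shape, smoothing the coastline changes PP by a factor of about 1.8 but the hull ratio by only about 1.05. The U shape is still penalized for its notch. A single hand-built example says little about real districts, so the questions above are still worth checking on the actual map.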