Visualizing Crime in San Francisco During the 2014 World Series

John Semerdjian and Bill Chambers

Description

The scope and focus of our project evolved over the course of its lifetime. Our goals were to continue building experience with d3.js and to create a standalone, data-driven web application. We envisioned a tool that would allow for creating different kinds of comparisons using spatial data. This led us to explore the crime incident dataset published by the city of San Francisco. Initially, we liked the idea of a tool that would allow a user to draw a geographic area and quickly analyze crime trends against a normalized crime rate. The front end of our first prototype was built with d3.js and leaflet.js and consumed a RESTful API that we built with Flask. The UI allowed a user to create and analyze data for events of their choosing.
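To make the polygon query concrete, here is a minimal Python sketch of the kind of work the prototype API did: keep the incidents that fall inside a user-drawn polygon and return them as a GeoJSON FeatureCollection. This is an illustration under stated assumptions, not our actual code. The real API delegated spatial queries to PostGIS, whereas this sketch swaps in a plain ray-casting test, and the field names (lon, lat, category, timestamp) are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test. `polygon` is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray cast from the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def incidents_to_geojson(incidents, polygon):
    """Return a GeoJSON FeatureCollection of incidents inside `polygon`.

    Each incident is a dict with hypothetical keys:
    lon, lat, category, timestamp.
    """
    features = [
        {
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [inc["lon"], inc["lat"]]},
            "properties": {
                "category": inc["category"],
                "timestamp": inc["timestamp"],
            },
        }
        for inc in incidents
        if point_in_polygon(inc["lon"], inc["lat"], polygon)
    ]
    return {"type": "FeatureCollection", "features": features}
```

In a Flask view, the returned dictionary could be serialized to JSON and sent to the front end. With a real database, a spatial index would be far more efficient than scanning every incident.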

Figure 0: First version of our visualization/tool. Users draw a polygon on a map, define specific dates, and analyze data.

Given all the possible types of analyses one may conduct, we had difficulty deciding which specific interactions we wanted to support. We eventually pivoted our project to focus on a single set of events: all seven games of the 2014 World Series between the San Francisco Giants and Kansas City Royals. Shifting the focus of our project from an analysis tool to a story-driven visualization simplified our project tremendously. We felt that a single narrative was more conducive to a visualization, but we still wanted to preserve some aspects of our original idea of an analysis tool.

The data itself lacked context around the incidents and offered only a few fields that merited deeper analysis. For example, we knew the time and day an incident was reported, but we didn't know how many individuals were involved or the severity of the incident. We were still able to represent the same data in various ways based on visualization concepts from the course.

We started our visualization with references to news reports of crime during the World Series. We used a simple map along with markers indicating points of interest and a short narrative.

Figure 1: Story Introduction

We presented the area of our analysis and wanted to show a high-level comparison of the different types of crime that came up during the World Series. After experimenting with several graph types, we selected a spider graph, also known as a radar chart.
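For readers unfamiliar with this chart type, the following Python sketch shows how per-category counts for one game map to the vertices of its polygon. It is an illustration only: our chart was drawn with d3.js, and the category names, the clockwise layout starting at 12 o'clock, and the scaling by a shared maximum are all assumptions made for the example.

```python
import math


def radar_vertices(counts, categories, max_value, radius=1.0):
    """Return one (x, y) vertex per category for a single game's polygon.

    counts:     dict mapping category name -> incident count for the game
    categories: ordered list of axis names (one axis per crime type)
    max_value:  count that maps to the outer edge of the chart; using the
                same value for every game keeps the shapes comparable
    """
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    n = len(categories)
    vertices = []
    for i, category in enumerate(categories):
        # Axes are spaced evenly around the circle, starting at
        # 12 o'clock and proceeding clockwise.
        angle = 2 * math.pi * i / n
        r = radius * counts.get(category, 0) / max_value
        vertices.append((r * math.sin(angle), r * math.cos(angle)))
    return vertices
```

Connecting the vertices in order, and closing the path back to the first one, gives the shape for one game. Overlaying one shape per game produces the comparison described above.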

Figure 2: Radar chart of crime types for each World Series game.

The radar chart seemed the most conducive to comparative analysis of different games. By hovering over the legend or the chart area, a user could see the general shape of every game while still being able to select and focus on one specific game.

We then used a small multiples visualization with total crimes on the y-axis and hour of day on the x-axis. This allowed the user to compare points in time across every game simply by hovering over a point in time. Additionally, we provided a table for the user to reference characteristics of the games. Scanning the visualization horizontally shows how different Game 7 is from the other games. The significant rise in crime was something that many of our users noted. They suggested we include vertical lines in the graphs to highlight the start and end of each game. Unfortunately, we weren't able to include them in our final version.
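The data behind a small multiples view like this one can be sketched in a few lines of Python: one 24-element series of hourly counts per game, plus a shared y-axis maximum so that the panels can be compared directly. The timestamp format and the shared scale are assumptions made for illustration, and the function names are hypothetical rather than taken from our code.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps):
    """Return a 24-element list of incident counts indexed by hour of day.

    Timestamps are assumed to look like "2014-10-29 22:15".
    """
    counts = Counter(
        datetime.strptime(ts, "%Y-%m-%d %H:%M").hour for ts in timestamps
    )
    return [counts.get(hour, 0) for hour in range(24)]


def small_multiples_series(incidents_by_game):
    """Map each game label to its own hourly series (one panel per game)."""
    return {
        game: hourly_counts(timestamps)
        for game, timestamps in incidents_by_game.items()
    }


def shared_y_max(series):
    """Largest hourly count across all panels, used as a common y-axis limit."""
    return max((max(counts) for counts in series.values()), default=0)
```

Drawing every panel against the same y-axis limit is what lets a reader scan across panels and spot an outlier such as Game 7.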

Figure 3: Small multiples of total crimes per hour

Our visualization ends with our exploratory dashboard. We were able to preserve this aspect of our initial design. It brings out a lot of interactivity and opportunities for further exploration. After reading our short analysis, users quickly analyzed the data for themselves. Frequently, users started by filtering by crime type. Vandalism and assault were usually the first filters that people selected, given the references in the news reports we linked to. Combining crime type filters with specific games also yielded interesting insights for our users. Users can jump between specific regions of the map and combine multiple filters in order to test multiple hypotheses.

Figure 4: Interactive dashboard
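The way dashboard filters combine can be sketched in plain Python. This is a simplified stand-in and not the dc.js/crossfilter.js code behind the dashboard: it rescans the full list on every call instead of maintaining incremental indexes, and the record fields (category, game, hour) are hypothetical.

```python
from collections import Counter


def apply_filters(incidents, **selected):
    """Keep incidents that satisfy every active filter.

    Each keyword names a field and supplies the set of accepted values,
    e.g. category={"VANDALISM", "ASSAULT"}, game={7}. Values within one
    filter are alternatives (OR); different filters must all match (AND).
    An empty set means "no filter on this field".
    """
    def matches(incident):
        return all(
            incident[field] in values
            for field, values in selected.items()
            if values
        )

    return [incident for incident in incidents if matches(incident)]


def count_by(incidents, field):
    """Group the (already filtered) incidents by one field for charting."""
    return Counter(incident[field] for incident in incidents)
```

For example, selecting vandalism and assault together with Game 7 and then calling count_by on the hour field yields the data for an hourly bar chart of just those incidents.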

Related Work

San Francisco crime data has been analyzed and visualized by numerous individuals and groups. After some research, we discovered that most visualizations merely reduce the data to points on a map, choropleths, or aggregate statistics for the entire city. We provide a few examples of these visualizations below.

Trulia created a heatmap and provided aggregate crime statistics. This approach doesn't lend itself to user-driven analysis, as it's mostly just summary statistics.

Figure 5: Trulia crime heatmap

Crimespotting improves on this approach by providing the user additional tools for exploration. Users can filter the data by time, day/week, and type. However, it's not designed for comparing specific dates/times with each other, nor does it provide any aggregate-level data. Lastly, the user doesn't have the ability to define a specific geographic region of interest.

Figure 6: Crimespotting

Some experimental crime data visualizations express the crime rate through 3D representations of elevation. The shadows cast by the high crime rates can evoke an emotional response that a bar chart or heatmap can't. This, although informative, didn't allow for the exploration that we were looking for. Additionally, there is no interactivity for people to better understand the data.

Figure 7: 3D crime visualization

The 3D visualization also has some serious drawbacks. Despite the small multiples, it's still very difficult to compare the peaks by area and category given the view. It also lacks the ability to filter by time.

Data

San Francisco Crime Data supplied by the San Francisco Local Government.

Tools

We relied heavily on d3.js and dc.js, a high-level JavaScript visualization library that leverages the quick tabular filtering capabilities of crossfilter.js and d3.js. We used dc.js for our dashboard visualization. We used Leaflet and Leaflet.draw for mapping and for drawing polygons. The first version of our visualization used an API, built with Flask, PostgreSQL, and PostGIS, that returned GeoJSON.

Steps

The first step was to understand the data. We did a quick exploratory analysis and created some plots, but set aside all data accuracy issues for this assignment. We imported the data into our database and then developed the API. We created a front end that read data from the API after the user created a polygon on our map. We started designing the front end on paper, but after iterating through several versions, we scrapped our tool and began developing a story around the data.

There were a few stories we initially considered, such as Bay to Breakers, Outside Lands, and the World Series. News reports of violence after Game 7 of the 2014 World Series gave us something compelling to investigate, so we started drawing potential prototypes. We borrowed the code for our dashboard from our first visualization and added other visualizations. Over the course of this process, we solicited feedback from our classmates and refined the story and the interactions of our visualization.

Results & Feedback

We worked with some of our peers to make sure our message was getting across clearly. People seemed to understand our thought process and also offered several improvements. We struggled with the complexity associated with building a general-purpose tool, even if it was for a specific dataset. This iterative process occupied much of our time early on.

At times, we received mixed feedback on the types of features users would like to see in our tool. There was some consensus around adding a baseline comparison feature (e.g. compare Game 7 to the average number of crimes committed on a Wednesday in 2014). However, we had already shifted away from a World Series vs. San Francisco visualization to comparing World Series games with each other. The overall story of the visualization focuses on the progression of each game, between wins and losses, all the way to the climactic ending. Adding regional baseline data would certainly add more context, but we felt that it would potentially distract the user from our story. If we continued working on our general-purpose tool, the need for a baseline comparison feature would be far greater.

Demo and Thumbnail

A live demo is available at: http://sfcrime.github.io/sfc final/

Software Created

All of the software we created is in our GitHub organization, which can be found at: https://github.com/sfcrime

There are three repositories: one for the final version of the World Series visualization, one for the original viewer code that we used during the tool-building phase, and one for the database that stores all the crime data.
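The baseline comparison mentioned under Results & Feedback was not part of our final visualization. As a rough sketch of how it could be computed, the Python below averages incidents over every occurrence of a given weekday in a year and counts incidents on a single day for comparison. The function names are hypothetical, and the choice to include days with zero incidents in the average is an assumption.

```python
from collections import Counter
from datetime import date, timedelta


def weekday_baseline(incident_dates, weekday, year):
    """Average incidents per day over every `weekday` in `year`.

    weekday follows datetime's convention (Monday is 0, Wednesday is 2).
    Days with no incidents still count toward the average.
    """
    per_day = Counter(
        d for d in incident_dates
        if d.year == year and d.weekday() == weekday
    )
    # Count how many times this weekday occurs in the year.
    n_days = 0
    day = date(year, 1, 1)
    while day.year == year:
        if day.weekday() == weekday:
            n_days += 1
        day += timedelta(days=1)
    return sum(per_day.values()) / n_days


def count_on(incident_dates, day):
    """Number of incidents reported on one specific date."""
    return sum(1 for d in incident_dates if d == day)
```

Dividing count_on for a game day by weekday_baseline for the same weekday would give a simple "times the usual level" figure. A fairer baseline might exclude the game days themselves and restrict incidents to the same geographic area.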

Work Distribution

Work | Person | Description
Database | John | Set up the database using Postgres and PostGIS, created the crime table, and demonstrated the ability to run spatial queries
Database | Bill | Added a separate table for user-generated events, which did not make its way into the final visualization
API | John | Created the first draft of the API using Flask, which returned JSON of aggregate statistics based on the longitude and latitude of a spatial query
API | Bill | Fleshed out the API to return GeoJSON, made it production-ready
First draft visualization | John | Worked on dashboard page using leaflet.js and dc.js
First draft visualization | Bill | Created the event creation page and functionality for returning data from the API using Backbone.js
Copy and layout | John | Edited copy and additional details/styling
Copy and layout | Bill | Visualization layout using scrolling, first draft of copy
Radar graph | Bill | Developed radar chart
Small multiple | John | Modified existing code to fit our dataset
Dashboard | John | Munged data using Python to create dashboard; used dc.js to create graphs
Dashboard | Bill | Added clustered map feature to improve map readability; improved dashboard and styling

Appendix