Visualizing Crime in San Francisco During the 2014 World Series

John Semerdjian and Bill Chambers

Description

The scope and focus of our project evolved over the course of its lifetime. Our goals were to continue building experience with d3.js and to create a standalone, data-driven web application. We envisioned a tool for making different kinds of comparisons using spatial data, which led us to explore the crime incident dataset published by the City of San Francisco. Initially, we liked the idea of a tool that would let a user draw a geographic area and quickly analyze its crime trends against a normalized crime rate. The front end of our first prototype was built with d3.js and leaflet.js and consumed a RESTful API that we built with Flask. The UI allowed a user to create and analyze data for events of their choosing.
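In the first prototype, the API took a user-drawn area and returned GeoJSON, with the spatial filtering done in PostGIS. The sketch below is a database-free illustration of that request/response idea, not the project's code: it keeps the incidents inside a drawn polygon using the even-odd (ray casting) rule, a swapped-in technique, and packages them as a GeoJSON FeatureCollection. The `Incident` record, its fields, and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence, Tuple

# (longitude, latitude), the coordinate order GeoJSON uses.
Point = Tuple[float, float]


@dataclass
class Incident:
    """Hypothetical minimal incident record; field names are assumptions."""
    incident_id: int
    category: str
    lon: float
    lat: float


def point_in_polygon(pt: Point, polygon: Sequence[Point]) -> bool:
    """Even-odd (ray casting) test: count crossings of a ray cast east from pt."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed
        # (this also skips horizontal edges, so there is no division by zero).
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def incidents_in_area_geojson(
    incidents: List[Incident], polygon: Sequence[Point]
) -> Dict[str, Any]:
    """Return the incidents inside the drawn polygon as a GeoJSON FeatureCollection."""
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [inc.lon, inc.lat]},
            "properties": {"id": inc.incident_id, "category": inc.category},
        }
        for inc in incidents
        if point_in_polygon((inc.lon, inc.lat), polygon)
    ]
    return {"type": "FeatureCollection", "features": features}
```

In a Flask view, the returned dictionary could be passed to `flask.jsonify`. With PostGIS doing the work instead, the same filter would typically be a query using a predicate such as `ST_Within`, with `ST_AsGeoJSON` producing the geometry.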

Figure 0: First version of our visualization tool. Users draw a polygon on a map, define specific dates, and analyze the data.

Given all the possible types of analyses one might conduct, we had difficulty deciding which specific interactions to support. We eventually pivoted to focus on a single set of events: all seven games of the 2014 World Series between the San Francisco Giants and the Kansas City Royals. Shifting from an analysis tool to a story-driven visualization simplified our project tremendously. We felt that a single narrative was better suited to a visualization, but we still wanted to preserve some aspects of our original analysis tool.

The data itself lacked context around the incidents and offered only a few fields that merited deeper analysis. For example, we knew the time and day an incident was reported, but not how many individuals were involved or how severe the incident was. We were still able to represent the same data in various ways based on visualization concepts from the course.

We started our visualization with references to news reports of crime during the World Series, using a simple map with markers indicating points of interest and a short narrative.

Figure 1: Story introduction

We then presented the area of our analysis, since we wanted to show a high-level comparison of the different types of crime that occurred during the World Series. After experimenting with several graph types, we selected a radar (spider) chart.

Figure 2: Radar chart of crime types for each World Series game.

The radar chart seemed the most conducive to comparing different games. By hovering over the legend or the chart area, a user can see the general shape of every game while still being able to select and focus on one specific game.

We then used small multiples with total crimes on the y-axis and hour of day on the x-axis. This lets the user compare the same point in time across every game simply by hovering over it. We also provided a table of game characteristics for reference. Scanning the visualization horizontally shows how different Game 7 is from the other games; the significant rise in crime was something many of our users noted. They suggested adding vertical lines to the graphs to mark the start and end of each game. Unfortunately, we weren't able to include them in our final version.
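Both views rest on simple group-by counts: the radar chart needs incidents per crime type for each game, and the small multiples need incidents per hour of day for each game. The sketch below shows those two aggregations in plain Python. It assumes each incident has already been tagged with a game number; the row shape and function names are hypothetical, and the project's actual munging code may have looked quite different.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Hypothetical row shape: (game_number, category, reported_at), one per incident.
Row = Tuple[int, str, datetime]


def crime_types_per_game(rows: Iterable[Row]) -> Dict[int, Counter]:
    """Count incidents by category for each game: one radar-chart series per game."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for game, category, _ in rows:
        counts[game][category] += 1
    return dict(counts)


def crimes_per_hour(rows: Iterable[Row]) -> Dict[int, List[int]]:
    """Count incidents per hour of day (0-23) for each game: one small multiple per game."""
    hourly: Dict[int, List[int]] = defaultdict(lambda: [0] * 24)
    for game, _, reported_at in rows:
        hourly[game][reported_at.hour] += 1
    return dict(hourly)
```

Passing the same list of rows to both functions yields the two datasets. A generator would be exhausted by the first call, so materialize it as a list first.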

Figure 3: Small multiples of total crimes per hour

Our visualization ends with an exploratory dashboard, the one aspect of our initial design that we were able to preserve. It offers a lot of interactivity and opportunities for further exploration. After reading our short analysis, users quickly began analyzing the data for themselves. They frequently started by filtering by crime type; vandalism and assault were usually the first filters selected, given the news reports we linked to. Combining crime-type filters with specific games also yielded interesting insights for our users. Users can jump between regions of the map and combine multiple filters to test several hypotheses.

Figure 4: Interactive dashboard
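The dashboard's filtering ran in the browser through dc.js and crossfilter.js. As we understand dc.js's default behavior, selecting several values in one chart widens the result (OR), while selections across different charts narrow it (AND). The Python sketch below mirrors that combination rule on a list of records so the logic is easy to see. The record fields and the `apply_filters` name are hypothetical, and this is not the dc.js API.

```python
from typing import Dict, Iterable, List, Optional, Set


def apply_filters(
    incidents: Iterable[Dict[str, object]],
    filters: Dict[str, Optional[Set[object]]],
) -> List[Dict[str, object]]:
    """Keep incidents that satisfy every active filter.

    Values selected within one field are OR-ed (e.g. VANDALISM or ASSAULT);
    separate fields are AND-ed (e.g. category AND game). A filter of None or
    an empty set means "no filter" for that field.
    """
    active = {field: allowed for field, allowed in filters.items() if allowed}
    return [
        inc
        for inc in incidents
        if all(inc.get(field) in allowed for field, allowed in active.items())
    ]
```

For example, `apply_filters(records, {"category": {"VANDALISM", "ASSAULT"}, "game": {7}})` would keep only vandalism and assault incidents from Game 7.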

Related Work

San Francisco crime data has been analyzed and visualized by numerous individuals and groups. After some research, we found that most visualizations merely reduce the data to points on a map, choropleths, or aggregate statistics for the entire city. We describe a few examples below.

Trulia created a heatmap and provided aggregate crime statistics. This approach doesn't lend itself to user-driven analysis, since it mostly presents summary statistics.

Figure 5: Trulia crime heatmap

Crimespotting improves on this by giving the user additional tools for exploration: users can filter the data by time, day/week, and crime type. However, it isn't designed for comparing specific dates or times with each other, and it doesn't provide any aggregate-level data. Lastly, users can't define a specific geographic region of interest.

Figure 6: Crimespotting

Some experimental crime data visualizations express the crime rate through 3D elevation. The shadows cast by high crime rates can evoke an emotional response that a bar chart or heatmap can't. Although informative, this approach didn't allow for the kind of exploration we were looking for. Additionally, it offers no interactivity.

Figure 7: 3D crime visualization

The 3D visualization also has some serious drawbacks. Despite the small multiples, it is still very difficult to compare peaks by area and category from the given view. It also lacks the ability to filter by time.

Data

San Francisco crime data supplied by the San Francisco local government.

Tools

We relied heavily on d3.js and dc.js, a high-level JavaScript visualization library that builds on d3.js and on the fast tabular filtering of crossfilter.js. We used dc.js for our dashboard visualization, and Leaflet and Leaflet.draw for mapping and drawing polygons. The first version of our visualization used an API, built with Flask, PostgreSQL, and PostGIS, that returned GeoJSON.

Steps

The first step was to understand the data. We did a quick exploratory analysis and created some plots, but set aside all data accuracy issues for this assignment. We imported the data into our database and then developed the API. We created a front end that read data from the API after the user drew a polygon on our map. We started designing the front end on paper, but after iterating through several versions, we scrapped our tool and began developing a story around the data. We initially considered a few stories: Bay to Breakers, Outside Lands, the World Series, and others. News reports of violence after Game 7 of the 2014 World Series gave us something compelling to investigate, so we started drawing potential prototypes. We borrowed the code for our dashboard from our first visualization and added other visualizations. Over the course of this process, we solicited feedback from our classmates and refined the story and the interactions of our visualization.

Results & Feedback

We worked with some of our peers to make sure our message was getting across clearly. People seemed to understand our thought process and also offered several improvements. We struggled with the complexity of building a general-purpose tool, even one for a single dataset, and this iterative process occupied much of our time early on. At times, we received mixed feedback on the features users would like to see in our tool. There was some consensus around adding a baseline comparison feature (e.g., comparing Game 7 to the average number of crimes committed on a Wednesday in 2014). However, we had already shifted away from a World Series vs. San Francisco visualization to comparing World Series games with each other. The overall story of the visualization follows the progression of each game, through wins and losses, all the way to the climactic ending. Adding regional baseline data would certainly add more context, but we felt that it could distract the user from our story. Had we continued working on our general-purpose tool, the need for a baseline comparison feature would have been far greater.

Demo and Thumbnail

A live demo is available at: http://sfcrime.github.io/sfc final/

Software Created

All of the software we created is in our GitHub organization: https://github.com/sfcrime. It contains three repositories: one for the final World Series visualization, one for the original viewer code from the tool-building phase, and one for the database that stores all the crime data.
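The baseline comparison that users requested under Results & Feedback (for example, Game 7 versus the average number of crimes on a Wednesday in 2014) was never built. The sketch below shows one plausible way to compute it, not the authors' design. It assumes each incident has been reduced to its calendar report date. Comparing a game by calendar date is itself a simplification, since a game and its aftermath can cross midnight. The function names are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable, Tuple


def weekday_baseline(incident_dates: Iterable[date], weekday: int, year: int) -> float:
    """Mean incidents per day over all days in `year` that fall on `weekday` (Monday=0).

    Days with no incidents count as zero, so they pull the average down.
    """
    per_day = Counter(d for d in incident_dates if d.year == year)
    start = date(year, 1, 1)
    n_days = (date(year + 1, 1, 1) - start).days
    days = (start + timedelta(days=i) for i in range(n_days))
    matching = [d for d in days if d.weekday() == weekday]
    return sum(per_day[d] for d in matching) / len(matching)


def compare_to_baseline(incident_dates: Iterable[date], target: date) -> Tuple[int, float]:
    """Return (incidents on target, average for target's weekday in target's year)."""
    dates = list(incident_dates)  # materialize so the data can be scanned twice
    observed = sum(1 for d in dates if d == target)
    return observed, weekday_baseline(dates, target.weekday(), target.year)
```

Counting zero-incident days matters: averaging only over days that appear in the data would overstate the baseline.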

Work Distribution

| Work | Person | Description |
|---|---|---|
| Database | John | Set up the database using Postgres and PostGIS, created the crime table, and demonstrated the ability to run spatial queries |
| Database | Bill | Added a separate table for user-generated events, which did not make its way into the final visualization |
| API | John | Created the first draft of the API using Flask, which returned JSON of aggregate statistics based on the longitude and latitude of a spatial query |
| API | Bill | Fleshed out the API to return GeoJSON and made it production-ready |
| First draft visualization | John | Worked on the dashboard page using leaflet.js and dc.js |
| First draft visualization | Bill | Created the event creation page and the functionality for returning data from the API using Backbone.js |
| Copy and layout | John | Edited copy; additional details/styling |
| Copy and layout | Bill | Visualization layout using scrolling; first draft of copy |
| Radar graph | Bill | Developed the radar chart |
| Small multiples | John | Modified existing code to fit our dataset |
| Dashboard | John | Munged data using Python to create the dashboard; used dc.js to create graphs |
| Dashboard | Bill | Added a clustered map feature to improve map readability; improved the dashboard and styling |

Appendix