Publishing CitiSense Data: Privacy Concerns and Remedies

Publishing CitiSense Data: Privacy Concerns and Remedies. Kapil Gupta. Advisor: Prof. Bill Griswold.

Location Based Services. Location-based services data has great utility for traffic control, mobility management, urban planning, etc. It is critical to preserve the privacy of the users involved. Anonymity cannot be assured by simply replacing users' real identifiers with pseudonyms. This project deals with these issues for the CitiSense dataset. CSE Dept., University of California, San Diego

CitiSense: Introduction. Portable pollution monitoring system. Real-time air quality readings on a phone.

CitiSense: Objective. Deliver air quality estimation to individuals and public health agencies. Understand the behavior of air pollutants within urban areas. So far so good! Where is the problem?

CitiSense: Data Publishing

Solutions? Strip the location information: utility is completely lost! Add noise to location data: this hurts utility, and the addition is not privacy-aware. Does stripping user identifiers from the dataset still have privacy implications? Yes! There is an underlying linear relation between the temporal and spatial movement of an individual.

Rest of the presentation: CitiSense data overview. Data preprocessing (why, steps). Privacy breaches (what an attacker will do, what information is compromised). Location anonymization (how, utility).

CitiSense: Data Overview. Spatio-temporal data, also called moving object data, trajectory data, or mobility data. Readings from 30 users over a period of five weeks (Jul 30 - Sep 7). 21.5 million readings (data points). 7 sensors.

CitiSense: Visualization? If 1 marker = 1 pixel, 21.5M markers => whole screen covered!

CitiSense: Visualization. Millions of points. Browser crashes after 600 MB of memory. First try: Quality Threshold Clustering?

CitiSense: Preprocessing. Filtering (outliers, noise, speed smoothing). Trip Segmentation. Trajectory Smoothing. Trajectory Compression. Error Measure for Trajectory Compression.

CitiSense: Side Note. Location data => Earth's coordinates. Mercator projection: a cylindrical map projection. Earth's radius = 6378100 m.
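A minimal sketch of the projection step, assuming the standard spherical Mercator formulas and the radius quoted on the slide. The function name `mercator` and its argument layout are illustrative, not taken from the project.

```python
import math

EARTH_RADIUS_M = 6378100.0  # radius quoted on the slide

def mercator(lat_deg, lon_deg, radius=EARTH_RADIUS_M):
    """Project latitude/longitude (degrees) to spherical Mercator x, y (meters)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = radius * lon
    y = radius * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y
```

Working in projected meters lets the later steps (speed filters, trip length, SED) use plain Euclidean distance.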

CitiSense: Filtering. Duplication filter: multiple sensors lead to multiple readings with the same time and location information. Speed and acceleration filter: a few readings indicate a speed of 546 km/hr with 52 m/s^2 acceleration; neighboring data points need to be smoothed accordingly.

CitiSense: Filtering Results. Data points are combined when they fall within a threshold of 30 seconds and show no location change. Speed limit of 150 km/hr and acceleration limit of 10 m/s^2.
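The speed and acceleration filter can be sketched as follows. This is an illustration under assumptions: points are time-sorted (t, x, y) tuples in seconds and meters, the speed limit is read as 150 km per hour, and offending readings are simply dropped rather than smoothed. The name `filter_outliers` is hypothetical.

```python
import math

MAX_SPEED_MPS = 150 / 3.6   # 150 km/hr in m/s (assumed reading of the slide's limit)
MAX_ACCEL_MPS2 = 10.0       # acceleration limit from the slide

def filter_outliers(points, max_speed=MAX_SPEED_MPS, max_accel=MAX_ACCEL_MPS2):
    """Keep (t, x, y) points whose implied speed and acceleration are plausible."""
    if not points:
        return []
    kept = [points[0]]
    prev_speed = 0.0
    for t, x, y in points[1:]:
        t0, x0, y0 = kept[-1]
        dt = t - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp
        speed = math.hypot(x - x0, y - y0) / dt
        accel = abs(speed - prev_speed) / dt
        if speed > max_speed or accel > max_accel:
            continue  # implausible jump: drop the reading
        kept.append((t, x, y))
        prev_speed = speed
    return kept
```

Speed is measured against the last kept point, so a single bad GPS fix does not poison the readings that follow it.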

CitiSense: Trip Segmentation. Extract trips from: change in speed, time gap between consecutive positions, and length of the trip. Values set are 300 seconds for the time gap and 100 m for the trip length.
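A rough sketch of segmentation using only the time-gap and trip-length criteria (the change-in-speed criterion is left out). It assumes time-sorted (t, x, y) points in seconds and meters; `segment_trips` and its parameter names are illustrative.

```python
import math

def segment_trips(points, max_gap_s=300, min_length_m=100):
    """Split points wherever the time gap exceeds max_gap_s,
    then keep only segments whose path length reaches min_length_m."""
    trips, current = [], []
    for p in points:
        if current and p[0] - current[-1][0] > max_gap_s:
            trips.append(current)
            current = []
        current.append(p)
    if current:
        trips.append(current)

    def path_length(seg):
        return sum(math.hypot(b[1] - a[1], b[2] - a[2])
                   for a, b in zip(seg, seg[1:]))

    return [seg for seg in trips if path_length(seg) >= min_length_m]
```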

CitiSense: Trajectory Smoothing. Smooth out noise by applying a median filter, although it suffers from lag.
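A small sketch of median smoothing over (t, x, y) points. The window size and the centered window are assumptions made for illustration; a window over past readings only is the variant that lags behind the true position.

```python
from statistics import median

def median_smooth(points, window=5):
    """Median-filter x and y over a centered window (truncated at the ends)."""
    half = window // 2
    out = []
    for i, (t, _, _) in enumerate(points):
        nb = points[max(0, i - half): i + half + 1]
        out.append((t, median(p[1] for p in nb), median(p[2] for p in nb)))
    return out
```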

CitiSense: Trajectory Compression. Error measure: Euclidean distance.

CitiSense: Trajectory Compression. Error measure: Synchronous Euclidean distance (SED).
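SED compares a point with where the object would have been at the same timestamp had it moved along the simplified segment at constant speed. A minimal sketch, assuming (t, x, y) tuples; the function name `sed` is illustrative.

```python
import math

def sed(p, start, end):
    """Synchronous Euclidean distance of p from the segment start-end."""
    t, x, y = p
    t0, x0, y0 = start
    t1, x1, y1 = end
    if t1 == t0:
        return math.hypot(x - x0, y - y0)
    r = (t - t0) / (t1 - t0)            # fraction of the segment's time elapsed
    ex = x0 + r * (x1 - x0)             # expected position at time t
    ey = y0 + r * (y1 - y0)
    return math.hypot(x - ex, y - ey)
```

A point lying exactly on the segment can still have a large SED if it was reached too early or too late, which is what plain perpendicular distance misses.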

CitiSense: Trajectory Compression. Similar to the line generalization problem. Uniform sampling algorithm? Douglas-Peucker. Curve matching. Top-down time-ratio (TD-TR). GTC trajectory compression algorithm: a greedy solution that uses the farthest point with an approximated SED less than the given error tolerance.
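Of the algorithms listed, TD-TR is the easiest to sketch: it is Douglas-Peucker with SED in place of perpendicular distance. The code below is an illustration of that idea, not the GTC algorithm used in the project; function names and the (t, x, y) layout are assumptions.

```python
import math

def sed(p, a, b):
    """Synchronous Euclidean distance of p from segment a-b; points are (t, x, y)."""
    if b[0] == a[0]:
        return math.hypot(p[1] - a[1], p[2] - a[2])
    r = (p[0] - a[0]) / (b[0] - a[0])
    return math.hypot(p[1] - (a[1] + r * (b[1] - a[1])),
                      p[2] - (a[2] + r * (b[2] - a[2])))

def td_tr(points, tolerance):
    """Top-down time-ratio compression: split at the worst SED until within tolerance."""
    if len(points) <= 2:
        return list(points)
    first, last = points[0], points[-1]
    idx, worst = max(
        ((i, sed(points[i], first, last)) for i in range(1, len(points) - 1)),
        key=lambda pair: pair[1],
    )
    if worst <= tolerance:
        return [first, last]
    left = td_tr(points[: idx + 1], tolerance)
    right = td_tr(points[idx:], tolerance)
    return left[:-1] + right  # the split point appears once
```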

CitiSense: Preprocessing Summary

CitiSense: Privacy Breaches. Region of Interest (ROI). Behavior Mining. Predictive Query. Regular Routes Mining. Recognizing Travel Modes.

CitiSense: Region of Interest (ROI). Stops: the semantically important parts of a trajectory.

CitiSense: Region of Interest (ROI). Algorithms: IB-SMoT (Intersection-Based Stops and Moves of Trajectories) and CB-SMoT (Clustering-Based Stops and Moves of Trajectories), which depends on speed variation. If stops are repeated frequently => ROI: user's home, office, gym location, etc.
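The sketch below is a simplified speed-threshold stop detector followed by a grid count of repeated stops. It is not CB-SMoT or IB-SMoT; it only illustrates how repeated slow periods expose an ROI. All names, thresholds, and the 100 m grid cell are assumptions.

```python
import math
from collections import Counter

def find_stops(points, max_speed=0.5, min_duration=300):
    """Return (t_start, t_end, cx, cy) for slow runs lasting at least min_duration seconds."""
    stops, run = [], []

    def flush():
        if len(run) >= 2 and run[-1][0] - run[0][0] >= min_duration:
            cx = sum(p[1] for p in run) / len(run)
            cy = sum(p[2] for p in run) / len(run)
            stops.append((run[0][0], run[-1][0], cx, cy))

    for prev, cur in zip(points, points[1:]):
        dt = cur[0] - prev[0]
        speed = math.hypot(cur[1] - prev[1], cur[2] - prev[2]) / dt if dt > 0 else 0.0
        if speed <= max_speed:
            if not run:
                run.append(prev)
            run.append(cur)
        else:
            flush()
            run.clear()
    flush()
    return stops

def frequent_cells(stops, cell=100.0, min_visits=3):
    """Grid cells that contain at least min_visits stops: candidate ROIs."""
    counts = Counter((int(cx // cell), int(cy // cell)) for _, _, cx, cy in stops)
    return [c for c, n in counts.items() if n >= min_visits]
```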

CitiSense: ROIs

CitiSense: Regular Route Mining. Routes similarity. Routes grouping.

CitiSense: Recognizing Travel Modes. Walk: 2-3 miles/hr.

CitiSense: Recognizing Travel Modes. Bike: 4-5 miles/hr.

CitiSense: Recognizing Travel Modes. Car: 55-70 miles/hr.
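An attacker can turn these ranges into a classifier with very little effort. A toy sketch that picks the mode whose speed range is nearest to an observed speed; the nearest-range rule is an assumption, only the ranges come from the slides.

```python
# Speed ranges in miles/hr, as given on the slides.
MODE_RANGES_MPH = {"walk": (2, 3), "bike": (4, 5), "car": (55, 70)}

def travel_mode(speed_mph):
    """Return the mode whose speed range lies closest to speed_mph."""
    def gap(rng):
        lo, hi = rng
        return max(lo - speed_mph, 0, speed_mph - hi)  # 0 when inside the range
    return min(MODE_RANGES_MPH, key=lambda m: gap(MODE_RANGES_MPH[m]))
```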

Demo

CitiSense: Trajectory Anonymization. Clustering based Anonymization. ROI based Anonymization. Temporal Cloaking.

CitiSense: Clustering based Anonymization. Also called NWA (Never Walk Alone). Based on the inherent uncertainty of the GPS system: a trajectory is not a line, it is a cylinder. NWA utilizes the uncertainty of trajectory data to group k co-localized trajectories within the same time period to form a k-anonymized aggregate trajectory.

CitiSense: Clustering based Anonymization. 3 main steps. Pre-processing step: group all trajectories that have the same starting and ending times; trajectories are trimmed if necessary. Clustering step: cluster each trajectory with its k-1 nearby trajectories; the cluster radius is bounded. Space transformation step: arithmetic mean of the cluster. See next figure.
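The space transformation step alone can be sketched as a pointwise mean over a cluster. This assumes the k trajectories were already aligned to the same timestamps by the pre-processing step; the full NWA algorithm is more involved, and `cluster_mean` is a hypothetical name.

```python
def cluster_mean(trajectories):
    """Pointwise arithmetic mean of equal-length, time-aligned (t, x, y) trajectories."""
    n = len(trajectories)
    mean = []
    for samples in zip(*trajectories):
        t = samples[0][0]  # all samples share this timestamp
        mean.append((t,
                     sum(s[1] for s in samples) / n,
                     sum(s[2] for s in samples) / n))
    return mean
```

Publishing the mean in place of each member means any one published trajectory is consistent with k users.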

CitiSense: Clustering based Anonymization

CitiSense: Utility. Before proceeding further, let's analyze the utility of CitiSense data. The utility of CitiSense data is not hurt by changing the temporal dimension by a small amount, and it does not depend on the number of points in the database but rather on the number of points in different regions.

CitiSense: ROI based Anonymization. Information is revealed from ROIs. Remove frequent stops from trajectory data! Simply removing stops won't work: an attacker can still extrapolate. Solution: also remove all points in the neighboring regions of the stops. Parameters: average duration and frequency of a stop to qualify, and the area to be removed.
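The removal step can be sketched as a radius filter around each qualifying stop. The circular region and the names `remove_around_stops` and `radius_m` are assumptions; the slides only say that an area around each stop is removed.

```python
import math

def remove_around_stops(points, stop_centers, radius_m):
    """Drop every (t, x, y) point within radius_m of any stop center (cx, cy)."""
    return [
        p for p in points
        if all(math.hypot(p[1] - cx, p[2] - cy) > radius_m
               for cx, cy in stop_centers)
    ]
```

A larger radius makes extrapolation toward the stop harder but removes more readings, which is the privacy versus utility trade-off discussed in the results.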

CitiSense: ROI based Anonymization. A user trajectory on a particular day.

CitiSense: ROI based Anonymization. Stops in the trajectory.

CitiSense: ROI based Anonymization. Trajectory after removal of stops.

CitiSense: ROI based Anonymization. Further improvements: Semantic analysis: tag public and private places for each user and remove only private ROIs. Increasing utility: keep a privately tagged location if it contains more than k users.

CitiSense: Temporal Cloaking. Privacy breaches depend on the successful creation of trips. Trip segmentation depends heavily on the temporal pattern. Idea: blur the user's presence at a location at a particular time by inserting Gaussian noise into time, so that the linear relation between distance and time is disrupted.

CitiSense: Temporal Cloaking. Introduce Gaussian noise in the temporal part of the data. Small noise does not hurt the utility of the CitiSense data.
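A minimal sketch of temporal cloaking: only the timestamp of each (t, x, y) point is perturbed. The default mean and sigma here are placeholders chosen for illustration, and `cloak_times` is a hypothetical name.

```python
import random

def cloak_times(points, mean_s=0.0, sigma_s=60.0, seed=None):
    """Add Gaussian noise (seconds) to each timestamp; x and y are left untouched."""
    rng = random.Random(seed)
    return [(t + rng.gauss(mean_s, sigma_s), x, y) for t, x, y in points]
```

The perturbed timestamps may no longer be in order, so any speed or time-gap computation on the output sees distorted values. That distortion is what disrupts trip segmentation.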

CitiSense Anonymization Results

CitiSense: Clustering based Anonymization. The percentage of points anonymized by NWA is only 48%. These points belong to a dense region, all located near CSE, UCSD. For k-anonymity, NWA needs regions having more than 1 CitiSense user present at approximately the same time.

CitiSense: ROI Results. Side note: concentrated data points => trips = 0, stops = 1.

CitiSense: ROI Results. The table suggests anonymization leads to a huge loss of data and utility. Is it the right measure?

CitiSense: ROI Results, Coverage. The coverage by a data point can be defined as the area where the readings from the sensor can be considered the same. It is defined in terms of the diameter of the cluster and a coverage parameter.

CitiSense: Results

CitiSense: ROI Results. Note: the area covered does not take into account overlapping of trajectories. Why? NWA can take care of it.

CitiSense: Temporal Cloaking Results. Gaussian parameters (mean, sigma) are set to 600 seconds and 1 respectively. Performing preprocessing on this transformed data results in 54% fewer trips. Points are discarded as outliers.

CitiSense: Anonymization Order. In what order should we apply these 3 techniques? 1. NWA 2. ROI based 3. Temporal Cloaking. Why?

CitiSense: Results Summary. The percentage of points anonymized by NWA is 48%, mainly in dense regions. ROI based anonymization protects personal information of users by compromising utility by 6%. Temporal Cloaking => 54% less trip segmentation, which implies less data mining to extract information, e.g. finding regular routes or mode of transportation.

Conclusion. The major privacy concern is resolved at the cost of a 6% loss in utility. NWA will work better on dense data. Temporal Cloaking needs more analysis. Can we find mathematical guarantees for immunity against attackers?

Questions?