Why? SenseML 2014 Keynote Immanuel Schweizer
Background: Immanuel Schweizer, TU Darmstadt, Germany; Telecooperation Lab; Ubiquitous Computing; Smart Urban Networks
Background: Graph-based optimization for P2P networks; PhD thesis; energy-efficient network protocols for wireless sensor networks; flow control; topology control; application: urban management
Background
Inductive Loops: >150 traffic lights; ~3,000 sensors; two parameters: utilization and count
Street Cars: ~10 sensors deployed on streetcars; solar cells, Zigbee (868 MHz), temperature, GPS
Phones / Noisemap: Noise pollution via microphone; more than 2,000 installations; 30 active users per day; ~750,000 data points; gamification; calibration
da_sense
More sensors, more data!
And more data: OpenSense (ETH Zurich, http://www.opensense.ethz.ch/trac/); DeviceAnalyzer (University of Cambridge, https://deviceanalyzer.cl.cam.ac.uk/)
What do we do with all that data?
What do we do with all that data? Help with planning tasks; understand human activity; environmental models; detect events; track users; nowcasting / forecasting
Machine Learning
What's special about sensor data?
Where does sensor data come from?
Sensor Infrastructure
Sensor Infrastructure: High cost per sensor; mostly wired; high quality of information; some kind of certification
Sensor Infrastructure; (Wireless) Sensor Networks
Wireless Sensor Networks: Cheaper hardware; mostly wireless; battery-powered; mixed quality of information; high diversity
Sensor Infrastructure; (Wireless) Sensor Networks; Mobile Sensing / User-generated Data
Mobile Sensing: Easy development and deployment; almost no hardware cost; lack of control over quality of information; privacy; humans-in-the-loop
Sensor Infrastructure; (Wireless) Sensor Networks; Mobile Sensing / User-generated Data (diagram axis labels: Quality, Quantity)
What's special about sensor data? Heterogeneity: unstructured vs. structured data; different hardware; different sensors; mobile phones vs. dedicated hardware; heterogeneity of data sources; spatial and time resolution. Quality-of-Information: low-cost sensors; mobility; human-in-the-loop; faults; placement
Preprocessing: Data fusion; integrating external sources; filtering; approximation; fault detection; manual cleaning
Example 1: Location
Example 2: Filtering Noisemap
Example 3: Road Network. Traffic measurements; noise measurements. Idea: predict traffic based on noise measurements
Example 3: Road Network
Road network data processing. Road characteristics: road type, surface type, maximum speed, one-way, number of lanes, etc. Road segment: road segment geometry (a polygon area in the WGS 84 coordinate system); selection area geometry (an area around the road segment, excluding the space near neighboring segments and the areas of surrounding buildings); average sound pressure level for a time interval; weather conditions; traffic level
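The record described on this slide could be sketched as a pair of Python data classes. This is only an illustration: every class and field name below is hypothetical, the units (km/h, dB) are assumptions, and the slide does not say how the data was actually stored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumption: coordinates are (longitude, latitude) pairs in WGS 84.
Coordinate = Tuple[float, float]


@dataclass
class RoadSegment:
    """Static road characteristics plus the two geometries from the slide."""
    road_type: str
    surface_type: Optional[str]
    max_speed_kmh: Optional[int]
    oneway: bool
    lanes: int
    geometry: List[Coordinate]        # polygon outline of the road segment
    selection_area: List[Coordinate]  # area used to select measurements


@dataclass
class SegmentObservation:
    """One time interval of measurements attached to a road segment."""
    segment: RoadSegment
    interval_start: str               # assumed ISO 8601 timestamp
    interval_end: str
    avg_sound_pressure_db: float
    weather: str
    traffic_level: str                # class label used for learning


segment = RoadSegment("residential", "asphalt", 30, False, 2, [], [])
observation = SegmentObservation(
    segment, "2014-01-01T08:00", "2014-01-01T09:00", 62.5, "dry", "medium"
)
```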
Road network data processing: OpenStreetMap. Goal: create road segments automatically. Largest free road network dataset. OSM data format: node, way, relation; attributes
Road network data processing: OSM has a non-planar topology; straightforward planarization not possible; road segment separated into multiple polylines
Road network data processing. Misclassified road links: remove "unclassified" roads; filter by length. Represent multiple ways as a single way: merge ways. Missing common node: merge nodes within a proximity of 5 cm
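The node-merging step could look roughly like the following. This is a minimal sketch under stated assumptions: nodes are given as a hypothetical dictionary of id to (latitude, longitude), distances use an equirectangular approximation, and the pairwise loop is quadratic, so a real road network would need a spatial index. The function names are illustrative and not taken from the talk.

```python
import math
from typing import Dict, Tuple

EARTH_RADIUS_M = 6_371_000.0


def distance_m(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Approximate ground distance between two (lat, lon) points in metres."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return EARTH_RADIUS_M * math.hypot(x, y)


def merge_close_nodes(
    nodes: Dict[int, Tuple[float, float]], tolerance_m: float = 0.05
) -> Dict[int, int]:
    """Map each node id to a representative id for its cluster.

    Nodes closer than tolerance_m (5 cm by default) end up in the same
    cluster; the smallest id in a cluster is used as representative.
    """
    parent = {node_id: node_id for node_id in nodes}

    def find(node_id: int) -> int:
        # Union-find lookup with path halving.
        while parent[node_id] != node_id:
            parent[node_id] = parent[parent[node_id]]
            node_id = parent[node_id]
        return node_id

    ids = sorted(nodes)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if distance_m(nodes[a], nodes[b]) <= tolerance_m:
                root_a, root_b = find(a), find(b)
                if root_a != root_b:
                    parent[max(root_a, root_b)] = min(root_a, root_b)
    return {node_id: find(node_id) for node_id in nodes}


example_nodes = {
    1: (49.8728, 8.6512),
    2: (49.8728, 8.6512 + 1e-7),  # under one centimetre east of node 1
    3: (49.8738, 8.6512),         # roughly 111 m north of node 1
}
mapping = merge_close_nodes(example_nodes)
```

After merging, ways that reference node 2 can be rewritten to reference node 1, which gives the two ways the common node they were missing.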
Road network data processing: Clean-up; combine parallel ways of the same street
Road network data processing: 2D geometry, based on number of lanes
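One way to derive a 2D geometry from the number of lanes is to widen the centre line by half the road width on each side. The sketch below assumes a single straight segment in planar metric coordinates and a lane width of 3.5 m; both the function name and the lane width are assumptions, as the slide gives no values.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def segment_to_polygon(
    start: Point, end: Point, lanes: int, lane_width_m: float = 3.5
) -> List[Point]:
    """Return the four corners of a rectangle around a straight centre line."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("start and end must differ")
    half_width = lanes * lane_width_m / 2
    # Normal vector of the centre line, scaled to half the road width.
    nx, ny = -dy / length * half_width, dx / length * half_width
    return [
        (start[0] + nx, start[1] + ny),
        (end[0] + nx, end[1] + ny),
        (end[0] - nx, end[1] - ny),
        (start[0] - nx, start[1] - ny),
    ]


polygon = segment_to_polygon((0.0, 0.0), (10.0, 0.0), lanes=2)
```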
Road network data processing: Spatial filter. Which sound pressure records to include? Straightforward approach: select measurements based on proximity. 2 spatial buffers around each segment. $\mathrm{SelectionArea} = A \setminus (B_1 \cup B_2 \cup \dots \cup B_n)$
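The selection area can be read as a set difference: a measurement is kept if it lies in the buffer A of the target segment and in none of the buffers B_i. The sketch below is one possible reading, assuming that A is a wider buffer around the target segment, that each B_i is a narrower buffer around a neighboring segment, and that all coordinates are planar and in metres. The function names and buffer sizes are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the line segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Projection of p onto the segment, clamped to its end points.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def in_selection_area(
    point: Point,
    segment: Segment,
    neighbours: Sequence[Segment],
    outer_buffer_m: float,
    neighbour_buffer_m: float,
) -> bool:
    """True if point is in A but in none of the neighbour buffers B_i."""
    in_a = point_segment_distance(point, *segment) <= outer_buffer_m
    in_any_b = any(
        point_segment_distance(point, *n) <= neighbour_buffer_m
        for n in neighbours
    )
    return in_a and not in_any_b


road = ((0.0, 0.0), (100.0, 0.0))
crossing_road = ((100.0, 0.0), (100.0, 100.0))
```

Building footprints could be excluded in the same way by adding a point-in-polygon test to the B_i checks.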
Road network data processing: Exclude buildings. Location accuracy: falsely included/excluded measurements. Inward/outward offsetting. Inward: minimize the number of included measurements that were recorded outside. Outward: minimize the number of filtered-out measurements that were recorded inside
Example 3: Road Network
What's special about sensor data?
What's special about sensor data? =?
Real-world data: Classes for classification (sound level, traffic level)
Example: Traffic Level
Real-world data: Classes for classification (sound level, traffic level); evaluation; transferability
Example: Noise Pollution [Figure: pipeline diagram. Stage labels: Initial Dataset, External Data Sources, Classification, Visualization. Recoverable box labels: Noisemap data file (instances of noise data, geocoordinates); OpenStreetMap (extracting OSM information about nearby streets); LinkedGeoData via SPARQL (object data in RDF, points of interest; extracting information about nearby buildings); WeatherData data file (extracting weather information in the surrounding area); adding additional information (attributes, ARFF writer); decision tree learning; final model; sound level prediction]
Evaluation: Cross-validation; accuracy, precision, recall ~80%. Other models: same resolution, same input data; difference? Human-readable rules
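For reference, the three reported metrics and a simple fold split can be written down in a few lines. This is a generic sketch, not the evaluation setup used in the talk: the slide does not say how many folds were used or how they were drawn, and the interleaved split, the function names, and the labels below are all assumptions made for illustration.

```python
from typing import Iterator, List, Sequence, Tuple


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str
) -> Tuple[float, float]:
    """Precision and recall for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def k_fold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; fold i tests every k-th sample from i."""
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        yield train, test


# Hypothetical labels for five measurements.
y_true = ["high", "high", "low", "low", "high"]
y_pred = ["high", "low", "low", "high", "high"]
folds = list(k_fold_indices(5, 2))
```

In a full cross-validation run, a model would be trained on each `train` list and scored on the matching `test` list, with the metrics averaged over the folds.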
Transferability: Perfect model for Darmstadt; no noise data in Nancy, France. Same features? External data sources; different regulations
What's special about sensor data?
Pipeline [Figure: the same pipeline diagram as on the slide "Example: Noise Pollution", from the Noisemap initial dataset through the external data sources (OpenStreetMap, LinkedGeoData, WeatherData) to decision tree learning and sound level prediction]
Pipelines [Figure: the noise pollution pipeline diagram shown next to a second, layered pipeline.] Layer 1: OSM XML, Measurements, Traffic Data. Layer 2: OSM Parser, Measurement Filter, Traffic Parser. Layer 3: Training Set Builder, Machine Learning Model
Pipelines [Figure: the noise pollution pipeline diagram.] Standardized toolbox: Rapidminer++. Generalize components (with interfaces). Learn and share: What parts can be generalized? Why? Share your experience about building these pipelines
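To make "generalize components (with interfaces)" concrete, one possible shape is a shared stage signature that every preprocessing step implements, so that stages can be recombined across pipelines. The sketch below is purely illustrative: the `Stage` type, the stage factories, the record layout, and the thresholds are all hypothetical and do not describe an existing toolbox.

```python
from typing import Callable, Dict, List, Sequence

Record = Dict[str, float]
# The shared interface: every stage maps a list of records to a list of records.
Stage = Callable[[List[Record]], List[Record]]


def drop_out_of_range(min_db: float, max_db: float) -> Stage:
    """Build a filter stage that removes implausible sound levels."""
    def stage(records: List[Record]) -> List[Record]:
        return [r for r in records if min_db <= r["db"] <= max_db]
    return stage


def add_feature(name: str, fn: Callable[[Record], float]) -> Stage:
    """Build an enrichment stage that adds one derived attribute."""
    def stage(records: List[Record]) -> List[Record]:
        return [{**r, name: fn(r)} for r in records]
    return stage


def run_pipeline(records: List[Record], stages: Sequence[Stage]) -> List[Record]:
    """Apply the stages in order, feeding each stage the previous output."""
    for stage in stages:
        records = stage(records)
    return records


measurements = [{"db": 55.0}, {"db": 150.0}, {"db": 72.0}]
result = run_pipeline(
    measurements,
    [
        drop_out_of_range(20.0, 120.0),
        add_feature("loud", lambda r: float(r["db"] >= 70.0)),
    ],
)
```

Because each stage only depends on the shared signature, a measurement filter written for one pipeline could be reused in the other without changes.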
What's special about sensor data? Preprocessing: heterogeneity, QoI. Real-world: classes, evaluation, transferability. Pipeline: share, learn, and standardize? More automation