Managing and mining (streaming) sensor data

Similar documents
An Adaptive Framework for Multistream Classification

Instructor: Dr. Mehmet Aktaş. Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University

Challenges in Ubiquitous Data Mining

On Biased Reservoir Sampling in the Presence of Stream Evolution

Epilog: Further Topics

Mining Data that Changes. 17 July 2015

Random Sampling over Data Streams for Sequential Pattern Mining

DATA STREAMS: MODELS AND ALGORITHMS

Introduction to Data Mining

Mining of Massive Datasets

Introduction to Data Mining

Thanks to the advances of data processing technologies, a lot of data can be collected and stored in databases efficiently New challenges: with a

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University

Summarizing and mining inverse distributions on data streams via dynamic inverse sampling

MapReduce: Algorithm Design for Relational Operations

9. Conclusions. 9.1 Definition KDD

Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest

CS246: Mining Massive Datasets Jure Leskovec, Stanford University.

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University

Frequent Pattern Mining in Data Streams. Raymond Martin

Big Data Analytics CSCI 4030

Case Study: Social Network Analysis. Part II

Deep Learning Photon Identification in a SuperGranular Calorimeter

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Lecture Notes to Big Data Management and Analytics Winter Term 2018/2019 Stream Processing

MULTI-MODAL MAPPING. Robotics Day, 31 Mar Frank Mascarich, Shehryar Khattak, Tung Dang

MULTIDIMENSIONAL INDEXING TREE STRUCTURE FOR SPATIAL DATABASE MANAGEMENT

60-538: Information Retrieval

Where We Are. Review: Parallel DBMS. Parallel DBMS. Introduction to Data Management CSE 344

Multiresolution Motif Discovery in Time Series

A Framework for Clustering Massive Text and Categorical Data Streams

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Clustering from Data Streams

PROOF-Condor integration for ATLAS

Massive Data Analysis

Sensor technology for mobile robots

Mining Frequent Itemsets for data streams over Weighted Sliding Windows

Chapter 1, Introduction

Knowledge Discovery and Data Mining

Chapter 5: Stream Processing. Big Data Management and Analytics 193

Important issues. Query the Sensor Network. Challenges. Challenges. In-network network data aggregation. Distributed In-network network Storage

Data Stream Mining. Tore Risch Dept. of information technology Uppsala University Sweden

Memory Models for Incremental Learning Architectures. Viktor Losing, Heiko Wersing and Barbara Hammer

Web Mining. Data Mining and Text Mining (UIC Politecnico di Milano) Daniele Loiacono

Nesnelerin İnternetinde Veri Analizi

H2020 Space Robotic SRC- OG4

Intelligent Embedded Systems

Web Mining. Data Mining and Text Mining (UIC Politecnico di Milano) Daniele Loiacono

DOI:: /ijarcsse/V7I1/0111

K-means based data stream clustering algorithm extended with no. of cluster estimation method

Web Mining. Data Mining and Text Mining (UIC Politecnico di Milano) Daniele Loiacono

EFFICIENT ADAPTIVE PREPROCESSING WITH DIMENSIONALITY REDUCTION FOR STREAMING DATA

Nesnelerin İnternetinde Veri Analizi

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Towards New Heterogeneous Data Stream Clustering based on Density

High-Dimensional Incremental Divisive Clustering under Population Drift

Large-scale Video Classification with Convolutional Neural Networks

Extended R-Tree Indexing Structure for Ensemble Stream Data Classification

JAVA Projects. 1. Enforcing Multitenancy for Cloud Computing Environments (IEEE 2012).

Nowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype?

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395

1 (eagle_eye) and Naeem Latif

Mining Data Streams. From Data-Streams Management System Queries to Knowledge Discovery from continuous and fast-evolving Data Records.

Batch-Incremental vs. Instance-Incremental Learning in Dynamic and Evolving Data

SCENARIO BASED ADAPTIVE PREPROCESSING FOR STREAM DATA USING SVM CLASSIFIER

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394

Introduction to Data Management CSE 344

Data Analytics with HPC. Data Streaming

Summary of the LHC Computing Review

Chapter 4. Adaptive Self-tuning : A Neural Network approach. 4.1 Introduction

Massive Data Algorithmics. Lecture 1: Introduction

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

Exploratory Analysis: Clustering

Locally Weighted Learning for Control. Alexander Skoglund Machine Learning Course AASS, June 2005

Uncertainties: Representation and Propagation & Line Extraction from Range data

Computer Vision at Cambridge: Reconstruction,Registration and Recognition

Data Mining Course Overview

scikit-learn (Machine Learning in Python)

Improving Packet Processing Performance of a Memory- Bounded Application

Tracker-based Peer Selection using ALTO Map Information

Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman

CS60021: Scalable Data Mining. Sourangshu Bhattacharya

3D Perception. CS 4495 Computer Vision K. Hawkins. CS 4495 Computer Vision. 3D Perception. Kelsey Hawkins Robotics

Unsupervised Learning

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Infinite data. Filtering data streams

gsketch: On Query Estimation in Graph Streams

Detecting malware even when it is encrypted

Challenges for Data Driven Systems

: Semantic Web (2013 Fall)

Feature Extraction in Wireless Personal and Local Area Networks

Machine Learning 13. week

Backpack-based inertial navigation and LiDAR mapping in forest environments

Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems

Next Steps in Data Mining. Sistemas de Apoio à Decisão Cláudia Antunes

Mining data streams. Irene Finocchi. finocchi/ Intro Puzzles

Aircraft Tracking Based on KLT Feature Tracker and Image Modeling

Neural Network based Energy-Efficient Fault Tolerant Architect

INTRODUCTION TO DATA MINING. Daniel Rodríguez, University of Alcalá

Classification of Concept-Drifting Data Streams using Optimized Genetic Algorithm

Transcription:

Petr Čížek Artificial Intelligence Center Czech Technical University in Prague November 3, 2016 Petr Čížek VPD 1 / 1

Stream data mining / stream data querying Problem definition Data can not be stored Data arrive in stream or streams Random access to data impossible or very expensive single scan algorithms Challenges Queries are continuous Pre-defined vs. ad-hoc queries Answer update over time anytime property Petr Čížek VPD 2 / 1

Practical example 1 - sensors Sensor MB/s GB/h Inertial Measurement Unit 0.1 0.3 Monocular camera (640x480@60fps MJPEG) 1.73 6.1 Monocular camera (640x480@60fps RAW) 17.5 63.2 Stereo camera (2x640x480@60fps RAW) 35 126.4 Velodyne 3D laser scanner 100 351 Petr Čížek VPD 3 / 1

Practical example 2 - institutions Institution GB/s CERN RAW data (sensors) 600000 RAW data processed 25 ALICE 4 ATLAS 1 CMS 0.6 LHCb 0.8 Network peer nodes NIX.cz 37 AMS-IX 500 Petr Čížek VPD 4 / 1

Comparison to traditional data mining Traditional Stream No. of passes Multiple Single Processing time Unlimited Restricted Memory usage Unlimited Restricted Type of result Accurate Approximate Concept Static Evolving Distributed No Yes Petr Čížek VPD 5 / 1

Stream data mining / data querying Applications Statistics, Classification, Clustering, Outlier (error) detection Challenges Pre-defined vs. ad-hoc queries Concept-drift Concept-evolution Feature-evolution Methods Random sampling Sketching Histograms Sliding windows (Fading windows) Multi resolution model (subsampling) Feature selection Petr Čížek VPD 6 / 1

Challenges - concept-drift Statistical properties of the target variable, which the model is trying to predict, evolve over time in unforeseen ways. Petr Čížek VPD 7 / 1

Challenges - concept-drift Statistical properties of the target variable, which the model is trying to predict, evolve over time in unforeseen ways. Petr Čížek VPD 8 / 1

Challenges - concept-drift Active solutions Activated by triggers Can be used by any classification algorithm Need to completely relearn the model when triggered E.g. n latest decisions are monitored Passive solutions Adaptive - continuously updating the model Don t detect changes Petr Čížek VPD 9 / 1

Challenges - concept-evolution Misclassification of novel class in data Petr Čížek VPD 10 / 1

Challenges - feature-evolution The features are evolving throughout the time Petr Čížek VPD 11 / 1

Methods for stream data processing - random sampling Subsample the data in randomized way. Save only 1/n samples randomly Law of large numbers assure probability completeness Petr Čížek VPD 12 / 1

Methods for stream data processing - sketching Extract frequency moments of the stream The kth frequency moment of a set of frequencies a is F k (a) = n i=1 ak i F 1 - total count of different frequencies F 2 - statistical properties - e.g. dispersion F - frequency of the most frequent items Petr Čížek VPD 13 / 1

Methods for stream data processing - histograms Types of histograms V-optimal v 1, v 2 v n classes i (v i ˆv i ) 2 Equal-width End-biased Petr Čížek VPD 14 / 1

Methods for stream data processing - sliding window Forgetting mechanism Sliding window Fading factor Petr Čížek VPD 15 / 1

Methods for stream data processing - multi resolution model Using decision trees Use part of stream for choosing the root attribute Following examples pass to leaves Advantage - scalable Disadvantage - only for stationary distribution Using context-drift aware decision trees Petr Čížek VPD 16 / 1

Methods for stream data processing - feature selection Features are to characterize a particular sample dimension reduction Artificial - (e.g. Image features) Learned - (e.g. using neural networks, reinforcement learning) Petr Čížek VPD 17 / 1

Mining data streams - research issues Mining sequential patterns Mining partial periodicity Mining notable gradients Mining outliers and unusual patterns Clustering Petr Čížek VPD 18 / 1

Thank you for your attention! Petr Čížek VPD 19 / 1

References Pier Luca Lanzi, Course "Machine Learning and Data Mining", Politecnico di Milano http://www.slideshare.net/pierluca.lanzi/18-data-streams Anand Rajaraman and Jeffrey D. Ullman, Mining of Massive Datasets, Cambridge University Press, 2011 Charu C. Aggarwal, Managing and Mining Sensor Data, Springer Science Business Media, 2013. Manuel Martín, Master s thesis: Handling concept drift in data stream mining, University of Granada http://www.slideshare.net/draxus/handling-concept-drift-in-data-stream-mining Petr Čížek VPD 20 / 1