Large-Scale Flight Phase Identification from ADS-B Data Using Machine Learning Methods


1 Large-Scale Flight Phase Identification from ADS-B Data Using Machine Learning Methods
Junzi Sun
PhD student, ATM Control and Simulation, Aerospace Engineering

2 Overview
Introduction to the research, ADS-B, and air traffic data
Flight extraction using machine learning
Flight phase determination
Results and conclusions

3 Introduction

4 Background / Introduction
Research group on Air Traffic Management, Aerospace Engineering, TU Delft
Aircraft performance modelling with open ADS-B data
Goal of this paper/research: developing applicable machine learning methods to process and mine ADS-B data

5 What is ADS-B?
Automatic Dependent Surveillance-Broadcast
Aircraft determine their position via a satellite navigation system (e.g. GPS)
Information is broadcast through the Mode S transponder at 1090 MHz
ADS-B data contain:
a. aircraft ICAO address
b. aircraft position (latitude, longitude) and altitude
c. aircraft velocity and heading
d. aircraft callsign
Constantly monitored by global receiver networks (e.g. FlightRadar24)

6 Open ADS-B Data
Advantages:
a. Large quantity of data
b. Open, unencrypted, and available to everyone
c. No limitation on usage and distribution
Each day from our receiver:
~ 3 thousand ICAOs
~ 12 million ADS-B messages
~ 5 million positions decoded
~ 5.5 million velocities decoded
~ 2.5 GB raw + compressed data
Challenges:
a. Arbitrary aircraft models and owners
b. Atmospheric conditions uncertainty
c. Wind uncertainty
d. Aircraft weight uncertainty
e. Thrust settings
A small global A320 dataset (3 days):
~ 60 million entries of data
~ 4 thousand ICAOs
~ 20 GB disk storage

7 ADS-B receiver and data service
1. Hardware setup
2. Python ADS-B library
a.
b. Decode position / velocity / ID
c. Aggregate position and velocity data
3. ADS-B decoding guide
a.
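
As a rough illustration of what decoding an ADS-B message involves, the sketch below splits out the fixed header fields of a 112-bit Mode S extended squitter given as 28 hex characters. It is not the library referred to on the slide: the function name `parse_adsb_header` and the sample message are assumptions chosen for illustration. It relies only on the standard message layout, in which the first 5 bits hold the downlink format (17 for ADS-B), bits 9-32 hold the ICAO address, and bits 33-37 hold the type code.

```python
def parse_adsb_header(msg_hex: str) -> dict:
    """Extract the header fields of a 112-bit ADS-B message (hypothetical helper)."""
    if len(msg_hex) != 28:
        raise ValueError("expected 28 hex characters (112 bits)")
    # Expand the hex string into a zero-padded 112-character bit string.
    bits = bin(int(msg_hex, 16))[2:].zfill(112)
    return {
        "df": int(bits[0:5], 2),           # downlink format, 17 for ADS-B
        "icao": msg_hex[2:8].upper(),      # 24-bit ICAO aircraft address
        "typecode": int(bits[32:37], 2),   # 1-4 identification, 9-18 position, 19 velocity
    }


# Sample message (an assumption used for illustration only).
header = parse_adsb_header("8D4840D6202CC371C32CE0576098")
print(header)
```

The type code decides which decoder applies next: identification messages carry the callsign, position messages carry encoded latitude, longitude, and altitude, and velocity messages carry speed and heading.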

8 Single ADS-B receiver

9 FlightRadar24
Our receiver is part of the FR24 network, which is built by contributors like us
Continuous tracking with ADS-B over a large coverage area
Example: 24 hours
63 million entries
30,000 airplanes
North America, Europe, and South Asia dominate the air traffic

10 Technical challenges
How to extract information from scattered ADS-B data?
How to deal with large variance in the dataset?
How to deal with the unknown number of aircraft contained in a dataset?
How to distinguish different flights carried out by the same aircraft?
How to segment trajectory data into separate phases?
How to maintain efficiency and scalability?

11 Data processing chain
1. ADS-B
2. Data pre-processing
3. Extraction of continuous trajectories, using unsupervised machine learning clustering
a.
4. Smoothing, filtering, and interpolating missing data (optional)
5. Flight phase segmentation

12 Flight Extraction

13 Step 1: data formatting
MongoDB data
External ADS-B data in different formats (e.g. FR24)

14 Preprocessing - encoding, feature scaling
1. Encoding: convert non-numerical labels to numerical data (ICAO, ICAO Index Models)
2. Scaling: project numerical data of different features onto the same range [0, 100]
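
The two preprocessing steps can be sketched in a few lines of plain Python. This is a minimal illustration under assumptions, not the authors' code: the helper names `encode_labels` and `scale`, the choice of min-max scaling, and the sample values are all hypothetical. Only the target range [0, 100] comes from the slide.

```python
def encode_labels(labels):
    """Map each distinct label to an integer index, in order of first appearance."""
    index = {}
    return [index.setdefault(label, len(index)) for label in labels]


def scale(values, lo=0.0, hi=100.0):
    """Min-max scale values linearly onto [lo, hi] (assumed scaling method)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # A constant feature carries no spread; map it to the lower bound.
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


# Made-up sample records: ICAO addresses and altitudes in feet.
icaos = ["4840D6", "3C6444", "4840D6", "A1B2C3"]
altitudes_ft = [0.0, 10000.0, 20000.0, 40000.0]

icao_idx = encode_labels(icaos)      # [0, 1, 0, 2]
alt_scaled = scale(altitudes_ft)     # [0.0, 25.0, 50.0, 100.0]
```

Bringing every feature onto the same range matters for distance-based clustering, because otherwise the feature with the largest numerical spread dominates the distance.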

15 Choosing a clustering method
Selection criteria:
1. Scalability: works with very large datasets
2. N-Clusters: non-fixed number of clusters
3. Geometry: distance between nearest points
4. Other: possibility of outlier removal and batch clustering
(scikit-learn.org)

16 Choosing a clustering method - K-Means
Idea is from H. Steinhaus (1957)
Centroid-based algorithm
Predefined number of clusters
Minimizes the sum of squared Euclidean distances from each data point to its cluster centroid
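
To make the centroid-based idea concrete, here is a minimal sketch of the standard K-Means iteration (Lloyd's algorithm) in plain Python. It is an illustration only: the function name `kmeans`, the use of the first k points as initial centroids, and the sample points are assumptions, and a production implementation would normally use random or k-means++ initialisation.

```python
def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; the first k points serve as initial centroids (a simplification)."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            sq = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[sq.index(min(sq))].append(p)
        # Update step: each centroid moves to the mean of its members.
        updated = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = updated
    return centroids, clusters


# Made-up 2-D points forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```

The need to pass `k` up front is the "predefined number of clusters" limitation noted on the slide: with an unknown number of flights in the data, there is no obvious value to choose.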

17 Choosing a clustering method - K-Means

18 Choosing a clustering method - BIRCH
T. Zhang et al. (1996)
BIRCH: balanced iterative reducing and clustering using hierarchies
Step 1: scan all data to build an initial CF (Clustering Feature) tree, where each leaf contains several sub-clusters
Threshold: the radius of each sub-cluster is limited by the distance T
Branching factor: each non-leaf node has at most B entries, and each leaf node has at most L CF entries
Step 2: condense the tree to the desired size by constructing a smaller CF tree
Step 3: global clustering
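
The building block of the CF tree is the clustering feature itself: a triple (N, LS, SS) holding the point count, the per-dimension linear sum, and the sum of squared norms. The sketch below shows why this summary is convenient: two features merge by simple addition, and the sub-cluster radius needed for the threshold test follows from the triple alone. It is a simplified illustration, not full BIRCH: sub-clusters are kept in a flat list rather than a tree, so the branching factor is not modelled, and the names `CF` and `insert` and the sample points are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class CF:
    n: int        # number of points summarised
    ls: tuple     # linear sum of the points, per dimension
    ss: float     # sum of squared norms of the points

    @classmethod
    def from_point(cls, p):
        return cls(1, tuple(p), sum(x * x for x in p))

    def merge(self, other):
        # Clustering features are additive, so no raw points need to be stored.
        return CF(self.n + other.n,
                  tuple(a + b for a, b in zip(self.ls, other.ls)),
                  self.ss + other.ss)

    def centroid(self):
        return tuple(x / self.n for x in self.ls)

    def radius(self):
        # Root-mean-square distance of the members from the centroid.
        c2 = sum(x * x for x in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))


def insert(entries, point, threshold):
    """Absorb point into the nearest sub-cluster if its radius stays within threshold."""
    new = CF.from_point(point)
    if entries:
        i = min(range(len(entries)),
                key=lambda j: math.dist(entries[j].centroid(), point))
        merged = entries[i].merge(new)
        if merged.radius() <= threshold:
            entries[i] = merged
            return
    entries.append(new)  # otherwise start a new sub-cluster


entries = []
for p in [(0, 0), (0, 1), (10, 10), (10, 11), (0.5, 0.5)]:
    insert(entries, p, threshold=1.0)
print([(e.n, e.centroid()) for e in entries])
```

Because every point is touched once and only the summaries are kept, this style of insertion is what lets BIRCH handle datasets that do not fit in memory.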

19 BIRCH (Build CF Tree)
Branching Factor (B, L) = 3

20 BIRCH (Build CF Tree)
Branching Factor (B, L) = 3

21 BIRCH (Build CF Tree)
Branching Factor (B, L) = 3

22 Choosing a clustering method - DBSCAN
M. Ester et al. (1996)
DBSCAN: density-based spatial clustering of applications with noise
Point types: core points, reachable points, and outliers
Key parameters:
Eps: maximum distance between two data points for them to count as neighbours
MinPts: minimum number of data samples in the neighbourhood of a core point
To form a cluster:
All points in a cluster are density-connected
A new point is part of the cluster if it is reachable from any core point of the cluster
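
The roles of Eps and MinPts are easiest to see in a small implementation. The following is a minimal, unoptimised sketch of DBSCAN in plain Python (quadratic neighbour search, no spatial index), intended only to illustrate the definitions. The function name `dbscan` and the sample data are assumptions: the points stand in for timestamps, in minutes, of messages from a single aircraft, where a long gap separates two flights.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for outliers."""
    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional outlier; may later become a border point
            continue
        cluster += 1        # point i is a core point and starts a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:
                queue.extend(nb)     # j is also a core point, so keep expanding
    return labels


# Made-up timestamps (minutes): two dense groups and one isolated message.
times = [(0,), (1,), (2,), (3,), (50,), (51,), (52,), (53,), (200,)]
labels = dbscan(times, eps=1.5, min_pts=3)
print(labels)
```

Unlike K-Means, the number of clusters is an output rather than an input, and the isolated message is labelled as an outlier instead of being forced into a cluster.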

23 Choosing a clustering method - DBSCAN

24 Results of BIRCH

25 Results of BIRCH

26 Results of BIRCH

27 Results of DBSCAN

28 Results of DBSCAN

29 Results of DBSCAN

30 Flight Phase Segmentation

31 Flight Phase Segmentation
Goal: to separate data into different phases
Challenges:
1. Data size
2. Uncertain behavior
a. duration
b. altitude
c. velocity
3. Accuracy of the algorithms

32 Design
Membership functions design:
Altitude: {high, low, ground}
RoC (rate of climb): {zero, positive, negative}
Ground speed: {high, medium, low}
Flight phases: {ground, climbing, descending, cruise}
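
One way such membership functions could be combined into a phase decision is sketched below. Everything specific here is an assumption made for illustration and is not taken from the slides: the piecewise-linear shapes, all breakpoints (feet, feet per minute, knots), the four rules, the use of `min` as the AND operator, and the helper names. Only the variable categories and the four phase labels come from the slide.

```python
def ramp_up(x, a, b):
    """0 below a, 1 above b, linear in between."""
    return min(max((x - a) / (b - a), 0.0), 1.0)


def ramp_down(x, a, b):
    """1 below a, 0 above b, linear in between."""
    return 1.0 - ramp_up(x, a, b)


def plateau(x, a, b, c, d):
    """Rises over [a, b], stays at 1 over [b, c], falls over [c, d]."""
    return min(ramp_up(x, a, b), ramp_down(x, c, d))


def phase_scores(alt_ft, roc_fpm, spd_kt):
    # Membership degrees for each input category (hypothetical breakpoints).
    alt = {"ground": ramp_down(alt_ft, 0, 200),
           "low": plateau(alt_ft, 0, 200, 25000, 35000),
           "high": ramp_up(alt_ft, 25000, 35000)}
    roc = {"zero": plateau(roc_fpm, -500, -100, 100, 500),
           "positive": ramp_up(roc_fpm, 100, 500),
           "negative": ramp_down(roc_fpm, -500, -100)}
    spd = {"low": ramp_down(spd_kt, 50, 150),
           "medium": plateau(spd_kt, 50, 150, 350, 450),
           "high": ramp_up(spd_kt, 350, 450)}
    # Hypothetical rules; AND is taken as the minimum of the memberships.
    return {
        "ground": min(alt["ground"], roc["zero"], spd["low"]),
        "climbing": min(alt["low"], roc["positive"], spd["medium"]),
        "descending": min(alt["low"], roc["negative"], spd["medium"]),
        "cruise": min(alt["high"], roc["zero"], spd["high"]),
    }


def classify(alt_ft, roc_fpm, spd_kt):
    """Pick the phase whose rule fires most strongly."""
    scores = phase_scores(alt_ft, roc_fpm, spd_kt)
    return max(scores, key=scores.get)


print(classify(10000, 2000, 300))
```

Because each input belongs to its categories by degree rather than by a hard cut-off, a small amount of noise in altitude or rate of climb shifts the scores gradually instead of flipping the phase label outright.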

33 Segmentation Flow

34 Segmentation result (Example)

35 Conclusion

36 Conclusion

37 Conclusion
1. The chosen methods have shown promising results.
a. Clustering (DBSCAN, BIRCH)
i. Large size of dataset
ii. Unknown number of aircraft and flights
b. Flight phase segmentation ( )
i. Different profiles
ii. Advantage over noisy trajectory data
2. Limitations:
a. Lack of stream data clustering
b. Offline use only
3. Open Python libraries / code:
a.
b.

38 [github]

Large-Scale Flight Phase Identification from ADS-B Data Using Machine Learning Methods

Large-Scale Flight Phase Identification from ADS-B Data Using Machine Learning Methods Large-Scale Flight Phase Identification from ADS-B Data Using Machine Learning Methods Junzi Sun, Joost Ellerbroek, Jacco Hoekstra Control and Simulation, Faculty of Aerospace Engineering Delft University

More information

Modeling Aircraft Performance Parameters with Open ADS-B Data

Modeling Aircraft Performance Parameters with Open ADS-B Data Twelfth USA/Europe Air Traffic Management Research and Development Seminar Modeling Aircraft Performance Parameters with Open ADS-B Data Junzi Sun, Joost Ellerbroek, Jacco Hoekstra Delft University of

More information

Clustering Part 4 DBSCAN

Clustering Part 4 DBSCAN Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of

More information

Unsupervised Learning Hierarchical Methods

Unsupervised Learning Hierarchical Methods Unsupervised Learning Hierarchical Methods Road Map. Basic Concepts 2. BIRCH 3. ROCK The Principle Group data objects into a tree of clusters Hierarchical methods can be Agglomerative: bottom-up approach

More information

University of Florida CISE department Gator Engineering. Clustering Part 4

University of Florida CISE department Gator Engineering. Clustering Part 4 Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of

More information

Clustering part II 1

Clustering part II 1 Clustering part II 1 Clustering What is Cluster Analysis? Types of Data in Cluster Analysis A Categorization of Major Clustering Methods Partitioning Methods Hierarchical Methods 2 Partitioning Algorithms:

More information

Density-Based Clustering. Izabela Moise, Evangelos Pournaras

Density-Based Clustering. Izabela Moise, Evangelos Pournaras Density-Based Clustering Izabela Moise, Evangelos Pournaras Izabela Moise, Evangelos Pournaras 1 Reminder Unsupervised data mining Clustering k-means Izabela Moise, Evangelos Pournaras 2 Main Clustering

More information

Notes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/10/2017)

Notes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/10/2017) 1 Notes Reminder: HW2 Due Today by 11:59PM TA s note: Please provide a detailed ReadMe.txt file on how to run the program on the STDLINUX. If you installed/upgraded any package on STDLINUX, you should

More information

PAM algorithm. Types of Data in Cluster Analysis. A Categorization of Major Clustering Methods. Partitioning i Methods. Hierarchical Methods

PAM algorithm. Types of Data in Cluster Analysis. A Categorization of Major Clustering Methods. Partitioning i Methods. Hierarchical Methods Whatis Cluster Analysis? Clustering Types of Data in Cluster Analysis Clustering part II A Categorization of Major Clustering Methods Partitioning i Methods Hierarchical Methods Partitioning i i Algorithms:

More information

Clustering in Data Mining

Clustering in Data Mining Clustering in Data Mining Classification Vs Clustering When the distribution is based on a single parameter and that parameter is known for each object, it is called classification. E.g. Children, young,

More information

DBSCAN. Presented by: Garrett Poppe

DBSCAN. Presented by: Garrett Poppe DBSCAN Presented by: Garrett Poppe A density-based algorithm for discovering clusters in large spatial databases with noise by Martin Ester, Hans-peter Kriegel, Jörg S, Xiaowei Xu Slides adapted from resources

More information

Clustering to Reduce Spatial Data Set Size

Clustering to Reduce Spatial Data Set Size Clustering to Reduce Spatial Data Set Size Geoff Boeing arxiv:1803.08101v1 [cs.lg] 21 Mar 2018 1 Introduction Department of City and Regional Planning University of California, Berkeley March 2018 Traditionally

More information

Density-based clustering algorithms DBSCAN and SNN

Density-based clustering algorithms DBSCAN and SNN Density-based clustering algorithms DBSCAN and SNN Version 1.0, 25.07.2005 Adriano Moreira, Maribel Y. Santos and Sofia Carneiro {adriano, maribel, sofia}@dsi.uminho.pt University of Minho - Portugal 1.

More information

Data Mining 4. Cluster Analysis

Data Mining 4. Cluster Analysis Data Mining 4. Cluster Analysis 4.5 Spring 2010 Instructor: Dr. Masoud Yaghini Introduction DBSCAN Algorithm OPTICS Algorithm DENCLUE Algorithm References Outline Introduction Introduction Density-based

More information

A Review on Cluster Based Approach in Data Mining

A Review on Cluster Based Approach in Data Mining A Review on Cluster Based Approach in Data Mining M. Vijaya Maheswari PhD Research Scholar, Department of Computer Science Karpagam University Coimbatore, Tamilnadu,India Dr T. Christopher Assistant professor,

More information

CS570: Introduction to Data Mining

CS570: Introduction to Data Mining CS570: Introduction to Data Mining Scalable Clustering Methods: BIRCH and Others Reading: Chapter 10.3 Han, Chapter 9.5 Tan Cengiz Gunay, Ph.D. Slides courtesy of Li Xiong, Ph.D., 2011 Han, Kamber & Pei.

More information

Big Data SONY Håkan Jonsson Vedran Sekara

Big Data SONY Håkan Jonsson Vedran Sekara Big Data 2016 - SONY Håkan Jonsson Vedran Sekara Schedule 09:15-10:00 Cluster analysis, partition-based clustering 10.00 10.15 Break 10:15 12:00 Exercise 1: User segmentation based on app usage 12:00-13:15

More information

Data Reduction and Partitioning in an Extreme Scale GPU-Based Clustering Algorithm

Data Reduction and Partitioning in an Extreme Scale GPU-Based Clustering Algorithm Data Reduction and Partitioning in an Extreme Scale GPU-Based Clustering Algorithm Benjamin Welton and Barton Miller Paradyn Project University of Wisconsin - Madison DRBSD-2 Workshop November 17 th 2017

More information

Wake Vortex Tangential Velocity Adaptive Spectral (TVAS) Algorithm for Pulsed Lidar Systems

Wake Vortex Tangential Velocity Adaptive Spectral (TVAS) Algorithm for Pulsed Lidar Systems Wake Vortex Tangential Velocity Adaptive Spectral (TVAS) Algorithm for Pulsed Lidar Systems Hadi Wassaf David Burnham Frank Wang Communication, Navigation, Surveillance (CNS) and Traffic Management Systems

More information

CHAPTER 4: CLUSTER ANALYSIS

CHAPTER 4: CLUSTER ANALYSIS CHAPTER 4: CLUSTER ANALYSIS WHAT IS CLUSTER ANALYSIS? A cluster is a collection of data-objects similar to one another within the same group & dissimilar to the objects in other groups. Cluster analysis

More information

Data Mining Algorithms

Data Mining Algorithms for the original version: -JörgSander and Martin Ester - Jiawei Han and Micheline Kamber Data Management and Exploration Prof. Dr. Thomas Seidl Data Mining Algorithms Lecture Course with Tutorials Wintersemester

More information

Clustering Lecture 4: Density-based Methods

Clustering Lecture 4: Density-based Methods Clustering Lecture 4: Density-based Methods Jing Gao SUNY Buffalo 1 Outline Basics Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Mixture model Spectral methods Advanced

More information

Data Mining. Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology Department of Computer Science

Data Mining. Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology Department of Computer Science Data Mining Dr. Raed Ibraheem Hamed University of Human Development, College of Science and Technology Department of Computer Science 2016 201 Road map What is Cluster Analysis? Characteristics of Clustering

More information

Clustering Algorithm (DBSCAN) VISHAL BHARTI Computer Science Dept. GC, CUNY

Clustering Algorithm (DBSCAN) VISHAL BHARTI Computer Science Dept. GC, CUNY Clustering Algorithm (DBSCAN) VISHAL BHARTI Computer Science Dept. GC, CUNY Clustering Algorithm Clustering is an unsupervised machine learning algorithm that divides a data into meaningful sub-groups,

More information

Lesson 3. Prof. Enza Messina

Lesson 3. Prof. Enza Messina Lesson 3 Prof. Enza Messina Clustering techniques are generally classified into these classes: PARTITIONING ALGORITHMS Directly divides data points into some prespecified number of clusters without a hierarchical

More information

CHAPTER 7. PAPER 3: EFFICIENT HIERARCHICAL CLUSTERING OF LARGE DATA SETS USING P-TREES

CHAPTER 7. PAPER 3: EFFICIENT HIERARCHICAL CLUSTERING OF LARGE DATA SETS USING P-TREES CHAPTER 7. PAPER 3: EFFICIENT HIERARCHICAL CLUSTERING OF LARGE DATA SETS USING P-TREES 7.1. Abstract Hierarchical clustering methods have attracted much attention by giving the user a maximum amount of

More information

CHAPTER 4 AN IMPROVED INITIALIZATION METHOD FOR FUZZY C-MEANS CLUSTERING USING DENSITY BASED APPROACH

CHAPTER 4 AN IMPROVED INITIALIZATION METHOD FOR FUZZY C-MEANS CLUSTERING USING DENSITY BASED APPROACH 37 CHAPTER 4 AN IMPROVED INITIALIZATION METHOD FOR FUZZY C-MEANS CLUSTERING USING DENSITY BASED APPROACH 4.1 INTRODUCTION Genes can belong to any genetic network and are also coordinated by many regulatory

More information

Clustering Techniques

Clustering Techniques Clustering Techniques Marco BOTTA Dipartimento di Informatica Università di Torino botta@di.unito.it www.di.unito.it/~botta/didattica/clustering.html Data Clustering Outline What is cluster analysis? What

More information

Working with Unlabeled Data Clustering Analysis. Hsiao-Lung Chan Dept Electrical Engineering Chang Gung University, Taiwan

Working with Unlabeled Data Clustering Analysis. Hsiao-Lung Chan Dept Electrical Engineering Chang Gung University, Taiwan Working with Unlabeled Data Clustering Analysis Hsiao-Lung Chan Dept Electrical Engineering Chang Gung University, Taiwan chanhl@mail.cgu.edu.tw Unsupervised learning Finding centers of similarity using

More information

A Comparative Study of Various Clustering Algorithms in Data Mining

A Comparative Study of Various Clustering Algorithms in Data Mining Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 6.017 IJCSMC,

More information

DS504/CS586: Big Data Analytics Big Data Clustering II

DS504/CS586: Big Data Analytics Big Data Clustering II Welcome to DS504/CS586: Big Data Analytics Big Data Clustering II Prof. Yanhua Li Time: 6pm 8:50pm Thu Location: KH 116 Fall 2017 Updates: v Progress Presentation: Week 15: 11/30 v Next Week Office hours

More information

DS504/CS586: Big Data Analytics Big Data Clustering II

DS504/CS586: Big Data Analytics Big Data Clustering II Welcome to DS504/CS586: Big Data Analytics Big Data Clustering II Prof. Yanhua Li Time: 6pm 8:50pm Thu Location: AK 232 Fall 2016 More Discussions, Limitations v Center based clustering K-means BFR algorithm

More information

Introduction to Trajectory Clustering. By YONGLI ZHANG

Introduction to Trajectory Clustering. By YONGLI ZHANG Introduction to Trajectory Clustering By YONGLI ZHANG Outline 1. Problem Definition 2. Clustering Methods for Trajectory data 3. Model-based Trajectory Clustering 4. Applications 5. Conclusions 1 Problem

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/28/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.

More information

Faster Simulations of the National Airspace System

Faster Simulations of the National Airspace System Faster Simulations of the National Airspace System PK Menon Monish Tandale Sandy Wiraatmadja Optimal Synthesis Inc. Joseph Rios NASA Ames Research Center NVIDIA GPU Technology Conference 2010, San Jose,

More information

Clustering in Ratemaking: Applications in Territories Clustering

Clustering in Ratemaking: Applications in Territories Clustering Clustering in Ratemaking: Applications in Territories Clustering Ji Yao, PhD FIA ASTIN 13th-16th July 2008 INTRODUCTION Structure of talk Quickly introduce clustering and its application in insurance ratemaking

More information

What is Cluster Analysis? COMP 465: Data Mining Clustering Basics. Applications of Cluster Analysis. Clustering: Application Examples 3/17/2015

What is Cluster Analysis? COMP 465: Data Mining Clustering Basics. Applications of Cluster Analysis. Clustering: Application Examples 3/17/2015 // What is Cluster Analysis? COMP : Data Mining Clustering Basics Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, rd ed. Cluster: A collection of data

More information

Clustering Algorithms for Data Stream

Clustering Algorithms for Data Stream Clustering Algorithms for Data Stream Karishma Nadhe 1, Prof. P. M. Chawan 2 1Student, Dept of CS & IT, VJTI Mumbai, Maharashtra, India 2Professor, Dept of CS & IT, VJTI Mumbai, Maharashtra, India Abstract:

More information

Sandy Brownlee 1, Jason Atkin 2, John Woodward 1, Una Benlic 1 & Edmund Burke 1

Sandy Brownlee 1, Jason Atkin 2, John Woodward 1, Una Benlic 1 & Edmund Burke 1 Sandy Brownlee 1, Jason Atkin 2, John Woodward 1, Una Benlic 1 & Edmund Burke 1 1 CHORDS Group, University of Stirling 2 ASAP Group, University of Nottingham Outline The ground movement problem Introduction

More information

Clustering CS 550: Machine Learning

Clustering CS 550: Machine Learning Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf

More information

Unsupervised Learning

Unsupervised Learning Outline Unsupervised Learning Basic concepts K-means algorithm Representation of clusters Hierarchical clustering Distance functions Which clustering algorithm to use? NN Supervised learning vs. unsupervised

More information

MULTIVARIATE ANALYSIS OF STEALTH QUANTITATES (MASQ)

MULTIVARIATE ANALYSIS OF STEALTH QUANTITATES (MASQ) MULTIVARIATE ANALYSIS OF STEALTH QUANTITATES (MASQ) Application of Machine Learning to Testing in Finance, Cyber, and Software Innovation center, Washington, D.C. THE SCIENCE OF TEST WORKSHOP 2017 AGENDA

More information

Publishing CitiSense Data: Privacy Concerns and Remedies

Publishing CitiSense Data: Privacy Concerns and Remedies Publishing CitiSense Data: Privacy Concerns and Remedies Kapil Gupta Advisor : Prof. Bill Griswold 1 Location Based Services Great utility of location based services data traffic control, mobility management,

More information

Hierarchy. No or very little supervision Some heuristic quality guidances on the quality of the hierarchy. Jian Pei: CMPT 459/741 Clustering (2) 1

Hierarchy. No or very little supervision Some heuristic quality guidances on the quality of the hierarchy. Jian Pei: CMPT 459/741 Clustering (2) 1 Hierarchy An arrangement or classification of things according to inclusiveness A natural way of abstraction, summarization, compression, and simplification for understanding Typical setting: organize

More information

Chapter 8: GPS Clustering and Analytics

Chapter 8: GPS Clustering and Analytics Chapter 8: GPS Clustering and Analytics Location information is crucial for analyzing sensor data and health inferences from mobile and wearable devices. For example, let us say you monitored your stress

More information

Unsupervised learning on Color Images

Unsupervised learning on Color Images Unsupervised learning on Color Images Sindhuja Vakkalagadda 1, Prasanthi Dhavala 2 1 Computer Science and Systems Engineering, Andhra University, AP, India 2 Computer Science and Systems Engineering, Andhra

More information

Unsupervised Learning. Andrea G. B. Tettamanzi I3S Laboratory SPARKS Team

Unsupervised Learning. Andrea G. B. Tettamanzi I3S Laboratory SPARKS Team Unsupervised Learning Andrea G. B. Tettamanzi I3S Laboratory SPARKS Team Table of Contents 1)Clustering: Introduction and Basic Concepts 2)An Overview of Popular Clustering Methods 3)Other Unsupervised

More information

Geometric Rectification of Remote Sensing Images

Geometric Rectification of Remote Sensing Images Geometric Rectification of Remote Sensing Images Airborne TerrestriaL Applications Sensor (ATLAS) Nine flight paths were recorded over the city of Providence. 1 True color ATLAS image (bands 4, 2, 1 in

More information

OSM-SVG Converting for Open Road Simulator

OSM-SVG Converting for Open Road Simulator OSM-SVG Converting for Open Road Simulator Rajashree S. Sokasane, Kyungbaek Kim Department of Electronics and Computer Engineering Chonnam National University Gwangju, Republic of Korea sokasaners@gmail.com,

More information

Clustering from Data Streams

Clustering from Data Streams Clustering from Data Streams João Gama LIAAD-INESC Porto, University of Porto, Portugal jgama@fep.up.pt 1 Introduction 2 Clustering Micro Clustering 3 Clustering Time Series Growing the Structure Adapting

More information

Distance-based Methods: Drawbacks

Distance-based Methods: Drawbacks Distance-based Methods: Drawbacks Hard to find clusters with irregular shapes Hard to specify the number of clusters Heuristic: a cluster must be dense Jian Pei: CMPT 459/741 Clustering (3) 1 How to Find

More information

CS412 Homework #3 Answer Set

CS412 Homework #3 Answer Set CS41 Homework #3 Answer Set December 1, 006 Q1. (6 points) (1) (3 points) Suppose that a transaction datase DB is partitioned into DB 1,..., DB p. The outline of a distributed algorithm is as follows.

More information

A Survey on Clustering Algorithms for Data in Spatial Database Management Systems

A Survey on Clustering Algorithms for Data in Spatial Database Management Systems A Survey on Algorithms for Data in Spatial Database Management Systems Dr.Chandra.E Director Department of Computer Science DJ Academy for Managerial Excellence Coimbatore, India Anuradha.V.P Research

More information

Framework of Frequently Trajectory Extraction from AIS Data

Framework of Frequently Trajectory Extraction from AIS Data Framework of Frequently Trajectory Extraction from AIS Data 1 State Key Laboratory of Networking and Switching Technology,Beijing University of Posts and Telecommunications Beijing,100876, China E-mail:

More information

Chapter 4: Text Clustering

Chapter 4: Text Clustering 4.1 Introduction to Text Clustering Clustering is an unsupervised method of grouping texts / documents in such a way that in spite of having little knowledge about the content of the documents, we can

More information

Towards New Heterogeneous Data Stream Clustering based on Density

Towards New Heterogeneous Data Stream Clustering based on Density , pp.30-35 http://dx.doi.org/10.14257/astl.2015.83.07 Towards New Heterogeneous Data Stream Clustering based on Density Chen Jin-yin, He Hui-hao Zhejiang University of Technology, Hangzhou,310000 chenjinyin@zjut.edu.cn

More information

Data Mining. Jeff M. Phillips. January 7, 2019 CS 5140 / CS 6140

Data Mining. Jeff M. Phillips. January 7, 2019 CS 5140 / CS 6140 Data Mining CS 5140 / CS 6140 Jeff M. Phillips January 7, 2019 What is Data Mining? What is Data Mining? Finding structure in data? Machine learning on large data? Unsupervised learning? Large scale computational

More information

Tree-Based Density Clustering using Graphics Processors

Tree-Based Density Clustering using Graphics Processors Tree-Based Density Clustering using Graphics Processors A First Marriage of MRNet and GPUs Evan Samanas and Ben Welton Paradyn Project Paradyn / Dyninst Week College Park, Maryland March 26-28, 2012 The

More information

Mobility Data Management & Exploration

Mobility Data Management & Exploration Mobility Data Management & Exploration Ch. 07. Mobility Data Mining and Knowledge Discovery Nikos Pelekis & Yannis Theodoridis InfoLab University of Piraeus Greece infolab.cs.unipi.gr v.2014.05 Chapter

More information

Contents. Part I Setting the Scene

Contents. Part I Setting the Scene Contents Part I Setting the Scene 1 Introduction... 3 1.1 About Mobility Data... 3 1.1.1 Global Positioning System (GPS)... 5 1.1.2 Format of GPS Data... 6 1.1.3 Examples of Trajectory Datasets... 8 1.2

More information

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Database and Knowledge-Base Systems: Data Mining. Martin Ester Database and Knowledge-Base Systems: Data Mining Martin Ester Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro

More information

Data Mining. Jeff M. Phillips. January 8, 2014

Data Mining. Jeff M. Phillips. January 8, 2014 Data Mining Jeff M. Phillips January 8, 2014 Data Mining What is Data Mining? Finding structure in data? Machine learning on large data? Unsupervised learning? Large scale computational statistics? Data

More information

Data Collection, Preprocessing and Implementation

Data Collection, Preprocessing and Implementation Chapter 6 Data Collection, Preprocessing and Implementation 6.1 Introduction Data collection is the loosely controlled method of gathering the data. Such data are mostly out of range, impossible data combinations,

More information

Data Mining Course Overview

Data Mining Course Overview Data Mining Course Overview 1 Data Mining Overview Understanding Data Classification: Decision Trees and Bayesian classifiers, ANN, SVM Association Rules Mining: APriori, FP-growth Clustering: Hierarchical

More information

Notes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/09/2018)

Notes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/09/2018) 1 Notes Reminder: HW2 Due Today by 11:59PM TA s note: Please provide a detailed ReadMe.txt file on how to run the program on the STDLINUX. If you installed/upgraded any package on STDLINUX, you should

More information

CS145: INTRODUCTION TO DATA MINING

CS145: INTRODUCTION TO DATA MINING CS145: INTRODUCTION TO DATA MINING 09: Vector Data: Clustering Basics Instructor: Yizhou Sun yzsun@cs.ucla.edu October 27, 2017 Methods to Learn Vector Data Set Data Sequence Data Text Data Classification

More information

DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li

DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: KH 116 Fall 2017 First Grading for Reading Assignment Weka v 6 weeks v https://weka.waikato.ac.nz/dataminingwithweka/preview

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,

More information

Lecture 7 Cluster Analysis: Part A

Lecture 7 Cluster Analysis: Part A Lecture 7 Cluster Analysis: Part A Zhou Shuigeng May 7, 2007 2007-6-23 Data Mining: Tech. & Appl. 1 Outline What is Cluster Analysis? Types of Data in Cluster Analysis A Categorization of Major Clustering

More information

Physical Interface. Interface Priority. MAVLink documentation can be found at:

Physical Interface. Interface Priority. MAVLink documentation can be found at: MAVLink documentation can be found at: http://qgroundcontrol.org/mavlink/start Physical Interface Communication to a uavionix transponder is accomplished through a full duplex asynchronous serial interface.

More information
