An Empirical Comparison of Stream Clustering Algorithms

1 An Empirical Comparison of Stream Clustering Algorithms
Matthias Carnein, Dennis Assenmacher, Heike Trautmann
Westfälische Wilhelms-Universität Münster
CF '17 BigDAW Workshop, Siena, Italy, May

2 Clustering
Clustering aims to discover groups of similar objects.
Traditional algorithms assume a fixed set of data.
If new data arrives: repeat the entire procedure.

3 Stream Clustering
(Figure: data stream)
Stream Model
1. Data points arrive as a continuous stream.
2. The size of the stream is large and potentially unbounded.
3. Data points can only be evaluated once.
4. The order of data points cannot be influenced.
5. The distribution of the data can change over time.

4 Outline
1. Stream Clustering
2. Setup
3. Results
4. Conclusion

6 Two-Stage Clustering
Online: the data stream is summarized into micro-clusters.
Offline: reclustering combines the micro-clusters into macro-clusters [Aggarwal et al. 2003].
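The two stages can be illustrated with a minimal Python sketch. It is not the procedure of any algorithm in the comparison: the function names, the `radius` and `merge_dist` parameters and the single-linkage reclustering are assumptions chosen only to show how an online summary feeds an offline step.

```python
import math

def online_update(micro, point, radius):
    """Online stage: absorb `point` into the nearest micro-cluster
    within `radius`, otherwise open a new micro-cluster."""
    best, best_d = None, radius
    for mc in micro:
        d = math.dist(mc["center"], point)
        if d <= best_d:
            best, best_d = mc, d
    if best is None:
        micro.append({"center": list(point), "weight": 1})
        return
    best["weight"] += 1
    w = best["weight"]
    # Incremental mean keeps the summary without storing the points.
    best["center"] = [c + (x - c) / w for c, x in zip(best["center"], point)]

def offline_recluster(micro, merge_dist):
    """Offline stage: group micro-clusters whose centers lie within
    `merge_dist` of each other (single linkage via union-find)."""
    parent = list(range(len(micro)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(micro)):
        for j in range(i + 1, len(micro)):
            if math.dist(micro[i]["center"], micro[j]["center"]) <= merge_dist:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(micro))]

# Hypothetical toy stream: each point is seen exactly once.
stream = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.6, 0.0), (5.1, 5.0)]
micro = []
for p in stream:
    online_update(micro, p, radius=0.3)
labels = offline_recluster(micro, merge_dist=1.0)
```

Only the micro-clusters are kept in memory, so the offline step can be repeated on demand without revisiting the stream.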

7 Distance-Based
BIRCH [Zhang et al. 1996]
STREAM [O'Callaghan et al. 2002]
CluStream [Aggarwal et al. 2003]
DenStream [Cao et al. 2006]
ClusTree [Kranen et al. 2009]
StreamKM++ [Ackermann et al. 2012]
BICO [Fichtenberger et al. 2013]
DBSTREAM [Hahsler et al. 2016]
Grid-Based
D-Stream [Chen et al. 2007]
D-Stream + attraction [Tu et al. 2009]
Challenges
Many parameters
No clear guidelines how to set them

8 DenStream
Clustering Feature CF = (n, LS, SS) describes a micro-cluster:
n: number of points
LS: linear sum per dimension
SS: squared sum per dimension
Decay the weight of clusters over time: $\omega(t) = 2^{-\lambda t}$
Core clusters can decay to outlier clusters.
(Figure: outlier micro-cluster and core micro-cluster, attempts 1 to 3)
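A decaying clustering feature can be sketched in Python as follows. The class name `DecayingCF`, its methods and the choice to fade all three components by the same factor are assumptions for illustration, not the reference DenStream implementation.

```python
import math

class DecayingCF:
    """Clustering feature (n, LS, SS) whose components fade by 2^(-lam * dt)."""

    def __init__(self, dim, lam):
        self.lam = lam            # decay rate lambda
        self.n = 0.0              # (decayed) number of points
        self.ls = [0.0] * dim     # linear sum per dimension
        self.ss = [0.0] * dim     # squared sum per dimension
        self.last = 0.0           # time of the last update

    def fade(self, now):
        # Apply the decay for the time elapsed since the last update.
        factor = 2.0 ** (-self.lam * (now - self.last))
        self.n *= factor
        self.ls = [v * factor for v in self.ls]
        self.ss = [v * factor for v in self.ss]
        self.last = now

    def insert(self, point, now):
        self.fade(now)
        self.n += 1.0
        self.ls = [s + x for s, x in zip(self.ls, point)]
        self.ss = [s + x * x for s, x in zip(self.ss, point)]

    def center(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # Root of the summed per-dimension variance, clamped at zero.
        var = sum(q / self.n - (s / self.n) ** 2
                  for s, q in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))

cf = DecayingCF(dim=2, lam=1.0)
cf.insert((1.0, 2.0), now=0.0)
cf.insert((3.0, 2.0), now=1.0)   # the first point now counts only half
weight_after_insert = cf.n
center = cf.center()
cf.fade(now=2.0)                 # no new points: the weight keeps shrinking
weight_later = cf.n
```

Comparing the decayed weight `n` against a threshold is one way a core micro-cluster could be demoted to an outlier micro-cluster once no new points arrive.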

9 D-Stream
Maintains density of grid cells
Reclustering: combine neighbouring dense cells
Extension using attraction: only combine cells that share points at the border
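The grid idea can be sketched in Python under simplifying assumptions: densities are plain counts without decay, cells are neighbours when they differ by one step in exactly one dimension, and the attraction extension is left out. The names `grid_cell`, `recluster`, `size` and `threshold` are hypothetical.

```python
from collections import defaultdict

def grid_cell(point, size):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(int(x // size) for x in point)

def recluster(density, threshold):
    """Label dense cells so that neighbouring dense cells share a label."""
    dense = {c for c, d in density.items() if d >= threshold}
    labels = {}
    next_label = 0
    for start in sorted(dense):
        if start in labels:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            cell = stack.pop()
            for dim in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]
                    if nb in dense and nb not in labels:
                        labels[nb] = next_label
                        stack.append(nb)
        next_label += 1
    return labels

# Hypothetical toy stream; only the per-cell counts are stored.
points = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.4, 0.4),
          (5.2, 5.1), (5.3, 5.4), (9.0, 9.0)]
density = defaultdict(int)
for p in points:
    density[grid_cell(p, size=1.0)] += 1
labels = recluster(density, threshold=2)
```

Cells below the threshold, such as the one holding the single point at (9.0, 9.0), receive no label and are treated as noise in this sketch.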

10 DBSTREAM
Assign points based on distance threshold
Move clusters towards new points (competitive learning)
Maintains shared density between micro-clusters
Reclustering: merge clusters with high shared density
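The four steps above can be sketched in Python as follows. This is a heavily simplified illustration, not the published algorithm: decay is omitted, the learning rate `eta` and the connectivity rule (shared count divided by the mean weight of the pair, compared against `alpha`) are assumptions, and all names are hypothetical.

```python
import math
from collections import defaultdict

def dbstream_update(centers, weights, shared, point, r, eta=0.1):
    """Assign `point` to every micro-cluster within radius `r`,
    move those centers towards it, and count shared assignments."""
    hits = [i for i, c in enumerate(centers) if math.dist(c, point) <= r]
    if not hits:
        centers.append(list(point))
        weights.append(1.0)
        return
    for i in hits:
        weights[i] += 1.0
        # Competitive learning: shift the center a fraction eta towards the point.
        centers[i] = [c + eta * (x - c) for c, x in zip(centers[i], point)]
    for a in range(len(hits)):
        for b in range(a + 1, len(hits)):
            shared[(hits[a], hits[b])] += 1.0

def connected_pairs(weights, shared, alpha):
    """Pairs whose shared density is high relative to their mean weight."""
    return [(i, j) for (i, j), s in shared.items()
            if s / ((weights[i] + weights[j]) / 2.0) >= alpha]

centers, weights, shared = [], [], defaultdict(float)
for p in [(0.0, 0.0), (1.5, 0.0), (0.75, 0.0), (0.8, 0.0), (10.0, 10.0)]:
    dbstream_update(centers, weights, shared, p, r=1.0)
pairs = connected_pairs(weights, shared, alpha=0.5)
```

Because centers move as points arrive, a different insertion order can produce different micro-clusters even in this toy version.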

11 Remaining Algorithms
BIRCH: maintain Clustering Features in a tree structure
STREAM: repeatedly apply k-median approximation on chunks of data
CluStream: store latest snapshots for various time intervals to extract a portion of the entire stream
StreamKM++: divisive clustering approach using medoids
BICO: similar tree structure as BIRCH but with medoids similar to StreamKM++
Research Goal
Fair evaluation and comparison of ten popular stream clustering algorithms

13 Data sets
Data set      Description
square1       Spherical
spiral        Connected
complex9      Various shapes
t8.8k         Shapes + Noise
KDD Cup'99    Network Intrusion
Sensor        Environment sensors
Covertype     Forest type
Further columns on the slide: Observations, Dimension, Cluster, Noise (values not recoverable from the transcription)

14 Evaluation
Static data: evaluate clusters at the end of the stream
Real-world data: prequential evaluation [Cao et al. 2006]
1. Evaluate 2. Update 3. Evaluate 4. Update 5. Evaluate 6. Update
External Evaluation
Micro-clusters: Purity. Proportion of points that belong to the majority class in a cluster
Macro-clusters: Adjusted Rand Index (ARI). Similarity between true groups and generated clusters
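Both external measures can be computed from label lists alone. The following Python sketch uses the standard contingency-table form of the ARI; the function names and the toy labels are illustrative assumptions and are not taken from the evaluation code behind the slides.

```python
from collections import Counter
from math import comb

def purity(true_labels, cluster_labels):
    """Share of points that belong to the majority class of their cluster."""
    per_cluster = {}
    for t, c in zip(true_labels, cluster_labels):
        per_cluster.setdefault(c, Counter())[t] += 1
    majority = sum(max(counts.values()) for counts in per_cluster.values())
    return majority / len(true_labels)

def adjusted_rand_index(true_labels, cluster_labels):
    """Rand index corrected for chance, computed from pair counts."""
    pairs = Counter(zip(true_labels, cluster_labels))
    index = sum(comb(v, 2) for v in pairs.values())
    a = sum(comb(v, 2) for v in Counter(true_labels).values())
    b = sum(comb(v, 2) for v in Counter(cluster_labels).values())
    expected = a * b / comb(len(true_labels), 2)
    maximum = (a + b) / 2.0
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (index - expected) / (maximum - expected)

truth = [0, 0, 0, 1, 1, 1]
found = [0, 0, 1, 1, 1, 1]   # one point of class 0 ends up in cluster 1
p = purity(truth, found)
ari = adjusted_rand_index(truth, found)
ari_perfect = adjusted_rand_index(truth, truth)
```

In a prequential setting the same functions would be called on each incoming batch before the model is updated with it.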

15 Parameter-Tuning
Choice of parameters is important for a fair comparison
Automatic parameter configuration using Iterated Racing (irace) [López-Ibáñez et al. 2016]
Search for configurations that optimize the ARI of the macro-clustering
Upper limit on the number of micro-clusters
Static data: 5% of the number of observations in the stream
Real-world data: fixed upper limit of 5

17 Quality: square1
(Figure: Purity and ARI for BIRCH, STREAM, CluStream, DenStream, D-Stream, D-Stream + attr., ClusTree, StreamKM++, BICO, DBSTREAM)
Micro-clusters: generally high purity, inconsistency for DenStream
Macro-clusters: good results, but lowest ARI for DBSTREAM
(Figure: example clustering result of DBSTREAM)

18 Quality: spiral
(Figure: Purity and ARI for BIRCH, STREAM, CluStream, DenStream, D-Stream, D-Stream + attr., ClusTree, StreamKM++, BICO, DBSTREAM)
Micro-clusters: high purity for CluStream, DBSTREAM, BICO, StreamKM++
Macro-clusters: decent results only for DBSTREAM
(Figure: example clustering result of STREAM)

19 Quality: Sensor
(Figure: Purity and ARI over time for BIRCH, STREAM, CluStream, DenStream, D-Stream, D-Stream + attr., ClusTree, StreamKM++, BICO, DBSTREAM)

20 Runtime
Various programming languages (R, C, C++, Java)
Different interfaces, different authors
(Figure: runtime in seconds of BIRCH, STREAM, CluStream, DenStream, D-Stream, D-Stream + attr., ClusTree, StreamKM++, BICO, DBSTREAM on square1, spiral, complex9, t8.8k, KDD Cup'99, Sensor, Covertype)

22 Conclusion
Empirical comparison of 10 stream clustering algorithms
DBSTREAM performed best, but depends on insertion order
D-Stream: good results, but requires more micro-clusters
DenStream: poor results and very inconsistent
Future Work
More algorithms, data sets and runtime experiments
Alternative configuration methods
Use results to design an improved algorithm
Application to Multi-Channel Customer Relationship Management

23 Questions?

24 References I
Ackermann, Marcel R., Marcus Märtens, Christoph Raupach, Kamil Swierkot, Christiane Lammersen, and Christian Sohler (2012). StreamKM++: A Clustering Algorithm for Data Streams. In: J. Exp. Algorithmics.
Aggarwal, Charu C., Jiawei Han, Jianyong Wang, and Philip S. Yu (2003). A Framework for Clustering Evolving Data Streams. In: Proceedings of the 29th International Conference on Very Large Data Bases. Vol. 29. VLDB '03. Berlin, Germany: VLDB Endowment.
Cao, Feng, Martin Ester, Weining Qian, and Aoying Zhou (2006). Density-based clustering over an evolving data stream with noise. In: Conference on Data Mining (SIAM '06).
Chen, Yixin and Li Tu (2007). Density-based Clustering for Real-time Stream Data. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '07. San Jose, California, USA: ACM.

25 References II
Fichtenberger, Hendrik, Marc Gillé, Melanie Schmidt, Chris Schwiegelshohn, and Christian Sohler (2013). BICO: BIRCH Meets Coresets for k-means Clustering. In: Algorithms - ESA 2013, 21st Annual European Symposium, Sophia Antipolis, France, September 2013, Proceedings.
Guha, S., A. Meyerson, N. Mishra, R. Motwani, and L. O'Callaghan (2003). Clustering data streams: Theory and practice. In: IEEE Transactions on Knowledge and Data Engineering 15.3.
Hahsler, Michael and Matthew Bolaños (2016). Clustering Data Streams Based on Shared Density between Micro-Clusters. In: IEEE Transactions on Knowledge and Data Engineering 28.6.
Kranen, Philipp, Ira Assent, Corinna Baldauf, and Thomas Seidl (2009). Self-Adaptive Anytime Stream Clustering. In: Ninth IEEE International Conference on Data Mining (ICDM '09).

26 References III
López-Ibáñez, Manuel, Jérémie Dubois-Lacoste, Leslie Pérez Cáceres, Thomas Stützle, and Mauro Birattari (2016). The irace package: Iterated Racing for Automatic Algorithm Configuration. In: Operations Research Perspectives.
O'Callaghan, L., N. Mishra, A. Meyerson, S. Guha, and R. Motwani (2002). Streaming-data algorithms for high-quality clustering. In: Proceedings of the 18th International Conference on Data Engineering (ICDE).
Tu, Li and Yixin Chen (2009). Stream Data Clustering Based on Grid Density and Attraction. In: ACM Trans. Knowl. Discov. Data.
Zhang, Tian, Raghu Ramakrishnan, and Miron Livny (1996). BIRCH: An Efficient Data Clustering Method for Very Large Databases. In: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data. SIGMOD '96. New York, NY, USA: ACM.


Online K-Means. Edo Liberty, Maxim Sviridenko, Ram Sriharsha Online K-Means Edo Liberty, Maxim Sviridenko, Ram Sriharsha K-Means definition We are given a set of points v 1,, v t,, v n in Euclidean space K-Means definition For each point we assign a cluster identifier

More information

Clustering Part 4 DBSCAN

Clustering Part 4 DBSCAN Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of

More information

COMP 465: Data Mining Still More on Clustering

COMP 465: Data Mining Still More on Clustering 3/4/015 Exercise COMP 465: Data Mining Still More on Clustering Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Describe each of the following

More information

[Gidhane* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116

[Gidhane* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY AN EFFICIENT APPROACH FOR TEXT MINING USING SIDE INFORMATION Kiran V. Gaidhane*, Prof. L. H. Patil, Prof. C. U. Chouhan DOI: 10.5281/zenodo.58632

More information

CS570: Introduction to Data Mining

CS570: Introduction to Data Mining CS570: Introduction to Data Mining Cluster Analysis Reading: Chapter 10.4, 10.6, 11.1.3 Han, Chapter 8.4,8.5,9.2.2, 9.3 Tan Anca Doloc-Mihu, Ph.D. Slides courtesy of Li Xiong, Ph.D., 2011 Han, Kamber &

More information

remm: Extensible Markov Model for Data Stream Clustering in R

remm: Extensible Markov Model for Data Stream Clustering in R remm: Extensible Markov Model for Data Stream Clustering in R Michael Hahsler Southern Methodist University Margaret H. Dunham Southern Methodist University Abstract Clustering streams of continuously

More information

A Soft Clustering Algorithm Based on k-median

A Soft Clustering Algorithm Based on k-median 1 A Soft Clustering Algorithm Based on k-median Ertu grul Kartal Tabak Computer Engineering Dept. Bilkent University Ankara, Turkey 06550 Email: tabak@cs.bilkent.edu.tr Abstract The k-median problem is

More information

Role of big data in classification and novel class detection in data streams

Role of big data in classification and novel class detection in data streams DOI 10.1186/s40537-016-0040-9 METHODOLOGY Open Access Role of big data in classification and novel class detection in data streams M. B. Chandak * *Correspondence: hodcs@rknec.edu; chandakmb@gmail.com

More information

Clustering Large Dynamic Datasets Using Exemplar Points

Clustering Large Dynamic Datasets Using Exemplar Points Clustering Large Dynamic Datasets Using Exemplar Points William Sia, Mihai M. Lazarescu Department of Computer Science, Curtin University, GPO Box U1987, Perth 61, W.A. Email: {siaw, lazaresc}@cs.curtin.edu.au

More information

Evolution-Based Clustering of High Dimensional Data Streams with Dimension Projection

Evolution-Based Clustering of High Dimensional Data Streams with Dimension Projection Evolution-Based Clustering of High Dimensional Data Streams with Dimension Projection Rattanapong Chairukwattana Department of Computer Engineering Kasetsart University Bangkok, Thailand Email: g521455024@ku.ac.th

More information

USC Real-time Pattern Isolation and Recognition Over Immersive Sensor Data Streams

USC Real-time Pattern Isolation and Recognition Over Immersive Sensor Data Streams Real-time Pattern Isolation and Recognition Over Immersive Sensor Data Streams Cyrus Shahabi and Donghui Yan Integrated Media Systems Center and Computer Science Department, University of Southern California

More information

DATA STREAMS: MODELS AND ALGORITHMS

DATA STREAMS: MODELS AND ALGORITHMS DATA STREAMS: MODELS AND ALGORITHMS DATA STREAMS: MODELS AND ALGORITHMS Edited by CHARU C. AGGARWAL IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 Kluwer Academic Publishers Boston/Dordrecht/London

More information

Comparison of spatial indexes

Comparison of spatial indexes Comparison of spatial indexes Nathalie Andrea Barbosa Roa To cite this version: Nathalie Andrea Barbosa Roa. Comparison of spatial indexes. [Research Report] Rapport LAAS n 16631,., 13p. HAL

More information

University of Florida CISE department Gator Engineering. Clustering Part 4

University of Florida CISE department Gator Engineering. Clustering Part 4 Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of

More information

MAIDS: Mining Alarming Incidents from Data Streams

MAIDS: Mining Alarming Incidents from Data Streams MAIDS: Mining Alarming Incidents from Data Streams (Demonstration Proposal) Y. Dora Cai David Clutter Greg Pape Jiawei Han Michael Welge Loretta Auvil Automated Learning Group, NCSA, University of Illinois

More information

Online Clustering with Experts

Online Clustering with Experts Journal of Machine Learning Research vol (2011) 1 18 Submitted September 1, 2011; Published publication date Online Clustering with Experts Anna Choromanska Department of Electrical Engineering, Columbia

More information

Evolution-Based Clustering Technique for Data Streams with Uncertainty

Evolution-Based Clustering Technique for Data Streams with Uncertainty Kasetsart J. (Nat. Sci.) 46 : 638-652 (2012) Evolution-Based Clustering Technique for Data Streams with Uncertainty Wicha Meesuksabai*, Thanapat Kangkachit and Kitsana Waiyamai ABSTRACT The evolution-based

More information

