Flock by Leader: A Novel Machine Learning Biologically Inspired Clustering Algorithm
Abdelghani Bellaachia 1, Anasse Bari 1
1 The George Washington University, School of Engineering and Applied Sciences, Computer Science Department, nd Street NW, Washington DC 20052, USA
{bell, bari}@gwu.edu

Abstract. A research report published in Nature in April 2010 announced that biological physicists had only recently discovered a leadership pattern in flocks of pigeons. The most authoritative birds in a pigeon flock take the lead, and the followers follow the leaders' directions. The pigeon leaders' roles vary over time. Following this discovery, made by zoologists at the University of Oxford and Eötvös University, we extend the flocking model widely used in computer science. We define a new biologically inspired clustering algorithm, Flock-by-Leader, that detects hierarchical leaders, discovers their followers, and lets them flock based on local proximity in an artificial virtual space to create clusters. We offer empirical evidence that the algorithm outperforms both the existing flocking algorithm and the K-means algorithm, and we analyze its performance on datasets widely used in the literature.

Keywords: Swarm Intelligence, Information Retrieval, Machine Learning, Data Mining, Social Networks Analysis, Bioinformatics.

1 Introduction

The long-standing question of whether leaders orchestrate the movements of a bird flock has finally been answered. Researchers from Oxford University's Department of Zoology and Eötvös University found that flying pigeons flock following an organized chain of command. The Nature research report [1] revealed that GPS loggers fitted into backpacks carried by flocks of pigeons allowed the scientists to find hierarchies within the flocks.
It is now established that certain flock members are authoritative over other birds. Dr. Biro of Oxford University's Department of Zoology stated [1]: "We found that, whilst most birds have a say in decision-making, a flexible system of 'rank' ensures that some birds are more likely to lead and others to follow." In computer science, flocking behavior is studied as part of Swarm Intelligence. Swarm Intelligence is the property of a system in which the collective behaviors of unsophisticated agents interacting locally with their environment cause coherent functional global patterns to emerge [3]. Flocking was first modeled by Craig Reynolds [5], who called the simulated flocking birds boids. The behavior of each boid is described by three simple rules: separation, cohesion, and alignment. Separation keeps a boid at a certain distance from its nearest flock-mates, cohesion steers a boid toward the average position of its local flock-mates so that it joins the local flock, and alignment steers a boid toward the average heading of its local flock-mates. Flocking models have been applied successfully in areas such as robotics and computer animation. Cui et al. [2] were among the first researchers in the Swarm Intelligence literature to apply flocking behavior to information retrieval. More recently, Bellaachia and Bari were the first in the Swarm Intelligence literature to introduce a flocking-based framework for community detection in dynamic social networks, in which a social network is modeled as an artificial life system [7].

1 Corresponding author.

2 Motivation

The flocking model, also known as Reynolds' model [5] and introduced in 1987, lacks an important component described in the introduction: leadership in flock dynamics. The flocking clustering algorithms used in machine learning [2], [7], [13] rely on pair-wise proximity to find similar data points. The discovery described in the introduction suggests mining leaders within the dataset: instead of using one-to-one proximity to discover similar data points, the algorithm uses leader-to-many proximity, detecting local leaders and the followers that form subflocks around them. The existing algorithm in the literature relies on a set of predefined heuristics that can significantly affect the clustering results [13].
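As an illustration of Reynolds' three rules described in the introduction, the following minimal Python sketch performs one synchronous update step for a set of boids in the plane. The neighborhood radius, the minimum separation distance, and the rule weights are illustrative assumptions, not values from [5] or from this paper, and the steering formulas are one common simplification of the rules rather than Reynolds' exact formulation.

```python
import math
from dataclasses import dataclass


@dataclass
class Boid:
    x: float
    y: float
    vx: float
    vy: float


def step(boids, radius=10.0, min_dist=2.0,
         w_sep=1.0, w_ali=0.1, w_coh=0.01, dt=1.0):
    """Advance every boid by one time step using simplified Reynolds rules.

    radius, min_dist and the w_* weights are illustrative assumptions.
    """
    new_velocities = []
    for b in boids:
        # Local flock-mates: every other boid within `radius`.
        mates = [o for o in boids if o is not b
                 and math.hypot(o.x - b.x, o.y - b.y) <= radius]
        ax = ay = 0.0
        if mates:
            n = len(mates)
            # Cohesion: steer toward the average position of flock-mates.
            cx = sum(o.x for o in mates) / n
            cy = sum(o.y for o in mates) / n
            ax += w_coh * (cx - b.x)
            ay += w_coh * (cy - b.y)
            # Alignment: steer toward the average heading (velocity).
            avx = sum(o.vx for o in mates) / n
            avy = sum(o.vy for o in mates) / n
            ax += w_ali * (avx - b.vx)
            ay += w_ali * (avy - b.vy)
            # Separation: move away from mates closer than `min_dist`.
            for o in mates:
                if math.hypot(o.x - b.x, o.y - b.y) < min_dist:
                    ax += w_sep * (b.x - o.x)
                    ay += w_sep * (b.y - o.y)
        new_velocities.append((b.vx + ax, b.vy + ay))
    # Apply all updates afterwards so every boid reacts to the same snapshot.
    for b, (vx, vy) in zip(boids, new_velocities):
        b.vx, b.vy = vx, vy
        b.x += vx * dt
        b.y += vy * dt
    return boids
```

For example, two boids 4 units apart (inside the radius but farther than the minimum distance) are pulled slightly together by cohesion, whereas two boids 1 unit apart are pushed apart by separation.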
Our proposed algorithm is motivated by two open questions about the existing flocking algorithm: (1) Is it possible to reduce the number of agent (bird) moves while maintaining relatively good clustering results? (2) Is it possible to make the algorithm parameter-free by turning the maximum distance ($d_{max}$) into a dynamic, adaptive threshold? In this paper, we incorporate the recently discovered leadership dynamics of pigeon flocks into the existing flocking model, and we introduce a new biologically inspired algorithm based on this extended model. The rest of this paper is structured as follows. Section 3 formally defines a Swarm Clustering Framework, which serves as a clustering platform for several data mining applications, including microarray bioinformatics [8], social network analysis [7], and information retrieval [2]. Section 4 introduces the Flock by Leader algorithm, Section 5 presents the experimental results, and the last section concludes.

3 Swarm Clustering Framework

We define a multidisciplinary data mining framework that can be used in different clustering applications, such as [8], [7], and [2]. We present the fundamental components that
constitute a Swarm Clustering Framework, under which the Flock-by-Leader algorithm is defined in the next section. A swarm network can be modeled using algebraic graph theory. Formally, a graph $G = (V, E)$ consists of a set of vertices $V = \{v_1, \dots, v_n\}$ and a set of edges $E$ containing unordered pairs of distinct vertices. $G$ has no self-loops ($(v_i, v_i) \notin E$) and is undirected if $(v_i, v_j) \in E \Leftrightarrow (v_j, v_i) \in E$. The scalar $n = |V|$ is the order of $G$, and $|E|$ is its size. Let $X = \{x_1, \dots, x_n\}$ be a set of heterogeneous data points to be clustered. We define a swarm clustering framework that consists of four main components: (0) Swarm Metric Space, (1) Swarm Virtual Space, (2) Agents Position Graph, and (3) Feature Similarity Graph. Consider the following definitions:

Definition 1 (Swarm Metric Space $M$). A Swarm Metric Space $M = (X, \rho)$ is a metric space that consists of a set $X$ and a distance function $\rho$ satisfying the properties of a metric: identity of indiscernibles ($\rho(x_i, x_j) = 0$ if and only if $x_i = x_j$), symmetry, and the triangle inequality. An instance of a Swarm Metric Space is obtained by taking $\rho$ to be the Euclidean distance in a d-dimensional space:

$$\rho(x_i, x_j) = \sqrt{\sum_{l=1}^{d} (x_{il} - x_{jl})^2} \qquad (1)$$

The swarm metric space is instantiated by the user depending on the application, as Figure 1 shows.

Definition 2 (Swarm Virtual Space $\mathcal{V}$). The Swarm Virtual Space of a set $X$ is the 2-dimensional Euclidean space in which the n data points are initially deployed at random. We refer to these points as agents; every data point in $X$ is uniquely indexed by an agent. Agents move in the virtual space according to the flocking clustering algorithm defined in the following sections. Let $d_{min}$ be the minimum distance an agent must keep from other agents to avoid collisions in the virtual space. The swarm virtual space provides a simplified 2-dimensional visualization of the clusters.

Definition 3 (Agents Position Graph $G_P$). The agents position graph $G_P = (V_P, E_P)$ is a weighted graph with vertex set $V_P$ and edge set $E_P$. Let $A_P$ be the adjacency matrix of $G_P$. $A_P$ is an $n \times n$ matrix such that:

$$a^{P}_{ij} = \lVert p_i - p_j \rVert \qquad (2)$$

where $p_i$ and $p_j$ are the position vectors of agents i and j in the swarm virtual space. The scalar $a^{P}_{ij}$ is the distance between agent i and agent j in the swarm virtual space. The agents position graph
maintains the positions of the agents in the virtual space at every step of the algorithm and is used to extract the topology of the clusters generated by the algorithm.

Definition 4 (Feature Similarity Graph $G_F$). The feature similarity graph maintains the similarity between the entities involved in the clustering process. We define the feature similarity graph $G_F = (V_F, E_F)$ as the weighted graph with vertex set $V_F$ and edge set $E_F$. Let $A_F$ be the adjacency matrix of $G_F$. $A_F$ is an $n \times n$ matrix such that:

$$a^{F}_{ij} = \rho(x_i, x_j)$$

where $x_i$ and $x_j$ are the feature vectors of nodes i and j, and $\rho$ is the distance function of the swarm metric space, which measures the similarity between nodes i and j. The feature similarity graph drives the movements of the agents in the virtual space.

We define a Swarm Clustering Framework $\mathcal{S} = (M, G_P, G_F, \mathcal{V})$ as the quadruple consisting of a metric space, a position graph, a feature similarity graph, and a swarm virtual space. A flocking algorithm under the framework $\mathcal{S}$ is a graph transformation process that transforms an ambiguous structure of heterogeneous entities into a partitioned structure.

4 Flock by Leader Clustering Algorithm

In this section we present the Flock by Leader clustering algorithm as an extension of the flocking algorithm described in [2] and [13]. For background, we refer the reader to the summaries of the flocking model (Reynolds' model) given in [2] and [13].

4.1 Enhanced Flocking Model

The enhanced Reynolds model introduced in this section aims to (1) minimize the moves of the agents in the virtual space, and (2) make the process parameter-free in terms of both the number of iterations and the predefined maximum distance. The enhanced model uses the same flocking rules as Reynolds' model. However, instead of processing every boid and finding its neighbors, the enhanced model analyzes the data and discovers potential leaders.
For every leader, it then finds the corresponding followers, which flock under that leader's direction. In Reynolds' model, the maximum distance is predefined and shared by all boids: the boids within the maximum distance of a boid are its neighbors. In the enhanced model, the maximum distance is relative to each leader. In Figure 1 (right), leader (a) and leader (b) have different distances defining their local neighborhoods. This is inspired by pigeon leadership dynamics, in which the leaders' distances differ from one leader to another. The following sections explain how to find leaders and how to associate a maximum distance with each of them. In Figure 1 (right), a leader and its followers flock according to the flocking rules (cohesion, alignment, and separation). The moves are
minimized compared with the original model shown in Figure 1 (left): instead of moving every boid toward every one of its neighbors, we migrate the neighbors to their corresponding leaders.

Fig. 1. Reynolds Model (left) and Enhanced Model (right)

Flock by Leader Algorithm

In every iteration, the algorithm starts by finding flock leaders. Then, for every flock leader, each associated with its own distance, the algorithm finds the leader's corresponding followers. The method that finds leaders and computes their distances is given in the next section. Once a leader is identified, its followers in the virtual space perform a flocking behavior and follow their leader. The followers are then marked as visited in the feature graph and are excluded from the flocking process in subsequent iterations. The leader of every subflock is sent back to the virtual space as the subflock's representative.

input: $\mathcal{S}$, the swarm clustering framework
returns: $G_P$, the new position graph
While there are still nodes in $G_F$ that have not been visited Do
  1.1 LeadersList ← FindFlockLeaders($G_F$)
  1.2 For each leader $L_i$ in LeadersList
        For each Agent neighbor of $L_i$ within the distance of $L_i$
          AgentFlock(Agent, $L_i$)
          Agent.visited = true
          Agent.leader = $L_i$
        End for
      End for
  Update($G_P$)
End of do while

Remark 1. An illustrative example of the aerobatics of agents in the virtual space under the Flock-by-Leader algorithm. Every input data point in X (the set of data points to be clustered) is uniquely indexed by an agent in the virtual space. (a) Unvisited (blue) agents are randomly deployed in the virtual space. (b) Six flock leaders (green) are detected, and (c) their corresponding followers start flocking under their leaders' direction in accordance with the flocking rules (alignment, separation, cohesion). (d) Another iteration of the algorithm begins: agent#1 and agent#5, which were leaders in the previous iteration (c), change roles into followers (yellow), and agent#3 and agent#4, also leaders in (c), become outliers (gray). (e) The flocking process continues: agent#1 and its followers join the subflock of agent#2 (leader), and agent#5 and its follower join the subflock of agent#6.

Fig 2. Aerobatics of Agents

It is important to note the following points, as shown in Figure 2. In every iteration, a node can be an unvisited node, a leader, a follower, or an outlier. A follower node is set as visited, and its leader serves as its representative in the next iteration. The
visited node is excluded from the flocking process in the next iteration. A node that was a leader at iteration t might stay a leader at iteration t+1, become a follower of a more highly ranked leader, or become an outlier, as explained in the next section. This raises the question of how to distinguish between a leader, a follower, and an outlier; the following section describes our approach.

4.2 Mining Flock Leaders as Initial Cluster Centroids

We rely on neighborhood and reverse-neighborhood analysis to find potential flock leaders. The analysis is similar to the neighborhood and reverse-neighborhood approach used in [6], [10], and [11]. The main difference is that the notion of neighborhood in the swarm framework is dynamic: during each iteration of the flocking process, every agent's neighborhood changes depending on the flocking behavior of previous iterations. Let X be a dataset to be clustered, and let $\rho(x_i, x_j)$ be a given distance function between objects $x_i$ and $x_j$. Let the set of k nearest neighbors of x at iteration t be denoted by $kNN_t(x)$, where x is a node in the feature graph and $Agent_x$ is its corresponding agent deployed in the virtual space. We adopt the definitions from [10] and apply them to the swarm framework as follows:

Definition 5 (Dynamic k-neighborhood, DkNB). The k-neighborhood of x at iteration t, denoted $kNB_t(x)$, is the set of data points that lie within a circle centered at x whose radius, $k\text{-}dist_t(x)$ (the distance from x to its k-th nearest neighbor at iteration t), is the distance associated with leader x at iteration t:

$$kNB_t(x) = \{\, q \in X \setminus \{x\} : \rho(x, q) \le k\text{-}dist_t(x) \,\}$$

Definition 6 (Dynamic Reverse k-neighborhood, DR-kNB). The reverse k-neighborhood of x at iteration t, denoted $R\text{-}kNB_t(x)$, is the set of data points whose $kNB_t$ sets contain x:

$$R\text{-}kNB_t(x) = \{\, q \in X : x \in kNB_t(q) \,\}$$

The ratio $|R\text{-}kNB_t(x)| / |kNB_t(x)|$ has been widely used in the neighborhood-based clustering literature [10] to determine whether points are dense, even, or sparse. Several factors have been introduced, such as the neighborhood density factor (NDF) and the structural role index (SRI), recently introduced in [10] and [11].
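The neighborhood analysis of Definitions 5 and 6, together with the density ratio $|R\text{-}kNB_t(x)| / |kNB_t(x)|$, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the Euclidean distance of Equation (1), it makes the neighborhoods "dynamic" simply by recomputing them over the currently unvisited points, and, because the role factor and rank of Equations (3) to (5) are not reproduced here, the hypothetical helper `flock_by_leader_pass` stands in for them with the number of reverse neighbors (processing priority) and a zero ratio (outlier). Agent movement in the virtual space is omitted.

```python
import math


def euclidean(a, b):
    """Equation (1): Euclidean distance between two d-dimensional points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def k_neighborhoods(points, k, unvisited):
    """kNB(x) for every unvisited x: the other unvisited points whose distance
    to x is at most k-dist(x), the distance to x's k-th nearest neighbour
    (ties are kept)."""
    idx = list(unvisited)
    knb = {}
    for i in idx:
        dists = sorted((euclidean(points[i], points[j]), j) for j in idx if j != i)
        if not dists:
            knb[i] = set()
            continue
        k_dist = dists[min(k, len(dists)) - 1][0]
        knb[i] = {j for d, j in dists if d <= k_dist}
    return knb


def reverse_k_neighborhoods(knb):
    """R-kNB(x): the points q whose kNB(q) contains x."""
    rknb = {x: set() for x in knb}
    for q, neighbours in knb.items():
        for x in neighbours:
            rknb[x].add(q)
    return rknb


def density_ratio(knb, rknb):
    """|R-kNB(x)| / |kNB(x)|: > 1 dense, = 1 even, < 1 sparse."""
    return {x: (len(rknb[x]) / len(knb[x]) if knb[x] else 0.0) for x in knb}


def flock_by_leader_pass(points, k, unvisited):
    """One hypothetical leader/follower pass over the unvisited points.

    Stand-ins (assumptions): candidates are ranked by |R-kNB(x)|, and a point
    with a zero ratio (no reverse neighbours) is treated as an outlier.
    Returns (leader_of, outliers); leader_of maps each point to its leader.
    """
    knb = k_neighborhoods(points, k, unvisited)
    rknb = reverse_k_neighborhoods(knb)
    ratio = density_ratio(knb, rknb)
    leader_of, outliers = {}, set()
    # Higher-ranked candidates are processed first; ties broken by index.
    for x in sorted(knb, key=lambda p: (-len(rknb[p]), p)):
        if x in leader_of:
            continue                  # already follows a higher-ranked leader
        if ratio[x] == 0.0:
            outliers.add(x)           # nobody counts x as a neighbour
            continue
        leader_of[x] = x              # x leads its own subflock
        for y in knb[x]:
            if y not in leader_of and y not in outliers:
                leader_of[y] = x      # y joins x's subflock
    return leader_of, outliers
```

For instance, with k = 2 on the six points (0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), and (10, 10), the center point (0.5, 0.5) has the most reverse neighbors and leads the four corner points, while (10, 10) has no reverse neighbors and is reported as an outlier.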
We define the Dynamic Agent Role Factor of an agent x at iteration t, denoted $DARF_t(x)$, by Equations (3) and (4). Intuitively, the centroid of a cluster occupies the center of a mass of associated data points: the larger $DARF_t(x)$ is, the more objects approach x, so the initial centroid candidates should be the points with the most reverse k-nearest neighbors. Specifically, if $DARF_t(x)$ is high, x is a flock leader at iteration t; otherwise x is a follower. If $DARF_t(x)$ is close to zero, x is an outlier. We extend the agent role factor to a local rank of agent x at iteration t, given by Equation (5), which depends on the number of neighbors of x at iteration t and on the number of unvisited nodes at iteration t. The rank is used to sort the list of leaders: a leader of higher rank is given priority and processed first (its followers are found first).

5 Experiments and Results

5.1 Datasets

Two datasets were used in our experiments. The first consists of real news articles; details about the dataset can be found in [7]. It contains 100 news articles collected from the Web, which human experts categorized into 12 clusters. We used the KNIME tool [9] to preprocess the news articles and convert the dataset into the keyword-document matrix that the Flock-by-Leader algorithm takes as input. The second dataset is the Iris plant dataset. It contains 150 instances from three classes: Iris-virginica (class 1), Iris-versicolor (class 2), and Iris-setosa (class 3), with fifty instances in each class. Each instance is described by four attributes. Details about the dataset can be found in [12].

5.2 Evaluation Methodology

We use the F-measure as the quality measure. The F-measure combines information retrieval precision and recall through their harmonic mean. Each cluster is treated as if it were the result of a query, and each class as if it were the desired set of documents for that query. We then calculate the recall and precision of each cluster for each given class. The F-measure of cluster j (retrieved) and class i (known) is defined as follows:

$$F(i, j) = \frac{2\, P(i, j)\, R(i, j)}{P(i, j) + R(i, j)}, \qquad P(i, j) = \frac{n_{ij}}{n_j}, \quad R(i, j) = \frac{n_{ij}}{n_i} \qquad (6)$$

where $n_{ij}$ is the number of members of class i in cluster j, $n_j$ is the number of members of cluster j, and $n_i$ is the number of members of class i.

5.3 Results

Using the evaluation method described in the previous section, we compare the performance of the Flock-by-Leader algorithm against the flocking-based clustering algorithm mentioned in [3] and K-means. Table 1 shows the results of running the Flock-by-Leader algorithm on both the real news articles dataset and the Iris dataset. We compare our results with the results reported in [12] and [2] on the same datasets.
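The per-pair F-measure F(i, j) of Section 5.2 (Equation (6)) can be turned into a single clustering score in several ways. The sketch below uses a common convention, assumed here rather than taken from this paper: each class is matched with the cluster that gives it the highest F(i, j), and the per-class scores are averaged with weights proportional to class size. The function name `f_measure` and its input format (parallel lists of known class labels and assigned cluster labels) are illustrative.

```python
from collections import Counter


def f_measure(classes, clusters):
    """Overall F-measure of a clustering (class-size-weighted best match).

    classes[m] is the known class of item m and clusters[m] its assigned
    cluster. For class i and cluster j: P = n_ij / n_j, R = n_ij / n_i,
    F(i, j) = 2PR / (P + R).
    """
    n = len(classes)
    n_i = Counter(classes)                  # class sizes
    n_j = Counter(clusters)                 # cluster sizes
    n_ij = Counter(zip(classes, clusters))  # items of class i in cluster j
    total = 0.0
    for ci, size_i in n_i.items():
        best = 0.0
        for cj, size_j in n_j.items():
            overlap = n_ij[(ci, cj)]
            if overlap == 0:
                continue                    # F(i, j) = 0 for disjoint pairs
            p = overlap / size_j            # precision of cluster j for class i
            r = overlap / size_i            # recall of cluster j for class i
            best = max(best, 2 * p * r / (p + r))
        total += (size_i / n) * best        # weight by class size
    return total
```

Under this convention a perfect clustering scores 1.0, while placing all items of two equally sized classes into a single cluster scores 2/3.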
Table 1 shows that Flock-by-Leader achieves the largest F-measure values compared with both the flocking algorithm and K-means. The algorithm needed 4 iterations to converge, a significant improvement over the existing flocking algorithm, which required 300 iterations [2]. Flock-by-Leader achieved a 98.67% reduction in the number of iterations of the flocking process ((300 - 4)/300), a 7.5% increase in precision and recall (F-measure) over the existing flocking algorithm, and an average 16.5% increase in precision and recall over K-means on both datasets. Figure 3 shows snapshots of the virtual space on both datasets at initialization and after running the algorithm.

Table 1. F-measure Evaluation Results.
Dataset          Algorithm         Number of Clusters   F-measure
News Articles    Flocking
News Articles    K-means           12 (k = 12)
News Articles    Flock-by-Leader
Iris             K-means           3 (k = 3)
Iris             Flock-by-Leader

Fig. 3. The process of running the Flock-by-Leader algorithm on the Iris dataset

7 Conclusion

In this paper we presented a simple, biologically inspired clustering algorithm. Flock-by-Leader incorporates a recent discovery about pigeons: leadership dynamics in flocks. Our algorithm is an enhancement of the existing flocking algorithm, and it outperforms K-means on two datasets. Our future work will include running the algorithm on additional datasets.
8 References

1. M. Nagy, Z. Akos, D. Biro, and T. Vicsek. Hierarchical group dynamics in pigeon flocks. Nature 464 (2010).
2. X. Cui, J. Gao, and T. E. Potok. A Flocking Based Algorithm for Document Clustering Analysis. Journal of Systems Architecture, June 2006.
3. V. G. Red'ko. Artificial Life Evolutionary Models.
4. E. Bonabeau, M. Dorigo, and G. Theraulaz. Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press.
5. C. W. Reynolds. Flocks, Herds, and Schools: A Distributed Behavioral Model. Computer Graphics, 21(4), July 1987.
6. S. Zhou, Y. Zhao, J. Guan, and J. Z. Huang. A Neighborhood-Based Clustering Algorithm. In Proc. PAKDD, 2005.
7. A. Bellaachia and A. Bari. SFLOSCAN: A biologically-inspired data mining framework for community identification in dynamic social networks. In Proc. 2011 IEEE Symposium on Swarm Intelligence (SIS), pp. 1-8, April 2011.
8. A. Bellaachia and A. Bari. A Flocking Based Data Mining Algorithm for Detecting Outliers in Cancer Gene Expression Microarray Data. In Proc. IEEE International Conference on Information Retrieval and Knowledge Management (CAMP12).
9. M. R. Berthold, N. Cebron, F. Dill, T. R. Gabriel, T. Kotter, T. Meinl, P. Ohl, K. Thiel, and B. Wiswedel. KNIME: the Konstanz Information Miner, version 2.0 and beyond. SIGKDD Explor. Newsl., 11(1):26-31.
10. Y. Ye, J. Z. Huang, X. Chen, S. Zhou, G. J. Williams, and X. Xu. Neighborhood Density Method for Selecting Initial Cluster Centers in K-Means Clustering. In Proc. PAKDD, 2006.
11. J. Ding, R. Ma, J. Yang, and S. Chen. A tree-structured framework for purifying "complex" clusters with structural roles of individual data. Pattern Recognition, 2010.
12. F. Guillet, G. Ritschard, D. A. Zighed, and H. Briand (eds.). Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, Vol. 292. Berlin: Springer, 2010.
13. A. Bellaachia and X. He. An Artificial Life Based Data Mining Algorithm. Swarm Intelligence IEEE, 2006.
More informationGene Clustering & Classification
BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering
More informationGraphs, Search, Pathfinding (behavior involving where to go) Steering, Flocking, Formations (behavior involving how to go)
Graphs, Search, Pathfinding (behavior involving where to go) Steering, Flocking, Formations (behavior involving how to go) Class N-2 1. What are some benefits of path networks? 2. Cons of path networks?
More informationREPRESENTATION OF BIG DATA BY DIMENSION REDUCTION
Fundamental Journal of Mathematics and Mathematical Sciences Vol. 4, Issue 1, 2015, Pages 23-34 This paper is available online at http://www.frdint.com/ Published online November 29, 2015 REPRESENTATION
More informationChapter DM:II. II. Cluster Analysis
Chapter DM:II II. Cluster Analysis Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained Cluster Analysis DM:II-1
More informationCluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1
Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods
More informationAnalysis of Dendrogram Tree for Identifying and Visualizing Trends in Multi-attribute Transactional Data
Analysis of Dendrogram Tree for Identifying and Visualizing Trends in Multi-attribute Transactional Data D.Radha Rani 1, A.Vini Bharati 2, P.Lakshmi Durga Madhuri 3, M.Phaneendra Babu 4, A.Sravani 5 Department
More informationVisual programming language for modular algorithms
Visual programming language for modular algorithms Rudolfs Opmanis, Rihards Opmanis Institute of Mathematics and Computer Science University of Latvia, Raina bulvaris 29, Riga, LV-1459, Latvia rudolfs.opmanis@gmail.com,
More informationInternational Journal Of Engineering And Computer Science ISSN: Volume 5 Issue 11 Nov. 2016, Page No.
www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 5 Issue 11 Nov. 2016, Page No. 19054-19062 Review on K-Mode Clustering Antara Prakash, Simran Kalera, Archisha
More informationUnsupervised learning on Color Images
Unsupervised learning on Color Images Sindhuja Vakkalagadda 1, Prasanthi Dhavala 2 1 Computer Science and Systems Engineering, Andhra University, AP, India 2 Computer Science and Systems Engineering, Andhra
More informationCHAPTER 4 AN IMPROVED INITIALIZATION METHOD FOR FUZZY C-MEANS CLUSTERING USING DENSITY BASED APPROACH
37 CHAPTER 4 AN IMPROVED INITIALIZATION METHOD FOR FUZZY C-MEANS CLUSTERING USING DENSITY BASED APPROACH 4.1 INTRODUCTION Genes can belong to any genetic network and are also coordinated by many regulatory
More informationBRACE: A Paradigm For the Discretization of Continuously Valued Data
Proceedings of the Seventh Florida Artificial Intelligence Research Symposium, pp. 7-2, 994 BRACE: A Paradigm For the Discretization of Continuously Valued Data Dan Ventura Tony R. Martinez Computer Science
More informationLECTURE 16: SWARM INTELLIGENCE 2 / PARTICLE SWARM OPTIMIZATION 2
15-382 COLLECTIVE INTELLIGENCE - S18 LECTURE 16: SWARM INTELLIGENCE 2 / PARTICLE SWARM OPTIMIZATION 2 INSTRUCTOR: GIANNI A. DI CARO BACKGROUND: REYNOLDS BOIDS Reynolds created a model of coordinated animal
More informationMachine Learning (BSMC-GA 4439) Wenke Liu
Machine Learning (BSMC-GA 4439) Wenke Liu 01-25-2018 Outline Background Defining proximity Clustering methods Determining number of clusters Other approaches Cluster analysis as unsupervised Learning Unsupervised
More informationInternational Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X
Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,
More informatione-ccc-biclustering: Related work on biclustering algorithms for time series gene expression data
: Related work on biclustering algorithms for time series gene expression data Sara C. Madeira 1,2,3, Arlindo L. Oliveira 1,2 1 Knowledge Discovery and Bioinformatics (KDBIO) group, INESC-ID, Lisbon, Portugal
More informationAN IMPROVED HYBRIDIZED K- MEANS CLUSTERING ALGORITHM (IHKMCA) FOR HIGHDIMENSIONAL DATASET & IT S PERFORMANCE ANALYSIS
AN IMPROVED HYBRIDIZED K- MEANS CLUSTERING ALGORITHM (IHKMCA) FOR HIGHDIMENSIONAL DATASET & IT S PERFORMANCE ANALYSIS H.S Behera Department of Computer Science and Engineering, Veer Surendra Sai University
More informationA Memetic Heuristic for the Co-clustering Problem
A Memetic Heuristic for the Co-clustering Problem Mohammad Khoshneshin 1, Mahtab Ghazizadeh 2, W. Nick Street 1, and Jeffrey W. Ohlmann 1 1 The University of Iowa, Iowa City IA 52242, USA {mohammad-khoshneshin,nick-street,jeffrey-ohlmann}@uiowa.edu
More informationClustering Techniques
Clustering Techniques Bioinformatics: Issues and Algorithms CSE 308-408 Fall 2007 Lecture 16 Lopresti Fall 2007 Lecture 16-1 - Administrative notes Your final project / paper proposal is due on Friday,
More informationCHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION
CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant
More informationAvailable online at ScienceDirect. Procedia Computer Science 89 (2016 )
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 89 (2016 ) 341 348 Twelfth International Multi-Conference on Information Processing-2016 (IMCIP-2016) Parallel Approach
More informationDistance-based Methods: Drawbacks
Distance-based Methods: Drawbacks Hard to find clusters with irregular shapes Hard to specify the number of clusters Heuristic: a cluster must be dense Jian Pei: CMPT 459/741 Clustering (3) 1 How to Find
More informationLocalized and Incremental Monitoring of Reverse Nearest Neighbor Queries in Wireless Sensor Networks 1
Localized and Incremental Monitoring of Reverse Nearest Neighbor Queries in Wireless Sensor Networks 1 HAI THANH MAI AND MYOUNG HO KIM Department of Computer Science Korea Advanced Institute of Science
More informationNUMB3RS Activity: Follow the Flock. Episode: In Plain Sight
Teacher Page 1 NUMB3RS Activity: Follow the Flock Topic: Introduction to Flock Behavior Grade Level: 8-12 Objective: Use a mathematical model to simulate an aspect of birds flying in a flock Time: 30 minutes
More informationClustering Lecture 4: Density-based Methods
Clustering Lecture 4: Density-based Methods Jing Gao SUNY Buffalo 1 Outline Basics Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Mixture model Spectral methods Advanced
More informationFinding Effective Software Security Metrics Using A Genetic Algorithm
International Journal of Software Engineering. ISSN 0974-3162 Volume 4, Number 2 (2013), pp. 1-6 International Research Publication House http://www.irphouse.com Finding Effective Software Security Metrics
More informationFast Efficient Clustering Algorithm for Balanced Data
Vol. 5, No. 6, 214 Fast Efficient Clustering Algorithm for Balanced Data Adel A. Sewisy Faculty of Computer and Information, Assiut University M. H. Marghny Faculty of Computer and Information, Assiut
More informationEECS730: Introduction to Bioinformatics
EECS730: Introduction to Bioinformatics Lecture 15: Microarray clustering http://compbio.pbworks.com/f/wood2.gif Some slides were adapted from Dr. Shaojie Zhang (University of Central Florida) Microarray
More informationAPPLICATION OF MULTIPLE RANDOM CENTROID (MRC) BASED K-MEANS CLUSTERING ALGORITHM IN INSURANCE A REVIEW ARTICLE
APPLICATION OF MULTIPLE RANDOM CENTROID (MRC) BASED K-MEANS CLUSTERING ALGORITHM IN INSURANCE A REVIEW ARTICLE Sundari NallamReddy, Samarandra Behera, Sanjeev Karadagi, Dr. Anantha Desik ABSTRACT: Tata
More informationSWARM INTELLIGENCE -I
SWARM INTELLIGENCE -I Swarm Intelligence Any attempt to design algorithms or distributed problem solving devices inspired by the collective behaviourof social insect colonies and other animal societies
More informationClustering of datasets using PSO-K-Means and PCA-K-means
Clustering of datasets using PSO-K-Means and PCA-K-means Anusuya Venkatesan Manonmaniam Sundaranar University Tirunelveli- 60501, India anusuya_s@yahoo.com Latha Parthiban Computer Science Engineering
More informationStability Analysis of M-Dimensional Asynchronous Swarms With a Fixed Communication Topology
76 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 48, NO. 1, JANUARY 2003 Stability Analysis of M-Dimensional Asynchronous Swarms With a Fixed Communication Topology Yang Liu, Member, IEEE, Kevin M. Passino,
More informationCity, University of London Institutional Repository
City Research Online City, University of London Institutional Repository Citation: Andrienko, N., Andrienko, G., Fuchs, G., Rinzivillo, S. & Betz, H-D. (2015). Real Time Detection and Tracking of Spatial
More informationParallel Approach for Implementing Data Mining Algorithms
TITLE OF THE THESIS Parallel Approach for Implementing Data Mining Algorithms A RESEARCH PROPOSAL SUBMITTED TO THE SHRI RAMDEOBABA COLLEGE OF ENGINEERING AND MANAGEMENT, FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
More informationA Two-phase Distributed Training Algorithm for Linear SVM in WSN
Proceedings of the World Congress on Electrical Engineering and Computer Systems and Science (EECSS 015) Barcelona, Spain July 13-14, 015 Paper o. 30 A wo-phase Distributed raining Algorithm for Linear
More informationTOWARDS NEW ESTIMATING INCREMENTAL DIMENSIONAL ALGORITHM (EIDA)
TOWARDS NEW ESTIMATING INCREMENTAL DIMENSIONAL ALGORITHM (EIDA) 1 S. ADAEKALAVAN, 2 DR. C. CHANDRASEKAR 1 Assistant Professor, Department of Information Technology, J.J. College of Arts and Science, Pudukkottai,
More informationComparing and Selecting Appropriate Measuring Parameters for K-means Clustering Technique
International Journal of Soft Computing and Engineering (IJSCE) Comparing and Selecting Appropriate Measuring Parameters for K-means Clustering Technique Shreya Jain, Samta Gajbhiye Abstract Clustering
More informationHard clustering. Each object is assigned to one and only one cluster. Hierarchical clustering is usually hard. Soft (fuzzy) clustering
An unsupervised machine learning problem Grouping a set of objects in such a way that objects in the same group (a cluster) are more similar (in some sense or another) to each other than to those in other
More informationResearch Article QOS Based Web Service Ranking Using Fuzzy C-means Clusters
Research Journal of Applied Sciences, Engineering and Technology 10(9): 1045-1050, 2015 DOI: 10.19026/rjaset.10.1873 ISSN: 2040-7459; e-issn: 2040-7467 2015 Maxwell Scientific Publication Corp. Submitted:
More informationReview: Identification of cell types from single-cell transcriptom. method
Review: Identification of cell types from single-cell transcriptomes using a novel clustering method University of North Carolina at Charlotte October 12, 2015 Brief overview Identify clusters by merging
More informationMinimal Test Cost Feature Selection with Positive Region Constraint
Minimal Test Cost Feature Selection with Positive Region Constraint Jiabin Liu 1,2,FanMin 2,, Shujiao Liao 2, and William Zhu 2 1 Department of Computer Science, Sichuan University for Nationalities, Kangding
More informationMobility Data Management & Exploration
Mobility Data Management & Exploration Ch. 07. Mobility Data Mining and Knowledge Discovery Nikos Pelekis & Yannis Theodoridis InfoLab University of Piraeus Greece infolab.cs.unipi.gr v.2014.05 Chapter
More informationOptimization of Benchmark Functions Using Artificial Bee Colony (ABC) Algorithm
IOSR Journal of Engineering (IOSRJEN) e-issn: 2250-3021, p-issn: 2278-8719 Vol. 3, Issue 10 (October. 2013), V4 PP 09-14 Optimization of Benchmark Functions Using Artificial Bee Colony (ABC) Algorithm
More informationHOT asax: A Novel Adaptive Symbolic Representation for Time Series Discords Discovery
HOT asax: A Novel Adaptive Symbolic Representation for Time Series Discords Discovery Ninh D. Pham, Quang Loc Le, Tran Khanh Dang Faculty of Computer Science and Engineering, HCM University of Technology,
More informationClustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani
Clustering CE-717: Machine Learning Sharif University of Technology Spring 2016 Soleymani Outline Clustering Definition Clustering main approaches Partitional (flat) Hierarchical Clustering validation
More informationParticle Swarm Optimization
Particle Swarm Optimization Gonçalo Pereira INESC-ID and Instituto Superior Técnico Porto Salvo, Portugal gpereira@gaips.inesc-id.pt April 15, 2011 1 What is it? Particle Swarm Optimization is an algorithm
More informationHandling Multi Objectives of with Multi Objective Dynamic Particle Swarm Optimization
Handling Multi Objectives of with Multi Objective Dynamic Particle Swarm Optimization Richa Agnihotri #1, Dr. Shikha Agrawal #1, Dr. Rajeev Pandey #1 # Department of Computer Science Engineering, UIT,
More informationA Naïve Soft Computing based Approach for Gene Expression Data Analysis
Available online at www.sciencedirect.com Procedia Engineering 38 (2012 ) 2124 2128 International Conference on Modeling Optimization and Computing (ICMOC-2012) A Naïve Soft Computing based Approach for
More informationSwarm Based Fuzzy Clustering with Partition Validity
Swarm Based Fuzzy Clustering with Partition Validity Lawrence O. Hall and Parag M. Kanade Computer Science & Engineering Dept University of South Florida, Tampa FL 33620 @csee.usf.edu Abstract
More informationCLUSTERING IN BIOINFORMATICS
CLUSTERING IN BIOINFORMATICS CSE/BIMM/BENG 8 MAY 4, 0 OVERVIEW Define the clustering problem Motivation: gene expression and microarrays Types of clustering Clustering algorithms Other applications of
More informationClustering in Data Mining
Clustering in Data Mining Classification Vs Clustering When the distribution is based on a single parameter and that parameter is known for each object, it is called classification. E.g. Children, young,
More informationDynamic Clustering of Data with Modified K-Means Algorithm
2012 International Conference on Information and Computer Networks (ICICN 2012) IPCSIT vol. 27 (2012) (2012) IACSIT Press, Singapore Dynamic Clustering of Data with Modified K-Means Algorithm Ahamed Shafeeq
More informationUnderstanding Clustering Supervising the unsupervised
Understanding Clustering Supervising the unsupervised Janu Verma IBM T.J. Watson Research Center, New York http://jverma.github.io/ jverma@us.ibm.com @januverma Clustering Grouping together similar data
More informationFast K-nearest neighbors searching algorithms for point clouds data of 3D scanning system 1
Acta Technica 62 No. 3B/2017, 141 148 c 2017 Institute of Thermomechanics CAS, v.v.i. Fast K-nearest neighbors searching algorithms for point clouds data of 3D scanning system 1 Zhang Fan 2, 3, Tan Yuegang
More information