CHAPTER 7 A GRID CLUSTERING ALGORITHM

7.1 Introduction

Grid-based methods are widely preferred over the algorithms discussed in the previous chapters because of their rapid clustering results. In this chapter, a nonparametric grid-based clustering algorithm is presented using the concept of boundary grids and the local outlier factor [31].

7.2 Algorithm Based on Local Outlier Factor

Let D = (x_1, x_2, ..., x_n) be the given set of n data points embedded in an m-dimensional space. Each x_i is composed of m attributes, i.e., x_i = (x_{i1}, x_{i2}, ..., x_{im}). Initially, a single grid is used to represent all n given points. The extrema of this grid are taken as the minimum and maximum attribute values in each dimension [129]. Let Min(i) = min{x_{1i}, x_{2i}, ..., x_{ni}} and Max(i) = max{x_{1i}, x_{2i}, ..., x_{ni}}. Then the initial grid G(n; m) is represented as

$$G(n; m) = [\mathrm{Min}(1), \mathrm{Max}(1)] \times [\mathrm{Min}(2), \mathrm{Max}(2)] \times \cdots \times [\mathrm{Min}(m), \mathrm{Max}(m)].$$

Initially, the number of clusters k is one. We partition the grid G(n; m) into two equal-volume grids G_1(n_1; m) and G_2(n_2; m) along a dimension selected uniformly at random from the m dimensions, and the data points of G(n; m) are distributed between these two grids. We define the grids (cells) that contain at least one of the given points as non-empty grids; all other grids remain empty. The empty grids that share a common vertex with a non-empty grid are called boundary grids. It can easily be seen that each cluster is surrounded by boundary grids. After each round of partitioning, it is necessary to check for the presence of new clusters. Here, a cluster is defined as the collection of points in non-empty grids that are connected through common vertices in the grid structure.
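As an illustration of the grid construction and one partitioning round, the following is a minimal Python sketch under our own representation (each cell is a (low, high, points) triple; the function names are hypothetical, not the author's implementation).

```python
import numpy as np

def initial_grid(X):
    """Initial grid G(n; m): the box [Min(i), Max(i)] in every dimension i."""
    return X.min(axis=0), X.max(axis=0)

def bisect_cells(cells, dim):
    """One partitioning round: split every cell into two equal-volume halves
    along dimension `dim` and redistribute its points to the halves."""
    new_cells = []
    for lo, hi, pts in cells:
        mid = (lo[dim] + hi[dim]) / 2.0
        left_hi, right_lo = hi.copy(), lo.copy()
        left_hi[dim], right_lo[dim] = mid, mid
        in_left = pts[:, dim] <= mid
        new_cells.append((lo, left_hi, pts[in_left]))
        new_cells.append((right_lo, hi, pts[~in_left]))
    return new_cells

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
lo, hi = initial_grid(X)
cells = [(lo, hi, X)]                                           # the single initial grid
cells = bisect_cells(cells, dim=int(rng.integers(X.shape[1])))  # uniformly chosen dimension
```

Cells whose `pts` array is empty correspond to the empty grids of the text; the rest are the non-empty grids.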

In the next round of partitioning, the two grids G_1(n_1; m) and G_2(n_2; m) are partitioned into four equal-volume grids G_{1,1}(n_3; m), G_{1,2}(n_4; m), G_{2,1}(n_5; m) and G_{2,2}(n_6; m) along another chosen dimension, and the points are redistributed into the new grids. In this way, every grid of the previous iteration is bisected, and the partitioning process continues until a termination condition is met.

The boundary grids are used to detect the optimal grid structure. As discussed above, a cluster is the collection of points of directly connected non-empty grids, and each cluster is surrounded by boundary grids. We also define the volume of a cluster as the sum of the volumes of all non-empty grids of that cluster. We have observed that the volume of a cluster strictly decreases as the partitioning process continues, while the number of surrounding empty grids (boundary grids) increases significantly, as depicted in Figures 7.1(a-e). If the required clusters have already formed and the partitioning continues, some clusters split into multiple sub-clusters, as shown in Figure 7.1(e). Some of these sub-clusters are surrounded by far fewer boundary grids than the clusters of the previous iteration; for instance, five such clusters with very few boundary grids are shown in Figure 7.1(e). This case indicates that the clusters of the previous iteration are the desired ones and that the corresponding grid size is the optimal one among all iterations.
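The grouping of non-empty grids through common vertices, and the counting of a cluster's boundary grids, can be sketched as follows; this is a minimal Python illustration assuming each non-empty cell is identified by its integer index tuple in the current grid, with helper names of our own. In this lattice representation, two cells share a common vertex exactly when their indices differ by at most 1 in every coordinate.

```python
from itertools import product

def neighbors(cell):
    """All cells sharing a common vertex with `cell` (indices differing
    by at most 1 in every coordinate)."""
    offsets = (d for d in product((-1, 0, 1), repeat=len(cell)) if any(d))
    return [tuple(c + o for c, o in zip(cell, off)) for off in offsets]

def connected_clusters(occupied):
    """Group non-empty cells into clusters of vertex-connected cells."""
    occupied, clusters, seen = set(occupied), [], set()
    for start in occupied:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            cell = stack.pop()
            if cell in seen:
                continue
            seen.add(cell)
            component.add(cell)
            stack.extend(nb for nb in neighbors(cell) if nb in occupied)
        clusters.append(component)
    return clusters

def boundary_grid_count(cluster, occupied):
    """Number of empty cells sharing a common vertex with the cluster."""
    ring = {nb for cell in cluster for nb in neighbors(cell)}
    return len(ring - set(occupied))
```

A sharp drop in `boundary_grid_count` for some cluster relative to the previous iteration (the 20% threshold in Step 6 of OPT-GRID below) is the signal that the grid size of the previous iteration was the optimal one.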

Figure 7.1: Clusters produced by grouping the points in the grids with common vertices: (a) one cluster; (b) five clusters; (c) five clusters; (d) more than five clusters; (e) a cluster partitioned into several sub-clusters with fewer boundary grids.

7.2.1 Problem of Outliers

The above idea is sensitive to outlier points in the given data set. Since outliers lie far from the clusters, they form separate clusters of their own during the partitioning process: the outlier region becomes separated from the other clusters as the number of grids increases (i.e., as the grid size decreases). By our definition of a cluster, the outlier points are also surrounded by boundary grids, and their number is significantly smaller than the number of boundary grids of the other clusters, which falsely signals the termination stage described above. The clusters may therefore not be fully formed, as the partitioning process is left incomplete under the effect of outliers. Hence, we need a measure for identifying outliers so that the partitioning process can continue smoothly. For this purpose, we use the local outlier factor (LOF) proposed by Breunig et al. [31] to compute the degree to which a point is an outlier.

First, the local neighborhood of a point x ∈ S with respect to the minimum-points threshold mp is defined as

$$N(x, mp) = \{\, y \in S : d(x, y) \le d(x, x_{mp}) \,\},$$

where x_{mp} is the mp-th nearest neighbor of x. Thus N(x, mp) contains at least mp points. The density of a point x ∈ S is defined as

$$\mathrm{density}(x, mp) = \frac{|N(x, mp)|}{\sum_{y \in N(x, mp)} d(x, y)} \qquad (7.1)$$

If the distances between x and its neighboring points are small, then the density of x is high. The average relative density (ard) of x is then calculated as

$$\mathrm{ard}(x, mp) = \frac{\mathrm{density}(x, mp)}{\sum_{y \in N(x, mp)} \mathrm{density}(y, mp) \,/\, |N(x, mp)|} \qquad (7.2)$$

Now, the local outlier factor (LOF) of x is defined as the inverse of the average relative density of x, i.e.,

$$\mathrm{LOF}(x, mp) = \frac{1}{\mathrm{ard}(x, mp)} \qquad (7.3)$$

If a point belongs to a cluster, its LOF value is close to one, because the density of that point and the densities of its neighboring points are roughly equal. If any cluster has a significantly smaller number of boundary grids in some iteration of the partitioning, we compute the LOF value for all points of that cluster using the above definition. If any of these points is an outlier, the partitioning process continues; otherwise, the process terminates. The pseudo code of this algorithm is provided below.

Algorithm OPT-GRID (S)
Input: A set S of n data points.
Output: A set of clusters C_1, C_2, ..., C_k.
Functions and variables used:
  Grid(n; m): A function to find the initial grid structure of the given set S of n points with dimension m.
  Partition(G): A function to partition all the grids into two equal-volume new grids.
  Connected(): A function to find the clusters from the grid structure using the common vertices in the grid structure.
  LOF(x): A function to compute the local outlier factor of x.
  EG_j: Empty grids.
  NE_j: Non-empty grids.
  BG_j: Boundary grids.

Step 1: Call Grid(n; m);
Step 2: p ← 1;
Step 3: Call Partition(G) to find the equal-volume grids G_1 and G_2;
Step 4: Call Connected() to produce the clusters (say, C_1, C_2, ..., C_l for some l) by grouping points in the grids that are connected through common vertices in the grid structure;
Step 5: Find the number of boundary grids BG_j of each of the clusters C_1, C_2, ..., C_l;
Step 6: If (p > 1) then
            If the number of boundary grids of some cluster C of t points is less than 20% of the boundary grids of any cluster in the (p-1)-th iteration then
                Go to Step 7;
            Else
                { p ← p + 1; Go to Step 3 to partition all the grids; }
        Else
            { p ← p + 1; Go to Step 3 to partition all the grids; }
Step 7: Call LOF(x_q) for all x_q ∈ C, 1 ≤ q ≤ t;
Step 8: If LOF(x_q) ≈ 1 for all x_q ∈ C then
            Go to Step 9;
        Else /* sign of outliers */
            { p ← p + 1; Go to Step 3 to continue the partitioning process for all the grids; }
Step 9: Output the clusters C_1, C_2, ..., C_k with respect to the grid size of the (p-1)-th iteration;
Step 10: Exit();

Function Grid(n; m)
{
    Min(y) = min{x_{1y}, x_{2y}, ..., x_{ny}}, 1 ≤ y ≤ m;
    Max(y) = max{x_{1y}, x_{2y}, ..., x_{ny}}, 1 ≤ y ≤ m;
    G(n; m) = [Min(1), Max(1)] × [Min(2), Max(2)] × ... × [Min(m), Max(m)],
    i.e., G(n; m) = ∏_{1 ≤ y ≤ m} [Min(y), Max(y)];
}

Function Connected()
{
    Step 1: l ← 1;
    Step 2: Start with any random grid G;
    Step 3: If G is non-visited, then mark it as visited and add all the points of G to C_l,
            i.e., C_l ← C_l ∪ {points in the grid G};
            Else Go to Step 2;
    Step 4: Find the non-visited grids G_r (for some r) that share a common vertex with G;
    Step 5: Add the points of all G_r to the cluster C_l and mark each G_r as visited,
            i.e., C_l ← C_l ∪ {points in the grids G_r};
    Step 6: Repeat Steps 4 and 5 for all the grids G_r until no new grid is identified;
    Step 7: If Σ_j |C_j| ≠ n then
                { l ← l + 1; Go to Step 2 to restart the process; }
            Else
                Return (C_1, C_2, ..., C_j);
}
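To make the LOF computation of equations (7.1)-(7.3) concrete, here is a minimal Python sketch of our own (in OPT-GRID it would be applied only to the points of a suspect cluster; ties in the mp-th neighbor distance are ignored, so N(x, mp) contains exactly mp points).

```python
import numpy as np

def lof(X, mp):
    """Local outlier factor (equations 7.1-7.3) for every row of X."""
    # Pairwise distances, with each point excluded from its own neighborhood.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    nbrs = np.argsort(dists, axis=1)[:, :mp]          # mp nearest neighbors
    nbr_d = np.take_along_axis(dists, nbrs, axis=1)
    density = mp / nbr_d.sum(axis=1)                  # (7.1)
    ard = density / density[nbrs].mean(axis=1)        # (7.2)
    return 1.0 / ard                                  # (7.3)

# A point inside a cluster gets LOF close to 1; an isolated point gets LOF >> 1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(50, 2)), [[8.0, 8.0]]])
print(lof(X, mp=5)[-1])
```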

Time Complexity: The proposed algorithm constructs the grid structure of the given data points a finite number of times (say p) with respect to the uniformly chosen dimensions. This task requires O(pn) time, since the partitioning process of one iteration takes linear time. The local outlier factor (LOF) is computed first for the points (if they exist) of the cluster that represents the outliers, and for the non-outlier points only in the last iteration of the algorithm. Hence, the LOF is computed a very small number of times (say h) compared to the number of data points n, which requires O(hn) time. It can easily be seen that both p and h are very small compared to n. Therefore, the overall time complexity of the proposed algorithm may be regarded as linear.

7.2.2 Experimental Analysis

We performed extensive experiments with the proposed algorithm on many synthetic and biological data sets. In order to demonstrate the efficiency of the proposed method, we compared the experimental results with the existing clustering techniques K-means [26], IGDCA [145] and GGCA [129]. We used the normalized information gain (NIG) to evaluate the quality of the clusters quantitatively. The NIG is defined as follows.

Normalized Information Gain: The normalized information gain [91] is a measure of the quality of clusters based on the information gain [114]. This measure has been shown to be very effective for evaluating data classification, and it is extended here to evaluate clusters of supervised (labeled) data. NIG is expressed in terms of the total and weighted entropies, defined as follows. Suppose there are L classes and each of the given n data points belongs to exactly one of the L classes. Let c_l denote the number of data points of class l, for l = 1, 2, ..., L, so that Σ_{l=1}^{L} c_l = n. Then the total entropy EN_Total, which is simply the average information per point in the data set, is defined as

$$EN_{\mathrm{Total}} = -\sum_{l=1}^{L} \frac{c_l}{n} \log_2 \frac{c_l}{n} \qquad (7.4)$$

Assume that the total number of clusters is K and that the k-th cluster contains n_k points, of which c_l^k belong to class l, for l = 1, 2, ..., L. Then the k-th cluster entropy EN_k is given by

$$EN_k = -\sum_{l=1}^{L} \frac{c_l^k}{n_k} \log_2 \frac{c_l^k}{n_k} \qquad (7.5)$$

Clearly, if a cluster has points of only one class, then EN_k is zero. Based on the cluster entropies, the weighted entropy wEN, which is the average information per point over the clusters, is calculated as

$$wEN = \sum_{k=1}^{K} \frac{n_k}{n}\, EN_k \qquad (7.6)$$

If all the clusters are homogeneous, i.e., each cluster contains points of only one class, then wEN is zero. The normalized information gain (NIG) is calculated from the above quantities as

$$NIG = \frac{EN_{\mathrm{Total}} - wEN}{EN_{\mathrm{Total}}} \qquad (7.7)$$

It is important to note that NIG is zero if no information is obtained, and one if the total information is retrieved by the clustering technique. Therefore, NIG should be close to 1 for good-quality clusters.

The proposed method, K-means, IGDCA and GGCA were first applied to eight two-dimensional synthetic data sets of various shapes and densities. The comparison results in Table 7.1 below depict the efficiency of the proposed method over the existing ones: the NIG values of the proposed method are larger than those of K-means, IGDCA and GGCA. Hence, by the definition of NIG, the proposed method outperforms the existing methods.
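A minimal Python sketch of the NIG computation of equations (7.4)-(7.7), assuming the class labels and cluster assignments are given as integer arrays (the function names are ours):

```python
import numpy as np

def entropy(counts):
    """-sum p*log2(p) over the non-zero entries of a count vector."""
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def nig(labels, assignments):
    """Normalized information gain (7.7) of a clustering against class labels."""
    labels, assignments = np.asarray(labels), np.asarray(assignments)
    n = len(labels)
    classes = np.unique(labels)
    class_counts = np.array([(labels == c).sum() for c in classes])
    en_total = entropy(class_counts)                        # (7.4)
    wen = 0.0
    for k in np.unique(assignments):
        members = labels[assignments == k]
        counts_k = np.array([(members == c).sum() for c in classes])
        wen += len(members) / n * entropy(counts_k)         # (7.5), (7.6)
    return (en_total - wen) / en_total                      # (7.7)

# Homogeneous clusters retrieve all the information: NIG = 1.
print(nig([0, 0, 1, 1], [5, 5, 9, 9]))   # -> 1.0
```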

Table 7.1: Results of synthetic data using normalized information gain (NIG)
[Table: data size, cluster number, and the NIG values of K-means, IGDCA, GGCA and the proposed method for the eight synthetic data sets DS1-DS8.]

Next, we tested the proposed and existing methods on eight biological data sets, namely iris, wine, statlog heart, breast tissue, Pima Indians diabetes, cloud, blood transfusion and yeast, taken from the UCI machine learning repository [32]. It can easily be observed from the comparison results in Table 7.2 that the proposed algorithm consistently produces better results than the algorithms K-means, IGDCA and GGCA in terms of the normalized information gain (NIG).

Table 7.2: Results of biological data using normalized information gain (NIG)
[Table: number of attributes, data size, cluster number, and the NIG values of K-means, IGDCA, GGCA and the proposed method for the data sets Iris, Wine, Statlog Heart, Breast Tissue, Pima Indians Diabetes, Cloud, Blood Transfusion and Yeast.]

7.3 Conclusion

In this chapter, a new approach has been proposed to find the optimal grid size using the cluster boundaries. The proposed method is non-parametric and runs with linear computational complexity. It is made insensitive to outlier points by exploiting the local outlier factor (LOF). The proposed scheme has been evaluated on various synthetic and biological data sets using the normalized information gain (NIG), and it has been shown to produce better results than K-means, IGDCA and GGCA.

CHAPTER 8 SUMMARY AND CONCLUSIONS

In this thesis, we have presented various clustering algorithms under the four basic models of clustering, namely hierarchical, partitional, density-based and grid-based. We have addressed some of the problems associated with the existing algorithms and tried to resolve them through new or improved algorithms. We have shown experimental results of all the proposed algorithms on several benchmark synthetic and biological data sets, and compared the results with the existing algorithms to demonstrate the superior performance of the proposed ones.

The introductory part of the thesis is presented in the opening Chapter 1. It comprises the scope of the thesis, an introduction to the four major clustering models, the resources used, and the organization of the thesis. Chapter 2 is composed of an extensive review of the existing clustering algorithms of the above four major clustering models; the various clustering techniques of these models are discussed along with their merits and demerits.

Our contribution begins with Chapter 3, in which two novel MST-based clustering algorithms have been presented. The algorithm MST-CV has been designed using the coefficient of variation, and it also deals with outliers. The algorithm MST-DVI produces the optimal clusters using the DVI. The proposed algorithms outperformed several existing methods (K-means, SFMST, SC, MSDR, IMST, SAM and MinClue) on various synthetic and biological data. Both of the proposed algorithms have quadratic time complexity.

Chapter 4 presented three new clustering algorithms using the Voronoi diagram. The algorithm GCVD produces the desired clusters by exploiting the Voronoi edges. The algorithm Voronoi-Cluster yields the optimal clusters by using a function defined with the help of Voronoi circles.

Similarly, the algorithm Build-Voronoi generates efficient clusters by reconstructing the Voronoi diagram. The proposed algorithms have been shown to be more efficient than K-means, FCM, CTVN, VDAC and SC, and their time complexity has been shown to be O(n log n).

Two algorithms to enhance K-means clustering have been presented in Chapter 5. The algorithm KD-JDF is effective against the random-initialization and automation problems of the K-means algorithm: the initialization problem is dealt with using the kd-tree, whereas the number of clusters is automated by the JDF. The algorithm KD-Cluster improves the K-means clustering algorithm to be insensitive to outliers. The proposed algorithms outperformed the classical and improved K-means algorithms, and the time complexity of the proposed kd-tree based algorithms is O(n log n).

A prototype-based approach has been designed in Chapter 6 to speed up the DBSCAN algorithm. The prototypes in this algorithm are produced using the squared-error clustering technique. The proposed algorithm produced clusters at less computational cost than the existing techniques I-DBSCAN, DBCAMM, VDBSCAN and KFWDBSCAN.

In Chapter 7, we have proposed a grid clustering algorithm that computes the grid size using the novel concept of boundary grids. In this method, we used the local outlier factor (LOF) to deal with the problem of outliers. The proposed technique is non-parametric, runs in linear time, and has been shown to produce better results than K-means, IGDCA and GGCA.

Although the proposed algorithms are efficient in clustering various complex data, they have a few remaining challenges, and our future efforts will be directed towards the problems described as follows. We shall attempt to find a solution to the localization problem of the proposed MST-based clustering algorithm. The proposed Voronoi diagram based algorithm has the problem of requiring the value of K as input in advance; we shall enhance the algorithm in connection with this issue.

The performance of the kd-tree based K-means clustering algorithms depends completely on the proper formation of the leaf bucket sizes; therefore, we shall try to find the best possible size for the leaf buckets. The notion of the proposed MDBSCAN clustering algorithm is directed by the neighborhood parameter ε; we shall make an effort to automate the choice of a proper value of ε. Finally, if the given data has clusters of diverse densities, the proposed concept of boundary grids in grid-based clustering may not provide the optimal grid size; we shall try to enhance the grid-based clustering algorithm in our future endeavors to produce the clusters of varied densities.
