Building a Concept Hierarchy from a Distance Matrix


Huang-Cheng Kuo (1) and Jen-Peng Huang (2)

(1) Department of Computer Science and Information Engineering, National Chiayi University, Taiwan 600, hckuo@mail.ncyu.edu.tw
(2) Department of Information Management, Southern Taiwan University of Technology, Taiwan 710, jehuang@mail.stut.edu.tw

Abstract. Concept hierarchies are important in many generalized data mining applications, such as multiple-level association rule mining. In the literature, a concept hierarchy is usually given by domain experts. In this paper, we propose algorithms that automatically build a concept hierarchy from a provided distance matrix. Our approach modifies the traditional hierarchical clustering algorithms. To evaluate an algorithm, a distance matrix is derived from the concept hierarchy it builds, and the root mean squared error between the provided distance matrix and the derived distance matrix is used as the evaluation criterion. We compare the traditional hierarchical clustering algorithm and our modified algorithm under three strategies for computing cluster distance, namely single link, average link, and complete link. Empirical results show that the traditional algorithm under the complete link strategy performs better than under the other strategies. Our modified algorithms perform almost the same under the three strategies, and they perform better than the traditional algorithms under various situations.

1 Introduction

Generalization on nominal data, as in mining multiple-level association rules, is frequently studied by means of a concept hierarchy [8,5,3]. In a concept hierarchy of categories, the similarity between two categories is reflected by the length of the path connecting them. The similarity between two concepts does not necessarily stay the same over time. Consider a scenario in which lawyers and doctors have common habits in a certain period of time. These common habits may change in the next time period. So, with respect to habit, the similarity between lawyer and doctor is changing.

Concept hierarchies used in generalized data mining applications are usually given by domain experts. However, it is difficult to maintain a concept hierarchy when the number of categories is huge or when the characteristics of the data change frequently. Therefore, there is a need for an algorithm that automatically builds a concept hierarchy for a set of nominal values. In this paper, our proposed approach is to modify traditional hierarchical clustering algorithms for this purpose. The input to our method is a distance matrix, which can be computed by CACTUS [2] for tabular data, or by the Jaccard coefficient for transactional data.

Our contribution in modifying the traditional agglomerative hierarchical clustering algorithm, called the traditional clustering algorithm in the rest of this paper, is twofold. (1) The traditional clustering algorithm builds a binary tree, whereas in a concept hierarchy it is very likely that more than two concepts share a common general concept. To capture this characteristic, our modified agglomerative hierarchical clustering algorithm allows more than two clusters to merge into one cluster. (2) The leaves of the binary tree generated by the traditional clustering algorithm are not necessarily at the same level. This may cause an inversion problem. Consider the merging of a deep subtree and a single-node subtree. The longest path between two leaves of the deep subtree is longer than the path from a leaf of the deep subtree to the leaf of the single-node subtree. We solve this inversion problem by keeping all the leaves at the same level.

In addition to the modified algorithm, we devise a novel quality measure for the built concept hierarchy by deriving a distance matrix from the concept hierarchy. The root mean squared error between the input distance matrix and the derived distance matrix can then be computed.

The paper is organized as follows. In section 2, we illustrate the need for a concept hierarchy. In section 3, the measurement for the algorithms is presented and the way to obtain the input distance matrix is discussed. Section 4 discusses the algorithms to build a concept hierarchy from a given distance matrix among the categories. The experiment description and results are in section 5. The conclusion is in section 6.

2 The Need for a Concept Hierarchy

Concept hierarchies, represented by taxonomies or sets of mapping rules, can be provided by domain experts. Following are examples of mapping rules [4] for the Status and Income attributes:

Status: {freshman, sophomore, junior, senior} → undergraduate
Status: {graduate, undergraduate} → student
Income: {1,000-25,000} → low income
Income: {25,001-50,000} → mid income

Multiple-level association rule mining uses support for obtaining frequent itemsets [5,8]. Moving items to a higher level of the concept hierarchy can only keep or increase the support of an itemset, so more itemsets may become frequent. Other applications, such as data warehouses, require dimension tables for drill-down and roll-up operations [3]. Automatically constructing a concept hierarchy for a huge number of categories would relieve the burden on a domain expert.
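As a small illustration of why support grows with generalization, the Status mapping rules can be written as a child-to-parent table and applied to a list of records. This is only a sketch: the names `PARENT`, `generalize`, and `support`, and the five records, are made up for the example and do not come from the paper.

```python
# Hypothetical encoding of the Status mapping rules as child -> parent.
PARENT = {
    "freshman": "undergraduate",
    "sophomore": "undergraduate",
    "junior": "undergraduate",
    "senior": "undergraduate",
    "undergraduate": "student",
    "graduate": "student",
}


def generalize(value, levels=1):
    """Climb up to `levels` steps in the hierarchy; stop at a value with no parent."""
    for _ in range(levels):
        if value not in PARENT:
            break
        value = PARENT[value]
    return value


def support(values, target):
    """Fraction of records whose value equals `target`."""
    return sum(v == target for v in values) / len(values)


# Made-up records, for illustration only.
records = ["freshman", "senior", "graduate", "junior", "graduate"]
level1 = [generalize(v, 1) for v in records]
level2 = [generalize(v, 2) for v in records]

print(support(records, "freshman"))
print(support(level1, "undergraduate"))
print(support(level2, "student"))
```

Each step up the hierarchy replaces several specific values with one general value, so the support of the general value is at least the support of any value it covers.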

3 Measurements for Concept Hierarchy

Several metrics exist for the quality of a clustering: for example, how well the intra-cluster distance is minimized, how well the inter-cluster distance is maximized, and how high the accuracy is when the algorithm is run on a pre-classified test dataset. However, there are no existing metrics for measuring the quality of a concept hierarchy automatically built by an algorithm. The clustering metrics mentioned above are not applicable to concept hierarchies, since a concept hierarchy building algorithm is not concerned with the number of clusters.

The input to the algorithms is a distance matrix, called the provided distance matrix. The output of an algorithm is a concept hierarchy. Since a correct concept hierarchy is usually not available, we propose an indirect measurement. To compare with the provided distance matrix, we convert the output concept hierarchy into a distance matrix, called the derived distance matrix. An element of the derived distance matrix is the length of the path from one category to another. Min-max normalization is applied to the derived distance matrix so that it has the same scale as the provided distance matrix. The root mean squared error between the two distance matrices is then computed as the quality of the output concept hierarchy.

Definition: Concept Hierarchy Quality Index. Given a distance matrix, M_provided, over a set of categories {c_1, c_2, ..., c_n}, the quality index of a concept hierarchy with respect to M_provided is the root mean squared error between M_provided and M_derived, where M_derived(c_i, c_j) is the normalized length of the path from c_i to c_j in the concept hierarchy. The root mean squared error is defined as

\[ \mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{c_i, c_j} \left( M_{provided}(c_i, c_j) - M_{derived}(c_i, c_j) \right)^2 } \]

where N is the number of pairs (c_i, c_j) in the sum.

Different types of data call for different methods of obtaining a distance matrix. For data in relational tables, we adopt the similarity definition from CACTUS [2] with simplification.
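As a rough illustration of the quality index defined in this section, the sketch below derives path lengths from a tree, applies min-max normalization, and computes the root mean squared error. The tree encoding (a child-to-parent dictionary), the function names, the target range parameters `new_min` and `new_max`, and the example numbers are all assumptions made for this example; the paper does not specify an implementation.

```python
import math
from itertools import combinations


def path_length(parent, a, b):
    """Number of edges on the tree path between nodes a and b."""
    # Record how far each ancestor of a is from a.
    seen = {}
    node, steps = a, 0
    while node is not None:
        seen[node] = steps
        node, steps = parent.get(node), steps + 1
    # Climb from b until we reach one of those ancestors.
    node, steps = b, 0
    while node not in seen:
        if node is None:
            raise ValueError("nodes are not in the same tree")
        node, steps = parent.get(node), steps + 1
    return steps + seen[node]


def derived_matrix(parent, leaves, new_min=0.0, new_max=1.0):
    """Min-max normalized path lengths for every pair of leaves.

    The target range [new_min, new_max] is an assumption; choose it to
    match the scale of the provided distance matrix.
    """
    raw = {(a, b): path_length(parent, a, b) for a, b in combinations(leaves, 2)}
    lo, hi = min(raw.values()), max(raw.values())
    scale = (hi - lo) or 1  # avoid division by zero when all paths are equal
    return {
        pair: new_min + (d - lo) / scale * (new_max - new_min)
        for pair, d in raw.items()
    }


def rmse(provided, derived):
    """Root mean squared error over the pairs present in `derived`."""
    errors = [(provided[pair] - derived[pair]) ** 2 for pair in derived]
    return math.sqrt(sum(errors) / len(errors))


# Hypothetical two-level hierarchy: root -> {x, y}, x -> {a, b}, y -> {c, d}.
parent = {"a": "x", "b": "x", "c": "y", "d": "y", "x": "root", "y": "root"}
leaves = ["a", "b", "c", "d"]
derived = derived_matrix(parent, leaves)

# Made-up provided distances, keyed by the same leaf pairs.
provided = {
    ("a", "b"): 0.1, ("a", "c"): 0.9, ("a", "d"): 1.0,
    ("b", "c"): 0.8, ("b", "d"): 0.9, ("c", "d"): 0.2,
}
quality = rmse(provided, derived)
print(quality)
```

A lower value means the hierarchy preserves the provided distances more closely; a hierarchy compared against its own derived matrix scores zero.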
After obtaining the similarities between pairs of categories, we normalize the similarities into distances in the range from ε to 1. Since the distance between two different categories should be greater than zero, we let ε denote the expected distance between a category and its most similar category [7].

4 Algorithms

It is intuitive to use traditional agglomerative hierarchical clustering for building a concept hierarchy of categorical objects. We first describe the traditional algorithm and point out two of its drawbacks. Then, we propose a modified version of the hierarchical clustering algorithm.

4.1 Traditional agglomerative hierarchical clustering

Hierarchical clustering treats each object as a singleton cluster, and then successively merges clusters until all objects have been merged into a single remaining cluster. The dendrogram built in this way is a binary tree. Leaf nodes in such a tree are likely to be at different levels of the tree.

In this paper, we study three strategies for computing the distance between a pair of clusters, namely single link [9], average link, and complete link [6]. Agglomerative hierarchical clustering merges the pair of clusters with the smallest distance into one cluster. The distances dist_single, dist_average, and dist_complete between clusters C_1 and C_2 are defined below.

\[ dist_{single}(C_1, C_2) = \min_{x \in C_1,\, y \in C_2} dist(x, y) \]
\[ dist_{average}(C_1, C_2) = \operatorname{avg}_{x \in C_1,\, y \in C_2} dist(x, y) \]
\[ dist_{complete}(C_1, C_2) = \max_{x \in C_1,\, y \in C_2} dist(x, y) \]

With regard to building a concept hierarchy tree, traditional hierarchical clustering algorithms have two major drawbacks. First, in a concept hierarchy the degree of a node can be larger than 2. For example, in figure 1, grape, apple, orange, and lemonade are all specific concepts of the same general concept. However, the traditional hierarchical clustering algorithm builds a binary tree as the concept hierarchy.

[Fig. 1. A Concept Hierarchy. Node labels: drink, alcohol, beverage, whisky, beer, coke, grape, apple, orange, lemonade]

A possible way of perceiving the similarity between two categories in a concept hierarchy is the length of the path connecting the two categories. So, the second drawback is that the distance relationship among the categories might not be preserved by the traditional algorithm. In figure 2, the path from grape to orange is longer than the path from grape to coke. This contradicts the intention specified by the users in figure 1.

[Fig. 2. Concept Hierarchy Built by Traditional Algorithm. Node labels: drink, alcohol, beverage, whisky, beer, coke, grape-or-apple, orange-or-lemonade, grape, apple, orange, lemonade]

To address these drawbacks, we propose modified hierarchical clustering algorithms that have two important features: (1) the leaves of the tree are all at the same level, and (2) the degree of an internal node can be larger than 2, i.e., a node can join another node in the upper level.

4.2 Multiple-way agglomerative hierarchical clustering

We propose a new hierarchical clustering algorithm to improve on the traditional hierarchical clustering algorithm. Initially all items are singleton clusters, and all items are leaf nodes. To guarantee that all leaves are at the same level, the algorithm merges or joins clusters level by level. In other words, clusters of the upper level will not be merged until every cluster of the current level has a parent cluster. Two clusters of the same level can be merged, and a new cluster is created as the parent of the two clusters. The newly created cluster is placed at the upper level of the tree. We also propose a new operation in which a cluster joins a cluster at the upper level, so that the cluster of the upper level becomes the parent of the cluster of the current level. The process continues until the root of the tree is created.
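The level-by-level process described above can be sketched roughly as follows. The merge and join operations follow the text, but the rule for choosing between them is an assumption made for this example: at each step the candidate with the smallest cluster distance is taken, and a tie between a join and a merge goes to the join. The names `linkage`, `build_level`, and `build_hierarchy`, and the example distances, are hypothetical and not the authors' code.

```python
from itertools import combinations


def linkage(dist, c1, c2, strategy="average"):
    """Cluster-to-cluster distance under single, average, or complete link."""
    values = [dist[x][y] for x in c1 for y in c2]
    if strategy == "single":
        return min(values)
    if strategy == "complete":
        return max(values)
    return sum(values) / len(values)


def build_level(dist, clusters, strategy="average"):
    """Give every cluster of the current level a parent in the upper level.

    Returns a list of parents; each parent is a list of child indices.
    """
    unassigned = set(range(len(clusters)))
    parents = []

    def members(children):
        return frozenset().union(*(clusters[i] for i in children))

    while unassigned:
        candidates = []
        # Merge: two unassigned clusters create a new parent.
        for i, j in combinations(sorted(unassigned), 2):
            d = linkage(dist, clusters[i], clusters[j], strategy)
            candidates.append((d, "merge", i, j))
        # Join: an unassigned cluster becomes a child of an existing parent.
        for i in sorted(unassigned):
            for p, children in enumerate(parents):
                d = linkage(dist, clusters[i], members(children), strategy)
                candidates.append((d, "join", i, p))
        # Assumed selection rule: smallest distance; "join" sorts before "merge".
        _, op, i, target = min(candidates)
        if op == "merge":
            parents.append([i, target])
            unassigned -= {i, target}
        else:
            parents[target].append(i)
            unassigned.discard(i)
    return parents


def build_hierarchy(dist, items, strategy="average"):
    """Return the levels of the tree, leaves first, as lists of frozensets."""
    level = [frozenset([x]) for x in items]
    levels = [level]
    while len(level) > 1:
        groups = build_level(dist, level, strategy)
        level = [frozenset().union(*(level[i] for i in g)) for g in groups]
        levels.append(level)
    return levels


# Made-up distances: {a, b, c} are mutually close, {d, e} are close.
items = ["a", "b", "c", "d", "e"]
group = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
dist = {
    x: {y: 0.0 if x == y else (0.2 if group[x] == group[y] else 1.0) for y in items}
    for x in items
}
levels = build_hierarchy(dist, items, strategy="complete")
for level in levels:
    print([sorted(c) for c in level])
```

Because a whole level is finished before the next one starts, every leaf ends up at the same depth, and the join operation is what allows a parent to have more than two children.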
In the following discussion, a node is a cluster that contains one or more categorical objects. First, we discuss the join operator for hierarchical clustering. Consider the four clusters, A, B, C, and D, in figure 3. Assume that dist(A, B) is the smallest among all pairs of clusters, and A and B are merged into cluster E. Assume that either dist(A, C) or dist(B, C) is less than dist(C, D). In other words, cluster C is better merged with A or B than with D. In the traditional hierarchical clustering algorithm, C is merged with either D or E. Merging with D is not good for C. Merging with E may be good. But (1) if dist(A, B), dist(A, C), and dist(B, C) are about the same, the clustering result makes C quite different from A and B; (2) leaf nodes in the subtree rooted at C will not be at the same level of the whole tree.

[Fig. 3. Hierarchical Clustering with Join Operator]

5 Experiment Results

In this paper, we evaluate the algorithms with generated data. A provided distance matrix of n objects is generated with the assistance of a tree, which is built bottom up. The data generation is described in the following steps.

1. Let each object be a leaf node.
2. A number of nodes of the same level are grouped together and an internal node is created as their parent node. This process continues until the number of internal nodes of a level is one. In other words, the root is created. Any internal node of the tree has at least two children. The degree of an internal node is uniformly distributed in the interval from two to span, a given parameter.
3. The distance between any pair of leaf nodes is proportional to the length of their path in the tree. The distances are divided by the length of the longest path, i.e., they are normalized to one.
4. Noise is applied to the distance matrix. The distance values are multiplied by uniformly distributed numbers between 1 − noise and 1 + noise. In the experiment, we generate distance matrices where noise = 0.1 and noise = 0.2. After the noise is applied, the distance values are truncated to the interval from zero to one.

The tree generated in step 2 can be regarded as a perfect concept hierarchy. Since there is noise in the provided distance matrix, the quality index of the perfect concept hierarchy with respect to the provided distance matrix is not zero.

In the experiment, we illustrate the performance of the algorithms under three parameters, namely noise, span, and number of items. For each parameter combination, the root mean squared error values of 30 tests are averaged. For generating provided distance matrices, we build trees with four intervals of spans: [2, 4], [2, 6], [2, 8], and [2, 10].

[Fig. 4. Experiment Results for noise = 0.1]

Figure 4 depicts the quality indices for the algorithms where the noise of the input datasets is 0.1. The lines NAL, NCL, and NSL represent the performance of our new modified algorithm under the average link, complete link, and single link strategies. The lines TAL, TCL, and TSL represent the performance of the traditional algorithm under the three strategies. The line Perf represents the quality index of the perfect concept hierarchy.

The results show that our proposed methods perform much better than the traditional agglomerative hierarchical clustering algorithm for all the input distance matrices. However, the strategy of cluster-to-cluster distance does not affect the result in our algorithms, whereas for the traditional algorithms the single link strategy performs better than the other two strategies. The reason might be that we generate the input distance matrices from trees. All the algorithms perform worse for wider spans. Comparing the performance for data with different noise levels, all the algorithms perform worse for noisier data.

[Fig. 5. Experiment Results for noise = 0.2]

Figure 5 depicts the quality indices for the algorithms where the noise of the input datasets is 0.2. Compared with the results for noise = 0.1, the root mean squared error of our new algorithms increases from 0.07 to 0.12 where span = [2, 4]. Similar comparisons can be observed for different spans.
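The data generation steps of this section can be sketched as below. This is an illustrative reading of steps 1 to 4, not the authors' generator: the function names and the seed handling are assumptions, and the group size is adjusted near the end of a level so that no internal node is left with a single child, which means the degrees are only approximately uniform (and can reach 3 when span is 2).

```python
import random


def generate_tree(n, span, rng):
    """Steps 1 and 2: group nodes bottom-up; return a child -> parent map."""
    level = list(range(n))  # leaves are numbered 0 .. n-1
    parent = {}
    next_id = n
    while len(level) > 1:
        upper = []
        i = 0
        while i < len(level):
            remaining = len(level) - i
            k = min(rng.randint(2, span), remaining)
            # Never leave exactly one node behind on this level.
            if remaining - k == 1:
                k = k - 1 if k > 2 else k + 1
            for child in level[i:i + k]:
                parent[child] = next_id
            upper.append(next_id)
            next_id += 1
            i += k
        level = upper
    return parent


def leaf_distance(parent, a, b):
    """Path length between two leaves; valid because all leaves share one depth."""
    steps = 0
    while a != b:
        a, b = parent[a], parent[b]
        steps += 2
    return steps


def provided_matrix(n, span, noise, seed=0):
    """Steps 3 and 4: normalized path lengths with multiplicative noise."""
    rng = random.Random(seed)
    parent = generate_tree(n, span, rng)
    raw = {(a, b): leaf_distance(parent, a, b)
           for a in range(n) for b in range(a + 1, n)}
    longest = max(raw.values())
    matrix = {}
    for pair, d in raw.items():
        noisy = (d / longest) * rng.uniform(1 - noise, 1 + noise)
        matrix[pair] = min(1.0, max(0.0, noisy))  # truncate to [0, 1]
    return parent, matrix


parent, matrix = provided_matrix(n=20, span=4, noise=0.1, seed=0)
print(len(matrix), min(matrix.values()), max(matrix.values()))
```

The returned `parent` map is the noise-free tree, so it can serve as the perfect concept hierarchy when comparing quality indices.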

6 Conclusions and Future Works

A concept hierarchy is a useful mechanism for representing the generalization relationships among concepts, so that multiple-level association rule mining can be conducted. In this paper, we build a concept hierarchy from a distance matrix with the goal that the distance between any pair of concepts is preserved as much as possible. We adopt the traditional agglomerative hierarchical clustering with two major modifications: (1) a cluster can not only merge with another cluster but also join another cluster, and (2) leaf nodes are all at the same level of the concept hierarchy. Empirical results show that our modified algorithm performs much better than the original algorithm.

Some areas of this study warrant further research: (1) A frequently questioned drawback of hierarchical clustering algorithms is that they do not roll back a merge or division. If re-assignment of an object from one cluster to another is allowed at certain stages, the clustering result may be improved. (2) All the lengths, i.e., the weights on the edges of the concept hierarchy, are the same. If the weights on the edges of the concept hierarchy can be trained, the distance relationship between concepts can be better preserved.

References

1. U. M. Fayyad, K. B. Irani, Multi-interval Discretization of Continuous-Valued Attributes for Classification Learning, In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, 1993.
2. V. Ganti, J. Gehrke, and R. Ramakrishnan, CACTUS-Clustering Categorical Data Using Summaries, ACM KDD, 1999.
3. J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Pub.
4. J. Han and Y. Fu, Dynamic Generation and Refinement of Concept Hierarchies for Knowledge Discovery in Databases, Workshop on Knowledge Discovery in Databases, 1994.
5. J. Han and Y. Fu, Discovery of Multiple-Level Association Rules from Large Databases, VLDB Conference, 1995.
6. A. K. Jain, R. C. Dubes, Algorithms for Clustering Data, Prentice-Hall, Inc.
7. Huang-Cheng Kuo, Yi-Sen Lin, Jen-Peng Huang, Distance Preserving Mapping from Categories to Numbers for Indexing, International Conference on Knowledge-Based Intelligent Information Engineering Systems, Lecture Notes in Artificial Intelligence, Vol. 3214, 2004.
8. R. Srikant and R. Agrawal, Mining Generalized Association Rules, VLDB Conference, 1995.
9. R. Sibson, SLINK: An Optimally Efficient Algorithm for the Single-Link Cluster Method, Computer Journal, Vol. 16, No. 1, 1972.


More information

Hierarchical Clustering

Hierarchical Clustering Hierarchical Clustering Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram A tree like diagram that records the sequences of merges or splits 0 0 0 00

More information
