Generalization of Hierarchical Crisp Clustering Algorithms to Fuzzy Logic

Mathias Bank (mathias.bank@uni-ulm.de), Faculty for Mathematics and Economics, University of Ulm
Dr. Friedhelm Schwenker (friedhelm.schwenker@uni-ulm.de), Institute of Neural Information Processing, University of Ulm

Page 2 Fuzzy Logic

Question: Is there a salami sandwich in the refrigerator?
Answer: 0.5

- If probability: then there is or isn't, with probability one half.
- If measure: then there is half a salami sandwich there.
- If fuzzy: then there is something there, but it isn't really a salami sandwich. Perhaps it is some other kind of sandwich, or salami without the bread...

Prof. Solomon W. Golomb, University of Southern California

Page 3 Motivation for Hierarchical Fuzzy Clustering

Fuzzy
- Documents deal with more than one topic
- Overlapping topic categories

Hierarchical
- Topic analysis on different abstraction levels

Page 4 Precondition for different Clustering Approach

Hierarchical agglomerative Clustering
- Deterministic
- Global optimum (based on similarity)
- Different linkage metrics available

Hierarchical agglomerative Fuzzy-Clustering
- Deterministic
- Global optimum (based on similarity)
- Different linkage metrics reusable
- Overlapping topics per document
- Multi-assignment

Page 5 Generalized agglomerative Clustering Algorithm

Steps:
- Group clusters … and … with highest similarity

Similarity matrix:

     D1  D2      D3      D4      D5      D6
D1   1   S(2,1)  S(3,1)  S(4,1)  S(5,1)  S(6,1)
D2       1       S(3,2)  S(4,2)  S(5,2)  S(6,2)
D3               1       S(4,3)  S(5,3)  S(6,3)
D4                       1       S(5,4)  S(6,4)
D5                               1       S(6,5)
D6                                       1

Page 6 Generalized agglomerative Clustering Algorithm

Steps:
- Group clusters … and … with highest similarity
- Delete all similarity values of … and …

Similarity matrix:

     D1  D2      D3      D4      D5      D6
D1   1   S(2,1)  S(3,1)  S(4,1)  S(5,1)  S(6,1)
D2       1       S(3,2)  S(4,2)  S(5,2)  S(6,2)
D3               1       S(4,3)  S(5,3)  S(6,3)
D4                       1       S(5,4)  S(6,4)
D5                               1       S(6,5)
D6                                       1

Page 7 Generalized agglomerative Clustering Algorithm

Steps:
- Group clusters … and … with highest similarity
- Delete all similarity values of … and …
- Delete similarity value of … and …

Similarity matrix:

     D1  D2  D3      D4      D5      D6
D1   1   0   S(3,1)  S(4,1)  S(5,1)  S(6,1)
D2       1   S(3,2)  S(4,2)  S(5,2)  S(6,2)
D3           1       S(4,3)  S(5,3)  S(6,3)
D4                   1       S(5,4)  S(6,4)
D5                           1       S(6,5)
D6                                   1

Page 8 Generalized agglomerative Clustering Algorithm

Steps:
- Group clusters … and … with highest similarity
- Delete all similarity values of … and …
- Delete similarity value of … and …
- Insert new cluster into similarity matrix

Similarity matrix:

     D1  D2  D3      D4      D5      D6      C1
D1   1   0   S(3,1)  S(4,1)  S(5,1)  S(6,1)  0
D2       1   S(3,2)  S(4,2)  S(5,2)  S(6,2)  0
D3           1       S(4,3)  S(5,3)  S(6,3)  S(C1,3)
D4                   1       S(5,4)  S(6,4)  S(C1,4)
D5                           1       S(6,5)  S(C1,5)
D6                                   1       S(C1,6)
C1                                           1

Page 9 Linkage metrics

Cluster-Document Similarity
- Well-known linkage metrics reusable: Single Linkage, Complete Linkage, …
- Except to documents …: similarity 0

Cluster-Cluster Similarity
- Well-known linkage metrics reusable: Single Linkage, Complete Linkage, …
- Except to clusters …: similarity 0

How to calculate the similarity of two clusters if they share documents? Use common documents for similarity calculation?

Page 11 Linkage metrics

Cluster-Document Similarity
- Well-known linkage metrics reusable: Single Linkage, Complete Linkage, …
- Except to documents …: similarity 0

Cluster-Cluster Similarity
- Well-known linkage metrics reusable: Single Linkage, Complete Linkage, …
- Except to clusters …: similarity 0

How to calculate the similarity of two clusters if they share documents?
Generalized subgraph property: don't use common data points for similarity calculation.
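The generalized subgraph property can be illustrated with a small Python sketch. It assumes that "don't use common data points" means dropping the shared documents from both clusters before applying the linkage metric. The function name `linkage_similarity`, the dictionary layout of `doc_sim`, and the sample values are hypothetical and not taken from the slides.

```python
def linkage_similarity(cluster_a, cluster_b, doc_sim, mode="single"):
    """Similarity of two clusters (sets of document names).

    Assumption: documents contained in both clusters are ignored,
    so only pairs of non-shared documents enter the linkage metric.
    """
    common = cluster_a & cluster_b
    only_a = cluster_a - common
    only_b = cluster_b - common
    values = [doc_sim[frozenset((i, j))] for i in only_a for j in only_b]
    if not values:
        # One cluster is contained in the other: nothing left to compare.
        return 0.0
    # Single linkage takes the best pair, complete linkage the worst pair.
    return max(values) if mode == "single" else min(values)


# Hypothetical pairwise document similarities.
doc_sim = {
    frozenset(("D1", "D2")): 0.9,
    frozenset(("D1", "D3")): 0.4,
    frozenset(("D2", "D3")): 0.7,
}

# D2 belongs to both clusters, so only the pair (D1, D3) is compared.
print(linkage_similarity({"D1", "D2"}, {"D2", "D3"}, doc_sim))  # 0.4
```

Under this reading, a cluster and one of its own documents always get similarity 0, which agrees with the zero entries in the C1 column of the matrix on Page 8.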

Page 12 Fuzziness specification

Similarity matrix:

     D1  D2  D3          D4          D5          D6
D1   1   0   f * S(3,1)  f * S(4,1)  f * S(5,1)  f * S(6,1)
D2       1   f * S(3,2)  f * S(4,2)  f * S(5,2)  f * S(6,2)
D3           1           S(4,3)      S(5,3)      S(6,3)
D4                       1           S(5,4)      S(6,4)
D5                                   1           S(6,5)
D6                                               1

Fuzzifier f
- Specifies fuzziness, f ∈ [0; 1]
- Applied to selected rows / columns
- If …, … = 0
- Definable for each level separately (level counted by document level)
- Crisp level definition possible

Page 13 Generalized agglomerative Clustering Algorithm

While (!empty(similaritymatrix))
    Group clusters … and … with highest similarity
    Delete all similarity values of … and …
    Delete similarity value of … and …
    Apply fuzzifier to similarity values of … and …
    Insert new cluster into similarity matrix
End
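A single iteration of this loop can be sketched in Python as follows. This is one possible reading of the slides, not the authors' implementation: the similarity matrix is stored as a dictionary, single linkage is used by default, the fuzzifier `f` multiplies the remaining similarities of the two grouped nodes (as in the matrix on Page 12), and shared documents are ignored when the new cluster is inserted. The names `merge_step`, `sim`, `members`, and `doc_sim` are hypothetical.

```python
def merge_step(sim, members, doc_sim, f, new_name, linkage=max):
    """Perform one merge and update the similarity matrix in place.

    sim:      dict frozenset({a, b}) -> current similarity of nodes a and b
    members:  dict node name -> frozenset of document names in that node
    doc_sim:  dict frozenset({d1, d2}) -> original document similarity
    f:        fuzzifier in [0, 1]; f = 0 reproduces the crisp behaviour
    """
    active = set().union(*sim)  # nodes that are still in the matrix
    # Group the two nodes with the highest similarity (ties broken by name).
    pair = max(sim, key=lambda p: (sim[p], sorted(p)))
    a, b = sorted(pair)
    # Delete the similarity value between the grouped nodes.
    del sim[pair]
    # Apply the fuzzifier to the remaining similarity values of a and b.
    for p in [p for p in sim if a in p or b in p]:
        sim[p] *= f
        if sim[p] == 0:
            del sim[p]  # crisp case: the merged nodes leave the matrix
    # Insert the new cluster; pairs with no non-shared documents get no entry,
    # which corresponds to similarity 0 towards the cluster's own members.
    merged = members[a] | members[b]
    members[new_name] = merged
    for node in sorted(active):
        rest_new = merged - members[node]
        rest_node = members[node] - merged
        values = [doc_sim[frozenset((i, j))] for i in rest_new for j in rest_node]
        if values:
            sim[frozenset((new_name, node))] = linkage(values)
    return a, b


# Hypothetical example with three documents.
doc_sim = {
    frozenset(("D1", "D2")): 0.9,
    frozenset(("D1", "D3")): 0.4,
    frozenset(("D2", "D3")): 0.7,
}
sim = dict(doc_sim)
members = {d: frozenset({d}) for d in ("D1", "D2", "D3")}
print(merge_step(sim, members, doc_sim, f=0.5, new_name="C1"))  # ('D1', 'D2')
```

With `f = 0.5`, D1 and D2 stay in the matrix with halved similarities to D3 and can therefore join further clusters. With `f = 0`, their rows vanish and the step behaves like crisp agglomerative clustering.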

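Page 14 Fuzzy Membership Calculation

[Figure: documents A, B, C grouped into clusters AB and BC with merge similarities s1 and s2; the membership µ of a node is computed over its parent clusters (p: Parents).]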
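The membership formula on this slide did not survive extraction. The Python sketch below therefore shows only one plausible reading, stated as an assumption: the membership of a node in each parent cluster is its merge similarity to that parent, normalized over all of its parents. The function name `memberships` and the sample similarities are hypothetical.

```python
def memberships(parent_sims):
    """Normalize merge similarities into memberships that sum to 1.

    parent_sims: dict parent cluster name -> similarity at which the node
                 was grouped into that parent.
    Assumption: mu(node, parent) = s(parent) / sum of s over all parents.
    """
    total = sum(parent_sims.values())
    if total == 0:
        return {parent: 0.0 for parent in parent_sims}
    return {parent: s / total for parent, s in parent_sims.items()}


# Document B was grouped into AB with s1 and into BC with s2 (made-up values).
mu = memberships({"AB": 0.75, "BC": 0.25})
print(mu)
```

A document with a single parent gets membership 1 under this assumption, which matches the crisp case.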

Interpretable hierarchy size 5,10 possible

Page 16 Generalized agglomerative Clustering Algorithm

While (!empty(similaritymatrix))
    Group clusters … and … with highest similarity
    Delete all similarity values of … and …
    Delete similarity value of … and …
    Apply fuzzifier to similarity values of … and …
    Insert new cluster into similarity matrix
End
Calculate fuzzy membership µ
Shrink cluster level size to predefined level size

Page 17 Evaluation measure for external labeled data

[Figure: example hierarchy over documents labeled AB, AB, AC, AC, D with cluster label vectors 2A2B, 2A2C, 4A2B2C, 4A2B2C1D.]

COS-Quality-Index
- Propagate document labels via membership
- Cluster quality defined as statistics of all cos similarities of document label vector and cluster label vector

Fuzziness Statistics
- Recall in fuzzy mode difficult to define
- Fuzziness analysis for recall information

Data
- 7 randomly chosen document collections
- Based on RCV1 (26501, 23309, 10148 docs) and RCV2-german (16604, 13146, 11600, 9350 docs)
- Fuzzifier 0.0, 0.1, …, 0.8 for document level, 0.0 else
- Hierarchy level 5
- Quality analysis for document parents necessary
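The COS-Quality-Index can be sketched in Python under a few assumptions. The slide only says that labels are propagated via membership and that quality is a statistic over cosine similarities. The sketch assumes the cluster label vector is the membership-weighted sum of the document label vectors and uses the mean as the statistic. The function names and the document identifiers `d1` to `d5` are hypothetical, and the labels follow the example figure on the slide.

```python
from collections import Counter
from math import sqrt


def cluster_label_vector(doc_labels, membership):
    """Sum the label vectors of all member documents, weighted by membership."""
    vec = Counter()
    for doc, mu in membership.items():
        for label in doc_labels[doc]:
            vec[label] += mu
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse label vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def cos_quality(doc_labels, membership):
    """Mean cosine between each member document and its cluster (assumed statistic)."""
    cvec = cluster_label_vector(doc_labels, membership)
    sims = [cosine(Counter(doc_labels[d]), cvec) for d in membership]
    return sum(sims) / len(sims)


# Documents labeled as in the slide's example: AB, AB, AC, AC, D.
doc_labels = {
    "d1": ("A", "B"), "d2": ("A", "B"),
    "d3": ("A", "C"), "d4": ("A", "C"),
    "d5": ("D",),
}
pure = {"d1": 1.0, "d2": 1.0}                          # label vector 2A2B
mixed = {"d1": 1.0, "d2": 1.0, "d3": 1.0, "d4": 1.0}   # label vector 4A2B2C
print(cos_quality(doc_labels, pure), cos_quality(doc_labels, mixed))
```

With full memberships of 1.0 the cluster label vectors reproduce the counts shown in the figure. The pure AB cluster scores close to 1, while the mixed cluster scores lower because its label vector no longer points in the same direction as any single document.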

Page 18 Evaluation Results COS-Quality for document parents

Page 19 Evaluation Results Fuzziness per document

Page 20 Conclusion

Agglomerative Fuzzy Cluster Algorithm
- Agglomerative crisp cluster algorithm generalizable to fuzzy logic
- Deterministic hierarchical fuzzy cluster algorithm
- Globally optimal clusters based on similarity function
- Topic groups per level shrinkage possible (crisp & fuzzy)

Quality Results
- Fuzziness does not highly influence cluster quality
- 25% - 50% of randomly chosen textual data affected by multi-assignment

Page 21 Discussion