A Pretopological Approach for Clustering

T.V. Le and M. Lamure
University Claude Bernard Lyon 1
e-mail: than-van.le@univ-lyon1.fr, michel.lamure@univ-lyon1.fr

Abstract: The aim of this work is to define a clustering algorithm based on pretopological results about minimal closed subsets. On its own, the minimal closed subsets algorithm generally does not yield a clustering of the population under study. We therefore propose a new clustering method built on this algorithm. The method has two steps: the first structures the population with the minimal closed subsets method, and the second builds a clustering of the population from the structure obtained. The method is tested on data from the CRAM of Lyon and from the emergency service of Hôpital Edouard Herriot in Lyon.

Keywords: binary relation, pretopology, minimal closed subset, germ, structuration, classification.

1. Introduction

Pretopology is a mathematical tool for modelling, analysis and construction in a wide range of fields: social sciences, game theory, graphs, networks, preferences, and the mathematisation of discrete spaces. It provides powerful tools for structure analysis and automatic classification (Hubert Emptoz 1983, Nicoloyannis N. 1988). It makes it possible to follow the development of processes such as dilation, alliance, adherence, closed subsets and acceptability (Belmandt Z. 1993, Lamure M. 1987, Duru G. 1980, Auray J. P. 1983).

In data analysis, pretopology provides a structural method based on the concepts of adherence and minimal closed subsets (Belmandt Z. 1993, Bonnevay S. et al. 1999, 2000, Largeron C. et al. 1997). Given a finite set $E$, the adherence $a(\cdot)$ defined on its subsets can express extension phenomena. Contrary to what occurs in topology, $a(\cdot)$ is not always a closure, but applying it repeatedly produces closed subsets that characterize homogeneous or interdependent parts of $E$. This process is the main mechanism of the structural method, which is detailed in Section 2.

However, this method does not produce a partition of a given set, because the resulting structure contains nested groups. We thus propose a new method of automatic classification based on the minimal closed subsets approach and on the idea of germs from the k-medoids method (P. Berkhin 2002, Raymond T. Ng and Jiawei Han 2002). Our approach has two steps: producing the minimal closed subsets, and then separating the nested groups obtained in the first step into distinct groups. Its advantages are discussed in Section 3.

We gratefully thank CRAM of Lyon and Professor Robert (head of the emergency service) for giving us access to their data.

2. Pretopology: Basic Concepts

2.1. Pseudoclosure

Definition 1. A mapping $a(\cdot)$ from $\mathcal{P}(E)$ into $\mathcal{P}(E)$ is called a pseudoclosure iff

$a(\emptyset) = \emptyset$ (1)
$\forall A \in \mathcal{P}(E),\ a(A) \supseteq A$ (2)

By duality, we can define the interior mapping as follows.

Definition 2. Given a pseudoclosure $a(\cdot)$ on $E$, the interior mapping $i(\cdot)$ is defined by

$\forall A \in \mathcal{P}(E),\ i(A) = [a(A^c)]^c$ (3)

where $A^c$ denotes the subset $E \setminus A$. Then $(E, i, a)$ is called a pretopological space.

Depending on the properties of $a(\cdot)$ (and $i(\cdot)$), we obtain more or less complex pretopological spaces, from the most general spaces to topological spaces. Pretopological spaces of type V are the most interesting case. In that case, $a(\cdot)$ fulfils the following property:

$\forall A \in \mathcal{P}(E),\ \forall B \in \mathcal{P}(E),\ A \subseteq B \Rightarrow a(A) \subseteq a(B)$ (4)

Definition 3. Let $(E, i, a)$ be a pretopological space. A subset $A$ of $E$ is called a closed subset iff $a(A) = A$.

Definition 4. Let $(E, i, a)$ be a pretopological space. A subset $A$ of $E$ is called an open subset iff $i(A) = A$.

Definition 5. Let $(E, i, a)$ be a pretopological space. For any subset $A$ of $E$, consider the family of all closed subsets of $E$ that contain $A$. If this family has a smallest element with respect to inclusion, that element is called the closure of $A$ and is denoted $F(A)$.

2.2. Pretopology and binary relationships

Suppose we have a family $\{R_i\}_{i=1..n}$ of binary relationships (quantitative or qualitative) on a set $E$. In this section, we show how to define a pretopological space of type V from the family $\{R_i\}$. For any $x \in E$, let us consider:

$B_i(x) = \{y \in E \mid x R_i y\} \cup \{x\}$ (5)

The family $\mathcal{V}(x)$ of neighbourhoods of $x$ is defined by:

$\mathcal{V}(x) = \{V \in \mathcal{P}(E) \mid \exists i,\ B_i(x) \subseteq V\}$ (6)

We can prove that $\{\mathcal{V}(x) \mid x \in E\}$ is a prefilter of subsets of $E$, which means:

$\forall x \in E,\ \forall V \in \mathcal{V}(x),\ V \neq \emptyset$ (7)
$\forall x \in E,\ \forall V \in \mathcal{V}(x),\ \forall W,\ W \supseteq V \Rightarrow W \in \mathcal{V}(x)$ (8)

Then, from the family $\mathcal{V}(x)$, the pseudoclosure $a(\cdot)$ is defined by:

$\forall A \in \mathcal{P}(E),\ a(A) = \{x \in E \mid \forall V \in \mathcal{V}(x),\ V \cap A \neq \emptyset\}$ (9)

or equivalently:

$\forall A \in \mathcal{P}(E),\ a(A) = \{x \in E \mid \forall i,\ B_i(x) \cap A \neq \emptyset\}$ (10)

Proposition 1. Given a family $\{R_i\}_{i=1..n}$ of binary relationships on a finite set $E$, the pretopological space $(E, i, a)$ defined by the pseudoclosure $a(\cdot)$ above is of type V.

The reason for using spaces of type V is that we can build them from a family of reflexive binary relations on the finite set $E$. This makes it possible to take into account various points of view (various relations), expressed in a qualitative way, when determining the pretopological structure placed on $E$. The space of type V is the starting point for the definition of a classification of $E$.

2.3. Elementary closed subsets, minimal closed subsets

Recall that in a pretopological space $(E, i, a)$, a subset $K$ of $E$ is a closed subset of $E$ if and only if $a(K) = K$, and that the smallest closed subset containing $A$ is the closure of $A$. We get the following result:

Proposition 2. In any pretopological space of type V, given a subset $A$ of $E$, the closure of $A$ always exists.

We denote by $F_e$ the family of elementary closed subsets, that is, the set of closures of the singletons $\{x\}$ of $\mathcal{P}(E)$. So in a pretopological space of type V, we get:
- $\forall x \in E$, $F_x$: closure of $\{x\}$
- $F_e = \{F_x \mid x \in E\}$

Definition 6. $F$ is called a minimal closed subset if and only if $F$ is a minimal element of $F_e$ for inclusion.

To determine how $E$ is structured by the pseudoclosure $a(\cdot)$, we use the concept of minimal closed subsets according to the following algorithm. Given the pseudoclosure $a(\cdot)$, the following function computes $F_e$:

    F_e = ∅;
    for all x ∈ E do {
        F_x = a({x});
        While (a(F_x) ≠ F_x) F_x = a(F_x);
        If (F_x ∉ F_e) F_e = F_e ∪ {F_x};
    }
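
The construction of the pseudoclosure from a family of relations and the computation of the elementary closed subsets can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the dictionary encoding of a relation, and the toy data are assumptions, and equation (10) is read here as requiring every $B_i(x)$ to meet $A$.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set

# Assumed encoding: a relation maps each x to the set {y : x R y}.
Relation = Dict[Hashable, Set[Hashable]]


def pseudoclosure(A: Iterable[Hashable], E: Set[Hashable],
                  relations: List[Relation]) -> FrozenSet[Hashable]:
    """a(A): the points x whose basic neighbourhoods B_i(x) all meet A."""
    if not relations:
        raise ValueError("at least one relation is required")
    target = frozenset(A)
    return frozenset(
        x for x in E
        # B_i(x) is R_i(x) with x itself added, as in equation (5).
        if all((R.get(x, set()) | {x}) & target for R in relations)
    )


def closure(A: Iterable[Hashable], E: Set[Hashable],
            relations: List[Relation]) -> FrozenSet[Hashable]:
    """Iterate a(.) until it stabilises; E is finite and A is in a(A)."""
    F = pseudoclosure(A, E, relations)
    while True:
        bigger = pseudoclosure(F, E, relations)
        if bigger == F:
            return F
        F = bigger


def elementary_closed_subsets(E: Set[Hashable], relations: List[Relation]):
    """Map each x to F_x, the closure of the singleton {x}."""
    return {x: closure({x}, E, relations) for x in E}


# Hypothetical toy data: the chain 1 R 2 and 2 R 3.
E = {1, 2, 3}
R1 = {1: {2}, 2: {3}, 3: set()}
F_e = elementary_closed_subsets(E, [R1])
print({x: sorted(F) for x, F in sorted(F_e.items())})
```

In this toy chain, the closure of {3} absorbs every point that can reach 3, which is the extension behaviour described for the adherence.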

We can then determine the minimal closed subsets with the following function, noting that we only need to extract them from $F_e$ (Bonnevay S. 1999, 2000):

    F_m = ∅;
    While (F_e ≠ ∅) {
        Choose F ∈ F_e;
        F_e = F_e − {F};
        minimal = true;
        K = F_e;
        While ((K ≠ ∅) && (minimal)) {
            Choose G ∈ K;
            If (G ⊂ F) minimal = false;
            Else if (F ⊂ G) F_e = F_e − {G};
            K = K − {G};
        }
        If ((minimal == true) && (F ∉ F_m)) F_m = F_m ∪ {F};
    }

Figure 1: Example of structuration method.

Example: To illustrate the method, we present the following example. Let $E = \{1, \ldots, 16\}$, $n = 16$. Given $x = (x_1, x_2)$ and $y = (y_1, y_2)$, we put $x R y$ iff $y_2 - x_2 \geq 0$ and $d(x, y) \leq \varepsilon$, for a given $\varepsilon$ (see Table 1). Here:

$R(x) = \{y \in E \mid y_2 - x_2 \geq 0,\ d(x, y) \leq 2\}$

Table 1: Example data, with $a(x) = \{y \in E \mid R(\{y\}) \cap \{x\} \neq \emptyset\}$. Closed subsets marked with a star (*) are minimal.

    x    x_1  x_2  R(x)         a(x)         F_x                       |a(x)|
    1    1    1    1,2,3        1            1*                        1
    2    1    2    2,3          1,2,3        1,2,3                     3
    3    2    2    2,3          1,2,3        1,2,3                     3
    4    3    4    4,5,7        4,5          4,5*                      2
    5    4    4    4,5,6        4,5          4,5*                      2
    6    5    5    6            5,6          4,5,6                     2
    7    3    6    7            4,7          4,5,7                     2
    8    4    1    8,9          8,9          8,9*                      2
    9    6    1    8,9,10,11    8,9          8,9*                      2
    10   7    2    10,11,12     9,10         8,9,10                    2
    11   6    3    11,12        9,10,11      8,9,10,11                 3
    12   7    4    12,13,15     10,11,12,13  8,9,10,11,12,13,14        4
    13   9    4    12,13        12,13,14     8,9,10,11,12,13,14        3
    14   9    3    13,14        14           14*                       1
    15   7    6    15,16        12,15,16     8,9,10,11,12,13,14,15,16  3
    16   8    6    15,16        15,16        8,9,10,11,12,13,14,15,16  2

Using the structural method based on the minimal closed subset concept, we get the following result:

Figure 2: Structuring process.

The final structure shown in Figure 2 is obtained as follows. In the first step, we get the minimal closed subsets {{1}, {4, 5}, {8, 9}, {14}} (the darkest grey areas in the figure). Next, we get the smallest elementary closed subsets that contain the minimal ones: {{1, 2, 3}, {4, 5, 6}, {4, 5, 7}, {8, 9, 10}}. And so on: {8, 9, 10, 11}, {8, 9, 10, 11, 12, 13, 14}, {8, 9, 10, 11, 12, 13, 14, 15, 16}.

In the table and figure above, the sets marked with a star (*) are the minimal elements, which form homogeneous groups of the population. They cannot transfer the poison to the other elements, but they are influenced by the elements of the group that contains them.

The advantage of this method is that it helps us analyze the connections between the elements of a discrete space. However, it only provides a clustering of $E$ when the relationship between the elements of $E$ is symmetric. In many practical situations this is not the case, so we propose using the minimal closed subsets algorithm as a pre-treatment for classical clustering methods, in particular for the K-medoids method. Two cases can occur at the end of the minimal closed subsets algorithm:
- $F_m$ provides a partition of $E$. The clustering is obtained.
- $F_m$ does not provide a partition of $E$. In this case, we must perform the second step in order to build a clustering based on the result of the first stage.
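
The structuring step can be checked on the example with a short Python sketch. It is an illustration under stated assumptions rather than the authors' code: the sets R(x) are copied from Table 1, the helper names are hypothetical, and the minimal closed subsets are extracted with a direct filter (keep a set when no other elementary closed subset is strictly included in it) instead of the pairwise elimination loop given in the text.

```python
# R(x) copied from Table 1; each R(x) already contains x.
R = {
    1: {1, 2, 3}, 2: {2, 3}, 3: {2, 3}, 4: {4, 5, 7}, 5: {4, 5, 6},
    6: {6}, 7: {7}, 8: {8, 9}, 9: {8, 9, 10, 11}, 10: {10, 11, 12},
    11: {11, 12}, 12: {12, 13, 15}, 13: {12, 13}, 14: {13, 14},
    15: {15, 16}, 16: {15, 16},
}
E = frozenset(R)


def a(A):
    """Pseudoclosure for a single relation: all y whose R(y) meets A."""
    A = frozenset(A)
    return frozenset(y for y in E if R[y] & A)


def closure(A):
    """Apply a(.) repeatedly until a closed subset is reached."""
    F = a(A)
    while a(F) != F:
        F = a(F)
    return F


def minimal_closed_subsets(family):
    """Keep the members that contain no other member as a proper subset."""
    family = set(family)
    return {F for F in family if not any(G < F for G in family)}


def is_partition(family, universe):
    """True when the members are pairwise disjoint and cover the universe."""
    members = [x for F in family for x in F]
    return len(members) == len(set(members)) and set(members) == set(universe)


F_e = {closure({x}) for x in E}          # elementary closed subsets
F_m = minimal_closed_subsets(F_e)        # minimal closed subsets

print(sorted(sorted(F) for F in F_m))    # [[1], [4, 5], [8, 9], [14]]
print(is_partition(F_m, E))              # False
```

The second printed value corresponds to the second case listed above: the minimal closed subsets cover only part of E, so a further assignment step is needed.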

3. Our pretopological clustering method

As previously said, the minimal closed subsets algorithm generally does not provide a partition of the whole set $E$. However, in its first step, it provides the minimal closed subsets, which play an important role in the structuring process. This gives us the idea of using these minimal closed subsets as germs for the K-medoids method. The problem is that K-medoids methods use singletons as germs. As minimal closed subsets are generally not singletons (see the previous example), the first thing we have to do is to select one and only one element in each minimal closed subset when needed.

3.1. Determining germs

Let us recall some notations:
- $F_m$: family of minimal closed subsets.
- $F_e = \{F_x \mid x \in E\}$: family of elementary closed subsets.
- $a(\{x\}) = \{y \in E \mid R(\{y\}) \cap \{x\} \neq \emptyset\}$

Two situations can occur:
- $F(\{x\}) = F_m(\{x\}) = \{x\}$: $x$ is the germ of the class.
- $F(\{x\}) = F_m(\{x\}) = \{x_1, x_2, \ldots, x_p\}$: compute $|a(\{x_i\})|$ for $i = 1, \ldots, p$, and select $x_o$ such that $|a(\{x_o\})| = \max_{i=1,\ldots,p} |a(\{x_i\})|$.

When several such $x_o$ exist, the germ is either chosen at random among them or selected by computing a dispersion measure $\tau_x$ associated with each possible germ $x$.

Case 1: The data are quantitative and we can define a metric $d$ on the set $E$. For any subset $A$ of $E$ and any $x \in A$, we compute

$\tau_x = \sum_{y \in A,\ y \neq x} d(x, y)$

and we select as germ the element $x_o$ such that $\tau_{x_o} = \min(\{\tau_x,\ x \in A\})$.

Case 2: The data are qualitative and any element $x$ can be represented by a binary string, by means of a completely disjunctive table. We compute

$\tau_{xy} = \dfrac{\mathrm{BitOne}(x \,\&\&\, y)}{\mathrm{BitOne}(x) + \mathrm{BitOne}(y)}$

where $\mathrm{BitOne}(x)$ returns the number of bits equal to 1 in the binary string that represents $x$. Then we select as germ the element $x_o$ such that $\tau_{x_o} = \max(\{\tau_{xy}\},\ y \in A,\ y \neq x)$.

Once a germ $e$ has been found for each class that influences $x$, we assign $x$ to the class for which the dispersion measure between $x$ and $e$ is the most favourable, following the assignment approach.

3.2. The assignment approach

Initialization:
- $G = F_m = \{G_j\}$, $F_{em} = F_e - F_m = \{F^i_{em}\}$, $j = 1, \ldots, |F_m|$, $i = 1, \ldots, |F_{em}|$;
- Sort $F_{em}$ in ascending order with respect to inclusion;
- $i = 1$;

1. Compute $K = \{G_j \mid G_j \cap F^i_{em} \neq \emptyset\}$ and $H = F^i_{em} - \bigcup_{G_j \in K} G_j$.
2. If $|K| = 1$, put $G_j = G_j \cup H$ and go to step 4; else go to step 3.
3. $\forall x \in H$, calculate $e(G_j)$ and compute $\tau(x, e(G_j))$ for each $G_j \in K$, then assign $x$ to the group $G_k$ such that:
   $\tau(x, e(G_k)) = \min(\{\tau(x, e(G_j))\})$ in Case 1,
   $\tau(x, e(G_k)) = \max(\{\tau(x, e(G_j))\})$ in Case 2,
   where $e(G_j)$ is the germ determined in $G_j$.
4. $F_{em} = F_{em} - F^i_{em}$. If $F_{em} = \emptyset$, stop; else $i = i + 1$ and go to step 1.

At the end, $G$ is a partition of $E$.

To give a better understanding of this algorithm, let us return to the previous example.

Initialization:
- $G = F_m$ = {{1}, {14}, {4, 5}, {8, 9}}
- $F_{em}$ = {{1, 2, 3}, {4, 5, 6}, {4, 5, 7}, {8, 9, 10}, {8, 9, 10, 11}, {8, 9, 10, 11, 12, 13, 14}, {8, 9, 10, 11, 12, 13, 14, 15, 16}}

1. $F^1_{em}$ = {1, 2, 3}, K = {1}, H = {2, 3}. Assign all elements of H to $G_1$: $G_1$ = {1, 2, 3}, $F_{em} = F_{em} - F^1_{em}$.
2. $F^1_{em}$ = {4, 5, 6}, K = {4, 5}, H = {6}. Assign all elements of H to $G_3$: $G_3$ = {4, 5, 6}, $F_{em} = F_{em} - F^1_{em}$.
3. ...
4. $F^1_{em}$ = {8, 9, 10, 11}, K = {8, 9, 10}, H = {11}. Assign all elements of H to $G_4$: $G_4$ = {8, 9, 10, 11}, $F_{em} = F_{em} - F^1_{em}$.
5. $F^1_{em}$ = {8, 9, 10, 11, 12, 13, 14}, K = {{14}, {8, 9, 10, 11}}.
   $\tau(12, 11) < \tau(12, 14) \Rightarrow$ assign {12} to $G_4$.
   $\tau(13, 14) < \tau(13, 11) \Rightarrow$ assign {13} to $G_2$.
   $G_2$ = {13, 14}, $G_4$ = {8, 9, 10, 11, 12}, $e(G_2)$ = {13}, $e(G_4)$ = {12}.

Result: G = {{1, 2, 3}, {4, 5, 6, 7}, {8, 9, 10, 11, 12, 15, 16}, {13, 14}} (see Figure 2).

Figure 3: Clustering process.

What are the advantages of this method?
- First, it provides a clustering of the population.
- Second, the number of classes (the number of minimal closed subsets) is computed by the method, whereas it must be chosen by the user in other methods such as the k-medoids method.
- Last, the germ of a class is easy to extract from the minimal closed subsets.
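
The germ selection and the assignment steps can be sketched in Python on the data of Table 1 (Case 1, quantitative data). This is a hedged illustration, not the authors' implementation. Several choices are assumptions of the sketch: the metric is taken to be Euclidean, K is read as the set of groups that meet the current elementary closed subset, germs are recomputed once per step, ties between germ candidates go to the smallest dispersion and then the smallest label, and ties between groups go to the first candidate group. All function names are hypothetical.

```python
from math import dist

# Coordinates (x_1, x_2) and R(x) copied from Table 1.
POINTS = {1: (1, 1), 2: (1, 2), 3: (2, 2), 4: (3, 4), 5: (4, 4), 6: (5, 5),
          7: (3, 6), 8: (4, 1), 9: (6, 1), 10: (7, 2), 11: (6, 3),
          12: (7, 4), 13: (9, 4), 14: (9, 3), 15: (7, 6), 16: (8, 6)}
R = {1: {1, 2, 3}, 2: {2, 3}, 3: {2, 3}, 4: {4, 5, 7}, 5: {4, 5, 6},
     6: {6}, 7: {7}, 8: {8, 9}, 9: {8, 9, 10, 11}, 10: {10, 11, 12},
     11: {11, 12}, 12: {12, 13, 15}, 13: {12, 13}, 14: {13, 14},
     15: {15, 16}, 16: {15, 16}}
E = frozenset(R)


def a(A):
    """Pseudoclosure: all y whose R(y) meets A."""
    A = frozenset(A)
    return frozenset(y for y in E if R[y] & A)


def closure(A):
    F = a(A)
    while a(F) != F:
        F = a(F)
    return F


def germ(G):
    """Element of G maximising |a({x})|; tie-breaks are assumptions."""
    def key(x):
        spread = sum(dist(POINTS[x], POINTS[y]) for y in G if y != x)
        return (-len(a({x})), spread, x)
    return min(G, key=key)


def cluster():
    F_e = {closure({x}) for x in E}
    F_m = {F for F in F_e if not any(H < F for H in F_e)}
    groups = [set(F) for F in sorted(F_m, key=min)]
    # Sorting by size is one ordering compatible with inclusion.
    F_em = sorted(F_e - F_m, key=lambda F: (len(F), sorted(F)))
    for F in F_em:
        K = [G for G in groups if G & F]        # groups meeting F
        H = F - frozenset().union(*K)           # elements still unassigned
        if len(K) == 1:
            K[0] |= H                           # step 2: a single candidate
            continue
        germs = [germ(G) for G in K]            # step 3: one germ per group
        for x in sorted(H):
            j = min(range(len(K)),
                    key=lambda j: dist(POINTS[x], POINTS[germs[j]]))
            K[j].add(x)
    return groups


result = sorted(sorted(G) for G in cluster())
print(result)
```

Under these assumptions the sketch returns the four groups given in the worked example. Element 16 is equidistant from the germs 12 and 13 under the assumed Euclidean metric, so its group is decided only by the tie-break rule chosen here.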

4. Conclusion

This article presents a new clustering method based on the concept of minimal closed subsets in pretopology. Pretopology helps us analyze the structure of a finite set in a discrete space, but it generally does not provide a clustering. This restriction is overcome by the second stage of the method, which defines a concept of germ from which a clustering process is built. The new method opens possibilities for decision making in the social sciences, where data are often complex and the population cannot always be represented by a metric space. One typical application of this kind of method is the analysis of data from medico-economic databases such as DRGs.

References

Abdul-Amier Hashom (1982) Plus proches voisins et classification automatique. Applications à des données industrielles, Thèse Th. Doct. 3e cycle Mathématiques des systèmes: INSA Lyon.
Auray J. P. (1983) Contribution à l'analyse des structures pauvres, Thèse d'Etat, Université Lyon 1.
Belmandt Z. (1993) Manuel de prétopologie et ses applications, Edition Hermès.
Bonnevay S., Lamure M., Largeron-Leteno C., Nicoloyannis N. (1999) A pretopological approach for structuring data in non metric space, in: Electronic Notes in Discrete Mathematics, Melvin F. Janowitz, Elsevier Science Publishers, 2.
Bonnevay S., Largeron C. (2000) Data analysis based on minimal closed subsets, in: Data Analysis, Classification and Related Methods, Kiers et al. editors, Springer, 303-308.
Duru G. (1980) Contribution à l'étude des structures des systèmes complexes dans les sciences humaines, Thèse d'Etat, Université Lyon 1.
Hubert Emptoz (1983) Modèle prétopologique pour la reconnaissance des formes. Applications en neurophysiologie, Thèse: Th. Sc. Univ. Cl. Bernard, Lyon I.
Lamure M. (1987) Contribution à l'analyse des espaces abstraits - application aux images digitales, Thèse d'Etat, Université Lyon 1.
Largeron C., Bonnevay S. (1997) Une méthode de structuration par recherche de fermés minimaux. Application à la modélisation de flux de migrations inter-villes, in: Société Francophone de Classification 97, 111-118.
Nicoloyannis N. (1988) Structure prétopologiques et classification automatique: le logiciel DEMON, Thèse Lyon.
P. Berkhin (2002) Survey of clustering data mining techniques.
Raymond T. Ng and Jiawei Han (2002) CLARANS: A Method for Clustering Objects for Spatial Data Mining, in: IEEE Transactions on Knowledge and Data Engineering, 14, 5.