22 Hierarchical Clustering of Composite Objects with a Variable Number of Components

A. Ketterlin, P. Gancarski and J.J. Korczak
LSIIT, URA 1871, CNRS
7, rue Rene Descartes, Universite Louis Pasteur, F Strasbourg Cedex, France

ABSTRACT This paper examines the problem of clustering a sequence of objects that cannot be described with a predefined list of attributes (or variables). In many applications, a fixed list of attributes cannot be determined without substantial pre-processing. An extension of the traditional propositional formalism is thus proposed, which allows objects to be represented as a set of components, i.e. there is no mapping between attributes and values. The algorithm used for clustering is briefly illustrated, and mechanisms to handle sets are described. Some empirical evaluations are also provided to assess the validity of the approach.

22.1 Introduction

The basic process of clustering consists of finding coherent groupings of objects in a data set. Each grouping must exhibit sufficient internal cohesiveness, while maintaining enough distance to other members of the partition. This definition of clustering underlies many of the techniques and algorithms devoted to the automated discovery of clusters [KR93]. The area of statistics concerned with this problem, namely the field of cluster analysis, essentially deals with numeric and/or categorical data, where objects are represented as feature vectors [DLPT82]. This representation is equivalent, in essence, to propositional logic (or an attribute-value formalism). Derived clusters are usually represented as centroids in the instance space. Centroids are used for the classification of new data, with the help of a distance measure or some implicit probability distribution.
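The centroid-plus-distance scheme described above can be sketched as follows; the data, the two-cluster setup and all names are invented for illustration:

```python
import numpy as np

# Hypothetical feature vectors: each object is a fixed-length attribute list.
objects = np.array([[1.0, 2.0], [1.2, 1.9], [8.0, 8.5], [7.8, 8.2]])
labels = np.array([0, 0, 1, 1])  # two clusters

# A centroid summarizes each cluster in the instance space.
centroids = np.array([objects[labels == k].mean(axis=0) for k in (0, 1)])

def classify(x):
    """Assign a new object to the cluster with the nearest centroid."""
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

print(classify(np.array([7.9, 8.0])))  # nearest to the second centroid
```

This is the purely propositional baseline that the rest of the paper generalizes away from.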
On the other hand, machine learning research on conceptual clustering [MS83], conducted during the last decade, focuses on the representation of instances and concepts; the goal is to derive some effective knowledge, in the form of generalizations (or concepts), about the domain under study. Concept formation is defined as incremental conceptual clustering [FPL91]: incrementality is the ability to maintain a conceptual structure that is updated after each observation. Many machine learning clustering algorithms still use propositional logic to describe the data, but recent work enriches the language used to describe concepts discovered through clustering. Some systems use an extended form of propositional logic, where concepts are defined by logical formulae [MS83]. Others use probabilistic concepts [Fis87], where a concept is defined by a probability distribution on each dimension. Other work extends clustering techniques to higher-level languages: KBG [Bis92] deals with first-order logic, and Kluster [KM94] uses a Kl-one-like language that mitigates computational complexity while retaining considerable representational power. However, there are domains where numeric data must be considered, and in such domains logic-based formalisms are difficult to apply. In addition to our concern with numeric data, this paper describes an extension of attribute-value formalisms that allows objects to be represented as somewhat structured: in this framework an object is built up from any number of components. Earlier work on related extensions to propositional formalisms has been termed structured concept formation (see [TL91]), since objects exhibit several levels of structural detail. Section 22.2 briefly reviews a simple clustering algorithm that we extend to cluster over objects represented as sets. This extension is described in Section 22.3. Section 22.4 gives three examples of our clustering algorithm in action.

[1] Learning from Data: AI and Statistics V. Edited by D. Fisher and H.-J. Lenz. (c) 1996 Springer-Verlag.

22.2 Concept Formation

The underlying concept formation algorithm used throughout this paper is Cobweb.[2] This algorithm forms a hierarchy of clusters from a sequence of objects. Each object is an attribute-value list. Derived clusters (or concepts) are represented in a probabilistic form. Attributes are typed: the algorithm is able to handle nominal variables as well as numeric ones. In the case of numeric attributes, an underlying normal distribution is assumed, whose parameters are estimated. Therefore, for each numeric attribute, a cluster stores the estimated mean and variance with respect to the covered objects.
The variances help define a global predictivity score for one cluster, named π, which is an average of the inverses of the standard deviations over all attributes. Each object is incorporated into the cluster hierarchy in turn. When incorporating a new object, a search is performed by hill-climbing through a space of cluster hierarchies. The object is sorted down through the current hierarchy. At each level several operators are tentatively applied to the current partition, some of them having restructuring properties. The best of these operators is definitely applied, and the process restarts one level deeper (see [Fis87, GLF89] for a complete description of the algorithm). When several distinct partitions are generated, a heuristic called category utility is used to select between them. Category utility evaluates the global quality of a single partition. This evaluation is based on the individual predictivity of each cluster involved. The partition (C, {C_1, ..., C_K}) is evaluated by:

    (1/K) Σ_{k=1}^{K} P(C_k) [π(C_k) − π(C)]

where each π(C_k) measures the individual predictivity of C_k. This expression is an average of the predictivity gain obtained when stepping from the cluster C to one of its sub-clusters.

[2] In this paper, we consider Cobweb to be the algorithm described in [Fis87], with the extension to deal with numeric attributes developed in Classit (see [GLF89]), but without any other extension from Classit.
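Given the per-cluster probabilities P(C_k) and predictivity scores, category utility is a short computation. A minimal sketch (the function and its input encoding are ours, not the paper's code):

```python
def category_utility(parent_pi, children):
    """Average predictivity gain over a partition.

    children: list of (p_k, pi_k) pairs, where p_k = P(C_k) is the
    fraction of the parent's objects falling in sub-cluster C_k and
    pi_k its individual predictivity.
    """
    K = len(children)
    return sum(p * (pi - parent_pi) for p, pi in children) / K

# A partition whose sub-clusters are more predictive than the parent
# scores higher, and is preferred by the hill-climbing search.
flat = category_utility(0.5, [(1.0, 0.5)])             # no gain
split = category_utility(0.5, [(0.5, 0.9), (0.5, 0.9)])
print(flat, split)
```

The hill-climbing search simply evaluates this score for each candidate partition produced by the operators and keeps the best.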

The individual predictivity of a cluster is defined as:

    π(C_k) = (1/I) Σ_{i=1}^{I} π(A_i, C_k)

where π(A_i, C_k) quantifies how predictive the attribute A_i is in C_k (i.e. how precisely values of A_i can be predicted for members of C_k). In this framework, a learning problem is defined on a set of attributes. Observations must be represented by a value for all (or some) of the attributes. Concepts (or clusters; both terms are used interchangeably in this context) must represent some distribution of values for each attribute of the learning problem. Attributes may be of different types. The definition of a type of attribute must thus include a value space, a way to represent a distribution of such values, and a predictivity measure for such a distribution. As originally described in [Fis87], Cobweb provides the definitions for nominal (categorical) attributes. Classit [GLF89] provides the definitions for numeric (continuous) attributes. The Labyrinth system [TL91] further extends Cobweb to deal with `composite' objects by allowing `structured' attributes. The next section introduces a new type of attribute, called a `set' attribute, defines the representation of values and distributions for this attribute type, and gives a predictivity evaluation measure for distributions over the values of a set attribute.

22.3 Clustering Sets

Representation of Objects

The aim of this paper is to describe a conceptual clustering system that allows objects to be represented as sets of components (i.e. sub-objects). The representation formalism is inspired by propositional logic. In that framework each object is a list of attribute-value pairs. Our system allows a value to be a set of objects (instead of being a single numeric or categorical quantity); all objects of a set are described with the same set of attributes.
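For numeric attributes the per-attribute predictivity is the inverse of the estimated standard deviation, and the cluster score averages it over attributes. A minimal sketch (function names are ours; Classit additionally bounds the standard deviation from below by an `acuity' parameter to keep 1/σ finite, which is not modelled here):

```python
import statistics

def attribute_predictivity(values):
    """1/sigma for one numeric attribute: tighter distributions predict better."""
    sigma = statistics.pstdev(values)
    return 1.0 / sigma if sigma > 0 else float("inf")

def cluster_predictivity(columns):
    """pi(C_k): average attribute predictivity over the I attributes."""
    return sum(attribute_predictivity(col) for col in columns) / len(columns)

# Two numeric attributes observed over the members of one cluster:
print(cluster_predictivity([[4.0, 6.0], [9.0, 11.0]]))
```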
Members of such a set value are less abstract objects, each of which describes some aspect of the more global (abstract) object. To illustrate this representation scheme, consider an example domain of quadruped mammals found in the literature: each object consists of several sub-objects (cylinders), each of which is described along several identical numeric attributes. An object of that domain may be written as:

    Dog1 = [ cyls={[r=13.9,l=25.1], [r=6.5,l=10.0], [r=4.7,l=6.3]} ]

In this case the object Dog1 is described by only one attribute, named cyls, whose value is a set containing three sub-objects. Each of these sub-objects is described with the same two numeric attributes (r and l).[3]

[3] In this paper only single-attribute domains will be considered. The mechanisms described here apply to one attribute whose values are sets of objects, but, with no loss of generality, the method can be extended to domains described by several attributes, each of them having sets as values.
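In a programming language, such a set-valued object might be encoded as follows (a hypothetical encoding, not the paper's implementation):

```python
# The Dog1 example: one 'set' attribute, cyls, whose value is a set of
# components, each an ordinary attribute-value list.
dog1 = {
    "cyls": [
        {"r": 13.9, "l": 25.1},
        {"r": 6.5,  "l": 10.0},
        {"r": 4.7,  "l": 6.3},
    ]
}

# All components share the same attribute names, but different objects may
# carry any number of components: there is no fixed attribute-to-value mapping.
print(len(dog1["cyls"]))
```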

Clustering Strategy

Section 22.2 briefly explained how to build a cluster hierarchy. We noted that this problem can be cast as a problem of quantifying how predictive a cluster is. In the case of the domain described above, the question is: "How can one quantify the predictivity of a set of animals?" The answer lies in the fact that a separate cluster hierarchy can be built for each level of abstraction. In the animal domain this means that two cluster hierarchies are maintained: one for the cylinders (the component-cluster hierarchy) and one for the animals (the composite-cluster hierarchy). The results of incorporating the cylinders forming an animal are used during the incorporation of the animal. That is, the same process is repeated at each level of structural abstraction. This is a component-first strategy, where parts are clustered before the whole [TL91]. The algorithm may be summarized as follows:

1. for each component:
   1a. incorporate the component into the component-cluster hierarchy
   1b. update the description of the composite (replace the component by a reference to the cluster it reached)
2. incorporate the composite object into the composite-cluster hierarchy

Each `incorporate' operation is a call to Cobweb. The first call operates at a lower level (i.e. in the component space) than the last one, which operates in the composite space. It follows from this sketch that the integration of a composite object (e.g. an animal) is done with each component (e.g. a cylinder) replaced by a reference to a component-cluster. What is thus needed is a way to quantify the predictivity of a set of sets of component-cluster references. This problem divides itself into two distinct phases, explained in the next two sections.

Representation of Clusters

The first step is to find an adequate representation for clusters covering objects whose description includes set-valued attributes. Since members of values (i.e.
sub-objects or components) are hierarchically clustered, they can be replaced by their most specific covering cluster: the initial instance space is replaced by a cluster hierarchy (which is a partially ordered space). A conceptual (or intensional) representation of a set of composite objects must rely on a conceptual `vocabulary' for the components. This language is provided by the component-cluster hierarchy: clusters of components will be used to characterize clusters of composite objects. The second step is to select an adequate subset of component clusters. An intuitive way to characterize a set of sets is by way of their mutual intersection (or overlap). However, instead of having simple members, one may apply the same line of reasoning to complex (structured) objects. Generalizing a set of sets means finding a set of clusters describing all the sets of components, i.e. for each set under consideration, there exists a mapping from that set to the generalization. Hierarchically organized clusters of components clearly help to find such a description.
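The component-first strategy of steps 1a, 1b and 2 above can be sketched as follows, with a toy stand-in for the real Cobweb incorporation routine (the real routine sorts the object down a probabilistic hierarchy; here it merely records the object and returns a cluster id):

```python
def cobweb_incorporate(tree, obj):
    """Toy stand-in for Cobweb: record the object, return a cluster reference."""
    tree.append(obj)
    return len(tree) - 1

def incorporate_composite(composite, component_tree, composite_tree):
    """Component-first strategy: cluster the parts, then the whole."""
    references = []
    for component in composite["components"]:
        # 1a. incorporate the component into the component-cluster hierarchy
        ref = cobweb_incorporate(component_tree, component)
        # 1b. replace the component by a reference to the cluster it reached
        references.append(ref)
    # 2. incorporate the composite object into the composite-cluster hierarchy
    return cobweb_incorporate(composite_tree, {"components": references})

comp_tree, compo_tree = [], []
incorporate_composite({"components": [{"r": 13.9}, {"r": 6.5}]},
                      comp_tree, compo_tree)
print(len(comp_tree), len(compo_tree))
```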

The conceptual representation of a set of composite objects will thus be a set of component-clusters: these clusters are called central clusters, and must have the following properties:

1. Each central cluster must cover at least one member of each value. This means that each central cluster must characterize the intersection between all the sets under consideration.
2. All the members of the sets must be covered by central clusters. This means that central clusters must cover all the elements of all the sets, thus avoiding `marginal' intersections between only parts of the sets (which would leave some other components uncovered).
3. Central clusters have to be as specific as possible. This means that the set of central clusters is the most `precise' conceptual characterization of the intersection between all the sets.

It is important to note that in cases where only one composite object is covered, the set of central clusters used to represent the composite cluster is the set of component-clusters covering the components of the object.

Predictivity Evaluation

The second step is to evaluate the predictivity of the set of central clusters. Since each central cluster can be found in the component-cluster hierarchy, it is labeled with its individual predictivity (named π). A straightforward way to compute the predictivity of a set of clusters is to average their individual predictivity scores. This was the approach taken in our implementation, even though other methods of combination could be explored. However, since central clusters do not all cover the same proportion of objects, the contribution of each central cluster is weighted by the proportion of components it covers. The predictivity of a set of L central clusters {c_1, ..., c_L} is thus:

    Σ_{l=1}^{L} (n_o(c_l) / N_o) π(c_l)

where n_o(c_l) is the number of members covered by c_l, and N_o = Σ_{l=1}^{L} n_o(c_l) is the total number of components.
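The coverage-weighted average above is a one-liner; a sketch (the pair encoding of a central cluster is ours):

```python
def set_predictivity(central_clusters):
    """Coverage-weighted predictivity of a set of central clusters.

    central_clusters: list of (n_o, pi) pairs, i.e. the number of
    components a central cluster covers and its individual predictivity.
    """
    total = sum(n for n, _ in central_clusters)
    return sum(n / total * pi for n, pi in central_clusters)

# A broad but weak cluster versus a narrow but sharp one:
print(set_predictivity([(30, 0.2), (10, 1.0)]))
```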
Given the definition of central clusters, n_o(c_l) is guaranteed to be at least equal to the number of covered objects. The weight associated with a central cluster will be called its coverage. The overall predictivity is thus a tradeoff between the predictive ability and the coverage of central clusters.

Incrementality

The original Cobweb algorithm, which worked with a purely propositional formalism, addressed the problem of concept formation, which is defined as incremental conceptual clustering. Incremental ability is crucial in most applications. It is thus important to check that incrementality is preserved when extending the knowledge representation language. The problem may be stated as: "Having a composite-cluster description and a new composite object to incorporate, how can one compute the new description of the composite-cluster?" The process can be divided into two phases:

- generalize any central cluster that does not cover at least one of the components of the new object;
- if any component of the incoming object remains uncovered by a central cluster, generalize the "nearest" central cluster.

This is only a sketch of the procedure; importantly, any generalization must be performed carefully. A generalization is the replacement of one central cluster by its parent in the component-cluster hierarchy. This process is equivalent to the application of a "climb-generalization-tree" operator [Mic83], with the property that the generalization tree is itself built and maintained by the system. Nevertheless, this simple procedure ensures that the definition of central clusters is maintained. The two phases correspond to the first two conditions in the definition of central clusters. It has to be noted that the addition of an object to a composite cluster may only generalize the central clusters describing it. This does not imply that the predictivity of that composite cluster decreases, since a loss in predictivity may be compensated by an increase in coverage.

Complexity

This section investigates the computational cost of the set-clustering process, and focuses particularly on the integration of a new composite object into an existing composite cluster. The problem of structured concept formation has already been addressed by the Labyrinth system [TL91].[4] The difference between Labyrinth and the system described in this paper is that Labyrinth is designed to find a binding between the components and a predefined set of attributes. Once the binding is determined, the task is reduced to composite-object clustering. But this binding process has a high computational cost: O(d!) if an exhaustive search is performed (d being the number of components per object, which is fixed in Labyrinth), and O(d^2) or O(d^3) if heuristic solutions are used (see [TL91]).
This binding is computed each time an object is added to a cluster. Note that this cost may be penalizing when d is high (see the next section for examples of such situations). In contrast, our system solves the mapping by finding a mutual intersection between all of the sets. Hence, the mapping process is replaced by the search for central clusters. Let us consider a composite cluster C covering N objects S_1, ..., S_N, and let n_i = |S_i|. It is easy to see that C will be represented by at most ω = max_i {n_i} central clusters. Hence, the integration of the next composite object into C may lead to at most ω generalizations during the first step of the updating process. The second step may apply only |S_{N+1}| times. The whole updating process is thus linear in the average number of components per composite object. This notable decrease in complexity may be explained by the fact that instead of considering all possible bindings between the components and a predefined set of attributes, the set-clustering mechanism uses the component-cluster hierarchy to solve the partial-matching problem. Moreover, the binding searched for by Labyrinth appears as an epiphenomenon: a global structure emerges by way of the representation of composite clusters in terms of central clusters, rather than by searching for a fixed number of components.

[4] Labyrinth's formalism also includes relational information between components, which is not discussed here.
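The two-phase update of a composite cluster's central clusters can be sketched over a toy component-cluster hierarchy. The inner loop that climbs one parent link at a time is the climb-generalization-tree operator; picking an arbitrary central cluster in phase 2 stands in for the paper's "nearest" choice, which is not modelled here:

```python
# Toy component-cluster hierarchy over component ids.
members = {"root": {1, 2, 3, 4}, "a": {1, 2}, "b": {3, 4},
           "a1": {1}, "a2": {2}}
parent = {"a1": "a", "a2": "a", "a": "root", "b": "root"}

def generalize_until_covering(cluster, comps):
    """Climb parent links until the cluster covers at least one component."""
    while not (members[cluster] & comps):
        cluster = parent[cluster]  # one climb-generalization-tree step
    return cluster

def update_central(central, new_components):
    # Phase 1: every central cluster must cover a component of the new object.
    central = {generalize_until_covering(c, new_components) for c in central}
    # Phase 2: every component of the new object must be covered; keep
    # generalizing central clusters until nothing is left uncovered.
    uncovered = new_components - set().union(*(members[c] for c in central))
    while uncovered:
        c = central.pop()
        central.add(generalize_until_covering(c, uncovered))
        uncovered -= set().union(*(members[c] for c in central))
    return central

print(update_central({"a1"}, {2, 3}))
```

Note how the only possible move is upward: adding an object can only generalize the central clusters, exactly as stated above.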

22.4 Empirical Results

The Simplified Quadruped Mammals Domain

This domain has already been used in the literature on structured concept formation to demonstrate the abilities of the Classit algorithm [GLF89]. It has been simplified here for explanatory purposes. In this domain objects represent a perceptual sketch of an animal: each object is composed of three cylinders (instead of eight in the original domain), representing the head, the torso and one leg of the animal. Each cylinder is described by two numeric attributes (the radius and length), instead of nine in the original domain. This domain illustrates a special case of set clustering: since all the objects have the same number of components (three here), the problem is equivalent to clustering objects with no knowledge of the binding between attributes and values. Since these objects are artificially generated from four available models (cat, dog, horse or giraffe),[5] the goal is to discover these classes from the unlabeled data. In our experiment, 20 objects were randomly generated (five from each model), each one being made of three cylinders. The system constructed a cluster hierarchy of animals: it thus built two hierarchies (one for the cylinders and one for the animals). Results are sketched in Figure 1. Note that each terminal class of Figure 1 is, in fact, further developed.

FIGURE 1. Results in the quadruped mammals domain. [Tree diagram: a root cluster `All' splits into `Horse', `Giraffe' and `Small'; `Small' splits into `Cat' and `Dog'.]

The system found the four classes of animals even though not all of them are at the first level. More important is the fact (not shown in Figure 1) that the conceptual representation of each of these classes included three central clusters, corresponding to the three cylinders. The cluster labelled `Small' covers all small animals (dogs and cats), and its conceptual representation includes only two cylinder clusters.
The interpretation of such a conceptual representation is that a small animal is an animal made of two cylinders of one type (described by the first central cluster) and one cylinder of another type. Such a representation could not have been obtained if a fixed structure (i.e., a predetermined number of attributes) had been used.

Object Recognition Domains

Character Recognition. The second experiment involves a simplified form of pattern recognition. The basic idea is to consider a pattern as a set of pixels. To compensate for the loss of information on the spatial arrangement of pixels, the representation of each individual pixel is based on several convolutions.

[5] The instance generator is available at the "UCI Repository of Machine Learning Databases and Domain Theories" at Irvine, at ftp.ics.uci.edu, as /pub/machine-learning-databases/quadrapeds
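A sketch of how such pixel components might be built. To stay dependency-free, a separable Gaussian blur followed by a discrete Laplacian stands in for the true Laplacian-of-a-Gaussian convolution, and the pattern is invented:

```python
import numpy as np

def gaussian_kernel(sigma):
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def log_response(img, sigma):
    """Gaussian blur (separable), then a 5-point discrete Laplacian."""
    k = gaussian_kernel(sigma)
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, "same"), 1, img)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, "same"), 0, blurred)
    lap = -4 * blurred
    lap += np.roll(blurred, 1, 0) + np.roll(blurred, -1, 0)
    lap += np.roll(blurred, 1, 1) + np.roll(blurred, -1, 1)
    return lap

pattern = np.zeros((16, 16))
pattern[4:12, 4:12] = 1.0  # a filled square standing in for a character

scales = (2.5, 3.0, 3.5)  # the paper's values 5/2, 3 and 7/2
responses = [log_response(pattern, s) for s in scales]

# One component per black pixel, described by its responses at each scale.
components = [[resp[y, x] for resp in responses]
              for y, x in zip(*np.nonzero(pattern))]
print(len(components))
```

Each component is then an ordinary numeric attribute-value list, so the set-clustering machinery of Section 22.3 applies unchanged.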

Data are drawn from a set of four alphabetic characters (shown in Figure 2), each of them printed in four directions. Each pattern was convolved with the Laplacian of a Gaussian at several different scales. Each pixel shown in black in the original pattern is considered a component of that pattern. Each component is described by the values of the convolutions at its position. These values can be seen as local characteristics of the pixel in the original pattern. In fact, as shown in [MH80], convolution with the Laplacian of a Gaussian can be used as an edge detector (a value near zero means that the pixel is on an edge in the original image). In the experiments reported here, three convolutions were used, with the σ parameter respectively equal to 5/2, 3 and 7/2 (see [MH80] for details about the meaning of this parameter). The convolutions were directly applied to the iconic images. Note also that the four sets representing one character (one set for each `direction') contain exactly the same objects. Thus, objects from the same class have the same number of components, which is not true between classes. The data are shown on the left side of Figure 2. The system was thus given 16 composite objects with a total of 1488 components. Results are shown on the right side of Figure 2.

FIGURE 2. The character-recognition data set: (a) the data, with n = 103, 80, 102 and 87 components per character class; (b) the resulting hierarchy.

As expected, the system discovered one cluster for each class of characters. As noted earlier, a fixed structure (i.e. a predefined set of attributes) cannot model such a problem.

Rotated Polygons. The third experiment was similar to the second, but with more realistic rotation angles. The design was as follows: each pattern was computed from an underlying polygon, to which a rotation was applied, whose angle was randomly drawn. The rotated figure was then discretized, leading to a grey-level icon.
The icon was then treated the same way as the alphabetic characters in the previous experiment. Each non-white pixel was considered a component, and local characteristics (i.e. values of the convolution with the Laplacian of a Gaussian) were used to describe it. The parameters of the convolutions were set to the same values as in the previous experiment. Three `models' were used to generate the data. Each model was randomly rotated by three different angles, and each time expressed as a set of pixels. Figure 3 shows the data with the angle of rotation and the number of pixels. This domain is an illustration of the most complex representational case, in which objects do not have the same number of

components, even if they are from the same class.

FIGURE 3. Nine rotated polygons, with rotation angle θ and pixel count n: (θ = 122, n = 44), (θ = 46, n = 51), (θ = 190, n = 47), (θ = 25, n = 46), (θ = 306, n = 47), (θ = 21, n = 47), (θ = 357, n = 44), (θ = 219, n = 50), (θ = 47, n = 50).

The result of the clustering is shown in Figure 4. The results are as good as in the naive case, and suggest that the set-clustering algorithm is able to discover rotation-invariant clusters of patterns.

FIGURE 4. Results in the pattern recognition domain.

22.5 Conclusion

The set-clustering mechanism described in this paper allows concept formation to take place in domains where a structure can be extracted. In particular, it resolves the task of concept formation from composite objects. In the case where each set is restricted to one member, and several sets are used to describe each object, the problem reduces to composite-object clustering. In the general case, the overlap between sets is used to form generalizations. Experimental results illustrate the performance of the algorithm and demonstrate abilities that cannot be attained by purely propositional algorithms. The experiments described in the previous section raise some questions about the nature of the discovered clusters, particularly in the case of the pattern-recognition problems. The representation of a pattern (as a set of pixels) is somewhat unusual, and the clusters that are derived are expressed in terms of pixel classes. It is thus impossible to ask the system to `illustrate' what a cross is, for instance. But a new cross will be recognized as such. More precisely, a new cross will be recognized as being similar to previous ones. It seems that this is a case of a knowledge structure with no explicit, internal representation of known clusters. The component-cluster hierarchy may be said to reside at a sub-symbolic level (assuming that patterns are at the symbol level).
The work presented in this paper is part of a project on learning and image processing. Earlier work on this project established the adequacy of concept formation techniques for image segmentation [KBK94]. The next step is to study the abilities of the same algorithm

on an object recognition problem. Preliminary results are described in this paper. The important point is that the same algorithm is used for both tasks (image segmentation and object recognition), and that this algorithm is totally unsupervised. Future work will likely include the integration of both phases (i.e. object recognition in conjunction with segmentation).

References

[Bis92] G. Bisson. Conceptual clustering in a first order logic representation. In Proceedings of the Tenth European Conference on Artificial Intelligence, pages 458-462. J. Wiley and Sons, 1992.

[DLPT82] E. Diday, J. Lemaire, J. Pouget, and F. Testu. Elements d'analyse de donnees. Dunod, Paris, 1982.

[Fis87] D. H. Fisher. Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2:139-172, 1987.

[FPL91] D. H. Fisher, M. J. Pazzani, and P. Langley, editors. Concept Formation: Knowledge and Experience in Unsupervised Learning. Morgan Kaufmann, 1991.

[GLF89] J. H. Gennari, P. Langley, and D. H. Fisher. Models of incremental concept formation. Artificial Intelligence, 40:11-61, 1989.

[KBK94] J. J. Korczak, D. Blamont, and A. Ketterlin. Thematic image segmentation by a concept formation algorithm. In Proceedings of the European Symposium on Satellite Remote Sensing, 1994.

[KM94] J.-U. Kietz and K. Morik. A polynomial approach to the constructive induction of structural knowledge. Machine Learning, 14:193-217, 1994.

[KR93] J. J. Korczak and M. Rymarczyk. Application of classical clustering methods to digital image analysis. Technical report, Universite Louis Pasteur, Strasbourg, France, 1993.

[MH80] D. Marr and E. Hildreth. Theory of edge detection. Proceedings of the Royal Society of London, B 207:187-217, 1980.

[Mic83] R. S. Michalski. A theory and methodology of inductive learning. In R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, editors, Machine Learning: An Artificial Intelligence Approach. Morgan Kaufmann, 1983.

[MS83] R. S. Michalski and R. E. Stepp.
Learning from observation: Conceptual clustering. In R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, editors, Machine Learning: An Artificial Intelligence Approach. Morgan Kaufmann, 1983.

[TL91] K. Thompson and P. Langley. Concept formation in structured domains. In Fisher et al. [FPL91].


Algebraic Properties of CSP Model Operators? Y.C. Law and J.H.M. Lee. The Chinese University of Hong Kong. Algebraic Properties of CSP Model Operators? Y.C. Law and J.H.M. Lee Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin, N.T., Hong Kong SAR, China fyclaw,jleeg@cse.cuhk.edu.hk

More information

A Formal Analysis of Solution Quality in. FA/C Distributed Sensor Interpretation Systems. Computer Science Department Computer Science Department

A Formal Analysis of Solution Quality in. FA/C Distributed Sensor Interpretation Systems. Computer Science Department Computer Science Department A Formal Analysis of Solution Quality in FA/C Distributed Sensor Interpretation Systems Norman Carver Victor Lesser Computer Science Department Computer Science Department Southern Illinois University

More information

Object classes. recall (%)

Object classes. recall (%) Using Genetic Algorithms to Improve the Accuracy of Object Detection Victor Ciesielski and Mengjie Zhang Department of Computer Science, Royal Melbourne Institute of Technology GPO Box 2476V, Melbourne

More information

The Global Standard for Mobility (GSM) (see, e.g., [6], [4], [5]) yields a

The Global Standard for Mobility (GSM) (see, e.g., [6], [4], [5]) yields a Preprint 0 (2000)?{? 1 Approximation of a direction of N d in bounded coordinates Jean-Christophe Novelli a Gilles Schaeer b Florent Hivert a a Universite Paris 7 { LIAFA 2, place Jussieu - 75251 Paris

More information

Classifier C-Net. 2D Projected Images of 3D Objects. 2D Projected Images of 3D Objects. Model I. Model II

Classifier C-Net. 2D Projected Images of 3D Objects. 2D Projected Images of 3D Objects. Model I. Model II Advances in Neural Information Processing Systems 7. (99) The MIT Press, Cambridge, MA. pp.949-96 Unsupervised Classication of 3D Objects from D Views Satoshi Suzuki Hiroshi Ando ATR Human Information

More information

Minoru SASAKI and Kenji KITA. Department of Information Science & Intelligent Systems. Faculty of Engineering, Tokushima University

Minoru SASAKI and Kenji KITA. Department of Information Science & Intelligent Systems. Faculty of Engineering, Tokushima University Information Retrieval System Using Concept Projection Based on PDDP algorithm Minoru SASAKI and Kenji KITA Department of Information Science & Intelligent Systems Faculty of Engineering, Tokushima University

More information

BRACE: A Paradigm For the Discretization of Continuously Valued Data

BRACE: A Paradigm For the Discretization of Continuously Valued Data Proceedings of the Seventh Florida Artificial Intelligence Research Symposium, pp. 7-2, 994 BRACE: A Paradigm For the Discretization of Continuously Valued Data Dan Ventura Tony R. Martinez Computer Science

More information

Kalev Kask and Rina Dechter. Department of Information and Computer Science. University of California, Irvine, CA

Kalev Kask and Rina Dechter. Department of Information and Computer Science. University of California, Irvine, CA GSAT and Local Consistency 3 Kalev Kask and Rina Dechter Department of Information and Computer Science University of California, Irvine, CA 92717-3425 fkkask,dechterg@ics.uci.edu Abstract It has been

More information

Vector Regression Machine. Rodrigo Fernandez. LIPN, Institut Galilee-Universite Paris 13. Avenue J.B. Clement Villetaneuse France.

Vector Regression Machine. Rodrigo Fernandez. LIPN, Institut Galilee-Universite Paris 13. Avenue J.B. Clement Villetaneuse France. Predicting Time Series with a Local Support Vector Regression Machine Rodrigo Fernandez LIPN, Institut Galilee-Universite Paris 13 Avenue J.B. Clement 9343 Villetaneuse France rf@lipn.univ-paris13.fr Abstract

More information

WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1

WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1 WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1 H. Altay Güvenir and Aynur Akkuş Department of Computer Engineering and Information Science Bilkent University, 06533, Ankara, Turkey

More information

Department of. Computer Science. Remapping Subpartitions of. Hyperspace Using Iterative. Genetic Search. Keith Mathias and Darrell Whitley

Department of. Computer Science. Remapping Subpartitions of. Hyperspace Using Iterative. Genetic Search. Keith Mathias and Darrell Whitley Department of Computer Science Remapping Subpartitions of Hyperspace Using Iterative Genetic Search Keith Mathias and Darrell Whitley Technical Report CS-4-11 January 7, 14 Colorado State University Remapping

More information

and moving down by maximizing a constrained full conditional density on sub-trees of decreasing size given the positions of all facets not contained i

and moving down by maximizing a constrained full conditional density on sub-trees of decreasing size given the positions of all facets not contained i Image Feature Identication via Bayesian Hierarchical Models Colin McCulloch, Jacob Laading, and Valen Johnson Institute of Statistics and Decision Sciences, Duke University Colin McCulloch, Box 90251,

More information

ing data (clouds) and solar spots. Once reliable sets of pixels are computed by contour extraction algorithms, a choice of orientation must be superim

ing data (clouds) and solar spots. Once reliable sets of pixels are computed by contour extraction algorithms, a choice of orientation must be superim Modeling Oceanographic Structures in Sea Surface Temperature Satellite Image Sequences I. L. Herlin, H. M. Yahia INRIA-Rocquencourt BP 105, 78153 Le Chesnay Cedex, France E-mail : Isabelle.Herlin@inria.fr,

More information

CHAPTER 6 IDENTIFICATION OF CLUSTERS USING VISUAL VALIDATION VAT ALGORITHM

CHAPTER 6 IDENTIFICATION OF CLUSTERS USING VISUAL VALIDATION VAT ALGORITHM 96 CHAPTER 6 IDENTIFICATION OF CLUSTERS USING VISUAL VALIDATION VAT ALGORITHM Clustering is the process of combining a set of relevant information in the same group. In this process KM algorithm plays

More information

Enumeration of Full Graphs: Onset of the Asymptotic Region. Department of Mathematics. Massachusetts Institute of Technology. Cambridge, MA 02139

Enumeration of Full Graphs: Onset of the Asymptotic Region. Department of Mathematics. Massachusetts Institute of Technology. Cambridge, MA 02139 Enumeration of Full Graphs: Onset of the Asymptotic Region L. J. Cowen D. J. Kleitman y F. Lasaga D. E. Sussman Department of Mathematics Massachusetts Institute of Technology Cambridge, MA 02139 Abstract

More information

Hyperplane Ranking in. Simple Genetic Algorithms. D. Whitley, K. Mathias, and L. Pyeatt. Department of Computer Science. Colorado State University

Hyperplane Ranking in. Simple Genetic Algorithms. D. Whitley, K. Mathias, and L. Pyeatt. Department of Computer Science. Colorado State University Hyperplane Ranking in Simple Genetic Algorithms D. Whitley, K. Mathias, and L. yeatt Department of Computer Science Colorado State University Fort Collins, Colorado 8523 USA whitley,mathiask,pyeatt@cs.colostate.edu

More information

Lazy Acquisition of Place Knowledge Pat Langley Institute for the Study of Learning and Expertise 2164 Staunton Court, Palo

Lazy Acquisition of Place Knowledge Pat Langley Institute for the Study of Learning and Expertise 2164 Staunton Court, Palo Lazy Acquisition of Place Knowledge Pat Langley (Langley@cs.stanford.edu) Institute for the Study of Learning and Expertise 2164 Staunton Court, Palo Alto, CA 94306 Karl Pfleger (KPfleger@hpp.stanford.edu)

More information

Supervised Variable Clustering for Classification of NIR Spectra

Supervised Variable Clustering for Classification of NIR Spectra Supervised Variable Clustering for Classification of NIR Spectra Catherine Krier *, Damien François 2, Fabrice Rossi 3, Michel Verleysen, Université catholique de Louvain, Machine Learning Group, place

More information

Graph Matching: Fast Candidate Elimination Using Machine Learning Techniques

Graph Matching: Fast Candidate Elimination Using Machine Learning Techniques Graph Matching: Fast Candidate Elimination Using Machine Learning Techniques M. Lazarescu 1,2, H. Bunke 1, and S. Venkatesh 2 1 Computer Science Department, University of Bern, Switzerland 2 School of

More information

Relational Knowledge Discovery in Databases Heverlee. fhendrik.blockeel,

Relational Knowledge Discovery in Databases Heverlee.   fhendrik.blockeel, Relational Knowledge Discovery in Databases Hendrik Blockeel and Luc De Raedt Katholieke Universiteit Leuven Department of Computer Science Celestijnenlaan 200A 3001 Heverlee e-mail: fhendrik.blockeel,

More information

Automatic Group-Outlier Detection

Automatic Group-Outlier Detection Automatic Group-Outlier Detection Amine Chaibi and Mustapha Lebbah and Hanane Azzag LIPN-UMR 7030 Université Paris 13 - CNRS 99, av. J-B Clément - F-93430 Villetaneuse {firstname.secondname}@lipn.univ-paris13.fr

More information

Adaptive Estimation of Distributions using Exponential Sub-Families Alan Gous Stanford University December 1996 Abstract: An algorithm is presented wh

Adaptive Estimation of Distributions using Exponential Sub-Families Alan Gous Stanford University December 1996 Abstract: An algorithm is presented wh Adaptive Estimation of Distributions using Exponential Sub-Families Alan Gous Stanford University December 1996 Abstract: An algorithm is presented which, for a large-dimensional exponential family G,

More information

Object perception by primates

Object perception by primates Object perception by primates How does the visual system put together the fragments to form meaningful objects? The Gestalt approach The whole differs from the sum of its parts and is a result of perceptual

More information

Data Analytics and Boolean Algebras

Data Analytics and Boolean Algebras Data Analytics and Boolean Algebras Hans van Thiel November 28, 2012 c Muitovar 2012 KvK Amsterdam 34350608 Passeerdersstraat 76 1016 XZ Amsterdam The Netherlands T: + 31 20 6247137 E: hthiel@muitovar.com

More information

Empirical transfer function determination by. BP 100, Universit de PARIS 6

Empirical transfer function determination by. BP 100, Universit de PARIS 6 Empirical transfer function determination by the use of Multilayer Perceptron F. Badran b, M. Crepon a, C. Mejia a, S. Thiria a and N. Tran a a Laboratoire d'oc anographie Dynamique et de Climatologie

More information

P ^ 2π 3 2π 3. 2π 3 P 2 P 1. a. b. c.

P ^ 2π 3 2π 3. 2π 3 P 2 P 1. a. b. c. Workshop on Fundamental Structural Properties in Image and Pattern Analysis - FSPIPA-99, Budapest, Hungary, Sept 1999. Quantitative Analysis of Continuous Symmetry in Shapes and Objects Hagit Hel-Or and

More information

Inital Starting Point Analysis for K-Means Clustering: A Case Study

Inital Starting Point Analysis for K-Means Clustering: A Case Study lemson University TigerPrints Publications School of omputing 3-26 Inital Starting Point Analysis for K-Means lustering: A ase Study Amy Apon lemson University, aapon@clemson.edu Frank Robinson Vanderbilt

More information

Reconciling Dierent Semantics for Concept Denition (Extended Abstract) Giuseppe De Giacomo Dipartimento di Informatica e Sistemistica Universita di Ro

Reconciling Dierent Semantics for Concept Denition (Extended Abstract) Giuseppe De Giacomo Dipartimento di Informatica e Sistemistica Universita di Ro Reconciling Dierent Semantics for Concept Denition (Extended Abstract) Giuseppe De Giacomo Dipartimento di Informatica e Sistemistica Universita di Roma \La Sapienza" Via Salaria 113, 00198 Roma, Italia

More information

Feature-weighted k-nearest Neighbor Classifier

Feature-weighted k-nearest Neighbor Classifier Proceedings of the 27 IEEE Symposium on Foundations of Computational Intelligence (FOCI 27) Feature-weighted k-nearest Neighbor Classifier Diego P. Vivencio vivencio@comp.uf scar.br Estevam R. Hruschka

More information

E-Companion: On Styles in Product Design: An Analysis of US. Design Patents

E-Companion: On Styles in Product Design: An Analysis of US. Design Patents E-Companion: On Styles in Product Design: An Analysis of US Design Patents 1 PART A: FORMALIZING THE DEFINITION OF STYLES A.1 Styles as categories of designs of similar form Our task involves categorizing

More information

AST: Support for Algorithm Selection with a CBR Approach

AST: Support for Algorithm Selection with a CBR Approach AST: Support for Algorithm Selection with a CBR Approach Guido Lindner 1 and Rudi Studer 2 1 DaimlerChrysler AG, Research &Technology FT3/KL, PO: DaimlerChrysler AG, T-402, D-70456 Stuttgart, Germany guido.lindner@daimlerchrysler.com

More information

Cluster quality 15. Running time 0.7. Distance between estimated and true means Running time [s]

Cluster quality 15. Running time 0.7. Distance between estimated and true means Running time [s] Fast, single-pass K-means algorithms Fredrik Farnstrom Computer Science and Engineering Lund Institute of Technology, Sweden arnstrom@ucsd.edu James Lewis Computer Science and Engineering University of

More information

Advances in Neural Information Processing Systems, in press, 1996

Advances in Neural Information Processing Systems, in press, 1996 Advances in Neural Information Processing Systems, in press, 1996 A Framework for Non-rigid Matching and Correspondence Suguna Pappu, Steven Gold, and Anand Rangarajan 1 Departments of Diagnostic Radiology

More information

INDUCTIVE LEARNING WITH BCT

INDUCTIVE LEARNING WITH BCT INDUCTIVE LEARNING WITH BCT Philip K Chan CUCS-451-89 August, 1989 Department of Computer Science Columbia University New York, NY 10027 pkc@cscolumbiaedu Abstract BCT (Binary Classification Tree) is a

More information

has to choose. Important questions are: which relations should be dened intensionally,

has to choose. Important questions are: which relations should be dened intensionally, Automated Design of Deductive Databases (Extended abstract) Hendrik Blockeel and Luc De Raedt Department of Computer Science, Katholieke Universiteit Leuven Celestijnenlaan 200A B-3001 Heverlee, Belgium

More information

Support vector machines

Support vector machines Support vector machines Cavan Reilly October 24, 2018 Table of contents K-nearest neighbor classification Support vector machines K-nearest neighbor classification Suppose we have a collection of measurements

More information

Lecture 9 - Matrix Multiplication Equivalences and Spectral Graph Theory 1

Lecture 9 - Matrix Multiplication Equivalences and Spectral Graph Theory 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanfordedu) February 6, 2018 Lecture 9 - Matrix Multiplication Equivalences and Spectral Graph Theory 1 In the

More information

Images Reconstruction using an iterative SOM based algorithm.

Images Reconstruction using an iterative SOM based algorithm. Images Reconstruction using an iterative SOM based algorithm. M.Jouini 1, S.Thiria 2 and M.Crépon 3 * 1- LOCEAN, MMSA team, CNAM University, Paris, France 2- LOCEAN, MMSA team, UVSQ University Paris, France

More information

Gen := 0. Create Initial Random Population. Termination Criterion Satisfied? Yes. Evaluate fitness of each individual in population.

Gen := 0. Create Initial Random Population. Termination Criterion Satisfied? Yes. Evaluate fitness of each individual in population. An Experimental Comparison of Genetic Programming and Inductive Logic Programming on Learning Recursive List Functions Lappoon R. Tang Mary Elaine Cali Raymond J. Mooney Department of Computer Sciences

More information

Concept Tree Based Clustering Visualization with Shaded Similarity Matrices

Concept Tree Based Clustering Visualization with Shaded Similarity Matrices Syracuse University SURFACE School of Information Studies: Faculty Scholarship School of Information Studies (ischool) 12-2002 Concept Tree Based Clustering Visualization with Shaded Similarity Matrices

More information

1) Give decision trees to represent the following Boolean functions:

1) Give decision trees to represent the following Boolean functions: 1) Give decision trees to represent the following Boolean functions: 1) A B 2) A [B C] 3) A XOR B 4) [A B] [C Dl Answer: 1) A B 2) A [B C] 1 3) A XOR B = (A B) ( A B) 4) [A B] [C D] 2 2) Consider the following

More information

Structural and Syntactic Pattern Recognition

Structural and Syntactic Pattern Recognition Structural and Syntactic Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2017 CS 551, Fall 2017 c 2017, Selim Aksoy (Bilkent

More information

6. Learning Partitions of a Set

6. Learning Partitions of a Set 6. Learning Partitions of a Set Also known as clustering! Usually, we partition sets into subsets with elements that are somewhat similar (and since similarity is often task dependent, different partitions

More information

Based on Raymond J. Mooney s slides

Based on Raymond J. Mooney s slides Instance Based Learning Based on Raymond J. Mooney s slides University of Texas at Austin 1 Example 2 Instance-Based Learning Unlike other learning algorithms, does not involve construction of an explicit

More information

2 Keywords Backtracking Algorithms, Constraint Satisfaction Problem, Distributed Articial Intelligence, Iterative Improvement Algorithm, Multiagent Sy

2 Keywords Backtracking Algorithms, Constraint Satisfaction Problem, Distributed Articial Intelligence, Iterative Improvement Algorithm, Multiagent Sy 1 The Distributed Constraint Satisfaction Problem: Formalization and Algorithms IEEE Trans. on Knowledge and DATA Engineering, vol.10, No.5 September 1998 Makoto Yokoo, Edmund H. Durfee, Toru Ishida, and

More information

SIMILARITY MEASURES FOR MULTI-VALUED ATTRIBUTES FOR DATABASE CLUSTERING

SIMILARITY MEASURES FOR MULTI-VALUED ATTRIBUTES FOR DATABASE CLUSTERING SIMILARITY MEASURES FOR MULTI-VALUED ATTRIBUTES FOR DATABASE CLUSTERING TAE-WAN RYU AND CHRISTOPH F. EICK Department of Computer Science, University of Houston, Houston, Texas 77204-3475 {twryu, ceick}@cs.uh.edu

More information

Constrained Clustering with Interactive Similarity Learning

Constrained Clustering with Interactive Similarity Learning SCIS & ISIS 2010, Dec. 8-12, 2010, Okayama Convention Center, Okayama, Japan Constrained Clustering with Interactive Similarity Learning Masayuki Okabe Toyohashi University of Technology Tenpaku 1-1, Toyohashi,

More information

Localization in Graphs. Richardson, TX Azriel Rosenfeld. Center for Automation Research. College Park, MD

Localization in Graphs. Richardson, TX Azriel Rosenfeld. Center for Automation Research. College Park, MD CAR-TR-728 CS-TR-3326 UMIACS-TR-94-92 Samir Khuller Department of Computer Science Institute for Advanced Computer Studies University of Maryland College Park, MD 20742-3255 Localization in Graphs Azriel

More information

An Evaluation of Information Retrieval Accuracy. with Simulated OCR Output. K. Taghva z, and J. Borsack z. University of Massachusetts, Amherst

An Evaluation of Information Retrieval Accuracy. with Simulated OCR Output. K. Taghva z, and J. Borsack z. University of Massachusetts, Amherst An Evaluation of Information Retrieval Accuracy with Simulated OCR Output W.B. Croft y, S.M. Harding y, K. Taghva z, and J. Borsack z y Computer Science Department University of Massachusetts, Amherst

More information

Ratios and Proportional Relationships (RP) 6 8 Analyze proportional relationships and use them to solve real-world and mathematical problems.

Ratios and Proportional Relationships (RP) 6 8 Analyze proportional relationships and use them to solve real-world and mathematical problems. Ratios and Proportional Relationships (RP) 6 8 Analyze proportional relationships and use them to solve real-world and mathematical problems. 7.1 Compute unit rates associated with ratios of fractions,

More information

GSAT and Local Consistency

GSAT and Local Consistency GSAT and Local Consistency Kalev Kask Computer Science Department University of California at Irvine Irvine, CA 92717 USA Rina Dechter Computer Science Department University of California at Irvine Irvine,

More information

Exemplar Learning in Fuzzy Decision Trees

Exemplar Learning in Fuzzy Decision Trees Exemplar Learning in Fuzzy Decision Trees C. Z. Janikow Mathematics and Computer Science University of Missouri St. Louis, MO 63121 Abstract Decision-tree algorithms provide one of the most popular methodologies

More information

5. Compare the volume of a three dimensional figure to surface area.

5. Compare the volume of a three dimensional figure to surface area. 5. Compare the volume of a three dimensional figure to surface area. 1. What are the inferences that can be drawn from sets of data points having a positive association and a negative association. 2. Why

More information

CS4445 Data Mining and Knowledge Discovery in Databases. A Term 2008 Exam 2 October 14, 2008

CS4445 Data Mining and Knowledge Discovery in Databases. A Term 2008 Exam 2 October 14, 2008 CS4445 Data Mining and Knowledge Discovery in Databases. A Term 2008 Exam 2 October 14, 2008 Prof. Carolina Ruiz Department of Computer Science Worcester Polytechnic Institute NAME: Prof. Ruiz Problem

More information

Andrew Davenport and Edward Tsang. fdaveat,edwardgessex.ac.uk. mostly soluble problems and regions of overconstrained, mostly insoluble problems as

Andrew Davenport and Edward Tsang. fdaveat,edwardgessex.ac.uk. mostly soluble problems and regions of overconstrained, mostly insoluble problems as An empirical investigation into the exceptionally hard problems Andrew Davenport and Edward Tsang Department of Computer Science, University of Essex, Colchester, Essex CO SQ, United Kingdom. fdaveat,edwardgessex.ac.uk

More information

SOME TYPES AND USES OF DATA MODELS

SOME TYPES AND USES OF DATA MODELS 3 SOME TYPES AND USES OF DATA MODELS CHAPTER OUTLINE 3.1 Different Types of Data Models 23 3.1.1 Physical Data Model 24 3.1.2 Logical Data Model 24 3.1.3 Conceptual Data Model 25 3.1.4 Canonical Data Model

More information

Efficient integration of data mining techniques in DBMSs

Efficient integration of data mining techniques in DBMSs Efficient integration of data mining techniques in DBMSs Fadila Bentayeb Jérôme Darmont Cédric Udréa ERIC, University of Lyon 2 5 avenue Pierre Mendès-France 69676 Bron Cedex, FRANCE {bentayeb jdarmont

More information

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) January 11, 2018 Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1 In this lecture

More information

Local qualitative shape from stereo. without detailed correspondence. Extended Abstract. Shimon Edelman. Internet:

Local qualitative shape from stereo. without detailed correspondence. Extended Abstract. Shimon Edelman. Internet: Local qualitative shape from stereo without detailed correspondence Extended Abstract Shimon Edelman Center for Biological Information Processing MIT E25-201, Cambridge MA 02139 Internet: edelman@ai.mit.edu

More information

Citation for published version (APA): He, J. (2011). Exploring topic structure: Coherence, diversity and relatedness

Citation for published version (APA): He, J. (2011). Exploring topic structure: Coherence, diversity and relatedness UvA-DARE (Digital Academic Repository) Exploring topic structure: Coherence, diversity and relatedness He, J. Link to publication Citation for published version (APA): He, J. (211). Exploring topic structure:

More information

Coarse-to-fine image registration

Coarse-to-fine image registration Today we will look at a few important topics in scale space in computer vision, in particular, coarseto-fine approaches, and the SIFT feature descriptor. I will present only the main ideas here to give

More information

Image Mining: frameworks and techniques

Image Mining: frameworks and techniques Image Mining: frameworks and techniques Madhumathi.k 1, Dr.Antony Selvadoss Thanamani 2 M.Phil, Department of computer science, NGM College, Pollachi, Coimbatore, India 1 HOD Department of Computer Science,

More information

International Journal of Computer Engineering and Applications, Volume IX, Issue X, Oct. 15 ISSN

International Journal of Computer Engineering and Applications, Volume IX, Issue X, Oct. 15  ISSN DIVERSIFIED DATASET EXPLORATION BASED ON USEFULNESS SCORE Geetanjali Mohite 1, Prof. Gauri Rao 2 1 Student, Department of Computer Engineering, B.V.D.U.C.O.E, Pune, Maharashtra, India 2 Associate Professor,

More information

Progress in Image Analysis and Processing III, pp , World Scientic, Singapore, AUTOMATIC INTERPRETATION OF FLOOR PLANS USING

Progress in Image Analysis and Processing III, pp , World Scientic, Singapore, AUTOMATIC INTERPRETATION OF FLOOR PLANS USING Progress in Image Analysis and Processing III, pp. 233-240, World Scientic, Singapore, 1994. 1 AUTOMATIC INTERPRETATION OF FLOOR PLANS USING SPATIAL INDEXING HANAN SAMET AYA SOFFER Computer Science Department

More information

2. CNeT Architecture and Learning 2.1. Architecture The Competitive Neural Tree has a structured architecture. A hierarchy of identical nodes form an

2. CNeT Architecture and Learning 2.1. Architecture The Competitive Neural Tree has a structured architecture. A hierarchy of identical nodes form an Competitive Neural Trees for Vector Quantization Sven Behnke and Nicolaos B. Karayiannis Department of Mathematics Department of Electrical and Computer Science and Computer Engineering Martin-Luther-University

More information

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph.

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph. Trees 1 Introduction Trees are very special kind of (undirected) graphs. Formally speaking, a tree is a connected graph that is acyclic. 1 This definition has some drawbacks: given a graph it is not trivial

More information

Regularity Analysis of Non Uniform Data

Regularity Analysis of Non Uniform Data Regularity Analysis of Non Uniform Data Christine Potier and Christine Vercken Abstract. A particular class of wavelet, derivatives of B-splines, leads to fast and ecient algorithms for contours detection

More information

Data Preprocessing. Slides by: Shree Jaswal

Data Preprocessing. Slides by: Shree Jaswal Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data

More information

Handwritten Digit Recognition with a. Back-Propagation Network. Y. Le Cun, B. Boser, J. S. Denker, D. Henderson,

Handwritten Digit Recognition with a. Back-Propagation Network. Y. Le Cun, B. Boser, J. S. Denker, D. Henderson, Handwritten Digit Recognition with a Back-Propagation Network Y. Le Cun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel AT&T Bell Laboratories, Holmdel, N. J. 07733 ABSTRACT

More information

An Average-Case Analysis of the k-nearest Neighbor Classifier for Noisy Domains

An Average-Case Analysis of the k-nearest Neighbor Classifier for Noisy Domains An Average-Case Analysis of the k-nearest Neighbor Classifier for Noisy Domains Seishi Okamoto Fujitsu Laboratories Ltd. 2-2-1 Momochihama, Sawara-ku Fukuoka 814, Japan seishi@flab.fujitsu.co.jp Yugami

More information

Latest development in image feature representation and extraction
