Complex object comparison in a fuzzy context

Information and Software Technology 45 (2003) 431–444
www.elsevier.com/locate/infsof

N. Marín*, J.M. Medina, O. Pons, D. Sánchez, M.A. Vila
Intelligent Databases and Information Systems Research Group, Department of Computer Science and Artificial Intelligence, University of Granada, 18071 Granada, Andalucía, Spain

Abstract

The comparison concept plays a determining role in many problems related to object management in an Object-Oriented Database Model. Object comparison is appropriately managed in a crisp object-oriented context by means of the concepts of identity and value equality. However, when dealing with imprecise or imperfect objects, questions like "To which extent may two objects be the same one?" or "How similar are two objects?" have no clear answer, because the equality concept becomes fuzzy. In this paper we present a set of operators that are useful when comparing objects in a fuzzy environment. In particular, we introduce a generalized resemblance degree between two fuzzy sets of imprecise objects and a generalized resemblance degree to compare complex fuzzy objects within a given class.

© 2003 Elsevier Science B.V. All rights reserved.

Keywords: Object-oriented model; Fuzzy database; Objects resemblance; Inclusion degree

1. Introduction

Probably one of the most important data paradigms in both the programming [1] and the database [2] worlds is the Object-Oriented Data Model. Modeling the reality behind many software problems as a set of objects grouped around classes has proved to be a suitable approach for many developers, particularly when dealing with complex and dynamic problems.

This well-deserved popularity has made the object-oriented data model one of the active research fields in Computer Science. One of the major areas where this research has occurred is fuzzy database modelling.
As was the case with relational database modelling, the object-oriented data model is being widely studied in order to accommodate the representation of imperfect (i.e. imprecise, uncertain, vague) information in the database. As a result of this research many relevant works have appeared in the literature [3].

Identity equality is a fundamental concept in the object-oriented data model and constitutes the basic criterion used to distinguish objects. This notion of equality between objects states that two objects are identical if they are the same object, i.e. if they have the same object identifier. However, there exist situations where this criterion is insufficient and we need to use the concept of value equality, i.e. two objects are equal if the values of all their attributes are recursively equal. Query management in object-oriented databases is an example where this last criterion is commonly used. When the user orders the execution of a query, he or she writes a set of value conditions that must be fulfilled by the objects that will belong to the query result. It should be noted that identity restrictions may also appear.

When we deal with perfect information, the usual set of relational operators (including the basic equality operator) allows us to solve these kinds of problems. If the database is affected by imperfection, the classical concept of equality is not valid. In some cases it could be replaced by a similarity concept or, more generally, by a resemblance concept. In the context of object-oriented knowledge bases there are some approaches to this problem. For example, Yazici et al. propose in Ref. [4] an approach to calculate the matching between an object and the conditions of a rule.

* Corresponding author. E-mail addresses: nicm@decsai.ugr.es (N. Marín), medina@decsai.ugr.es (J.M. Medina), opc@decsai.ugr.es (O. Pons), daniel@decsai.ugr.es (D. Sánchez), vila@decsai.ugr.es (M.A. Vila).
0950-5849/03/$ - see front matter © 2003 Elsevier Science B.V. All rights reserved. doi:10.1016/s0950-5849(03)00014-4

The aim of this paper is to propose a more general way of managing fuzzy object comparisons in a fuzzy object-oriented data model similar to that presented in Ref. [5].

The paper is organized as follows. Section 2 presents the comparison problem, while Section 3 is devoted to the study of the generalization of the value equality concept when dealing with basic objects. In Section 4 we explain how to compare fuzzy sets of fuzzy objects. Based on the material presented in the previous sections, Section 5 demonstrates how to obtain a resemblance degree between two complex

objects in a fuzzy object-oriented environment. An explanatory example is presented in Section 6, and finally the paper ends with some concluding remarks and a discussion of future work in Section 7.

Fig. 1. The problem.

2. Value equality generalization: resemblance relationships

Consider the example illustrated in Fig. 1. The rectangles represent two rooms characterized by their quality, their extension, the floor they are on, and the set of students who attend lessons in each room. The first three attributes that characterize the class Room may take imprecise values: the quality is expressed by an imprecise label, and the attributes extension and floor can be expressed using a numerical value or an imprecise label. The set of students is fuzzy, taking into account the percentage of time each student spends in the room receiving lessons.

If we want to compare these two objects we need to solve the following tasks:

• First, we have to handle resemblance in basic domains.
• Second, we have to be able to compare fuzzy collections of imprecise objects.
• Finally, we need to aggregate the resemblance information that we have collected by studying the attributes and calculate a general resemblance opinion for the whole complex objects.

The following sections are devoted to the study of each of these problems.

3. Resemblance in basic domains

If we wish to compare complex objects, the first level where resemblance must be studied corresponds to basic objects, that is, simple objects that have no OID (object identifier) and whose definition is not made up of attributes. We will refer to these kinds of objects as values of a given domain. We are going to consider the following classification of simple objects:

Precise values. This category of values involves all the classical basic classes that usually appear in an object-oriented data model (e.g.
numerical classes, string classes, etc.). Values of this kind of domain are easily compared using the classic set of relational operators (e.g. equal, less than, greater than, etc.). In these situations, resemblance is implemented using the model proposed by the classical equality.

Imprecise values. The case of imprecise values is a bit more complex. Different types of imprecise values must be considered according to the semantics of the imprecise value. As we will see, the generalization of the equality concept depends on the nature of the domain.

We are going to consider the different kinds of simple imprecise domains defined in the fuzzy object-oriented model proposed in Ref. [5]. In this work, we propose the use of linguistic labels [6–8] in order to express vagueness. Taking into account the model presented in Refs. [9,10], we consider three different types of imprecise basic domains (see Fig. 2).

• Domains made up of a set of labels whose semantics cannot be expressed over an underlying base domain (e.g. the domain used to express the quality of a room, "The quality of the room is high", or the prospects of a student, "Mary's prospects are good"). The only alternative to handle comparison of this type is to ask the designer of the domain for the definition of the fuzzy relation that manages the resemblance among the labels. As an example, Table 1 represents a similarity relation for the attribute quality.

• Domains where the labels can be expressed using a fuzzy set defined over an underlying domain. There are two possibilities:

(1) The interpretation of the fuzzy set that represents the label is disjunctive. That is, it is a possibility distribution and only represents one value among a set of possible values (e.g. in the domain used to express the extension of a room, "The room is big", or the age of a student, "Mary is young",

and so on). In this case, the domain is made up of precise values (those of the underlying domain) and imprecise labels, and the comparison can be managed considering a generalization of the classical equality that holds in the underlying domain, taking into account the definitions of the labels. If $B$ stands for the basic domain, $L$ is the set of labels, and $D = B \cup L$, then $\forall x, y \in D$ we can use the following resemblance relation ($\wedge$ stands for a t-norm):

\[
\mu_S(x, y) =
\begin{cases}
1 & (x = y) \wedge (x, y \in B) \\
0 & (x \neq y) \wedge (x, y \in B) \\
\mu_l(z) & ((x = l \in L) \wedge (y = z \in B)) \vee ((y = l \in L) \wedge (x = z \in B)) \\
\sup_{z \in B} (\mu_x(z) \wedge \mu_y(z)) & \text{otherwise}
\end{cases}
\qquad (1)
\]

Fig. 2. Three different kinds of imprecise information.

Consider for example the attribute extension of the class Room. The basic underlying domain of this attribute is the interval $[0, \infty)$ and we add to the domain the set of labels {small, middle-sized, big}, whose definition is represented in Fig. 3. Taking into account the definitions of the labels and Eq. (1) we have that $\mu_S(30, 30) = 1$, $\mu_S(30, 35) = 0$, $\mu_S(30, \text{big}) = 0.9$, and $\mu_S(\text{big}, \text{middle-sized}) = 0.5$.

Although Eq. (1) is an accepted resemblance approach for these situations, there are several alternative proposals in the literature that describe a calculus for similarity and resemblance measures between fuzzy sets. A complete study of them can be found in Ref. [11].

(2) If the interpretation of the fuzzy set that represents the labels is conjunctive, then we are not comparing imprecise values but sets. In this last case, whether labels are used to express the set (e.g. academic years of a student, "Mary is in her final academic years") or not (e.g. fuzzy collections of objects, such as the students of a room), the comparison process is more complicated. Section 4 will analyze this problem in depth.

4. Resemblance in fuzzy sets of fuzzy objects

In Section 3, we have presented the way of comparing the values of the first two attributes that characterize a room.
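As a reminder of how the basic-domain comparison of Eq. (1) behaves, the following Python sketch evaluates it for the attribute extension. The trapezoidal label shapes are hypothetical: Fig. 3 is not reproduced in the text, so they were chosen only so that the sketch returns the values quoted in Section 3. The names `trapezoid`, `LABELS`, `GRID` and `resemblance` are illustrative, the minimum is assumed as t-norm, and the supremum is approximated on a finite grid.

```python
import math
from typing import Callable, Dict, List, Union

Value = Union[float, str]  # a precise number or the name of a label


def trapezoid(a: float, b: float, c: float, d: float) -> Callable[[float], float]:
    """Membership rising on [a, b], equal to 1 on [b, c], falling on [c, d]."""
    def mu(z: float) -> float:
        if z < a or z > d:
            return 0.0
        if z < b:
            return (z - a) / (b - a)
        if z <= c:
            return 1.0
        return (d - z) / (d - c)
    return mu


# Hypothetical label definitions (NOT the ones drawn in Fig. 3).
LABELS: Dict[str, Callable[[float], float]] = {
    "small": trapezoid(0, 0, 5, 15),
    "middle-sized": trapezoid(5, 15, 21, 31),
    "big": trapezoid(21, 31, math.inf, math.inf),
}

# Finite grid standing in for the underlying domain when taking the supremum.
GRID: List[float] = [i * 0.5 for i in range(201)]  # 0.0, 0.5, ..., 100.0


def resemblance(x: Value, y: Value,
                labels: Dict[str, Callable[[float], float]] = LABELS,
                grid: List[float] = GRID) -> float:
    """Resemblance of Eq. (1), assuming min as t-norm."""
    x_is_label, y_is_label = isinstance(x, str), isinstance(y, str)
    if not x_is_label and not y_is_label:
        return 1.0 if x == y else 0.0      # classical equality on B
    if x_is_label and not y_is_label:
        return labels[x](y)                # label against precise value
    if y_is_label and not x_is_label:
        return labels[y](x)
    # two labels: height of the intersection of their fuzzy sets
    return max(min(labels[x](z), labels[y](z)) for z in grid)


print(resemblance(30, 30), resemblance(30, 35))
print(round(resemblance(30, "big"), 2), round(resemblance("big", "middle-sized"), 2))
```

With other label definitions the last two values change, while the first two depend only on classical equality in the underlying domain.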
This section studies the comparison of fuzzy collections of objects. Fig. 4 shows the two sets of students that we have to compare in order to evaluate the resemblance between our two rooms. To compare the two fuzzy sets of students we need to generalize the fuzzy set comparison operators, taking into account that the objects of the set may be imprecise.

Table 1
Domain for the quality attribute

            High    Regular    Low
High        1       0.8        0
Regular             1          0.8
Low                            1

4.1. Resemblance driven inclusion degree

Conjunctive fuzzy set comparison is often done by means of the concept of inclusion:

\[ A = B \text{ if, and only if, } (A \subseteq B) \wedge (B \subseteq A) \qquad (2) \]

Several proposals for the calculus of this inclusion degree can be found in the literature. In Ref. [12] the inclusion of $A$ in $B$ is calculated as follows:

\[ N(B \mid A) = \min_{u \in U} \{ I(\mu_A(u), \mu_B(u)) \} \qquad (3) \]

where $I$ stands for an implication operator, and $\mu_X$ is the membership function that describes the fuzzy set $X$. The implication operator can be chosen in accordance with the properties we want the inclusion degree to fulfill. Nevertheless, independently of the chosen implication operator, this formulation supposes that both $A$ and $B$ are defined over a reference universe $U$ made up of precise elements, where the classical equality is the basis of

the comparisons. That is, the implication operator compares the degree with which each element of the universe $U$ belongs to each one of the fuzzy sets (i.e. we compare the membership degrees of the same object to both sets).

Fig. 3. Extension.

However, in the context of fuzzy databases it frequently happens that the elements of the universe $U$ are imprecise objects between which classical equality cannot be applied. Instead, a similarity or a resemblance relationship must be used. That is, for a given element in the set $A$, it is not clear which element of $B$ has to be taken in order to compare the membership degrees. In our example, the students may be defined with imprecision, and two different students (from the identity equality point of view) may be the same one (from the value equality point of view). Let us study a way to solve this problem.

If the reference universe $U$ is formed by imprecise elements, Eq. (3) can be generalized as follows.

Definition 1. (Resemblance driven inclusion degree). Let $A$ and $B$ be two fuzzy sets defined over a finite reference universe $U$, $S$ be a resemblance relation defined over the elements of $U$, and $\otimes$ be a t-norm. The inclusion degree of $A$ in $B$ driven by the resemblance relation $S$ is calculated as follows:

\[ \Theta_S(B \mid A) = \min_{x \in U} \max_{y \in U} \theta_{A,B,S}(x, y) \qquad (4) \]

where

\[ \theta_{A,B,S}(x, y) = \otimes(I(\mu_A(x), \mu_B(y)), \mu_S(x, y)) \qquad (5) \]

In Eq. (3), the inclusion degree is calculated taking into account the element of the reference universe $U$ that fulfills in a lower degree the implication condition between the membership degrees of this element to both sets. In Eq. (4), since the membership degrees of similar (but not necessarily equal) elements can be compared, we restrict the implication degree using the resemblance degree of the two elements. In summary, for each element that belongs with a certain degree to the set $A$ we look for a quite similar object in $U$ that belongs to the set $B$ with a higher degree.

Consider the following example: let $A = 0.9/a + 1/d$ and $B = 1/a + 0.7/b + 0.9/c$ be two fuzzy sets, where $\{a, b, c, d\}$ is the reference universe $U$ over which a resemblance relation $S$ is defined, such that $\mu_S(a, b) = \mu_S(a, c) = \mu_S(a, d) = \mu_S(b, c) = \mu_S(b, d) = 0$ and $\mu_S(c, d) = 0.7$. In this situation:

\[
\begin{aligned}
\Theta_S(B \mid A) = \min\{ & \max\{\otimes(I(\mu_A(a), \mu_B(a)), \mu_S(a, a)), \ldots, \otimes(I(\mu_A(a), \mu_B(d)), \mu_S(a, d))\}, \ldots, \\
& \max\{\otimes(I(\mu_A(d), \mu_B(a)), \mu_S(d, a)), \ldots, \otimes(I(\mu_A(d), \mu_B(d)), \mu_S(d, d))\} \}.
\end{aligned}
\]

If we use the product as t-norm and Eq. (6) as implication operator, then:

\[
\Theta_S(B \mid A) = \min\{\max\{1, 0, 0, 0\}, \max\{0, 1, 0, 0\}, \max\{0, 0, 1, 0.7\}, \max\{0, 0, 0.63, 0\}\} = \min\{1, 1, 1, 0.63\} = 0.63.
\]

\[
I(x, y) =
\begin{cases}
1, & \text{if } x \leq y \\
y/x, & \text{otherwise}
\end{cases}
\qquad (6)
\]

4.1.1. Properties of the resemblance driven inclusion degree

The first property that the resemblance driven inclusion degree ($\Theta$) must fulfill is to be valid when the resemblance relation is the classic equality.

Proposition 1. Let $A$ and $B$ be two fuzzy sets defined over a finite reference universe $U$ over which a resemblance relation $S$ is defined. Let $S$ be the classic equality relation. Then, $\Theta_{=}(B \mid A) = N(B \mid A)$.

Proof 1. $\Theta_{=}(B \mid A) = \min_{x \in U} \max_{y \in U} \theta_{A,B,=}(x, y)$. Since $\theta_{A,B,=}(x, y) = \otimes(I(\mu_A(x), \mu_B(y)), \mu_{=}(x, y))$ and taking into account that $\forall x \neq y$, $\mu_{=}(x, y) = 0$, then $\forall x \neq y$, $\theta_{A,B,=}(x, y) = \otimes(I(\mu_A(x), \mu_B(y)), 0) = 0$, which yields $\forall x \in U$, $\max_{y \in U} \theta_{A,B,=}(x, y) = \theta_{A,B,=}(x, x) = \otimes(I(\mu_A(x), \mu_B(x)), 1) = I(\mu_A(x), \mu_B(x))$, and thus $\Theta_{=}(B \mid A) = \min_{x \in U} \{I(\mu_A(x), \mu_B(x))\} = N(B \mid A)$ (as we wanted to demonstrate). $\square$

Fig. 4. Resemblance in fuzzy collections.

The inclusion degree must be monotonic with respect to the inclusion relationship.
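Before turning to the remaining properties, Definition 1 and the worked example of Section 4.1 can be checked numerically. The Python sketch below is only an illustration under the choices stated in the example (product t-norm and the implication of Eq. (6)); the function names and the dictionary encoding of fuzzy sets and of the relation $S$ are assumptions made here, not part of the paper.

```python
from typing import Callable, Dict, Hashable, Sequence

FuzzySet = Dict[Hashable, float]  # element -> membership degree (absent means 0)


def implication_eq6(x: float, y: float) -> float:
    """Implication operator of Eq. (6): 1 if x <= y, y / x otherwise."""
    return 1.0 if x <= y else y / x


def product_tnorm(a: float, b: float) -> float:
    return a * b


def inclusion_degree(a: FuzzySet, b: FuzzySet, universe: Sequence[Hashable],
                     resemblance: Callable[[Hashable, Hashable], float],
                     implication: Callable[[float, float], float] = implication_eq6,
                     tnorm: Callable[[float, float], float] = product_tnorm) -> float:
    """Inclusion degree of A in B driven by a resemblance relation, Eqs. (4)-(5)."""
    return min(
        max(
            tnorm(implication(a.get(x, 0.0), b.get(y, 0.0)), resemblance(x, y))
            for y in universe
        )
        for x in universe
    )


def make_resemblance(pairs: Dict[frozenset, float]) -> Callable[[Hashable, Hashable], float]:
    """Reflexive, symmetric relation; pairs that are not listed get degree 0."""
    def s(x: Hashable, y: Hashable) -> float:
        if x == y:
            return 1.0
        return pairs.get(frozenset((x, y)), 0.0)
    return s


# Data of the example: A = 0.9/a + 1/d, B = 1/a + 0.7/b + 0.9/c, S(c, d) = 0.7.
U = ["a", "b", "c", "d"]
A = {"a": 0.9, "d": 1.0}
B = {"a": 1.0, "b": 0.7, "c": 0.9}
S = make_resemblance({frozenset(("c", "d")): 0.7})

print(round(inclusion_degree(A, B, U, S), 2))  # 0.63, as in the example
```

Passing `make_resemblance({})`, i.e. classical equality, makes the function collapse to Eq. (3), which is the content of Proposition 1; for the data above it returns 0 because $d$ belongs to $A$ but not to $B$.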

Proposition 2. Let $A$, $B$, and $C$ be three fuzzy sets defined over a finite reference universe $U$ over which a resemblance relation $S$ is defined. Then, $A \subseteq B \Rightarrow \Theta_S(C \mid A) \geq \Theta_S(C \mid B)$.

Proof 2. $A \subseteq B \Rightarrow \forall x \in U$, $\mu_A(x) \leq \mu_B(x)$. That is, $\forall x \in U$, $\mu_A(x) + \varepsilon = \mu_B(x)$, with $\varepsilon \geq 0$. Also $\mathrm{Support}(A) \subseteq \mathrm{Support}(B)$. By the properties of implications,

\[
\Theta_S(C \mid A) = \min_{x \in U} \max_{y \in U} \theta_{A,C,S}(x, y) = \min_{x \in U} \max_{y \in U} \otimes(I(\mu_A(x), \mu_C(y)), \mu_S(x, y)) \geq \min_{x \in U} \max_{y \in U} \otimes(I(\mu_A(x) + \varepsilon, \mu_C(y)), \mu_S(x, y)), \quad \forall \varepsilon \geq 0.
\]

In particular, $\Theta_S(C \mid A) \geq \min_{x \in U} \max_{y \in U} \otimes(I(\mu_B(x), \mu_C(y)), \mu_S(x, y)) = \Theta_S(C \mid B)$. $\square$

The following property also holds.

Proposition 3. Let $A$, $B$, and $C$ be three fuzzy sets defined over a finite reference universe $U$ over which a resemblance relation $S$ is defined. Then, $A \subseteq B \Rightarrow \Theta_S(B \mid C) \geq \Theta_S(A \mid C)$.

Proof 3. $A \subseteq B \Rightarrow \forall x \in U$, $\mu_A(x) \leq \mu_B(x)$. That is, $\forall x \in U$, $\mu_A(x) + \varepsilon = \mu_B(x)$, with $\varepsilon \geq 0$. Also $\mathrm{Support}(A) \subseteq \mathrm{Support}(B)$. By the properties of implications,

\[
\Theta_S(B \mid C) = \min_{x \in U} \max_{y \in U} \theta_{C,B,S}(x, y) = \min_{x \in U} \max_{y \in U} \otimes(I(\mu_C(x), \mu_B(y)), \mu_S(x, y)) \geq \min_{x \in U} \max_{y \in U} \otimes(I(\mu_C(x), \mu_B(y) - \varepsilon), \mu_S(x, y)), \quad \forall \varepsilon \geq 0.
\]

In particular, $\Theta_S(B \mid C) \geq \min_{x \in U} \max_{y \in U} \otimes(I(\mu_C(x), \mu_A(y)), \mu_S(x, y)) = \Theta_S(A \mid C)$. $\square$

4.2. Matching resemblance opinions

When comparing two fuzzy sets of imperfect elements we can consider both inclusion directions, as shown in Eq. (2), by means of a resemblance driven inclusion degree (Fig. 5). However, to take this approach we need a way to match both inclusion degrees in order to obtain the resemblance degree between the two fuzzy sets. Using Eq. (2) as a basis we can define a resemblance degree operator between two fuzzy sets as follows.

Fig. 5. Two directions of inclusion.

Definition 2. (Generalized resemblance between fuzzy sets). Let $A$ and $B$ be two fuzzy sets defined over a finite reference universe $U$ over which a resemblance relation $S$ is defined, and $\otimes$ be a t-norm.
The generalized resemblance degree between $A$ and $B$ restricted by $\otimes$ is calculated by means of the following formulation:

\[ \aleph_{S,\otimes}(A, B) = \otimes(\Theta_S(B \mid A), \Theta_S(A \mid B)) \qquad (7) \]

Since every t-norm is commutative and the inclusion operator $\Theta$ is idempotent, the above measure is a resemblance relation.

For example, let $A = 1/a + 1/b + 1/c$ and $B = 1/d$ be two fuzzy sets, such that $\{a, b, c, d\}$ is a subset of a universe $U$ over which a resemblance relation $S$ is defined, and $\mu_S(a, d) = \mu_S(b, d) = \mu_S(c, d) = 0.5$. In this situation, $\Theta_S(B \mid A) = 0.5 = \Theta_S(A \mid B)$. Using the minimum as t-norm, $\aleph_{S,\min}(A, B) = 0.5$.

4.2.1. Cardinality ratio

In the last example, the generalized resemblance operator ($\aleph$) has indicated that both fuzzy sets of objects have a resemblance equal to 0.5. Now, suppose that $C = 1/a$ is another fuzzy set defined over the same universe. In this case, $\Theta_S(B \mid C) = 0.5 = \Theta_S(C \mid B)$ and $\aleph_{S,\min}(C, B) = 0.5$.

The above example illustrates one drawback of the generalized resemblance operator ($\aleph$). The use of resemblance relations instead of equality when calculating inclusion degrees may make the generalized resemblance between two fuzzy sets different from 0 even if their cardinalities differ greatly. In the above example the cardinalities of $A$ and $B$ are different, whereas those of $B$ and $C$ are the same. However, this is not taken into account when determining the resemblance degree.

In some situations cardinality may not be important. Imagine that we are comparing two sets of tools: we are probably looking for similar capabilities and thus the number of tools is not relevant. However, in the case of students in a room the actual number of students may be important when comparing the sets.

If we wish to distinguish between these kinds of situations we need to weight the generalized resemblance degree with a factor that takes into account the distance between the cardinalities of the fuzzy sets that are being compared (Fig. 6).

Definition 3. (Cardinality ratio).
Let $A$ and $B$ be two fuzzy sets. We define the cardinality ratio between $A$ and $B$ as a measure of the relative resemblance between their cardinalities. This ratio can be calculated as follows:

\[
F(A, B) =
\begin{cases}
1, & \text{if } A = \emptyset \wedge B = \emptyset \\
\dfrac{\min(|A|, |B|)}{\max(|A|, |B|)}, & \text{otherwise}
\end{cases}
\qquad (8)
\]

The ratio reaches its maximum value (1) when both sets have the same cardinality and approaches 0 when the cardinality of one set is much greater than that of the other.
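The effect of Definitions 2 and 3 on the sets $A$, $B$ and $C$ of Section 4.2 can be sketched in Python as follows. Several points are assumptions of this sketch rather than statements of the paper: the cardinality of a fuzzy set is taken to be the sum of its membership degrees, the ratio is applied to the resemblance degree as a multiplicative weight (which is what the figures 0.17 and 0.5 quoted for the example suggest), resemblance degrees between pairs not mentioned in the example are set to 0, and the inclusion degree uses the product t-norm with the implication of Eq. (6). All function names are illustrative.

```python
from typing import Callable, Dict, Hashable, Sequence

FuzzySet = Dict[Hashable, float]  # element -> membership degree (absent means 0)


def inclusion_degree(a: FuzzySet, b: FuzzySet, universe: Sequence[Hashable],
                     s: Callable[[Hashable, Hashable], float]) -> float:
    """Inclusion of A in B (Eq. (4)), product t-norm and implication of Eq. (6)."""
    def implication(x: float, y: float) -> float:
        return 1.0 if x <= y else y / x
    return min(
        max(implication(a.get(x, 0.0), b.get(y, 0.0)) * s(x, y) for y in universe)
        for x in universe
    )


def cardinality_ratio(a: FuzzySet, b: FuzzySet) -> float:
    """Eq. (8); |.| is assumed to be the sum of the membership degrees."""
    card_a, card_b = sum(a.values()), sum(b.values())
    if card_a == 0 and card_b == 0:
        return 1.0
    return min(card_a, card_b) / max(card_a, card_b)


def generalized_resemblance(a: FuzzySet, b: FuzzySet, universe: Sequence[Hashable],
                            s: Callable[[Hashable, Hashable], float],
                            tnorm: Callable[[float, float], float] = min,
                            use_cardinality: bool = False) -> float:
    """Eq. (7), optionally weighted by the cardinality ratio (assumed product)."""
    degree = tnorm(inclusion_degree(a, b, universe, s),
                   inclusion_degree(b, a, universe, s))
    if use_cardinality:
        degree *= cardinality_ratio(a, b)
    return degree


def s(x: Hashable, y: Hashable) -> float:
    """Relation of the example: d resembles a, b and c with degree 0.5."""
    if x == y:
        return 1.0
    return 0.5 if "d" in (x, y) else 0.0  # other pairs: assumed 0


U = ["a", "b", "c", "d"]
A = {"a": 1.0, "b": 1.0, "c": 1.0}
B = {"d": 1.0}
C = {"a": 1.0}

print(generalized_resemblance(A, B, U, s))                                  # 0.5
print(round(generalized_resemblance(A, B, U, s, use_cardinality=True), 2))  # 0.17
print(generalized_resemblance(C, B, U, s, use_cardinality=True))            # 0.5
```

Keeping the weighting optional mirrors the remark that cardinality matters for some collections (students in a room) and not for others (sets of tools).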

Fig. 6. Need for a cardinality ratio.

The cardinality ratio $F$ preserves commutativity and idempotency. Consequently, its use as a weighting factor for $\aleph$ does not prevent the result from being a resemblance measure. Using the cardinality ratio in the example, we have that $\aleph_{S,\min}(A, B) = 0.17$ and $\aleph_{S,\min}(C, B) = 0.5$.

4.2.2. Consistency degree between fuzzy sets

In Section 4.2.1 we have studied the way of matching both directions of inclusion into a single resemblance degree. To achieve this we have proposed combining the degrees using a t-norm. The use of a t-norm as an aggregation function may be a bit restrictive, but it is suitable if we want to use resemblance to generalize the concept of equality.

In some cases it is interesting to study the consistency degree between the definitions of two fuzzy sets instead of studying the resemblance degree. Imagine we are interested in the resemblance of two collections and we only want to know to which extent one of the collections can be completed to match the other collection. In this case a less restrictive resemblance measure would be of interest.

Definition 4. (Consistency degree between fuzzy sets). Let $A$ and $B$ be two fuzzy sets defined over a finite reference universe $U$ over which a resemblance relation $S$ is defined, and $\oplus$ a t-conorm. The consistency degree between $A$ and $B$ restricted by $\oplus$ is calculated as follows:

\[ \beth_{S,\oplus}(A, B) = \oplus(\Theta_S(B \mid A), \Theta_S(A \mid B)) \qquad (9) \]

With the consistency degree we intend to calculate to which extent there exists an inclusion relation between the contents of both sets (in at least one of the two directions). This measure, as the reader can easily see, is also a resemblance measure.

5. Calculating objects resemblance

The problem we want to solve is to establish the resemblance degree between two objects that belong to the same class.
In the previous sections we have seen the way to calculate the resemblance between basic objects and fuzzy sets of objects. Let us now consider the comparison of a more general kind of object.

An object can be viewed as a set of values that correspond to the set of attributes that describe the type of the class the object belongs to. Each of these values can be in turn a basic object, another object, or a fuzzy set of objects (basic or not). In general, an object is considered to be complex if other objects take part in its definition.

When comparing two objects, the starting point is to study the resemblance between the attribute values that make up their definition. The previous sections have provided us with tools to carry out the attribute comparison. In this section we explain how to aggregate the resemblance degrees obtained from the comparison of the attribute values in order to get a general resemblance measure of the two objects.

5.1. Formal definition of the problem

Firstly, let us formally define the problem. Let $o_1$ and $o_2$ be two objects of the class $C$, characterized by the type $T_C$, whose structural component $Str_C$ is characterized by the set of attributes $\{a_1, a_2, \ldots, a_n\}$. Our goal is to find the resemblance between $o_1$ and $o_2$ from the aggregation of the resemblance degrees that can be observed in the values of their attributes.

We can outline the problem as a multi-criteria aggregation problem, where different resemblance opinions (one for each pair of values of the same attribute) must be aggregated to obtain a resemblance consensus.

Let $S_{a_i}(o_1, o_2)$ stand for the resemblance degree observed between the values of attribute $a_i$ in objects $o_1$ and $o_2$, and $S(o_1, o_2)$ stand for the aggregated resemblance opinion we want to calculate.

5.2. Attribute importance

It is natural to think that not all of the resemblance degrees calculated for the attribute values of the objects being compared have the same weight when computing the resemblance between the objects. For example, if we are comparing people, some attributes may be more determining (e.g. name, father, brother) while others may be less determining (e.g. weight, age).

In order to reflect this fact, let us consider that every attribute $a_i$ has an associated weight $p_{a_i}$ that indicates the importance that the resemblance in this attribute must have when computing the resemblance degree between objects of this class. Without loss of generality we are going to consider that $\forall i$, $p_{a_i} \in [0, 1]$.

5.3. Resemblance values aggregation

The calculus of the resemblance degree between two objects of the same class is governed by the following

special vague sentence:

Two objects are similar if: "Most of the important attributes of the class present similar values in the objects".

Let us consider some interesting points with respect to the previous sentence:

• The resemblance between the values that both objects have for a given attribute can be measured by means of a degree. In general, we will have a set of values $\{S_{a_i}(o_1, o_2) \in [0, 1]\}$.
• The importance of an attribute in determining the resemblance between two objects must be given by the class designer. In general, we will have a set of values $\{p_{a_i}\}$, also belonging to the $[0, 1]$ interval.
• The term "most" is a linguistic quantifier that denotes the extent to which we want the whole set of attributes to be taken into account when computing the resemblance degree between two objects of the class.

In the literature there are two main types of vague sentences involving linguistic quantifiers [13,14]. They can be represented as follows:

• Type I: "Q of X are A" (e.g. "Most of the students are tall").
• Type II: "Q of D are A" (e.g. "Most of the tall students are intelligent").

where $Q$ is a linguistic quantifier, $X$ is a finite reference universe, and $A$ and $D$ are vague properties, both defined as fuzzy sets over $X$ with membership functions $\mu_A$ and $\mu_D$, respectively.

The vague expression that begins this section can be easily transformed into a Type II sentence. We only have to consider the following:

• $Q$ is the quantifier "most".
• $X = \{a_1, a_2, \ldots, a_n\}$ is the set of attributes that characterize the type of the class.
• $D$ is the set of the relevant attributes for the resemblance calculus.
• $A$ is the set of the attributes that have similar values in both objects.
The membership functions of these latter fuzzy sets are as follows:

\[ \mu_D(a_i) = p_{a_i}, \quad \forall a_i \in X \qquad (10) \]

\[ \mu_A(a_i) = S_{a_i}(o_1, o_2), \quad \forall a_i \in X \qquad (11) \]

The next step in the resemblance calculation process is to calculate the accomplishment degree of the Type II sentence from the quantifier and the previously defined membership functions. The literature offers very different approaches to solve this kind of problem in the fuzzy query environment:

• Zadeh's approach [14] calculates the accomplishment degree using the relative cardinal of $A$ with respect to $D$ ($a_{A/D}$), computing the compatibility degree of this cardinal with the definition of the quantifier $Q$:

\[
Z_Q(A/D) = \mu_Q(a_{A/D}) = \mu_Q\left( \frac{\sum_{x \in X} (\mu_A(x) \wedge \mu_D(x))}{\sum_{x \in X} \mu_D(x)} \right) \qquad (12)
\]

• Yager's approach [13] is founded on the use of the Ordered Weighted Average (OWA) [15,16] aggregation operator. This operator involves the calculus of two sequences of values and their ranking from the highest to the lowest value:

$\{c_i = \mu_A(x_i) \vee (1 - \mu_D(x_i))\}$, with $x_i \in X$.
$\{d_i = \mu_D(x_i)\}$, with $x_i \in X$.

If $\{b_i\}$ and $\{e_i\}$ are the descending ordered permutations of $\{c_i\}$ and $\{d_i\}$ (respectively), then the accomplishment degree is given by the following formulation:

\[ Y_Q(A/D) = \sum_{i=1}^{n} w_i b_i \qquad (13) \]

where $w_i = \mu_Q\left(e_i / \sum_{j=1}^{n} d_j\right) - \mu_Q\left(\left(e_i / \sum_{j=1}^{n} d_j\right) - 1\right)$.

• Vila's approach [17] is founded on the concept of coherent family of quantifiers. Let us look at this proposal in more detail.

Definition 5. (Coherent family of quantifiers). Let $\mathcal{Q} = \{Q_1, \ldots, Q_l\}$ be a set of linguistic quantifiers. $\mathcal{Q}$ is a coherent family of quantifiers if it verifies the following properties:

(i) The membership functions of the elements of $\mathcal{Q}$ are non-decreasing.
(ii) A partial order relation $\succeq$ is defined in $\mathcal{Q}$, and this relation has as maximal element $Q_1 = \exists$ and as minimal element $Q_l = \forall$. Besides, $\forall Q_i, Q_j \in \mathcal{Q}$, $Q_i \subseteq Q_j \Rightarrow Q_j \succeq Q_i$.

(iii) The membership function of the $\exists$ quantifier is $\mu_{Q_1}(x) = 1$ if $x \neq 0$ and $\mu_{Q_1}(0) = 0$, and the membership function of the $\forall$ quantifier is $\mu_{Q_l}(x) = 0$ if $x \neq 1$ and $\mu_{Q_l}(1) = 1$.

The basic idea is to consider that the fulfillment degree of the sentence lies between the accomplishment degrees of the extreme sentences "There exists a D that is A" and "All Ds are A".

The accomplishment degree of the sentence "There exists a D that is A" is calculated as follows:

\[ \exists x \in X, (\mu_D(x) \wedge \mu_A(x)) = \max_{x} (\mu_D(x) \wedge \mu_A(x)) \qquad (14) \]

The accomplishment degree of the sentence "All Ds are A" is the following:

\[ \forall x \in X, (\mu_D(x) \rightarrow \mu_A(x)) = \min_{x} (\mu_A(x) \vee (1 - \mu_D(x))) \qquad (15) \]

From the above, Vila's approach to calculating the accomplishment degree of a Type II sentence can be stated as follows.

Definition 6. (Accomplishment degree $V_Q$). Let $D$ and $A$ be two fuzzy sets with the same domain $X$ and $\mathcal{Q}$ be a coherent family of quantifiers such that every $Q_i$ has an associated value $g_i$ in $[0, 1]$ verifying:

\[ g_{\exists} = 1, \quad g_{\forall} = 0, \quad \forall Q, Q' \in \mathcal{Q}, \ (Q \succeq Q') \Rightarrow g_Q \geq g_{Q'} \qquad (16) \]

In these conditions, the accomplishment degree of the sentence "Q Ds are A" is calculated by means of the following formulation:

\[ V_Q(A/D) = g_Q \max_{x} (\mu_D(x) \wedge \mu_A(x)) + (1 - g_Q) \min_{x} (\mu_A(x) \vee (1 - \mu_D(x))) \qquad (17) \]

As $g_i$ values, the authors propose the use of a value based on the orness measure given by Yager in Ref. [15]. Such an approximation is calculated as follows:

\[ o_Q = \int_{0}^{1} \mu_Q(x) \, dx \qquad (18) \]

The final expression for the calculus of the accomplishment degree is:

\[ V_Q(A/D) = o_Q \max_{x} (\mu_D(x) \wedge \mu_A(x)) + (1 - o_Q) \min_{x} (\mu_A(x) \vee (1 - \mu_D(x))) \qquad (19) \]

We propose this latter approach for the aggregation of resemblance opinions. We do so because its behaviour is similar to that of Yager's approach when the quantifier is close to the $\forall$ quantifier (as is the case here) and it is easier to calculate (the calculus of $o_Q$ has to be performed only once for each quantifier). With respect to Zadeh's approach, this method is less strict and has a similar computational complexity. A comparative study of the three approaches can be found in Ref. [18].

Fig. 7 summarizes the process. In order to compare objects $o_1$ and $o_2$, we first compare their attribute values, obtaining the partial resemblance opinions $S_{a_i}(o_1, o_2)$. Then, using $V_Q$, we obtain a general resemblance measure between the two objects.
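The aggregation step of Eq. (19) is short enough to sketch directly in Python. In the sketch below, minimum and maximum are assumed for $\wedge$ and $\vee$, and the attribute weights, the partial resemblance degrees and the value of $o_Q$ are hypothetical figures chosen for illustration; they are not taken from the example of Section 6. The function name `accomplishment_degree` is likewise an assumption.

```python
from typing import Dict


def accomplishment_degree(importance: Dict[str, float],
                          resemblance: Dict[str, float],
                          o_q: float) -> float:
    """V_Q(A/D) of Eq. (19), assuming min for conjunction and max for disjunction.

    importance[a]  plays the role of mu_D(a) = p_a          (Eq. (10))
    resemblance[a] plays the role of mu_A(a) = S_a(o1, o2)  (Eq. (11))
    o_q            is the value attached to the quantifier  (Eq. (18))
    """
    # "There exists an important attribute with similar values", Eq. (14)
    exists_term = max(min(importance[a], resemblance[a]) for a in importance)
    # "All important attributes have similar values", Eq. (15)
    forall_term = min(max(resemblance[a], 1.0 - importance[a]) for a in importance)
    return o_q * exists_term + (1.0 - o_q) * forall_term


# Hypothetical weights and partial resemblances for two objects of class Room.
importance = {"quality": 0.5, "extension": 1.0, "floor": 0.5, "students": 0.75}
resemblance = {"quality": 0.25, "extension": 0.75, "floor": 1.0, "students": 0.5}

# o_q close to 0 means a quantifier close to "for all".
print(accomplishment_degree(importance, resemblance, o_q=0.25))  # 0.5625
```

Setting `o_q` to 1 or to 0 recovers the two extreme sentences of Eqs. (14) and (15), which for these figures give 0.75 and 0.5, respectively.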
In order to apply Vila's approach to the problem of resemblance aggregation we only need to determine the semantics of the quantifier we want to use. Instead of using the same quantifier to compute resemblance in all the classes, it is reasonable to let the designer of the class type choose the quantifier that must be used to compare the objects of the class. That is, the designer will give the value of $g_Q$ he or she wants to use to combine both extreme cases. In this way, we give freedom to the designer and, at the same time, greatly simplify the calculus of the accomplishment degree.

5.4. Complex objects: recursivity

From the previous sections it may seem that the problem of object resemblance is solved: pairs of attribute values are compared to obtain partial resemblance degrees, and then a final degree is obtained by means of an aggregation process. However, the high interrelation amongst data in the object-oriented data model complicates the calculus of object resemblance in a given class. We refer to the case of complex objects and to the presence of cycles in the relationships graph associated with a given schema.

In order to deal with the resemblance calculus between complex objects, the only solution is to propagate the problem by means of recursivity. Suppose that we are comparing objects $o_1$ and $o_2$, and the values of a given attribute in these objects are objects $o_3$ and $o_4$. To compute the resemblance between $o_1$ and $o_2$, we first need to compute the resemblance between $o_3$ and $o_4$, and so on.

The use of recursivity is not a problem unless there are cycles in the relationships graph. For example, let us consider the graph in Fig. 8. Suppose that we have two objects of class $A$, namely $o_{A1}$ and $o_{A2}$, and we want to calculate their resemblance degree.
When we analyze attribute $a$, we will need to study the resemblance degree between two objects of the class $B$, for example $o_{B1}$ and $o_{B2}$. When trying to solve this latter resemblance, we will need to study the resemblance between two objects $o_{C1}$ and $o_{C2}$. Finally, to know the resemblance between these objects, we will need to compute the resemblance between two objects of the initial class $A$.

The cycle in the above example introduces the following problems:

- Firstly, we may have to solve a comparison problem in the same class that was our starting point. This increases the complexity of the problem and suggests the necessity of exploring a wide data set in order to establish the resemblance between the objects.
- Secondly, it may be that the resemblance degree of the objects we want to compare is part of the recursive tree generated in the computation. For example, if attribute $c$ led back to objects $o_{A1}$ and $o_{A2}$ again, we would have an infinite cycle.

The first problem is unavoidable because it is due to the high interrelation of data in the object-oriented model. The second is far more dangerous because it prevents us from ending the computation. We can solve it by means of one of the following alternatives:

- Not propagating recursively, ignoring this value in the general calculus.
- Approximating the value in some way, making several iterations until reaching the final value.
- Directly approximating the value with another semantically valid one.

Fig. 7. Dealing with complex object comparison.

The first alternative (not propagating recursively) is not suitable because we may be ignoring important information. Although the second alternative (starting from an initial approximation and then iterating) is acceptable, it requires a considerable calculus effort and a higher algorithmic complexity (more than one cycle may appear in a normal recursivity process).

Our proposal is to use the third alternative. We can unfold the resemblance between objects in two different ways: one that expresses the surface resemblance between the objects and another based on an in-depth exploration of the objects. The first is based on the object attributes that do not involve the cycling problem; the second is obtained taking into account the attributes that need recursive monitoring. The surface resemblance can be used as an approximation when, in the calculus of the second one (the real resemblance), a cycle is detected (see Fig. 9). Let us explore this in more detail.

5.4.1. Surface resemblance

We are going to separate the attributes that characterize the structural component of the type of a class into two different partitions, according to whether or not they can get into a cycle within the application schema where they are defined.

Definition 7. (Superficial attributes).
Let $C$ be a class whose type is made up by the set of attributes $Str_C = \{a_1, a_2, \ldots, a_n\}$. We define the set $SStr_C$ of superficial attributes of $C$ as the subset of attributes of $Str_C$ that cannot get into a cycle that returns to $C$ in the schema where the class is defined.

Fig. 8. The problem of cycles.

5.4.2. Deep resemblance: resemblance between two objects

Using Definition 7 and the ideas presented in the previous sections, we can define the calculus of the resemblance between two objects of a given class as follows.

Definition 8. (Resemblance between two objects). Let $C$ be a class whose type is made up by the set of attributes $Str_C = \{a_1, a_2, \ldots, a_n\}$. Let $o_1$ and $o_2$ be two objects of $C$. We define the resemblance (SR) between the objects $o_1$ and $o_2$ as the value returned by the following function:

$$SR: F_C \times O(F_C) \times O(F_C) \times \mathcal{P}(O(F_C) \times O(F_C)) \times \{0,1\} \rightarrow [0,1]$$

where $F_C$ is the family of all the classes and $O(F_C)$ is the set of all the objects that exist in the database. The calculus of $SR(C, o_1, o_2, V, t)$ involves the following case selection:

- If $o_1 = o_2$, then:

$$SR(C, o_1, o_2, V, t) = 1 \tag{20a}$$

- If there exists a resemblance relation $S$ defined in $C$, then:

$$SR(C, o_1, o_2, V, t) = \mu_S(o_1, o_2) \tag{20b}$$

- If $o_1$ and $o_2$ are fuzzy sets, then:

$$SR(C, o_1, o_2, V, t) = SR_{V,\otimes}(o_1, o_2) = \otimes\big(\Theta_{SR_V}(o_2 \mid o_1),\ \Theta_{SR_V}(o_1 \mid o_2)\big) \tag{20c}$$

- If $(o_1, o_2) \in V \wedge t \neq 1$, then:

$$SR(C, o_1, o_2, V, t) = SR(C, o_1, o_2, V, 1) \tag{20d}$$

- Otherwise:

$$SR(C, o_1, o_2, V, t) = o_Q \max_{a_i : a_i \in A} \big(p_{a_i} \wedge SR(C_{a_i}, o_1.a_i, o_2.a_i, V \cup \{(o_1, o_2)\}, 0)\big) + (1 - o_Q) \min_{a_i : a_i \in A} \big(SR(C_{a_i}, o_1.a_i, o_2.a_i, V \cup \{(o_1, o_2)\}, 0) \vee (1 - p_{a_i})\big) \tag{20e}$$

where

$$\Theta_{SR_V}(o \mid o') = \min_{x \in support(o')}\ \max_{y \in support(o)} \otimes\big(I(\mu_{o'}(x), \mu_o(y)),\ SR(C_D, x, y, V, 0)\big)$$

$C_D$ stands for the class that is the reference universe of the sets,

$$A = \begin{cases} SStr_C & \text{if } t = 1 \\ Str_C & \text{if } t = 0 \end{cases}$$

and $C_{a_i}$ is the domain class of the attribute $a_i$.

Fig. 9. Two ways of recursive call to deal with cycles.

The latter function, in spite of the complexity of its definition, is just a case selection that proposes the different tools that can be used to compare both objects:

- There are two basic cases: when an identity equality holds between the objects, and when a known resemblance relation is defined in the class.
- The third case uses the generalized resemblance degree to compare two fuzzy sets of objects, using recursivity in order to compare the elements of the sets.
- The fourth case uses the variable $V$ to determine the existence of a cycle. If this is the case, a recursive call is made that only focuses on the superficial attributes.
- The last case is a general recursivity model in which, according to the parameter $t$, the aggregation operator $V_Q$ is applied over the recursive calls using pairs of attribute values. If $t$ is equal to 1, only the surface is explored. Otherwise, all attributes are studied in depth.
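The case selection of Eqs. (20a) to (20e) can be sketched in Python as shown below. This is a hedged illustration, not the authors' implementation: the `ClassDef` and `Obj` types and all their field names are hypothetical, fuzzy sets are modelled as dictionaries from objects to membership degrees, the minimum is assumed as the combination operator $\otimes$, and the Gödel implication is assumed in place of the implication operator that the paper takes from its Eq. (6). The sketch also assumes that every class without a resemblance relation has at least one superficial attribute.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, Optional, Set, Tuple


@dataclass(eq=False)
class ClassDef:
    """Hypothetical class descriptor; every field name is an assumption of this sketch."""
    name: str
    weights: Dict[str, float] = field(default_factory=dict)                   # p_a for a in Str_C
    domains: Dict[str, "ClassDef"] = field(default_factory=dict, repr=False)  # C_a
    superficial: Set[str] = field(default_factory=set)                        # SStr_C
    relation: Optional[Callable[[object, object], float]] = None              # mu_S, if defined
    o_q: float = 0.2                                                          # orness of Q


@dataclass(eq=False)  # identity-based equality and hashing, in the spirit of object identifiers
class Obj:
    values: Dict[str, object]


Visited = FrozenSet[Tuple[int, int]]


def implication(a: float, b: float) -> float:
    """Goedel implication, assumed here as a stand-in for the paper's Eq. (6)."""
    return 1.0 if a <= b else b


def inclusion(cls: ClassDef, o: dict, o_prime: dict, visited: Visited) -> float:
    """Theta(o | o_prime): min over the support of o_prime of the max over the support of o."""
    return min(
        (max((min(implication(mx, my), sr(cls, x, y, visited, 0))
              for y, my in o.items() if my > 0), default=0.0)
         for x, mx in o_prime.items() if mx > 0),
        default=1.0,  # convention for an empty support (assumption)
    )


def sr(cls: ClassDef, o1, o2, visited: Visited = frozenset(), t: int = 0) -> float:
    if o1 is o2:                                          # (20a) identity equality
        return 1.0
    if cls.relation is not None:                          # (20b) resemblance relation in C
        return cls.relation(o1, o2)
    if isinstance(o1, dict) and isinstance(o2, dict):     # (20c) fuzzy sets of objects
        return min(inclusion(cls, o2, o1, visited), inclusion(cls, o1, o2, visited))
    pair = (id(o1), id(o2))
    if pair in visited and t != 1:                        # (20d) cycle: fall back to the surface
        return sr(cls, o1, o2, visited, 1)
    attrs = cls.superficial if t == 1 else set(cls.weights)   # the set A
    seen = visited | {pair}
    parts = [(cls.weights[a], sr(cls.domains[a], o1.values[a], o2.values[a], seen, 0))
             for a in attrs]
    return (cls.o_q * max(min(p, s) for p, s in parts)            # (20e)
            + (1 - cls.o_q) * min(max(s, 1 - p) for p, s in parts))


# Hypothetical schema with a cycle: Person.friend refers back to Person.
name_cls = ClassDef("Name", relation=lambda a, b: 1.0 if a == b else 0.0)
person = ClassDef("Person", weights={"name": 1.0, "friend": 0.5}, superficial={"name"})
person.domains = {"name": name_cls, "friend": person}
ann, bob = Obj({"name": "Ann"}), Obj({"name": "Bob"})
ann.values["friend"], bob.values["friend"] = bob, ann   # cycle in the data
print(sr(person, ann, bob))
```

The `visited` argument plays the role of $V$: it is an immutable set, so each recursive branch sees only the pairs on its own path, and the call with $t = 1$ explores only attributes that cannot lead back to the class, which is what lets the computation end.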
Because the properties of the operators used in each of the basic cases are those of a resemblance relation, the operator SR is also a resemblance measure amongst objects of a given class.

5.5. Organization in an object oriented model

In the object-oriented data model, this approach to the calculus of object resemblance can be represented by implementing a resemblance method that satisfies the needs of every defined class, taking into account the structure of these classes. The designer will have to consider the importance of every attribute, point out the superficial attributes, and determine the policy that must be applied when comparing null values.¹ In addition, the designer will have to consider the way in which fuzzy sets of imprecise objects must be compared, in case this kind of complex object may appear in the class definition. A formal parameter of the comparison method will perform the task of a stack, representing the history of the comparison computation (i.e. the parameter $V$ of the function SR).

If the programming language that describes the classes allows the use of meta-data (i.e. data about data), the effort of implementing the comparison function for every class may be avoided. For example, in Java [19], the Reflection API can be used to write reusable code that implements the comparison method independently of the class.

¹ When two null values are compared, a degree equal to 1 would be an acceptable optimistic consideration, unless the null values had different nature; in this case a degree equal to 0 would be more appropriate.
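The superficial attributes of Definition 7 are pointed out by the designer, but they could also be derived from the schema when it is available as meta-data. The following Python sketch illustrates one way to do so, assuming a hypothetical representation of the schema as a dictionary that maps each class name to its attributes and their domain classes; the attribute names `b` and `n` and the class `Name` are invented for the example. An attribute is treated as superficial when the class cannot be reached again from the attribute's domain class.

```python
from typing import Dict, Set

# Hypothetical schema format: class name -> {attribute name: domain class name}.
Schema = Dict[str, Dict[str, str]]


def reachable(schema: Schema, start: str) -> Set[str]:
    """Classes reachable from `start` (including `start`) by following attributes."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        # Basic classes that are not listed in the schema have no attributes.
        stack.extend(schema.get(current, {}).values())
    return seen


def superficial_attributes(schema: Schema, cls: str) -> Set[str]:
    """SStr_C: attributes of `cls` whose domain class cannot lead back to `cls`."""
    return {attr for attr, domain in schema[cls].items()
            if cls not in reachable(schema, domain)}


# A cycle A -> B -> C -> A in the style of Fig. 8, plus one basic attribute.
schema = {
    "A": {"a": "B", "n": "Name"},
    "B": {"b": "C"},
    "C": {"c": "A"},
}
print(superficial_attributes(schema, "A"))
```

In this example only the attribute `n` of class `A` is superficial, since following `a` eventually returns to `A`.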

6. An example

Now let us compare the two rooms introduced at the beginning of the paper. We will assume that the following two classes are defined: a class Room, described by the set of attributes {quality, extension, floor, students}, and a class Student, described by the attributes {name, age, height}. To compare objects of class Room, the importance weights (Section 5.2) of the attributes are $p_{floor} = p_{students} = 1$, $p_{extension} = 0.8$, and $p_{quality} = 0.5$. To compare objects of class Student, we have $p_{name} = 1$ and $p_{age} = p_{height} = 0.75$.

The quality of a given room can be expressed using values of the labels domain described in Table 1. The extension will be described by the disjunctive imprecise domain shown in Table 2 and Fig. 3. Although this table describes the labels that belong to the domain, numerical values can be used as well.² The floor is an imprecise attribute whose semantics is also disjunctive, described in Table 3. Every room will have a fuzzy set of students, whose membership degree will be determined by the percentage of the day time they spend in the room attending their lessons. The student age is expressed by the disjunctive imprecise domain shown in Table 4, as well as by the use of numerical values. Height is expressed using the set of values shown in Table 5, as well as numerical values. As was the case for the attribute extension of the class Room, we again omit the semantics of the labels.

² To simplify, we omit the semantic definition of the labels. When necessary, we will give the membership degree of a given numerical value.

Table 2
Domain for the extension attribute

               Big    Middle-sized   Small
Big            1      0.5            0
Middle-sized          1              0.5
Small                                1

Table 3
Domain for floor attribute

               1     2     3     4     Low   Intermediate   High
1              1     0     0     0     1     0              0
2                    1     0     0     0.8   1              0
3                          1     0     0     1              0.7
4                                1     0     0              1
Low                                    1     0.8            0
Intermediate                                 1              0.7
High                                                        1

Table 4
Domain for the age attribute

               Old    Middle-aged    Young
Old            1      0.6            0
Middle-aged           1              0.6
Young                                1

Table 5
Domain for height attribute

               Tall   Medium   Short
Tall           1      0.5      0
Medium                1        0.5
Short                          1

In the database there are two room objects and six student objects (Fig. 10). To compute the resemblance between the two rooms we have to apply Eqs. (20a)–(20e) as follows:

$$SR(\text{Room}, room_1, room_2, \emptyset, 0)$$

This computation involves the following calculus (Eq. (20e)):

- $SR(\text{Quality}, room_1.quality, room_2.quality, \{(room_1, room_2)\}, 0) = \mu_S(\text{high}, \text{regular}) = 0.8$ (from Eq. (20b) and Table 1).
- $SR(\text{Extension}, room_1.extension, room_2.extension, \{(room_1, room_2)\}, 0) = \mu_{big}(30) = 0.9$ (from Eqs. (20b) and (1)).
- $SR(\text{Floor}, room_1.floor, room_2.floor, \{(room_1, room_2)\}, 0) = \mu_S(4, \text{high}) = 1$ (from Eq. (20b) and Table 3).

To determine the resemblance between the fuzzy sets of students of the two rooms we apply the generalized resemblance operator (Eq. (20c)):

$$SR(\text{Student}, room_1.students, room_2.students, \{(room_1, room_2)\}, 0) = SR_{\{room_1, room_2\}}(room_1.students, room_2.students)$$

Taking into account that the attribute name is managed by the classic equality, we can clearly see that only the following pairs of students may have a resemblance far from 0: pairs of the same student ($SR(\text{Student}, stdnt_i, stdnt_i, V, t) = 1$), students 2 and 5, and students 4 and 6.

To compute the resemblance between $stdnt_2$ and $stdnt_5$ we have to apply Eqs. (20a)–(20e) as follows:

$$SR(\text{Student}, stdnt_2, stdnt_5, \{(room_1, room_2)\}, 0)$$

This computation involves the following calculus (Eq. (20e)):

- $SR(\text{Name}, stdnt_2.name, stdnt_5.name, \{(room_1, room_2), (stdnt_2, stdnt_5)\}, 0) = 1$ (from Eqs. (20b) and (1)).
- $SR(\text{Height}, stdnt_2.height, stdnt_5.height, \{(room_1, room_2), (stdnt_2, stdnt_5)\}, 0) = \mu_{med}(1.7) = 1$ (from Eqs. (20b) and (1)).
- $SR(\text{Age}, stdnt_2.age, stdnt_5.age, \{(room_1, room_2), (stdnt_2, stdnt_5)\}, 0) = \mu_{young}(25) = 0.8$ (from Eqs. (20b) and (1)).

Taking into account Eq. (20e) with $a_i \in \{name, age, height\}$ and applying the aggregation with $o_Q = 0.2$, we have:

$$\begin{aligned} SR(\text{Student}, stdnt_2, stdnt_5, \{(room_1, room_2)\}, 0) &= o_Q \max_i \big(p_{a_i} \wedge SR(C_{a_i}, stdnt_2.a_i, stdnt_5.a_i, \{(room_1, room_2), (stdnt_2, stdnt_5)\}, 0)\big) \\ &\quad + (1 - o_Q) \min_i \big(SR(C_{a_i}, stdnt_2.a_i, stdnt_5.a_i, \{(room_1, room_2), (stdnt_2, stdnt_5)\}, 0) \vee (1 - p_{a_i})\big) \\ &= (0.2) \times (1) + (1 - 0.2) \times (0.8) = 0.84 \end{aligned}$$

To compute the resemblance between $stdnt_4$ and $stdnt_6$ we have to apply Eqs. (20a)–(20e) as follows:

$$SR(\text{Student}, stdnt_4, stdnt_6, \{(room_1, room_2)\}, 0)$$

This computation involves the following calculus (Eq. (20e)):

- $SR(\text{Name}, stdnt_4.name, stdnt_6.name, \{(room_1, room_2), (stdnt_4, stdnt_6)\}, 0) = 1$ (from Eqs. (20b) and (1)).
- $SR(\text{Height}, stdnt_4.height, stdnt_6.height, \{(room_1, room_2), (stdnt_4, stdnt_6)\}, 0) = \mu_{high}(1.9) = 1$ (from Eqs. (20b) and (1)).
- $SR(\text{Age}, stdnt_4.age, stdnt_6.age, \{(room_1, room_2), (stdnt_4, stdnt_6)\}, 0) = \mu_{young}(24) = 0.9$ (from Eqs. (20b) and (1)).

Taking into account Eq. (20e) with $a_i \in \{name, age, height\}$ and applying the aggregation with $o_Q = 0.2$, we have:

$$\begin{aligned} SR(\text{Student}, stdnt_4, stdnt_6, \{(room_1, room_2)\}, 0) &= o_Q \max_i \big(p_{a_i} \wedge SR(C_{a_i}, stdnt_4.a_i, stdnt_6.a_i, \{(room_1, room_2), (stdnt_4, stdnt_6)\}, 0)\big) \\ &\quad + (1 - o_Q) \min_i \big(SR(C_{a_i}, stdnt_4.a_i, stdnt_6.a_i, \{(room_1, room_2), (stdnt_4, stdnt_6)\}, 0) \vee (1 - p_{a_i})\big) \\ &= (0.2) \times (1) + (1 - 0.2) \times (0.9) = 0.92 \end{aligned}$$

Taking these latter calculations into account, we can compute the resemblance between both sets of students using Eq. (7) (we will use the implication operator from Eq. (6)):

$$\begin{aligned} SR_{\{room_1, room_2\}}(room_1.students, room_2.students) &= \otimes\big(\Theta_{SR_{\{room_1, room_2\}}}(room_2.students \mid room_1.students),\ \Theta_{SR_{\{room_1, room_2\}}}(room_1.students \mid room_2.students)\big) \\ &= \otimes\big(\min\{1, 0.84, 0.94, 0.92\},\ \min\{1, 0.84, 1, 0.77\}\big) = 0.77 \end{aligned}$$

If we weigh this resemblance using the cardinality ratio (Definition 3), then we have that:

$$SR(\text{Student}, room_1.students, room_2.students, \{(room_1, room_2)\}, 0) = F(room_1.students, room_2.students) \times 0.77 = 0.98 \times 0.77 = 0.76$$

Finally, taking into account Eq. (20e) with $a_i \in \{quality, extension, floor, students\}$ and applying the aggregation with $o_Q = 0.2$, we have:

$$\begin{aligned} SR(\text{Room}, room_1, room_2, \emptyset, 0) &= o_Q \max_i \big(p_{a_i} \wedge SR(C_{a_i}, room_1.a_i, room_2.a_i, \{(room_1, room_2)\}, 0)\big) \\ &\quad + (1 - o_Q) \min_i \big(SR(C_{a_i}, room_1.a_i, room_2.a_i, \{(room_1, room_2)\}, 0) \vee (1 - p_{a_i})\big) \\ &= (0.2) \times (1) + (1 - 0.2) \times (0.76) = 0.81 \end{aligned}$$

Fig. 10. Rooms example's data.
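As a check on the final aggregation step for the two rooms, the maximum and minimum terms of Eq. (20e) can be expanded attribute by attribute, in the order quality, extension, floor, students, using the importance weights and partial resemblance degrees given above. The intermediate values below are recomputed for illustration and are not part of the original text:

```latex
\begin{aligned}
\max_i \big(p_{a_i} \wedge SR_{a_i}\big)
  &= \max\{0.5 \wedge 0.8,\ 0.8 \wedge 0.9,\ 1 \wedge 1,\ 1 \wedge 0.76\}
   = \max\{0.5,\ 0.8,\ 1,\ 0.76\} = 1 \\
\min_i \big(SR_{a_i} \vee (1 - p_{a_i})\big)
  &= \min\{0.8 \vee 0.5,\ 0.9 \vee 0.2,\ 1 \vee 0,\ 0.76 \vee 0\}
   = \min\{0.8,\ 0.9,\ 1,\ 0.76\} = 0.76 \\
SR(\text{Room}, room_1, room_2, \emptyset, 0)
  &= 0.2 \times 1 + 0.8 \times 0.76 = 0.808 \approx 0.81
\end{aligned}
```

The same expansion for students 2 and 5 gives a maximum term of 1 and a minimum term of 0.8 (the age resemblance), hence $0.2 \times 1 + 0.8 \times 0.8 = 0.84$.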

7. Conclusion and further work

We have generalized the notion of state equality in order to compare fuzzy objects. As a result we can compute a resemblance between two objects of a given class. We have proposed:

1. A policy to handle resemblance between basic objects.
2. A set of operators for computing the resemblance between fuzzy sets of fuzzy objects.
3. A recursive way of comparing two complex objects of a given class, handling cycles using two definitions of resemblance.

To accomplish this we have used the concepts of similarity and resemblance from the theory of fuzzy subsets. In particular:

- Resemblance relations are the basis that allows us to manage the different label domains that can be defined in order to manage imprecision.
- The resemblance driven inclusion degree (Definition 1) allows us to build different operators to compute resemblance relations between fuzzy sets of imprecise objects. The generalized resemblance degree (Definition 2), the cardinality ratio (Definition 3) and the consistency degree (Definition 4) are founded on this important concept.
- The resemblance calculus between two objects of the same class is based on the use of both fuzzy resemblance relations and the latter operators. This calculus is a valuable tool in classes where, due to their ability to store fuzzy information, the classic equality cannot be applied.

These tools are the foundation on which information retrieval must be built in the object-oriented database.

Future and ongoing work includes the following:

The value equality generalization we have performed here allows resemblance comparisons. However, the retrieval of the database information may require the use of order relationships (partial or total) defined among the elements of the imprecise domains.
Although the literature contains proposals that allow the management of both imprecision and uncertainty when dealing with fuzzy information [20], these proposals are insufficient in some cases. The set of operators proposed in this paper is focused on imprecision management. We now need to extend them to deal with uncertainty in data.

The operators studied here are resemblance measures whatever option the designer of the class chooses to configure them. As future work we will also study different configurations, pointing out which of them are similarity measures (studying transitivity properties).

We are currently incorporating this comparison capability into our FOODBI (Fuzzy Object Oriented Data Base Interface) prototype.

Acknowledgements

Supported in part by the Spanish R&D project TIC99-0558.

References

[1] B. Stroustrup, What is object-oriented programming?, IEEE Software (1988).
[2] M. Berler, J. Eastman, D. Jordan, C. Russell, O. Schadow, T. Stanienda, F. Velez, The Object Data Standard: ODMG 3.0, Morgan Kaufmann, Los Altos, CA, 2000.
[3] J. Lee, J.-Y. Kuo, N.-L. Xue, A note on current approaches to extend fuzzy logic to object oriented modeling, International Journal of Intelligent Systems 16 (2001) 807–820.
[4] M. Koyuncu, A. Yazici, A fuzzy database and knowledge base environment for intelligent retrieval, Proceedings of the IFSA/NAFIPS World Congress (2001).
[5] N. Marín, I.J. Blanco, O. Pons, M.A. Vila, Softening the object-oriented database-model: imprecision, uncertainty, and fuzzy types, Proceedings of the IFSA/NAFIPS World Congress (2001).
[6] L.A. Zadeh, The concept of linguistic variable and its application to approximate reasoning I, Information Sciences 8 (1975) 199–251.
[7] L.A. Zadeh, The concept of linguistic variable and its application to approximate reasoning II, Information Sciences 8 (1975) 301–357.
[8] L.A. Zadeh, The concept of linguistic variable and its application to approximate reasoning III, Information Sciences 9 (1975) 43–80.
[9] E.H. Ruspini, Imprecision and uncertainty in the entity-relationship model, in: H. Prade, C.V. Negoita (Eds.), Fuzzy Logic and Knowledge Engineering, Verlag TÜV Rheinland, 1986, pp. 18–28.
[10] M.A. Vila, J.C. Cubero, J.M. Medina, O. Pons, A conceptual approach for dealing with imprecision and uncertainty in object-based data models, International Journal of Intelligent Systems 11 (1996) 791–806.
[11] R. Zwick, E. Carlstein, D.V. Budescu, Measures of similarity among fuzzy concepts: a comparative analysis, International Journal of Approximate Reasoning 1 (1987) 221–242.
[12] J.-P. Rossazza, D. Dubois, H. Prade, A hierarchical model of fuzzy classes, in: Fuzzy and Uncertain Object-Oriented Databases. Concepts and Models, Advances in Fuzzy Systems Applications and Theory, vol. 13, 1998, pp. 21–61.
[13] R.R. Yager, Fuzzy quotient operators, Proceedings of the IPMU (1992) 317–322.
[14] L.A. Zadeh, A computational approach to fuzzy quantifiers in natural languages, Computers and Mathematics 9 (1983) 149–184.
[15] R.R. Yager, On ordered weighted averaging aggregation operator in multi-criteria decision making, IEEE Transactions on Systems, Man, and Cybernetics (18) (1988) 183–190.
[16] R.R. Yager, Families of OWA operators, Fuzzy Sets and Systems (59) (1993) 125–148.