A Calculus for Fuzzy Queries on Fuzzy Entity-Relationship Model

Dr. Narasimha Bolloju
Department of Information Systems
City University of Hong Kong
Tat Chee Avenue, Kowloon, Hong Kong
Ph.: (852) 2788-7545 Fax: (852) 2788-8694
Email: isnarsi@is.cityu.edu.hk

Abstract

Most query languages are designed to retrieve information from databases containing precise and certain data using precisely specified commands. Application of fuzzy set theory to relational data models has been studied extensively in recent years. This paper presents a calculus for fuzzy queries on a fuzzy entity-relationship model. The paper first defines a fuzzy entity-relationship model capable of representing imprecision and uncertainty in entities, attributes, and relationships. Then, it describes a calculus for fuzzy queries along with operational semantics. Some of the key aspects of this calculus are the provision of multiple terms, aggregate functions, and various forms of quantification.
1. INTRODUCTION

It is assumed, in many query situations, that the databases contain precise and certain data. The queries on such data are also meant to be expressed precisely and with certainty. In recent years there has been a significant departure from these assumptions. The application of fuzzy set theory to relational data models and the associated query systems is one major shift in addressing the vagueness in the data and the query specification. The research in this direction includes extensions to SQL to facilitate vague queries on relational databases (Bosc et al., 1988; 1994), functional dependencies in fuzzy relational data models (Raju and Majumdar, 1988), fuzzy extensions to relational calculus and relational algebra (Lee et al., 1993; Lee and Kim, 1993a, 1993b; Takahashi, 1993), and a logic-based approach to fuzzy relational databases that deals with various forms of fuzziness, together with a domain-calculus-based fuzzy query language (Villa et al., 1994).

Gogolla and Hohenstein (1991), while commenting on the appropriateness and the expressiveness of Chen's Entity-Relationship model (Chen, 1976), state "... it (ER model) captures most of the important phenomenon of the real world and expresses them in a natural and easily understandable way." However, the representation of imprecise and uncertain data has not been considered among the numerous extensions and enhancements to this model. An extension in this direction would make the ER model capable of representing one of the important phenomena of the real world. Bolloju (1990) discusses the possible situations where the need exists to represent imprecision and uncertainty in entities, attributes, relationships and integrity constraints. This paper presents an extension related to this important but not yet considered aspect of representing and manipulating vagueness in the ER model.
We extend the ER model using fuzzy set theory and possibility theory (Zadeh 1978, 1983; Prade, 1985a, 1985b) and various developments in ER calculus for query specification (Gogolla and Hohenstein, 1991; Hohenstein and Engels, 1991; Parent et al., 1990). In the next section, we describe the fuzzy entity-relationship model and illustrate the model with a simple example. In Section 3, we present the calculus for fuzzy queries on the fuzzy ER model. Section 4 concludes by identifying further work.

2. FUZZY ER MODEL

The fuzzy ER model is an enhanced ER model with extensions to represent imprecision and uncertainty in the entities, attributes and relationships using fuzzy sets and necessity-possibility measures. A fuzzy Entity-Relationship model can be defined as comprising:

a) a set of entity types $E_1, E_2, \ldots, E_m$,
b) each entity type $E_i$ having a set of attributes $a_{i1}, a_{i2}, \ldots, a_{in}$,
c) each attribute $a_{ij}$ defined on an extended domain $B_k \cup \{F_1, F_2, \ldots, F_f\}$, where $B_k$ is the base domain and the $F_i$ are either fuzzy subsets defined on $B_k$ or fuzzy subset expressions (union, intersection, modifiers such as very, rather),
d) a necessity-possibility measure (discussed below) associated with each entity of $E_i$ to represent the certainty of that entity belonging to $E_i$,
e) a set of relationship types $R_1, R_2, \ldots, R_r$,
f) each relationship type $R_i$ defined on two or more entity types (not necessarily distinct), and
g) a necessity-possibility measure associated with each relationship of $R_i$ to represent the certainty of that instance belonging to $R_i$.

The representation of uncertainty using necessity-possibility measures is based on possibility theory and fuzzy logic (Zadeh, 1985; Prade, 1985a, 1985b; Prade and Testemale, 1987). Under this approach a pair of measures, [n,p], each on the interval [0,1], is associated with entities and relationships to indicate the certainty measure. The measure p represents the possibility that an entity or a relationship belongs to a given type, and the measure n represents the impossibility of the opposite, i.e., of the entity or relationship not belonging to the given type.
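The definition above can be sketched as a small set of data structures. This is only an illustrative sketch, not part of the model's definition: the class names `NP`, `FuzzySubset`, `Entity` and `Relationship`, the trapezoidal shape of the membership function, and the sample `PERSON` and `KNOWS` data are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union


@dataclass(frozen=True)
class NP:
    """Necessity-possibility pair [n, p], each component on [0, 1]."""
    n: float
    p: float

    def __post_init__(self) -> None:
        if not (0.0 <= self.n <= 1.0 and 0.0 <= self.p <= 1.0):
            raise ValueError("n and p must each lie in [0, 1]")


@dataclass(frozen=True)
class FuzzySubset:
    """A named fuzzy subset F_i of a base domain B_k."""
    label: str
    mu: Callable[[float], float]  # membership function on the base domain


def trapezoid(a: float, b: float, c: float, d: float) -> Callable[[float], float]:
    """Membership that rises on [a, b], is 1 on [b, c], and falls on [c, d]."""
    def mu(x: float) -> float:
        if b <= x <= c:
            return 1.0
        if x <= a or x >= d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu


# An attribute value is either a crisp base-domain value or a fuzzy subset.
Value = Union[int, float, str, FuzzySubset]


@dataclass
class Entity:
    entity_type: str
    attributes: Dict[str, Value]
    certainty: NP  # certainty of the entity belonging to its type


@dataclass
class Relationship:
    relationship_type: str
    participants: Tuple[Entity, ...]
    certainty: NP  # certainty of the instance belonging to its type


# Hypothetical sample data; the shape parameters are arbitrary.
TALL = FuzzySubset("TALL", trapezoid(170, 185, 200, 200))
person = Entity("PERSON", {"id": 1, "height": TALL}, NP(0.8, 1.0))
other = Entity("PERSON", {"id": 2, "height": 172}, NP(1.0, 1.0))
knows = Relationship("KNOWS", (person, other), NP(0.9, 1.0))
```

Keeping the certainty pair on the entity or relationship instance, rather than on the type, mirrors items d) and g) of the definition.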
An example of a fuzzy ER model for a simplified crime-criminal database can include the following (note the correspondence to the above definition):

a) entity types CRIME, CRIMINAL, SUSPECT
b) attributes of CRIME: type, location, description, date, time, weaponused
   attributes of CRIMINAL: id, name, address, height, weight, complexion
c) domain HTS for the attribute height: 50 to 200 cm, with fuzzy subsets such as TALL, MEDIUM, SHORT, TALL or MEDIUM, etc.
d) entity <1245, goonda, rowdy street, TALL, 60, dark>, [0.8,1] of SUSPECT entity type, indicating the certainty of the suspect as [0.8,1]
e) relationship types COMMITTED, ACCOMPLICE
f) relationship type COMMITTED on entity types CRIMINAL, CRIME
   relationship type ACCOMPLICE on entity types CRIMINAL, CRIMINAL
g) relationship <1245, 1345>, [0.9,1] of ACCOMPLICE relationship type.

3. CALCULUS FOR FUZZY QUERIES

The major elements considered in the calculus for fuzzy queries are multiple fuzzy terms, aggregate functions on fuzzy terms, and quantifications including the fuzzy ones. We define a fuzzy query, using the notation -[ ... ]- for bags in calculus (Gogolla and Hohenstein 1991), as:

-[ T | X # C ]-

where T is a list of terms $t_1, t_2, \ldots, t_n$, X is a range expression of the form

$E_1(x_1) \wedge E_2(x_2) \wedge \ldots \wedge E_m(x_m) \wedge \psi(x_1, x_2, \ldots, x_m)$

where $\psi(x_1, x_2, \ldots, x_m)$ is a well-formed formula (see below) in which all free variables $x_i$ in T are existentially quantified, and C is a threshold necessity-possibility measure $[n_t, p_t]$. The terms $t_1, t_2, \ldots, t_n$ define the target information, the expression $E_1(x_1) \wedge E_2(x_2) \wedge \ldots \wedge E_m(x_m)$ defines the respective finite ranges of the variables $x_1, x_2, \ldots, x_m$, and $\psi(x_1, x_2, \ldots, x_m)$ is a qualifying formula. Evaluation of the range expression X yields, in addition to the binding of variables to entities, a truth value expressed as a necessity-possibility measure [n,p]; a binding is retained when $n \ge n_t$ and $p \ge p_t$.

A term t can be:

c
$x.a_i$
$f(t_1, t_2, \ldots, t_n)$
$g(t, X)$
$h(x, X)$

where c is a constant (fuzzy or crisp), $x.a_i$ is the value (fuzzy or crisp) of attribute $a_i$ of the entity identified by variable x, f is a function (fuzzy or crisp) with terms $t_1, t_2, \ldots, t_n$ as the arguments, and g and h are aggregate functions on t and x over the range expression X.

An atomic formula can be:

$E_i(x)$
$R_i(x_1, x_2, \ldots, x_m)$
$t_1 \,\theta\, t_2$

where $E_i$ is an entity type, $R_i$ is a relationship type, the $x_i$ are the variables in relationship $R_i$, $\theta$ is a relational (comparison) operator, fuzzy or crisp, and $t_1$ and $t_2$ are the operands of $\theta$.
Evaluation of atomic formulae results in a truth value expressed in terms of a necessity-possibility measure. For the first two of the above forms, it is the certainty measure associated with the respective entities and relationships. The result of comparison of two terms can be defined by adapting the definition of pattern matching (Prade 1985a, 1985b). Given fuzzy subsets of pattern P and datum D defined on domain B, the possibility and necessity of $P \,\theta\, D$ can be defined as:

$$p(P \,\theta\, D) = \sup_{a, b \in B,\; a \,\theta\, b} \min(\mu_P(a), \mu_D(b))$$

$$n(P \,\theta\, D) = 1 - p(\neg(P \,\theta\, D))$$

where $\mu_P$ and $\mu_D$ are the fuzzy membership functions of fuzzy subsets P and D respectively, and $\neg(P \,\theta\, D)$ denotes the opposite of the comparison $P \,\theta\, D$.

Now, we can define the well-formed formula (wff) as:

a) every atomic formula is a wff, and
b) if $\psi_1$ and $\psi_2$ are wffs then the following are also wffs:

$\psi_1 \wedge \psi_2$
$\psi_1 \vee \psi_2$
$\neg\psi_1$
$(\psi_1)$
$\forall x\; X : \psi_1$
$\exists x\; X : \psi_1$
$Q\; x\; X : \psi_1$
$Q_m\, Q\; x\; X : \psi_1$

where $\wedge$, $\vee$, $\neg$ are conjunction, disjunction and negation operators respectively, $\forall$, $\exists$ are universal and existential quantifiers, $Q$ stands for a fuzzy quantifier such as most, few, etc., and $Q_m$ stands for a modifier to the fuzzy quantifier Q such as very, rather, etc.

The truth values of $\psi_1$ and $\psi_2$ are combined using t-norms and co-t-norms (e.g., min for $\wedge$, max for $\vee$) to arrive at the combined necessity-possibility measure. The above can also be applied to the universal and existential quantifications. For fuzzy quantification, the Σ-count (Zadeh 1983) can be adapted to evaluate the combined truth value.

Some examples of fuzzy queries based on the example fuzzy ER model presented in the previous section are given below. In these examples we employ the notation of upper case symbols for entity names and relationship names, lower case symbols for attributes and variables, and italics for various fuzzy subsets, comparison operators and modifiers.

Example 1: Find the criminals who are more than 175 cm in height.

-[ c.name | CRIMINAL(c) ∧ c.height > 175 # [1,1] ]-

This is a precise query with the variable c defined to range over all the criminal entities.
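Example 1 can be traced through the evaluation rules given above. The sketch below is one possible reading and rests on several assumptions that are not in the paper: the base domain HTS is discretised to whole centimetres, the min t-norm is applied to the n and p components separately, the opposite of a comparison is taken to be the negated comparator, and the three criminals (`A`, `B`, `C`) with their heights and certainties are invented sample data.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Membership = Callable[[float], float]
Comparator = Callable[[float, float], bool]


@dataclass(frozen=True)
class NP:
    """Necessity-possibility pair [n, p]."""
    n: float
    p: float


def conj(x: NP, y: NP) -> NP:
    # Assumption: the min t-norm is applied to each component separately.
    return NP(min(x.n, y.n), min(x.p, y.p))


def meets(x: NP, threshold: NP) -> bool:
    # Threshold test: n >= n_t and p >= p_t.
    return x.n >= threshold.n and x.p >= threshold.p


def singleton(v: float) -> Membership:
    """Membership function of a crisp value."""
    return lambda x: 1.0 if x == v else 0.0


def triangle(lo: float, peak: float, hi: float) -> Membership:
    """Triangular membership function (an illustrative shape only)."""
    def mu(x: float) -> float:
        if x <= lo or x >= hi:
            return 0.0
        if x <= peak:
            return (x - lo) / (peak - lo)
        return (hi - x) / (hi - peak)
    return mu


def possibility(mu_p: Membership, mu_d: Membership,
                domain: Sequence[float], theta: Comparator) -> float:
    # sup over pairs (a, b) with a theta b of min(mu_P(a), mu_D(b)).
    return max((min(mu_p(a), mu_d(b))
                for a in domain for b in domain if theta(a, b)),
               default=0.0)


def compare(mu_p: Membership, mu_d: Membership,
            domain: Sequence[float], theta: Comparator) -> NP:
    p = possibility(mu_p, mu_d, domain, theta)
    # Assumption: the opposite comparison is read as the negated comparator.
    n = 1.0 - possibility(mu_p, mu_d, domain, lambda a, b: not theta(a, b))
    return NP(n, p)


HTS = range(50, 201)  # base domain for height, in whole centimetres

# Hypothetical CRIMINAL entities: (name, certainty, height datum).
CRIMINALS: List[Tuple[str, NP, Membership]] = [
    ("A", NP(1.0, 1.0), singleton(182)),
    ("B", NP(0.8, 1.0), singleton(190)),
    ("C", NP(1.0, 1.0), triangle(170, 176, 182)),  # imprecisely known height
]


def datum_exceeds_pattern(a: float, b: float) -> bool:
    # a ranges over the pattern (175), b over the stored height.
    return b > a


def example_1(threshold: NP) -> List[str]:
    """-[ c.name | CRIMINAL(c) AND c.height > 175 # threshold ]-"""
    pattern = singleton(175)
    names = []
    for name, certainty, height in CRIMINALS:
        truth = conj(certainty,
                     compare(pattern, height, HTS, datum_exceeds_pattern))
        if meets(truth, threshold):
            names.append(name)
    return names


print(example_1(NP(1.0, 1.0)))  # ['A']
print(example_1(NP(0.5, 1.0)))  # ['A', 'B']
```

With the threshold [1,1] only the entity that is certainly a criminal and certainly taller than 175 cm is returned; lowering the threshold admits the entity whose membership in CRIMINAL is less certain.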
Example 2: Find the tall criminals with heavy build.

-[ c.name | CRIMINAL(c) ∧ c.height = TALL ∧ c.weight = HEAVY_BUILD ]-

Each of the selected entities will be associated with an NP measure that indicates the degree to which that entity satisfies the specified condition. It is possible to define a threshold, to select only the entities having a minimum NP measure, as shown below:

-[ c.name | CRIMINAL(c) ∧ c.height = TALL ∧ c.weight = HEAVY_BUILD # [0.5,1] ]-

Example 3: Find very tall criminals and the number of crimes they had committed.

-[ c.name, count(cr, CRIME(cr) ∧ COMMITTED(c,cr)) | CRIMINAL(c) ∧ c.height = very TALL # [1,1] ]-

Example 4: Find the short criminals who have committed at least 3 burglaries.

-[ c.name | CRIMINAL(c) ∧ c.height = SHORT ∧ count(cr, CRIME(cr) ∧ COMMITTED(c,cr) ∧ cr.type = burglary) > 2 ]-

Example 5: Find the criminals who have a past record of using scissors as a weapon.

-[ c.name | CRIMINAL(c) ∧ (∃ cr CRIME(cr) ∧ COMMITTED(c,cr) : cr.weaponused = scissors) ]-

The above query illustrates the use of the existential quantifier.
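The quantified forms of the calculus can be sketched as folds over the truth values obtained for each binding of the quantified variable. This is a hedged illustration under stated assumptions, not the paper's operational semantics: each input pair is taken to be the already combined truth of range and body for one binding, max and min are applied to the n and p components separately, the Σ-count is taken as the mean degree over the bindings, and the membership function `most` is a hypothetical piecewise-linear shape.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class NP:
    """Necessity-possibility pair [n, p]."""
    n: float
    p: float


def exists(truths: Sequence[NP]) -> NP:
    # Existential quantification as a repeated disjunction (max co-t-norm).
    return NP(max((t.n for t in truths), default=0.0),
              max((t.p for t in truths), default=0.0))


def forall(truths: Sequence[NP]) -> NP:
    # Universal quantification as a repeated conjunction (min t-norm).
    return NP(min((t.n for t in truths), default=1.0),
              min((t.p for t in truths), default=1.0))


def most(r: float) -> float:
    # Hypothetical membership function of the fuzzy quantifier "most".
    if r <= 0.5:
        return 0.0
    if r >= 0.8:
        return 1.0
    return (r - 0.5) / 0.3


def quantified(truths: Sequence[NP], mu_q: Callable[[float], float]) -> NP:
    # Assumption: relative sigma-count = mean degree over the bindings,
    # computed separately for the n and p components.
    if not truths:
        return NP(0.0, 0.0)
    k = len(truths)
    return NP(mu_q(sum(t.n for t in truths) / k),
              mu_q(sum(t.p for t in truths) / k))


# Hypothetical truth values of the body for four crimes linked to one criminal.
per_crime = [NP(0.5, 1.0), NP(1.0, 1.0), NP(0.5, 1.0), NP(0.6, 1.0)]

print(exists(per_crime))            # NP(n=1.0, p=1.0)
print(forall(per_crime))            # NP(n=0.5, p=1.0)
print(quantified(per_crime, most))  # n is about 0.5, p is 1.0
```

A modifier such as very could then be modelled as a transformation of `mu_q`, although the choice of transformation is left open here.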
Example 6: Find the criminals who have committed all burglaries in the early hours.

-[ c.name | CRIMINAL(c) ∧ (∀ cr CRIME(cr) ∧ COMMITTED(c,cr) : cr.type = burglary ∧ cr.time = EARLY_HOURS) ]-

Example 7: Find the criminals who have committed most of the burglaries in the early hours during winter.

-[ c.name | CRIMINAL(c) ∧ (most cr CRIME(cr) ∧ COMMITTED(c,cr) : cr.type = burglary ∧ cr.time = EARLY_HOURS ∧ month_of(cr.date) = WINTER) ]-

The fuzzy quantifier most is used in the above query to specify a quite realistic situation.

4. CONCLUSIONS

Extensions to the ER model to represent and manipulate imprecision and uncertainty existing in the real world are described in this paper. A calculus for fuzzy queries on the fuzzy ER model is presented, and its operational semantics are defined. This calculus includes the use of multiple terms, aggregate functions, and quantification. The examples presented in the above section illustrate the expressive power of the queries based on this calculus. Some of the possible directions for further work are the representation of imprecision and uncertainty in the integrity constraints, proof of completeness both with respect to and similar to relational calculus, and efficient implementation of query languages based on the above calculus for fuzzy queries.
5. REFERENCES

Bolloju, N. (1990) Modelling of Imprecise and Uncertain Information, in: Prakash, N. (Ed.) Current Trends in Management of Data, Tata-McGraw Hill: New Delhi.

Bosc, P., Pivert, O. and Farquhar, K. (1994) Integrating Fuzzy Queries into an Existing Database Management System: An Example, International Journal of Intelligent Systems, 9, 475-492.

Chen, P.P. (1976) The Entity-Relationship Model - Towards a Unified View of Data, ACM Transactions on Database Systems, 1, 9-36.

Gogolla, M. and Hohenstein, U. (1991) Towards a Semantic View of an Extended Entity-Relationship Model, ACM Transactions on Database Systems, 16, 369-416.

Hohenstein, U. and Engels, G. (1991) Formal Semantics of An Entity-Relationship-Based Query Language, in: Kangassalo, H. (Ed.) Entity-Relationship Approach: The Core of Conceptual Modelling, Elsevier Science Publishers: North-Holland, 177-194.

Lee, D., Kim, M.H., Lee-Kwang, H., and Lee, Y-H. (1993) A Fuzzyfication of the Relational Data Model, in: Moon, S. and Ikeda, H. (Eds) Database Systems for Advanced Applications '93, World Scientific: Singapore, 360-367.

Lee, D.H., and Kim, M.H. (1993a) Accommodating Subjective Vagueness Through a Fuzzy Extension to the Relational Data Model, Information Systems, 18, 6, 363-374.

Lee, D.H., and Kim, M.H. (1993b) Extending Semantics of Relational Operators for Vague Queries, Microprocessing and Micro-programming, 39, 165-168.

Parent, C., Rolin, H., Yetongnon, K., and Spaccapietra, S. (1990) An ER Calculus for the Entity-Relationship Complex Model, in: Lochovsky, F.H. (Ed.) Entity-Relationship Approach to Database Design and Querying, Elsevier Science Publishers: North-Holland, 361-384.

Prade, H. (1985a) A Quantitative Approach to Approximate Reasoning in Rule-based Expert Systems, in: Bolc, L. and Coombs, M.J. (Eds), Expert System Applications, Springer-Verlag.

Prade, H. (1985b) A Computational Approach to Approximate Reasoning and Plausible Reasoning with Applications to Expert Systems, IEEE Transactions on PAMI, PAMI-7, 3.

Prade, H. and Testemale, C. (1987) Application of Possibility and Necessity Measures to Documentary Information Retrieval, in: Bouchon, B. and Yager, R.R. (Eds) Uncertainty in Knowledge-Based Systems, Springer-Verlag.

Raju, K.V.S.V.N. and Majumdar, A.K. (1988) Functional Dependencies and Lossless Join Decomposition of Fuzzy Relational Database System, ACM Transactions on Database Systems, 13, 2, 129-166.

Takahashi, Y. (1993) Fuzzy Database Query Languages and Their Relational Completeness Theorem, IEEE Transactions on Knowledge and Data Engineering, 5, 1, 122-125.

Villa, M. A., Cubero, J. C., Medina, J. M. and Pons, O. (1994) A Logic Approach to Fuzzy Relational Databases, International Journal of Intelligent Systems, 9, 449-460.

Zadeh, L.A. (1978) Fuzzy Sets as a Basis for a Theory of Possibility, Fuzzy Sets and Systems, 1, 3-28.

Zadeh, L.A. (1983) The Role of Fuzzy Logic in the Management of Uncertainty in Expert Systems, Fuzzy Sets and Systems, 11, 199-227.