Granular Computing: Models and Applications


Jianchao Han (1) and Tsau Young Lin (2)
(1) Department of Computer Science, California State University, Dominguez Hills, Carson, CA
(2) Department of Computer Science, San Jose State University, CA

1. INTRODUCTION

Granular computing (GrC) is a general computing paradigm that deals with elements and granules, where granules are, loosely speaking, generalized subsets. The objective of granular computing research is to build an efficient computational model for handling huge amounts of data, information, and knowledge. The term granular computing was first proposed by Professor T. Y. Lin 1 as a label for a family of theories, methodologies, and techniques that make use of granules, although its basic ideas and principles have been studied in various application domains for a long time. In the form of partitions especially, the theory has been accumulating in mathematics for thousands of years. The focus of GrC is therefore on the nonpartition models.

Let us first recall some results and thoughts from the pre-GrC era, that is, before the term was coined. The explicit study of granular computing can be dated back to the late 1970s. In 1979, Zadeh 2 introduced the notion of information granulation and suggested that fuzzy set theory might find potential applications in this respect. Although we address the nonpartition theories, the partition case was the main source of inspiration. In 1982, Pawlak 3 proposed rough set theory to deal with inexact information. It is an uncertainty theory that uses a special form of granules, called equivalence classes. It is primarily rough set theory (partition theory) that led researchers to realize the importance of a systematic study of the generalized notion, GrC.

In 1985, Hobbs 4 presented a theory of granularity as the basis of knowledge representation, abstraction, heuristic search, and reasoning. In his theory the problem world is represented as various grains, and only the interesting ones are abstracted to learn concepts. The world can be conceptualized at different granularities, and one can switch between them. Even though his discussion mainly focused on the partition cases, his model is more general than rough sets: it includes reflexive and symmetric binary relations.

Author to whom all correspondence should be addressed: jhan@csudh.edu. E-mail: tylin@cs.sjsu.edu.
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, VOL. 25 (2010). © 2009 Wiley Periodicals, Inc. Published online in Wiley InterScience.

From the perspective of approximate retrieval, Lin 5 introduced the notion of neighborhood systems as models of uncertainty; a neighborhood is a unit of uncertainty. The mathematics originates from topology, where such systems are called topological neighborhood systems. A topological neighborhood system attaches to every point p a collection of subsets that satisfy a set of axioms (the axioms of topology). Each such subset is called a neighborhood of p, and the point p is called a center of the neighborhood. Lin removed the axioms and extended the theory. This extended notion is quite general, for example:

1. The family of α-cuts of a fuzzy set is a neighborhood system of a real set. A real set consists of those points whose memberships are exactly one.
2. A partition (equivalence relation) is a topological neighborhood system and forms a very special case of topological spaces, called a clopen space (or Pawlak topological space).
3. A binary relation is a special form of neighborhood system, in which each point has at most one neighborhood. In 1989, Lin 7 used the neighborhood as a unit of basic knowledge (a list of foes) and applied it to computer security; see also Ref. 8.
4. A covering {X_i | i = 1, 2, ...} is an open neighborhood system. 9 Each X_i is a neighborhood of every point in it. In other words, every point in the neighborhood is a center point.

In 1992, Giunchiglia and Walsh 10 presented a theory of abstraction to improve the conceptualization of granularities. In 1996, Lin, building on Zadeh's granular mathematics (GrM), proposed the term granular computing and formed a special interest group. In Refs. 11 and 12, Zadeh outlined his views of GrC/GrM.
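The partition, binary relation, and covering cases listed above can be made concrete in a few lines of code. The following Python sketch is only an illustration under our own assumptions: the five-point universe, the function names, and the representation of a neighborhood system as a dictionary from points to lists of neighborhoods are all hypothetical and do not come from the cited papers.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List, Tuple

Point = Hashable
# Hypothetical representation: each point maps to its list of neighborhoods.
NeighborhoodSystem = Dict[Point, List[FrozenSet[Point]]]


def from_partition(blocks: Iterable[Iterable[Point]]) -> NeighborhoodSystem:
    """Each point gets exactly one neighborhood: its equivalence class."""
    system: NeighborhoodSystem = {}
    for block in blocks:
        granule = frozenset(block)
        for p in granule:
            system[p] = [granule]
    return system


def from_binary_relation(
    universe: Iterable[Point], pairs: Iterable[Tuple[Point, Point]]
) -> NeighborhoodSystem:
    """Each point p gets at most one neighborhood: {q | (p, q) is in the relation}."""
    pairs = list(pairs)
    system: NeighborhoodSystem = {}
    for p in universe:
        related = frozenset(q for (a, q) in pairs if a == p)
        system[p] = [related] if related else []
    return system


def from_covering(
    universe: Iterable[Point], cover: Iterable[Iterable[Point]]
) -> NeighborhoodSystem:
    """Every set of the covering is a neighborhood of each point it contains."""
    system: NeighborhoodSystem = {p: [] for p in universe}
    for subset in cover:
        granule = frozenset(subset)
        for p in granule:
            system[p].append(granule)
    return system


# Toy universe, chosen only for illustration.
U = [1, 2, 3, 4, 5]
partition_ns = from_partition([[1, 2], [3], [4, 5]])
relation_ns = from_binary_relation(U, [(1, 2), (1, 3), (2, 2)])
covering_ns = from_covering(U, [[1, 2, 3], [3, 4], [4, 5]])
```

In this toy data, point 3 has one neighborhood under the partition, none under the binary relation, and two under the covering, which shows how the nonpartition models relax the one-class-per-point structure of rough sets.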
Since then, granular computing has received increasing attention. Much research has been conducted on various aspects of the area, and GrC has begun to play important roles in fields such as approximate retrieval, machine learning, data mining, bioinformatics, e-business, computer security, control, high-performance computing, and wireless mobile computing. 5,7,13–23 In this special issue, we collect six articles that reflect the developments and applications of some special models/views of granular computing, including the lattice model, the rough set model, the association analysis model, and the classification model.

2. MODELS AND APPLICATIONS

Granular computing has been a shifting paradigm; various views have been proposed. Informally, computing theories and models that deal with granules may be called granular computing, or a softer version of granular computing, where granules are generalized subsets regarded as given basic knowledge. In the 1980s, although many AI and database/knowledge researchers, including the Japanese Fifth Generation computing project, had proposed many complex knowledge models, a simpler and effectively computable view of knowledge was needed. Rough set theory took a courageous step: it assumes that partitions (classifications) are the essence of human knowledge and focuses on the development of knowledge engineering and uncertainty management at this level. These developments have been rewarded with great success. Intuitively, elements are the data, and granules are the (units of) basic knowledge or lack of knowledge (uncertainty). Granular computing therefore provides the infrastructure for data and knowledge engineering and uncertainty management, or more generally for AI engineering.

Recently, Lin proposed a set of examples, including Zadeh's intuitive view of GrC, to define GrC implicitly, and a category-theory-based GrC model to define GrC formally. 24,25 In his model, granules take two forms, commutative and noncommutative. For example, the family of ordered keyword sets (a text is a linearly ordered sequence of words), a collection of committees in social networks (each member has a distinct role), and a collection of tuples (in relations) are collections of noncommutative granules. Neighborhoods are commutative granules.

Hierarchy theory 26 and the lattice model have been applied to construct and formalize concept hierarchies. In an article of this issue, "The Design and Application of Structured Types in Ptolemy II," Zhao, Xiong, Lee, Liu, and Zhong organize all base data types, such as integer, double, Boolean, and char, into a lattice to model the subtyping relations among them. Type constraints within components and across components are described over the lattice as inequalities that can be solved efficiently. Structured types, including arrays, records, and unions, can be added to the base type lattice. The authors expose and discuss some technical challenges of adding structured types, especially the infiniteness of the lattice, recursive structured types, and the inequality constraints on structured types. They propose solutions to these challenges and apply them to design and model a simplified charity organization and wireless protocols based on IEEE media access control and physical specifications.
According to Agrawal et al., 27 mining association rules from a large database of customer transactions, each consisting of the items purchased by a customer, means finding significant associations between items, that is, which items are always or often bought together. These significant associations are described as a set of association rules and are measured quantitatively by support and confidence. Various algorithms for mining associations have been developed. The problem of mining association rules can be reduced to GrC. 5,15,16

Qiu, Chen, Liu, and Huang present "A Granular Computing Approach to Finding Association Rules in Relational Database" in this special issue. Elementary granules are defined in two aspects, intension and extension: the intension is an attribute-value pair, and the extension is the collection of objects (records in a database table) that satisfy the intension. All elementary granules are generated by scanning a relational database table and are stored in an elementary granule table in memory. The elementary granule table holds each elementary granule as a 3-tuple consisting of the number of objects, the intension, and a pointer to the linked list of objects (the extension). Frequent 1-itemsets are selected from the elementary granule table to form a frequent 1-itemset granule table. By keeping attributes in order, frequent 2-itemsets are generated from frequent 1-itemsets and stored in frequent 2-itemset granule tables, frequent 3-itemsets are generated from frequent 2-itemsets and stored in frequent 3-itemset granule tables, and so on. In general, frequent k-itemsets are generated by combining two frequent (k − 1)-itemsets from different nodes of the same linked list, provided the combination satisfies the support threshold. The corresponding algorithms are described and illustrated with a simplified example and a test data set. The authors claim that the algorithms reduce the number of candidate itemsets and save computing time.
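The level-wise construction described above can be sketched in Python. This is a minimal illustration under our own assumptions, not the authors' implementation: it replaces their granule tables and linked lists with dictionaries and frozensets of record indices, uses an absolute count as the support threshold, and the toy table, function names, and `min_count` parameter are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Intension = Tuple[str, str]        # one attribute-value pair
Itemset = Tuple[Intension, ...]    # pairs kept sorted by attribute name
GranuleTable = Dict[Itemset, FrozenSet[int]]  # itemset -> extension (row ids)


def elementary_granules(table: List[Dict[str, str]]) -> GranuleTable:
    """Scan the table once and map each attribute-value pair to its extension."""
    extensions: Dict[Itemset, set] = {}
    for row_id, row in enumerate(table):
        for attr, value in row.items():
            extensions.setdefault(((attr, value),), set()).add(row_id)
    return {key: frozenset(rows) for key, rows in extensions.items()}


def frequent_itemsets(table: List[Dict[str, str]], min_count: int) -> GranuleTable:
    """Build frequent k-itemsets by intersecting extensions of (k-1)-itemsets."""
    level = {k: e for k, e in elementary_granules(table).items() if len(e) >= min_count}
    result = dict(level)
    while level:
        next_level: GranuleTable = {}
        for a, b in combinations(sorted(level), 2):
            # Join only itemsets sharing a prefix and ending in different attributes.
            if a[:-1] != b[:-1] or a[-1][0] == b[-1][0]:
                continue
            extension = level[a] & level[b]
            if len(extension) >= min_count:
                next_level[a + (b[-1],)] = extension
        result.update(next_level)
        level = next_level
    return result


def confidence(itemsets: GranuleTable, antecedent: Itemset, whole: Itemset) -> float:
    """Confidence of the rule antecedent -> (whole minus antecedent)."""
    return len(itemsets[whole]) / len(itemsets[antecedent])


# Toy relational table, invented for illustration.
table = [
    {"color": "red", "size": "big", "shape": "round"},
    {"color": "red", "size": "big", "shape": "square"},
    {"color": "red", "size": "small", "shape": "round"},
    {"color": "blue", "size": "big", "shape": "round"},
]
found = frequent_itemsets(table, min_count=2)
red = (("color", "red"),)
red_big = (("color", "red"), ("size", "big"))
rule_confidence = confidence(found, red, red_big)
```

Because every granule carries its extension, the support of a combined itemset is obtained by set intersection rather than by rescanning the database, which is the intuition behind the claimed savings.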

In a sense, Qiu et al. take advantage of the rough set computing model: they define the elementary granules as the partitions based on attribute-value pairs, which are special cases of the indiscernibility (equivalence) relations used extensively in rough set theory.

In another article, "A Method of Discovering Important Rules Using Rules as Attributes," Li and Cercone apply rough set theory to find significant and important association rules. First, they use an existing reduct generation algorithm based on rough set theory to find attribute reducts of the original data set, and then they generate a set of association rules for each reduct using the classic Apriori algorithm. In the resulting association rules, the antecedents of a rule come from the values of the condition attributes in a reduct, and the consequents come from the values of the decision attributes in the original data set. Since reducts contain the most representative and important condition attributes of a decision table, the authors assume that rules extracted from these reducts are representative of the original decision table and are therefore more important than rules generated without using reducts. Following this intuition, the rules generated from the reducts are used to construct a new decision table in which each individual rule is a condition attribute and the decision attributes are kept the same. A reduct extracted from such a decision table contains its representative and important attributes, which are association rules. The reduct generation algorithm is, in turn, applied to this newly constructed decision table, and the result is a reduct consisting of the set of rules that are most important.
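To make the notion of a reduct concrete, the sketch below enumerates reducts of a tiny decision table by brute force. It is an assumption-laden illustration rather than the reduct generation algorithm used by Li and Cercone: it assumes a consistent decision table, treats a reduct as a minimal set of condition attributes under which rows that agree on those attributes also agree on the decision, and uses an invented four-row table and hypothetical function names.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def is_consistent(rows: List[Row], attrs: Sequence[str], decision: str) -> bool:
    """True if rows that agree on `attrs` always agree on `decision`."""
    seen: Dict[Tuple[str, ...], str] = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, row[decision]) != row[decision]:
            return False
    return True


def reducts(rows: List[Row], conditions: Sequence[str], decision: str) -> List[Tuple[str, ...]]:
    """All minimal attribute subsets that keep the table consistent (brute force)."""
    found: List[Tuple[str, ...]] = []
    for size in range(1, len(conditions) + 1):
        for subset in combinations(conditions, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if is_consistent(rows, subset, decision):
                found.append(subset)
    return found


# Invented decision table: three condition attributes and one decision attribute.
rows = [
    {"headache": "yes", "temp": "high", "cough": "yes", "flu": "yes"},
    {"headache": "yes", "temp": "normal", "cough": "no", "flu": "no"},
    {"headache": "no", "temp": "high", "cough": "no", "flu": "yes"},
    {"headache": "no", "temp": "normal", "cough": "yes", "flu": "no"},
]
table_reducts = reducts(rows, ["headache", "temp", "cough"], "flu")
```

The brute-force search is exponential in the number of attributes and is meant only to show what a reduct is. In the rules-as-attributes setting, the same kind of routine would be run on a table whose condition columns record, for each row, whether a given rule applies.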
In their article, "Mining Hidden Connections among Biomedical Concepts from Disjoint Biomedical Literature Sets through Semantic-Based Association Rule," Hu, Zhang, Li, Yoo, Zhou, Xu, and Wu adapt the association rule mining approach to automatically identify implicit novel connections among biomedical concepts from disjoint biomedical literature sets. The adapted rules are called semantic-based association rules. The underlying idea is as follows. Assume that three biomedical concepts A, B, and C occur in the biomedical literature, where A and C occur in two disjoint sets of documents, but B co-occurs with A and with C in some documents. If A is associated with B and B is associated with C, we have reason to expect that A and C may have some association or correlation. To find such associations, the authors first apply an association rule mining algorithm to the two disjoint sets of biomedical literature separately, generating rules of the form A → B from the first set of documents and B → C from the second, and then apply the transitive law to conclude the novel implication A → C. Because association rules are not actually logical implications, this transitive operation may be too weak to guarantee an association between A and C in the sense of association rules under the support-confidence framework. Moreover, this simple application of transitivity generates a huge number of possible connections among the millions of biomedical concepts, and many of these hypothetical connections are spurious, useless, and/or biologically meaningless.

To get around this problem, Hu et al. develop a new approach, called the biomedical semantic-based association rule system, or Bio-SARS for short, to generate highly likely novel and biologically relevant connections among biomedical concepts. With semantic filtering, Bio-SARS can significantly reduce the number of spurious, useless, and biologically irrelevant connections, where the semantics is based on biomedical ontologies such as MeSH and the Unified Medical Language System (UMLS). Given a concept C, the algorithm searches online biomedical databases for a set of biomedical documents in which C occurs. The algorithm then extracts a set of concepts B from the retrieved documents and uses the semantic knowledge in UMLS to check the semantic type of each concept in B against that of C. All concepts in B whose semantic types differ from that of C are filtered out, and the remaining concepts in B that have strong enough associations with C are selected. These are used as given concepts to find, in the same way, another set of concepts A such that every concept in A co-occurs with concepts in B in some documents but never co-occurs with C in any document. After the concepts in A that have weak associations with concepts in B are removed, the remaining concepts in A are verified to have strong associations with the concept C. The authors' experiments demonstrate that the discovered associations among biomedical concepts are novel and can help domain experts conduct new experiments, try new treatments, and so forth.

Some authors regard classification and clustering analysis as the same model of granular computing, but views differ. Clustering in data mining partitions a data set by similarity. In other words, it transforms a granulation (granulated by similarity) into a partition; see Ref. 9 for the notion of induced partition. The classification problem is, given a training set of examples with class labels, to construct a classifier that can assign a class label to a new, unlabeled example that is not in the training set. All examples are described by a given set of features (attributes). Classification is one form of partitioning.
In both models, the granules are groups of the given set of examples, and the granulation partitions the set of examples into groups.

In this special issue, the article "Ranking and Selecting Terms for Text Categorization via SVM Discriminate Boundary" is presented by Kuo and Yajima. They use the discriminant boundary between classes found by support vector machines (SVMs) to rank and select terms for text categorization in document classification. To classify documents into predefined categories, documents are converted into vectors composed of the words that occur in them, and document classification is then performed on these document vectors. However, since documents usually contain many redundant words, it is necessary to select the most significant, discriminant, and representative words to represent the corresponding documents. The currently dominant approaches, based on LSI (latent semantic indexing) 9 and χ² statistics, are commonly used in document classification and retrieval research but have some weaknesses. The authors exploit several properties of the SVM with RBF (radial basis function) kernel functions to calculate the data points that lie on the nonlinear discriminant boundary, and they show that these data points, as well as their gradient vectors, can be calculated efficiently using only elementary matrix and vector calculations. Gradient vectors on the boundary are defined, and the contribution of each word to the document is given by the diagonal of the associated decision boundary feature matrix (DBFM) generated from the collection of these gradient vectors. The discriminant and significant words can then be ranked and selected according to their contributions. The authors also demonstrate, with experiments on real-world data sets, that the proposed method performs much better than existing approaches.

Finally, we come to the paper "A Novel Clustering Algorithm Using Hypergraph Based Granular Computing." The authors, Liu, Liao, Yang, and He, apply granular computing and hypergraphs 9 to clustering a set of documents. Conceptually, this article presents a very interesting view on turning a granulated space into a partition. A hypergraph (or simplicial complex) is itself a granular model; the vertices are the underlying universe, and the hyperedges (simplexes) are the granules. Basically, the approach extracts the frequent item sets with an association rule mining algorithm, and these frequent item sets form the hyperedges (simplexes) of a hypergraph (simplicial complex). A multilevel hypergraph partitioning algorithm is then used to partition the hypergraph into k parts. Two criteria, fitness and connectivity, are defined to prune bad clusters and bad vertices. Experiments with this method have been conducted on various data sets, and the results are compared across data sets in terms of cluster entropy and algorithm response time.

References

1. Lin TY. Granular computing. Announcement of the BISC Special Interest Group on Granular Computing.
2. Zadeh LA. Fuzzy sets and information granularity. In: Gupta M, Ragade R, Yager R, editors. Advances in fuzzy set theory and applications. Amsterdam: North-Holland.
3. Pawlak Z. Rough sets. Int J Comput Inf Sci 1982;11.
4. Hobbs JR. Granularity. In: Proc of the 9th Int Joint Conf on Artificial Intelligence.
5. Lin TY. Neighborhood systems and approximation in database and knowledge base systems. In: Proc of the Fourth Int Symp on Methodologies of Intelligent Systems, Poster Session, October 12–15.
6. Lin TY. Topological and fuzzy rough sets. In: Slowinski R, editor. Decision support by experience: application of the rough sets theory. Boston, MA: Kluwer.
7. Lin TY. Chinese wall security policy: an aggressive model. In: Proc of the Fifth Aerospace Computer Security Application Conf, December 4–8.
8. Lin TY. Chinese wall security policy models: information flows and confining trojan horses. In: Proc 17th IFIP11.3 Working Conf on Database and Application Security, Estes Park, CO, August 4–6, 2003.
9. Lin TY. Granular computing on binary relations I: data mining and neighborhood systems and II: rough set representations and belief functions. In: Skowron A, Polkowski L, editors. Rough sets in knowledge discovery. Heidelberg, Germany: Physica-Verlag.
10. Giunchiglia F, Walsh T. A theory of abstraction. Artif Intell 1992;56.
11. Zadeh LA. Towards a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets Syst 1997;19.
12. Zadeh LA. Some reflections on soft computing, granular computing and their roles in the conception, design and utilization of information/intelligent systems. Soft Comput 1998;2.
13. Bargiela A, Pedrycz W. Granular computing: an introduction. Boston, MA: Kluwer.
14. Lin TY. A rough logic formalism for fuzzy controllers: a hard and soft computing view. Int J Approx Reason 1996;15(4).
15. Lin TY. Data mining: granular computing approach. In: Proc of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Beijing, China, April 26–28, 1999.
16. Lin TY. Data mining and machine oriented modeling: a granular computing approach. J Appl Intell 2000;13(2).
17. Lin TY, Chiang I-J. A simplicial complex, a hypergraph, structure in the latent semantic space of document clustering. Int J Approx Reason 2005;40(1–2).
18. Pawlak Z. Granularity of knowledge, indiscernibility and rough sets. In: Proc IEEE Int Conf on Fuzzy Systems.
19. Peters JF, Pawlak Z, Skowron A. A rough set approach to measuring information granules. In: Proc COMPSAC 2002, Oxford, England.
20. Polkowski L, Skowron A. Towards adaptive calculus of granules. In: Proc IEEE Int Conf on Fuzzy Systems.
21. Skowron A, Stepaniuk J. Information granules: towards foundations of granular computing. Int J Intell Syst 2001;16.
22. Yao YY. Perspectives of granular computing. In: Proc of IEEE Int Conf on Granular Computing 1, Beijing, China; 2005.
23. Zhang L, Zhang B. The quotient space theory of problem solving. LNCS 2003;2639.
24. Lin TY. Granular computing, practices, theories and future directions. In: Meyers RA, editor. Encyclopedia on complexity and systems. Berlin: Springer.
25. Lin TY. Granular computing I: the concept of granulation and its formal model. Int J Granular Comput Rough Sets Intell Syst 2009;1(1).
26. Ahl V, Allen TFH. Hierarchy theory, a vision, vocabulary and epistemology. Irvington, NY: Columbia University Press.
27. Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases. In: Proc ACM SIGMOD Conf, Washington DC.
28. Agrawal R, Srikant R. Fast algorithms for mining association rules. In: Proc of the 20th Int Conf Very Large Data Bases, Santiago, Chile.
29. Houtsma M, Swami A. Set-oriented mining for association rules in relational databases. In: Proc of the IEEE Int Conf on Data Engineering.
30. Klemettinen M, Mannila H, Ronkainen P, Toivonen H, Verkamo A. Finding interesting rules from large sets of discovered association rules. In: 3rd Int Conf on Information and Knowledge Management.
31. Savasere A, Omiecinski E, Navathe S. An efficient algorithm for mining association rules. In: Proc of 21st VLDB Conf, Switzerland; 1995.
