Mathematical Foundation of Association Rules - Mining Associations by Solving Integral Linear Inequalities


Tsau Young (T. Y.) Lin
Department of Computer Science, San Jose State University
San Jose, CA 95192, USA
tylin@cs.sjsu.edu

Abstract

Associations (association rules not in rule form), as patterns in data, are critically analyzed. We build the theory only on what the data says, and not on other implicit assumptions. Data mining is regarded as a deductive science. First, we observe that isomorphic relations have isomorphic associations. Somewhat surprisingly, this simple observation has far-reaching consequences. It implies that associations are properties of an isomorphic class, not of an individual relation. A similar conclusion holds for the probability theory based on item counting; hence it cannot characterize interesting-ness, since the latter is a property of an individual relation. As a byproduct of this analysis, we find that all generalized associations can be found by simply solving a set of integral linear inequalities; this is a very striking result. Finally, we observe from the structure of the relation lattice that random sampling may lose substantial information about patterns.

Keywords: attribute, feature, data mining, granular computing

1. Introduction

In a superficial view, data mining and knowledge discovery are transformations (extracting or discovering) of data into patterns or knowledge. Schematically:

1. DM: Patterns $\Leftarrow$ Data, or
2. KD: Knowledge $\Leftarrow$ Data

In this paper, we focus on a very narrow but very popular special topic, namely the mining of associations (association rules). Two standard measures, called support and confidence, are used for association rules; patterns that concern support only are called associations. For this problem, the transformation is mathematical deduction. In other words, we treat the data as axioms and associations as interesting theorems to be derived from the data.
2. The Raw Data - Relations in Relational Databases

What are the raw data in association rule mining? Obviously, they are the relational tables in relational databases. Unfortunately, such a naive statement often has unexpected hidden side effects. In traditional database processing, a relational table always lives under a relational schema. A relational schema is a data structure in which each datum carries the full semantics perceived by a human mind. For example, the attribute COLOR in a relational schema means exactly what a human thinks it means; therefore its possible values are yellow, blue, etc. The DBMS then processes these data based on such (human-perceived) semantics. To echo this embedded human view, we call such a relational table a table-in-mind. This is the traditional database view of a relational table; it is, however, not the correct view for data mining.

In data mining the raw data is still the relational table, but it is treated in an intrinsically different way. All data mining algorithms process the relational table without consulting (1) the relational schema or (2) the human view of the data. An attribute value is merely a symbol stored in the computer system. Changing the symbols does not affect the outcomes; more precisely, the algorithms produce the same associations, but with a different set of symbols. In data mining, a relational table is merely a stored table without any additional semantics, so we call it a table-in-system.

2.1 Tables-in-Systems

We formalize the notion of a table-in-system, which is a view in a relational database, but seen from the data mining perspective.

Fact 1. An attribute is merely a name or symbol of a column during the processing of data mining; it has no human-perceived semantics (during data mining).

Fact 2. Fact 1 implies that an attribute value in a table-in-system is merely a symbol; it has no human-perceived semantics (during data mining).

Next we assume that there is no redundancy in the data.

Assumption. All attributes of a table-in-system are non-isomorphic.

To see why this assumption does not lose generality, see [12].

For convenience, we regard a table-in-system as a knowledge representation of some entities. The set of such entities is denoted by $V$ and called the universe. Let $A = \{A^1, A^2, \ldots, A^n\}$ be a set of attributes, and let their (active) attribute domains be $C = \{C^1, C^2, \ldots, C^n\}$, where "active" is a database term emphasizing that $C^j$ is the set of actual values that occur in the representation. Each $C^j$, often denoted by $\mathrm{Dom}(A^j)$, is a Cantor set.

A table-in-system is a map (a single-valued function)

$$K : V \to C^1 \times C^2 \times \cdots \times C^n .$$

One can also regard an attribute as a map $A^j : V \to C^j$. So the relation $K$ is a join of attributes (single-column knowledge representations); see [16, 18]. In traditional database theory, the image of the map $K$ is the relational table (table-in-system), and the independent variable $v \in V$ plays no explicit role. In data mining, however, it is more convenient to have the independent variable in the formulation. So in this paper we use the graph $(v, K(v))$, which is called the information table in the rough set community. Throughout the paper, by abuse of notation, $K$ may mean:

(1) the knowledge representation $K : V \to C^1 \times \cdots \times C^n$;
(2) the information table $(v, K(v))$;
(3) the classical relation, that is, the image of $K$, which is a set of tuples (actually, it should be a bag).
(4) Since $K$ and $A$ determine each other, we may use $(V, A)$ and $K$ interchangeably.
(5) We simply refer to any of them as the table(-in-system).
2.2 Target - High Frequency Patterns

Association rule mining originated from market basket data [1]. However, in many software systems the data mining tools are added to a general DBMS, so we are interested in data mining on general relations. To be definite, we use the following translations: an item is an attribute value, a $q$-itemset is a subtuple of length $q$, and a large $q$-itemset is a high frequency $q$-pattern. In other words, a subtuple of length $q$ is a high frequency $q$-pattern, or simply a $q$-pattern, if its number of occurrences is greater than or equal to a given threshold. When $q$ is understood, we drop the $q$.

3. Interesting-ness of Associations

3.1 Isomorphic Relations and Patterns

We take this section from [12] almost verbatim. Attributes $A^i$ and $A^j$ are isomorphic iff there is a one-to-one and onto map $s : \mathrm{Dom}(A^i) \to \mathrm{Dom}(A^j)$ such that $A^j(v) = s(A^i(v))$ for all $v \in V$. The map $s$ is called an isomorphism. Intuitively, two attributes (columns) are isomorphic iff one column turns into the other by properly renaming its attribute values.

Let $K = (V, A)$ and $H = (V, B)$ be two information tables, where $A = \{A^1, \ldots, A^n\}$ and $B = \{B^1, \ldots, B^m\}$. Then $K$ and $H$ are said to be isomorphic if every $A^i$ is isomorphic to some $B^j$, and vice versa. By our assumption (all attributes are non-isomorphic), $K$ and $H$ have the same degree (number of attributes), that is, $n = m$; see the more general version in [12]. The following theorem should be obvious.

Theorem. Isomorphic relations have isomorphic patterns.

The impact of this simple theorem is rather far-reaching. It essentially declares that patterns are syntactic in nature. They are patterns of the whole isomorphic class, even though many of the isomorphic relations may have very different semantics; see Section 3.2.

Theorem. The interesting-ness (of associations) defined by item counting and its probability theory is a property of the isomorphic class.

3.2 Illustration on Interesting-ness

The two relations in Tables 1 and 2 are isomorphic, but their semantics are completely different: one table is about (hardware) parts, the other is about suppliers (sales persons).
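The isomorphism of Tables 1 and 2, and the theorem of Section 3.1 applied to them, can be checked mechanically. The following is a minimal sketch, assuming that both tables list their rows in the same entity order and that the support threshold is 3; the function names `column_isomorphism` and `frequent_patterns` are illustrative and do not come from the paper.

```python
from collections import Counter
from itertools import combinations

# Rows of Table 1 and Table 2 in the same entity order; values are plain symbols.
TABLE1 = [
    ("TWENTY", "MAR", "NY"), ("TEN", "MAR", "SJ"), ("TEN", "FEB", "NY"),
    ("TEN", "FEB", "LA"), ("TWENTY", "MAR", "SJ"), ("TWENTY", "MAR", "SJ"),
    ("TWENTY", "APR", "SJ"), ("THIRTY", "JAN", "LA"), ("THIRTY", "JAN", "LA"),
]
TABLE2 = [
    ("20", "SCREW", "STEEL"), ("10", "SCREW", "BRASS"), ("10", "NAIL", "STEEL"),
    ("10", "NAIL", "ALLOY"), ("20", "SCREW", "BRASS"), ("20", "SCREW", "BRASS"),
    ("20", "PIN", "BRASS"), ("30", "HAMMER", "ALLOY"), ("30", "HAMMER", "ALLOY"),
]
THRESHOLD = 3  # assumed support threshold


def column_isomorphism(col_a, col_b):
    """Return the renaming dict if col_a becomes col_b under a one-to-one
    renaming of its values, otherwise None."""
    forward, backward = {}, {}
    for a, b in zip(col_a, col_b):
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return None
    return forward


def frequent_patterns(rows, threshold):
    """Count every subtuple, stored as ((column, value), ...), and keep
    those whose count meets the threshold."""
    counts = Counter()
    n_cols = len(rows[0])
    for row in rows:
        for q in range(1, n_cols + 1):
            for cols in combinations(range(n_cols), q):
                counts[tuple((c, row[c]) for c in cols)] += 1
    return {p: n for p, n in counts.items() if n >= threshold}


# One renaming per column pair (column j of Table 1 against column j of Table 2).
renamings = [column_isomorphism(a, b) for a, b in zip(zip(*TABLE1), zip(*TABLE2))]

patterns1 = frequent_patterns(TABLE1, THRESHOLD)
patterns2 = frequent_patterns(TABLE2, THRESHOLD)

# Rename each pattern of Table 1 column by column and compare with Table 2.
renamed = {
    tuple((c, renamings[c][v]) for c, v in pattern): count
    for pattern, count in patterns1.items()
}
print(renamed == patterns2)
```

Under these assumptions the renamed patterns of Table 1 coincide with the patterns of Table 2, counts included, which is the syntactic behaviour the theorem describes.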
Table 1. A table $K$

| $V$ | Business Amount (in m.) | Birth Day | CITY |
|---|---|---|---|
| $v_1$ | TWENTY | MAR | NY |
| $v_2$ | TEN | MAR | SJ |
| $v_3$ | TEN | FEB | NY |
| $v_4$ | TEN | FEB | LA |
| $v_5$ | TWENTY | MAR | SJ |
| $v_6$ | TWENTY | MAR | SJ |
| $v_7$ | TWENTY | APR | SJ |
| $v_8$ | THIRTY | JAN | LA |
| $v_9$ | THIRTY | JAN | LA |

Table 2. A table $K'$

| $V$ | Weight | Part Name | Material |
|---|---|---|---|
| $v_1$ | 20 | SCREW | STEEL |
| $v_2$ | 10 | SCREW | BRASS |
| $v_3$ | 10 | NAIL | STEEL |
| $v_4$ | 10 | NAIL | ALLOY |
| $v_5$ | 20 | SCREW | BRASS |
| $v_6$ | 20 | SCREW | BRASS |
| $v_7$ | 20 | PIN | BRASS |
| $v_8$ | 30 | HAMMER | ALLOY |
| $v_9$ | 30 | HAMMER | ALLOY |

We have assumed support $\geq 3$. These two relations have isomorphic associations:

1. Length one: TEN, TWENTY, MAR, SJ, LA in Table 1, and 10, 20, SCREW, BRASS, ALLOY in Table 2.
2. Length two: (TWENTY, MAR), (MAR, SJ), (TWENTY, SJ) in Table 1, and (20, SCREW), (SCREW, BRASS), (20, BRASS) in Table 2.

However, they have non-isomorphic interesting rules:

1. In Table 1, (TWENTY, SJ) is an interesting rule; it means that the business amount at San Jose is likely to be 20 million.
1'. However, it is isomorphic to (20, BRASS), which is not interesting at all, because 20 refers to PIN, not BRASS.
2. In Table 2, (SCREW, BRASS) is interesting; it means that the screw is most likely made from BRASS.
2'. However, it is isomorphic to (MAR, SJ), which is not interesting, because MAR refers to a supplier, not to a city.

4. Canonical Models of Isomorphic Classes

Again, we take this section almost verbatim from [12]. In this section, we construct the canonical model for each isomorphic class.

We have observed (Section 2) that each attribute $A^j$ is a map, which naturally induces an equivalence relation on $V$: two elements are equivalent if they are mapped to the same element. We denote this equivalence relation by $Q^j$. The set $V/Q^j$, which consists of all elementary granules (equivalence classes), is called the quotient set. The equivalence classes of a given attribute (equivalence relation) are called elementary granules. The intersection of elementary granules is called a granule; it is an equivalence class of the intersection of the equivalence relations. The map $P^j : V \to V/Q^j$, $v \mapsto [v]$, is called the natural projection, where $[v]$ is the granule containing $v$.
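The quotient set and the natural projection can be computed by grouping the entities that share an attribute value. The sketch below is a minimal illustration, assuming the CITY column of Table 1 and hypothetical entity labels 1 to 9 for its nine rows; `quotient_set` and `natural_projection` are names chosen here for illustration.

```python
from collections import defaultdict

# The CITY column of Table 1 as a map from entities to symbols.
# The entity labels 1..9 are hypothetical names for the nine rows.
CITY = {1: "NY", 2: "SJ", 3: "NY", 4: "LA", 5: "SJ",
        6: "SJ", 7: "SJ", 8: "LA", 9: "LA"}


def quotient_set(attribute):
    """Group entities mapped to the same value; each group is an
    elementary granule, keyed here by the value that names it."""
    granules = defaultdict(set)
    for entity, value in attribute.items():
        granules[value].add(entity)
    return {value: frozenset(members) for value, members in granules.items()}


def natural_projection(attribute):
    """Map each entity v to [v], the elementary granule containing it."""
    by_value = quotient_set(attribute)
    return {entity: by_value[value] for entity, value in attribute.items()}


granules = quotient_set(CITY)
projection = natural_projection(CITY)
print(sorted(granules["SJ"]))  # entities whose CITY symbol is SJ
print(sorted(projection[4]))   # the granule [v] for entity 4
```

Only equality of symbols is used, so renaming the city symbols would leave the granules unchanged.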
It should be clear (a standard mathematical fact) that $V/Q^j$ is mapped one-to-one onto $C^j$. We call the latter map the naming map or interpretation, $\mathrm{NAME}^j : V/Q^j \to C^j$; an element of $C^j$ can be considered a meaningful name (to a human) of the granule, while to the system it is merely a symbol.

4.1 Canonical Model and Granular Data Model

A relation $K$, as a map, can be factored through the natural projection $CK : V \to V/Q^1 \times \cdots \times V/Q^n$ and the naming map $\mathrm{NAME} : V/Q^1 \times \cdots \times V/Q^n \to C^1 \times \cdots \times C^n$. Note that $\mathrm{NAME}$ is the product of the $\mathrm{NAME}^j$ and is often referred to as the interpretation. Table 3 illustrates how $K$ is factored.

1. The natural projection $CK$ can be regarded as a knowledge representation of the universe $V$ into quotient sets. It is called the canonical model of $K$.
2. The interpretation induces an isomorphism from $V/Q^1 \times \cdots \times V/Q^n$ to $C^1 \times \cdots \times C^n$ (both are appropriate Cartesian products). The interpretation assigns to a tuple of granules a tuple of elementary concepts (attribute values). Each $A^j$ can be regarded as a meaningful name of $Q^j$, and an attribute value is a meaningful name of a granule (equivalence class).
3. $Q^j$ is an attribute of $CK$, called a canonical attribute (an uninterpreted attribute). $V/Q^j$ is called a canonical domain; a granule is a canonical attribute value [16].

Theorem. Patterns of the canonical model $CK$ are isomorphic (via the interpretation) to the patterns of $K$.

This is a corollary of the theorem in Section 3.1. To find all patterns of $K$, we only need to find the patterns of $CK$ (and vice versa).
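The factoring of a relation into its canonical model and naming map can be sketched in a few lines. This is a minimal illustration, assuming the rows of Table 1 with hypothetical entity labels 1 to 9; the function `canonical_model` and the variable names are assumptions made for this example, not notation from the paper.

```python
# Rows of Table 1, keyed by hypothetical entity labels 1..9.
ROWS = {
    1: ("TWENTY", "MAR", "NY"), 2: ("TEN", "MAR", "SJ"), 3: ("TEN", "FEB", "NY"),
    4: ("TEN", "FEB", "LA"), 5: ("TWENTY", "MAR", "SJ"), 6: ("TWENTY", "MAR", "SJ"),
    7: ("TWENTY", "APR", "SJ"), 8: ("THIRTY", "JAN", "LA"), 9: ("THIRTY", "JAN", "LA"),
}


def canonical_model(rows):
    """Return (ck, name).

    ck maps each entity to a tuple of elementary granules (one per column);
    name[j] maps each granule of column j back to its symbol."""
    n_cols = len(next(iter(rows.values())))
    ck = {}
    name = [dict() for _ in range(n_cols)]
    for v, row in rows.items():
        granule_tuple = []
        for j, value in enumerate(row):
            # The granule of v in column j: all entities sharing v's value.
            granule = frozenset(u for u, r in rows.items() if r[j] == value)
            name[j][granule] = value
            granule_tuple.append(granule)
        ck[v] = tuple(granule_tuple)
    return ck, name


ck, name = canonical_model(ROWS)

# Composing the naming map with the canonical model recovers the relation.
recomposed = {v: tuple(name[j][g] for j, g in enumerate(gs)) for v, gs in ck.items()}
print(recomposed == ROWS)
```

The canonical model `ck` contains no attribute symbols at all; the symbols live entirely in `name`, which is why patterns can be found on the canonical model and interpreted afterwards.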

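| $V$ | Canonical model $CK$ | $\xrightarrow{\mathrm{NAME}}$ | Relation $K$ |
|---|---|---|---|
| $v_1$ | ($\{v_1, v_7\}$, $\{v_1\}$) | | (TWENTY, NY) |
| $v_2$ | ($\{v_2, \ldots, v_6\}$, $\{v_2, \ldots, v_6\}$) | | (TEN, SJ) |
| $v_3$ | ($\{v_2, \ldots, v_6\}$, $\{v_2, \ldots, v_6\}$) | | (TEN, SJ) |
| $v_4$ | ($\{v_2, \ldots, v_6\}$, $\{v_2, \ldots, v_6\}$) | | (TEN, SJ) |
| $v_5$ | ($\{v_2, \ldots, v_6\}$, $\{v_2, \ldots, v_6\}$) | | (TEN, SJ) |
| $v_6$ | ($\{v_2, \ldots, v_6\}$, $\{v_2, \ldots, v_6\}$) | | (TEN, SJ) |
| $v_7$ | ($\{v_1, v_7\}$, $\{v_7, v_8, v_9\}$) | | (TWENTY, LA) |
| $v_8$ | ($\{v_8, v_9\}$, $\{v_7, v_8, v_9\}$) | | (THIRTY, LA) |
| $v_9$ | ($\{v_8, v_9\}$, $\{v_7, v_8, v_9\}$) | | (THIRTY, LA) |

Table 3. The canonical model $CK$ at the left-hand side is mapped to $K$ at the right-hand side.

The canonical model $CK$ is uniquely determined by its universe $V$ and the family $Q$ of equivalence relations. In other words, the pair $(V, Q)$ determines and is determined by $CK$. From the perspective of first order logic, $(V, Q)$ is a model of a rather simple kind of first order logic, where the only predicates are equivalence predicates (predicates that satisfy the reflexive, symmetric and transitive properties) [23].

1. One can regard the canonical model $CK$ as a table format of $(V, Q)$.
2. Granules of $V/Q^j$ are called elementary granules.
3. A $q$-tuple of $CK$ corresponds to an intersection of $q$ elementary granules in $(V, Q)$; the intersection is called a $q$-granule.
4. High frequency patterns of $(V, Q)$ are $q$-granules whose cardinality is greater than or equal to the given threshold.
5. We have assumed that all attributes are distinct; for a more general version, we refer to [8].

Definition. The pair $(V, Q)$ is called the granular data model; it is a special case of granular structure [19].

Corollary. The patterns of $K$, $CK$ and $(V, Q)$ are isomorphic.

5. Universal Model - Capturing all Features

5.1 Derived Attributes (Features)

An attribute is also called a feature, especially in AI; the two terms have been used interchangeably. In the table-in-mind, an attribute is a representation of a property, characteristic, etc.; see e.g. [27, 28]. However, in a table-in-system, an attribute is merely a named equivalence relation on the universe. So the study of attributes (features) is reduced to that of equivalence relations. Let $\Delta(V)$ be the set of all equivalence relations (partitions) on $V$.

Proposition. $Q^j$ is a derived attribute of $Y$ iff $Q^j$ is a coarsening of $Q^{j_1} \cap \cdots \cap Q^{j_k}$, where $Q^j \in A$ and $Y = \{Q^{j_1}, \ldots, Q^{j_k}\} \subseteq A$.

Proposition. There is a map $\theta : 2^A \to \Delta(V)$ that respects the meet, but not the join, operations.

Lee called the image, $\mathrm{Im}\,\theta$, the relation lattice and observed that [7]:

1. The join in $\mathrm{Im}\,\theta$ is different from that of $\Delta(V)$.
2. So $\mathrm{Im}\,\theta$ is a subset, but not a sublattice, of $\Delta(V)$.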
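The two notions used in the propositions above, the meet of partitions and the coarsening test, can be sketched as follows. This is a minimal illustration, assuming the Business Amount and CITY columns of Table 1 with hypothetical entity labels 1 to 9; the helper names `partition_of`, `meet` and `is_coarsening` are introduced here and are not from the paper.

```python
# Two columns of Table 1 as maps from hypothetical entity labels to symbols.
AMOUNT = {1: "TWENTY", 2: "TEN", 3: "TEN", 4: "TEN", 5: "TWENTY",
          6: "TWENTY", 7: "TWENTY", 8: "THIRTY", 9: "THIRTY"}
CITY = {1: "NY", 2: "SJ", 3: "NY", 4: "LA", 5: "SJ",
        6: "SJ", 7: "SJ", 8: "LA", 9: "LA"}


def partition_of(attribute):
    """The partition (set of elementary granules) induced by an attribute."""
    blocks = {}
    for v, value in attribute.items():
        blocks.setdefault(value, set()).add(v)
    return frozenset(frozenset(b) for b in blocks.values())


def meet(p, q):
    """Common refinement of two partitions: all non-empty pairwise
    intersections of their blocks (written as an intersection in the text)."""
    return frozenset(a & b for a in p for b in q if a & b)


def is_coarsening(coarse, fine):
    """True if every block of `fine` lies inside some block of `coarse`."""
    return all(any(f <= c for c in coarse) for f in fine)


q_amount = partition_of(AMOUNT)
q_city = partition_of(CITY)
q_both = meet(q_amount, q_city)

print(len(q_both))                      # number of granules in the meet
print(is_coarsening(q_amount, q_both))  # each attribute coarsens the meet
print(is_coarsening(q_city, q_amount))  # CITY is not derived from AMOUNT alone
```

In this reading, an attribute is derivable from a set of attributes exactly when `is_coarsening` holds against the meet of their partitions.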
Such an embedding is an unnatural one, but Lee focused his efforts on it. We will, instead, take a natural embedding.

Definition. The smallest lattice generated by $\mathrm{Im}\,\theta$ is, by abuse of language, called the (Lin's) relation lattice, denoted by $L(Q)$.

This definition will not cause confusion, since we will not use Lee's notion at all. The difference between $L(Q)$ and $\mathrm{Im}\,\theta$ is that the former contains all the joins of distinct attributes. The pair $(V, L(Q))$ is the granular data model of the (Lin's) relation lattice.

5.2 Lattice and Universal Model

The smallest lattice, denoted by $L^*(Q)$, that consists of all coarsenings of $L(Q)$ is called the complete relation lattice.

Main Theorem. $L^*(Q)$ is the set of all derived attributes of the canonical model.
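Because every derived attribute is a coarsening of the finest partition, the derived attributes can be enumerated by merging blocks in every possible way; for a finest partition with $k$ blocks the count is the Bell number $B_k$. The sketch below is a minimal illustration under an assumed four-block partition on entities 1 to 9 (the blocks in `FINEST` are hypothetical example data); `set_partitions` and `coarsenings` are names introduced here.

```python
def set_partitions(items):
    """Yield every partition of the list `items` as a list of lists."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + smaller


def coarsenings(partition):
    """All partitions obtained by merging blocks of `partition`."""
    blocks = list(partition)
    return [
        frozenset(frozenset().union(*group) for group in grouping)
        for grouping in set_partitions(blocks)
    ]


# Hypothetical finest partition with four blocks on entities 1..9.
FINEST = [frozenset({1}), frozenset({2, 3, 4, 5, 6}),
          frozenset({7}), frozenset({8, 9})]

derived = coarsenings(FINEST)
print(len(derived))  # 15, the Bell number B_4
```

The enumeration is exponential in the number of blocks, so it is practical only when the finest partition is small; it is shown here to make the finiteness argument concrete, not as a mining algorithm.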

Definition. The pair $UK = (V, L^*(Q))$ is the completion of $CK = (V, Q)$ and is called the universal model of $K$.

A relation $K$ can be uniquely factored into $CK$ and $\mathrm{NAME}$, so $K$ can be regarded as a pair $(CK, \mathrm{NAME})$. In addition, there may be a given concept hierarchy [4, 22], that is, an extension of the interpretation $\mathrm{NAME}$ defined on a subset $E$ of $L^*(Q)$. This additional information is called background knowledge. Data mining with this additional background knowledge is called data mining on derived attributes. Basically, it is data mining on an extended table; the extended table has the granular data model $(V, E)$. If $E$ covers the attributes that support the invisible patterns, we can mine the invisible patterns. Since $L^*(Q)$ is finite, in theory we can always find it. The bound is the Bell number [3] of the cardinal number of the smallest partition in $L^*(Q)$.

6. Associations and Generalized Associations

We illustrate the idea by examples. Two standard measures, called support and confidence, are used for mining associations. In this paper we focus on support only; we call sub-tuples associations if they meet the support requirement. This is one form of high frequency patterns. The Corollary in Section 4.1 tells us that associations can be expressed by granules. We illustrate the idea using the canonical model in Table 3 (support $\geq 3$). The associations can be expressed as granules:

1. Associations of length one:
   (a) TEN = $\{v_2, v_3, v_4, v_5, v_6\}$
   (b) SJ = $\{v_2, v_3, v_4, v_5, v_6\}$
   (c) LA = $\{v_7, v_8, v_9\}$
2. Associations of length two:
   (a) (TEN, SJ) = TEN $\cap$ SJ = $\{v_2, v_3, v_4, v_5, v_6\}$
3. No associations of length $q \geq 3$.

Now let us examine the universal model in Table 4. The column $Q^1 \cap Q^2$ in Table 4 is the smallest element in the complete relation lattice $L^*(Q)$. So every element of $L^*(Q)$ is a coarsening of $Q^1 \cap Q^2$. In other words, every granule in $L^*(Q)$ is a union of some granules from the partition $Q^1 \cap Q^2$ (by the expression "a granule in $L^*(Q)$" we mean a granule belonging to one of its partitions). In this example, the granules in $Q^1 \cap Q^2$ are TWENTY $\cap$ NY, TEN $\cap$ SJ, TWENTY $\cap$ LA and THIRTY $\cap$ LA. Let $|X|$ be the cardinality of $X$.
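Since every granule of the complete relation lattice is a union of granules from this finest partition, the unions that reach the support threshold can be enumerated with 0/1 indicator vectors, one coordinate per finest granule. The following is a minimal brute-force sketch, assuming the two-attribute relation of Table 4 with hypothetical entity labels 1 to 9 in row order and a support threshold of 3; `finest_granules` and `qualifying_unions` are illustrative names, and exhaustive enumeration is a stand-in chosen for clarity, not a method stated in the paper.

```python
from itertools import product

# The two-attribute relation of Table 4, one tuple per entity in row order.
ROWS = ([("TWENTY", "NY")] + [("TEN", "SJ")] * 5
        + [("TWENTY", "LA"), ("THIRTY", "LA"), ("THIRTY", "LA")])
THRESHOLD = 3  # assumed support threshold


def finest_granules(rows):
    """Group entity labels 1..n by their full tuple; the groups are the
    granules of the smallest partition (dicts keep first-seen order)."""
    granules = {}
    for entity, row in enumerate(rows, start=1):
        granules.setdefault(row, set()).add(entity)
    return granules


def qualifying_unions(cardinalities, threshold):
    """All 0/1 vectors x with sum(c_i * x_i) >= threshold."""
    return [
        x for x in product((0, 1), repeat=len(cardinalities))
        if sum(c * xi for c, xi in zip(cardinalities, x)) >= threshold
    ]


granules = finest_granules(ROWS)
cards = [len(g) for g in granules.values()]
solutions = qualifying_unions(cards, THRESHOLD)

print(list(granules))  # order of the coordinates x_1..x_4
print(cards)           # cardinalities of the four finest granules
for x in solutions:
    print(x)
```

Under these assumptions the cardinalities are 1, 5, 1 and 2, and the enumeration returns eleven vectors. The vector (0, 1, 1, 1) also satisfies the constraint (5 + 1 + 2 = 8), although the list of internal points given in the text does not include it.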
The following expression represents the cardinality of a granule in $L^*(Q)$, which is a union of some granules from the partition $Q^1 \cap Q^2$; the granule meets the support requirement when

$$|\mathrm{TWENTY} \cap \mathrm{NY}| \cdot x_1 + |\mathrm{TEN} \cap \mathrm{SJ}| \cdot x_2 + |\mathrm{TWENTY} \cap \mathrm{LA}| \cdot x_3 + |\mathrm{THIRTY} \cap \mathrm{LA}| \cdot x_4 \geq 3, \qquad x_i \in \{0, 1\}.$$

By taking the actual values of the cardinalities of the granules, we have

$$1 \cdot x_1 + 5 \cdot x_2 + 1 \cdot x_3 + 2 \cdot x_4 \geq 3.$$

We express the solutions in vector form, $(x_1, x_2, x_3, x_4)$. The solution set is an integral convex set in 4-dimensional space. The boundary solutions are:

1. (0, 1, 0, 0); this solution means that the cardinality of TEN $\cap$ SJ by itself already meets the threshold ($5 \geq 3$).
2. (0, 0, 1, 1); it means that we need the union of two granules, TWENTY $\cap$ LA and THIRTY $\cap$ LA, to meet the threshold. In other words, we need a generalized concept that covers both the sub-tuples (TWENTY, LA) = TWENTY $\cap$ LA and (THIRTY, LA) = THIRTY $\cap$ LA. In this particular case, since LA = (TWENTY, LA) $\cup$ (THIRTY, LA), LA is the desired generalized concept.
3. (1, 0, 0, 1); we need the union of two granules, (TWENTY $\cap$ NY) $\cup$ (THIRTY $\cap$ LA), as a single generalized concept.

Internal points are:

4. (1, 1, 0, 0); we skip the interpretations.
5. (0, 1, 1, 0)
6. (0, 1, 0, 1)
7. (1, 1, 1, 0)
8. (1, 1, 0, 1)
9. (1, 1, 1, 1)
10. (1, 0, 1, 1)

Expressed in granular form, the boundary patterns are:

1. TEN $\cap$ SJ = TEN = SJ
2. (TWENTY $\cap$ LA) $\cup$ (THIRTY $\cap$ LA)
3. (TWENTY $\cap$ NY) $\cup$ (THIRTY $\cap$ LA)

The internal patterns are:

4. (TWENTY $\cap$ NY) $\cup$ (TEN $\cap$ SJ)
5. (TEN $\cap$ SJ) $\cup$ (TWENTY $\cap$ LA)
6. (TEN $\cap$ SJ) $\cup$ (THIRTY $\cap$ LA)
7. (TWENTY $\cap$ NY) $\cup$ (TEN $\cap$ SJ) $\cup$ (TWENTY $\cap$ LA)
8. (TWENTY $\cap$ NY) $\cup$ (TEN $\cap$ SJ) $\cup$ (THIRTY $\cap$ LA)
9. (TWENTY $\cap$ NY) $\cup$ (TEN $\cap$ SJ) $\cup$ (TWENTY $\cap$ LA) $\cup$ (THIRTY $\cap$ LA)
10. (TWENTY $\cap$ NY) $\cup$ (TWENTY $\cap$ LA) $\cup$ (THIRTY $\cap$ LA)

Universal model; attribute values are names of granules

| $V$ | $Q^1$ | $Q^2$ | $Q^1 \cap Q^2$ |
|---|---|---|---|
| $v_1$ | $\{v_1, v_7\}$ = TWENTY | $\{v_1\}$ = NY | $\{v_1\}$ = TWENTY $\cap$ NY |
| $v_2$ | $\{v_2, \ldots, v_6\}$ = TEN | $\{v_2, \ldots, v_6\}$ = SJ | $\{v_2, \ldots, v_6\}$ = TEN $\cap$ SJ |
| $v_3$ | $\{v_2, \ldots, v_6\}$ = TEN | $\{v_2, \ldots, v_6\}$ = SJ | $\{v_2, \ldots, v_6\}$ = TEN $\cap$ SJ |
| $v_4$ | $\{v_2, \ldots, v_6\}$ = TEN | $\{v_2, \ldots, v_6\}$ = SJ | $\{v_2, \ldots, v_6\}$ = TEN $\cap$ SJ |
| $v_5$ | $\{v_2, \ldots, v_6\}$ = TEN | $\{v_2, \ldots, v_6\}$ = SJ | $\{v_2, \ldots, v_6\}$ = TEN $\cap$ SJ |
| $v_6$ | $\{v_2, \ldots, v_6\}$ = TEN | $\{v_2, \ldots, v_6\}$ = SJ | $\{v_2, \ldots, v_6\}$ = TEN $\cap$ SJ |
| $v_7$ | $\{v_1, v_7\}$ = TWENTY | $\{v_7, v_8, v_9\}$ = LA | $\{v_7\}$ = TWENTY $\cap$ LA |
| $v_8$ | $\{v_8, v_9\}$ = THIRTY | $\{v_7, v_8, v_9\}$ = LA | $\{v_8, v_9\}$ = THIRTY $\cap$ LA |
| $v_9$ | $\{v_8, v_9\}$ = THIRTY | $\{v_7, v_8, v_9\}$ = LA | $\{v_8, v_9\}$ = THIRTY $\cap$ LA |

Table 4. The universal model of $K$, partially displayed; it should have 15 (= $B_4$) columns.

To identify associations among the generalized associations, we rewrite each expression in disjunctive normal form. If the rewritten expression is a single clause, it is a (non-generalized) association. We have the following associations:

1. TEN = SJ = TEN $\cap$ SJ
2. LA (= (TWENTY $\cap$ LA) $\cup$ (THIRTY $\cap$ LA))

The experimental results will be reported soon.

7. Sampling

Let $V_s$ be a sample (subset) of $V$. Let $L(Q_s)$ be the relation lattice of the partitions on $V_s$. Each partition on $V$ induces a partition on $V_s$; we use $Q_s$ to denote the derived partitions on $V_s$. So there is a natural map from $L(Q)$ to $L(Q_s)$. We say $V_s$ is an admissible sample if the map from $L(Q)$ to $L(Q_s)$ induces an isomorphism of lattices. For random sampling, the probability that $V_s$ satisfies this admissibility condition is not high.

8. Conclusions

Data, patterns and transformations are the three key ingredients of data mining. In this paper, we focus on the simplest case. Data is a set of bare data; it carries absolutely no additional assumptions. Patterns are the most obvious ones: something that happens repeatedly. The transformations are the most conservative and reliable ones, namely mathematical deductions. The results of this exploration are surprisingly rich:

1. Associations are properties of the isomorphic class; in other words, isomorphic relations have isomorphic associations.
2. The probability theory based on item counting is a property of the isomorphic class. Hence it cannot be used to characterize interesting-ness, since the latter is a property of an individual relation.
3. All possible attributes (features) can be enumerated.
4. Generalized associations can be found by solving integral linear inequalities.
5. A sample $V_s$ of the universe $V$ is called a (complete) admissible sample if the (complete) relation lattice of $V_s$ is isomorphic to that of $V$. A random sample may not be admissible.

Some items above seem to indicate that relations with some additional semantics need to be explored; some initial results have been reported, and more work is needed [24, 16, 18, 21, 22, 11].

References

[1] R. Agrawal, T. Imielinski, and A. Swami, Mining Association Rules Between Sets of Items in Large Databases, in: Proceedings of the ACM-SIGMOD International Conference on Management of Data, Washington, DC, June 1993.
[2] G. Birkhoff and S. MacLane, A Survey of Modern Algebra, Macmillan, 1977.
[3] Richard A. Brualdi, Introductory Combinatorics, Prentice Hall, 1992.

[4] Y. D. Cai, N. Cercone, and J. Han, Attribute-oriented induction in relational databases, in: Knowledge Discovery in Databases, AAAI/MIT Press, Cambridge, MA.
[5] C. J. Date, An Introduction to Database Systems, 7th ed., Addison-Wesley.
[6] A. Barr and E. A. Feigenbaum, The Handbook of Artificial Intelligence, William Kaufmann, 1981.
[7] T. T. Lee, Algebraic Theory of Relational Databases, The Bell System Technical Journal, Vol. 62, No. 10, December 1983.
[8] T. Y. Lin, Database Mining on Derived Attributes, to appear in the Springer-Verlag Lecture Notes on AI.
[9] T. Y. Lin, Issues in Data Mining, in: Proceedings of the 26th IEEE International Conference on Computer Software and Applications, Oxford, UK, Aug 26-29.
[10] T. Y. Lin, Feature Completion, Communication of IICM (Institute of Information and Computing Machinery, Taiwan), Vol. 5, No. 2, May 2002 (the proceedings of the workshop Toward the Foundation on Data Mining in PAKDD2002, May 6).
[11] R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang, Exploratory mining and pruning optimizations of constrained associations rules, Proceedings of 1998 ACM-SIGMOD Conference on Management of Data, pp. 13-24.
[12] T. Y. Lin, Attribute (Feature) Completion - The Theory of Attributes from Data Mining Prospect, in: Proceedings of the IEEE International Conference on Data Mining, Maebashi, Japan, Dec 9-12, 2002.
[13] T. Y. Lin, The Lattice Structure of Database and Mining Multiple Level Rules, presented at COMPSAC 2001, Chicago, Oct 8-12, 2001; the exact copy appears as Feature Transformations and Structure of Attributes, in: Data Mining and Knowledge Discovery: Theory, Tools, and Technology IV, B. Dasarathy (ed.), Proceedings of SPIE Vol. 4730, Orlando, FL, April 1-5, 2002.
[14] T. Y. Lin and J. Tremba, Attribute Transformations for Data Mining II: Applications to Economic and Stock Market Data, International Journal of Intelligent Systems, to appear.
[15] T. Y. Lin, Association Rules in Semantically Rich Relations: Granular Computing Approach, JSAI International Workshop on Rough Set Theory and Granular Computing, May 20-25. The post-proceedings are in Lecture Notes in AI 2253, Springer-Verlag, 2001.
[16] T. Y. Lin, Data Mining and Machine Oriented Modeling: A Granular Computing Approach, Journal of Applied Intelligence, Kluwer, Vol. 13, No. 2, September/October 2000.
[17] T. Y. Lin, Attribute Transformations on Numerical Databases, Lecture Notes in Artificial Intelligence 1805, Terano, Liu, Chen (eds.), PAKDD2000, Kyoto, Japan, April 18-20, 2000.
[18] T. Y. Lin, Data Mining: Granular Computing Approach, in: Methodologies for Knowledge Discovery and Data Mining, Lecture Notes in Artificial Intelligence 1574, Third Pacific-Asia Conference, Beijing, April 26-28, 1999.
[19] T. Y. Lin, Granular Computing on Binary Relations I: Data Mining and Neighborhood Systems, in: Rough Sets in Knowledge Discovery, A. Skowron and L. Polkowski (eds.), Springer-Verlag, 1998.
[20] T. Y. Lin, Discovering Patterns in Numerical Sequences Using Rough Set Theory, in: Proceedings of the Third World Multi-conference on Systemics, Cybernetics, and Informatics, Vol. 5, Computer Science and Engineering, Orlando, Florida, July 31-Aug 4, 1999.
[21] T. Y. Lin, N. Zhong, J. Duong, and S. Ohsuga, Frameworks for Mining Binary Relations in Data, in: Rough Sets and Current Trends in Computing, Lecture Notes on Artificial Intelligence 1424, A. Skowron and L. Polkowski (eds.), Springer-Verlag, 1998.
[22] T. Y. Lin and M. Hadjimichael, Non-Classificatory Generalization in Data Mining, in: Proceedings of the 4th Workshop on Rough Sets, Fuzzy Sets, and Machine Discovery, November 6-8, Tokyo, Japan, 1996.
[23] T. Y. Lin and Eric Louie, Modeling the Real World for Data Mining: Granular Computing Approach, Joint 9th IFSA World Congress and 20th NAFIPS Conference, July 25-28, Vancouver, Canada, 2001.
[24] E. Louie and T. Y. Lin, Semantics Oriented Association Rules, in: 2002 World Congress of Computational Intelligence, Honolulu, Hawaii, May 12-17, 2002 (paper # 5702).
[25] E. Louie and T. Y. Lin, Finding Association Rules using Fast Bit Computation: Machine-Oriented Modeling, in: Foundations of Intelligent Systems, Z. Ras and S. Ohsuga (eds.), Lecture Notes in Artificial Intelligence 1932, Springer-Verlag, 2000 (ISMIS00, Charlotte, NC, Oct 11-14, 2000).
[26] Hiroshi Motoda and Huan Liu, Feature Selection, Extraction and Construction, Communication of IICM (Institute of Information and Computing Machinery, Taiwan), Vol. 5, No. 2, May 2002 (proceedings of the workshop Toward the Foundation on Data Mining in PAKDD2002, May 6).
[27] H. Liu and H. Motoda, Feature Transformation and Subset Selection, IEEE Intelligent Systems, Vol. 13, No. 2, March/April 1998.
[28] H. Liu and H. Motoda (eds.), Feature Extraction, Construction and Selection - A Data Mining Perspective, Kluwer Academic Publishers, 1998.
[29] Z. Pawlak, Rough Sets. Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, 1991.
[30] Z. Pawlak, Rough sets, International Journal of Information and Computer Science 11, 1982.
[31] R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang, Exploratory Mining and Pruning Optimizations of Constrained Associations Rules, Proc. of 1998 ACM-SIGMOD Conf. on Management of Data, Seattle, Washington, June 1998.
