Attribute (Feature) Completion The Theory of Attributes from Data Mining Prospect


Tsau Young ("T. Y.") Lin
Department of Computer Science, San Jose State University, San Jose, CA 95192, USA
tylin@cs.sjsu.edu

Abstract

A correct selection of attributes (features) is vital in data mining. As a first step, this paper constructs all possible attributes of a given relation. The results are based on the observation that each relation is isomorphic to a unique abstract relation, called the canonical model. The complete set of attributes of the canonical model is then constructed. Any attribute of a relation can be interpreted (via isomorphism) from such a complete set.

Keywords: attributes, feature, data mining, granular, data model

1. Introduction

Traditional data mining algorithms search for patterns only in the given set of attributes. Unfortunately, in a typical database environment, the attributes are selected primarily for record keeping, not for understanding the real world. Hence it is quite possible that there are no visible patterns in the given set of attributes; see Section 2. The fundamental question is: Is there a suitable set of attributes in which the invisible patterns can be mined? Fortunately, the answer from this paper is yes. For this purpose, we examine the fundamental issues, such as what the raw data and the target patterns are, and

- build a mathematical model that captures exactly what the data says. Based on such a model, we are able to develop a theory of attributes, and
- construct the complete set of all attributes of a given relation. This is the main result of this paper.

The paper is roughly organized into three parts. First is the motivational example (Section 2), followed by two sections of fundamental formulations (Section 3, refgdm); we conclude with the theory of attributes, based on some foundational investigation of data mining.

2 Motivation - Invisible Patterns

Let us consider a 3-column numerical relation, Table 1.
The first column is the independent variable, namely the universe of the entities (directed segments). The table has three attributes: the beginning point and the two polar coordinates, the Length and the Direction (in degrees). This table has one association rule of length 2, which involves the Length value 2.0. By switching to the Cartesian coordinate system, the table is transformed into Table 2. Interestingly, the only association rule disappears. After a moment of reflection, one realizes that since the association rule is a real-world phenomenon (a geometric fact), the same information should still be carried in Table 2. The question is: how can this invisible association rule be mined? It is obvious that we need the derived attribute Length, which is a function of Horizontal and Vertical. This phenomenon prompts us to consider the foundation of data mining, in particular the foundation of attributes (features) from the data mining prospect. We would like to note that attribute (feature) theory from the prospect of database processing is very different from this one.
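The disappearing rule can be reproduced on a small scale. The following is a minimal sketch, assuming six hypothetical segments (the values are illustrative and are not those of Table 1) and a support threshold of 3; the helper name `frequent_pairs` is also an assumption made for this illustration. It counts attribute-value pairs in the polar table, in the Cartesian table, and in the Cartesian table extended with the derived attribute Length.

```python
import math
from collections import Counter
from itertools import combinations

def frequent_pairs(rows, threshold):
    """Return the attribute-value pairs that occur in at least `threshold` rows."""
    counts = Counter()
    for row in rows:
        # Sort by attribute name so the same pair is always recorded in one order.
        counts.update(combinations(sorted(row.items()), 2))
    return {pair for pair, n in counts.items() if n >= threshold}

# Hypothetical segments: (begin point, length, direction in degrees).
polar = [("X1", 6.0, 0), ("X2", 8.0, 30), ("X3", 2.0, 45),
         ("X3", 2.0, 60), ("X3", 2.0, 90), ("X4", 4.0, 120)]

polar_rows = [{"begin": b, "length": r, "direction": d} for b, r, d in polar]

# The same segments in Cartesian coordinates (rounded to absorb float noise).
cartesian_rows = [
    {"begin": b,
     "horizontal": round(r * math.cos(math.radians(d)), 6),
     "vertical": round(r * math.sin(math.radians(d)), 6)}
    for b, r, d in polar
]

# Add the derived attribute Length = sqrt(Horizontal^2 + Vertical^2).
derived_rows = [
    dict(row, length=round(math.hypot(row["horizontal"], row["vertical"]), 3))
    for row in cartesian_rows
]

rule = (("begin", "X3"), ("length", 2.0))
print(rule in frequent_pairs(polar_rows, 3))      # visible in polar form
print(frequent_pairs(cartesian_rows, 3))          # no frequent pair at all
print(rule in frequent_pairs(derived_rows, 3))    # recovered by the derived attribute
```

The rounding steps are a design choice for this sketch only: without them, floating-point noise would split equal lengths into distinct values and hide the pattern again.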

[Table 1. Ten directed segments in polar coordinates. Columns: Segment#, Begin point, Length, Direction. The cell values are not recoverable from this copy.]

[Table 2. Ten directed segments in (X,Y)-coordinates. Columns: Segment#, Begin Point, Horizontal, Vertical. The cell values are not recoverable from this copy.]

3 Basic Structures - the Data and Patterns

3.1 Raw Data - the Relations

The central objects of the study should be bag relations (we allow repeated tuples). However, without losing the essential idea, for simplicity we focus on (set-theoretical) relations, or more emphatically, the relation instances. We will also assume that all attributes are distinct (non-isomorphic); see Section 4.

Let $V$ be the universe. Let $A = \{A^1, A^2, \ldots, A^n\}$ be a set of attributes, and let their attribute domains be $C = \{C^1, C^2, \ldots, C^n\}$. Each $C^j$, often denoted by $Dom(A^j)$, is a set of elementary concepts (attribute values). Technically, they are the so-called semantic primitives in AI [6], or undefined primitives in mathematics. In other words, the semantics of these symbols are not part of the formal system.

The main raw data is a relation, which is a set (not a bag) of tuples. We will view a relation as a knowledge representation $K : V \to Dom(A)$, where $Dom(A)$ is the Cartesian product of the $Dom(A^j)$'s. Clearly, we can view each $A^j$ as a map $A^j : V \to Dom(A^j)$ (single column representation). Then the relation $K$ is a join of attribute maps; see [15, 17]. If one uses the information table (see below), the join is the actual join of relational algebra.

A map or function naturally induces a partition on its domain (the collection of all inverse images of the map), so each $A^j$ induces a partition on $V$ (and hence an equivalence relation); we use $Q^j$ to denote both. We let $Q$ be the collection of the $Q^j$'s.

In traditional database theory, the image of the map $K$ (knowledge representation) is called the relation. The independent variable $V$ plays no explicit role. However, in data mining it is more convenient to have independent variables in the formulation.
So in this paper, we may also use the graph $\{(v, K(v))\}$, called the information table. Throughout the paper, $K$ may mean the map, the image, or the graph, by abuse of notation. Since $K$ is determined by $A$ on $V$, we may write $(V, A)$ for $K$.

3.2 Target - High Frequency Patterns

In association rule mining, two measures, called the support and the confidence, are important. In this paper, we are concerned with high frequency patterns, not necessarily in the form of rules, so only the support will be considered. Association rule mining originated from market basket data [1]. However, in many software systems the data mining tools are added to a general DBMS, so we are interested in data mining on general relations. To be definite, we use the following translations: an item is an attribute value, a q-itemset is a subtuple of length q, and a large q-itemset is a high frequency q-pattern. In other words, a subtuple of length q is a high frequency q-pattern, or simply a pattern, if its number of occurrences is greater than or equal to a given threshold.

4 What are we mining? - Isomorphic class

This paper focuses on database mining, more specifically on extracting high frequency patterns from a given relation (frozen at one database instance). In this section, we offer the somewhat surprising observation that the target patterns, such as association rules, are the common patterns of the whole isomorphic class, NOT of an individual relation alone.
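The definition of a high frequency q-pattern in Section 3.2 can be sketched directly. This is an illustrative sketch, not the paper's algorithm: the function name `high_frequency_patterns`, the toy relation, and the threshold are all assumptions. A subtuple is stored together with its column positions so that equal values in different attributes are kept apart.

```python
from collections import Counter
from itertools import combinations

def high_frequency_patterns(relation, q, threshold):
    """Count every subtuple of length q and keep those meeting the threshold.

    `relation` maps each entity of the universe V to a tuple of attribute
    values. A subtuple is recorded as ((column index, value), ...).
    """
    counts = Counter()
    for row in relation.values():
        counts.update(combinations(enumerate(row), q))
    return {pattern: n for pattern, n in counts.items() if n >= threshold}

# Hypothetical two-attribute relation on a universe of four entities.
toy = {"v1": ("a", "x"), "v2": ("a", "x"), "v3": ("a", "y"), "v4": ("b", "y")}

print(high_frequency_patterns(toy, q=1, threshold=2))
print(high_frequency_patterns(toy, q=2, threshold=2))
```

Only the support is used here, in line with the text; confidence would require comparing the counts of a pattern and of its sub-patterns.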

4.1 Isomorphic Relations and Patterns

Attributes $A^i$ and $A^j$ are isomorphic iff there is a one-to-one and onto map $s : Dom(A^i) \to Dom(A^j)$ such that $A^j(v) = s(A^i(v))$ for all $v \in V$. The map $s$ is called an isomorphism. Intuitively, two attributes (columns) are isomorphic iff one column turns into the other by properly renaming its attribute values.

Let $K = (V, A)$ and $H = (V, B)$ be two information tables, where $A = \{A^1, A^2, \ldots, A^n\}$ and $B = \{B^1, B^2, \ldots, B^m\}$. Then $K$ and $H$ are said to be isomorphic if every $A^i$ is isomorphic to some $B^j$, and vice versa. By our assumption (all attributes are distinct), $K$ and $H$ have the same degree (number of attributes), that is, $n = m$; see the more general version in Section 11. The following theorem should be obvious.

Theorem 4.1. Isomorphic relations have isomorphic patterns.

The impact of this simple theorem is rather far-reaching. It essentially declares that patterns are syntactic in nature. They are patterns of the whole isomorphic class, yet many isomorphic relations may have very different semantics; see Section ??. The interestingness (of association rules) may not be captured by the mere counting of items (and hence by the probability theory based on it). Of course, something like unexpectedness (which is probabilistic in nature) can be captured; the research on this topic will be reported in the future.

5. Modeling what data says - Canonical Models

In the classical data model, an (intensional) functional dependency can never be expressed by the raw data; however, the data does express the extensional functional dependency. So it is important to examine the very fundamental question: what is the raw data (a given relation) really saying? In this section, we construct the canonical model for each isomorphic class. In other words, the canonical model expresses exactly what the raw data says about patterns. Earlier, we called them machine oriented models [15, 16], and showed that they are very fast in computing the high frequency patterns [24].
5.1 Attributes and Equivalence Relations

We have observed (Section 3.1) that each $A^j$ induces an equivalence relation $Q^j$ on $V$. The set $V/Q^j$, which consists of all granules (equivalence classes), is called the quotient set. The map $P^j : V \to V/Q^j$, $v \mapsto [v]$, is the natural projection, where $[v]$ is the granule containing $v$. Next, we state an observation ([17], p. 25):

Proposition 5.1. An attribute $A^j$, as a map, can be factored as $A^j = NAME^j \circ P^j$ (first the projection, then the naming), where the naming map $NAME^j : V/Q^j \to Dom(A^j)$, $[v] \mapsto NAME^j([v])$, is referred to as the interpretation.

1. The interpretation induces an isomorphism from $V/Q^j$ to $C^j$. The interpretation assigns each granule an elementary concept (attribute value); we can regard it as a meaningful name of the granule. $A^j$ is a meaningful name of $Q^j$.
2. The natural projection $P^j$ is a map from $V$ to $V/Q^j$. Formally, it can be regarded as an attribute. It is a single column representation of $V$ into the quotient set.
3. The natural projection and the induced partition determine each other; we may use $Q^j$ to denote the partition, the equivalence relation, and the natural projection.

5.2 Canonical Model and Granular Data Model

A relation $K$, as a map, can be factored through the natural projection $C_K : V \to V/Q^1 \times \cdots \times V/Q^n$ and the naming map $NAME : V/Q^1 \times \cdots \times V/Q^n \to C^1 \times \cdots \times C^n$. Note that $NAME$ is the product of the $NAME^j$ and is often referred to as the interpretation. Table 3 illustrates how $K$ is factored.

1. The natural projection $C_K$ can be regarded as a knowledge representation of the universe $V$ into quotient sets. It is called the canonical model of $K$.
2. The interpretation induces an isomorphism from $V/Q$ to $C$ (both are the appropriate Cartesian products). The interpretation assigns a tuple of granules to a tuple of elementary concepts (attribute values). Each $A^j$ can be regarded as a meaningful name of $Q^j$, and an attribute value is a meaningful name of a granule (equivalence class).
3. $Q^j$ is an attribute of $C_K$, called a canonical attribute (an uninterpreted attribute). $Dom(Q^j) = V/Q^j$ is called a canonical domain; a granule is a canonical attribute value [15].

Theorem. The patterns of the canonical model $C_K$ are isomorphic (via the interpretation) to the patterns of $K$.

This is a corollary of Theorem 4.1. To find all patterns of $K$, we only need to find the patterns of $C_K$ (and vice versa).
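The factoring of $K$ into a natural projection and an interpretation can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `granules` and `canonical_model` are hypothetical, and the relation follows the (S#, STATUS, CITY) example of Table 3 as read from the source. Each attribute value is replaced by its granule, the interpretation maps granules back to values, and the support of a subtuple is the cardinality of the intersection of its granules.

```python
from collections import defaultdict

def granules(column):
    """Partition the universe by one attribute: value -> frozenset of entities."""
    blocks = defaultdict(set)
    for v, value in column.items():
        blocks[value].add(v)
    return {value: frozenset(block) for value, block in blocks.items()}

def canonical_model(relation):
    """Return (C_K, NAME): rows of granules, and one granule -> value map per attribute."""
    n = len(next(iter(relation.values())))
    parts = [granules({v: row[j] for v, row in relation.items()}) for j in range(n)]
    ck = {v: tuple(parts[j][row[j]] for j in range(n)) for v, row in relation.items()}
    name = [{g: value for value, g in part.items()} for part in parts]
    return ck, name

K = {
    "v1": ("S1", "TWENTY", "NY"), "v2": ("S2", "TEN", "SJ"),
    "v3": ("S3", "TEN", "SJ"),    "v4": ("S4", "TEN", "SJ"),
    "v5": ("S5", "TEN", "SJ"),    "v6": ("S6", "TEN", "SJ"),
    "v7": ("S7", "TWENTY", "LA"), "v8": ("S8", "THIRTY", "LA"),
    "v9": ("S9", "THIRTY", "LA"),
}

ck, name = canonical_model(K)

# Applying the interpretation to the canonical model gives back K.
restored = {v: tuple(name[j][g] for j, g in enumerate(row)) for v, row in ck.items()}

# The support of the subtuple (TEN, SJ) is the size of a 2-granule.
ten, sj = ck["v2"][1], ck["v2"][2]
print(sorted(ck["v1"][1]), len(ten & sj))
```

Because granules are plain sets of entities, the canonical model carries no attribute names or values at all, which is the sense in which it records only what the data says.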

Table 3. The canonical model $C_K$ (left) is mapped by NAME to $K$ (right)

V     Canonical model $C_K$: ($Q^0$, $Q^2$, $Q^3$)                  Relation $K$: (S#, STATUS, CITY)
v1    ({v1}, {v1, v7}, {v1})                                  (S1, TWENTY, NY)
v2    ({v2}, {v2, v3, v4, v5, v6}, {v2, v3, v4, v5, v6})      (S2, TEN, SJ)
v3    ({v3}, {v2, v3, v4, v5, v6}, {v2, v3, v4, v5, v6})      (S3, TEN, SJ)
v4    ({v4}, {v2, v3, v4, v5, v6}, {v2, v3, v4, v5, v6})      (S4, TEN, SJ)
v5    ({v5}, {v2, v3, v4, v5, v6}, {v2, v3, v4, v5, v6})      (S5, TEN, SJ)
v6    ({v6}, {v2, v3, v4, v5, v6}, {v2, v3, v4, v5, v6})      (S6, TEN, SJ)
v7    ({v7}, {v1, v7}, {v7, v8, v9})                          (S7, TWENTY, LA)
v8    ({v8}, {v8, v9}, {v7, v8, v9})                          (S8, THIRTY, LA)
v9    ({v9}, {v8, v9}, {v7, v8, v9})                          (S9, THIRTY, LA)

The canonical model $C_K$ is uniquely determined by its universe $V$ and the family $Q$ of equivalence relations. In other words, the pair $(V, Q)$ determines and is determined by $C_K$. From the prospect of first order logic, $(V, Q)$ is a model of a rather simple kind of first order logic, where the only predicates are equivalence predicates (predicates that satisfy the reflexive, symmetric and transitive properties) [22].

1. One can regard the canonical model $C_K$ as a table format of $(V, Q)$.
2. We will call a granule of the original $Q^j$ an elementary granule.
3. A q-tuple of $C_K$ corresponds to an intersection, called a q-granule, of q elementary granules in $(V, Q)$.
4. High frequency patterns of $(V, Q)$ are q-granules whose cardinality is greater than or equal to the given threshold.
5. We have assumed all attributes are distinct; for a more general version, we refer to [8].

Definition. The pair $(V, Q)$ is called the granular data model; it is a special case of granular structure [18].

$(V, Q)$ is considered by both Pawlak and Lee. In his book, Pawlak calls it a knowledge base; implicitly, Pawlak assumed all attributes are non-isomorphic [28], as we have done here. Since "knowledge base" often has a different meaning, we will not use the term. Tony T. Lee considered the general case; see Section 11.

Corollary. The patterns of $(V, Q)$, $C_K$, and $K$ are isomorphic.

6 Theory of derived Attributes (Features)

An attribute is also called a feature, especially in AI; the two terms have been used interchangeably.
In the classical data model, an attribute is a representation of a property, characteristic, etc.; see, e.g., [26, 27]. In other words, it represents a human perception of the data (intensional view [5]). However, we should note that in a given relation instance (extensional view [5]), the data itself cannot fully reflect such a human perception. As we have pointed out, the existence of an (extensional) functional dependency in a given table cannot impose an (intensional) functional dependency on the data model. So in data mining, we should note that attributes are defined by the given instance of data (extensional view), not by what humans perceive. Many attributes that are very distinct in the intensional view (as humans perceive) are actually isomorphic in the extensional view (as the data says); see examples in [9, 8].

6.1 Attribute Transformations and Function Dependency

We examine how a new attribute, transformed from the given ones, is related to them. Let $B = \{B^1, \ldots, B^k\}$ be a subset of $A$ and let $f$ be a function defined on $Dom(B) = Dom(B^1) \times \cdots \times Dom(B^k)$. We collect all function values in a set $D$. Using mapping notation, we have $f : Dom(B) \to D$; it is called an attribute transformation. Since attributes can be regarded as maps, we have

$f \circ B : V \to Dom(B) \to D$.

The map $f \circ B : V \to D$ is a new attribute. We write $Y = f \circ B = f(B^1, \ldots, B^k)$ and $D = Dom(Y)$. $Y$ is called a derived attribute, and $f$ an attribute transformation. By joining $K$ and $Y$ (joining in the sense of relational algebra on the information tables), we have a new relation $K'$:

$K' = (A^1, \ldots, A^n, Y) : V \to Dom(A^1) \times \cdots \times Dom(A^n) \times Dom(Y)$.

Next we see how the new derived attribute $Y$ is related to the given attributes in the new relation $K'$.
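The construction of Section 6.1 can be sketched in a few lines. The names `derive`, `join` and `find_transformation`, and the small relation below, are assumptions made for illustration only. The first two functions build $Y = f \circ B$ and the joined relation $K'$; the third goes the other way and tries to read a transformation off two given columns, which succeeds exactly when equal values on the chosen columns always carry equal values of $Y$.

```python
import math

def derive(relation, columns, f):
    """Y = f o B: apply f to the values of the chosen columns of every row."""
    return {v: f(*(row[j] for j in columns)) for v, row in relation.items()}

def join(relation, y):
    """K' = K joined with the single-column table Y over the universe V."""
    return {v: row + (y[v],) for v, row in relation.items()}

def find_transformation(relation, columns, y):
    """Recover f as a lookup table if Y is a function of the columns, else None."""
    f = {}
    for v, row in relation.items():
        key = tuple(row[j] for j in columns)
        if f.setdefault(key, y[v]) != y[v]:
            return None  # same key, two different Y values: not functional
    return f

# Hypothetical relation with attributes (Horizontal, Vertical).
K = {"v1": (3, 4), "v2": (4, 3), "v3": (0, 5), "v4": (6, 8), "v5": (3, 0)}

Y = derive(K, (0, 1), math.hypot)          # derived attribute Length
K_prime = join(K, Y)

vertical = {v: row[1] for v, row in K.items()}
print(K_prime["v4"])
print(find_transformation(K, (0, 1), Y) is not None)
print(find_transformation(K, (0,), vertical))  # Vertical is not a function of Horizontal
```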

Proposition 6.1. $Y$ is a derived attribute of $B$ iff $Y$ is extensionally functionally dependent (EFD) on $B$.

By definition, the occurrence of an (extensional) functional dependency (EFD) means there is an attribute transformation $f : Dom(B^1) \times \cdots \times Dom(B^k) \to Dom(Y)$ such that $f(B(v)) = Y(v)$ for all $v \in V$. By definition, $Y = f(B^1, \ldots, B^k)$; this completes our argument. Table 4 illustrates the notion of EFD and attribute transformations.

[Table 4. An attribute transformation in $K$. Each row $v_i$ lists the values $b_i^1, b_i^2, \ldots, b_i^k$ of $B^1, B^2, \ldots, B^k$ together with $y_i = f(b_i^1, b_i^2, \ldots, b_i^k)$.]

6.2 Feature Extractions and Constructions

Feature extraction and construction in the intensional view are much harder to describe formally, since features represent the human view, and their mathematical relations have to be set up consistently for all possible instances. We will take the extensional view, the view from the data's prospect. Let us examine some assertions (in the traditional view) from [25]: "All new constructed features are defined in terms of original features, ..." and "Feature extraction is a process that extracts a set of new features from the original features through some functional mapping." Taking the data view, it is easy to see that both assertions imply that a newly constructed feature is a function (functional mapping) of the old features.

Let $A = \{A^1, \ldots, A^n\}$ be the attributes before the extraction or construction, and $A^{n+1}, \ldots, A^{n+m}$ be the new attributes. From the analysis above, the new attributes (features) are functions of the old ones; we have $f : Dom(A^1) \times \cdots \times Dom(A^n) \to Dom(A^{n+k})$. From the analysis in Section 6.1, $A^{n+k}$ is a derived attribute of $A$. We summarize the analysis in:

Proposition 6.2. The features constructed by classical feature extraction and construction are derived attributes in the extensional view.

6.3 Derived Attributes in the Canonical Model

From Proposition 5.2.1, $K$ is isomorphic to the canonical model $C_K$. So there is a corresponding Table 4 in the canonical model. In other words, identifying each attribute with the partition it induces, there is a map

$V/B^1 \times \cdots \times V/B^k = V/(B^1 \cap \cdots \cap B^k) \to V/Y$.

This map between quotient sets implies a refinement of the partitions; that is, $Y$ is a coarsening of $B^1 \cap \cdots \cap B^k$. So we have the following:

Proposition 6.3. $Y$ is a derived attribute of $B$ iff $Y$ is a coarsening of $B^1 \cap \cdots \cap B^k$, where $Y \in A$ and $B \subseteq A$.

7 Granular Data Model of Relation Lattice

In this section, we modify Lee's work. In Section 3.1, we recalled the observation of [29, 7] that any subset $B$ of $A$ induces a partition on $V$; the partition induced by $A^j$ is denoted by $Q^j$. The power set $2^A$ is a Boolean algebra and hence a lattice, where the meet and join operations are the union and intersection of subsets, respectively. Let $\Pi(V)$ be the set of all partitions on $V$ (equivalence relations); $\Pi(V)$ forms a lattice, where the meet is the intersection of equivalence relations and the join is the union, where the union, denoted by $\cup_j Q^j$, is the smallest coarsening of all the $Q^j$, $j = 1, 2, \ldots$. $\Pi(V)$ is called the partition lattice. Recall the convention that all attributes are non-isomorphic. Hence all equivalence relations are distinct; see Section 3.1. The next proposition is due to Lee:

Proposition 7.1. There is a map $\theta : 2^A \to \Pi(V)$ that respects the meet, but not the join, operations.

Lee called the image, $Im\,\theta$, the relation lattice and observed that

1. The join in $Im\,\theta$ is different from that of $\Pi(V)$.
2. So $Im\,\theta$ is a subset, but not a sublattice, of $\Pi(V)$.

Such an embedding is an unnatural one, but Lee focused his efforts on it; he established many connections between database concepts and lattice theory. We will, instead, take a natural embedding.

Definition 7.2. The smallest lattice generated by $Im\,\theta$, by abuse of language, is called the (Lin's) relation lattice, denoted by $L(Q)$.

This definition will not cause confusion, since we will not use Lee's notion at all. The difference between $L(Q)$ and $Im\,\theta$ is that the former contains all the joins of distinct attributes. The pair $(V, L(Q))$ is the granular data model of the (Lin's) relation lattice. It should be clear that:

Definition 7.3. The high frequency $q$-patterns of $(V, Q)$, for every $q$, are the high frequency patterns of length one in $(V, Im\,\theta)$, and are a subset of the high frequency patterns of length one in $(V, L(Q))$.

8 Universal Model - Capture the invisibles

The smallest lattice, denoted by $L^*(Q)$, that consists of all coarsenings of $L(Q)$ is called the complete relation lattice.

Main Theorem 8.1. $L^*(Q)$ is the set of all derived attributes of the canonical model.

Proof: (1) Let $P \in L^*(Q)$, that is, $P$ is coarser than some $Q^{j_1} \cap \cdots \cap Q^{j_k}$. We will show it is a derived attribute. The coarsening implies a map on their respective quotient sets,

$g : V/Q^{j_1} \times V/Q^{j_2} \times \cdots \times V/Q^{j_k} = V/(Q^{j_1} \cap Q^{j_2} \cap \cdots \cap Q^{j_k}) \to V/P$.

In terms of relational notation, that is

$g : Dom(Q^{j_1}) \times \cdots \times Dom(Q^{j_k}) \to Dom(P)$.

Using the notation of functional dependency, we have (equivalence relations are attributes of the canonical model)

$P = g(Q^{j_1}, Q^{j_2}, \ldots, Q^{j_k})$.

So $g$, as a map between attributes, is an attribute transformation. Hence $P$ is a derived attribute.

(2) Let $P$ be a derived attribute of $C_K$. That is, there is an attribute transformation

$g : Dom(Q^{j_1}) \times \cdots \times Dom(Q^{j_k}) \to Dom(P)$.

As $C_K$ is the canonical model, it can be re-expressed in terms of quotient sets,

$g : V/Q^{j_1} \times \cdots \times V/Q^{j_k} \to V/P$.

Observe that $V/Q^{j_1} \times \cdots \times V/Q^{j_k} = V/(Q^{j_1} \cap \cdots \cap Q^{j_k})$, so the existence of $g$ implies that $P$ is coarser than $Q^{j_1} \cap \cdots \cap Q^{j_k}$. By definition, $P$ is an element of $L^*(Q)$. Q.E.D.

Note that $L^*(Q)$ is finite, since $\Pi(V)$ is finite. The pair $(V, L^*(Q))$ is a granular data model, and its relation format

$U_K : V \to \prod_{P \in L^*(Q)} V/P$

is a knowledge representation. Its attributes are all the partitions in $L^*(Q)$, which contains all possible derived attributes of $C_K = (V, Q)$, by the theorem. We will not distinguish between the granular data model and its relation format:

Definition 8.2. The pair $U_K = (V, L^*(Q))$ is the completion of $C_K = (V, Q)$ and is called the universal model of $K$.

9 Isomorphic Relations

Table 5. An Information Table K

V     S#    Business Amount (in m.)    Birth Day    CITY
v1    S1    TWENTY                     MAR          NY
v2    S2    TEN                        MAR          SJ
v3    S3    TEN                        FEB          NY
v4    S4    TEN                        FEB          LA
v5    S5    TWENTY                     MAR          SJ
v6    S6    TWENTY                     MAR          SJ
v7    S7    TWENTY                     APR          SJ
v8    S8    THIRTY                     JAN          LA
v9    S9    THIRTY                     JAN          LA

Table 6. An Information Table K

V     P#    Weight    Part Name    Material
v1    P1    20        SCREW        STEEL
v2    P2    10        SCREW        BRASS
v3    P3    10        NAIL         STEEL
v4    P4    10        NAIL         ALLOY
v5    P5    20        SCREW        BRASS
v6    P6    20        SCREW        BRASS
v7    P7    20        PIN          BRASS
v8    P8    30        HAMMER       ALLOY
v9    P9    30        HAMMER       ALLOY

The two relations, Tables 5 and 6, are isomorphic, but their semantics are completely different: Table 5 is about suppliers, Table 6 is about parts. These two relations have isomorphic association rules:

1. Length one: TEN, TWENTY, MAR, SJ, LA in Table 5, and 10, 20, SCREW, BRASS, ALLOY in Table 6.
2. Length two: (TWENTY, MAR), (MAR, SJ), (TWENTY, SJ) in Table 5, and (20, SCREW), (SCREW, BRASS), (20, BRASS) in Table 6.

However, they have non-isomorphic interesting rules:

1. Table 5: (TWENTY, SJ), that is, the business amount at San Jose is likely 20 million; it is isomorphic to (20, BRASS), which is not interesting.
2. Table 6: (SCREW, BRASS), that is, a screw is most likely made of brass; it is isomorphic to (MAR, SJ), which is not interesting.
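The isomorphism between Tables 5 and 6 can be checked mechanically. The sketch below is an illustration under assumptions: the function names are hypothetical, weights are stored as strings, and the relation-level search relies on the standing assumption that attributes within one table are pairwise non-isomorphic. It finds a value-renaming bijection for each pair of matching columns, translates the pattern (TWENTY, SJ) to Table 6, and compares supports.

```python
def attribute_isomorphism(col_a, col_b):
    """Return a bijection s with col_b[v] == s[col_a[v]] for all v, or None."""
    s, inverse = {}, {}
    for v in col_a:
        a, b = col_a[v], col_b[v]
        if s.setdefault(a, b) != b or inverse.setdefault(b, a) != a:
            return None
    return s

def column(relation, j):
    return {v: row[j] for v, row in relation.items()}

def relation_isomorphism(k, h):
    """Map each column index of k to (matching column index of h, renaming), or None."""
    n, m = len(next(iter(k.values()))), len(next(iter(h.values())))
    maps = {}
    for i in range(n):
        for j in range(m):
            s = attribute_isomorphism(column(k, i), column(h, j))
            if s is not None:
                maps[i] = (j, s)
                break
        else:
            return None  # column i of k has no isomorphic partner in h
    # "And vice versa": every column of h must have been matched as well.
    return maps if {j for j, _ in maps.values()} == set(range(m)) else None

def support(relation, pattern):
    """Number of rows that carry every (column index -> value) of the pattern."""
    return sum(all(row[j] == x for j, x in pattern.items()) for row in relation.values())

T5 = {"v1": ("S1", "TWENTY", "MAR", "NY"), "v2": ("S2", "TEN", "MAR", "SJ"),
      "v3": ("S3", "TEN", "FEB", "NY"),    "v4": ("S4", "TEN", "FEB", "LA"),
      "v5": ("S5", "TWENTY", "MAR", "SJ"), "v6": ("S6", "TWENTY", "MAR", "SJ"),
      "v7": ("S7", "TWENTY", "APR", "SJ"), "v8": ("S8", "THIRTY", "JAN", "LA"),
      "v9": ("S9", "THIRTY", "JAN", "LA")}
T6 = {"v1": ("P1", "20", "SCREW", "STEEL"), "v2": ("P2", "10", "SCREW", "BRASS"),
      "v3": ("P3", "10", "NAIL", "STEEL"),  "v4": ("P4", "10", "NAIL", "ALLOY"),
      "v5": ("P5", "20", "SCREW", "BRASS"), "v6": ("P6", "20", "SCREW", "BRASS"),
      "v7": ("P7", "20", "PIN", "BRASS"),   "v8": ("P8", "30", "HAMMER", "ALLOY"),
      "v9": ("P9", "30", "HAMMER", "ALLOY")}

maps = relation_isomorphism(T5, T6)
pattern = {1: "TWENTY", 3: "SJ"}
translated = {maps[i][0]: maps[i][1][x] for i, x in pattern.items()}
print(translated, support(T5, pattern), support(T6, translated))
```

The check is purely syntactic: nothing in it can tell that (TWENTY, SJ) is interesting for suppliers while its image (20, BRASS) is not, which is the point of this section.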

10 Conclusions

In this paper, we successfully enumerate all possible derived attributes of a given relation. The results seem striking; however, they are of a theoretical nature. Even though $L^*(Q)$ contains a complete list of all attributes, the number is insurmountably large; it is bounded by the Bell number $B_n$, where $n$ is the cardinality of the smallest partition in $L^*(Q)$. An exhaustive search for association rules over all those attributes is beyond current reach. However, by combining it with the classical techniques of feature selection, we may reach new applications. Classical feature selection has focused on the original set of attributes; our new result suggests that the domain of feature selection should be extended to this complete universal set of derived attributes. We have tentatively called such a selection background knowledge. We will report on such research in the near future.

Next, we would like to remark that the simple observation that isomorphic relations have isomorphic patterns has a strong impact on the meaning of high frequency patterns. Isomorphism is a syntactic notion; it is highly probable that two isomorphic relations have totally different semantics. The patterns mined for one particular application may contain patterns for other applications. So relations with some additional structures need to be explored [23, 14, 15, 17, 20, 21, 11]. In particular, it implies that the interestingness of association tuples may need extra semantics; the mere probability theory based on counting items may not be able to identify them. We only give a simple example (Section 9); more research will be reported in the near future.

11 Elementary Operations

In this section, we do not assume the attributes are distinct. The isomorphism of relations is reflexive, symmetric, and transitive, so it classifies all relations into equivalence classes; we call them isomorphic classes.
Definition. $H$ is a simplified information table of $K$ if $H$ is isomorphic to $K$ and has only non-isomorphic attributes.

Theorem. Let $H$ be the simplified information table of $K$. Then the patterns (large itemsets) of $K$ can be obtained from those of $H$ by the elementary operations defined below.

To prove the theorem, we set up a lemma, in which we assume there are two isomorphic attributes $B$ and $B'$ in $K$, that is, degree $K$ - degree $H$ = 1. Let $s : Dom(B) \to Dom(B')$ be the isomorphism and $b' = s(b)$. Let $H$ be the new table in which $B'$ has been removed.

Lemma. The patterns of $K$ can be generated from those of $H$ by elementary operations, namely:

1. If $b$ is a large itemset in $H$, then $b$ and $(b, b')$ are large in $K$.
2. If $(a, \ldots, b, c, \ldots)$ is a large itemset in $H$, then $(a, \ldots, b, c, \ldots)$ and $(a, \ldots, b, b', c, \ldots)$ are large in $K$.
3. These are the only large itemsets in $K$.

The validity of this lemma is rather straightforward, and it provides the critical inductive step for the theorem; we will skip the proof.

References

[1] R. Agrawal, T. Imielinski, and A. Swami, Mining Association Rules Between Sets of Items in Large Databases, in Proceedings of the ACM-SIGMOD International Conference on Management of Data, Washington, DC, June 1993.
[2] G. Birkhoff and S. MacLane, A Survey of Modern Algebra, Macmillan, 1977.
[3] Richard A. Brualdi, Introductory Combinatorics, Prentice Hall.
[4] Y. D. Cai, N. Cercone, and J. Han, Attribute-oriented induction in relational databases, in Knowledge Discovery in Databases, AAAI/MIT Press, Cambridge, MA.
[5] C. J. Date, An Introduction to Database Systems, 7th ed., Addison-Wesley.
[6] A. Barr and E. A. Feigenbaum, The Handbook of Artificial Intelligence, William Kaufmann, 1981.
[7] T. T. Lee, Algebraic Theory of Relational Databases, The Bell System Technical Journal, Vol. 62, No. 10, December 1983.
[8] T. Y. Lin, Database Mining on Derived Attributes, to appear in the Springer-Verlag Lecture Notes on AI.
[9] T. Y. Lin, Issues in Data Mining, in: Proceedings of the 26th IEEE International Conference on Computer Software and Applications, Oxford, UK, Aug 26-29.
[10] T. Y. Lin, Feature Completion, Communication of IICM (Institute of Information and Computing Machinery, Taiwan), Vol. 5, No. 2, May 2002 (the proceedings of the workshop Toward the Foundation on Data Mining in PAKDD2002, May 6, 2002).

[11] R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang, Exploratory mining and pruning optimizations of constrained associations rules, Proceedings of the 1998 ACM-SIGMOD Conference on Management of Data, 13-24.
[12] T. Y. Lin, The Lattice Structure of Database and Mining Multiple Level Rules, presented at COMPSAC 2001, Chicago, Oct 8-12, 2001; the exact copy appears as Feature Transformations and Structure of Attributes, in: Data Mining and Knowledge Discovery: Theory, Tools, and Technology IV, B. Dasarathy (ed), Proceedings of SPIE Vol. 4730, Orlando, FL, April 1-5, 2002.
[13] T. Y. Lin and J. Tremba, Attribute Transformations for Data Mining II: Applications to Economic and Stock Market Data, International Journal of Intelligent Systems, to appear.
[14] T. Y. Lin, Association Rules in Semantically Rich Relations: Granular Computing Approach, JSAI International Workshop on Rough Set Theory and Granular Computing, May 20-25. The post-proceedings are in Lecture Notes in AI 2253, Springer-Verlag, 2001.
[15] T. Y. Lin, Data Mining and Machine Oriented Modeling: A Granular Computing Approach, Journal of Applied Intelligence, Kluwer, Vol. 13, No. 2, September/October 2000.
[16] T. Y. Lin, Attribute Transformations on Numerical Databases, Lecture Notes in Artificial Intelligence 1805, Terano, Liu, Chen (eds), PAKDD2000, Kyoto, Japan, April 18-20, 2000.
[17] T. Y. Lin, Data Mining: Granular Computing Approach, in: Methodologies for Knowledge Discovery and Data Mining, Lecture Notes in Artificial Intelligence 1574, Third Pacific-Asia Conference, Beijing, April 26-28, 1999.
[18] T. Y. Lin, Granular Computing on Binary Relations I: Data Mining and Neighborhood Systems, in: Rough Sets in Knowledge Discovery, A. Skowron and L. Polkowski (eds), Springer-Verlag, 1998.
[19] T. Y. Lin, Discovering Patterns in Numerical Sequences Using Rough Set Theory, in: Proceedings of the Third World Multi-conference on Systemics, Cybernetics, and Informatics, Vol. 5, Computer Science and Engineering, Orlando, Florida, July 31-Aug 4, 1999.
[20] T. Y. Lin, N. Zhong, J. Duong, and S. Ohsuga, Frameworks for Mining Binary Relations in Data, in: Rough Sets and Current Trends in Computing, Lecture Notes on Artificial Intelligence 1424, A. Skowron and L. Polkowski (eds), Springer-Verlag, 1998.
[21] T. Y. Lin and M. Hadjimichael, Non-Classificatory Generalization in Data Mining, in Proceedings of the 4th Workshop on Rough Sets, Fuzzy Sets, and Machine Discovery, November 6-8, Tokyo, Japan, 1996.
[22] T. Y. Lin and Eric Louie, Modeling the Real World for Data Mining: Granular Computing Approach, Joint 9th IFSA World Congress and 20th NAFIPS Conference, July 25-28, Vancouver, Canada, 2001.
[23] E. Louie and T. Y. Lin, Semantics Oriented Association Rules, in: 2002 World Congress of Computational Intelligence, Honolulu, Hawaii, May 12-17, 2002 (paper # 5702).
[24] E. Louie and T. Y. Lin, Finding Association Rules using Fast Bit Computation: Machine-Oriented Modeling, in: Foundations of Intelligent Systems, Z. Ras and S. Ohsuga (eds), Lecture Notes in Artificial Intelligence 1932, Springer-Verlag, 2000 (ISMIS00, Charlotte, NC, Oct 11-14, 2000).
[25] Hiroshi Motoda and Huan Liu, Feature Selection, Extraction and Construction, Communication of IICM (Institute of Information and Computing Machinery, Taiwan), Vol. 5, No. 2, May 2002 (proceedings of the workshop Toward the Foundation on Data Mining in PAKDD2002, May 6).
[26] H. Liu and H. Motoda, Feature Transformation and Subset Selection, IEEE Intelligent Systems, Vol. 13, No. 2, March/April 1998.
[27] H. Liu and H. Motoda (eds), Feature Extraction, Construction and Selection - A Data Mining Perspective, Kluwer Academic Publishers, 1998.
[28] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, 1991.
[29] Z. Pawlak, Rough sets, International Journal of Information and Computer Science 11, 1982.
[30] R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang, Exploratory Mining and Pruning Optimizations of Constrained Associations Rules, Proc. of 1998 ACM-SIGMOD Conf. on Management of Data, Seattle, Washington, June 1998.
