Multi-Layer Incremental Induction

Xindong Wu and William H.W. Lo
School of Computer Science and Software Engineering
Monash University, 900 Dandenong Road, Melbourne, VIC 3145, Australia
Email: xindong@computer.org

To appear in Proceedings of the 5th Pacific Rim International Conference on Artificial Intelligence, Singapore, 25-27 November 1998.

Abstract. This paper describes a multi-layer incremental induction algorithm, MLII, which is linked to an existing nonincremental induction algorithm to learn incrementally from noisy data. MLII makes use of three operations: data partitioning, generalization and reduction. Generalization can either learn a set of rules from a (sub)set of examples, or refine a previous set of rules. The latter is achieved through a redescription operation called reduction: from a set of examples and a set of rules, we derive a new set of examples describing the behaviour of the rule set. New rules are extracted from these behavioural examples, and these rules can be seen as meta-rules, as they control previous rules in order to improve their predictive accuracy. Experimental results show that MLII achieves a significant improvement, in terms of rule accuracy, over the existing nonincremental algorithm HCV used for the experiments in this paper.

1 Introduction

Existing machine learning algorithms can generally be divided into two categories [Langley 1996]: nonincremental algorithms, which process all training examples at once, and incremental algorithms, which handle training examples one by one. When an example set is not a static repository of data, for example when examples may be added, deleted, or changed over a span of time, learning from the example set cannot be a one-time process, so nonincremental learning has a problem dealing with changing example populations. However, processing examples one by one, as existing incremental algorithms do, is a very tedious process when the example set is extraordinarily large. In addition, when some of the examples are noisy, the results learned from them may need to be revised at a later stage. As noted in [Schlimmer and Fisher 1986], incremental learning provides predictive results that depend on the particular order of data presentation.

This paper designs a new incremental learning algorithm, multi-layer induction, which divides an initial training set into subsets of approximately equal size, runs an existing induction algorithm on the first subset to obtain a first set of rules, and then processes each of the remaining data subsets in turn, incorporating the induction results from the previous subset(s).

This way, multi-layer induction accumulates the rules discovered from each data subset at each layer and produces a final integrated output that represents the original data more accurately. Any noisy data contained in the original data set is partitioned across the small data subsets, so the effects of the noise are diluted and induction efficiency can be increased. The existing algorithm used for the experiments in this paper is HCV (Version 2.0) [Wu 1995], a nonincremental rule induction system that in many cases performs better than other induction algorithms in terms of rule complexity and predictive accuracy.

2 MLII: Multi-Layer Incremental Induction

Multi-layer incremental induction (MLII) combines three learning operations (data partitioning, rule reduction and rule generalization) into a single process. Generalization and reduction work together with sequential incrementality in order to learn and refine rules incrementally. After data partitioning, MLII handles the example subsets sequentially through the generalization-reduction process. This sequential incrementality is particularly useful with very large amounts of data, in order to avoid exponential explosion.

2.1 Algorithm Outline

In the first step, the initial data set is partitioned into a number of data subsets of approximately equal size, after random shuffling.

In the second step, a set of rules is learned from the first subset of examples by a generalization algorithm. The only assumption we make here is that the generalization algorithm is able to produce deliberately under-optimal solutions (the rules are redundant). This way, the learning problem is given an approximate rule set, and this rule set will be refined with the other data subsets.

The third step performs the transition to another learning problem, namely the refinement of the previous set of rules. This transition is performed by a redescription operator called reduction, which derives a new set of behavioural examples by examining the behaviour of the rule set from Step 2 over a second data subset. From these behavioural examples, generalization can extract new rules, which are expected to correct defects and inconsistencies of the previous rules. A sequence of rule sets is thus gradually built. Successive applications of the above generalization-reduction process allow more accurate and more complex (because disjunctive) rules to be discovered, by sequentially handling the subsets of examples.
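
As a rough illustration of the outline above, the following Python sketch drives the generalization-reduction cycle over the data subsets. The names partition, generalize and reduce_examples are hypothetical stand-ins for the operations defined in Sections 2.2-2.4 (with HCV playing the role of the generalization algorithm in the paper); this is a sketch, not an actual MLII implementation.

```python
# Sketch of the MLII control loop from Section 2.1 (illustrative only).
# `partition`, `generalize` and `reduce_examples` are assumed callables standing
# in for the operations of Sections 2.2-2.4; in the paper, HCV plays the role
# of the underlying nonincremental generalization algorithm.

def mlii(training_set, n_layers, partition, generalize, reduce_examples):
    """Return the rule set obtained after n_layers of generalization-reduction."""
    subsets = partition(training_set, n_layers)        # Step 1: data partitioning
    rules = generalize(subsets[0])                     # Step 2: initial (redundant) rule set
    for subset in subsets[1:]:                         # Step 3: refine on each remaining subset
        behavioural = reduce_examples(rules, subset)   # redescribe subset via rule firings
        rules = generalize(behavioural)                # meta-rules over the previous rules
    return rules
```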

2.2 Data Partitioning

Data partitioning affects the quality of information in each data subset and in turn affects the performance of multi-layer induction. Our main design aim here is to dilute the noise in the original data set and to distribute the examples of the different classes evenly. The partitioning process is designed as follows.

1. Shuffle all examples in the training set randomly.
2. Put the examples of each class into one separate group.
3. Count the number of examples in each class group and compute the ratio between the class sizes.
4. Randomly select examples from each class group according to the above ratio and put them into a subset.

Step 4 is performed N times, where N is the number of subsets, a parameter set by the user. In some cases the example ratio between the class groups does not work out to integers, and for the last subset some class groups may still have examples left while others have none. In these cases we do not form the last subset, but insert the remaining examples randomly into the existing subsets.
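
A minimal sketch of this partitioning procedure is given below, assuming each example is a (description, class) pair; the function and parameter names are ours rather than MLII's, and leftover examples that cannot fill a complete subset are scattered over the subsets already formed, as described above.

```python
import random
from collections import defaultdict

def partition(examples, n_subsets, seed=None):
    """Stratified random partitioning into n_subsets, as in Section 2.2.

    `examples` is a list of (description, class_label) pairs."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)                        # 1. shuffle all examples

    groups = defaultdict(list)                   # 2. group examples by class
    for ex in shuffled:
        groups[ex[1]].append(ex)

    # 3. per-subset quota for each class, keeping the class ratio
    quota = {c: len(g) // n_subsets for c, g in groups.items()}

    subsets = []
    for _ in range(n_subsets):                   # 4. draw from each class per the ratio
        subset = []
        for c, g in groups.items():
            subset.extend(g[:quota[c]])
            del g[:quota[c]]
        subsets.append(subset)

    leftovers = [ex for g in groups.values() for ex in g]
    for ex in leftovers:                         # remaining examples are inserted
        subsets[rng.randrange(n_subsets)].append(ex)   # randomly into existing subsets
    return subsets
```

Because each subset receives examples from every class in roughly the original class ratio, noisy examples are spread thinly across the subsets rather than concentrated in one of them.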

2.3 Generalization

Generalization compresses the initial information. It involves observing a (sub)set of training examples of some particular concept, identifying the essential features common to the positive examples among these training examples, and then formulating a concept definition based on these common features. The generalization process can thus be viewed as a search through a space of possible concept definitions for a correct definition of the concept to be learned. Because the space of possible concept definitions is vast, the heart of the generalization problem lies in using whatever training data, assumptions and knowledge are available to constrain the search.

In MLII, discriminant generalization by elimination [Tim 1993] is adapted. A discriminant description specifies an expression (or a logical disjunction of such expressions) that distinguishes a given class from a fixed number of other classes. The minimal discriminant descriptions are the shortest expressions (i.e., those with the minimum number of descriptors) distinguishing all objects in the given class from the objects of the other classes. Such descriptions specify the minimum information sufficient to identify the given class among a fixed number of other classes. These discriminant descriptions are converted into generalization rules.

A generalization rule is a transformation of a description into a more general description, one that tautologically implies the initial description. Generalization rules are not truth-preserving but falsity-preserving, which means that if an event falsifies some description, then it also falsifies a more general description. This is immediately seen by observing that H ⇒ F is equivalent to ¬F ⇒ ¬H (the law of contraposition).

Generalization by Elimination

Generalization by elimination relies on the concept of the star methodology [Michalski 1984]. Its main originality is a logical pruning of counter-examples, based on the near-miss notion [Kodratoff 1984].

Let s be an example in an example (sub)set A. Any counter-example t of s in A gives a constraint on the generalization of s: the descriptors which discriminate t from s cannot all be dropped simultaneously. The constraint C(s, t) is a set of attribute indices, given by

C(s, t) = { i | attribute i discriminates s and t }.

A counter-example t0 is a maximal near-miss to s in A if the constraint C(s, t0) is minimal for set inclusion among all the C(s, t). We search the maximal near-miss counter-examples for an index set M that intersects every constraint C(s, t). From M, a rule R_sm is defined as follows: its premises are the conjunction of those conditions of s whose attribute indices belong to M. We prove that R_sm is a maximally discriminant generalization of s: by construction, for any counter-example t discriminated from s, there exists an element of C(s, t) which belongs to M; the corresponding attribute discriminates s and t, and the condition on this attribute is kept from s in R_sm, hence R_sm still discriminates t.

The search for M can be carried out by a graph exploration, which is exponential in the number of constraints. However, it is enough for a subset M to intersect all C(s, t) for the counter-examples t that are maximal near-misses to s. Generalization by elimination therefore reduces the size of the exponential exploration by a preliminary (polynomial) pruning.
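
The following sketch illustrates this construction, assuming examples are tuples of discrete attribute values and that counter-examples are the examples of other classes; the greedy choice of M is our simplification for illustration and replaces the graph exploration used in MLII.

```python
def constraint(s, t):
    """C(s, t): indices of attributes that discriminate example s from counter-example t."""
    return frozenset(i for i, (a, b) in enumerate(zip(s, t)) if a != b)

def near_miss_constraints(s, counter_examples):
    """Keep only constraints that are minimal for set inclusion (maximal near-misses)."""
    cs = {constraint(s, t) for t in counter_examples}
    return [c for c in cs if not any(other < c for other in cs)]

def covering_set(constraints):
    """Greedily pick attribute indices until every constraint is intersected.

    MLII searches for such a set M by graph exploration; the greedy cover here
    is purely an illustrative approximation."""
    m, uncovered = set(), [c for c in constraints if c]
    while uncovered:
        best = max({i for c in uncovered for i in c},
                   key=lambda i: sum(i in c for c in uncovered))
        m.add(best)
        uncovered = [c for c in uncovered if best not in c]
    return m

def generalize_example(s, counter_examples):
    """R_sm: keep from s only the conditions whose attribute indices fall in M."""
    m = covering_set(near_miss_constraints(s, counter_examples))
    return {i: s[i] for i in sorted(m)}   # premises as attribute-index -> value conditions
```

For instance, with s = (sunny, hot, high) and counter-examples (sunny, hot, normal) and (rainy, hot, high), the constraints are {2} and {0}, so M = {0, 2} and R_sm keeps only the first and third conditions of s.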

Predicate Calculus for Reduction

To the discriminant rules obtained above, we apply predicate calculus [Leung 1992] to generate more general rules. The following is the list of formulae (where X, Y and Z each represent a conditional statement and ¬ represents complement, i.e. not) used in our MLII system.

1. ¬(¬X) ≡ X
2. X ∧ Y ≡ Y ∧ X (the commutative law of conjunction)
3. X ∧ (Y ∧ Z) ≡ (X ∧ Y) ∧ Z (the associative law of conjunction)
4. X ∨ X ≡ X
5. X ∨ Y ≡ Y ∨ X (the commutative law of disjunction)
6. X ∨ (Y ∧ Z) ≡ (X ∨ Y) ∧ (X ∨ Z) (the distributive law)
7. X ∧ (Y ∨ Z) ≡ (X ∧ Y) ∨ (X ∧ Z) (the distributive law)
8. ¬(X ∨ Y) ≡ (¬X) ∧ (¬Y) (De Morgan's law)
9. ¬(X ∧ Y) ≡ (¬X) ∨ (¬Y) (De Morgan's law)

These laws are used to combine different conditional rules by symbolic resolution in order to obtain generalized conditional rules (meta-rules).

2.4 Reduction

Let Ω denote the description space of the learning domain, B be a set of rules expressed over Ω, and L be the number of rules in B. For any rule in B, we say that an example in Ω fires the rule if the description of the example satisfies the premises of the rule.

Definition 1. Reduction with respect to B is a redescription operator that maps each example s ∈ Ω to the L-dimensional vector [r_1(s), ..., r_L(s)], where the reduced descriptor r_j is given by r_j(s) = 1 if s fires the j-th rule in B, and r_j(s) = 0 otherwise.

The redescription transforms each example in Ω into an L-dimensional description. The class of the example does not change.

Definition 2. From a learning set A and a rule set B, the reduced learning set, denoted by A_B, is generated as follows:

A_B = { ([r_1(s_i), ..., r_L(s_i)], Class(s_i)) | (s_i, Class(s_i)) ∈ A }

where Class(s_i) indicates the class of the example s_i.

The reduced learning set A_B describes the behaviour of B on the examples in A. It is expressed in boolean logic, whatever the initial representations of A and B are. Generalization can be carried out on the reduced learning set to produce a refined set of rules. The refined set of rules is applied to a new subset of the original training examples to obtain a new learning set for further generalization, and so on.

The number of examples in the reduced learning set A_B is generally less than the number of examples in the initial learning set A. However, the reduced learning set must still contain enough information to enable a further generalization, so the number of examples in each data subset should not decrease too much.

2.5 Refinement of Previous Rules

At each learning layer, generalization on a reduced learning set A_B refines the rule set B from the previous layer(s).

First, if a rule in B has good predictive accuracy, this information is implicitly available from the reduced learning set A_B: the rule is often fired by the examples in A and, consequently, the corresponding descriptor in A_B takes the value 1, while the class of these examples is often the same as the rule's class. Hence there is a correlation in A_B between a value of this descriptor and a value of the class information, and rules with good predictive accuracy will be discovered again by the next generalization. This process is stable, as good rules in B are carried over.

Second, the same argument ensures that irrelevant rules are dropped: if a rule is irrelevant, the associated reduced descriptor is irrelevant with respect to A_B too. As generalization is supposed to detect and drop irrelevant descriptors, the rules learned from A_B do not keep previously irrelevant rules.
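
A minimal sketch of the reduction operator follows, reusing the attribute-index/value representation of rule premises from the previous sketch; the helper names are ours, not MLII's.

```python
def fires(rule, description):
    """An example fires a rule when its description satisfies all rule premises.
    Here a rule is represented as a dict {attribute_index: required_value}."""
    return all(description[i] == v for i, v in rule.items())

def reduce_example(rule_set, description):
    """Redescribe one example as the L-dimensional vector [r_1(s), ..., r_L(s)]."""
    return [1 if fires(rule, description) else 0 for rule in rule_set]

def reduced_learning_set(rule_set, learning_set):
    """A_B: behavioural examples (firing vector, class) derived from A and B (Definition 2)."""
    return [(reduce_example(rule_set, desc), cls) for desc, cls in learning_set]
```

Generalization is then applied to the boolean vectors in A_B in the same way as to ordinary examples, which is how the refinement described in Section 2.5 reuses the same induction machinery.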

Third, generalization discovers links among descriptors and classes. In the reduced learning set A_B, examples are described according to the rules in B that they trigger. Hence the triggering of rules can be generalized from A_B: the generalization resolves conflicts arising among the previous rules.

3 Experiments

In this section, we set up a few experiments to compare the predictive accuracy of MLII rules with that of the HCV induction program [Wu 1995].

3.1 Experiments 1 to 4

Table 1 provides a summary of the data sets used in our first four experiments and of the results. The four databases were all copied from the University of California at Irvine machine learning database repository [Murphy & Aha 95], and each contains a certain level of noise. These databases were selected because each of them consists of two standard components created or collected by the original providers: a training set and a test set. The databases have been used "as is": the example ordering has not been changed, nor have examples been moved between the sets.

                            Experiment 1  Experiment 2  Experiment 3  Experiment 4
  Database                  Person 1      Person 2      Labor-Neg 1   Labor-Neg 2
  No. of training examples  100           300           100           300
  Number of attributes      4             4             5             5
  Number of classes         2             2             2             2
  Missing values            10            15            23            10
  Misclassifications        3             7             17            7
  Level of noise            low           low           high          low
  No. of test examples      30            50            50            80
  No. of HCV rules          8             12            16            10
  Accuracy of HCV rules     88.92%        77.33%        81.71%        87.77%
  No. of MLII rules         6             7             10            8
  Accuracy of MLII rules    98.88%        90.73%        92.57%        94.61%

Table 1. Summary of Experiments 1-4.

For each database, we ran each of HCV (Version 2.0) and MLII (with 4 layers) 10 times on the training set; the accuracies listed in Table 1 are the average results on the test set. On all four databases, MLII (with 4 layers) performs better than HCV (Version 2.0), and the accuracy difference between MLII and HCV on each database is statistically significant. Therefore, we conclude that with a carefully selected number of layers, MLII achieves a significant improvement over HCV in terms of rule accuracy.
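
The evaluation protocol just described can be summarised by the following sketch, where hcv, mlii and classify are placeholders of ours (hcv and mlii return rule sets, classify applies a rule set to a description) and accuracy is the usual fraction of correctly classified test examples.

```python
def accuracy(rules, classify, test_set):
    """Fraction of test examples whose predicted class matches the true class."""
    correct = sum(1 for desc, cls in test_set if classify(rules, desc) == cls)
    return correct / len(test_set)

def compare(train_set, test_set, hcv, mlii, classify, runs=10, layers=4):
    """Average test-set accuracy of HCV and of 4-layer MLII over repeated runs."""
    hcv_acc = sum(accuracy(hcv(train_set), classify, test_set)
                  for _ in range(runs)) / runs
    mlii_acc = sum(accuracy(mlii(train_set, layers), classify, test_set)
                   for _ in range(runs)) / runs
    return hcv_acc, mlii_acc
```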

3.2 Experiment 5

The purpose of this experiment is to examine how the accuracy of the MLII rules generated at each layer changes in an n-layer induction. The data set used is Labor-Neg 1, the same as in Experiment 3. Table 2 shows the results, and Figure 1 provides a visual illustration of the same results.

  No. of Layers  Rule Set  Test Set Accuracy
  1              B         73.331%
  2              C1        63.214%
                 C2        69.643%
  3              D1        54.123%
                 D2        70.813%
                 D3        80.771%
  4              E1        45.634%
                 E2        60.811%
                 E3        75.123%
                 E4        92.512%

Table 2. Results of Experiment 5.

From the graph in Figure 1, it is clear that the accuracy of the rules induced at the first layer on the same test data decreases as the number of layers used by MLII increases. The highest first-layer accuracy is that of HCV induction (just one layer) and the lowest is that of 4-layer MLII. The reason is that HCV uses the whole set of 300 training examples to generate its rules, while MLII uses only a subset (one n-th of the training set) at the first layer for HCV to generate an initial rule set. We have tried up to 4 layers with MLII, and the rule accuracy on the test set always increases through to the last layer.

A question arises here: what is the optimal number of layers for MLII on a training set? Based on the various experiments we have carried out, it depends on the size and the noise level of the training set. If the training set is very large (e.g. 5000 examples or more) and contains a high level of noise, more layers of learning allow deeper rule refinement to dilute the noise. Conversely, if we use a large number of layers on a small data set, MLII cannot gain enough information to generate approximate (redundant) rules for later successive refinement, and this in turn affects the completeness and consistency of the final generated rules.

For each curve in Figure 1, we can see that the test-set accuracy increases significantly from the first-layer to the second-layer rules, still increases from the second to the third layer, and the improvement diminishes as the number of layers increases. This indicates that the approximate rules generated at the first layer are successively refined at the following learning layers (i.e., the rules become more consistent and accurate) and finally reach an optimal level at which they are no longer redundant.

Fig. 1. Rule Accuracy at Each Induction Layer of Experiment 5.

Therefore, the successive learning should then be stopped and the optimal rules taken as the final rules. In general, it is not the case that the more layers we use, the more accurate the final rules will be.

4 Conclusions

Multi-layer induction learns accurate rules in an incremental manner. It handles subsets of training examples sequentially. Compared to handling training examples one by one, as existing incremental learning algorithms do, this sequential incrementality is more flexible, because the size of the data subsets is controlled by data partitioning in multi-layer induction. Multi-layer induction suits noisy domains, because data partitioning dilutes the effects of noise across the data subsets. Five experiments were carried out in this paper to quantify the gains of MLII, and a significant improvement in rule accuracy has been achieved.

Multi-layer induction is designed for handling large and/or noisy data sets. With medium-sized, noise-free data sets, we have not found much improvement of MLII over HCV induction in rule accuracy. The information quality of the data sets is a critical factor in determining the number of layers in MLII. The noise level, the number of training examples, the numbers of attributes and classes, and the value domains of the attributes are all contributing factors when applying MLII to a particular data set.

Future work will involve applying MLII to other induction programs, such as C4.5 [Quinlan 1993], extending the experiments to larger data sets, and comparing with other incremental learning methods such as case-based learning [Ram 1990], which learns incrementally case by case and treats each case as a chunk of partially matched rules.

References

[Kodratoff 1984] Kodratoff, Y. (1984). Learning complex structural descriptions from examples. Computer Vision, Graphics and Image Processing 27.
[Langley 1996] Langley, P. (1996). Elements of Machine Learning. Morgan Kaufmann.
[Leung 1992] Leung, K. T. (1992). Elementary Set Theory (3rd ed.). Hong Kong University Press.
[Michalski 1984] Michalski, R. S. (1984). A theory and methodology for inductive learning. Artificial Intelligence 20(2).
[Michalski 1985] Michalski, R. S. (1985). Knowledge repair mechanisms: Evolution versus revolution. In Proceedings of the Third International Machine Learning Workshop, 116-119. Rutgers University.
[Murphy & Aha 95] Murphy, P. M. & Aha, D. W. (1995). UCI Repository of Machine Learning Databases, Machine-Readable Data Repository. University of California, Department of Information and Computer Science, Irvine, CA.
[Quinlan 1993] Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann.
[Ram 1990] Ram, A. (1990). Incremental learning of explanation patterns and their indices. In Proceedings of the Seventh International Conference on Machine Learning, 49-57. Morgan Kaufmann.
[Schlimmer and Fisher 1986] Schlimmer, J. and Fisher, D. (1986). A case study of incremental concept induction. In Proceedings of the Fifth National Conference on Artificial Intelligence, 496-501. Morgan Kaufmann.
[Tim 1993] Tim, N. (1993, Feb). Discriminant generalization in logic program. Knowledge Representation and Organization in Machine Learning 14(3), 345-351.
[Wu 1995] Wu, X. (1995). Knowledge Acquisition from Databases. Ablex.