Building Intelligent Learning Database Systems


1. Intelligent Learning Database Systems: A Definition (Wu 1995, Wu 2000)
2. Induction: Mining Knowledge from Data
   - Decision tree construction (ID3 and C4.5)
   - Generating variable-valued logic rules (HCV)
3. Deduction of Induction Results
   - No match and multiple match
4. KEshell2: An Intelligent Learning Database System
Practical Issues

Copyright © 2014: Xindong Wu, University of Vermont

1. Intelligent Learning Database Systems: A Definition

An ILDB system (Wu 1995, Wu 2000) provides mechanisms for
1. preparation and translation of standard (e.g. relational) database information into a form suitable for use by its induction engines,
2. using induction techniques to extract knowledge from databases, and
3. interpreting the extracted knowledge to solve users' problems by deduction (or KBS technology).

2. Induction: Mining Knowledge from Data

Induction paradigms:
- Supervised classification (e.g. C4.5-like algorithms), with or without background knowledge
- Association analysis (e.g. Apriori and FP-Growth)
- Unsupervised clustering (such as k-means, COBWEB and BIRCH)
- Discovery of quantitative laws (e.g. BACON-like systems)

Criteria for evaluating induction algorithms:
- Time complexity
- Output compactness
- Accuracy in different data environments
- Database characteristics

2.1 Rule Induction: A Simplified Example

ORDER  X1  X2  X3  X4  CLASS
1      1   a   #   1   F
2      1   a   a   0   F
3      1   b   c   1   T
4      0   b   b   0   T
5      0   a   c   1   T
6      1   b   a   #   T
7      1   c   c   0   F
8      1   c   b   1   F
9      0   c   b   0   T
10     0   a   a   0   T
11     0   c   c   1   F
12     0   c   a   0   T
13     1   a   b   0   F
14     0   a   a   1   T
15     0   b   a   1   T

Rules produced by HCV:
The F class: [X2=[c,a]] [X1=[1]]  ∨  [X2=[c]] [X4=[1]]
The T class: [X2=[b]]  ∨  [X1=[0]] [X2=[a]]  ∨  [X1=[0]] [X4=[0]]

Advantages of the rules over the raw data:
(1) Implied but not explicit in the database
(2) Smaller in volume
(3) More general in description
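To make the selector notation concrete, here is a minimal sketch (not part of the original slides) that encodes the training table above and applies the two HCV rules; the grouping of selectors into disjuncts and their assignment to the F and T classes is a reconstruction from the garbled source, so treat it as illustrative.

```python
# Training examples from the slide: (X1, X2, X3, X4, CLASS); '#' is a Don't Care value.
examples = [
    (1, 'a', '#', 1, 'F'), (1, 'a', 'a', 0, 'F'), (1, 'b', 'c', 1, 'T'),
    (0, 'b', 'b', 0, 'T'), (0, 'a', 'c', 1, 'T'), (1, 'b', 'a', '#', 'T'),
    (1, 'c', 'c', 0, 'F'), (1, 'c', 'b', 1, 'F'), (0, 'c', 'b', 0, 'T'),
    (0, 'a', 'a', 0, 'T'), (0, 'c', 'c', 1, 'F'), (0, 'c', 'a', 0, 'T'),
    (1, 'a', 'b', 0, 'F'), (0, 'a', 'a', 1, 'T'), (0, 'b', 'a', 1, 'T'),
]

def f_rule(x1, x2, x3, x4):
    # [X2=[c,a]][X1=[1]]  OR  [X2=[c]][X4=[1]]
    return (x2 in ('c', 'a') and x1 == 1) or (x2 == 'c' and x4 == 1)

def t_rule(x1, x2, x3, x4):
    # [X2=[b]]  OR  [X1=[0]][X2=[a]]  OR  [X1=[0]][X4=[0]]
    return x2 == 'b' or (x1 == 0 and x2 == 'a') or (x1 == 0 and x4 == 0)

# Check that each rule covers exactly its own class on the 15 examples.
for x1, x2, x3, x4, cls in examples:
    predicted = 'F' if f_rule(x1, x2, x3, x4) else 'T' if t_rule(x1, x2, x3, x4) else '?'
    assert predicted == cls
```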

2.2 Decision Tree Construction

Using the same 15 training examples as in 2.1, a decision tree for the two classes is (reconstructed from the slide's tree figure):

X1 = 0:
    X3 = a: T
    X3 = b: T
    X3 = c:
        X2 = a: T
        X2 = c: F
X1 = 1:
    X2 = a: F
    X2 = b: T
    X2 = c: F

Decision Tree Construction (2)

S1: T ← the whole training set. Create a T node.
S2: If all examples in T are positive examples of a specific class, create a 'yes' node with T as its parent and stop; if all examples in T are negative, create a 'no' node with T as its parent and stop.
S3: Select an attribute X with values V_1, ..., V_N and partition T into subsets T_1, ..., T_N according to their values on X. Create N nodes T_i (i = 1, ..., N) with T as their parent and X = V_i as the label of the branch from T to T_i.
S4: For each T_i: set T ← T_i and go to S2.
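As a concrete illustration of steps S1-S4, here is a minimal recursive sketch in Python (not from the slides). The attribute-selection function is left abstract, since S3 does not yet say how X is chosen; the ID3/C4.5 criteria on the following overheads fill that in.

```python
def build_tree(examples, attributes, select_attribute):
    """examples: list of (feature_dict, label); attributes: list of attribute names;
    select_attribute: callable implementing S3's choice (e.g. information gain)."""
    labels = {label for _, label in examples}
    if len(labels) == 1:                      # S2: all examples in one class -> leaf
        return labels.pop()
    if not attributes:                        # no test left: majority leaf (a common fallback)
        return max(labels, key=lambda c: sum(1 for _, l in examples if l == c))
    x = select_attribute(examples, attributes)          # S3: pick an attribute X
    tree = {x: {}}
    values = {ex[x] for ex, _ in examples}
    for v in values:                                     # S3/S4: one branch per value V_i
        subset = [(ex, l) for ex, l in examples if ex[x] == v]
        remaining = [a for a in attributes if a != x]
        tree[x][v] = build_tree(subset, remaining, select_attribute)
    return tree
```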

ID3 and C4.5: Key ideas

Suppose T = PE ∪ NE, where PE is the set of positive examples and NE is the set of negative examples, with p = |PE| and n = |NE|.

I(p,n) = \begin{cases} -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n} & \text{when } p \neq 0 \text{ and } n \neq 0 \\ 0 & \text{otherwise} \end{cases}   (1)

If attribute X with value domain {v_1, ..., v_N} is used for the root of the decision tree, it will partition T into {T_1, ..., T_N}, where T_i contains those examples in T that have value v_i of X. Let T_i contain p_i examples of PE and n_i of NE. The expected information required for the subtree for T_i is I(p_i, n_i).

ID3 and C4.5: Key ideas (2)

The expected information required for the tree with X as root, EI(X), is then obtained as a weighted average:

EI(X) = \sum_{i=1}^{N} \frac{p_i + n_i}{p + n} I(p_i, n_i)   (2)

where the weight for the i-th branch is the proportion of the examples in T that belong to T_i. The information gained by branching on X, G(X), is therefore

G(X) = I(p,n) - EI(X).   (3)

ID3 and C4.5: Key ideas (3)

ID3 examines all candidate attributes, chooses X to maximize G(X), and then uses the same process recursively to construct decision trees for the residual subsets T_1, ..., T_N.

C4.5 adopts a new heuristic, the gain ratio criterion, instead of G(X), for selecting tests in decision tree generation. In the gain ratio criterion, G(X)/IV(X) is used in place of G(X), where

IV(X) = -\sum_{i=1}^{N} \frac{p_i + n_i}{p + n} \log_2\frac{p_i + n_i}{p + n}   (4)
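A short sketch (not from the slides) of equations (1)-(4) for a two-class problem; here `split` maps each value v_i of X to the (p_i, n_i) counts of its subset T_i.

```python
from math import log2

def info(p, n):
    """Equation (1): expected information of a set with p positives and n negatives."""
    if p == 0 or n == 0:
        return 0.0
    f = p / (p + n)
    return -f * log2(f) - (1 - f) * log2(1 - f)

def gain_and_ratio(p, n, split):
    """split: {value_i: (p_i, n_i)}. Returns G(X) of eq. (3) and the gain ratio G(X)/IV(X)."""
    ei = sum(((pi + ni) / (p + n)) * info(pi, ni) for pi, ni in split.values())               # eq. (2)
    iv = -sum(((pi + ni) / (p + n)) * log2((pi + ni) / (p + n)) for pi, ni in split.values()) # eq. (4)
    g = info(p, n) - ei                                                                       # eq. (3)
    return g, (g / iv if iv > 0 else 0.0)

# Example: attribute X1 on the 15 training examples (p = 9 T's, n = 6 F's);
# X1 = 0 holds 7 T and 1 F, X1 = 1 holds 2 T and 5 F.
print(gain_and_ratio(9, 6, {0: (7, 1), 1: (2, 5)}))
```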

Other topics in C4.5-like algorithms:
- Incremental induction: the windowing technique
- Noise handling: halt tree growth when no more significant information gain can be found
- Sensible mechanisms for handling missing information
- Post-pruning of decision trees
- Decompiling decision trees into production rules
- Binarization of decision trees
- Structured induction
- Real-valued attributes: discretization
- Classification of more than 2 classes with a decision tree

2.3 Generating variable-valued logic rules

The variable-valued logic is a calculus for representing decision problems in which decision variables can take on some range of values. Its principal syntactic entity is a selector, with the general form

[X # R]   (5)

where X is a variable or attribute, # is a relational operator (such as =, ≠, <, >, ≤, and ≥), and R, called a reference, is a list of one or more values that X could take on.

A well-formed rule in the logic is similar to a production rule, but with selectors as the basic components of both its left-hand and right-hand sides.

The HCV algorithm

Let the attributes be {X_1, ..., X_a}, with |{X_1, ..., X_a}| = a; the positive examples of a specific concept C be PE = {e+_1, ..., e+_p}, with |PE| = p; and the negative examples be NE = {e-_1, ..., e-_n}, with |NE| = n.

1. Partition PE into p' (p' ≤ p) intersecting groups,
2. Apply a set of heuristics (in HFL) to find a conjunctive formula Hfl for each intersecting group,
3. Give the final disjunctive rule for the concept C by logically ORing all the Hfl's.

The HCV algorithm (2)

Definition 1. The extension matrix of e+_k against NE is EM_k = (r_{ijk})_{n×a}, where

r_{ijk} = \begin{cases} * & \text{when } v^+_{jk} = NEM_{ij} \\ NEM_{ij} & \text{when } v^+_{jk} \neq NEM_{ij} \end{cases}

NEM = (NEM_{ij})_{n×a} is the matrix of the negative examples (NEM_{ij} is the j-th attribute value of the i-th negative example), v^+_{jk} is the j-th attribute value of e+_k, and * denotes a dead element.

Definition 2. In an EM_k, a set of n nondead elements r_{i j_i} (i = 1, ..., n; j_i ∈ {1, ..., a}) that come from the n different rows is called a path in the extension matrix. A path {r_{1 j_1}, ..., r_{n j_n}} in an EM_k corresponds to a conjunctive formula

L = \bigwedge_{i=1}^{n} [X_{j_i} \neq r_{i j_i}]

which covers e+_k against NE, and vice versa.

The HCV algorithm (3)

Definition 3. The matrix EMD = (r_{ij})_{n×a} with

r_{ij} = \begin{cases} * & \text{when } \exists\, k_1 \in \{i_1, \ldots, i_k\}: EM_{k_1}(i,j) = * \\ \bigcup_{k_2=1}^{k} EM_{i_{k_2}}(i,j) & \text{otherwise} \end{cases}

is called the disjunction matrix of the positive example set {e+_{i_1}, ..., e+_{i_k}} against NE, or the disjunction matrix of EM_{i_1}, ..., EM_{i_k}.

A path {r_{1 j_1}, ..., r_{n j_n}} in the EMD of {e+_{i_1}, ..., e+_{i_k}} against NE corresponds to a conjunctive formula, or cover,

L = \bigwedge_{i=1}^{n} [X_{j_i} \neq r_{i j_i}]

which covers all of {e+_{i_1}, ..., e+_{i_k}} against NE, and vice versa.

Definition 4. If there exists at least one path in the EMD of {e+_{i_1}, ..., e+_{i_k}} against NE, all the positive examples intersect, and the positive example set is called an intersecting group.

The HCV algorithm (4): Heuristics used in HCV

- Partitioning: divide all the positive examples into intersecting groups. Start with the lowest-ordered, uncovered positive example each time and try to form an intersecting group as large as possible. Use the following heuristics to generate a conjunctive rule for each intersecting group.
- The fast strategy: in an extension matrix EM_k = (r_{ij})_{n×a}, if there is no dead element in some column j, then [X_j ≠ r_j], where r_j = \bigcup_{i=1}^{n} r_{ij}, is chosen as the one-selector cover for EM_k.
- The precedence strategy: when an r_{ij} in column j is the only nondead element of a row i, the selector [X_j ≠ r_j] is called an inevitable selector and is chosen with top precedence.
- The elimination strategy: when each appearance of a nondead element in the j_1-th column of some row is always coupled with another nondead element in the j_2-th column of the same row, X_{j_1} is called an eliminable attribute and is thus eliminated by X_{j_2}.
- The least-frequency strategy: exclude a least-frequency selector, i.e. one whose corresponding column in the extension matrix has the fewest nondead elements.
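To make the extension matrix and the fast strategy concrete, here is a minimal sketch (not from the slides; function and variable names are illustrative, and Definition 1 is used as reconstructed above). It builds EM_k for one positive example against NE and applies the fast strategy when a dead-element-free column exists.

```python
DEAD = None  # the dead element '*'

def extension_matrix(pos_example, negatives):
    """Definition 1: EM_k of one positive example against NE.
    pos_example: list of a attribute values; negatives: n lists of a values (the NEM).
    (Don't Care '#' values are treated as ordinary values in this simplified sketch.)"""
    return [[DEAD if neg[j] == pos_example[j] else neg[j]
             for j in range(len(pos_example))]
            for neg in negatives]

def fast_strategy(em):
    """If some column j of EM_k has no dead element, the single selector
    [X_j != r_j] with r_j = union of that column covers the example against NE."""
    n, a = len(em), len(em[0])
    for j in range(a):
        column = [em[i][j] for i in range(n)]
        if DEAD not in column:
            return j, set(column)      # attribute index and reference set r_j
    return None                        # fall back to the other heuristics

# Example with the slide's data: e+ = order 3 (1, b, c, 1) against the F-class examples.
negatives = [[1, 'a', '#', 1], [1, 'a', 'a', 0], [1, 'c', 'c', 0],
             [1, 'c', 'b', 1], [0, 'c', 'c', 1], [1, 'a', 'b', 0]]
em = extension_matrix([1, 'b', 'c', 1], negatives)
print(fast_strategy(em))   # -> (1, {'a', 'c'}): the X2 column has no dead element,
                           #    giving the selector [X2 != [a,c]], i.e. [X2 = b]
```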

2.4 Processing real-valued attributes

The simplest class-separating method: place interval borders between each adjacent pair of examples that belong to different classes (a short sketch follows below).

Bayesian classifiers:
1. For each class C_i, draw a probability distribution curve P(C_i|x)·P(C_i), where P(C_i|x) indicates the probability of an example belonging to class C_i when the example takes the value x on the continuous attribute in question.
2. Cut points are placed so that, between each two adjacent intervals, the class of the predominant curve changes.
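A minimal sketch (not from the slides) of the simplest class-separating method described above: sort the examples by the continuous attribute and put a border halfway between every adjacent pair whose classes differ.

```python
def class_separating_borders(values, labels):
    """values: continuous attribute values; labels: their class labels.
    Returns candidate interval borders (midpoints between class changes)."""
    pairs = sorted(zip(values, labels))
    return [(a + b) / 2.0
            for (a, la), (b, lb) in zip(pairs, pairs[1:])
            if la != lb and a != b]

# e.g. class_separating_borders([1.0, 1.5, 2.0, 3.0], ['T', 'T', 'F', 'F']) -> [1.75]
```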

Processing real-valued attributes (2)

Information gain methods: binarization of continuous attributes (in C4.5, for example). Find a cut point (e.g. the average) between each adjacent pair of example values of two different classes, and split the current interval in which the cut point is located if (1) splitting on that value produces an information gain, and (2) the number of examples in the current interval is greater than a specified threshold.
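The following sketch (not from the slides) illustrates this binary cut search: it evaluates each candidate cut point between differently-classed values and keeps the one with the largest positive information gain, subject to a minimum-size threshold; the helper `entropy` is a two-class version of I(p,n) in equation (1).

```python
from math import log2

def entropy(labels):
    """Two-class version of I(p, n) in equation (1)."""
    p = sum(1 for l in labels if l == 'T')
    n = len(labels) - p
    if p == 0 or n == 0:
        return 0.0
    f = p / (p + n)
    return -f * log2(f) - (1 - f) * log2(1 - f)

def best_binary_cut(values, labels, min_examples=2):
    """Return (cut_point, gain) with the largest information gain, or None if no cut
    gains information or the interval holds too few examples (the slide's two conditions)."""
    if len(values) <= min_examples:
        return None
    pairs = sorted(zip(values, labels))
    base = entropy([l for _, l in pairs])
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][1] == pairs[i][1] or pairs[i - 1][0] == pairs[i][0]:
            continue                           # only cut between differently-classed values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left, right = [l for _, l in pairs[:i]], [l for _, l in pairs[i:]]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if gain > 0 and (best is None or gain > best[1]):
            best = (cut, gain)
    return best
```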

2.5 Noise handling

Sources of noise:
- Erroneous attribute values
- Missing attribute values ("Don't Know" values, denoted by ?)
- Misclassifications (including contradictions)
- Uneven distribution of training examples in the example space
- Redundant data: multiple copies of the same examples

"Don't Care" (#) values are not noise!

Stages of noise handling (1)

Preprocessing of training examples:
- Remove redundant and contradictory data, with attention to ? and # values
- Replace ? values with the most frequent attribute values (using examples of the same class only, or the whole training set)
- Sensible mechanisms for handling ? values
- Generate negative/positive examples under the closed world assumption

Induction time (pre-pruning): avoid overfitting.

Stages of noise handling (2)

Post-pruning of induction results:
- The TRUNC method in AQ15
- Reduced error pruning, with a separate pruning set of examples
- Pessimistic pruning in C4.5, with the user-specified confidence level
- Transformation of the representation language of induction results, and pruning the new representation

Deduction time: process no match and multiple match examples.

3. Deduction of Induction Results

- Single match: a test example matches the description of a specific class
- No match
- Multiple match

3.1 No match

- Largest class
- Measure of fit
- Fuzzy edges

Measure of fit

1. MF for a selector sel against a no-match example e:

MF(sel, e) = \begin{cases} 1 & \text{if selector } sel \text{ is satisfied by } e \\ v/V & \text{otherwise} \end{cases}   (6)

where v is the number of disjunctively linked attribute values in sel and V is the size of the attribute's domain.

2. MF for a conjunctive rule conj:

MF(conj, e) = \frac{n(conj)}{N} \prod_k MF(sel_k, e)   (7)

where n(conj) is the number of examples in the training set that conj covers and N is the total number of examples.

3. MF for a class c_i is the probabilistic sum of its conjunctive rules:

MF(c_i, e) = MF(conj_1, e) + MF(conj_2, e) - MF(conj_1, e) \cdot MF(conj_2, e)   (8)

4. Assign e to the closest class, i.e. the class with the highest MF value.
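A small sketch (not from the slides) of the measure-of-fit computation for a no-match example; rules are represented as lists of selectors (attribute, reference set), and domain sizes are supplied by the caller.

```python
from functools import reduce

def mf_selector(sel, example, domain_sizes):
    """Equation (6): 1 if the selector is satisfied, else v/V."""
    attr, reference = sel
    if example[attr] in reference:
        return 1.0
    return len(reference) / domain_sizes[attr]

def mf_conjunction(conj, example, domain_sizes, n_covered, n_total):
    """Equation (7): coverage weight times the product of selector fits."""
    prod = reduce(lambda acc, s: acc * mf_selector(s, example, domain_sizes), conj, 1.0)
    return (n_covered / n_total) * prod

def mf_class(conj_fits):
    """Equation (8): probabilistic sum of a class's conjunctive-rule fits."""
    total = 0.0
    for f in conj_fits:
        total = total + f - total * f
    return total

# A no-match example is then assigned to the class with the highest MF value, e.g.
# max(classes, key=lambda c: mf_class([...per-rule MF values for class c...])).
```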

3.2 Multiple match

- First hit
- Largest class
- Largest conjunctive rule
- Estimate of Probability
- Fuzzy Interpretation of Discretized Intervals

Estimate of Probability

EP(conj, e) = \begin{cases} \frac{n(conj)}{N} & \text{if } conj \text{ is satisfied by } e \\ 0 & \text{otherwise} \end{cases}   (9)

EP(c_i, e) = EP(conj_1, e) + EP(conj_2, e) - EP(conj_1, e) \cdot EP(conj_2, e)   (10)
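A brief sketch (not from the slides) of how equations (9) and (10) resolve a multiple match: each matching rule contributes its training-set coverage, and the class whose rules accumulate the largest probabilistic sum wins.

```python
def ep_conjunction(satisfied, n_covered, n_total):
    """Equation (9): n(conj)/N if the rule fires on e, else 0."""
    return (n_covered / n_total) if satisfied else 0.0

def ep_class(rule_eps):
    """Equation (10): probabilistic sum over the class's conjunctive rules."""
    total = 0.0
    for ep in rule_eps:
        total = total + ep - total * ep
    return total

# Multiple match: among the classes whose rules fire on e, pick the one
# with the largest ep_class(...) value.
```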

3.2 Fuzzy Interpretation of Discretized Intervals (Wu 1999)

Fuzzy borders: each discretized interval is associated with a specific membership function.

[Figure: membership function for a discretized interval I_i, with a fuzzy spread ls around its borders x_left and x_right.]

Fuzzy interpretation: a value can be classified into a few different intervals at the same time, with varying degrees, and the interval with the greatest degree is chosen as the value's discrete value.

Results: fuzzy matching works well with no matches, but is less encouraging with multiple matches; hence an integration of probability estimation and fuzzy matching is recommended.
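A minimal sketch (not from the slides) of the fuzzy interpretation idea, assuming trapezoidal membership functions whose slopes span a spread ls around each interval border x_left and x_right; the exact membership shape used in Wu (1999) may differ.

```python
def membership(x, left, right, ls):
    """Degree to which value x belongs to the interval [left, right],
    with linear (trapezoidal) fuzzy borders of half-width ls."""
    if left + ls <= x <= right - ls:
        return 1.0
    if x < left - ls or x > right + ls:
        return 0.0
    border = left if x < left + ls else right
    return 0.5 + (x - border) / (2 * ls) * (1 if border == left else -1)

def fuzzy_discretize(x, intervals, ls):
    """Classify x into the interval (index) with the greatest membership degree."""
    return max(range(len(intervals)),
               key=lambda i: membership(x, intervals[i][0], intervals[i][1], ls))
```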

4. KEshell2: An intelligent learning database system

[Architecture diagram: Monitor, K.A. Engine, I/D Engine, KBMS, Utility, Access Storage Interface, DBMS and OS components, built over the KB and DB stores.]

- Monitor: a man-machine interface
- KB: knowledge bases
- DB: data bases
- OS: operating system facilities
- Access Storage Interface: KB/DB operators
- Utility: a set of common procedures
- DBMS: a database management subsystem
- KBMS: a KB maintenance subsystem
- K.A. Engine: a knowledge acquisition engine
- I/D Engine: an inference/deduction engine

Practical Issues in Building ILDB Systems

- Dealing with noise and with different types of data (such as symbolic and real-valued): this has received wide attention.
- Instance selection: select or search for a portion of a large database that can be used in data mining instead of the whole database.
- Structured induction: decompose a complex problem into a number of subproblems by using domain knowledge, and apply an induction algorithm to each of the subproblems.
- Incremental induction, for large, dynamic real-world databases.

5. Conclusions

- All the functions and capacities shown in KEshell2 demonstrate that the target of building intelligent learning database systems to implement automatic knowledge acquisition from databases is no longer difficult.
- Knowledge discovery from databases is still an important research frontier for both machine learning and database technology.
- ILDB systems have wide potential application to various classification problems, such as diagnosis, pattern recognition, prediction, and decision making, where there are large amounts of data to be sorted.
- Data mining has goals distinct from those of related fields (such as machine learning and databases), and requires distinctive tools. An ILDB system is one such tool.