Aggregation and Selection in Relational Data Mining

1 Aggregation and Selection in Relational Data Mining
Celine Vens, Anneleen Van Assche, Hendrik Blockeel, Sašo Džeroski
Department of Computer Science - K.U.Leuven
Department of Knowledge Technologies - Jozef Stefan Institute, Slovenia
C. Vens, A. Van Assche, H. Blockeel, S. Džeroski: Aggregation and Selection in Relational Data Mining

2 Outline
- Introduction

3 Relational Data Mining
Data mining: searching for patterns in (large) databases.
Propositional (classical) data mining:
- data is stored in a single table
- patterns involve intra-tuple relations
Relational data mining:
- data is stored in multiple tables (a relational database)
- patterns involve inter-tuple or inter-table relations
- how to deal with 1-n or m-n relations (sets)?

4 Working Example
Current relational learners: 2 approaches to dealing with sets

6 First approach: Aggregation
- Use SQL-like aggregation to summarize each set in one big table
- Apply a classical data mining technique (e.g. a decision tree inducer)
- Optimized for highly non-determinate domains
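The aggregation approach can be sketched as follows. This is a minimal illustration, not the authors' system: the person/book data, the attribute names and the choice of aggregates (count, max, average) are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical 1-n relation: each person owns a set of (genre, price) books.
books = {
    "ann": [("computer", 40.0), ("computer", 55.0), ("novel", 12.0)],
    "bob": [("novel", 9.0)],
    "eve": [],
}

def propositionalize(owned):
    """Summarize each person's set of books as one fixed-length row."""
    table = {}
    for person, items in owned.items():
        prices = [price for _, price in items]
        table[person] = {
            "count": len(items),
            "max_price": max(prices, default=0.0),
            # Assumption: an empty set gets an average of 0.0.
            "avg_price": mean(prices) if prices else 0.0,
        }
    return table

rows = propositionalize(books)
```

Each row of `rows` has the same attributes regardless of how many books a person owns, so a classical single-table learner can be applied to it.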

7 Second approach: Selection
- Apply a relational data mining technique (e.g. a relational decision tree inducer)
- Test for the existence of specific elements in the set
- Optimized for structurally complex domains
- e.g. ILP (Inductive Logic Programming):
  - database and patterns in Prolog
  - possibility to add background knowledge
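A selection test asks whether some element of the set satisfies a condition. The sketch below mimics such an existential test in Python rather than Prolog; the data, the price threshold and the helper names are hypothetical.

```python
# Hypothetical data: each person owns a set of (genre, price) books.
books = {
    "ann": [("computer", 40.0), ("novel", 12.0)],
    "bob": [("novel", 9.0)],
}

def exists(items, condition):
    """Selection test: does at least one element of the set satisfy the condition?"""
    return any(condition(item) for item in items)

def is_expensive_computer_book(item):
    """A conjunction of conditions on a single element of the set."""
    genre, price = item
    return genre == "computer" and price > 30.0

# Persons covered by the pattern "owns an expensive computer book".
covered = {person for person, items in books.items()
           if exists(items, is_expensive_computer_book)}
```

Note that the test says nothing about how many matching elements exist, which is exactly what the aggregation approach adds.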

8 Example concepts
1. Persons that have two books.
2. Persons that have a computer book.
3. Persons that have two computer books.
How to express concept 3?
- Selective methods need an aggregate function in the background knowledge.
- Aggregating methods need a separate relation for each genre.
Solution: combine aggregation and selection in the context of relational data mining.
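The three example concepts can be written as predicates over a person's set of books. In this sketch, books are hypothetical (genre, price) pairs and "two" is read as "at least two"; both are assumptions.

```python
def count_where(items, condition=lambda item: True):
    """Aggregate (count) over the elements that pass a selection."""
    return sum(1 for item in items if condition(item))

def is_computer(item):
    return item[0] == "computer"

def concept1(items):
    """Two books: aggregation only."""
    return count_where(items) >= 2

def concept2(items):
    """A computer book: selection only."""
    return any(is_computer(item) for item in items)

def concept3(items):
    """Two computer books: aggregation applied to a selection."""
    return count_where(items, is_computer) >= 2

mixed = [("computer", 40.0), ("novel", 12.0)]
```

The set `mixed` satisfies concepts 1 and 2 but not concept 3, which shows that concept 3 is not simply the conjunction of the other two.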

9 Outline
- Introduction
- Decision Trees
- Combining selection and aggregation

10 Decision Trees
- One of the most widely used and practical data mining methods
- Each internal node contains a test on some attribute
- Each leaf contains a prediction
- Classification of a new instance

11 Decision Trees: learning them
Divide & conquer algorithm. Pseudocode:

    grow_node(Node, Examples):
        IF stop criterion:
            assign majority class from Examples to Node
        ELSE
            generate all possible tests for Node
            associate best test with Node
            grow two child nodes Left and Right
            split Examples into ExamplesPass and ExamplesFail
            grow_node(Left, ExamplesPass)
            grow_node(Right, ExamplesFail)
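The divide-and-conquer procedure can be sketched as runnable Python. The slides do not say how the best test is scored or what the stop criterion is, so the weighted Gini impurity and the "pure node or no useful test" stop rule used here are assumptions, as are the data and names.

```python
from collections import Counter

def majority(examples):
    """Most frequent class label among (instance, label) pairs."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def gini(examples):
    n = len(examples)
    counts = Counter(label for _, label in examples)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split(examples, test):
    passed = [e for e in examples if test(e[0])]
    failed = [e for e in examples if not test(e[0])]
    return passed, failed

def grow_node(examples, tests):
    """Return a leaf label or a (test_name, left_subtree, right_subtree) tuple."""
    best = None
    if len({label for _, label in examples}) > 1:
        for name, test in tests.items():
            passed, failed = split(examples, test)
            if not passed or not failed:
                continue  # a test that does not split the examples is useless
            score = (len(passed) * gini(passed)
                     + len(failed) * gini(failed)) / len(examples)
            if best is None or score < best[0]:
                best = (score, name, passed, failed)
    if best is None:  # stop criterion (assumed): pure node or no useful test
        return majority(examples)
    _, name, passed, failed = best
    return (name, grow_node(passed, tests), grow_node(failed, tests))

examples = [({"count": 3}, "pos"), ({"count": 1}, "neg"),
            ({"count": 2}, "pos"), ({"count": 0}, "neg")]
tests = {"count>=2": lambda x: x["count"] >= 2}
tree = grow_node(examples, tests)
```

The left subtree holds the examples that pass the test and the right subtree those that fail, matching ExamplesPass and ExamplesFail in the pseudocode.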

12 Relational decision trees: learning them
- Upgrade of the classical algorithm: Tilde [Blockeel and De Raedt 98]
- Trees are relational: the test in an internal node contains first-order logic literals
- Selective approach (ILP)
- Tests can introduce variables, so the possible tests may differ at each node

13 Adding aggregation
- The user specifies the basic components: aggregate functions, sets to be aggregated, query to generate the set to be aggregated
- Aggregate conditions are created, using discretization
- Aggregate conditions are added to the set of possible tests
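One way to turn an aggregate function into a handful of candidate tests is sketched below. The slides only say that discretization is used; taking midpoints between observed aggregate values as thresholds is an assumption, and the function and parameter names are hypothetical.

```python
def aggregate_conditions(name, aggregate, sets, max_thresholds=3):
    """Build candidate tests 'aggregate(set) >= t' for a few thresholds t."""
    values = sorted({aggregate(s) for s in sets})
    # Assumed discretization: midpoints between consecutive observed values.
    midpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]
    step = max(1, len(midpoints) // max_thresholds)
    chosen = midpoints[::step][:max_thresholds]
    # t=t binds the current threshold to each generated test.
    return {f"{name} >= {t}": (lambda s, t=t: aggregate(s) >= t)
            for t in chosen}

# Hypothetical sets of book ids, one set per person.
sets = [[1, 2, 3], [1], [], [4, 5]]
conditions = aggregate_conditions("count", len, sets)
```

The resulting dictionary maps a readable test name to a boolean function, so the conditions can be added to the pool of possible tests at a node.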

14 Adding selections to aggregation: first manner
- If a node contains an aggregation, any node in its left subtree can add a selection within that aggregate condition
- Local search within the aggregate condition
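Refining an aggregate condition with a selection can be pictured as adding a filter on the set before the aggregate is computed. The class below is a hypothetical illustration restricted to count aggregates; it is not Tilde's internal representation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AggregateCondition:
    """Test 'count of selected elements >= threshold' on a set."""
    threshold: int
    selections: tuple = ()  # conjunction of predicates on set elements

    def holds(self, items):
        selected = [item for item in items
                    if all(sel(item) for sel in self.selections)]
        return len(selected) >= self.threshold

    def refine(self, selection):
        """Return a new condition with one more selection inside the aggregate."""
        return AggregateCondition(self.threshold,
                                  self.selections + (selection,))

two_books = AggregateCondition(threshold=2)
two_computer_books = two_books.refine(lambda book: book[0] == "computer")
items = [("computer", 40.0), ("novel", 12.0)]
```

Because the refinement only narrows the aggregated set, it makes sense in the subtree where the unrefined condition already succeeded, which is the left subtree on the slide.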

15 Adding selections to aggregation: second manner
- Lookahead technique: look ahead in the refinement lattice
- Add several literals at once
- Computationally expensive

16 Relational decision trees with aggregation and selection: learning them
Pseudocode:

    grow_node(Node, Examples):
        IF stop criterion:
            assign majority class from Examples to Node
        ELSE
            generate all possible first-order tests for Node:
                - usual tests
                - aggregate functions
                - refinements of an aggregate function higher in the tree
            associate best test with Node
            grow two child nodes Left and Right
            split Examples into ExamplesPass and ExamplesFail
            grow_node(Left, ExamplesPass)
            grow_node(Right, ExamplesFail)

17 Relational decision trees with aggregation and selection: problem
- The number of tests at each node in the tree grows very fast
- Need some way to deal with it
- Make use of a technique from classical data mining: Random Forests [Breiman 01]

18 Outline
- Introduction
- Random Forests

19 Random Forests
[Figure not preserved in the transcription]

20 Random Decision Tree Algorithm
At each node, the best test is chosen from a random subset $T'$ of the set of candidate tests $T$, with $|T'| = f(|T|)$ and e.g. $f(x) = 0.1x$ or $f(x) = \sqrt{x}$.
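As a worked example of the subset-size function f, the sketch below computes how many candidate tests would be kept at a node. It assumes the second example function is a square root (the radical sign appears to have been lost in transcription, and a "sqrt" setting is listed in the experimental setup); the rounding and the clamping to at least one test are also assumptions.

```python
import math

def subset_size(num_tests, f):
    """Number of candidate tests to keep, clamped to the range 1..num_tests."""
    return max(1, min(num_tests, round(f(num_tests))))

# With 400 candidate tests at a node:
ten_percent = subset_size(400, lambda x: 0.1 * x)
square_root = subset_size(400, math.sqrt)
```

With 400 candidates, the 10% setting keeps 40 tests while the square-root setting keeps 20, so the square root shrinks the search more aggressively as the candidate pool grows.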

21 Random forests with our relational decision tree algorithm
Pseudocode:

    grow_node(Node, Examples, Probability):
        IF stop criterion:
            assign majority class from Examples to Node
        ELSE
            generate all possible first-order tests for Node:
                - usual tests
                - aggregate functions
                - refinements of an aggregate function higher in the tree
            select random subset from possible tests using Probability
            associate best test out of random subset with Node
            grow two child nodes Left and Right
            split Examples into ExamplesPass and ExamplesFail
            grow_node(Left, ExamplesPass, Probability)
            grow_node(Right, ExamplesFail, Probability)
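The two ingredients that distinguish the forest from a single tree, random test selection at each node and voting over the trees, can be sketched as below. Keeping each test independently with the given probability is one possible reading of "using Probability" and is an assumption, as is the fallback that avoids an empty subset; majority voting follows the standard random forest scheme of [Breiman 01]. All names are hypothetical.

```python
import random
from collections import Counter

def random_subset(tests, probability, rng):
    """Keep each candidate test independently with the given probability."""
    subset = [t for t in tests if rng.random() < probability]
    # Assumption: fall back to one random test rather than an empty subset.
    return subset or [rng.choice(tests)]

def forest_predict(trees, classify, instance):
    """Majority vote over the predictions of the individual trees."""
    votes = Counter(classify(tree, instance) for tree in trees)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
candidate_tests = [f"test_{i}" for i in range(100)]
subset = random_subset(candidate_tests, 0.25, rng)
```

In a full learner, `random_subset` would be called once per node before the best test is chosen, and `classify` would walk a single tree from the root to a leaf.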

22 Outline
- Introduction
- Real world data
- Artificial data

23 Experimental setup: real world data
- Average over 5 times 5-fold cross-validation
- Different parameters:
  - number of trees: 3, 11, 33
  - proportion of feature sample: 100%, 75%, 50%, 25%, 10%, sqrt
  - level of aggregates: No Aggregates (NA), Simple Aggregates (SA), Refined Aggregates (RA), Lookahead Aggregates (LA)
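The evaluation protocol, 5 repetitions of 5-fold cross-validation, can be sketched as an index generator. The shuffling scheme, the seed and the function name are assumptions; the slides do not say whether the folds were stratified.

```python
import random

def repeated_kfold(n_examples, folds=5, repeats=5, seed=0):
    """Yield (train_indices, test_indices) for each fold of each repetition."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_examples))
        rng.shuffle(indices)  # a fresh random partition per repetition
        for k in range(folds):
            test = indices[k::folds]
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield train, test

splits = list(repeated_kfold(23))
```

Averaging the accuracy over all 25 train/test splits gives the kind of figure reported as an average over 5 times 5-fold cross-validation.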

24 Experimental results: real world data
The effect of aggregates and the number of trees (P = 0.25)
[Plot not preserved in the transcription]

25 Experimental results: real world data
The effect of the number of features (e.g. Mutagenesis)
[Table not preserved in the transcription: FORF (33 trees) for each feature proportion P, including sqrt, with columns LA, RA, SA and NA, compared with Tilde (NA)]

26 Experimental results: real world data
Compared to other systems (FORF-SA uses 33 trees and 25% of the features). Only the parenthesized values survived the transcription; the main figures are missing.

Financial     FORF-SA    DINUS-C    RELAGGS          PROGOL
              (0.005)    (0.103)    (0.065)          (0.071)

Diterpenes    FORF-SA    FOIL       IBL-matchings    ICL
              (0.006)    (0.011)    (0.006)          (0.009)

27 Experimental results: artificial data
Summary of experimental results so far:
- Positive effect of random forest
- Positive effect of adding (simple) aggregates
- Effect of the combination of aggregates and selection? Investigated with an artificial dataset

28 Experimental results: artificial data
- Data generator for east-/westbound trains
- 800 trains, 400 in each direction
- Target concept: [not preserved in the transcription]

29 Experimental results: artificial data
Results (P = 0.25, number of trees = 33)
[Table not preserved in the transcription: accuracy, average number of nodes in a tree, and average induction time, each for LA, RA, SA and NA]

31 Conclusions
- First-order random forest induction algorithm based on Tilde
- Feature space enlarged by including aggregates
- Refinement operator adjusted to include selection conditions within the aggregates
- Strength was experimentally shown

32 Acknowledgements and References
Acknowledgements: Maurice Bruynooghe
Full paper: C. Vens, A. Van Assche, H. Blockeel, and S. Džeroski, First Order Random Forests with Complex Aggregates, Proceedings of the 14th International Conference on Inductive Logic Programming (ILP-2004), Porto, Portugal, 2004.


More information

Experiment databases: a novel methodology for experimental research

Experiment databases: a novel methodology for experimental research Experiment databases: a novel methodology for experimental research Hendrik Blockeel Katholieke Universiteit Leuven, Department of Computer Science Celestijnenlaan 200A, 3001 Leuven, Belgium HendrikBlockeel@cskuleuvenbe

More information

Structural Logistic Regression for Link Analysis

Structural Logistic Regression for Link Analysis University of Pennsylvania ScholarlyCommons Departmental Papers (CIS) Department of Computer & Information Science August 2003 Structural Logistic Regression for Link Analysis Alexandrin Popescul University

More information

Model Selection and Assessment

Model Selection and Assessment Model Selection and Assessment CS4780/5780 Machine Learning Fall 2014 Thorsten Joachims Cornell University Reading: Mitchell Chapter 5 Dietterich, T. G., (1998). Approximate Statistical Tests for Comparing

More information

Learning Relational Probability Trees Jennifer Neville David Jensen Lisa Friedland Michael Hay

Learning Relational Probability Trees Jennifer Neville David Jensen Lisa Friedland Michael Hay Learning Relational Probability Trees Jennifer Neville David Jensen Lisa Friedland Michael Hay Computer Science Department University of Massachusetts Amherst, MA 01003 USA [jneville jensen lfriedl mhay]@cs.umass.edu

More information

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University Cse634 DATA MINING TEST REVIEW Professor Anita Wasilewska Computer Science Department Stony Brook University Preprocessing stage Preprocessing: includes all the operations that have to be performed before

More information

Decision Support Systems

Decision Support Systems Decision Support Systems 54 (2013) 1269 1279 Contents lists available at SciVerse ScienceDirect Decision Support Systems journal homepage: www.elsevier.com/locate/dss Simple decision forests for multi-relational

More information

Classification. Instructor: Wei Ding

Classification. Instructor: Wei Ding Classification Decision Tree Instructor: Wei Ding Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Preliminaries Each data record is characterized by a tuple (x, y), where x is the attribute

More information

Beam Search Induction and Similarity Constraints for Predictive Clustering Trees

Beam Search Induction and Similarity Constraints for Predictive Clustering Trees Beam Search Induction and Similarity Constraints for Predictive Clustering Trees Dragi Kocev 1, Jan Struyf 2, and Sašo Džeroski 1 1 Dept. of Kwledge Techlogies, Jožef Stefan Institute Jamova 39, 1000 Ljubljana,

More information

Centrum voor Wiskunde en Informatica. Information Systems (INS)

Centrum voor Wiskunde en Informatica. Information Systems (INS) Centrum voor Wiskunde en Informatica! "# $ % &('( %!*) +,-.(!/0**1243 56 78 % Information Systems (INS) INS-R9908 May 31, 1999 Report INS-R9908 ISSN 1386-3681 CWI P.O. Box 94079 1090 GB Amsterdam The Netherlands!

More information

Learning Predictive Clustering Rules

Learning Predictive Clustering Rules Learning Predictive Clustering Rules Bernard Ženko1, Sašo Džeroski 1, and Jan Struyf 2 1 Department of Knowledge Technologies, Jožef Stefan Institute, Slovenia Bernard.Zenko@ijs.si, Saso.Dzeroski@ijs.si

More information

An Extended Transformation Approach to Inductive Logic Programming

An Extended Transformation Approach to Inductive Logic Programming An Extended Transformation Approach to Inductive Logic Programming NADA LAVRAČ Jožef Stefan Institute and PETER A. FLACH University of Bristol Inductive logic programming (ILP) is concerned with learning

More information

7. Boosting and Bagging Bagging

7. Boosting and Bagging Bagging Group Prof. Daniel Cremers 7. Boosting and Bagging Bagging Bagging So far: Boosting as an ensemble learning method, i.e.: a combination of (weak) learners A different way to combine classifiers is known

More information

A Method for Handling Numerical Attributes in GA-based Inductive Concept Learners

A Method for Handling Numerical Attributes in GA-based Inductive Concept Learners A Method for Handling Numerical Attributes in GA-based Inductive Concept Learners Federico Divina, Maarten Keijzer and Elena Marchiori Department of Computer Science Vrije Universiteit De Boelelaan 1081a,

More information

Supervised Learning. Decision trees Artificial neural nets K-nearest neighbor Support vectors Linear regression Logistic regression...

Supervised Learning. Decision trees Artificial neural nets K-nearest neighbor Support vectors Linear regression Logistic regression... Supervised Learning Decision trees Artificial neural nets K-nearest neighbor Support vectors Linear regression Logistic regression... Supervised Learning y=f(x): true function (usually not known) D: training

More information

Implementierungstechniken für Hauptspeicherdatenbanksysteme Classification: Decision Trees

Implementierungstechniken für Hauptspeicherdatenbanksysteme Classification: Decision Trees Implementierungstechniken für Hauptspeicherdatenbanksysteme Classification: Decision Trees Dominik Vinan February 6, 2018 Abstract Decision Trees are a well-known part of most modern Machine Learning toolboxes.

More information

Inductive Logic Programming (ILP)

Inductive Logic Programming (ILP) Inductive Logic Programming (ILP) Brad Morgan CS579 4-20-05 What is it? Inductive Logic programming is a machine learning technique which classifies data through the use of logic programs. ILP algorithms

More information

Constraint Satisfaction Problems

Constraint Satisfaction Problems Constraint Satisfaction Problems Search and Lookahead Bernhard Nebel, Julien Hué, and Stefan Wölfl Albert-Ludwigs-Universität Freiburg June 4/6, 2012 Nebel, Hué and Wölfl (Universität Freiburg) Constraint

More information

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Jing Wang Computer Science Department, The University of Iowa jing-wang-1@uiowa.edu W. Nick Street Management Sciences Department,

More information

Exam Advanced Data Mining Date: Time:

Exam Advanced Data Mining Date: Time: Exam Advanced Data Mining Date: 11-11-2010 Time: 13.30-16.30 General Remarks 1. You are allowed to consult 1 A4 sheet with notes written on both sides. 2. Always show how you arrived at the result of your

More information

DATA MINING - 1DL105, 1DL111

DATA MINING - 1DL105, 1DL111 1 DATA MINING - 1DL105, 1DL111 Fall 2007 An introductory class in data mining http://user.it.uu.se/~udbl/dut-ht2007/ alt. http://www.it.uu.se/edu/course/homepage/infoutv/ht07 Kjell Orsborn Uppsala Database

More information

Extra readings beyond the lecture slides are important:

Extra readings beyond the lecture slides are important: 1 Notes To preview next lecture: Check the lecture notes, if slides are not available: http://web.cse.ohio-state.edu/~sun.397/courses/au2017/cse5243-new.html Check UIUC course on the same topic. All their

More information

Event Detection through Differential Pattern Mining in Internet of Things

Event Detection through Differential Pattern Mining in Internet of Things Event Detection through Differential Pattern Mining in Internet of Things Authors: Md Zakirul Alam Bhuiyan and Jie Wu IEEE MASS 2016 The 13th IEEE International Conference on Mobile Ad hoc and Sensor Systems

More information

Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets

Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets Md Nasim Adnan and Md Zahidul Islam Centre for Research in Complex Systems (CRiCS)

More information

Study on Classifiers using Genetic Algorithm and Class based Rules Generation

Study on Classifiers using Genetic Algorithm and Class based Rules Generation 2012 International Conference on Software and Computer Applications (ICSCA 2012) IPCSIT vol. 41 (2012) (2012) IACSIT Press, Singapore Study on Classifiers using Genetic Algorithm and Class based Rules

More information

MIT Samberg Center Cambridge, MA, USA. May 30 th June 2 nd, by C. Rea, R.S. Granetz MIT Plasma Science and Fusion Center, Cambridge, MA, USA

MIT Samberg Center Cambridge, MA, USA. May 30 th June 2 nd, by C. Rea, R.S. Granetz MIT Plasma Science and Fusion Center, Cambridge, MA, USA Exploratory Machine Learning studies for disruption prediction on DIII-D by C. Rea, R.S. Granetz MIT Plasma Science and Fusion Center, Cambridge, MA, USA Presented at the 2 nd IAEA Technical Meeting on

More information

An introduction to random forests

An introduction to random forests An introduction to random forests Eric Debreuve / Team Morpheme Institutions: University Nice Sophia Antipolis / CNRS / Inria Labs: I3S / Inria CRI SA-M / ibv Outline Machine learning Decision tree Random

More information

Improving Tree-Based Classification Rules Using a Particle Swarm Optimization

Improving Tree-Based Classification Rules Using a Particle Swarm Optimization Improving Tree-Based Classification Rules Using a Particle Swarm Optimization Chi-Hyuck Jun *, Yun-Ju Cho, and Hyeseon Lee Department of Industrial and Management Engineering Pohang University of Science

More information

3 Virtual attribute subsetting

3 Virtual attribute subsetting 3 Virtual attribute subsetting Portions of this chapter were previously presented at the 19 th Australian Joint Conference on Artificial Intelligence (Horton et al., 2006). Virtual attribute subsetting

More information

Ensemble Methods, Decision Trees

Ensemble Methods, Decision Trees CS 1675: Intro to Machine Learning Ensemble Methods, Decision Trees Prof. Adriana Kovashka University of Pittsburgh November 13, 2018 Plan for This Lecture Ensemble methods: introduction Boosting Algorithm

More information

Cse352 Artifficial Intelligence Short Review for Midterm. Professor Anita Wasilewska Computer Science Department Stony Brook University

Cse352 Artifficial Intelligence Short Review for Midterm. Professor Anita Wasilewska Computer Science Department Stony Brook University Cse352 Artifficial Intelligence Short Review for Midterm Professor Anita Wasilewska Computer Science Department Stony Brook University Midterm Midterm INCLUDES CLASSIFICATION CLASSIFOCATION by Decision

More information

Random Forests for Big Data

Random Forests for Big Data Random Forests for Big Data R. Genuer a, J.-M. Poggi b, C. Tuleau-Malot c, N. Villa-Vialaneix d a Bordeaux University c Nice University b Orsay University d INRA Toulouse October 27, 2017 CNAM, Paris Outline

More information

Blockeel, Dehaspe, Demoen, Janssens, Ramon, & Vandecasteele Most ILP systems build a hypothesis one clause at a time. This search for a single clause

Blockeel, Dehaspe, Demoen, Janssens, Ramon, & Vandecasteele Most ILP systems build a hypothesis one clause at a time. This search for a single clause Journal of Artificial Intelligence Research 16 (2002) 135-166 Submitted 7/01; published 2/02 Improving the Efficiency of Inductive Logic Programming Through the Use of Query Packs Hendrik Blockeel hendrik.blockeel@cs.kuleuven.ac.be

More information