Aggregation and Selection in Relational Data Mining
Celine Vens, Anneleen Van Assche, Hendrik Blockeel, Sašo Džeroski
Department of Computer Science, K.U.Leuven
Department of Knowledge Technologies, Jožef Stefan Institute, Slovenia

Outline
- Introduction

Relational Data Mining
- Data mining: searching for patterns in (large) databases.
- Propositional (classical) data mining:
  - data is stored in a single table
  - patterns involve intra-tuple relations
- Relational data mining:
  - data is stored in multiple tables (a relational database)
  - patterns involve inter-tuple or inter-table relations
  - how to deal with 1-n or m-n relations (sets)?

Working Example
- Current relational learners: two approaches to dealing with sets.

First approach: Aggregation
- Use SQL-like aggregation to summarize each set, giving one big table.
- Apply a classical data mining technique (e.g. a decision tree inducer).
- Optimized for highly non-determinate domains.
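As a rough illustration of the aggregation approach, the sketch below summarizes a 1-n relation into one row per person with SQL aggregates. The `person` and `book` tables, their columns, and the chosen aggregates (`COUNT`, `MAX`) are hypothetical and only loosely follow the persons-and-books example used in these slides.

```python
import sqlite3

def aggregate_books(persons, books):
    """Summarize each person's set of books into a single row."""
    con = sqlite3.connect(":memory:")
    # Hypothetical schema: one person owns zero or more books (1-n relation).
    con.execute("CREATE TABLE person (pid INTEGER PRIMARY KEY, age INTEGER)")
    con.execute("CREATE TABLE book (pid INTEGER, genre TEXT, price REAL)")
    con.executemany("INSERT INTO person VALUES (?, ?)", persons)
    con.executemany("INSERT INTO book VALUES (?, ?, ?)", books)
    rows = con.execute(
        """
        SELECT p.pid, p.age, COUNT(b.pid) AS n_books, MAX(b.price) AS max_price
        FROM person p LEFT JOIN book b ON b.pid = p.pid
        GROUP BY p.pid, p.age
        ORDER BY p.pid
        """
    ).fetchall()
    con.close()
    return rows

PERSONS = [(1, 34), (2, 51), (3, 27)]
BOOKS = [(1, "computer", 40.0), (1, "novel", 12.5), (2, "computer", 55.0)]

# One row per person; a propositional learner can be applied to this table.
TABLE = aggregate_books(PERSONS, BOOKS)
```

The `LEFT JOIN` keeps persons without books, whose count is 0 and whose maximum price is missing.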

Second approach: Selection
- Apply a relational data mining technique (e.g. a relational decision tree inducer).
- Test for the existence of specific elements in the set.
- Optimized for structurally complex domains.
- E.g. ILP (Inductive Logic Programming):
  - database and patterns in Prolog
  - possibility to add background knowledge
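A selective test can be pictured as an existential check over the set. The sketch below is written in Python rather than Prolog, and the record layout and the `is_expensive` predicate are hypothetical stand-ins for user-supplied background knowledge.

```python
from typing import Callable, Iterable

Book = dict  # hypothetical record layout: {"genre": str, "price": float}

def exists(items: Iterable[Book], condition: Callable[[Book], bool]) -> bool:
    """Selective test: does at least one element of the set satisfy the condition?"""
    return any(condition(item) for item in items)

def is_expensive(book: Book) -> bool:
    """Hypothetical background knowledge: a derived predicate on books."""
    return book["price"] > 50.0

ANN = [{"genre": "computer", "price": 40.0}, {"genre": "novel", "price": 12.5}]
BOB = [{"genre": "computer", "price": 55.0}]

ANN_HAS_COMPUTER_BOOK = exists(ANN, lambda b: b["genre"] == "computer")
ANN_HAS_EXPENSIVE_BOOK = exists(ANN, is_expensive)
BOB_HAS_EXPENSIVE_BOOK = exists(BOB, is_expensive)
```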

Example concepts
1. Persons that have two books.
2. Persons that have a computer book.
3. Persons that have two computer books.
- How to express concept 3?
  - Selective methods need an aggregate function in the background knowledge.
  - Aggregating methods need a separate relation for each genre.
- Solution: combine aggregation and selection in the context of relational data mining.
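The three concepts can be written side by side to show where each approach runs out. This is a minimal sketch: the record layout is hypothetical, and reading "two" as "at least two" is an assumption.

```python
def count_where(books, condition=lambda book: True):
    """Aggregate (count) over the subset of books selected by the condition."""
    return sum(1 for book in books if condition(book))

def is_computer(book):
    return book["genre"] == "computer"

def concept_1(books):
    """Pure aggregation: at least two books."""
    return count_where(books) >= 2

def concept_2(books):
    """Pure selection: some computer book exists."""
    return any(is_computer(book) for book in books)

def concept_3(books):
    """Aggregation over a selection: at least two computer books."""
    return count_where(books, is_computer) >= 2

LIBRARIES = {
    "ann": [{"genre": "computer"}, {"genre": "novel"}],
    "bob": [{"genre": "computer"}, {"genre": "computer"}],
    "cas": [{"genre": "novel"}],
}
RESULTS = {
    name: (concept_1(books), concept_2(books), concept_3(books))
    for name, books in LIBRARIES.items()
}
```

Ann satisfies concepts 1 and 2 but not concept 3, which is why neither a plain count nor a plain existence test is enough on its own.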

Outline
- Introduction
- Decision Trees
- Combining selection and aggregation

Decision Trees
- One of the most widely used and practical data mining methods.
- Each internal node contains a test on some attribute.
- Each leaf contains a prediction.
- Classification of a new instance.

Decision Trees: learning them
- Divide & conquer algorithm
- Pseudocode:

    grow_node(Node, Examples):
        IF stop criterion:
            assign majority class from Examples to Node
        ELSE
            generate all possible tests for Node
            associate best test with Node
            grow two child nodes Left and Right
            split Examples into ExamplesPass and ExamplesFail
            grow_node(Left, ExamplesPass)
            grow_node(Right, ExamplesFail)
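The divide-and-conquer pseudocode on this slide can be sketched in runnable form for the propositional case. The choices below are assumptions, not taken from the slides: tests have the form `attribute == value`, the best test minimizes weighted Gini impurity, and growth stops on pure nodes, small nodes, or when no test splits the examples.

```python
from collections import Counter

def gini(examples):
    """Gini impurity of a list of (features, label) pairs (1.0 for an empty list)."""
    counts = Counter(label for _, label in examples)
    total = len(examples)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def majority_class(examples):
    return Counter(label for _, label in examples).most_common(1)[0][0]

def generate_tests(examples):
    """All tests of the form (attribute, value) seen in the examples."""
    return sorted({(a, v) for feats, _ in examples for a, v in feats.items()})

def split(examples, test):
    attr, value = test
    passed = [e for e in examples if e[0].get(attr) == value]
    failed = [e for e in examples if e[0].get(attr) != value]
    return passed, failed

def weighted_impurity(examples, test):
    passed, failed = split(examples, test)
    n = len(examples)
    return len(passed) / n * gini(passed) + len(failed) / n * gini(failed)

def grow_node(examples, min_size=2):
    labels = {label for _, label in examples}
    # Keep only tests that send at least one example to each branch.
    tests = [t for t in generate_tests(examples) if all(split(examples, t))]
    if len(labels) == 1 or len(examples) < min_size or not tests:
        return {"leaf": majority_class(examples)}
    best = min(tests, key=lambda t: weighted_impurity(examples, t))
    passed, failed = split(examples, best)
    return {"test": best,
            "left": grow_node(passed, min_size),
            "right": grow_node(failed, min_size)}

def classify(node, feats):
    while "leaf" not in node:
        attr, value = node["test"]
        node = node["left"] if feats.get(attr) == value else node["right"]
    return node["leaf"]

DATA = [
    ({"outlook": "sunny", "windy": False}, "play"),
    ({"outlook": "sunny", "windy": True}, "play"),
    ({"outlook": "rain", "windy": False}, "play"),
    ({"outlook": "rain", "windy": True}, "stay"),
]
TREE = grow_node(DATA)
```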

Relational decision trees: learning them
- Upgrade of the classical algorithm: Tilde [Blockeel and De Raedt 98].
- Trees are relational: the test in an internal node contains first order logic literals.
- Selective approach (ILP).
- Tests can introduce variables, so the possible tests may differ at each node.

Adding aggregation
- The user specifies the basic components: aggregate functions, sets to be aggregated, and the query that generates the set to be aggregated.
- Aggregate conditions are created, using discretization.
- Aggregate conditions are added to the set of possible tests.
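One way to picture how aggregate conditions might be generated is sketched below. The slide does not say which discretization is used; taking midpoints between consecutive distinct aggregate values, capped at `max_thresholds`, is an assumption, as are all function and variable names.

```python
def candidate_thresholds(values, max_thresholds=3):
    """Midpoints between consecutive distinct values, thinned to max_thresholds."""
    distinct = sorted(set(values))
    midpoints = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if len(midpoints) <= max_thresholds:
        return midpoints
    step = len(midpoints) / max_thresholds
    return [midpoints[int(i * step)] for i in range(max_thresholds)]

def aggregate_conditions(sets, aggregates, max_thresholds=3):
    """Build candidate tests of the form (aggregate name, '>=', threshold)."""
    conditions = []
    for name, func in aggregates.items():
        values = [func(s) for s in sets]   # one aggregate value per example
        for threshold in candidate_thresholds(values, max_thresholds):
            conditions.append((name, ">=", threshold))
    return conditions

# Hypothetical sets to aggregate: the book prices of four persons.
PRICE_SETS = [[40.0, 12.5], [55.0], [], [10.0, 20.0, 30.0]]
AGGREGATES = {"count": len, "sum": sum}

CONDITIONS = aggregate_conditions(PRICE_SETS, AGGREGATES)
```

Each generated condition can then be added to the pool of candidate tests that the tree learner evaluates at a node.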

Adding selections to aggregation: first manner
- If a node contains an aggregate condition, any node in its left subtree can add a selection within that aggregate condition.
- Local search within the aggregate condition.

Adding selections to aggregation: second manner
- Lookahead technique:
  - look ahead in the refinement lattice
  - add several literals at once
  - computationally expensive

Relational decision trees with aggregation and selection: learning them
- Pseudocode:

    grow_node(Node, Examples):
        IF stop criterion:
            assign majority class from Examples to Node
        ELSE
            generate all possible first order tests for Node:
                usual tests
                aggregate functions
                refinement of aggregate function higher in tree
            associate best test with Node
            grow two child nodes Left and Right
            split Examples into ExamplesPass and ExamplesFail
            grow_node(Left, ExamplesPass)
            grow_node(Right, ExamplesFail)

Relational decision trees with aggregation and selection: problem
- The number of tests at each node in the tree grows very fast.
- Some way to deal with this is needed.
- Make use of a technique from classical data mining: Random Forests [Breiman 01].

Outline
- Introduction
- Random Forests

Random Forests (figure not preserved in the transcription)

Random Forests: Random Decision Tree Algorithm
- At each node, only a random subset $T'$ of the candidate tests $T$ is considered, with $|T'| = f(|T|)$, e.g. $f(x) = 0.1x$ or $f(x) = \sqrt{x}$.
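The subset-size rule on this slide can be sketched as follows, reading it as "the sample size is f applied to the number of candidate tests". Treating the second example as a square root is an assumption based on the "sqrt" setting listed in the experimental setup; rounding and the lower bound of one test are also assumptions.

```python
import math
import random

def sample_tests(tests, f, rng):
    """Return a random subset of the candidate tests whose size is f(len(tests))."""
    if not tests:
        return []
    size = max(1, min(len(tests), round(f(len(tests)))))
    return rng.sample(tests, size)

RNG = random.Random(0)
CANDIDATES = [f"test_{i}" for i in range(400)]   # hypothetical candidate tests

TEN_PERCENT = sample_tests(CANDIDATES, lambda x: 0.1 * x, RNG)
SQUARE_ROOT = sample_tests(CANDIDATES, math.sqrt, RNG)
```

With 400 candidate tests, the proportional rule keeps 40 of them and the square-root rule keeps 20, so the square root prunes more aggressively as the test space grows.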

Random forests with our relational decision tree algorithm
- Pseudocode:

    grow_node(Node, Examples, Probability):
        IF stop criterion:
            assign majority class from Examples to Node
        ELSE
            generate all possible first order tests for Node:
                usual tests
                aggregate functions
                refinement of aggregate function higher in tree
            select random subset from possible tests using Probability
            associate best test out of random subset with Node
            grow two child nodes Left and Right
            split Examples into ExamplesPass and ExamplesFail
            grow_node(Left, ExamplesPass, Probability)
            grow_node(Right, ExamplesFail, Probability)
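A compact, propositional sketch of the randomized procedure is given below; it is not the first order implementation described in the talk. Several details are assumptions: examples are sets of boolean features, each candidate test is kept independently with the given probability, the best test minimizes the number of examples outside the majority class, and trees are grown on bootstrap samples and combined by majority vote as in Breiman's random forests.

```python
import random
from collections import Counter

def majority(examples):
    return Counter(label for _, label in examples).most_common(1)[0][0]

def errors(examples):
    """Number of examples outside the majority class (0 for an empty list)."""
    if not examples:
        return 0
    counts = Counter(label for _, label in examples)
    return len(examples) - max(counts.values())

def split(examples, test):
    passed = [e for e in examples if test in e[0]]
    failed = [e for e in examples if test not in e[0]]
    return passed, failed

def grow_node(examples, probability, rng):
    tests = sorted({t for feats, _ in examples for t in feats})
    tests = [t for t in tests if all(split(examples, t))]
    if len({label for _, label in examples}) == 1 or not tests:
        return majority(examples)            # leaf: a class label (a string)
    # Keep each candidate test with the given probability; keep at least one.
    subset = [t for t in tests if rng.random() < probability] or [rng.choice(tests)]
    best = min(subset, key=lambda t: sum(errors(part) for part in split(examples, t)))
    passed, failed = split(examples, best)
    return (best,
            grow_node(passed, probability, rng),
            grow_node(failed, probability, rng))

def classify(node, feats):
    while isinstance(node, tuple):           # internal node: (test, left, right)
        test, left, right = node
        node = left if test in feats else right
    return node

def grow_forest(examples, n_trees, probability, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(examples) for _ in examples]  # with replacement
        forest.append(grow_node(bootstrap, probability, rng))
    return forest

def predict(forest, feats):
    return Counter(classify(tree, feats) for tree in forest).most_common(1)[0][0]

DATA = [
    (frozenset({"a", "b"}), "pos"), (frozenset({"a"}), "pos"),
    (frozenset({"b"}), "neg"), (frozenset(), "neg"),
]
SINGLE_TREE = grow_node(DATA, probability=1.0, rng=random.Random(0))
FOREST = grow_forest(DATA, n_trees=11, probability=0.25)
PREDICTIONS = [predict(FOREST, feats) for feats, _ in DATA]
```

With `probability=1.0` every candidate test is kept, so the procedure reduces to the ordinary greedy tree learner.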

Outline
- Introduction
- Real world data
- Artificial data

Experimental Setup: Real world data
- Average over 5 times 5-fold cross-validation.
- Different parameters:
  - number of trees: 3, 11, 33
  - proportion of feature sample: 100%, 75%, 50%, 25%, 10%, sqrt
  - level of aggregates: No Aggregates (NA), Simple Aggregates (SA), Refined Aggregates (RA), Lookahead Aggregates (LA)
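The evaluation protocol, 5 repetitions of 5-fold cross-validation with the accuracies averaged, can be sketched generically. The `train` and `predict` callables, the toy majority-class learner, and the data are hypothetical; the fold construction assumes at least as many examples as folds.

```python
import random
from collections import Counter

def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    indices = list(range(n))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]

def repeated_cross_validation(examples, train, predict, k=5, repeats=5, seed=0):
    """Mean accuracy over repeats x k held-out folds."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        for fold in k_fold_indices(len(examples), k, rng):
            held_out = set(fold)
            training = [e for i, e in enumerate(examples) if i not in held_out]
            model = train(training)
            correct = sum(predict(model, examples[i][0]) == examples[i][1]
                          for i in fold)
            accuracies.append(correct / len(fold))
    return sum(accuracies) / len(accuracies)

# Toy learner: always predict the majority class of the training fold.
def train_majority(training):
    return Counter(label for _, label in training).most_common(1)[0][0]

def predict_majority(model, features):
    return model

DATA = [((i,), "pos") for i in range(15)] + [((i,), "neg") for i in range(15, 20)]
MEAN_ACCURACY = repeated_cross_validation(DATA, train_majority, predict_majority)
```

Repeating the split with different shuffles reduces the dependence of the estimate on any single partition of the data.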

Real world data
- The effect of aggregates and the number of trees (P = 0.25). (Figure not preserved in the transcription.)

Real world data
- The effect of the number of features (e.g. Mutagenesis).
- Table: FORF (33 trees) for each feature proportion P (including sqrt) and each aggregate level (LA, RA, SA, NA), compared with Tilde (NA). (Values not preserved in the transcription.)

Real world data
- Compared to other systems (FORF-SA uses 33 trees and 25% of the features). Only the parenthesized figures survive in the transcription:

    Financial     FORF-SA    DINUS-C    RELAGGS          PROGOL
                  (0.005)    (0.103)    (0.065)          (0.071)

    Diterpenes    FORF-SA    FOIL       IBL-matchings    ICL
                  (0.006)    (0.011)    (0.006)          (0.009)

Artificial data
- Summary of experimental results so far:
  - positive effect of random forest
  - positive effect of adding (simple) aggregates
- Effect of the combination of aggregates and selection? Tested on an artificial dataset.

Artificial data
- Data generator for east-/westbound trains.
- 800 trains, 400 in each direction.
- Target concept: (not preserved in the transcription)

Artificial data
- Results (P = 0.25, number of trees = 33), each reported for LA, RA, SA and NA (values not preserved in the transcription):
  - accuracy
  - average number of nodes in a tree
  - average induction time

Conclusions
- First order random forest induction algorithm based on Tilde.
- Feature space enlarged by including aggregates.
- Refinement operator adjusted to include selection conditions within the aggregates.
- Strength was shown experimentally.

Acknowledgements and References
- Acknowledgements: Maurice Bruynooghe
- Full paper: C. Vens, A. Van Assche, H. Blockeel, and S. Džeroski, First Order Random Forests with Complex Aggregates, Proceedings of the 14th International Conference on Inductive Logic Programming (ILP-2004), Porto, Portugal, 2004.
First order random forests: Learning relational classifiers with complex aggregates
Mach Learn (2006) 64:149 182 DOI 10.1007/s10994-006-8713-9 First order random forests: Learning relational classifiers with complex aggregates Anneleen Van Assche Celine Vens Hendrik Blockeel Sašo Džeroski
More informationClassifying Relational Data with Neural Networks
Classifying Relational Data with Neural Networks Werner Uwents and Hendrik Blockeel Katholieke Universiteit Leuven, Department of Computer Science, Celestijnenlaan 200A, B-3001 Leuven {werner.uwents, hendrik.blockeel}@cs.kuleuven.be
More informationStochastic propositionalization of relational data using aggregates
Stochastic propositionalization of relational data using aggregates Valentin Gjorgjioski and Sašo Dzeroski Jožef Stefan Institute Abstract. The fact that data is already stored in relational databases
More informationReMauve: A Relational Model Tree Learner
ReMauve: A Relational Model Tree Learner Celine Vens, Jan Ramon, and Hendrik Blockeel Katholieke Universiteit Leuven - Department of Computer Science, Celestijnenlaan 200 A, 3001 Leuven, Belgium {celine.vens,
More informationMulti-relational Decision Tree Induction
Multi-relational Decision Tree Induction Arno J. Knobbe 1,2, Arno Siebes 2, Daniël van der Wallen 1 1 Syllogic B.V., Hoefseweg 1, 3821 AE, Amersfoort, The Netherlands, {a.knobbe, d.van.der.wallen}@syllogic.com
More informationConstraint Based Induction of Multi-Objective Regression Trees
Constraint Based Induction of Multi-Objective Regression Trees Jan Struyf 1 and Sašo Džeroski 2 1 Katholieke Universiteit Leuven, Dept. of Computer Science Celestijnenlaan 200A, B-3001 Leuven, Belgium
More informationThe ACE Data Mining System User s Manual. H. Blockeel L. Dehaspe J. Ramon J. Struyf A. Van Assche C. Vens D. Fierens
The ACE Data Mining System User s Manual H. Blockeel L. Dehaspe J. Ramon J. Struyf A. Van Assche C. Vens D. Fierens March 9, 2009 2 Contents 1 Introduction 7 2 Installing and Running ACE 9 2.1 Using ACE
More informationarxiv:cs/ v1 [cs.lg] 29 Nov 2000
arxiv:cs/0011044v1 [cs.lg] 29 Nov 2000 Scaling Up Inductive Logic Programming by Learning From Interpretations Hendrik Blockeel Luc De Raedt Nico Jacobs Bart Demoen Report CW 297, August 2000 Ò Katholieke
More informationEfficient Multi-relational Classification by Tuple ID Propagation
Efficient Multi-relational Classification by Tuple ID Propagation Xiaoxin Yin, Jiawei Han and Jiong Yang Department of Computer Science University of Illinois at Urbana-Champaign {xyin1, hanj, jioyang}@uiuc.edu
More informationLearning Directed Probabilistic Logical Models using Ordering-search
Learning Directed Probabilistic Logical Models using Ordering-search Daan Fierens, Jan Ramon, Maurice Bruynooghe, and Hendrik Blockeel K.U.Leuven, Dept. of Computer Science, Celestijnenlaan 200A, 3001
More informationAn Efficient Approximation to Lookahead in Relational Learners
An Efficient Approximation to Lookahead in Relational Learners Jan Struyf 1, Jesse Davis 2, and David Page 2 1 Katholieke Universiteit Leuven, Dept. of Computer Science Celestijnenlaan 200A, 3001 Leuven,
More informationExperiments with MRDTL -- A Multi-relational Decision Tree Learning Algorithm
Experiments with MRDTL -- A Multi-relational Decision Tree Learning Algorithm Hector Leiva, Anna Atramentov and Vasant Honavar 1 Artificial Intelligence Laboratory Department of Computer Science and Graduate
More informationThe role of feature construction in inductive rule learning
The role of feature construction in inductive rule learning Peter A. Flach 1 and Nada Lavrač 2 1 Department of Computer Science University of Bristol, United Kingdom 2 Department of Intelligent Systems
More informationData Mining. 3.3 Rule-Based Classification. Fall Instructor: Dr. Masoud Yaghini. Rule-Based Classification
Data Mining 3.3 Fall 2008 Instructor: Dr. Masoud Yaghini Outline Using IF-THEN Rules for Classification Rules With Exceptions Rule Extraction from a Decision Tree 1R Algorithm Sequential Covering Algorithms
More informationData Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.4. Spring 2010 Instructor: Dr. Masoud Yaghini Outline Using IF-THEN Rules for Classification Rule Extraction from a Decision Tree 1R Algorithm Sequential Covering Algorithms
More informationMulti-relational Data Mining, Using UML for ILP
Multi-relational Data Mining, Using UML for ILP Arno J. Knobbe 1,2, Arno Siebes 2, Hendrik Blockeel 3, and Daniël Van Der Wallen 4 1 Perot Systems Nederland B.V., Hoefseweg 1, 3821 AE Amersfoort, The Netherlands,
More informationInternational Journal of Software and Web Sciences (IJSWS)
International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) ISSN (Print): 2279-0063 ISSN (Online): 2279-0071 International
More informationMRDTL: A multi-relational decision tree learning algorithm. Héctor Ariel Leiva. A thesis submitted to the graduate faculty
MRDTL: A multi-relational decision tree learning algorithm by Héctor Ariel Leiva A thesis submitted to the graduate faculty in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE
More informationSimple Decision Forests for Multi-Relational Classification
Simple Decision Forests for Multi-Relational Classification Bahareh Bina, Oliver Schulte, Branden Crawford, Zhensong Qian, Yi Xiong School of Computing Science, Simon Fraser University, Burnaby, B.C.,
More informationMulti-relational decision tree algorithm - implementation and experiments. Anna Atramentov. A thesis submitted to the graduate faculty
Multi-relational decision tree algorithm - implementation and experiments by Anna Atramentov A thesis submitted to the graduate faculty in partial fulfillment of the requirements for the degree of MASTER
More informationDecomposition of the output space in multi-label classification using feature ranking
Decomposition of the output space in multi-label classification using feature ranking Stevanche Nikoloski 2,3, Dragi Kocev 1,2, and Sašo Džeroski 1,2 1 Department of Knowledge Technologies, Jožef Stefan
More informationRelational Knowledge Discovery in Databases Heverlee. fhendrik.blockeel,
Relational Knowledge Discovery in Databases Hendrik Blockeel and Luc De Raedt Katholieke Universiteit Leuven Department of Computer Science Celestijnenlaan 200A 3001 Heverlee e-mail: fhendrik.blockeel,
More informationMacro-operators in Multirelational Learning: a Search-Space Reduction Technique
In Proceedings of ECML 02. LNAI. c Springer-Verlag. Macro-operators in Multirelational Learning: a Search-Space Reduction Technique Lourdes Peña Castillo 1 and Stefan Wrobel 23 1 Otto-von-Guericke-University
More informationINDEPENDENCE ASSUMPTIONS FOR MULTI-RELATIONAL CLASSIFICATION
INDEPENDENCE ASSUMPTIONS FOR MULTI-RELATIONAL CLASSIFICATION by Bahareh Bina BEng, University of Tehran, 2007 a Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science
More informationA Multi-Relational Decision Tree Learning Algorithm - Implementation and Experiments
A Multi-Relational Decision Tree Learning Algorithm - Implementation and Experiments Anna Atramentov, Hector Leiva and Vasant Honavar Artificial Intelligence Research Laboratory, Computer Science Department
More informationWhat is Learning? CS 343: Artificial Intelligence Machine Learning. Raymond J. Mooney. Problem Solving / Planning / Control.
What is Learning? CS 343: Artificial Intelligence Machine Learning Herbert Simon: Learning is any process by which a system improves performance from experience. What is the task? Classification Problem
More informationData Mining Practical Machine Learning Tools and Techniques
Decision trees Extending previous approach: Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 6 of Data Mining by I. H. Witten and E. Frank to permit numeric s: straightforward
More informationMr-SBC: A Multi-relational Naïve Bayes Classifier
Mr-SBC: A Multi-relational Naïve Bayes Classifier Michelangelo Ceci, Annalisa Appice, and Donato Malerba Dipartimento di Informatica, Università degli Studi via Orabona, 4-70126 Bari - Italy _GIGMETTMGIQEPIVFEa$HMYRMFEMX
More informationLearning Rules. How to use rules? Known methods to learn rules: Comments: 2.1 Learning association rules: General idea
2. Learning Rules Rule: cond è concl where } cond is a conjunction of predicates (that themselves can be either simple or complex) and } concl is an action (or action sequence) like adding particular knowledge
More informationMulti-Relational Data Mining
Multi-Relational Data Mining Outline [Dzeroski 2003] [Dzeroski & De Raedt 2003] Introduction Inductive Logic Programmming (ILP) Relational Association Rules Relational Decision Trees Relational Distance-Based
More informationA Clustering Approach to Generalized Pattern Identification Based on Multi-instanced Objects with DARA
A Clustering Approach to Generalized Pattern Identification Based on Multi-instanced Objects with DARA Rayner Alfred 1,2, Dimitar Kazakov 1 1 University of York, Computer Science Department, Heslington,
More informationContents. 1. Introduction Ripple-down Rules Relational Rules HENRY and ABE...
Abstract The modular nature of rules learned by most inductive machine learning algorithms makes them difficult and costly to maintain when the knowledge they are based on changes. Ripple-down rules, a
More informationBig Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1
Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that
More informationData Mining Lecture 8: Decision Trees
Data Mining Lecture 8: Decision Trees Jo Houghton ECS Southampton March 8, 2019 1 / 30 Decision Trees - Introduction A decision tree is like a flow chart. E. g. I need to buy a new car Can I afford it?
More informationSymbolic AI. Andre Freitas. Photo by Vasilyev Alexandr
Symbolic AI Andre Freitas Photo by Vasilyev Alexandr Acknowledgements These slides were based on the slides of: Peter A. Flach, Rule induction tutorial, IDA Spring School 2001. Anoop & Hector, Inductive
More informationAn interface for Text Mining with the RAP system
FEUP - Faculty of Engineering of the University of Porto MIEEC - Integrated Master of Electrical and Computers Engineering Dissertation preparation Final Report An interface for Text Mining with the RAP
More informationTrends in Multi-Relational Data Mining Methods
IJCSNS International Journal of Computer Science and Network Security, VOL.15 No.8, August 2015 101 Trends in Multi-Relational Data Mining Methods CH.Madhusudhan, K. Mrithyunjaya Rao Research Scholar,
More informationCrossMine: Efficient Classification Across Multiple Database Relations
CrossMine: Efficient Classification Across Multiple Database Relations Xiaoxin Yin UIUC xyin1@uiuc.edu Jiawei Han UIUC hanj@cs.uiuc.edu Jiong Yang UIUC jioyang@cs.uiuc.edu Philip S. Yu IBM T. J. Watson
More informationLearning Classification Rules for Multiple Target Attributes
Learning Classification Rules for Multiple Target Attributes Bernard Ženko and Sašo Džeroski Department of Knowledge Technologies, Jožef Stefan Institute Jamova cesta 39, SI-1000 Ljubljana, Slovenia, Bernard.Zenko@ijs.si,
More informationData Mining. 3.5 Lazy Learners (Instance-Based Learners) Fall Instructor: Dr. Masoud Yaghini. Lazy Learners
Data Mining 3.5 (Instance-Based Learners) Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction k-nearest-neighbor Classifiers References Introduction Introduction Lazy vs. eager learning Eager
More informationInduction of Relational Algebra Expressions
Induction of Relational Algebra Expressions Joris J.M. Gillis and Jan Van den Bussche Universiteit Hasselt and transnationale Universiteit Limburg 1 Introduction In the theory of database systems [1],
More informationA FAST CLUSTERING-BASED FEATURE SUBSET SELECTION ALGORITHM
A FAST CLUSTERING-BASED FEATURE SUBSET SELECTION ALGORITHM Akshay S. Agrawal 1, Prof. Sachin Bojewar 2 1 P.G. Scholar, Department of Computer Engg., ARMIET, Sapgaon, (India) 2 Associate Professor, VIT,
More informationA Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection (Kohavi, 1995)
A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection (Kohavi, 1995) Department of Information, Operations and Management Sciences Stern School of Business, NYU padamopo@stern.nyu.edu
More informationMulti-relational data mining in Microsoft SQL Server 2005
Data Mining VII: Data, Text and Web Mining and their Business Applications 151 Multi-relational data mining in Microsoft SQL Server 005 C. L. Curotto 1 & N. F. F. Ebecken & H. Blockeel 1 PPGCC/UFPR, Universidade
More informationIdentifying non-redundant literals in clauses with uniqueness propagation
Identifying non-redundant literals in clauses with uniqueness propagation Hendrik Blockeel Department of Computer Science, KU Leuven Abstract. Several authors have proposed increasingly efficient methods
More informationA MODULAR APPROACH TO RELATIONAL DATA MINING
Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2002 Proceedings Americas Conference on Information Systems (AMCIS) December 2002 A MODULAR APPROACH TO RELATIONAL DATA MINING Claudia
More informationView Learning Extended: Inventing New Tables for Statistical Relational Learning
View Learning Extended: Inventing New Tables for Statistical Relational Learning Jesse Davis jdavis@cs.wisc.edu Department of Computer Science, University of Wisconsin, Madison, WI 53705 USA Elizabeth
More informationInductive Logic Programming in Clementine
Inductive Logic Programming in Clementine Sam Brewer 1 and Tom Khabaza 2 Advanced Data Mining Group, SPSS (UK) Ltd 1st Floor, St. Andrew s House, West Street Woking, Surrey GU21 1EB, UK 1 sbrewer@spss.com,
More informationDRILA: A Distributed Relational Inductive Learning Algorithm
DRILA: A Distributed Relational Inductive Learning Algorithm SALEH M. ABU-SOUD Computer Science Department New York Institute of Technology Amman Campus P.O. Box (1202), Amman, 11941 JORDAN sabusoud@nyit.edu
More informationLearning Horn Expressions with LOGAN-H
Journal of Machine Learning Research 8 (2007) 549-587 Submitted 12/05; Revised 8/06; Published 3/07 Learning Horn Expressions with LOGAN-H Marta Arias Center for Computational Learning Systems Columbia
More informationClassification and Regression Trees
Classification and Regression Trees David S. Rosenberg New York University April 3, 2018 David S. Rosenberg (New York University) DS-GA 1003 / CSCI-GA 2567 April 3, 2018 1 / 51 Contents 1 Trees 2 Regression
More informationRelational Learning. Jan Struyf and Hendrik Blockeel
Relational Learning Jan Struyf and Hendrik Blockeel Dept. of Computer Science, Katholieke Universiteit Leuven Celestijnenlaan 200A, 3001 Leuven, Belgium 1 Problem definition Relational learning refers
More informationA Two-level Learning Method for Generalized Multi-instance Problems
A wo-level Learning Method for Generalized Multi-instance Problems Nils Weidmann 1,2, Eibe Frank 2, and Bernhard Pfahringer 2 1 Department of Computer Science University of Freiburg Freiburg, Germany weidmann@informatik.uni-freiburg.de
More informationCombining Gradient Boosting Machines with Collective Inference to Predict Continuous Values
Combining Gradient Boosting Machines with Collective Inference to Predict Continuous Values Iman Alodah Computer Science Department Purdue University West Lafayette, Indiana 47906 Email: ialodah@purdue.edu
More informationData Mining. 3.2 Decision Tree Classifier. Fall Instructor: Dr. Masoud Yaghini. Chapter 5: Decision Tree Classifier
Data Mining 3.2 Decision Tree Classifier Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction Basic Algorithm for Decision Tree Induction Attribute Selection Measures Information Gain Gain Ratio
More informationClassifier Inspired Scaling for Training Set Selection
Classifier Inspired Scaling for Training Set Selection Walter Bennette DISTRIBUTION A: Approved for public release: distribution unlimited: 16 May 2016. Case #88ABW-2016-2511 Outline Instance-based classification
More informationComparative Evaluation of Approaches to Propositionalization
Comparative Evaluation of Approaches to Propositionalization M.-A. Krogel, S. Rawles, F. Železný, P. A. Flach, N. Lavrač, and S. Wrobel Otto-von-Guericke-Universität, Magdeburg, Germany, mark.krogel@iws.cs.uni-magdeburg.de
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University.
CS246: Mining Massive atasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/23/2011 Jure Leskovec, Stanford C246: Mining Massive atasets 2 Input features: N features: X 1, X 2, X N Each
More informationInduction of relational algebra expressions
Induction of relational algebra expressions Joris Gillis and Jan Van den Bussche Universiteit Hasselt and transnationale Universiteit Limburg 1 Introduction In the theory of database systems [1], a database
More informationK- Nearest Neighbors(KNN) And Predictive Accuracy
Contact: mailto: Ammar@cu.edu.eg Drammarcu@gmail.com K- Nearest Neighbors(KNN) And Predictive Accuracy Dr. Ammar Mohammed Associate Professor of Computer Science ISSR, Cairo University PhD of CS ( Uni.
More information8. Tree-based approaches
Foundations of Machine Learning École Centrale Paris Fall 2015 8. Tree-based approaches Chloé-Agathe Azencott Centre for Computational Biology, Mines ParisTech chloe agathe.azencott@mines paristech.fr
More informationMining Relational Databases with Multi-view Learning
Mining Relational Databases with Multi-view Learning Hongyu Guo School of Information Technology and Engineering, University of Ottawa 800 King Edward Road, Ottawa, Ontario, Canada, K1N 6N5 hguo028@site.uottawa.ca
More informationGraph-Based Concept Learning
Graph-Based Concept Learning Jesus A. Gonzalez, Lawrence B. Holder, and Diane J. Cook Department of Computer Science and Engineering University of Texas at Arlington Box 19015, Arlington, TX 76019-0015
More informationhas to choose. Important questions are: which relations should be dened intensionally,
Automated Design of Deductive Databases (Extended abstract) Hendrik Blockeel and Luc De Raedt Department of Computer Science, Katholieke Universiteit Leuven Celestijnenlaan 200A B-3001 Heverlee, Belgium
More informationEstimating the Quality of Databases
Estimating the Quality of Databases Ami Motro Igor Rakov George Mason University May 1998 1 Outline: 1. Introduction 2. Simple quality estimation 3. Refined quality estimation 4. Computing the quality
More informationCredit card Fraud Detection using Predictive Modeling: a Review
February 207 IJIRT Volume 3 Issue 9 ISSN: 2396002 Credit card Fraud Detection using Predictive Modeling: a Review Varre.Perantalu, K. BhargavKiran 2 PG Scholar, CSE, Vishnu Institute of Technology, Bhimavaram,
More informationA Genetic Algorithm-Based Approach for Building Accurate Decision Trees
A Genetic Algorithm-Based Approach for Building Accurate Decision Trees by Z. Fu, Fannie Mae Bruce Golden, University of Maryland S. Lele,, University of Maryland S. Raghavan,, University of Maryland Edward
More informationA Comparative study of Clustering Algorithms using MapReduce in Hadoop
A Comparative study of Clustering Algorithms using MapReduce in Hadoop Dweepna Garg 1, Khushboo Trivedi 2, B.B.Panchal 3 1 Department of Computer Science and Engineering, Parul Institute of Engineering
More informationExperiment databases: a novel methodology for experimental research
Experiment databases: a novel methodology for experimental research Hendrik Blockeel Katholieke Universiteit Leuven, Department of Computer Science Celestijnenlaan 200A, 3001 Leuven, Belgium HendrikBlockeel@cskuleuvenbe
More informationStructural Logistic Regression for Link Analysis
University of Pennsylvania ScholarlyCommons Departmental Papers (CIS) Department of Computer & Information Science August 2003 Structural Logistic Regression for Link Analysis Alexandrin Popescul University
More informationModel Selection and Assessment
Model Selection and Assessment CS4780/5780 Machine Learning Fall 2014 Thorsten Joachims Cornell University Reading: Mitchell Chapter 5 Dietterich, T. G., (1998). Approximate Statistical Tests for Comparing
More informationLearning Relational Probability Trees Jennifer Neville David Jensen Lisa Friedland Michael Hay
Learning Relational Probability Trees Jennifer Neville David Jensen Lisa Friedland Michael Hay Computer Science Department University of Massachusetts Amherst, MA 01003 USA [jneville jensen lfriedl mhay]@cs.umass.edu
More informationCse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University
Cse634 DATA MINING TEST REVIEW Professor Anita Wasilewska Computer Science Department Stony Brook University Preprocessing stage Preprocessing: includes all the operations that have to be performed before
More informationDecision Support Systems
Decision Support Systems 54 (2013) 1269 1279 Contents lists available at SciVerse ScienceDirect Decision Support Systems journal homepage: www.elsevier.com/locate/dss Simple decision forests for multi-relational
More informationClassification. Instructor: Wei Ding
Classification Decision Tree Instructor: Wei Ding Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Preliminaries Each data record is characterized by a tuple (x, y), where x is the attribute
More informationBeam Search Induction and Similarity Constraints for Predictive Clustering Trees
Beam Search Induction and Similarity Constraints for Predictive Clustering Trees Dragi Kocev 1, Jan Struyf 2, and Sašo Džeroski 1 1 Dept. of Kwledge Techlogies, Jožef Stefan Institute Jamova 39, 1000 Ljubljana,
More informationCentrum voor Wiskunde en Informatica. Information Systems (INS)
Centrum voor Wiskunde en Informatica! "# $ % &('( %!*) +,-.(!/0**1243 56 78 % Information Systems (INS) INS-R9908 May 31, 1999 Report INS-R9908 ISSN 1386-3681 CWI P.O. Box 94079 1090 GB Amsterdam The Netherlands!