Machine Learning via Decision Trees: C4.5
- Jack Hall

C4.5: Algorithms for Machine Learning
- Main task: learning decision trees from data.
- Software development has ended (in favor of C5.0, which is commercial), but C4.5 is still one of the reference algorithms for this task.
- Last release: 8.0.
- C implementation, for Unix systems, available at: http://
- C4.5 tutorial available at: http://www2.cs.uregina.ca/~dbd/cs831/notes/ml/dtrees/c4.5/tutorial.html
- The man page for the C4.5 software is available on the course web site.

Building the Tree: Base Algorithm
The base algorithm is the same as in the CLS system. Let T be the training set and {C_1, C_2, ..., C_k} the set of all classes. Consider the set T:
- If T contains examples all of the same class, build a single leaf labeled with that class.
- If T contains examples of several classes:
  - build a partition of T based on a test on the value of a particular attribute;
  - build a node associated with the test, with one child for each subset in the partition;
  - recursively call the algorithm on each subset.

Stop Conditions
In practice, C4.5 stops if:
- T contains examples of a single class (main stop condition). This yields a single leaf, labeled with that class.
- T is empty. This yields a single leaf, labeled with the most frequent class in the parent node/set.
- No test can generate at least two subsets containing at least MINOBJ examples each (so a set with fewer than 2*MINOBJ examples is never split). This yields a single leaf, labeled with the most frequent class; some examples in the corresponding set will be misclassified.
- Other conditions hold (omitted for the sake of simplicity).
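
The base algorithm and the three stop conditions fit in a short recursive function. This is a minimal sketch under several assumptions: only nominal attributes are handled, and MINOBJ = 2. The selection criterion is passed in as a function `choose`; the one used here is a placeholder, and C4.5's real criterion is the gain ratio described next. The names `build_tree`, `choose` and the data layout are hypothetical and not part of C4.5. The data are the outlook/windy/play columns of WEKA's weather dataset.

```python
from collections import Counter

# (outlook, windy, play) columns of WEKA's weather data.
WEATHER = [
    ("sunny", "FALSE", "no"), ("sunny", "TRUE", "no"),
    ("overcast", "FALSE", "yes"), ("rainy", "FALSE", "yes"),
    ("rainy", "FALSE", "yes"), ("rainy", "TRUE", "no"),
    ("overcast", "TRUE", "yes"), ("sunny", "FALSE", "no"),
    ("sunny", "FALSE", "yes"), ("rainy", "FALSE", "yes"),
    ("sunny", "TRUE", "yes"), ("overcast", "TRUE", "yes"),
    ("overcast", "FALSE", "yes"), ("rainy", "TRUE", "no"),
]
MINOBJ = 2  # assumed minimum number of examples per branch

def majority(labels):
    """Most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, attributes, domains, choose, parent_majority=None,
               minobj=MINOBJ):
    """Return a class label (leaf) or a pair (attribute, {value: subtree})."""
    if not rows:                                   # T is empty
        return parent_majority
    labels = [label for _, label in rows]
    if len(set(labels)) == 1:                      # a single class in T
        return labels[0]
    here = majority(labels)
    # A test is acceptable only if at least two of its subsets
    # contain at least `minobj` examples.
    candidates = [
        a for a in attributes
        if sum(n >= minobj for n in Counter(f[a] for f, _ in rows).values()) >= 2
    ]
    if not candidates:                             # no acceptable test
        return here
    attr = choose(rows, candidates)
    rest = [a for a in attributes if a != attr]
    children = {
        value: build_tree([(f, l) for f, l in rows if f[attr] == value],
                          rest, domains, choose, here, minobj)
        for value in domains[attr]                 # one child per value
    }
    return attr, children

rows = [({"outlook": o, "windy": w}, play) for o, w, play in WEATHER]
domains = {"outlook": ["sunny", "overcast", "rainy"], "windy": ["TRUE", "FALSE"]}
# Placeholder criterion: take the first acceptable attribute.
tree = build_tree(rows, ["outlook", "windy"], domains, lambda r, c: c[0])
print(tree)
```

Each child is built over the whole value domain of the chosen attribute, so an empty subset can occur. When it does, that child receives the parent's majority class, as in the second stop condition.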
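
Choosing the Attribute for the Test
Entropy of a set S of examples, where freq(C_j, S) is the number of examples in S that belong to class C_j:
$$ \mathrm{info}(S) = -\sum_{j=1}^{k} \frac{\mathrm{freq}(C_j, S)}{|S|} \log_2 \frac{\mathrm{freq}(C_j, S)}{|S|} $$
Entropy of a partition P = (T_1, T_2, ..., T_n) of a set T:
$$ \mathrm{info}_P(T) = \sum_{i=1}^{n} \frac{|T_i|}{|T|} \, \mathrm{info}(T_i) $$
Gain of a partition P = (T_1, T_2, ..., T_n) of a set T:
$$ \mathrm{gain}(P) = \mathrm{info}(T) - \mathrm{info}_P(T) $$
Split info:
$$ \mathrm{splitinfo}_P(T) = -\sum_{i=1}^{n} \frac{|T_i|}{|T|} \log_2 \frac{|T_i|}{|T|} $$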
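
The criterion for choosing the attribute for the test is the gain ratio. C4.5 prefers the test with the highest value:
$$ \mathrm{gainratio}(P) = \frac{\mathrm{gain}(P)}{\mathrm{splitinfo}_P(T)} = \frac{\mathrm{info}(T) - \mathrm{info}_P(T)}{\mathrm{splitinfo}_P(T)} $$
Dividing the gain by the split info penalizes attributes with many possible values. Such attributes split T into many small subsets, which carries a high risk of overfitting.
This differs from the behavior you have seen before. To replicate those results, you will need to force C4.5 to use the unmodified gain when choosing the attribute.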
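
The formulas above can be checked numerically on the nominal attributes of the weather data. This is a minimal sketch: `info` and `gain_ratio` are illustrative helper names, and the rows are assumed to be the outlook/windy/play columns of WEKA's weather dataset.

```python
from collections import Counter
from math import log2

# (outlook, windy, play) columns of WEKA's weather data.
WEATHER = [
    ("sunny", "FALSE", "no"), ("sunny", "TRUE", "no"),
    ("overcast", "FALSE", "yes"), ("rainy", "FALSE", "yes"),
    ("rainy", "FALSE", "yes"), ("rainy", "TRUE", "no"),
    ("overcast", "TRUE", "yes"), ("sunny", "FALSE", "no"),
    ("sunny", "FALSE", "yes"), ("rainy", "FALSE", "yes"),
    ("sunny", "TRUE", "yes"), ("overcast", "TRUE", "yes"),
    ("overcast", "FALSE", "yes"), ("rainy", "TRUE", "no"),
]

def info(labels):
    """Entropy info(S), in bits, of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, col):
    """Return (gain, splitinfo, gain ratio) of the partition induced by column col."""
    n = len(rows)
    subsets = {}
    for row in rows:
        subsets.setdefault(row[col], []).append(row[-1])   # class labels of T_i
    info_p = sum(len(s) / n * info(s) for s in subsets.values())
    gain = info([row[-1] for row in rows]) - info_p
    split = -sum(len(s) / n * log2(len(s) / n) for s in subsets.values())
    return gain, split, (gain / split if split > 0 else 0.0)

for name, col in [("outlook", 0), ("windy", 1)]:
    g, s, r = gain_ratio(WEATHER, col)
    print(f"{name}: gain={g:.3f} splitinfo={s:.3f} ratio={r:.3f}")
```

Outlook should come out with a gain of about 0.247, a split info of about 1.577 and a gain ratio of about 0.156, which beats windy under both criteria.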

Exercise 1
- Download the golf dataset from the course web site (this is the example you already know!).
- C4.5 is pre-installed on the lab machines. You can run it with:
    c4.5 -f <filestem> -v <verbosity level>
- Experiment with different verbosity levels.
- Try to identify, in the software output:
  - the decision tree itself;
  - the steps in which the algorithm chooses the splitting attribute;
  - the gain values for the considered attributes.
- Try to understand how continuous attributes are treated.
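
The exercise asks how continuous attributes are treated. As a starting point, continuous attributes get binary tests of the form value <= t, with t chosen to maximize the gain. This sketch makes two simplifying assumptions:
- Candidate thresholds are the midpoints between consecutive distinct sorted values.
- Gain is used directly. C4.5 also adjusts the comparison, and it reports thresholds taken from the attribute values in the training data, hence humidity <= 75 in the J48 tree.

The humidity values are assumed to be those of the five sunny examples in WEKA's weather data. `best_threshold` is a hypothetical helper.

```python
from collections import Counter
from math import log2

def info(labels):
    """Entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Best binary test `value <= t` for a continuous attribute.

    Returns (t, gain); candidate thresholds are midpoints between
    consecutive distinct sorted values."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = info(labels)
    best_t, best_gain = None, -1.0
    for (lo, _), (hi, _) in zip(pairs, pairs[1:]):
        if lo == hi:
            continue                      # no threshold between equal values
        t = (lo + hi) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        gain = base - len(left) / n * info(left) - len(right) / n * info(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# Humidity and class of the five "outlook = sunny" examples.
humidity = [85, 90, 95, 70, 70]
play = ["no", "no", "no", "yes", "yes"]
print(best_threshold(humidity, play))
```

The best cut separates the two humid-free "yes" examples from the three "no" examples, giving pure subsets and a gain equal to the full entropy of the sunny subset (about 0.971 bits).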

Moving to a More User-Friendly Environment
1. The WEKA system for data mining provides an implementation of the C4.5 algorithm.
2. Download the weather dataset from the course web site.
3. Open the weather.arff file with a text editor.
4. Run WEKA (from the command line or from the GUI):
    weka
5. Open the weather.arff file from the Preprocess tab in the WEKA Explorer.

What is the content of a .arff file?
1. The name of the main relation (with the same meaning as in relational databases):
    @relation weather
2. The attributes (by default, the last one is the class):
    @attribute outlook {sunny, overcast, rainy}
    @attribute temperature numeric
    @attribute humidity numeric
    @attribute windy {TRUE, FALSE}
    @attribute play {yes, no}
3. The data (you should know them already):
    @data
    sunny,85,85,false,no
    sunny,80,90,true,no
    ...

No    Outlook    Temp (°F)   Humid (%)   Windy   Class
D1    sunny      …           …           T       Play
D2    sunny      …           …           T       Don't Play
D3    sunny      …           …           F       Don't Play
D4    sunny      …           …           F       Don't Play
D5    sunny      …           …           F       Play
D6    overcast   …           …           T       Play
D7    overcast   …           …           F       Play
D8    overcast   …           …           T       Play
D9    overcast   …           …           F       Play
D10   rain       …           …           T       Don't Play
D11   rain       …           …           T       Don't Play
D12   rain       …           …           F       Play
D13   rain       …           …           F       Play
D14   rain       …           …           F       Play
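
To make the three parts of an ARFF file concrete, here is a minimal reader for the simple dense files used in these exercises. It is a hypothetical helper, not WEKA's own loader. It handles `@relation`, `@attribute` with nominal `{...}` or numeric types, `%` comments, and comma-separated `@data` rows. It does not handle quoting, sparse rows, dates or strings. The sample text is written with TRUE/FALSE to match the `@attribute windy` declaration.

```python
SAMPLE = """\
% first rows of the weather data
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
"""

def parse_arff(text):
    """Parse a simple dense ARFF document into (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue                                  # blank line or comment
        keyword = line.split(None, 1)[0].lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            # Nominal values stay strings; numeric ones become floats ("?" = missing).
            rows.append([
                v if isinstance(kind, list) or v == "?" else float(v)
                for v, (_, kind) in zip(values, attributes)
            ])
        elif keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, kind = line.split(None, 2)
            if kind.startswith("{"):                  # nominal: list of values
                kind = [v.strip() for v in kind.strip("{}").split(",")]
            attributes.append((name, kind))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, [name for name, _ in attributes], rows[0])
```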

Let's start!

Histograms
1. A panel on the lower right contains a histogram with the class distribution over the attribute currently selected in the Preprocess tab. For discrete attributes, the distribution is shown for each value; for continuous attributes, the domain is split into bins.
2. The color-class mapping can be inferred by selecting the class attribute.
3. "Visualize All" shows the histograms for all attributes.
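
The numbers behind such a histogram are plain class counts per attribute value. This is a minimal sketch. The equal-width binning for continuous attributes is an assumption, since WEKA's exact binning rule is not described here. `class_distribution` and `equal_width_bins` are hypothetical helpers, and the lists are the outlook and play columns of WEKA's weather data.

```python
from collections import Counter, defaultdict

def class_distribution(values, labels):
    """For each attribute value, count how many examples fall in each class."""
    dist = defaultdict(Counter)
    for value, label in zip(values, labels):
        dist[value][label] += 1
    return dict(dist)

def equal_width_bins(values, k):
    """Map each numeric value to one of k equal-width bins (0..k-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0          # avoid division by zero if all values are equal
    return [min(int((v - lo) / width), k - 1) for v in values]

outlook = ["sunny", "sunny", "overcast", "rainy", "rainy", "rainy", "overcast",
           "sunny", "sunny", "rainy", "sunny", "overcast", "overcast", "rainy"]
play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]
print(class_distribution(outlook, play))
```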

Classification
1. In the Classify tab you can choose:
   - the classifier to be trained;
   - the evaluation method;
   - the class attribute.
2. Select the J48 classifier (a Java implementation of C4.5).
3. Perform the evaluation on the training set.
4. Choose play as the class attribute.

1. You can access more options by clicking on the classifier name:
   - binarySplits: use binary splits on nominal attributes;
   - confidenceFactor: the confidence factor used for pruning (smaller values incur more pruning);
   - minNumObj: the minimum number of instances per leaf;
   - saveInstanceData: save the training data for visualization;
   - unpruned: no pruning is performed.
2. Try and run the classification task.
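
The same options can also be given to J48 on the command line. Per WEKA's documentation, -B is binarySplits, -C is confidenceFactor, -M is minNumObj and -U is unpruned, while -t and -T name the training and test files. This sketch only builds the argument list. It makes three assumptions:
- The location of `weka.jar` is given by the caller.
- J48 rejects a confidence factor for an unpruned tree (to our understanding), so -U replaces -C.
- The file names used in the example are placeholders.

`j48_command` is a hypothetical helper.

```python
def j48_command(train, test=None, confidence=0.25, min_obj=2,
                unpruned=False, binary_splits=False, weka_jar="weka.jar"):
    """Build a command line that trains and evaluates WEKA's J48."""
    cmd = ["java", "-cp", weka_jar, "weka.classifiers.trees.J48", "-t", train]
    if test is not None:
        cmd += ["-T", test]               # separate test set
    if unpruned:
        cmd.append("-U")                  # no pruning, so no confidence factor
    else:
        cmd += ["-C", str(confidence)]    # smaller values prune more
    cmd += ["-M", str(min_obj)]           # minimum number of instances per leaf
    if binary_splits:
        cmd.append("-B")                  # binary splits on nominal attributes
    return cmd

print(" ".join(j48_command("weather.arff")))
```

With Java and WEKA installed, the list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` to obtain the same report shown in the Explorer.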

Output (1)
The preamble:
=== Run information ===
Scheme:       weka.classifiers.trees.J48 -C … -M 2
Relation:     weather
Instances:    14
Attributes:   5
              outlook
              temperature
              humidity
              windy
              play
Test mode:    evaluate on training data

Output (2)
J48 pruned tree
outlook = sunny
|   humidity <= 75: yes (2.0)
|   humidity > 75: no (3.0)
outlook = overcast: yes (4.0)
outlook = rainy
|   windy = TRUE: no (2.0)
|   windy = FALSE: yes (3.0)
This is the decision tree we already know... and it can be visualized! Right-click on the result list, then choose "Visualize tree".

Output (3)
=== Evaluation on training set ===
=== Summary ===
Correctly Classified Instances        14      100 %
Incorrectly Classified Instances       0        0 %
Kappa statistic                        1
Mean absolute error                    0
Root mean squared error                0
Relative absolute error                0 %
Root relative squared error            0 %
Total Number of Instances             14
These are the classification results on the training set: no mistakes have been made.
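
Output (4)
=== Confusion Matrix ===
 a b   <-- classified as
 9 0 |  a = yes
 0 5 |  b = no
Taking yes as the positive class, the top-right cell (actual yes, classified as no) counts the false negatives. The bottom-left cell (actual no, classified as yes) counts the false positives.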
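
The confusion matrix, the accuracy and the kappa statistic in the summary can all be computed from the actual and predicted classes. This is a minimal sketch using Cohen's kappa, with rows as actual classes and columns as predicted classes, the same layout as the matrix above. `evaluate` is a hypothetical helper.

```python
def evaluate(actual, predicted, classes):
    """Confusion matrix (rows = actual, columns = predicted), accuracy, Cohen's kappa."""
    index = {c: i for i, c in enumerate(classes)}
    k = len(classes)
    matrix = [[0] * k for _ in range(k)]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    n = len(actual)
    observed = sum(matrix[i][i] for i in range(k)) / n
    # Agreement expected by chance, from row (actual) and column (predicted) totals.
    expected = sum(sum(matrix[i]) * sum(row[i] for row in matrix)
                   for i in range(k)) / (n * n)
    # Degenerate case: one class everywhere, so agreement is trivially perfect.
    kappa = 1.0 if expected == 1 else (observed - expected) / (1 - expected)
    return matrix, observed, kappa

actual = ["yes"] * 9 + ["no"] * 5
print(evaluate(actual, actual, ["yes", "no"]))
```

A perfect classifier gives [[9, 0], [0, 5]], accuracy 1.0 and kappa 1, matching the WEKA summary. A single "yes" example misclassified as "no" would lower kappa to about 0.85.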

Error-Based Pruning
C4.5 employs a technique called error-based pruning. Before the tree is considered final, the algorithm attempts to simplify its structure:
- For each leaf, C4.5 estimates a misclassification probability on unknown examples, based on statistical reasoning.
- For each node, the estimated number of errors is the sum of the estimated errors of the underlying leaves (each leaf's error probability times the number of examples it covers).
- At each node:
  - if the error estimate can be reduced by replacing the node with a leaf, C4.5 performs the replacement;
  - if the error estimate can be reduced by replacing the node with its branch having the most examples, C4.5 performs the replacement.
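
The "statistical reasoning" uses a pessimistic upper confidence limit on each leaf's error rate. The confidence factor of 0.25 is J48's default. C4.5 computes this limit from the binomial distribution. This sketch instead uses a normal approximation, so its numbers are close to but not identical to C4.5's. It covers only the first replacement rule (node to leaf), not the replacement with the largest branch. The helper names and the class counts in the example are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist

def pessimistic_error(n, errors, cf=0.25):
    """Upper confidence limit on the error rate of a leaf that covers n
    examples and misclassifies `errors` of them (normal approximation)."""
    z = NormalDist().inv_cdf(1 - cf)      # one-sided z for confidence factor cf
    f = errors / n                        # observed error rate
    spread = z * sqrt(f / n - f * f / n + z * z / (4 * n * n))
    return (f + z * z / (2 * n) + spread) / (1 + z * z / n)

def should_prune(leaves, cf=0.25):
    """leaves: one Counter of class counts per leaf below a node.

    Returns (prune?, estimated errors as a leaf, estimated errors as a subtree)."""
    subtree_errors, merged = 0.0, Counter()
    for counts in leaves:
        n = sum(counts.values())
        wrong = n - max(counts.values())  # a leaf predicts its majority class
        subtree_errors += n * pessimistic_error(n, wrong, cf)
        merged.update(counts)
    n = sum(merged.values())
    leaf_errors = n * pessimistic_error(n, n - max(merged.values()), cf)
    return leaf_errors <= subtree_errors, leaf_errors, subtree_errors

leaves = [Counter(bad=4, good=2), Counter(bad=1, good=1), Counter(bad=4, good=2)]
print(should_prune(leaves))
```

In this example the three leaves together are estimated to make about 7.1 errors on 14 examples. A single leaf in their place is estimated at about 6.3, so the subtree is replaced by a leaf even though it makes fewer errors on the training data.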

Exercise 3
1. Try and learn a tree for the cars2004 dataset, using the sport car attribute as the class, both with and without error-based pruning. How do the two trees look? What is their performance on the training set?
2. Try and perform the evaluation using cars2004_test_noname.arff as a test set. What happens?

Exercise 4: Language Detection
We want to design a software system for automatically detecting the language of a short text. Examples:
- "The doctor, who was the family physician, saluted him, but he scarcely took any notice." --> english
- "J'ai couru chez toi, je ne t'ai plus trouvée, tu sais la parole que je t'avais donnée, je la tiens." --> french
- "Conosci tu qualche hossanieh poco scrupoloso che si possa comperare con un bel pugno d'oro?" --> italian
- Download the language.zip file from the course web site. The archive contains the files tset_data.txt and vset_data.txt, corresponding respectively to the training set and the validation (test) set.
- The training and test sets are not in the ARFF format, because we still don't know which attributes should be used as input for the classifier! This is what happens in 99% of practical cases: identifying a good set of features and training a classifier are two components of a single design problem.
- Together with the dataset, you will find the generate_arff.py script, which can process the raw dataset to produce a .arff file. The script does not specify which features should be used: that's your task!
- The generate_arff script is written in Python (we will use Python on two more occasions):
  - Python is (as a first approximation) an interpreted language. It's not very fast, but it allows for fast code writing.
  - Python interpreters are pre-installed on most *nix systems (including OS X). Python can also be installed on Windows.
  - For OS X users: my advice is to override the system Python and install it via the Homebrew tool.
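
To show what "choosing the features" means in practice, here is one possible feature extractor and ARFF writer. Everything in it is hypothetical: it is not the interface of generate_arff.py, and the chosen letters and short words are only an illustrative guess at useful features. Each text becomes the relative frequencies of a few letters and of a few frequent short words, followed by its language label.

```python
import re

# Hypothetical feature set: relative frequencies of a few letters and short words.
LETTERS = "aeiouhkwyz"
WORDS = ["the", "and", "of", "le", "la", "et", "il", "che", "di"]

def features(text):
    """Turn a raw text into a fixed-length list of numeric features."""
    lower = text.lower()
    letters = [c for c in lower if c.isalpha()]
    words = re.findall(r"[^\W\d_]+", lower)       # runs of letters only
    n_letters, n_words = max(len(letters), 1), max(len(words), 1)
    return ([letters.count(c) / n_letters for c in LETTERS] +
            [words.count(w) / n_words for w in WORDS])

def arff_header(languages):
    """ARFF header declaring one numeric attribute per feature plus the class."""
    lines = ["@relation language"]
    lines += [f"@attribute letter_{c} numeric" for c in LETTERS]
    lines += [f"@attribute word_{w} numeric" for w in WORDS]
    lines.append("@attribute language {" + ", ".join(languages) + "}")
    lines.append("@data")
    return "\n".join(lines)

def arff_row(text, language):
    """One @data line: the feature values followed by the class label."""
    return ",".join(f"{v:.4f}" for v in features(text)) + "," + language

print(arff_header(["english", "french", "italian"]))
print(arff_row("The doctor, who was the family physician, saluted him.", "english"))
```

Feeding the resulting file to J48 and inspecting which features end up near the root is a quick way to judge whether a feature set is worth refining.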

Python Basics
- Python is dynamically typed (variables have no fixed type).
- Primitive types:
    a = 2              # int
    b = 2.4            # float
    s = "hello world"  # or 'hello world' (strings)
    True, False        # Boolean
- Data structures:
  - lists (dynamic sequences): l = [1, 3, 5, 7]
  - tuples (immutable sequences): t = (1, 3, 5, 7)
- Indexing:
  - lists & tuples: l[0] (first item), l[-1] (last item)
  - strings: s[0] (first letter), s[-1] (last letter)

- Instructions end when the line ends (no final ;).
- When you need to write an instruction on multiple lines, you can end the partial lines with \.
- The \ character is not needed between pairs of brackets, e.g.:
    l = [1, 2,
         3, 4]
- No {} to delimit instruction blocks: they are defined via indentation instead.

Conditional instructions:
    if <condition>:
        <instruction block>
Example:
    if a == 0:
        a += 1
        print("I have just incremented a")
    elif a == 1:
        a -= 1
        print("I have just decremented a")
    else:
        print("No increment")

Loops:
    for <variable> in <enumerable object>:
        <instruction block>
Examples:
    for a in [1, 2, 3]:
        print(a)
    for i in range(3):
        print(i)
Lists, strings, and tuples are all enumerable. range(n) produces all integers between 0 and n-1 (a list in Python 2, a lazy range object in Python 3).

List comprehension:
    [<expression> for <variable> in <enumerable> if <condition>]
Examples:
    [2*i for i in range(5)]                 # even numbers from 0 to 8
    [i**2 for i in range(5)]                # squares of integers from 0 to 4
    [i for i in range(10) if i % 2 == 0]    # even numbers from 0 to 8 (bis)

Function definition:
    def <function name>(<parameter>, <parameter>, ...):
        <instruction block>
Example:
    def even(n):
        return n % 2 == 0
Functions are objects! They can be passed as parameters.
There is a vast collection of external modules:
    import <module name>
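
Because functions are objects, `even` can be handed to another function exactly like any other value. In this short illustration, `keep` is a made-up helper; `filter` and `sorted` are Python built-ins.

```python
def even(n):
    return n % 2 == 0

def keep(predicate, items):
    """Return the items for which predicate(item) is true."""
    return [x for x in items if predicate(x)]

print(keep(even, range(10)))                         # [0, 2, 4, 6, 8]
print(list(filter(even, range(10))))                 # same result with the built-in filter
print(sorted(["Python", "c", "Java"], key=str.lower))  # ['c', 'Java', 'Python']
```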

Exercise 6: Shape Recognition
An industrial wood-processing plant employs a machine that should process only square boards. The input slot of the machine has been instrumented with an array of optical sensors, plus a software unit that can provide a description of the board as a polygon.
Boards can have different sizes and slightly irregular shapes. They can also be positioned differently at the input slot. Devise and implement a software system based on decision trees to classify the boards as "square" and "not square".
Download the shapes dataset from the course web site. The archive contains:
- two dataset files (tset_data.txt and vset_data.txt) for the training and test set, in raw format;
- two directories (tset and vset), containing an image file for each example in the training and test set;
- a Python script to generate the .arff file, to be customized as in the previous exercise.
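
Since boards vary in size and position, useful features should not change when a polygon is moved, rotated or scaled. This sketch assumes each board is available as an ordered list of (x, y) vertices; the actual raw format of tset_data.txt is not described here. It computes three candidate features:
- the number of vertices;
- the coefficient of variation of the side lengths;
- the largest deviation of a corner angle from 90 degrees.

`shape_features` and the feature choice are hypothetical.

```python
from math import atan2, degrees, hypot, sqrt

def shape_features(polygon):
    """polygon: list of (x, y) vertices in boundary order.

    Returns (number of vertices, coefficient of variation of the side
    lengths, largest deviation of a corner angle from 90 degrees).
    All three are unchanged by translation, rotation and scaling."""
    n = len(polygon)
    sides, corners = [], []
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = polygon[i - 1], polygon[i], polygon[(i + 1) % n]
        sides.append(hypot(x2 - x1, y2 - y1))       # edge from vertex i to vertex i+1
        turn = abs(degrees(atan2(y0 - y1, x0 - x1) - atan2(y2 - y1, x2 - x1))) % 360
        corners.append(min(turn, 360 - turn))       # angle between the two edges, in [0, 180]
    mean = sum(sides) / n
    cv = sqrt(sum((s - mean) ** 2 for s in sides) / n) / mean
    return n, cv, max(abs(a - 90) for a in corners)

square = [(0, 1), (1, 0), (0, -1), (-1, 0)]         # a square rotated by 45 degrees
rectangle = [(0, 0), (2, 0), (2, 1), (0, 1)]
print(shape_features(square), shape_features(rectangle))
```

A perfect square gives (4, 0, 0) whatever its size or rotation. A 2x1 rectangle still has right angles but a side-length variation of 1/3, so a decision tree can separate the two with simple thresholds.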
More informationClassifica(on and Clustering with WEKA. Classifica*on and Clustering with WEKA
Classifica(on and Clustering with WEKA 1 Schedule: Classifica(on and Clustering with WEKA 1. Presentation of WEKA. 2. Your turn: perform classification and clustering. 2 WEKA Weka is a collec*on of machine
More informationAnalytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.
Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied
More informationISSUES IN DECISION TREE LEARNING
ISSUES IN DECISION TREE LEARNING Handling Continuous Attributes Other attribute selection measures Overfitting-Pruning Handling of missing values Incremental Induction of Decision Tree 1 DECISION TREE
More informationList of Exercises: Data Mining 1 December 12th, 2015
List of Exercises: Data Mining 1 December 12th, 2015 1. We trained a model on a two-class balanced dataset using five-fold cross validation. One person calculated the performance of the classifier by measuring
More informationDecision Trees: Discussion
Decision Trees: Discussion Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 This lecture: Learning Decision Trees 1. Representation: What are decision trees? 2. Algorithm: Learning
More informationLazy Rule Learning. Lazy Rule Learning Bachelor-Thesis von Nikolaus Korfhage Januar ngerman
ngerman Lazy Rule Learning Lazy Rule Learning Bachelor-Thesis von Nikolaus Korfhage Januar 2012 Fachbereich Informatik Fachgebiet Knowledge Engineering Lazy Rule Learning Lazy Rule Learning Vorgelegte
More informationCSE 115. Introduction to Computer Science I
CSE 115 Introduction to Computer Science I Progress In UBInfinite? A. Haven't started B. Earned 3 stars in "Calling Functions" C. Earned 3 stars in "Defining Functions" D. Earned 3 stars in "Conditionals"
More informationBasic Python 3 Programming (Theory & Practical)
Basic Python 3 Programming (Theory & Practical) Length Delivery Method : 5 Days : Instructor-led (Classroom) Course Overview This Python 3 Programming training leads the student from the basics of writing
More informationMachine Learning Techniques for Data Mining
Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART VII Moving on: Engineering the input and output 10/25/2000 2 Applying a learner is not all Already
More informationSlides for Data Mining by I. H. Witten and E. Frank
Slides for Data Mining by I. H. Witten and E. Frank 7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-
More informationARTIFICIAL INTELLIGENCE AND PYTHON
ARTIFICIAL INTELLIGENCE AND PYTHON DAY 1 STANLEY LIANG, LASSONDE SCHOOL OF ENGINEERING, YORK UNIVERSITY WHAT IS PYTHON An interpreted high-level programming language for general-purpose programming. Python
More informationCONCEPT FORMATION AND DECISION TREE INDUCTION USING THE GENETIC PROGRAMMING PARADIGM
1 CONCEPT FORMATION AND DECISION TREE INDUCTION USING THE GENETIC PROGRAMMING PARADIGM John R. Koza Computer Science Department Stanford University Stanford, California 94305 USA E-MAIL: Koza@Sunburn.Stanford.Edu
More informationcs1114 REVIEW of details test closed laptop period
python details DOES NOT COVER FUNCTIONS!!! This is a sample of some of the things that you are responsible for do not believe that if you know only the things on this test that they will get an A on any
More informationTour-Based Mode Choice Modeling: Using An Ensemble of (Un-) Conditional Data-Mining Classifiers
Tour-Based Mode Choice Modeling: Using An Ensemble of (Un-) Conditional Data-Mining Classifiers James P. Biagioni Piotr M. Szczurek Peter C. Nelson, Ph.D. Abolfazl Mohammadian, Ph.D. Agenda Background
More informationWEB BASED DATA-MINING ASSISTANT
P. J. Safarik University Faculty of Science WEB BASED DATA-MINING ASSISTANT THESIS Field of Study: Institute: Tutor: Computer Science Institute of Computer Science RNDr. Tomáš Horváth, PhD. Košice 2015
More informationKNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa
KNIME TUTORIAL Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it Outline Introduction on KNIME KNIME components Exercise: Data Understanding Exercise: Market Basket Analysis Exercise:
More informationCrash Dive into Python
ECPE 170 University of the Pacific Crash Dive into Python 2 Lab Schedule Today Ac:vi:es Endianness Python Thursday Network programming Lab 8 Network Programming Lab 8 Assignments Due Due by Mar 30 th 5:00am
More informationWEKA homepage.
WEKA homepage http://www.cs.waikato.ac.nz/ml/weka/ Data mining software written in Java (distributed under the GNU Public License). Used for research, education, and applications. Comprehensive set of
More informationS2 Text. Instructions to replicate classification results.
S2 Text. Instructions to replicate classification results. Machine Learning (ML) Models were implemented using WEKA software Version 3.8. The software can be free downloaded at this link: http://www.cs.waikato.ac.nz/ml/weka/downloading.html.
More information