Machine Learning (CSE 446): Decision Trees

1 Machine Learning (CSE 446): Decision Trees. Sham M. Kakade © 2018 University of Washington. cse446-staff@cs.washington.edu

2 Announcements: First assignment posted; due Thurs, Jan 18th. Remember the late policy (see the website). TA office hours posted. (Please check the website before you go, just in case of changes.) Midterm: Weds, Feb 7. Today: Decision Trees and the supervised learning setup.

3 Features (a conceptual point). Let φ be (one such) function that maps from inputs x to values. There could be many such functions; sometimes we write Φ(x) for the feature vector (it's really a tuple). If φ maps to {0, 1}, we call it a binary feature (function). If φ maps to R, we call it a real-valued feature (function). φ could also map to categorical values, ordinal values, integers, ... Often, there isn't much of a difference between x and the tuple of features.
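
To make the distinction concrete, here is a minimal sketch (in Python; the dict-of-attributes input x is an assumption, not the slides' notation) of binary, real-valued, and categorical feature functions, and the tuple Φ(x) collecting them:

def phi_binary(x):
    # Binary feature: 1 if the car is American-made, else 0.
    return 1 if x["maker"] == "america" else 0

def phi_real(x):
    # Real-valued feature: the car's weight.
    return float(x["weight"])

def phi_categorical(x):
    # Categorical feature: the maker, as-is.
    return x["maker"]

def Phi(x):
    # The feature vector (really a tuple) collecting several phi's.
    return (phi_binary(x), phi_real(x), phi_categorical(x))

print(Phi({"maker": "america", "weight": 3504, "cylinders": 8}))
# -> (1, 3504.0, 'america')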

4 Features. Data derived from the MPG dataset, with columns: mpg; cylinders; displacement; horsepower; weight; acceleration; year; origin. Input: a row in this table; a feature mapping corresponds to a column. Goal: predict whether mpg is < 23 ("bad" = 0) or above ("good" = 1) given the other attributes (other columns). 201 "good" and 197 "bad"; guessing the most frequent class (good) will get 50.5% accuracy.
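
As a sanity check, the majority-class baseline from the counts quoted on this slide (a quick sketch):

n_good, n_bad = 201, 197            # mpg >= 23 ("good" = 1), mpg < 23 ("bad" = 0)
baseline = max(n_good, n_bad) / (n_good + n_bad)
print(f"{baseline:.1%}")            # 50.5% -- always guess "good"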

5 Let's build a classifier! Let's just try to build a classifier. (This is our first learning algorithm.) For now, let's ignore the test set and the question of generalizing. Let's start by just looking at a simple classifier. What is a simple classification rule?

6 Contingency Table: the values of y (rows) tabulated against the values of feature φ: v1, v2, ..., vK (columns).
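
A contingency table is easy to tabulate; a small sketch, with assumed (feature value, label) pairs:

from collections import Counter

def contingency(pairs):
    # pairs: (phi(x), y) for each training example -> count per (value, label) cell.
    return Counter(pairs)

table = contingency([("america", 0), ("america", 1), ("europe", 1), ("asia", 1)])
print(table[("america", 0)])  # -> 1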

7 Decision Stump Example: the contingency table of y against the feature "maker", with values america, europe, asia.

8 Decision Stump Example. Root: 197:201 (counts of bad:good). Splitting on maker: america 174:75, europe 14:56, asia 9:70.

9 Decision Stump Example. Root: 197:201; splitting on maker: america 174:75, europe 14:56, asia 9:70. Errors: 75 + 14 + 9 = 98 (about 25%).

10 Decision Stump Example. Root: 197:201; splitting on cylinders: 3 → 3:1, 4 → 20:184, 5 → 1:2, 6 → 73:11, 8 → 100:3.

11 Decision Stump Example. Root: 197:201; splitting on cylinders: 3 → 3:1, 4 → 20:184, 5 → 1:2, 6 → 73:11, 8 → 100:3. Errors: 1 + 20 + 1 + 11 + 3 = 36 (about 9%).
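
A stump's error count is just the sum of the minority counts across its branches; a small sketch over the counts above:

def stump_errors(branches):
    # branches: dict mapping feature value -> (n_bad, n_good).
    # Each branch predicts its majority class, so the minority count is its error.
    return sum(min(n, p) for n, p in branches.values())

maker = {"america": (174, 75), "europe": (14, 56), "asia": (9, 70)}
cylinders = {3: (3, 1), 4: (20, 184), 5: (1, 2), 6: (73, 11), 8: (100, 3)}
print(stump_errors(maker), stump_errors(cylinders))  # -> 98 36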

12 Key Idea: Recursion. A single feature partitions the data. For each partition, we could choose another feature and partition further. Applying this recursively, we can construct a decision tree.

13 Decision Tree Example. Root 197:201; split on cylinders as before (3 → 3:1, 4 → 20:184, 5 → 1:2, 6 → 73:11, 8 → 100:3); then split the cylinders = 4 partition on maker: america 17:65, europe 0:53, asia 3:66. Error reduction compared to the cylinders stump?

14 Decision Tree Example. Same cylinders split; this time split the cylinders = 6 partition on maker: america 67:7, europe 3:1, asia 3:3. Error reduction compared to the cylinders stump?

15 Decision Tree Example. Same cylinders split; now split the cylinders = 6 partition on a feature ϕ that separates it perfectly: 73:0 and 0:11. Error reduction compared to the cylinders stump?

16 Decision Tree Example. Same cylinders split; split the cylinders = 4 partition on a feature ϕ′ into 12:169 and 8:15, and the cylinders = 6 partition on ϕ into 73:0 and 0:11. Error reduction compared to the cylinders stump?

17 Decision Tree: Making a Prediction. A generic tree: the root n:p splits on ϕ1 into n0:p0 and n1:p1; ϕ2 splits n1:p1 into n10:p10 and n11:p11; ϕ3 splits n10:p10 into n100:p100 and n101:p101; ϕ4 splits n11:p11 into n110:p110 and n111:p111.

18 Decision Tree: Making a Prediction. The same generic tree, with the prediction procedure:

Data: decision tree t, input example x
Result: predicted class
if t has the form Leaf(y) then
    return y;
else
    # t.φ is the feature associated with t;
    # t.child(v) is the subtree for value v;
    return DTreeTest(t.child(t.φ(x)), x);
end
Algorithm 1: DTreeTest
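
A minimal runnable version of DTreeTest (a sketch: the Leaf/Node classes and dict-valued inputs are assumed representations, not the slides' notation):

class Leaf:
    def __init__(self, y):
        self.y = y  # the class predicted at this leaf

class Node:
    def __init__(self, phi, children):
        self.phi = phi            # name of the feature tested at this node
        self.children = children  # dict: feature value -> subtree

def dtree_test(t, x):
    # x maps feature names to values; recurse into the child for x's value.
    if isinstance(t, Leaf):
        return t.y
    return dtree_test(t.children[x[t.phi]], x)

tree = Node("phi1", {0: Leaf(0),
                     1: Node("phi2", {0: Leaf(1), 1: Leaf(0)})})
print(dtree_test(tree, {"phi1": 1, "phi2": 0}))  # -> 1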

19 Decision Tree: Making a Prediction. Equivalent boolean formulas (the tree predicts 1 exactly when one of these clauses holds):
(ϕ1 = 0) ∧ (n0 < p0)
(ϕ1 = 1) ∧ (ϕ2 = 0) ∧ (ϕ3 = 0) ∧ (n100 < p100)
(ϕ1 = 1) ∧ (ϕ2 = 0) ∧ (ϕ3 = 1) ∧ (n101 < p101)
(ϕ1 = 1) ∧ (ϕ2 = 1) ∧ (ϕ4 = 0) ∧ (n110 < p110)
(ϕ1 = 1) ∧ (ϕ2 = 1) ∧ (ϕ4 = 1) ∧ (n111 < p111)

20 Tangent: How Many Formulas? Assume we have D binary features. Each feature could be set to 0, or set to 1, or excluded (wildcard/don't care): 3^D formulas.

21 Building a Decision Tree. Start with all the data at the root: n:p.

22 Building a Decision Tree. Split the root on ϕ1 into n0:p0 and n1:p1. We chose feature ϕ1. Note that n = n0 + n1 and p = p0 + p1.

23 Building a Decision Tree. Same split (n0:p0 and n1:p1 under ϕ1). We chose not to split the left partition. Why not?

24 Building a Decision Tree. Next, split the right partition on ϕ2 into n10:p10 and n11:p11.

25 Building a Decision Tree. Then split n10:p10 on ϕ3 into n100:p100 and n101:p101.

26 Building a Decision Tree. Finally, split n11:p11 on ϕ4 into n110:p110 and n111:p111, giving the full tree from before.

27 Greedily Building a Decision Tree (Binary Features)

Data: data D, feature set Φ
Result: decision tree
if all examples in D have the same label y, or Φ is empty and y is the best guess then
    return Leaf(y);
else
    for each feature φ in Φ do
        partition D into D0 and D1 based on φ-values;
        let mistakes(φ) = (non-majority answers in D0) + (non-majority answers in D1);
    end
    let φ* be the feature with the smallest number of mistakes;
    return Node(φ*, {DTreeTrain(D0, Φ \ {φ*}), DTreeTrain(D1, Φ \ {φ*})});
end
Algorithm 2: DTreeTrain
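
A minimal runnable sketch of Algorithm 2 for binary features, with Leaf/Node as in the DTreeTest sketch above; D is a list of (x, y) pairs where x maps feature names to 0/1 (an assumed representation):

from collections import Counter

class Leaf:
    def __init__(self, y): self.y = y

class Node:
    def __init__(self, phi, children): self.phi, self.children = phi, children

def majority(D):
    return Counter(y for _, y in D).most_common(1)[0][0]

def mistakes(D):
    # Number of non-majority labels in D (0 for an empty partition).
    return 0 if not D else len(D) - Counter(y for _, y in D).most_common(1)[0][1]

def dtree_train(D, features, default=0):
    if not D:
        return Leaf(default)  # empty partition: fall back to the parent's majority
    if len({y for _, y in D}) == 1 or not features:
        return Leaf(majority(D))
    def split(phi, v):
        return [(x, y) for x, y in D if x[phi] == v]
    # Greedy choice: the feature whose one-level split makes the fewest mistakes.
    best = min(features,
               key=lambda phi: mistakes(split(phi, 0)) + mistakes(split(phi, 1)))
    rest = features - {best}
    return Node(best, {v: dtree_train(split(best, v), rest, majority(D))
                       for v in (0, 1)})

# Usage: learn y = a AND b from all four binary inputs.
D = [({"a": a, "b": b}, int(a and b)) for a in (0, 1) for b in (0, 1)]
tree = dtree_train(D, {"a", "b"})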

28 What could go wrong? Suppose we split on a variable with many values (e.g., a continuous one like "displacement"). Suppose we built our tree out to be very deep and wide.

29 Danger: Overfitting. [Figure: error rate (lower is better) as a function of the depth of the decision tree, with one curve for training data and one for unseen data; past some depth the unseen-data error rises while training error keeps falling: overfitting.]

30 Detecting Overfitting. If you use all of your data to train, you won't be able to draw the red curve on the preceding slide!

31 Detecting Overfitting. If you use all of your data to train, you won't be able to draw the red curve on the preceding slide! Solution: hold some out. This data is called development data. More terms: Decision tree max depth is an example of a hyperparameter. I used my development data to tune the max-depth hyperparameter.

32 Detecting Overfitting. If you use all of your data to train, you won't be able to draw the red curve on the preceding slide! Solution: hold some out. This data is called development data. More terms: Decision tree max depth is an example of a hyperparameter. I used my development data to tune the max-depth hyperparameter. Better yet, hold out two subsets, one for tuning and one for a true, honest-to-science test. Splitting your data into training/development/test requires careful thinking. Starting point: randomly shuffle examples with an 80%/10%/10% split.
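
A quick sketch of that starting point; train_tree and error_rate are hypothetical stand-ins (e.g., DTreeTrain with a max-depth cutoff, and average error on a dataset):

import random

def split_data(examples, seed=0):
    examples = examples[:]                 # shuffle a copy, not the caller's list
    random.Random(seed).shuffle(examples)
    n = len(examples)
    a, b = int(0.8 * n), int(0.9 * n)
    return examples[:a], examples[a:b], examples[b:]   # train, dev, test

# train, dev, test = split_data(all_examples)
# best_depth = min(range(1, 15),
#                  key=lambda d: error_rate(train_tree(train, d), dev))
# print(error_rate(train_tree(train, best_depth), test))  # report test error once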

33 The i.i.d. Supervised Learning Setup. Let ℓ be a loss function; ℓ(y, ŷ) is what we lose by outputting ŷ when y is the correct output. For classification, the zero-one loss: ℓ(y, ŷ) = 1 if y ≠ ŷ, else 0. Let D(x, y) define the true probability of the input/output pair (x, y) in nature. We never know this distribution. The training data D = (x1, y1), (x2, y2), ..., (xN, yN) are assumed to be independent and identically distributed (i.i.d.) samples from D. The test data are also assumed to be i.i.d. samples from D. The space of classifiers we're considering is F; f is a classifier from F, chosen by our learning algorithm.
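
In code, the zero-one loss and the average loss of a classifier f on an i.i.d. sample (the quantity the held-out test set estimates) look like this sketch:

def zero_one_loss(y, y_hat):
    return 1 if y != y_hat else 0

def average_loss(f, examples):
    # examples: (x, y) pairs assumed drawn i.i.d. from D.
    return sum(zero_one_loss(y, f(x)) for x, y in examples) / len(examples)

print(average_loss(lambda x: 1, [({"a": 0}, 1), ({"a": 1}, 0)]))  # -> 0.5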
