
Tutorial 3 (1 / 8)

Overview (2 / 8)
- Non-Parametric Models: Definitions, KNN
- Ensemble Methods: Definitions, Examples, Random Forests
- Clustering: Definitions, Examples, k-means Clustering

Non-Parametric Models: Definitions (3 / 8)
- Parametric models: a fixed number of parameters, learned (estimated) from the data. More data gives a more accurate model.
- Non-parametric models: the number of parameters grows with the amount of data. More data gives a more complex model.
- Parametric or non-parametric, and what are the parameters? Decision trees, naive Bayes, KNN, random forests, k-means clustering.

Non-Parametric Models: KNN (4 / 8)
k-nearest neighbours
- How does it work?
- What is the effect of k with respect to the fundamental tradeoff in machine learning?
- What is the runtime of a naive implementation? How could you speed it up?
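As a starting point for the questions above, here is a minimal sketch of the naive k-NN classifier; the function name `knn_predict` and the toy data are my own. It scans every training point per query, which is the O(nd) runtime the slide asks about (speed-ups include k-d trees or approximate nearest-neighbour indexes).

```python
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points.
    Naive implementation: computes the distance to every training point."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(p, x)), label)
        for p, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D data: two well-separated classes.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, (0.3, 0.3), k=3))  # near the "a" cluster -> "a"
```

Note there is no training step at all: the "parameters" are the stored training points themselves, which is why the number of parameters grows with the data.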

Ensemble Methods: Definitions, Examples (5 / 8)
Ensemble methods: learning algorithms that take classifiers as input and use the output of each classifier to determine a classification.
- Averaging: take the average of the outputs of the classifiers (or the mode, if the output is categorical).
- Bagging: each classifier in the ensemble votes on an output with equal weight; each classifier is trained on a random subset of the training set (a bootstrap sample, drawn with replacement).
- Boosting: build the ensemble incrementally; when training each new model, give higher weight to data that was misclassified by previous models.
- Stacking: train a classifier to combine the predictions of the other classifiers.
- And more!
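Bagging as described above can be sketched in a few lines. This is an illustrative toy, not a production implementation: the base learner is a one-feature decision stump, and the names `train_stump` and `bagged_predict` are my own.

```python
import random
from collections import Counter

def train_stump(xs, ys):
    """Base learner: a decision stump on one feature. Try each training
    value as a threshold; keep the one classifying most points correctly."""
    best = None
    for t in set(xs):
        left = Counter(yi for xi, yi in zip(xs, ys) if xi <= t)
        right = Counter(yi for xi, yi in zip(xs, ys) if xi > t)
        l_label = left.most_common(1)[0][0]  # left is never empty: t is a data value
        r_label = right.most_common(1)[0][0] if right else l_label
        score = left[l_label] + right[r_label]
        if best is None or score > best[0]:
            best = (score, t, l_label, r_label)
    _, t, l_label, r_label = best
    return lambda x: l_label if x <= t else r_label

def bagged_predict(xs, ys, x_new, n_models=25, seed=0):
    """Bagging: train each stump on a bootstrap resample (drawn with
    replacement), then combine the stumps by an equal-weight vote."""
    rng = random.Random(seed)
    n = len(xs)
    votes = Counter()
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        stump = train_stump([xs[i] for i in idx], [ys[i] for i in idx])
        votes[stump(x_new)] += 1
    return votes.most_common(1)[0][0]

xs = [0.1, 0.3, 0.4, 2.0, 2.2, 2.5]
ys = ["lo", "lo", "lo", "hi", "hi", "hi"]
print(bagged_predict(xs, ys, 0.2), bagged_predict(xs, ys, 2.3))
```

Each stump sees a slightly different resample of the data, so the equal-weight vote averages away some of the variance of the individual stumps.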

Ensemble Methods: Random Forests (6 / 8)
Random forests: how do they work? How do you train them?
1. Create several bootstrap samples of the data.
2. Train a random decision tree on each bootstrap sample.
3. At test time, average the predictions of the trees.
- How does the number of trees affect the fundamental tradeoff of machine learning?
- How does the amount of randomness in the trees affect the fundamental tradeoff of machine learning?
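The three training steps above can be sketched as follows. To keep the example tiny, each "random tree" is only a depth-1 regression tree on a randomly chosen feature; real random forests grow much deeper trees and restrict the candidate features at every split. The names `train_random_stump` and `random_forest_predict` are my own.

```python
import random

def train_random_stump(X, y, rng):
    """A maximally simple 'random tree': split on one randomly chosen
    feature, at a threshold taken from a random training point."""
    f = rng.randrange(len(X[0]))     # random feature: the forest's extra randomness
    t = X[rng.randrange(len(X))][f]  # random threshold from the data
    left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
    right = [yi for xi, yi in zip(X, y) if xi[f] > t]
    l_mean = sum(left) / len(left) if left else sum(y) / len(y)
    r_mean = sum(right) / len(right) if right else l_mean
    return lambda x: l_mean if x[f] <= t else r_mean

def random_forest_predict(X, y, x_new, n_trees=100, seed=0):
    rng = random.Random(seed)
    n = len(X)
    preds = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # 1. bootstrap sample
        tree = train_random_stump([X[i] for i in idx],
                                  [y[i] for i in idx], rng)  # 2. random tree
        preds.append(tree(x_new))
    return sum(preds) / len(preds)  # 3. average the predictions

X = [(0.0, 1.0), (0.2, 0.9), (1.8, 0.1), (2.0, 0.0)]
y = [0.0, 0.0, 1.0, 1.0]
print(random_forest_predict(X, y, (0.1, 0.95)))  # should be well below 0.5
```

The two tuning knobs the slide asks about are visible here: `n_trees` (more trees reduce the variance of the averaged prediction) and how random each tree is (more randomness makes individual trees weaker but less correlated).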

Clustering: Definitions, Examples (7 / 8)
- An unsupervised method: we are not given labels, but want to learn something about the data, specifically the classes, or groups, that the data falls into.
- Classes are determined by similarity between data points within a class and dissimilarity to other classes.
- Examples: types of genes, variants of a disease, topics on Wikipedia, friends on Facebook, etc.

Clustering: k-means Clustering (8 / 8)
k-means
- How does it work?
- k-means++: what problem does this address?
- The label-switching problem.
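As a reference point for the questions above, here is a minimal sketch of the standard k-means (Lloyd's) algorithm; the function name `kmeans` and the toy data are my own, and the naive "first k points" initialization is only for determinism in this example.

```python
def kmeans(points, k, iters=20):
    """Lloyd's algorithm: repeat (1) assign each point to its nearest
    centroid, (2) move each centroid to the mean of its cluster."""
    centroids = list(points[:k])  # naive init; k-means++ instead spreads the
                                  # initial centroids apart to avoid bad local optima
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        for j, cl in enumerate(clusters):
            if cl:  # keep the old centroid if its cluster went empty
                centroids[j] = tuple(sum(dim) / len(cl) for dim in zip(*cl))
    return centroids

pts = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
       (4.0, 4.0), (4.2, 3.9), (3.9, 4.1)]
print(sorted(kmeans(pts, 2)))  # one centroid per visible cluster
```

Note that the cluster indices themselves are arbitrary: a different initialization can return the same clusters with the labels permuted, which is exactly the label-switching problem the slide mentions.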