# Nonparametric Methods Recap

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Nonparametric Methods Recap Aarti Singh Machine Learning / Oct 4, 2010

2 Nonparametric Methods Kernel Density estimate (also Histogram) Weighted frequency Classification - K-NN Classifier Majority vote Kernel Regression Weighted average where 2

3 Kernel Regression as Weighted Least Squares Weighted Least Squares Kernel regression corresponds to locally constant estimator obtained from (locally) weighted least squares i.e. set f(x i ) = b (a constant) 3

4 Kernel Regression as Weighted Least set f(x i ) = b (a constant) Squares constant Notice that 4

5 Local Linear/Polynomial Regression Weighted Least Squares Local Polynomial regression corresponds to locally polynomial estimator obtained from (locally) weighted least squares i.e. set (local polynomial of degree p around X) More in (statistical machine learning) 5

6 Summary Parametric vs Nonparametric approaches Nonparametric models place very mild assumptions on the data distribution and provide good models for complex data Parametric models rely on very strong (simplistic) distributional assumptions Nonparametric models (not histograms) requires storing and computing with the entire data set. Parametric models, once fitted, are much more efficient in terms of storage and computation. 6

7 Summary Instance based/non-parametric approaches Four things make a memory based learner: 1. A distance metric, dist(x,x i ) Euclidean (and many more) 2. How many nearby neighbors/radius to look at? k, D/h 3. A weighting function (optional) W based on kernel K 4. How to fit with the local points? Average, Majority vote, Weighted average, Poly fit 7

8 What you should know Histograms, Kernel density estimation Effect of bin width/ kernel bandwidth Bias-variance tradeoff K-NN classifier Nonlinear decision boundaries Kernel (local) regression Interpretation as weighted least squares Local constant/linear/polynomial regression 8

9 Practical Issues in Machine Learning Overfitting and Model selection Aarti Singh Machine Learning / Oct 4, 2010

10 True vs. Empirical Risk True Risk: Target performance measure Classification Probability of misclassification Regression Mean Squared Error performance on a random test point (X,Y) Empirical Risk: Performance on training data Classification Proportion of misclassified examples Regression Average Squared Error

11 Overfitting Is the following predictor a good one? What is its empirical risk? (performance on training data) zero! What about true risk? > zero Will predict very poorly on new random test point: Large generalization error!

12 Weight Weight Overfitting If we allow very complicated predictors, we could overfit the training data. Examples: Classification (0-NN classifier) Football player? No Yes Height Height

13 Overfitting If we allow very complicated predictors, we could overfit the training data. Examples: Regression (Polynomial of order k degree up to k-1) 1.5 k=1 k= k=3 1.2 k=

14 Effect of Model Complexity If we allow very complicated predictors, we could overfit the training data. fixed # training data Empirical risk is no longer a good indicator of true risk

15 Behavior of True Risk Want to be as good as optimal predictor Excess Risk finite sample size + noise Due to randomness of training data Due to restriction of model class Estimation error Excess risk Approx. error

16 Behavior of True Risk

17 Bias Variance Tradeoff Regression:... Notice: Optimal predictor does not have zero error variance bias^2 Noise var Excess Risk = = variance + bias^2 Random component est err approx err

18 Bias Variance Tradeoff: Derivation Regression: Notice: Optimal predictor does not have zero error 0

19 Bias Variance Tradeoff: Derivation Regression: Notice: Optimal predictor does not have zero error Now, lets look at the second term: variance how much does the predictor vary about its mean for different training datasets Note: this term doesn t depend on D n

20 Bias Variance Tradeoff: Derivation 0 since noise is independent and zero mean bias^2 how much does the mean of the predictor differ from the optimal predictor noise variance

21 Bias Variance Tradeoff 3 Independent training datasets Large bias, Small variance poor approximation but robust/stable Small bias, Large variance good approximation but instable

22 Examples of Model Spaces Model Spaces with increasing complexity: Nearest-Neighbor classifiers with varying neighborhood sizes k = 1,2,3, Small neighborhood => Higher complexity Decision Trees with depth k or with k leaves Higher depth/ More # leaves => Higher complexity Regression with polynomials of order k = 0, 1, 2, Higher degree => Higher complexity Kernel Regression with bandwidth h Small bandwidth => Higher complexity How can we select the right complexity model?

23 Model Selection Setup: Model Classes of increasing complexity We can select the right complexity model in a data-driven/adaptive way: Cross-validation Structural Risk Minimization Complexity Regularization Information Criteria - AIC, BIC, Minimum Description Length (MDL)

24 Hold-out method We would like to pick the model that has smallest generalization error. Can judge generalization error by using an independent sample of data. Hold out procedure: n data points available 1) Split into two sets: Training dataset Validation dataset NOT test Data!! 2) Use D T for training a predictor from each model class: Evaluated on training dataset D T

25 Hold-out method 3) Use Dv to select the model class which has smallest empirical error on D v 4) Hold-out predictor Evaluated on validation dataset D V Intuition: Small error on one set of data will not imply small error on a randomly sub-sampled second set of data Ensures method is stable

26 Hold-out method Drawbacks: May not have enough data to afford setting one subset aside for getting a sense of generalization abilities Validation error may be misleading (bad estimate of generalization error) if we get an unfortunate split Limitations of hold-out can be overcome by a family of random subsampling methods at the expense of more computation.

27 Cross-validation K-fold cross-validation Create K-fold partition of the dataset. Form K hold-out predictors, each time using one partition as validation and rest K-1 as training datasets. Final predictor is average/majority vote over the K hold-out estimates. training validation Run 1 Run 2 Run K

28 Cross-validation Leave-one-out (LOO) cross-validation Special case of K-fold with K=n partitions Equivalently, train on n-1 samples and validate on only one sample per run for n runs training validation Run 1 Run 2 Run K

29 Cross-validation Random subsampling Randomly subsample a fixed fraction αn (0< α <1) of the dataset for validation. Form hold-out predictor with remaining data as training data. Repeat K times Final predictor is average/majority vote over the K hold-out estimates. training validation Run 1 Run 2 Run K

30 Estimating generalization error Generalization error Hold-out 1-fold: Error estimate = K-fold/LOO/random Error estimate = sub-sampling: We want to estimate the error of a predictor based on n data points. If K is large (close to n), bias of error estimate is small since each training set has close to n data points. However, variance of error estimate is high since each validation set has fewer data points and might deviate a lot from the mean. Run 1 Run 2 Run K training validation

31 Practical Issues in Cross-validation How to decide the values for K and α? Large K + The bias of the error estimate will be small - The variance of the error estimate will be large (few validation pts) - The computational time will be very large as well (many experiments) Small K + The # experiments and, therefore, computation time are reduced + The variance of the error estimate will be small (many validation pts) - The bias of the error estimate will be large Common choice: K = 10, a = 0.1

32 Structural Risk Minimization Penalize models using bound on deviation of true and empirical risks. With high probability, Bound on deviation from true risk Concentration bounds (later) High probability Upper bound on true risk C(f) - large for complex models

33 Structural Risk Minimization Deviation bounds are typically pretty loose, for small sample sizes. In practice, Choose by cross-validation! Problem: Identify flood plain from noisy satellite images Noiseless image Noisy image True Flood plain (elevation level > x)

34 Structural Risk Minimization Deviation bounds are typically pretty loose, for small sample sizes. In practice, Choose by cross-validation! Problem: Identify flood plain from noisy satellite images True Flood plain (elevation level > x) Zero penalty CV penalty Theoretical penalty

35 Occam s Razor William of Ockham ( ) Principle of Parsimony: One should not increase, beyond what is necessary, the number of entities required to explain anything. Alternatively, seek the simplest explanation. Penalize complex models based on Prior information (bias) Information Criterion (MDL, AIC, BIC)

36 Importance of Domain knowledge Oil Spill Contamination Distribution of photon arrivals Compton Gamma-Ray Observatory Burst and Transient Source Experiment (BATSE)

37 Complexity Regularization Penalize complex models using prior knowledge. Bayesian viewpoint: prior probability of f, p(f) Cost of model (log prior) cost is small if f is highly probable, cost is large if f is improbable ERM (empirical risk minimization) over a restricted class F uniform prior on f ϵ F, zero probability for other predictors

38 Complexity Regularization Penalize complex models using prior knowledge. Cost of model (log prior) Examples: MAP estimators Regularized Linear Regression - Ridge Regression, Lasso How to choose tuning parameter λ? Cross-validation Penalize models based on some norm of regression coefficients

39 Information Criteria AIC, BIC Penalize complex models based on their information content. # bits needed to describe f (description length) AIC (Akiake IC) C(f) = # parameters Allows # parameters to be infinite as # training data n become large BIC (Bayesian IC) C(f) = # parameters * log n Penalizes complex models more heavily limits complexity of models as # training data n become large

40 Information Criteria - MDL Penalize complex models based on their information content. MDL (Minimum Description Length) # bits needed to describe f (description length) Example: Binary Decision trees k leaves => 2k 1 nodes 2k 1 bits to encode tree structure + k bits to encode label of each leaf (0/1) 5 leaves => 9 bits to encode structure

41 Summary True and Empirical Risk Over-fitting Approx err vs Estimation err, Bias vs Variance tradeoff Model Selection, Estimating Generalization Error Hold-out, K-fold cross-validation Structural Risk Minimization Complexity Regularization Information Criteria AIC, BIC, MDL

### What is machine learning?

Machine learning, pattern recognition and statistical data modelling Lecture 12. The last lecture Coryn Bailer-Jones 1 What is machine learning? Data description and interpretation finding simpler relationship

### Model Complexity and Generalization

HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Generalization Learning Curves Underfit Generalization

### CSE446: Linear Regression. Spring 2017

CSE446: Linear Regression Spring 2017 Ali Farhadi Slides adapted from Carlos Guestrin and Luke Zettlemoyer Prediction of continuous variables Billionaire says: Wait, that s not what I meant! You say: Chill

### Distribution-free Predictive Approaches

Distribution-free Predictive Approaches The methods discussed in the previous sections are essentially model-based. Model-free approaches such as tree-based classification also exist and are popular for

### Model selection and validation 1: Cross-validation

Model selection and validation 1: Cross-validation Ryan Tibshirani Data Mining: 36-462/36-662 March 26 2013 Optional reading: ISL 2.2, 5.1, ESL 7.4, 7.10 1 Reminder: modern regression techniques Over the

### BIOINF 585: Machine Learning for Systems Biology & Clinical Informatics

BIOINF 585: Machine Learning for Systems Biology & Clinical Informatics Lecture 12: Ensemble Learning I Jie Wang Department of Computational Medicine & Bioinformatics University of Michigan 1 Outline Bias

### EE 511 Linear Regression

EE 511 Linear Regression Instructor: Hanna Hajishirzi hannaneh@washington.edu Slides adapted from Ali Farhadi, Mari Ostendorf, Pedro Domingos, Carlos Guestrin, and Luke Zettelmoyer, Announcements Hw1 due

### Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

### Nonparametric Approaches to Regression

Nonparametric Approaches to Regression In traditional nonparametric regression, we assume very little about the functional form of the mean response function. In particular, we assume the model where m(xi)

### CPSC 340: Machine Learning and Data Mining. More Regularization Fall 2017

CPSC 340: Machine Learning and Data Mining More Regularization Fall 2017 Assignment 3: Admin Out soon, due Friday of next week. Midterm: You can view your exam during instructor office hours or after class

### Machine Learning Techniques for Data Mining

Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART V Credibility: Evaluating what s been learned 10/25/2000 2 Evaluation: the key to success How

### MIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018

MIT 801 [Presented by Anna Bosman] 16 February 2018 Machine Learning What is machine learning? Artificial Intelligence? Yes as we know it. What is intelligence? The ability to acquire and apply knowledge

### The exam is closed book, closed notes except your one-page (two-sided) cheat sheet.

CS 189 Spring 2015 Introduction to Machine Learning Final You have 2 hours 50 minutes for the exam. The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. No calculators or

### Multicollinearity and Validation CIVL 7012/8012

Multicollinearity and Validation CIVL 7012/8012 2 In Today s Class Recap Multicollinearity Model Validation MULTICOLLINEARITY 1. Perfect Multicollinearity 2. Consequences of Perfect Multicollinearity 3.

### Machine Learning. Topic 4: Linear Regression Models

Machine Learning Topic 4: Linear Regression Models (contains ideas and a few images from wikipedia and books by Alpaydin, Duda/Hart/ Stork, and Bishop. Updated Fall 205) Regression Learning Task There

### Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1

Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that

### 1) Give decision trees to represent the following Boolean functions:

1) Give decision trees to represent the following Boolean functions: 1) A B 2) A [B C] 3) A XOR B 4) [A B] [C Dl Answer: 1) A B 2) A [B C] 1 3) A XOR B = (A B) ( A B) 4) [A B] [C D] 2 2) Consider the following

### CART. Classification and Regression Trees. Rebecka Jörnsten. Mathematical Sciences University of Gothenburg and Chalmers University of Technology

CART Classification and Regression Trees Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology CART CART stands for Classification And Regression Trees.

CS249: ADVANCED DATA MINING Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu April 24, 2017 Homework 2 out Announcements Due May 3 rd (11:59pm) Course project proposal

### model order p weights The solution to this optimization problem is obtained by solving the linear system

CS 189 Introduction to Machine Learning Fall 2017 Note 3 1 Regression and hyperparameters Recall the supervised regression setting in which we attempt to learn a mapping f : R d R from labeled examples

### CS4491/CS 7265 BIG DATA ANALYTICS

CS4491/CS 7265 BIG DATA ANALYTICS EVALUATION * Some contents are adapted from Dr. Hung Huang and Dr. Chengkai Li at UT Arlington Dr. Mingon Kang Computer Science, Kennesaw State University Evaluation for

### Nearest neighbors classifiers

Nearest neighbors classifiers James McInerney Adapted from slides by Daniel Hsu Sept 11, 2017 1 / 25 Housekeeping We received 167 HW0 submissions on Gradescope before midnight Sept 10th. From a random

### Lecture 19: Decision trees

Lecture 19: Decision trees Reading: Section 8.1 STATS 202: Data mining and analysis November 10, 2017 1 / 17 Decision trees, 10,000 foot view R2 R5 t4 1. Find a partition of the space of predictors. X2

### Classification Algorithms in Data Mining

August 9th, 2016 Suhas Mallesh Yash Thakkar Ashok Choudhary CIS660 Data Mining and Big Data Processing -Dr. Sunnie S. Chung Classification Algorithms in Data Mining Deciding on the classification algorithms

### Model Selection Introduction to Machine Learning. Matt Gormley Lecture 4 January 29, 2018

10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Model Selection Matt Gormley Lecture 4 January 29, 2018 1 Q&A Q: How do we deal

### Exploring Econometric Model Selection Using Sensitivity Analysis

Exploring Econometric Model Selection Using Sensitivity Analysis William Becker Paolo Paruolo Andrea Saltelli Nice, 2 nd July 2013 Outline What is the problem we are addressing? Past approaches Hoover

### Statistical Consulting Topics Using cross-validation for model selection. Cross-validation is a technique that can be used for model evaluation.

Statistical Consulting Topics Using cross-validation for model selection Cross-validation is a technique that can be used for model evaluation. We often fit a model to a full data set and then perform

### Machine Learning: Think Big and Parallel

Day 1 Inderjit S. Dhillon Dept of Computer Science UT Austin CS395T: Topics in Multicore Programming Oct 1, 2013 Outline Scikit-learn: Machine Learning in Python Supervised Learning day1 Regression: Least

### Introduction to Classification & Regression Trees

Introduction to Classification & Regression Trees ISLR Chapter 8 vember 8, 2017 Classification and Regression Trees Carseat data from ISLR package Classification and Regression Trees Carseat data from

### Machine Learning (CS 567)

Machine Learning (CS 567) Time: T-Th 5:00pm - 6:20pm Location: GFS 118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol Han (cheolhan@usc.edu)

### Nearest Neighbor Classification

Nearest Neighbor Classification Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 11, 2017 1 / 48 Outline 1 Administration 2 First learning algorithm: Nearest

### MLCC 2018 Local Methods and Bias Variance Trade-Off. Lorenzo Rosasco UNIGE-MIT-IIT

MLCC 2018 Local Methods and Bias Variance Trade-Off Lorenzo Rosasco UNIGE-MIT-IIT About this class 1. Introduce a basic class of learning methods, namely local methods. 2. Discuss the fundamental concept

### Uninformed Search Methods. Informed Search Methods. Midterm Exam 3/13/18. Thursday, March 15, 7:30 9:30 p.m. room 125 Ag Hall

Midterm Exam Thursday, March 15, 7:30 9:30 p.m. room 125 Ag Hall Covers topics through Decision Trees and Random Forests (does not include constraint satisfaction) Closed book 8.5 x 11 sheet with notes

### These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

Machine Learning Algorithms (IFT6266 A7) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

### Lecture 9: Support Vector Machines

Lecture 9: Support Vector Machines William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 8 What we ll learn in this lecture Support Vector Machines (SVMs) a highly robust and

### Chapter 5 Efficient Memory Information Retrieval

Chapter 5 Efficient Memory Information Retrieval In this chapter, we will talk about two topics: (1) What is a kd-tree? (2) How can we use kdtrees to speed up the memory-based learning algorithms? Since

### Introduction to Machine Learning. Xiaojin Zhu

Introduction to Machine Learning Xiaojin Zhu jerryzhu@cs.wisc.edu Read Chapter 1 of this book: Xiaojin Zhu and Andrew B. Goldberg. Introduction to Semi- Supervised Learning. http://www.morganclaypool.com/doi/abs/10.2200/s00196ed1v01y200906aim006

### Nominal Data. May not have a numerical representation Distance measures might not make sense PR, ANN, & ML

Decision Trees Nominal Data So far we consider patterns to be represented by feature vectors of real or integer values Easy to come up with a distance (similarity) measure by using a variety of mathematical

### Classification Key Concepts

http://poloclub.gatech.edu/cse6242 CSE6242 / CX4242: Data & Visual Analytics Classification Key Concepts Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech 1 How will

### CSE 5526: Introduction to Neural Networks Radial Basis Function (RBF) Networks

CSE 5526: Introduction to Neural Networks Radial Basis Function (RBF) Networks Part IV 1 Function approximation MLP is both a pattern classifier and a function approximator As a function approximator,

### CSCI567 Machine Learning (Fall 2014)

CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 9, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 9, 2014 1 / 47

### CS570: Introduction to Data Mining

CS570: Introduction to Data Mining Classification Advanced Reading: Chapter 8 & 9 Han, Chapters 4 & 5 Tan Anca Doloc-Mihu, Ph.D. Slides courtesy of Li Xiong, Ph.D., 2011 Han, Kamber & Pei. Data Mining.

### 9/6/14. Our first learning algorithm. Comp 135 Introduction to Machine Learning and Data Mining. knn Algorithm. knn Algorithm (simple form)

Comp 135 Introduction to Machine Learning and Data Mining Our first learning algorithm How would you classify the next example? Fall 2014 Professor: Roni Khardon Computer Science Tufts University o o o

### What is Learning? CS 343: Artificial Intelligence Machine Learning. Raymond J. Mooney. Problem Solving / Planning / Control.

What is Learning? CS 343: Artificial Intelligence Machine Learning Herbert Simon: Learning is any process by which a system improves performance from experience. What is the task? Classification Problem

### A Systematic Overview of Data Mining Algorithms

A Systematic Overview of Data Mining Algorithms 1 Data Mining Algorithm A well-defined procedure that takes data as input and produces output as models or patterns well-defined: precisely encoded as a

### Subsemble: A Flexible Subset Ensemble Prediction Method. Stephanie Karen Sapp. A dissertation submitted in partial satisfaction of the

Subsemble: A Flexible Subset Ensemble Prediction Method by Stephanie Karen Sapp A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Statistics

### Lecture #11: The Perceptron

Lecture #11: The Perceptron Mat Kallada STAT2450 - Introduction to Data Mining Outline for Today Welcome back! Assignment 3 The Perceptron Learning Method Perceptron Learning Rule Assignment 3 Will be

### STA121: Applied Regression Analysis

STA121: Applied Regression Analysis Variable Selection - Chapters 8 in Dielman Artin Department of Statistical Science October 23, 2009 Outline Introduction 1 Introduction 2 3 4 Variable Selection Model

### Data Mining. Practical Machine Learning Tools and Techniques. Slides for Chapter 5 of Data Mining by I. H. Witten, E. Frank and M. A.

Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 5 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Credibility: Evaluating what s been learned Issues: training, testing,

### Classification: Linear Discriminant Functions

Classification: Linear Discriminant Functions CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Discriminant functions Linear Discriminant functions

### Probabilistic Classifiers DWML, /27

Probabilistic Classifiers DWML, 2007 1/27 Probabilistic Classifiers Conditional class probabilities Id. Savings Assets Income Credit risk 1 Medium High 75 Good 2 Low Low 50 Bad 3 High Medium 25 Bad 4 Medium

### Density estimation. In density estimation problems, we are given a random from an unknown density. Our objective is to estimate

Density estimation In density estimation problems, we are given a random sample from an unknown density Our objective is to estimate? Applications Classification If we estimate the density for each class,

### Supervised vs unsupervised clustering

Classification Supervised vs unsupervised clustering Cluster analysis: Classes are not known a- priori. Classification: Classes are defined a-priori Sometimes called supervised clustering Extract useful

1 Trade-offs in Explanatory 21 st of February 2012 Model Learning Data Analysis Project Madalina Fiterau DAP Committee Artur Dubrawski Jeff Schneider Geoff Gordon 2 Outline Motivation: need for interpretable

### CISC 4631 Data Mining

CISC 4631 Data Mining Lecture 05: Overfitting Evaluation: accuracy, precision, recall, ROC Theses slides are based on the slides by Tan, Steinbach and Kumar (textbook authors) Eamonn Koegh (UC Riverside)

### Perceptron: This is convolution!

Perceptron: This is convolution! v v v Shared weights v Filter = local perceptron. Also called kernel. By pooling responses at different locations, we gain robustness to the exact spatial location of image

### 7. Nearest neighbors. Learning objectives. Centre for Computational Biology, Mines ParisTech

Foundations of Machine Learning CentraleSupélec Paris Fall 2016 7. Nearest neighbors Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe-agathe.azencott@mines-paristech.fr Learning

### SVM in Analysis of Cross-Sectional Epidemiological Data Dmitriy Fradkin. April 4, 2005 Dmitriy Fradkin, Rutgers University Page 1

SVM in Analysis of Cross-Sectional Epidemiological Data Dmitriy Fradkin April 4, 2005 Dmitriy Fradkin, Rutgers University Page 1 Overview The goals of analyzing cross-sectional data Standard methods used

### Class 6 Large-Scale Image Classification

Class 6 Large-Scale Image Classification Liangliang Cao, March 7, 2013 EECS 6890 Topics in Information Processing Spring 2013, Columbia University http://rogerioferis.com/visualrecognitionandsearch Visual

### Nearest Neighbor Methods

Nearest Neighbor Methods Nicholas Ruozzi University of Texas at Dallas Based on the slides of Vibhav Gogate and David Sontag Nearest Neighbor Methods Learning Store all training examples Classifying a

### INTRODUCTION TO MACHINE LEARNING. Measuring model performance or error

INTRODUCTION TO MACHINE LEARNING Measuring model performance or error Is our model any good? Context of task Accuracy Computation time Interpretability 3 types of tasks Classification Regression Clustering

### Building Classifiers using Bayesian Networks

Building Classifiers using Bayesian Networks Nir Friedman and Moises Goldszmidt 1997 Presented by Brian Collins and Lukas Seitlinger Paper Summary The Naive Bayes classifier has reasonable performance

### Classification Key Concepts

http://poloclub.gatech.edu/cse6242 CSE6242 / CX4242: Data & Visual Analytics Classification Key Concepts Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech Parishit

### Chapter 4: Non-Parametric Techniques

Chapter 4: Non-Parametric Techniques Introduction Density Estimation Parzen Windows Kn-Nearest Neighbor Density Estimation K-Nearest Neighbor (KNN) Decision Rule Supervised Learning How to fit a density

### Clustering. Mihaela van der Schaar. January 27, Department of Engineering Science University of Oxford

Department of Engineering Science University of Oxford January 27, 2017 Many datasets consist of multiple heterogeneous subsets. Cluster analysis: Given an unlabelled data, want algorithms that automatically

### Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates?

Model Evaluation Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates? Methods for Model Comparison How to

### I211: Information infrastructure II

Data Mining: Classifier Evaluation I211: Information infrastructure II 3-nearest neighbor labeled data find class labels for the 4 data points 1 0 0 6 0 0 0 5 17 1.7 1 1 4 1 7.1 1 1 1 0.4 1 2 1 3.0 0 0.1

### Cross-validation for detecting and preventing overfitting

Cross-validation for detecting and preventing overfitting Note to other teachers and users of these slides. Andrew would be delighted if ou found this source material useful in giving our own lectures.

### Support Vector Machines + Classification for IR

Support Vector Machines + Classification for IR Pierre Lison University of Oslo, Dep. of Informatics INF3800: Søketeknologi April 30, 2014 Outline of the lecture Recap of last week Support Vector Machines

### A Systematic Overview of Data Mining Algorithms. Sargur Srihari University at Buffalo The State University of New York

A Systematic Overview of Data Mining Algorithms Sargur Srihari University at Buffalo The State University of New York 1 Topics Data Mining Algorithm Definition Example of CART Classification Iris, Wine

### Classification and Regression Trees

Classification and Regression Trees Matthew S. Shotwell, Ph.D. Department of Biostatistics Vanderbilt University School of Medicine Nashville, TN, USA March 16, 2018 Introduction trees partition feature

### Evaluating Classifiers

Evaluating Classifiers Charles Elkan elkan@cs.ucsd.edu January 18, 2011 In a real-world application of supervised learning, we have a training set of examples with labels, and a test set of examples with

### Pattern Recognition for Neuroimaging Data

Pattern Recognition for Neuroimaging Data Edinburgh, SPM course April 2013 C. Phillips, Cyclotron Research Centre, ULg, Belgium http://www.cyclotron.ulg.ac.be Overview Introduction Univariate & multivariate

### An introduction to multi-armed bandits

An introduction to multi-armed bandits Henry WJ Reeve (Manchester) (henry.reeve@manchester.ac.uk) A joint work with Joe Mellor (Edinburgh) & Professor Gavin Brown (Manchester) Plan 1. An introduction to

### Evaluating Classifiers

Evaluating Classifiers Reading for this topic: T. Fawcett, An introduction to ROC analysis, Sections 1-4, 7 (linked from class website) Evaluating Classifiers What we want: Classifier that best predicts

### CS 559: Machine Learning Fundamentals and Applications 10 th Set of Notes

1 CS 559: Machine Learning Fundamentals and Applications 10 th Set of Notes Instructor: Philippos Mordohai Webpage: www.cs.stevens.edu/~mordohai E-mail: Philippos.Mordohai@stevens.edu Office: Lieb 215

### 3 Feature Selection & Feature Extraction

3 Feature Selection & Feature Extraction Overview: 3.1 Introduction 3.2 Feature Extraction 3.3 Feature Selection 3.3.1 Max-Dependency, Max-Relevance, Min-Redundancy 3.3.2 Relevance Filter 3.3.3 Redundancy

### CS 231A CA Session: Problem Set 4 Review. Kevin Chen May 13, 2016

CS 231A CA Session: Problem Set 4 Review Kevin Chen May 13, 2016 PS4 Outline Problem 1: Viewpoint estimation Problem 2: Segmentation Meanshift segmentation Normalized cut Problem 1: Viewpoint Estimation

### CHAPTER 4 STOCK PRICE PREDICTION USING MODIFIED K-NEAREST NEIGHBOR (MKNN) ALGORITHM

CHAPTER 4 STOCK PRICE PREDICTION USING MODIFIED K-NEAREST NEIGHBOR (MKNN) ALGORITHM 4.1 Introduction Nowadays money investment in stock market gains major attention because of its dynamic nature. So the

### Overview and Practical Application of Machine Learning in Pricing

Overview and Practical Application of Machine Learning in Pricing 2017 CAS Spring Meeting May 23, 2017 Duncan Anderson and Claudine Modlin (Willis Towers Watson) Mark Richards (Allstate Insurance Company)

### Model combination. Resampling techniques p.1/34

Model combination The winner-takes-all approach is intuitively the approach which should work the best. However recent results in machine learning show that the performance of the final model can be improved

### Topic 7 Machine learning

CSE 103: Probability and statistics Winter 2010 Topic 7 Machine learning 7.1 Nearest neighbor classification 7.1.1 Digit recognition Countless pieces of mail pass through the postal service daily. A key

### Semi-supervised learning and active learning

Semi-supervised learning and active learning Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Combining classifiers Ensemble learning: a machine learning paradigm where multiple learners

### CS 8520: Artificial Intelligence. Machine Learning 2. Paula Matuszek Fall, CSC 8520 Fall Paula Matuszek

CS 8520: Artificial Intelligence Machine Learning 2 Paula Matuszek Fall, 2015!1 Regression Classifiers We said earlier that the task of a supervised learning system can be viewed as learning a function

### Semi-supervised Learning

Semi-supervised Learning Piyush Rai CS5350/6350: Machine Learning November 8, 2011 Semi-supervised Learning Supervised Learning models require labeled data Learning a reliable model usually requires plenty

### 2. On classification and related tasks

2. On classification and related tasks In this part of the course we take a concise bird s-eye view of different central tasks and concepts involved in machine learning and classification particularly.

### Lecture 7: Decision Trees

Lecture 7: Decision Trees Instructor: Outline 1 Geometric Perspective of Classification 2 Decision Trees Geometric Perspective of Classification Perspective of Classification Algorithmic Geometric Probabilistic...

### Using the DATAMINE Program

6 Using the DATAMINE Program 304 Using the DATAMINE Program This chapter serves as a user s manual for the DATAMINE program, which demonstrates the algorithms presented in this book. Each menu selection

### Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied

### All lecture slides will be available at CSC2515_Winter15.html

CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 9: Support Vector Machines All lecture slides will be available at http://www.cs.toronto.edu/~urtasun/courses/csc2515/ CSC2515_Winter15.html Many

### Splines. Patrick Breheny. November 20. Introduction Regression splines (parametric) Smoothing splines (nonparametric)

Splines Patrick Breheny November 20 Patrick Breheny STA 621: Nonparametric Statistics 1/46 Introduction Introduction Problems with polynomial bases We are discussing ways to estimate the regression function

### 3 Nonlinear Regression

CSC 4 / CSC D / CSC C 3 Sometimes linear models are not sufficient to capture the real-world phenomena, and thus nonlinear models are necessary. In regression, all such models will have the same basic

### Econometric Tools 1: Non-Parametric Methods

University of California, Santa Cruz Department of Economics ECON 294A (Fall 2014) - Stata Lab Instructor: Manuel Barron 1 Econometric Tools 1: Non-Parametric Methods 1 Introduction This lecture introduces

### Data Mining and Knowledge Discovery Practice notes: Numeric Prediction, Association Rules

Keywords Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 06/0/ Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms

### K-Nearest Neighbour Classifier. Izabela Moise, Evangelos Pournaras, Dirk Helbing

K-Nearest Neighbour Classifier Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 Reminder Supervised data mining Classification Decision Trees Izabela

### Data Mining Classification: Bayesian Decision Theory

Data Mining Classification: Bayesian Decision Theory Lecture Notes for Chapter 2 R. O. Duda, P. E. Hart, and D. G. Stork, Pattern classification, 2nd ed. New York: Wiley, 2001. Lecture Notes for Chapter

### NONPARAMETRIC REGRESSION TECHNIQUES

NONPARAMETRIC REGRESSION TECHNIQUES C&PE 940, 28 November 2005 Geoff Bohling Assistant Scientist Kansas Geological Survey geoff@kgs.ku.edu 864-2093 Overheads and other resources available at: http://people.ku.edu/~gbohling/cpe940

### Introduction to object recognition. Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others

Introduction to object recognition Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others Overview Basic recognition tasks A statistical learning approach Traditional or shallow recognition