Machine Learning. Chao Lan
1 Machine Learning Chao Lan
2 Machine Learning Prediction Models
Regression Model
- linear regression (least squares, ridge regression, Lasso)
Classification Model
- naive Bayes, logistic regression, Gaussian discriminant analysis, k-nearest neighbor, linear support vector machine (LSVM), decision tree, neural network, (Bayesian network)
- multi-class classification with binary classifiers
Advanced Model
- kernel, ensemble
Online Model
- SGD learner, HMM
3 Machine Learning Prediction Models
Example: an email x is represented as a binary feature vector over a vocabulary (google, lottery, cat, transport, ..., book), e.g. x = (1, 1, 0, 1, 0, ..., 0); the model maps x to a label y: spam or ham.
4 Machine Learning Prediction Models
Regression Model
- linear regression (least squares, ridge regression, Lasso)
Classification Model
- naive Bayes, logistic regression, Gaussian discriminant analysis, Bayesian network, k-nearest neighbor, linear support vector machine (LSVM), decision tree, neural network
Advanced Model
- kernel, ensemble
Online Model
- SGD learner, HMM
5 Linear Regression Model
Linear regression model has the following form: y = w_0 + w_1 x_1 + w_2 x_2 + ... + w_p x_p
- the w's are regression coefficients
- w_0 is the bias term
- y is the label (e.g., how likely x is a spam)
The feature vector is as before, e.g. x = (1, 1, 0, 1, 0, ..., 0) over (google, lottery, cat, transport, ..., book).
6 Geometric Interpretation of Linear Regression
With two features, say google and lottery, the model y = w_0 + w_1 x_1 + w_2 x_2 defines a plane over the (x_1, x_2) feature space; each email (e.g., x = (1, 0)) is a point whose predicted label y is its height. [Figure: regression plane over the x_1 and x_2 axes.]
7 Geometric Interpretation of the Bias Term
w_0 is the bias term. It allows the model to shift along the y axis (to better fit the data). [Figure: the same fitted line plotted with and without the vertical shift.]
8 Three Learners of the Linear Regression Model
Three ways to learn the w's in the model from data:
1. least squares
2. ridge regression
3. Lasso
9 Three Learners of the Linear Regression Model
Three ways to learn the w's in the model from data:
1. least squares: find the w that minimizes the total squared error
2. ridge regression
3. Lasso
10 Three Learners of the Linear Regression Model
Three ways to learn the w's in the model from data:
1. least squares: find the w that minimizes the total squared error
2. ridge regression: find, within a restricted range, the w that minimizes the total squared error
3. Lasso
How would λ affect shrinkage?
11 Three Learners of the Linear Regression Model
Three ways to learn the w's in the model from data:
1. least squares: find the w that minimizes the total squared error
2. ridge regression: find, within a restricted range, the w that minimizes the total squared error
3. Lasso: find the w that minimizes the total squared error while driving some of its elements to exactly zero
How would λ affect elimination?
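Written out, the three objectives are as follows (a standard reconstruction; the slide's formula images are not in the transcription):

$$\text{least squares:}\quad \min_{w}\ \sum_{i=1}^{n} \bigl(y_i - w^{\top}x_i\bigr)^2$$

$$\text{ridge:}\quad \min_{w}\ \sum_{i=1}^{n} \bigl(y_i - w^{\top}x_i\bigr)^2 + \lambda\,\lVert w \rVert_2^2$$

$$\text{Lasso:}\quad \min_{w}\ \sum_{i=1}^{n} \bigl(y_i - w^{\top}x_i\bigr)^2 + \lambda\,\lVert w \rVert_1$$

A larger λ shrinks the coefficients more (ridge) or eliminates more of them (Lasso).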
12 Coefficients Obtained by the Three Learners
13 Lasso: Automatically Eliminate Features by L1-Norm
14 Bias-Variance Tradeoff
A simple model often has small variance but large bias (in estimation). A complex model often has small bias but large variance (a.k.a. overfitting).
15 Recap: a Simple Model Has a Shrunken Range of w
A model with a shrunken range of unknown parameters is usually simpler. [Figure: M=9 polynomial fits with λ ≈ 10^-8 vs. λ = 1.]
16 Recap: Larger λ Learns a Simpler Model
[Figure: M=9 polynomial fits for λ = 0, λ = 10^-8, and λ = 1; larger λ gives a smoother, simpler fit.]
17 Bias-Variance Tradeoff
A simple model often has small variance but large bias (in estimation). A complex model often has small bias but large variance (a.k.a. overfitting). [Figure: fit with λ ≈ 13 (simplest).]
18 Bias-Variance Tradeoff
The same tradeoff with λ ≈ 0.7 (simple). [Figure.]
19 Bias-Variance Tradeoff
The same tradeoff with λ ≈ 0.09 (complex). [Figure.]
20 Bias-Variance Tradeoff
We also have theoretical evidence, in the form of the decomposition below.
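The standard bias-variance decomposition of the expected squared error, presumably what the slide displayed:

$$\mathbb{E}\bigl[(y - \hat{f}(x))^2\bigr] = \underbrace{\bigl(\mathbb{E}[\hat{f}(x)] - f(x)\bigr)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}\bigl[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\bigr]}_{\text{variance}} + \underbrace{\sigma^2}_{\text{noise}}$$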
21 Regression Methods in Python
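The slide's code is not in the transcription; a minimal sketch of the three learners, assuming scikit-learn and synthetic data invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                  # 100 examples, 5 features
true_w = np.array([2.0, 0.0, -1.0, 0.0, 0.5])  # two coefficients are truly zero
y = X @ true_w + rng.normal(scale=0.1, size=100)

for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1)):
    model.fit(X, y)
    print(type(model).__name__, np.round(model.coef_, 2))
# Ridge shrinks all coefficients; Lasso tends to set the truly-zero ones to exactly 0.
```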
22 Machine Learning Prediction Models
Regression Model
- linear regression (least squares, ridge regression, Lasso)
Classification Model
- naive Bayes, logistic regression, Gaussian discriminant analysis, k-nearest neighbor, linear support vector machine (LSVM), decision tree, neural network, (Bayesian network)
- multi-class classification with binary classifiers
Advanced Model
- kernel, ensemble
Online Model
- SGD learner, HMM
23 A Probabilistic View of Classifiers
A classifier can be a table of p(y|x): given an example x, what is the distribution of the label y?
- p(y=spam | x=blue) = 0.75, p(y=ham | x=blue) = 0.25
- p(y=spam | x=red) = 0.4, p(y=ham | x=red) = 0.6
24 1. Naive Bayes
Estimate p(y|x) using (1) Bayes' rule and (2) the assumption that the features are conditionally independent given the class.
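In symbols, this is the standard naive Bayes factorization (a reconstruction; the slide's formula image is not transcribed):

$$p(y \mid x) = \frac{p(x \mid y)\,p(y)}{p(x)} \propto p(y)\,\prod_{j=1}^{p} p(x_j \mid y)$$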
25 Example of a Bayes Classifier with One Feature
[Figure: six emails, four spam and two ham; two of the six are blue.]
Goal: p(y=spam | x=blue)
- p(x=blue | y=spam) = 1/4
- p(x=blue) = 2/6 = 1/3
- p(y=spam) = 4/6 = 2/3
So p(y=spam | x=blue) = p(x=blue | y=spam) * p(y=spam) / p(x=blue) = (1/4 * 2/3) / (1/3) = 1/2.
26 Example of a Naive Bayes Classifier
p(x = blue & google & lottery | y=spam) = p(x is blue | y=spam) * p(x has word google | y=spam) * p(x has word lottery | y=spam)
Of course, we can also construct conditional probabilities for continuous variables. The feature vector is as before: x = (1, 1, 0, 1, ..., 0) over (google, lottery, cat, ..., book).
27 Naive Bayes in Python
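A minimal sketch, assuming scikit-learn's BernoulliNB and toy spam data invented for illustration:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Each row is a binary bag-of-words vector over
# (google, lottery, cat, transport, book).
X = np.array([[1, 1, 0, 1, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 1, 0, 1],
              [0, 1, 1, 0, 1]])
y = np.array([1, 1, 0, 0])                    # 1 = spam, 0 = ham

clf = BernoulliNB(alpha=1.0)                  # alpha = Laplace smoothing
clf.fit(X, y)
print(clf.predict_proba([[1, 1, 0, 0, 0]]))   # [p(ham|x), p(spam|x)]
```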
28 2. Gaussian Discriminant Analysis
Similar to naive Bayes, but it does not assume the features are conditionally independent. Assume p(x | y=k) is normal for each class k and estimate it from data.
- Quadratic DA (QDA) assumes classes have different distributions.
- Linear DA (LDA) assumes the covariance is shared by all classes.
29 Example of GDA
To model p(x = blue & google & lottery | y=spam) with x = (color, google, lottery), estimate for each class a mean μ = (color.m, google.m, lottery.m) and a covariance matrix

$$\Sigma = \begin{pmatrix} \mathrm{cov}[c,c] & \mathrm{cov}[c,g] & \mathrm{cov}[c,l] \\ \mathrm{cov}[g,c] & \mathrm{cov}[g,g] & \mathrm{cov}[g,l] \\ \mathrm{cov}[l,c] & \mathrm{cov}[l,g] & \mathrm{cov}[l,l] \end{pmatrix}$$

(μ, Σ) can be estimated by MLE. Quadratic DA (QDA) assumes different classes have different (μ, Σ). Linear DA (LDA) assumes Σ is shared by all classes.
30 LDA/QDA in Python
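A minimal sketch, assuming scikit-learn and two synthetic Gaussian classes:

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

rng = np.random.default_rng(0)
# Two Gaussian classes with different means.
X = np.vstack([rng.normal(0, 1, size=(50, 2)),
               rng.normal(2, 1, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

for clf in (LinearDiscriminantAnalysis(), QuadraticDiscriminantAnalysis()):
    clf.fit(X, y)                       # fits (mu, Sigma) per class by MLE
    print(type(clf).__name__, clf.score(X, y))
```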
31 3. Logistic Regression
Directly construct p(y|x) with unknown parameters and estimate them by MLE. Logistic regression typically works with binary classification, i.e., y = 0 or 1, and β can be numerically solved by Newton's method.
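The model's standard form (a reconstruction; the slide's formula image is not transcribed):

$$p(y = 1 \mid x) = \frac{1}{1 + e^{-\beta^{\top} x}}$$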
32 Regularized Logistic Regression
It is logistic regression learned with a restricted range of the parameter β.
- Q: a larger λ gives what kind of model?
33 Logistic Regression in Python
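A minimal sketch, assuming scikit-learn and synthetic data:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# In scikit-learn, C is the inverse of the regularization strength lambda:
# a larger lambda (smaller C) gives a simpler model, as on the previous slide.
clf = LogisticRegression(C=1.0).fit(X, y)
print(clf.predict_proba(X[:3]))   # p(y|x) for the first three examples
```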
34 4. k-nearest Neighbor Q: how to classify the green test data?
35 4. k-nearest Neighbor
Q: how to classify the green test data?
k-nn assigns an example to class c if most of its k nearest neighbors are from class c. We can either fix the number k of nearest neighbors, or choose a distance threshold that defines the neighborhood.
36 Impact of Neighborhood Size Q: how to classify the green test data? k-nn assigns an example to class c if most of its k nearest neighbors are from class c. Q: will k=3 and k=5 give the same result?
37 Impact of Neighborhood Size Q: how to classify the green test data? k-nn assigns an example to class c if most of its k nearest neighbors are from class c. Q: will k=3 and k=5 give the same result? Classification results depend on k.
38 Model Complexity of k-nn Q: how to classify the green test data? k-nn assigns an example to class c if most of its k nearest neighbors are from class c. Q: will k=3 and k=5 give the same result? Classification results depend on k. Q: how does k affect model complexity?
39 Model Complexity of k-nn Q: how to classify the green test data? k-nn assigns an example to class c if most of its k nearest neighbors are from class c. Q: will k=3 and k=5 give the same result? Classification results depend on k. Q: how does k affect model complexity? A smaller k gives a more complex model.
40 Impact of k on Model Complexity
41 k-nn in Python
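A minimal sketch, assuming scikit-learn and synthetic data:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (1, 3, 15):   # smaller k = more complex model (see previous slides)
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    print(k, clf.score(X_te, y_te))
```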
42 5. Linear Support Vector Machine (LSVM) Find a linear decision boundary to maximally separate instances from two classes. LSVM typically works with binary classification, i.e., y = 0 or 1.
43 Learning LSVM Find a linear decision boundary to maximally separate instances from two classes.
44 Soft-Margin LSVM The boundary can tolerate misbehaved points (soft-margin).
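The standard soft-margin objective (a reconstruction; in this conventional form the labels are y_i ∈ {-1, +1}, the ξ_i are slack variables for misbehaved points, and C controls the tolerance):

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.}\quad y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0$$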
45 LSVM in Python
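A minimal sketch, assuming scikit-learn and synthetic data:

```python
from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# C controls the soft margin: a smaller C tolerates more misbehaved points.
clf = LinearSVC(C=1.0).fit(X, y)
print(clf.coef_, clf.intercept_)   # the learned linear decision boundary
```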
46 6. Decision Tree Build a tree where each node filters some feature(s) for one-step classification.
47 Learning/Constructing a Decision Tree Keep splitting nodes until leaf nodes are sufficiently pure or we run out of features.
48 Another Example
Build a tree to detect spam emails.
Features and Labels: X1 = color, X2 = google, X4 = lottery; R1, R3, R4 = spam; R2, R5 = ham
49 Another Example
We want to have pure nodes. [Figure: one candidate split yields a pure node, the other a not-so-pure node.]
Features and Labels: X1 = color, X2 = google, X4 = lottery; R1, R3, R4 = spam; R2, R5 = ham
50 Decision Tree in Python
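A minimal sketch, assuming scikit-learn; the labels follow the slide's example (R1, R3, R4 = spam; R2, R5 = ham), while the feature values are toy numbers invented for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Rows R1..R5 over (color, google, lottery); values are hypothetical.
X = np.array([[1, 1, 1],
              [0, 1, 0],
              [1, 0, 1],
              [1, 1, 0],
              [0, 0, 0]])
y = np.array([1, 0, 1, 1, 0])   # 1 = spam, 0 = ham

# Split on the feature that best purifies the nodes (entropy criterion).
tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(tree, feature_names=["color", "google", "lottery"]))
```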
51 7. Neural Network A network of neurons for classification.
52 Neuron
A neuron is (typically) a linear function of its inputs passed through a nonlinear activation function; its parameters are the weights.
53 Backpropagation
BP is a popular algorithm for learning the weights of a neural network (essentially gradient descent).
54 Backpropagation
First, we write the output of a neural network as a function of the input features.
55 Backpropagation
Then, we take the derivative of that function w.r.t. each weight.
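In symbols (a standard statement, assuming error E, network output g(x), and learning rate η; the slide's figures are not transcribed), the chain rule gives the gradient, which drives the descent update:

$$\frac{\partial E}{\partial w} = \frac{\partial E}{\partial g}\cdot\frac{\partial g}{\partial w}, \qquad w \leftarrow w - \eta\,\frac{\partial E}{\partial w}$$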
56 Neural Network in Python
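A minimal sketch, assuming scikit-learn and synthetic data:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# One hidden layer of 20 neurons; weights are learned by backpropagation.
clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=1000, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))
```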
57 [+] Bayesian Network
Apply a Bayesian network to infer p(y = cancer | x = (smoker, X-ray, Dyspnoea, Asia, ...)).
58 Multi-Class Classification with Binary Classifiers
We can apply a binary classifier via the one-versus-all or one-versus-one strategy. [Figure: 1-vs-all and 1-vs-1 decision regions.]
59 Machine Learning Prediction Models
Regression Model
- linear regression (least squares, ridge regression, Lasso)
Classification Model
- naive Bayes, logistic regression, Gaussian discriminant analysis, k-nearest neighbor, linear support vector machine (LSVM), decision tree, neural network, (Bayesian network)
- multi-class classification with binary classifiers
Advanced Model
- kernel, ensemble
Online Model
- SGD learner, HMM
60 1. Kernel Methods
Project data to a higher-dimensional space (implicitly) and do linear analysis there. [Figure: original features (google, lottery) mapped to unknowable new features (gottery, gogog, loogle, tt, #$@@, *&$#, ...).]
61 Demo
62 Implicit Mapping and Kernel Function
No need to know the new feature representation; we only need the inner products of instances. The kernel returns these directly, e.g., Kernel(x, x') = 0.4, without ever computing the mapped features (gottery, gogog, loogle, ...).
63 Representer Theorem
The optimal model (with a proper objective function) is a linear combination of training data.
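A standard statement of the theorem's conclusion (a reconstruction; the slide's formula is not transcribed): the optimal model can be written as

$$f^{*}(x) = \sum_{i=1}^{n} \alpha_i\, k(x_i,\, x)$$

where the x_i are the training instances and k is the kernel function.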
64 Kernel Functions
65 Example Kernel Methods (in Python)
- kernel ridge regression
- kernel LSVM (a.k.a. SVM)
- kernel PCA
- kernel k-means
- Gaussian process
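A minimal sketch of two of these, assuming scikit-learn; the concentric-circles dataset is a stock example of data no linear boundary can separate:

```python
from sklearn.svm import SVC
from sklearn.kernel_ridge import KernelRidge
from sklearn.datasets import make_circles

# Two concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=200, noise=0.1, factor=0.4, random_state=0)

clf = SVC(kernel="rbf", gamma=1.0).fit(X, y)     # kernel LSVM (a.k.a. SVM)
print("kernel SVM accuracy:", clf.score(X, y))

reg = KernelRidge(kernel="rbf", alpha=1.0)       # kernel ridge regression
reg.fit(X, y.astype(float))
```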
66 2. Ensemble Method
Combine several weak models to form a strong model. Two popular ways to learn the weak models:
- bagging: bootstrapping + averaging
- boosting: weighting + weighted averaging
67 Bagging
Bagging has three steps:
1. bootstrap the training set to get subsets
2. learn one model from each subset
3. average the models' predictions
68 An Example of Bagging: Random Forest random forest = decision tree + bagging + feature bootstrapping
69 Boosting
Loop:
- weight the training instances
- train a model on the weighted sample
Finally, take a weighted average of the models' predictions.
70 An Example of Boosting: AdaBoost Mis-classified instances receive higher weights; accurate models receive higher weights.
71 A Demo of AdaBoost
72 Example Ensemble Methods (in Python)
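A minimal sketch of one bagging and one boosting ensemble, assuming scikit-learn and synthetic data:

```python
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Bagging: random forest = decision tree + bagging + feature bootstrapping.
bagged = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# Boosting: AdaBoost upweights misclassified instances each round.
boosted = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)

print(bagged.score(X, y), boosted.score(X, y))
```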
73 Machine Learning Prediction Models
Regression Model
- linear regression (least squares, ridge regression, Lasso)
Classification Model
- naive Bayes, logistic regression, Gaussian discriminant analysis, k-nearest neighbor, linear support vector machine (LSVM), decision tree, neural network, (Bayesian network)
- multi-class classification with binary classifiers
Advanced Model
- kernel, ensemble
Online Model
- SGD learner, HMM
74 Learning from a Sequence of Data
How do we learn a spam detection model from a sequence of emails?
- situation 1: we have features and labels of received emails
- situation 2: we only have labels of received emails
75 S1: Stochastic Gradient Descent
We can continually update the model based on new emails, using stochastic gradient descent.
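The generic per-example update (a standard form; η is the step size and ℓ the loss on the newly arrived example (x_t, y_t)):

$$w \leftarrow w - \eta\, \nabla_{w}\, \ell(w;\, x_t,\, y_t)$$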
76 Example: Perceptron A single neuron that only updates itself when an instance is mis-classified.
77 A Demo of Perceptron
78 S2: Need a Special Sequential Prediction Model
Q: how do we predict tomorrow's weather based solely on historical weather records?
79 S2: Markov and Hidden Markov Models
80 Example Online Learners (in Python)
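A minimal sketch, assuming a recent scikit-learn (where the logistic loss is spelled "log_loss") and a toy stream of emails invented for illustration:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier, Perceptron

rng = np.random.default_rng(0)
sgd = SGDClassifier(loss="log_loss")       # online logistic regression
per = Perceptron()                         # updates only on mistakes

for t in range(200):                       # emails arriving one at a time
    x = rng.normal(size=(1, 5))
    y = np.array([int(x[0, 0] > 0)])       # toy labeling rule
    sgd.partial_fit(x, y, classes=[0, 1])  # one SGD step per new email
    per.partial_fit(x, y, classes=[0, 1])

print(sgd.coef_, per.coef_)
```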
More information