Machine Learning Lecture 3: Probability Density Estimation II
Course Outline

Fundamentals (2 weeks): Bayes Decision Theory; Probability Density Estimation; Probability Density Estimation II.
Discriminative Approaches (5 weeks): Linear Discriminant Functions; Support Vector Machines; Ensemble Methods & Boosting; Randomized Trees, Forests & Ferns.
Generative Models (4 weeks): Bayesian Networks; Markov Random Fields.

Bastian Leibe, RWTH Aachen, leibe@vision.rwth-aachen.de
Many slides adapted from B. Schiele

Topics of This Lecture

- Recap: Gaussian (or Normal) Distribution
- Recap: Parametric Methods: Maximum Likelihood approach; Bayesian Learning
- Non-Parametric Methods: Histograms; Kernel density estimation; K-Nearest Neighbors; k-NN for Classification; Bias-Variance tradeoff
- Mixture distributions: Mixture of Gaussians (MoG); Maximum Likelihood estimation attempt

Recap: Gaussian (or Normal) Distribution

One-dimensional case, with mean \mu and variance \sigma^2:

  N(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}

Multi-dimensional case, with mean \mu and covariance \Sigma:

  N(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right\}

Recap: Maximum Likelihood Approach

Computation of the likelihood:
- Single data point: p(x_n \mid \theta)
- Assumption: all data points X = \{x_1, \dots, x_N\} are independent, so

  L(\theta) = p(X \mid \theta) = \prod_{n=1}^{N} p(x_n \mid \theta)

- Negative log-likelihood:

  E(\theta) = -\ln L(\theta) = -\sum_{n=1}^{N} \ln p(x_n \mid \theta)

Estimation of the parameters \theta (learning): maximize the likelihood, i.e. minimize the negative log-likelihood. Take the derivative and set it to zero:

  \frac{\partial E(\theta)}{\partial \theta} = -\sum_{n=1}^{N} \frac{1}{p(x_n \mid \theta)} \frac{\partial p(x_n \mid \theta)}{\partial \theta} \overset{!}{=} 0

Recap: Bayesian Learning Approach

Bayesian view: consider the parameter vector \theta as a random variable. When estimating the parameters, what we compute is

  p(x \mid X) = \int p(x, \theta \mid X) \, d\theta,  with  p(x, \theta \mid X) = p(x \mid \theta, X) \, p(\theta \mid X)

Assumption: given \theta, x doesn't depend on X anymore; it is entirely determined by the parameter \theta (i.e. by the parametric form of the pdf), so p(x \mid \theta, X) = p(x \mid \theta). Therefore

  p(x \mid X) = \int p(x \mid \theta) \, p(\theta \mid X) \, d\theta

Slide adapted from Bernt Schiele
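To make the ML recap concrete, here is a minimal NumPy sketch (an illustration added to this transcription, not code from the lecture; the function names are mine). Setting the derivative of E(\theta) to zero for a Gaussian yields the closed-form sample mean and sample covariance:

```python
import numpy as np

def fit_gaussian_ml(X):
    """Closed-form ML estimates for a multivariate Gaussian (sketch)."""
    mu = X.mean(axis=0)                      # ML estimate of the mean
    diff = X - mu
    sigma = diff.T @ diff / X.shape[0]       # ML estimate of the covariance (biased)
    return mu, sigma

def gaussian_log_likelihood(X, mu, sigma):
    """Log-likelihood sum_n ln N(x_n | mu, Sigma) of i.i.d. data."""
    D = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(sigma)
    quad = np.einsum('nd,de,ne->n', diff, inv, diff)   # (x-mu)^T Sigma^-1 (x-mu)
    log_norm = -0.5 * (D * np.log(2 * np.pi) + np.linalg.slogdet(sigma)[1])
    return np.sum(log_norm - 0.5 * quad)

# Example: recover parameters from samples
rng = np.random.default_rng(0)
X = rng.multivariate_normal([1.0, -2.0], [[2.0, 0.3], [0.3, 0.5]], size=1000)
mu_hat, sigma_hat = fit_gaussian_ml(X)
print(mu_hat, gaussian_log_likelihood(X, mu_hat, sigma_hat))
```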
Bayesian Learning Approach: Discussion

Inserting Bayes' theorem for p(\theta \mid X) gives

  p(x \mid X) = \int p(x \mid \theta) \, \frac{L(\theta) \, p(\theta)}{\int L(\theta) \, p(\theta) \, d\theta} \, d\theta

where
- p(x \mid \theta) is the estimate for x based on the parametric form \theta,
- L(\theta) is the likelihood of the parametric form \theta given the data set X,
- p(\theta) is the prior for the parameters \theta,
- the denominator is a normalization: integrate over all possible values of \theta.

If we now plug in a (suitable) prior p(\theta), we can estimate p(x \mid X) from the data set X.

Topics of This Lecture (agenda slide repeated; next up: Non-Parametric Methods)

Non-Parametric Methods

Non-parametric representations: often the functional form of the distribution is unknown. Basic idea: estimate the probability density directly from the data, e.g. with
- Histograms
- Kernel density estimation (Parzen window / Gaussian kernels)
- k-Nearest-Neighbor

Histograms

Partition the data space into distinct bins with widths \Delta_i and count the number of observations, n_i, in each bin. The density estimate inside bin i is then

  p(x) \approx \frac{n_i}{N \Delta_i}

Often, the same width is used for all bins, \Delta_i = \Delta. This can be done, in principle, for any dimensionality D, but the required number of bins grows exponentially with D!

The bin width \Delta acts as a smoothing factor. (Figure: histogram estimates for three bin widths - not smooth enough / about OK / too smooth.)

Summary: Histograms

Properties:
- Very general: in the limit (N \to \infty), every probability density can be represented.
- No need to store the data points once the histogram is computed.
- Rather brute-force.

Problems:
- High-dimensional feature spaces: a D-dimensional space with M bins per dimension requires M^D bins! This demands an exponentially growing number of data points ("curse of dimensionality").
- Discontinuities at bin edges.
- Bin size? Too large: too much smoothing; too small: too much noise.
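The histogram estimator is short enough to state directly in code. A minimal NumPy sketch (my illustration, not lecture code; `histogram_density` is a hypothetical helper) of p(x) ≈ n_i / (N \Delta_i):

```python
import numpy as np

def histogram_density(X, edges):
    """1D histogram density estimate: p(x) ~ n_i / (N * Delta_i)."""
    counts, edges = np.histogram(X, bins=edges)
    widths = np.diff(edges)
    density = counts / (len(X) * widths)     # n_i / (N * Delta_i) per bin

    def p(x):
        # locate the bin containing x (edge cases clipped to the outer bins)
        i = np.clip(np.searchsorted(edges, x, side='right') - 1, 0, len(density) - 1)
        return density[i]

    return density, p

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, 500), rng.normal(4, 0.5, 500)])
density, p = histogram_density(X, np.linspace(-4, 7, 30))
print(p(0.0), p(4.0))     # higher density near the two modes
```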
Statistically Better-Founded Approach

Suppose data point x comes from the pdf p(x). The probability that x falls into a small region R is

  P = \int_R p(y) \, dy

- If R is sufficiently small, p(x) is roughly constant over R, so with V the volume of R:

  P = \int_R p(y) \, dy \approx p(x) \, V

- If the number N of samples is sufficiently large, we can estimate P as

  P \approx \frac{K}{N}  \quad\Rightarrow\quad  p(x) \approx \frac{K}{N V}

This suggests two strategies:
- Fix V, determine K: Kernel Methods (example: determine the number K of data points inside a fixed window).
- Fix K, determine V: K-Nearest Neighbor.

Kernel Methods: Parzen Window

Hypercube of dimension D with edge length h. Kernel function:

  k(u) = \begin{cases} 1, & |u_i| \le \tfrac{1}{2}, \; i = 1, \dots, D \\ 0, & \text{else} \end{cases}

Then

  K = \sum_{n=1}^{N} k\!\left(\frac{x - x_n}{h}\right),  \quad  V = h^D \text{ (the volume of the hypercube)}

and the probability density estimate is

  p(x) \approx \frac{K}{N V} = \frac{1}{N h^D} \sum_{n=1}^{N} k\!\left(\frac{x - x_n}{h}\right)

Interpretations:
1. We place a kernel window k at location x and count how many data points fall inside it.
2. We place a kernel window k around each data point x_n and sum up their influences at location x. Direct visualization of the density.

Still, we have artificial discontinuities at the cube boundaries. We can obtain a smoother density model if we choose a smoother kernel function, e.g. a Gaussian.

Kernel Methods: Gaussian Kernel

Kernel function:

  k(u) = \frac{1}{(2\pi h^2)^{1/2}} \exp\left\{ -\frac{u^2}{2h^2} \right\}

with K = \sum_{n=1}^{N} k(x - x_n) and V = \int k(u) \, du = 1. Probability density estimate:

  p(x) \approx \frac{K}{N V} = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{(2\pi)^{D/2} h^D} \exp\left\{ -\frac{\|x - x_n\|^2}{2h^2} \right\}

(Figure: Gaussian kernel estimates for three bandwidths - not smooth enough / about OK / too smooth.) The bandwidth h acts as a smoother.
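Interpretation 2 above (summing kernel influences centred on each data point) translates directly into code. A minimal sketch of the Gaussian kernel density estimate, assuming NumPy and a fixed isotropic bandwidth h (illustration added to this transcription; function name is mine):

```python
import numpy as np

def gaussian_kde(X, h):
    """Kernel density estimate with an isotropic Gaussian kernel of bandwidth h:
    p(x) = 1/N * sum_n exp(-||x - x_n||^2 / (2 h^2)) / ((2 pi)^(D/2) h^D)."""
    N, D = X.shape
    norm = (2 * np.pi) ** (D / 2) * h ** D

    def p(x_query):
        # squared distances between each query point and each training point
        d2 = ((x_query[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2 * h ** 2)).sum(axis=1) / (N * norm)

    return p

rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(500, 2))
p = gaussian_kde(X, h=0.3)
print(p(np.array([[0.0, 0.0], [3.0, 3.0]])))   # high density at the mode, low in the tail
```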
Kernel Methods: In General

Any kernel such that \int k(u) \, du = 1 can be used. Then, with

  K = \sum_{n=1}^{N} k(x - x_n),

we get the probability density estimate

  p(x) \approx \frac{K}{N V} = \frac{1}{N} \sum_{n=1}^{N} k(x - x_n)

K-Nearest Neighbor

The complementary strategy: fix K and estimate V from the data. Consider a hypersphere centred on x and let it grow to a volume V^\star that includes K of the given N data points. Then

  p(x) \approx \frac{K}{N V^\star}

Side note: strictly speaking, the model produced by K-NN is not a true density model, because the integral over all space diverges. E.g. consider K = 1 and a query exactly on a data point, x = x_j: then V^\star \to 0 and the estimate blows up.

(Figure: k-NN estimates for three values of K - not smooth enough / about OK / too smooth.) K acts as a smoother.

Summary: Kernel and k-NN Density Estimation

Properties:
- Very general: in the limit (N \to \infty), every probability density can be represented.
- No computation involved in the training phase; simply storage of the training set.

Problems:
- Requires storing and computing with the entire dataset; computational cost is linear in the number of data points. This can be improved, at the expense of some computation during training, by constructing efficient tree-based search structures.
- Kernel size / K in K-NN? Too large: too much smoothing; too small: too much noise.

K-Nearest Neighbor Classification

Bayesian classification:

  p(C_j \mid x) = \frac{p(x \mid C_j) \, p(C_j)}{p(x)}

Here we have the k-NN estimates

  p(x) \approx \frac{K}{N V},  \quad  p(x \mid C_j) \approx \frac{K_j}{N_j V},  \quad  p(C_j) \approx \frac{N_j}{N}

which give the k-Nearest Neighbor classification rule:

  p(C_j \mid x) \approx \frac{K_j}{N_j V} \cdot \frac{N_j}{N} \cdot \frac{N V}{K} = \frac{K_j}{K}
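For comparison with the kernel estimate, here is a brute-force sketch of the k-NN density estimate (my illustration, not lecture code; it assumes the standard volume of a D-ball, V = c_D r^D with c_D = \pi^{D/2} / \Gamma(D/2 + 1), for the hypersphere around x):

```python
from math import gamma, pi
import numpy as np

def knn_density(X, K):
    """k-NN density estimate: grow a hypersphere around x until it contains
    K training points, then p(x) ~ K / (N * V)."""
    N, D = X.shape
    unit_ball = pi ** (D / 2) / gamma(D / 2 + 1)   # volume of the unit D-ball

    def p(x_query):
        d = np.linalg.norm(x_query[:, None, :] - X[None, :, :], axis=-1)
        r = np.sort(d, axis=1)[:, K - 1]           # radius to the K-th nearest neighbor
        V = unit_ball * r ** D
        return K / (N * V)

    return p

rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(1000, 2))
p = knn_density(X, K=10)
print(p(np.array([[0.0, 0.0], [3.0, 3.0]])))
```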
K-Nearest Neighbors for Classification

Results on an example data set (figure: decision regions for K = 3 and K = 25): K acts as a smoothing parameter.

Theoretical guarantee: for N \to \infty, the error rate of the 1-NN classifier is never more than twice the optimal error (obtained from the true conditional class distributions).

Bias-Variance Tradeoff

In probability density estimation:
- Histograms: bin size? \Delta too large: too smooth; \Delta too small: not smooth enough.
- Kernel methods: kernel size? h too large: too smooth; h too small: not smooth enough.
- K-Nearest Neighbor: K? K too large: too smooth; K too small: not smooth enough.

Too much smoothing means too much bias; too little smoothing means too much variance. This is a general problem of many probability density estimation methods, including parametric methods and mixture models.

Discussion

The methods discussed so far are all simple and easy to apply, and they are used in many practical applications. However:
- Histograms scale poorly with increasing dimensionality: only suitable for relatively low-dimensional data.
- Both k-NN and kernel density estimation require the entire data set to be stored: too expensive if the data set is large.
- Simple parametric models are very restricted in the forms of distributions they can represent: only suitable if the data has the same general form.

We need density models that are efficient and flexible!

Topics of This Lecture (agenda slide repeated; next up: Mixture distributions)

Mixture Distributions

A single parametric distribution is often not sufficient, e.g. for multimodal data. (Figure: a single Gaussian vs. a mixture of two Gaussians fit to bimodal data.)
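The classification rule p(C_j | x) ≈ K_j / K amounts to a majority vote among the K nearest training points. A small NumPy sketch (illustrative, not lecture code; names are mine):

```python
import numpy as np

def knn_classify(X_train, y_train, X_query, K):
    """K-NN classification: majority vote among the K nearest neighbors,
    i.e. pick the class j maximizing K_j / K."""
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=-1)
    nearest = np.argsort(d, axis=1)[:, :K]          # indices of K nearest neighbors
    votes = y_train[nearest]                        # their class labels, shape (Q, K)
    return np.array([np.bincount(v).argmax() for v in votes])

rng = np.random.default_rng(0)
X0 = rng.normal([-1, 0], 0.8, size=(100, 2))        # class 0 samples
X1 = rng.normal([+1, 0], 0.8, size=(100, 2))        # class 1 samples
X_train = np.vstack([X0, X1])
y_train = np.array([0] * 100 + [1] * 100)
print(knn_classify(X_train, y_train, np.array([[-1.0, 0.0], [1.0, 0.0]]), K=25))
```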
Mixture of Gaussians (MoG)

Sum of M individual normal distributions:

  p(x \mid \theta) = \sum_{j=1}^{M} p(x \mid \theta_j) \, p(j)

In the limit, every smooth distribution can be approximated this way (if M is large enough). (Figure: a mixture density f(x) composed of several Gaussian components.)

Notes (one-dimensional case):
- Likelihood of measurement x given mixture component j:

  p(x \mid \theta_j) = N(x \mid \mu_j, \sigma_j^2) = \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left\{ -\frac{(x - \mu_j)^2}{2\sigma_j^2} \right\}

- Prior of component j: p(j) = \pi_j, with 0 \le \pi_j \le 1 and \sum_{j=1}^{M} \pi_j = 1.
- The mixture density integrates to 1: \int p(x) \, dx = 1.
- The mixture parameters are \theta = (\pi_1, \mu_1, \sigma_1, \dots, \pi_M, \mu_M, \sigma_M).

Slide adapted from Bernt Schiele

Generative model: first pick a component j with probability p(j) = \pi_j (the weight of the mixture component), then draw x from that mixture component p(x \mid \theta_j); the resulting mixture density is p(x \mid \theta) = \sum_{j=1}^{M} p(x \mid \theta_j) \, p(j).

Mixture of Multivariate Gaussians

  p(x \mid \theta) = \sum_{j=1}^{M} p(x \mid \theta_j) \, p(j),  \quad  p(x \mid \theta_j) = \frac{1}{(2\pi)^{D/2} |\Sigma_j|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_j)^T \Sigma_j^{-1} (x - \mu_j) \right\}

- Mixture weights / mixture coefficients: p(j) = \pi_j, with 0 \le \pi_j \le 1 and \sum_{j=1}^{M} \pi_j = 1 (e.g. for M = 3: p(x \mid \theta) = \sum_{j=1}^{3} \pi_j \, p(x \mid \theta_j)).
- Parameters: \theta = (\pi_1, \mu_1, \Sigma_1, \dots, \pi_M, \mu_M, \Sigma_M).
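Evaluating a one-dimensional MoG is a short computation over components. A minimal sketch (my own, with hypothetical names) of p(x) = \sum_j \pi_j N(x \mid \mu_j, \sigma_j^2):

```python
import numpy as np

def mog_density(x, pis, mus, sigmas):
    """Evaluate a 1D Mixture of Gaussians: p(x) = sum_j pi_j N(x | mu_j, sigma_j^2)."""
    x = np.asarray(x)[..., None]                   # broadcast queries against components
    comps = np.exp(-(x - mus) ** 2 / (2 * sigmas ** 2)) / np.sqrt(2 * np.pi * sigmas ** 2)
    return (pis * comps).sum(axis=-1)

# A bimodal density that no single Gaussian could represent
pis = np.array([0.4, 0.6])
mus = np.array([-2.0, 3.0])
sigmas = np.array([0.5, 1.0])
print(mog_density([-2.0, 0.5, 3.0], pis, mus, sigmas))
```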
Mixture of Gaussians: 1st Estimation Attempt

Maximum Likelihood: minimize

  E = -\ln L(\theta) = -\sum_{n=1}^{N} \ln p(x_n \mid \theta)

Let's first look at \mu_j. We can already see that this will be difficult, since the logarithm acts on a sum:

  \ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k N(x_n \mid \mu_k, \Sigma_k) \right\}

Taking the derivative with respect to \mu_j, using \frac{\partial}{\partial \mu_j} N(x_n \mid \mu_j, \Sigma_j) = \Sigma_j^{-1} (x_n - \mu_j) \, N(x_n \mid \mu_j, \Sigma_j), and setting it to zero:

  \frac{\partial E}{\partial \mu_j} = -\sum_{n=1}^{N} \frac{\pi_j N(x_n \mid \mu_j, \Sigma_j)}{\sum_{k=1}^{K} \pi_k N(x_n \mid \mu_k, \Sigma_k)} \, \Sigma_j^{-1} (x_n - \mu_j) \overset{!}{=} 0

We thus obtain

  \mu_j = \frac{\sum_{n=1}^{N} \gamma_j(x_n) \, x_n}{\sum_{n=1}^{N} \gamma_j(x_n)},  \quad  \gamma_j(x_n) = \frac{\pi_j N(x_n \mid \mu_j, \Sigma_j)}{\sum_{k=1}^{K} \pi_k N(x_n \mid \mu_k, \Sigma_k)}

where \gamma_j(x_n) is the "responsibility" of component j for x_n. This will cause problems!

But: \gamma_j(x_n) is itself a function f(\pi_1, \mu_1, \Sigma_1, \dots, \pi_M, \mu_M, \Sigma_M) of all parameters, i.e. there is no direct analytical solution. The gradient is a complex function with non-linear mutual dependencies: the optimization of one Gaussian depends on all other Gaussians! It is possible to apply iterative numerical optimization here, but in the following, we will see a simpler method.

Mixture of Gaussians: Other Strategy

Observed data: the samples x_n. Unobserved data: which component j each sample came from. This is encoded as a hidden variable with assignments h(j = 1 \mid x_n), h(j = 2 \mid x_n), etc. (1 if x_n came from component j, 0 otherwise).

- Assuming we knew the values of the hidden variable, we could do ML estimation for each Gaussian separately, e.g. for two components:

  \mu_1 = \frac{\sum_{n=1}^{N} h(j = 1 \mid x_n) \, x_n}{\sum_{n=1}^{N} h(j = 1 \mid x_n)},  \quad  \mu_2 = \frac{\sum_{n=1}^{N} h(j = 2 \mid x_n) \, x_n}{\sum_{n=1}^{N} h(j = 2 \mid x_n)}

- Assuming we knew the mixture components p(j = 1 \mid x) and p(j = 2 \mid x), we could assign each sample by the Bayes decision rule: decide j = 1 if p(j = 1 \mid x_n) > p(j = 2 \mid x_n).
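To see why this circular dependency is still workable, here is a sketch of the alternating idea (my illustration, a simplified preview of the EM algorithm the next lecture develops): compute responsibilities from the current parameters, then re-estimate the means from the responsibilities. For simplicity it keeps \pi_j and \sigma_j fixed, which a full implementation would also update:

```python
import numpy as np

def responsibilities(X, pis, mus, sigmas):
    """gamma_j(x_n) = pi_j N(x_n | mu_j, sigma_j^2) / sum_k pi_k N(x_n | mu_k, sigma_k^2)."""
    comps = pis * np.exp(-(X[:, None] - mus) ** 2 / (2 * sigmas ** 2)) \
            / np.sqrt(2 * np.pi * sigmas ** 2)     # shape (N, M)
    return comps / comps.sum(axis=1, keepdims=True)

def update_means(X, gamma):
    """mu_j = sum_n gamma_j(x_n) x_n / sum_n gamma_j(x_n)."""
    return (gamma * X[:, None]).sum(axis=0) / gamma.sum(axis=0)

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(3, 1.0, 300)])
pis, mus, sigmas = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(20):                                # alternate: responsibilities <-> means
    gamma = responsibilities(X, pis, mus, sigmas)
    mus = update_means(X, gamma)
print(mus)                                         # approaches the true means -2 and 3
```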
Mixture of Gaussians: Other Strategy (cont'd)

Chicken-and-egg problem: which comes first, the assignments or the component parameters? We don't know either of them! In order to break the loop, we need an initial estimate for the assignments j, e.g. by clustering. -> Next lecture.

References and Further Reading

More information in Bishop's book:
- Gaussian distribution and ML: Ch. 1.2.4 and 2.3
- Bayesian Learning: Ch. 1.2.3 and 2.3.6
- Nonparametric methods: Ch. 2.5

Christopher M. Bishop
Pattern Recognition and Machine Learning
Springer, 2006

Additional information can be found in Duda & Hart:
- ML estimation: Ch. 3.2
- Bayesian Learning: Ch. 3.3-3.5
- Nonparametric methods: Ch. 4

R.O. Duda, P.E. Hart, D.G. Stork
Pattern Classification
2nd Ed., Wiley-Interscience