Machine Learning for Object Recognition. Artem Lind & Aleksandr Tkachenko

Outline
Problem overview
Classification demo
Examples of learning algorithms:
Probabilistic modeling: Bayes classifier
Maximum margin classification: SVM

Object Recognition as a Classification Task

Typical Workflow: 1. Preprocessing images 2. Extracting features 3. Learning a classifier 4. Assessing results

Object representation. Vectors of quantitative descriptors (length, weight, area) or raw pixel values. Strings and trees of structural descriptors, which capture spatial relationships between features.

Classification demo: Iris dataset. (Figure: an iris flower with the sepal and petal labeled.)

Classification demo: Iris dataset. Two classes: Iris-versicolor, Iris-virginica. Two features: sepal length, petal length.

Iris dataset: feature vector representation
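As a concrete illustration, here is a minimal sketch of this representation in Python (scikit-learn is assumed, since it ships the Iris dataset; the original demo used other tooling). Selecting two feature columns and the last two classes reproduces the setup of the demo.

# Minimal sketch: Iris flowers as numeric feature vectors (scikit-learn assumed).
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target            # X has shape (150, 4)
print(iris.feature_names)                # sepal/petal length and width, in cm
print(X[0], iris.target_names[y[0]])     # one feature vector and its class label

# The demo's setup: two classes (versicolor, virginica) and
# two features (sepal length, petal length).
mask = y > 0                             # drop class 0 (setosa)
X2, y2 = X[mask][:, [0, 2]], y[mask]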

Classification demo: linear classification

Classification demo: linear classifier. How good is my classifier?

Classifier evaluation
Confusion Matrix:
   a   b   <-- classified as
  44   6 |  a = Iris-versicolor   (44 true positives, 6 false negatives)
   8  42 |  b = Iris-virginica    (8 false positives, 42 true negatives)

Classifier evaluation
Confusion Matrix:
   a   b   <-- classified as
  44   6 |  a = Iris-versicolor   (TP = 44, FN = 6)
   8  42 |  b = Iris-virginica    (FP = 8, TN = 42)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
F-Measure = harmonic_mean(Precision, Recall)
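A short sketch of these formulas applied to the confusion matrix above (the counts are taken from the slide; Iris-versicolor is treated as the positive class):

# Evaluation metrics from the confusion matrix above,
# with Iris-versicolor as the positive class.
TP, FN = 44, 6   # versicolor classified correctly / as virginica
FP, TN = 8, 42   # virginica classified as versicolor / correctly

precision = TP / (TP + FP)                                  # 0.846
recall    = TP / (TP + FN)                                  # 0.88
accuracy  = (TP + TN) / (TP + TN + FP + FN)                 # 0.86
f_measure = 2 * precision * recall / (precision + recall)   # 0.863
print(precision, recall, accuracy, f_measure)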

Classifier evaluation
Confusion Matrix:
   a   b   <-- classified as
  44   6 |  a = Iris-versicolor
   8  42 |  b = Iris-virginica
Summary:
Correctly Classified Instances    86    86 %
Incorrectly Classified Instances  14    14 %
Total Number of Instances        100
Detailed Accuracy By Class:
TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
0.88     0.16     0.846      0.88    0.863      0.949     Iris-versicolor
0.84     0.12     0.875      0.84    0.857      0.949     Iris-virginica

Classifier evaluation. Most training algorithms optimize accuracy / precision / recall on the given data. However, we want the classifier to perform well on unseen data. This makes the algorithms and theory considerably more complicated, and it makes validation somewhat more complicated.

Proper validation. Split the data into a training set and a test set. Cross-validation: use it if the data is scarce.
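A minimal sketch of both options (scikit-learn assumed; the choice of classifier is illustrative):

# Sketch: hold-out validation vs. cross-validation (scikit-learn assumed).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X, y = load_iris(return_X_y=True)

# Hold-out: train on one part of the data, test on the rest.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("hold-out accuracy:", clf.score(X_test, y_test))

# Cross-validation: preferable when the data is scarce.
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())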

Common workflow, summary: get a decent dataset, identify discriminative features, train your classifier on the training set, validate on the test set.

Classifiers. Probabilistic modeling: Bayes classifier. Margin maximization: Support Vector Machines.

Bayes classifier (for binary classification). Learning: estimate P(Class | X) from the training data. Classifying: Bayes decision rule: predict C1 if P(C1 | X) > P(C2 | X); otherwise C2.

Bayes classifier. Predict C1 if P(C1 | X) > P(C2 | X); otherwise C2. By Bayes' rule this is equivalent to P(X | C1) P(C1) / P(X) > P(X | C2) P(C2) / P(X), i.e. P(X | C1) P(C1) > P(X | C2) P(C2).

Bayes classifier. Predict C1 if P(X | C1) P(C1) > P(X | C2) P(C2); otherwise C2.

Bayes classifier. If P(X | Class) and P(Class) are known, then the classifier is optimal. In practice the distributions P(X | Class) and P(Class) are unknown and need to be estimated from the data. P(Class): assign equal probability to all classes, use prior knowledge, or estimate P(C) = #examples_in_C / #examples. P(X | Class): assume some well-behaved distribution expressed in an analytical form, whose parameters are estimated from the data for each class. The closer this assumption is to reality, the closer the Bayes classifier approaches the optimum.

Bayes classifier for Gaussian patterns: $p(x \mid c_i) = N(\mu_i, \sigma_i^2)$, so
$p(x \mid c_i)\, P(c_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\!\Big(-\frac{(x - \mu_i)^2}{2\sigma_i^2}\Big)\, P(c_i), \quad i = 1, 2,$
where $\mu_i$ and $\sigma_i^2$ are the sample mean and variance for class $i$.
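A from-scratch sketch of this one-dimensional Gaussian Bayes classifier (illustrative only; the toy data and names are ours):

# Sketch: 1-D Gaussian Bayes classifier. Estimate the sample mean and
# variance per class, then predict the class maximizing p(x|c_i) * P(c_i).
import numpy as np

def fit(x, y):
    params = {}
    for c in np.unique(y):
        xc = x[y == c]
        params[c] = (xc.mean(), xc.var(), len(xc) / len(x))  # mu, sigma^2, prior
    return params

def predict(x, params):
    def score(c):
        mu, var, prior = params[c]
        return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var) * prior
    return max(params, key=score)

x = np.array([4.9, 5.1, 5.0, 6.0, 6.2, 6.1])  # toy 1-D feature
y = np.array([0, 0, 0, 1, 1, 1])
print(predict(5.2, fit(x, y)))                # -> 0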

Example from the book

Summary. Advantages: creates linear decision boundaries which are simple to compute; easy to implement. Disadvantages: it is based on a single prototype per class (the class center), which is often insufficient in practice; usually won't work well for too many classes.

Support Vector Machines

SVM: Maximum margin classification

SVM: Maximum margin classification. Hyperplane: w·x + b = 0. For any point x, its distance to the hyperplane: |w·x + b| / ||w||. Assuming all points are classified correctly: y_i (w·x_i + b) > 0 for all i. The margin is then: min_i y_i (w·x_i + b) / ||w||.

Maximizing the margin. Find (w, b) such that the margin is maximal; this can be shown to be equivalent to: minimize (1/2)||w||^2 subject to y_i (w·x_i + b) >= 1 for all i. This is doable using efficient (quadratic programming) optimization algorithms.

SVM: Non-separable case

SVM: How to choose the C parameter? The C parameter penalizes training points that fall within the margin; a large C value can lead to over-fitting. Options: cross-validation + grid search, or ad hoc: find C values which give you zero errors on the training set.
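A sketch of the cross-validation + grid search recipe (scikit-learn assumed; the grid of C values is illustrative):

# Sketch: choose C by grid search with cross-validation (scikit-learn assumed).
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
grid = GridSearchCV(SVC(kernel="linear"),
                    param_grid={"C": [0.01, 0.1, 1, 10, 100]},  # illustrative grid
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)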

Nonlinear case

Solution: a nonlinear map. Instead of classifying the points x, we'll classify their images φ(x) in a higher-dimensional space.

Kernel methods. For nearly any linear classifier f(x) = sign(w·x + b) trained on a dataset {x_i}: the resulting vector w can be represented as a linear combination of the training points, w = sum_i a_i x_i. Which means: f(x) = sign(sum_i a_i (x_i·x) + b), i.e. the classifier depends on the data only through inner products.

Kernel methods. The function K is called a kernel; it measures similarity between objects. The explicit computation of the map φ is unnecessary. You can use any type of data. Your method is nonlinear. Any linear method can be kernelized. Kernels can be combined.

Basic kernels: linear K(x, y) = x·y; polynomial K(x, y) = (g x·y + r)^d; RBF (Gaussian) K(x, y) = exp(-g ||x - y||^2); sigmoid K(x, y) = tanh(g x·y + r). How to choose the kernel? Cross-validate; the RBF kernel is a common first choice.
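A sketch of these kernels as plain functions (the parameter values g, r, and d below are arbitrary examples):

# Sketch: the basic kernels as plain functions of two vectors.
import numpy as np

def linear(x, y): return x @ y
def polynomial(x, y, g=1.0, r=1.0, d=3): return (g * (x @ y) + r) ** d
def rbf(x, y, g=0.5): return np.exp(-g * np.sum((x - y) ** 2))
def sigmoid(x, y, g=0.1, r=0.0): return np.tanh(g * (x @ y) + r)

x, y = np.array([1.0, 2.0]), np.array([2.0, 0.5])
print(linear(x, y), polynomial(x, y), rbf(x, y), sigmoid(x, y))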

SVM: more than 2 classes. Common strategies reduce the problem to binary classification: one-vs-rest or one-vs-one ensembles of binary SVMs.
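A sketch of the one-vs-rest reduction (scikit-learn assumed; note that sklearn's SVC uses one-vs-one internally by default):

# Sketch: multiclass SVM via the one-vs-rest reduction.
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                # three classes
ovr = OneVsRestClassifier(SVC(kernel="linear"))  # one binary SVM per class
print(ovr.fit(X, y).score(X, y))                 # training accuracy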

SVM in practice
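A minimal end-to-end sketch of a practical setup (scikit-learn assumed): scale the features, train an RBF-kernel SVM, evaluate on a held-out test set.

# Sketch: a practical SVM pipeline. Scale features, then train and evaluate.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))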

SVM Summary. SVMs are maximum margin classifiers. SVMs can handle non-separable data. SVMs are known to perform well in high-dimensional problems with few examples. Depending on the kernel, SVMs can be slow during classification. SVMs are kernelizable.

References
Massachusetts Institute of Technology course 9.913, Pattern Recognition for Machine Vision: http://ocw.mit.edu/courses/brain-and-cognitive-sciences/9-913-pattern-recognition-for-machine-vision-fall-2004/
A Practical Guide to Support Vector Classification: http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
Konstantin Tretyakov's slides on data mining: http://courses.cs.ut.ee/2009/dm/main/lectures

Demo