
Ensembles
An ensemble is a set of classifiers whose combined results give the final decision.
[Figure: a test feature vector is fed to classifier 1, classifier 2, and classifier 3; their outputs go to a super classifier, which produces the result.]
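The idea can be made concrete with a minimal sketch (my own illustrative Python, not from the lecture): three linear classifiers are trained on bootstrap samples, and a majority vote acts as the "super classifier".

    # Majority-vote ensemble sketch (toy data; all names are illustrative).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import Perceptron

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)

    # Train three linear models, each on a different bootstrap sample.
    rng = np.random.default_rng(0)
    classifiers = []
    for _ in range(3):
        idx = rng.integers(0, len(X), len(X))
        classifiers.append(Perceptron().fit(X[idx], y[idx]))

    # The "super classifier": majority vote over the three predictions.
    votes = np.stack([clf.predict(X) for clf in classifiers])
    ensemble_prediction = (votes.sum(axis=0) >= 2).astype(int)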

A model is the learned decision rule. It can be as simple as a hyperplane in n-space (i.e., a line in 2D or a plane in 3D), or it can take the form of a decision tree or other modern classifier.

Majority Vote for Several Linear Models


Idea of Boosting

Boosting In More Detail (Pedro Domingos' Algorithm)
1. Set all the training example weights to 1, and learn the first hypothesis H1.
2. Repeat m times: increase the weights of the misclassified examples, and learn the next hypothesis H2, ..., Hm.
3. H1, ..., Hm take a weighted majority vote when classifying each test example, with Weight(H) = accuracy of H on the training data.

AdaBoost
AdaBoost boosts the accuracy of the original learning algorithm. If the original learning algorithm does even slightly better than 50% accuracy, AdaBoost with a large enough number of classifiers is guaranteed to classify the training data perfectly.

AdaBoost Weight Updating

    error <- 0
    for j = 1 to N do                /* go through training samples */
        if h[m](x_j) != y_j then error <- error + w_j
    for j = 1 to N do
        if h[m](x_j) = y_j then w_j <- w_j * error / (1 - error)
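A runnable version of this update in Python (my own sketch; the renormalization at the end is the usual extra step so the weights keep summing to 1):

    import numpy as np

    def adaboost_weight_update(pred, y, w):
        # pred: predictions of hypothesis h[m]; y: true labels; w: sample weights.
        miss = pred != y
        error = w[miss].sum()              # weighted error of h[m]
        w = w.copy()
        w[~miss] *= error / (1.0 - error)  # shrink weights of correct samples
        return w / w.sum()                 # renormalize to sum to 1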

Sample Application: Insect Recognition
[Image: Doroneuria (Dor)]
Using circular regions of interest selected by an interest operator, train a classifier to recognize the different classes of insects.

Boosting Comparison: ADTree classifier only (alternating decision tree)

    Correctly Classified Instances     268     70.1571 %
    Incorrectly Classified Instances   114     29.8429 %
    Mean absolute error                0.3855
    Relative absolute error            77.2229 %

    Classified as ->     Hesperperla   Doroneuria
    Real Hesperperlas    167           28
    Real Doroneuria      51            136

Boosting Comparison: AdaBoostM1 with ADTree classifier

    Correctly Classified Instances     303     79.3194 %
    Incorrectly Classified Instances   79      20.6806 %
    Mean absolute error                0.2277
    Relative absolute error            45.6144 %

    Classified as ->     Hesperperla   Doroneuria
    Real Hesperperlas    167           28
    Real Doroneuria      51            136

Boosting Comparison: RepTree classifier only (reduced error pruning)

    Correctly Classified Instances     294     75.3846 %
    Incorrectly Classified Instances   96      24.6154 %
    Mean absolute error                0.3012
    Relative absolute error            60.606 %

    Classified as ->     Hesperperla   Doroneuria
    Real Hesperperlas    169           41
    Real Doroneuria      55            125

Boosting Comparison: AdaBoostM1 with RepTree classifier

    Correctly Classified Instances     324     83.0769 %
    Incorrectly Classified Instances   66      16.9231 %
    Mean absolute error                0.1978
    Relative absolute error            39.7848 %

    Classified as ->     Hesperperla   Doroneuria
    Real Hesperperlas    180           30
    Real Doroneuria      36            144
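The numbers above are WEKA output. As a rough analogue (my own sketch, not the original insect experiment), the same tree-alone vs. boosted-tree comparison can be run in scikit-learn:

    # Compare a single decision tree with AdaBoost over decision trees.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=390, n_features=20, random_state=1)

    tree = DecisionTreeClassifier(max_depth=3)
    boosted = AdaBoostClassifier(DecisionTreeClassifier(max_depth=3),
                                 n_estimators=50)

    print("tree alone:", cross_val_score(tree, X, y, cv=10).mean())
    print("boosted:   ", cross_val_score(boosted, X, y, cv=10).mean())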

References
- AdaBoostM1: Yoav Freund and Robert E. Schapire (1996). "Experiments with a new boosting algorithm". Proceedings of the Thirteenth International Conference on Machine Learning, pages 148-156, Morgan Kaufmann, San Francisco.
- ADTree: Freund, Y. and Mason, L. (1999). "The alternating decision tree learning algorithm". Proceedings of the Sixteenth International Conference on Machine Learning, Bled, Slovenia, pages 124-133.


Neural Net Learning
Motivated by studies of the brain. A network of artificial neurons learns a function. It doesn't have clear decision rules like decision trees, but it is highly successful in many different applications (e.g., face detection). Our hierarchical classifier used neural net classifiers as its components.

Back-Propagation Illustration
[Figure: back-propagation illustration from "Artificial Neural Networks: Colin Fahey's Guide" (Book CD)]

Kernel Machines
A relatively new learning methodology (1992) derived from statistical learning theory. It became famous when it gave accuracy comparable to neural nets on a handwriting recognition task. It was introduced to computer vision researchers by Tomaso Poggio at MIT, who started using it for face detection and got better results than neural nets. It has become very popular and widely used, with software packages available.

Support Vector Machines (SVMs)
Support vector machines are learning algorithms that try to find the hyperplane that best separates the different classes of data. They are a specific kind of kernel machine based on two key ideas:
- maximum-margin hyperplanes
- the kernel trick

Maximal Margin (2-class problem)
In 2D space, a hyperplane is a line; in 3D space, it is a plane. Find the hyperplane with maximal margin over all the points. This gives rise to an optimization problem that has a unique solution.
[Figure: a separating hyperplane with its margin.]

Support Vectors
The weights α_i associated with the data points are zero, except for those points closest to the separator. The points with nonzero weights are called the support vectors (because they hold up the separating plane). Because there are many fewer support vectors than total data points, the number of parameters defining the optimal separator is small.


Kernels
A kernel is just a similarity function: it takes two inputs and decides how similar they are. Kernels offer an alternative to standard feature vectors. Instead of using a bunch of features, you define a single kernel that decides the similarity between two objects.

Kernels and SVMs
Under some conditions, every kernel function can be expressed as a dot product in a (possibly infinite-dimensional) feature space (Mercer's theorem). SVM learning can be expressed entirely in terms of dot products, so SVMs can use kernels instead of feature vectors.

The Kernel Trick
The SVM algorithm implicitly maps the original data to a feature space of possibly infinite dimension, in which data that is not separable in the original space becomes separable in the feature space.
[Figure: nonseparable points in the original space R^k become separable after the kernel trick maps them to the feature space R^n.]
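A small numeric check of this dot-product view (my own example, not from the slides): the quadratic kernel k(a, b) = (a · b)^2 equals an ordinary dot product after an explicit map to degree-2 monomial features.

    import numpy as np

    def phi(v):
        # Explicit feature map for the quadratic kernel in 2D:
        # (x, y) -> (x^2, y^2, sqrt(2)*x*y)
        x, y = v
        return np.array([x * x, y * y, np.sqrt(2) * x * y])

    a, b = np.array([1.0, 2.0]), np.array([3.0, -1.0])
    print(np.dot(a, b) ** 2)         # kernel evaluated in the original space
    print(np.dot(phi(a), phi(b)))    # identical value via the feature map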

Kernel Functions
The kernel function is designed by the developer of the SVM. It is applied to pairs of input data to evaluate dot products in some corresponding feature space. Kernels can be all sorts of functions, including polynomials and exponentials.

Kernel Function Used in Our 3D Computer Vision Work

    k(A, B) = exp(-θ_AB² / σ²)

A and B are shape descriptors (big vectors), θ_AB is the angle between these vectors, and σ² is the width of the kernel.
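A direct transcription of this kernel into Python (my own sketch; sigma is a width parameter you would tune):

    import numpy as np

    def angle_kernel(a, b, sigma):
        # k(A, B) = exp(-theta_AB^2 / sigma^2), theta_AB = angle between A and B
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        theta = np.arccos(np.clip(cos, -1.0, 1.0))  # clip guards rounding error
        return np.exp(-theta ** 2 / sigma ** 2)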

What do SVMs solve?
The SVM is looking for the best separating plane in its alternate space. It solves a quadratic programming optimization problem:

    argmax_α  Σ_j α_j - (1/2) Σ_{j,k} α_j α_k y_j y_k (x_j · x_k)

    subject to  α_j ≥ 0  and  Σ_j α_j y_j = 0.

The equation for the separator for these optimal α_j is:

    h(x) = sign( Σ_j α_j y_j (x · x_j) - b )
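In practice this QP is solved by a library. A hedged example with scikit-learn's SVC (toy data of my own) fits a linear maximum-margin separator and exposes the support vectors and dual coefficients discussed above:

    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=100, centers=2, random_state=0)

    clf = SVC(kernel="linear", C=1e3).fit(X, y)  # large C ~ hard margin
    print(clf.support_vectors_)  # the few points that define the separator
    print(clf.dual_coef_)        # the nonzero alpha_j * y_j values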

Unsupervised Learning
Find patterns in the data and group the data into clusters. There are many clustering algorithms:
- K-means clustering
- EM clustering
- Graph-theoretic clustering
- Clustering by graph cuts
- etc.

Clustering by the K-Means Algorithm
Form K-means clusters from a set of n-dimensional feature vectors:
1. Set ic (iteration count) to 1.
2. Choose randomly a set of K means m_1(1), ..., m_K(1).
3. For each vector x_i, compute D(x_i, m_k(ic)) for k = 1, ..., K and assign x_i to the cluster C_j with the nearest mean.
4. Increment ic by 1 and update the means to get m_1(ic), ..., m_K(ic).
5. Repeat steps 3 and 4 until the clusters no longer change, i.e., C_k(ic) = C_k(ic+1) for all k.
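A compact NumPy sketch of these steps (my own illustrative code; D is taken to be Euclidean distance, and empty clusters are not handled, for brevity):

    import numpy as np

    def kmeans(X, K, rng=np.random.default_rng(0)):
        means = X[rng.choice(len(X), K, replace=False)]   # step 2
        assignments = None
        while True:
            # Step 3: assign each vector to the cluster with the nearest mean.
            dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
            new_assignments = dists.argmin(axis=1)
            # Step 5: stop when the clusters no longer change.
            if assignments is not None and np.array_equal(assignments, new_assignments):
                return means, assignments
            assignments = new_assignments
            # Step 4: update the means.
            means = np.stack([X[assignments == k].mean(axis=0) for k in range(K)])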

K-Means Classifier (shown on RGB color data)
[Figure: the original image data (one RGB vector per pixel) and the resulting color clusters.]

K-Means vs. EM
In EM, the clusters are usually Gaussian distributions.
Boot step: Initialize K clusters C_1, ..., C_K: (μ_j, Σ_j) and P(C_j) for each cluster j.
Iteration step:
- Expectation: estimate the cluster of each datum, p(C_j | x_i).
- Maximization: re-estimate the cluster parameters (μ_j, Σ_j) and p(C_j) for each cluster j.
The resultant set of clusters is called a mixture model; if the distributions are Gaussian, it's a Gaussian mixture.

EM Algorithm Summary
Boot step: Initialize K clusters C_1, ..., C_K: (μ_j, Σ_j) and p(C_j) for each cluster j.
Iteration step:

Expectation step:

    p(C_j | x_i) = p(x_i | C_j) p(C_j) / p(x_i)
                 = p(x_i | C_j) p(C_j) / Σ_j' p(x_i | C_j') p(C_j')

Maximization step:

    μ_j = Σ_i p(C_j | x_i) x_i / Σ_i p(C_j | x_i)

    Σ_j = Σ_i p(C_j | x_i) (x_i - μ_j)(x_i - μ_j)^T / Σ_i p(C_j | x_i)

    p(C_j) = Σ_i p(C_j | x_i) / N
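These updates translate almost line-for-line into NumPy. Below is one EM iteration (my own sketch, assuming full-covariance Gaussians and using scipy for the Gaussian density):

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_step(X, mus, sigmas, priors):
        # One EM iteration for a Gaussian mixture; X has shape (N, d).
        N, K = len(X), len(mus)
        # E-step: responsibilities p(C_j | x_i), one column per cluster.
        resp = np.stack([priors[j] * multivariate_normal.pdf(X, mus[j], sigmas[j])
                         for j in range(K)], axis=1)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mu_j, Sigma_j, and p(C_j).
        for j in range(K):
            r = resp[:, j]
            mus[j] = r @ X / r.sum()
            diff = X - mus[j]
            sigmas[j] = (r[:, None] * diff).T @ diff / r.sum()
            priors[j] = r.sum() / N
        return mus, sigmas, priors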

EM Clustering using color and texture information at each pixel (from Blobworld)

EM for Classification of Images in Terms of their Color Regions
[Figure: initial and final models for "trees" and for "sky", with EM refining each initial model into the final model.]

Sample Results
[Image: cheetah]

Sample Results (cont.)
[Image: grass]

Sample Results (cont.)
[Image: lion]