Classifiers for Recognition
Reading: Chapter 22 (skip 22.3)
Topics: Bayes risk, discriminative vs. generative models, loss functions in classifiers


Classifiers for Recognition
- Examine each window of an image; classify the object class within each window based on a training set of images.

Example: A Classification Problem
- Categorize images of fish, say Atlantic salmon vs. Pacific salmon.
- Use features such as length, width, lightness, fin shape & number, mouth position, etc. (example from Duda & Hart)
- Steps: 1. Preprocessing (e.g., background subtraction) 2. Feature extraction 3. Classification

Slide credits for this chapter: Frank Dellaert, Forsyth & Ponce, Paul Viola, Christopher Rasmussen

Bayes Risk
- Some errors may be inevitable: the minimum risk (the shaded area in the figure) is called the Bayes risk.

Discriminative vs Generative Models
- Finding a decision boundary is not the same as modeling a conditional density.
- Probability density functions (the area under each curve sums to 1).

Loss functions in classifiers
- Some errors may be more expensive than others. E.g., for a fatal disease that is easily cured by a cheap medicine with no side effects, false positives in diagnosis are better than false negatives.
- For two-class classification, L(1 -> 2) is the loss caused by calling a 1 a 2.
- Total risk of using classifier s: R(s) = Pr(1 -> 2 | using s) L(1 -> 2) + Pr(2 -> 1 | using s) L(2 -> 1)

Histogram-based classifiers
- Use a histogram to represent the class-conditional densities, i.e. p(x|1), p(x|2), etc.
- Advantage: the estimates converge to the correct values with enough data.
- Disadvantage: the histogram becomes big in high dimensions, so it requires too much data; but maybe we can assume feature independence? (A code sketch follows below.)
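To make the loss-weighted decision rule and the histogram classifier concrete, here is a minimal Python sketch (NumPy only). The 1-D feature, bin layout, priors, and loss values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Hypothetical 1-D feature (e.g., fish length); classes 1 and 2.
rng = np.random.default_rng(0)
x1 = rng.normal(40, 5, 500)   # class 1 training samples
x2 = rng.normal(55, 8, 500)   # class 2 training samples

bins = np.linspace(0, 100, 41)
# Histogram estimates of the class-conditional densities p(x|1), p(x|2).
p_x_given_1, _ = np.histogram(x1, bins=bins, density=True)
p_x_given_2, _ = np.histogram(x2, bins=bins, density=True)
prior1, prior2 = 0.5, 0.5

# Asymmetric losses: calling a true 1 a "2" costs more than the reverse.
L_1to2, L_2to1 = 10.0, 1.0

def classify(x):
    """Pick the class with the smaller expected loss (Bayes decision rule)."""
    i = np.clip(np.digitize(x, bins) - 1, 0, len(bins) - 2)
    post1 = p_x_given_1[i] * prior1   # unnormalised posterior for class 1
    post2 = p_x_given_2[i] * prior2
    # Risk of saying "2" is post1 * L(1->2); risk of saying "1" is post2 * L(2->1).
    return 2 if post1 * L_1to2 < post2 * L_2to1 else 1

print(classify(42.0), classify(60.0))
```

The large L(1 -> 2) pushes the decision boundary toward class 2, exactly the false-positive/false-negative trade-off the disease example describes.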

Example Histograms (from Hastie et al.)

Kernel Density Estimation
- Parzen windows: approximate a probability density by estimating the local density of points (the same idea as a histogram).
- Convolve the points with a window/kernel function (e.g., a Gaussian) using a scale parameter (e.g., sigma). (A code sketch follows after this page.)

Density Estimation at Different Scales
- Example: density estimates for 5 data points with differently-scaled kernels, from smaller to larger (from Duda et al.).
- Scale influences accuracy vs. generality (overfitting).

Example: Kernel Density Estimation; Decision Boundaries (figures from Duda et al.)

Application: Skin Colour Histograms
- Skin has a very small range of (intensity-independent) colours, and little texture.
- Compute a colour measure, check whether the colour is in this range, and check whether there is little texture (median filter).
- Get the class-conditional densities (histograms) and the priors from data (by counting).
- The classifier: label a pixel as skin if p(skin | colour) exceeds a threshold. (A sketch of this classifier also follows below.)

Skin Colour Models
- Skin chrominance points (courtesy of G. Loy); smoothed, [0,1]-normalized histograms.
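A minimal sketch of the Parzen-window idea from the Kernel Density Estimation section above, assuming a Gaussian kernel; the five data points and the sigma values are made up to show the small-vs-large scale behaviour.

```python
import numpy as np

def parzen_gaussian(x, data, sigma):
    """Parzen-window density estimate: the average of Gaussian kernels
    centred on the data points, with scale (bandwidth) sigma."""
    k = np.exp(-(x[:, None] - data[None, :]) ** 2 / (2 * sigma ** 2))
    k /= np.sqrt(2 * np.pi) * sigma          # normalise each kernel
    return k.mean(axis=1)                    # average over data points

data = np.array([1.0, 2.0, 2.5, 6.0, 7.0])   # 5 illustrative data points
xs = np.linspace(-2, 10, 200)
for sigma in (0.1, 0.5, 2.0):   # small: spiky overfit; large: oversmoothed
    density = parzen_gaussian(xs, data, sigma)
    print(f"sigma={sigma}: peak density {density.max():.3f}")
```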

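And a sketch of the skin-colour classifier just described, under the assumption that chrominance histograms have already been estimated by counting; the uniform placeholder histograms, bin layout, priors, and threshold are illustrative, not learned values.

```python
import numpy as np

# Placeholder 32x32 chrominance histograms (a real system would learn
# these from labelled pixels); priors estimated by counting.
p_c_given_skin = np.full((32, 32), 1.0 / 1024)     # p(colour | skin)
p_c_given_nonskin = np.full((32, 32), 1.0 / 1024)  # p(colour | not skin)
prior_skin, prior_nonskin = 0.1, 0.9
theta = 0.5                                         # posterior threshold

def is_skin(r_bin, g_bin):
    """Label a pixel skin if p(skin | colour) exceeds theta (Bayes rule)."""
    num = p_c_given_skin[r_bin, g_bin] * prior_skin
    den = num + p_c_given_nonskin[r_bin, g_bin] * prior_nonskin
    return den > 0 and num / den > theta

print(is_skin(10, 12))   # False with these uniform placeholder histograms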
Skin Colour Classification Results
- Results courtesy of G. Loy. Figures from "Statistical color models with application to skin detection", M.J. Jones and J. Rehg, Proc. Computer Vision and Pattern Recognition, 1999, copyright 1999 IEEE.

ROC Curves (receiver operating characteristics)
- Plot the trade-off between false positives and false negatives for different values of a threshold. (A threshold-sweep sketch follows this page.)

Nearest Neighbor Classifier
- Assign the label of the nearest training data point to each test data point.
- Voronoi partitioning of the feature space for 2-category 2-D and 3-D data (from Duda et al.).

K-Nearest Neighbors
- For a new point, find the k closest points in the training data; the labels of those k points vote to classify it. (Example density estimate with k = 5, from Duda et al.; a code sketch also follows this page.)
- Avoids a fixed scale choice: uses the data itself (can be very important in practice).
- A simple method that works well if the distance measure correctly weights the various dimensions.

Neural networks
- Compose layered classifiers: use a weighted sum of the elements at the previous layer to compute the results at the next layer.
- Apply a smooth threshold function from each layer to the next (introduces non-linearity).
- Initialize the network with small random weights.
- Learn all the weights by performing gradient descent (i.e., make small adjustments to improve the results).
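The ROC curve can be traced by sweeping the classifier's threshold, as in this minimal sketch; the score distributions are synthetic stand-ins for real classifier outputs.

```python
import numpy as np

def roc_points(scores, labels, n_thresh=101):
    """Sweep a threshold over classifier scores and record
    (false-positive rate, true-positive rate) pairs."""
    pts = []
    for t in np.linspace(0.0, 1.0, n_thresh):
        pred = scores >= t
        tpr = np.mean(pred[labels == 1])   # detection rate
        fpr = np.mean(pred[labels == 0])   # false positive rate
        pts.append((fpr, tpr))
    return pts

# Illustrative scores: positives tend to score higher than negatives.
rng = np.random.default_rng(1)
scores = np.concatenate([rng.beta(5, 2, 100), rng.beta(2, 5, 100)])
labels = np.concatenate([np.ones(100), np.zeros(100)]).astype(int)
print(roc_points(scores, labels, n_thresh=5))
```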

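A minimal k-nearest-neighbours sketch matching the voting rule above; the Euclidean distance, the two-blob toy data, and k = 5 are illustrative choices.

```python
import numpy as np

def knn_classify(x, train_X, train_y, k=5):
    """Label x by a majority vote among the k closest training points."""
    d = np.linalg.norm(train_X - x, axis=1)      # Euclidean distances
    nearest = np.argsort(d)[:k]                  # indices of k closest points
    votes = np.bincount(train_y[nearest])        # count labels among them
    return int(np.argmax(votes))

rng = np.random.default_rng(2)
train_X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
train_y = np.array([0] * 50 + [1] * 50)
print(knn_classify(np.array([2.5, 2.5]), train_X, train_y, k=5))
```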
(Figure: a network with input units, hidden units, and output units.)

Training
- Adjust the parameters to minimize the error on the training set.
- Perform gradient descent, making small changes in the direction of the derivative of the error with respect to each parameter. (A training-loop sketch follows this page.)
- Stop when the error is low and hasn't changed much.
- The network itself is designed by hand to suit the problem, so only the weights are learned.

The vertical face-finding part of Rowley, Baluja and Kanade's system
- Architecture of the complete system: they use another neural net to estimate the orientation of the face, then rectify it. They search over scales to find bigger/smaller faces. (A sliding-window sketch of such a search also follows this page.)
- Figures from "Rotation invariant neural-network based face detection", H.A. Rowley, S. Baluja and T. Kanade, Proc. Computer Vision and Pattern Recognition, 1998, copyright 1998 IEEE.

Face Finder: Training
- Positive examples: preprocess ~1,000 example face images into 20 x 20 inputs; generate 15 clones of each with small random rotations, scalings, translations, and reflections.
- Negative examples: test the net on 120 known no-face images. (from Rowley et al.)

Face Finder: Results
- 79.6% of true faces detected, with few false positives, over a complex test set.
- 135 true faces: 125 detected, 12 false positives. (from Rowley et al.)
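A minimal sketch of the training procedure just described: one hidden layer, a smooth threshold (sigmoid) between layers, small random initial weights, and plain gradient descent on the squared error. The layer sizes, learning rate, and XOR toy data are illustrative assumptions.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
W1 = rng.normal(0, 0.1, (2, 4))   # input -> hidden weights (small random init)
W2 = rng.normal(0, 0.1, (4, 1))   # hidden -> output weights
lr = 0.5

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([[0], [1], [1], [0]], float)       # XOR as a toy training set

for step in range(5000):
    h = sigmoid(X @ W1)                 # hidden activations
    out = sigmoid(h @ W2)               # network output
    err = out - y                       # derivative of 0.5 * squared error
    # Backpropagate: chain rule through the sigmoids, then descend.
    d_out = err * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    W1 -= lr * X.T @ d_h

print(np.round(out.ravel(), 2))   # should approach [0, 1, 1, 0]
```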

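The search over window positions and scales can be sketched as follows; `classify_window` stands in for a trained 20 x 20 face network and is hypothetical, as are the stride, scale factor, and nearest-neighbour resize.

```python
import numpy as np

def resize(img, factor):
    """Nearest-neighbour shrink by `factor` (illustrative stand-in)."""
    h, w = int(img.shape[0] / factor), int(img.shape[1] / factor)
    rows = (np.arange(h) * factor).astype(int)
    cols = (np.arange(w) * factor).astype(int)
    return img[np.ix_(rows, cols)]

def detect_faces(image, classify_window, win=20, stride=4, scale=1.2):
    """Slide a win x win window over an image pyramid;
    classify_window(patch) -> face score in [0, 1] (hypothetical net)."""
    detections, s, img = [], 1.0, image
    while min(img.shape) >= win:
        for r in range(0, img.shape[0] - win + 1, stride):
            for c in range(0, img.shape[1] - win + 1, stride):
                if classify_window(img[r:r + win, c:c + win]) > 0.5:
                    # Report the window in original-image coordinates.
                    detections.append((int(r * s), int(c * s), int(win * s)))
        img = resize(img, scale)   # shrink the image => detect bigger faces
        s *= scale
    return detections
```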
Face Finder Results: Examples of Misses
- Find the face! (from Rowley et al.) The human visual system needs to apply serial attention to detect faces (context often helps to predict where to look).

Convolutional neural networks
- Template matching using NN classifiers seems to work; low-level features are linear filters, so why not learn the filter kernels, too?
- A convolutional neural network, LeNet: the layers filter, subsample, filter, subsample, and finally classify based on the outputs of this process. (A forward-pass sketch follows this page.)
- LeNet is used to classify handwritten digits. Notice that the test error rate is not the same as the training error rate, because the learning overfits to the training data.
- Figures from "Gradient-Based Learning Applied to Document Recognition", Y. LeCun et al., Proc. IEEE, 1998, copyright 1998 IEEE.

Support Vector Machines
- Try to obtain the decision boundary directly; this is potentially easier, because we need to encode only the geometry of the boundary, not any irrelevant wiggles in the posterior.
- Not all points affect the decision boundary.
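A minimal forward-pass sketch of the filter/subsample/classify pattern described above. This is not LeNet itself: the random kernels and classifier weights are placeholders where a real network would learn them.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (really cross-correlation, as in most nets)."""
    kh, kw = kernel.shape
    H, W = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((H, W))
    for r in range(H):
        for c in range(W):
            out[r, c] = np.sum(img[r:r + kh, c:c + kw] * kernel)
    return out

def subsample(fmap):
    """2x2 average pooling (the 'subsample' step)."""
    H, W = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    f = fmap[:H, :W]
    return (f[0::2, 0::2] + f[0::2, 1::2] + f[1::2, 0::2] + f[1::2, 1::2]) / 4

rng = np.random.default_rng(4)
img = rng.random((28, 28))
k1, k2 = rng.normal(0, 0.1, (5, 5)), rng.normal(0, 0.1, (5, 5))
w = rng.normal(0, 0.1, 16)                 # classifier weights (placeholder)

x = subsample(np.tanh(conv2d(img, k1)))    # filter, subsample
x = subsample(np.tanh(conv2d(x, k2)))      # filter, subsample again
score = float(x.ravel() @ w)               # classify from the outputs
print(x.shape, round(score, 3))
```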

Support Vectors

Pedestrian Detection with SVMs
- Figures from "A general framework for object detection", C. Papageorgiou, M. Oren and T. Poggio, Proc. Int. Conf. Computer Vision, 1998, copyright 1998 IEEE.
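A minimal sketch of the support-vector idea, assuming scikit-learn is available; the two-blob toy data is illustrative. It shows that only a subset of the training points, the support vectors, determine the learned boundary.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# Only these points constrain the decision boundary; moving any other
# training point (without crossing the margin) changes nothing.
print("support vectors:", clf.support_vectors_.shape[0], "of", len(X))
print("prediction at (2, 2):", clf.predict([[2.0, 2.0]])[0])
```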