

Visual Categorization With Bags of Keypoints
G. Csurka, C. Bray, C. Dance, and L. Fan. ECCV Workshop.
Presented by Shilpa Gulati

Basic Problem Addressed
Find a method for generic visual categorization.
- Visual categorization: identifying whether objects of one or more types are present in an image.
- Generic: the method generalizes to new object types.
- Invariant to scale, rotation, affine transformation, lighting changes, occlusion, intra-class variation, etc.

Main Idea
- Apply the bag-of-keywords approach from text categorization to visual categorization.
- Construct a vocabulary of feature vectors from clustered descriptors of images.

The Approach I: Training
- Extract interest points from a dataset of training images and attach descriptors to them.
- Cluster the keypoints and construct a set of vocabularies (why a set? see below).
- Train a multi-class classifier using bags of keypoints around the cluster centers.

Why a set of vocabularies?
- The approach is motivated by text categorization (spam filtering, for example).
- For text, the keywords have a clear meaning ("Lottery!", "Deal!", "Affine Invariance"), so finding a vocabulary is easy.
- For images, keypoints don't necessarily have repeatable meanings. Hence: build a set of vocabularies, then experiment to find the best vocabulary and classifier.

The Approach II: Testing
- Given a new image, get its keypoint descriptors.
- Label each keypoint with its closest cluster center in feature space.
- Categorize the objects using the multi-class classifier learnt earlier: Naïve Bayes or Support Vector Machines (SVMs).
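The training pipeline above can be sketched in a few lines. This is a minimal illustration, not the authors' code: random arrays stand in for real SIFT descriptors, and scikit-learn's k-means stands in for whatever clustering implementation they used.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in for 128-dim SIFT descriptors pooled from all training images.
all_descriptors = rng.normal(size=(500, 128))

# Step 1: cluster descriptors to build a visual vocabulary of k cluster centers.
k = 10
vocab = KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_descriptors)

# Step 2: represent one image as a histogram of its descriptors' nearest centers
# (the "bag of keypoints" fed to the classifier).
def bag_of_keypoints(image_descriptors, vocab, k):
    labels = vocab.predict(image_descriptors)  # nearest cluster center per keypoint
    return np.bincount(labels, minlength=k)    # count n_j per visual word

image = rng.normal(size=(40, 128))             # one image's descriptors
h = bag_of_keypoints(image, vocab, k)
print(h.sum())  # 40: every keypoint falls into exactly one bag
```

The same `bag_of_keypoints` step is reused at test time on the query image's descriptors.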

Feature Extraction and Description
From a database of images:
- Extract interest points using the Harris affine detector. Mikolajczyk and Schmid showed that scale-invariant interest point detectors are not sufficient to handle affine transformations.
- Attach SIFT descriptors to the interest points. A SIFT descriptor is a 128-dimensional vector. SIFT descriptors were found to be best for matching by Mikolajczyk and Schmid.

Visual Vocabulary Construction
- Use the k-means clustering algorithm to form a set of clusters of feature vectors.
- The feature vectors associated with the cluster centers V_1, ..., V_m form a vocabulary: V = {V_1, V_2, ..., V_m}.
- Construct multiple vocabularies by finding multiple sets of clusters with different values of k.

Clustering Example: all features are partitioned into clusters. [Slide inspired by []]

Categorization by Naïve Bayes I: Training
- Extract keypoint descriptors from a set of labeled images.
- Put each descriptor in the cluster (bag) whose center is at minimum distance from it in feature space; if a feature in image I is nearest to cluster center V_j, we say that keypoint j has occurred in image I.
- Count the number of keypoints in each bag: for category C_i this gives counts n_i1, n_i2, ..., n_im, where n_ij is the total number of times a feature near V_j occurs in training images of category C_i. [Image taken from []; slide inspired by []]

Categorization by Naïve Bayes II: Training
For each category C_i:
  P(C_i) = (number of images of category C_i) / (total number of images)
Over all images I of category C_i, for each keypoint V_j:
  P(V_j | C_i) = (number of keypoints V_j in I) / (total number of keypoints in I) = n_ij / n_i
But use Laplace smoothing to avoid probabilities near zero:
  P(V_j | C_i) = (n_ij + 1) / (n_i + |V|)

Categorization by Naïve Bayes III: Testing
  P(C_i | Image) = β P(C_i) P(Image | C_i)
                 = β P(C_i) P(V_1, V_2, ..., V_m | C_i)
                 = β P(C_i) ∏_{j=1..m} P(V_j | C_i)
[Slide inspired by []]
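The Naïve Bayes step above can be sketched as follows, assuming the per-category keypoint counts n_ij have already been accumulated. The counts and priors here are toy numbers, not results from the paper.

```python
import numpy as np

# n_counts[i, j] = number of keypoints nearest to center V_j over all
# training images of category C_i (toy data: 2 categories, 4 visual words).
n_counts = np.array([[10, 0, 5, 5],
                     [2, 8, 8, 2]], dtype=float)
priors = np.array([0.5, 0.5])   # P(C_i) from category frequencies

# Laplace smoothing: P(V_j | C_i) = (n_ij + 1) / (n_i + |V|)
m = n_counts.shape[1]
p_word_given_class = (n_counts + 1) / (n_counts.sum(axis=1, keepdims=True) + m)

# Classify a test image from its bag-of-keypoints histogram: pick the category
# maximizing log P(C_i) + sum_j hist_j * log P(V_j | C_i).
def classify(hist):
    log_post = np.log(priors) + hist @ np.log(p_word_given_class).T
    return int(np.argmax(log_post))

print(classify(np.array([6, 0, 3, 1])))  # histogram dominated by word 0 -> category 0
```

Working in log space avoids underflow when the product runs over many visual words.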

SVM: Brief Introduction
- An SVM classifier finds a hyperplane that separates two-class data with maximum margin; f(x) is the target (classifying) function.
- For a two-class dataset with linearly separable classes, the maximum-margin hyperplane gives the greatest separation between the classes.
- The data instances closest to the hyperplane are called support vectors.

Categorization by SVM I: Training
The classifying function is
  f(x) = sign( Σ_i y_i β_i K(x, x_i) + b )
where x_i is a feature vector from the training images, y_i is the label for x_i (yes, in category C_i, or no, not in C_i), and β_i and b have to be learnt.
Data is not always linearly separable (non-linear SVM): a function Φ maps the original data space to a higher-dimensional space, and K(x, x_i) = Φ(x)·Φ(x_i).

Categorization by SVM II: Training
- For an image of category C_i, x_i is the vector formed by the number of occurrences of each keypoint V_j in the image.
- The parameters are sometimes learnt using sequential quadratic programming; the approach used in the paper is not mentioned.
- For the m-class problem, the authors train m SVMs, each to distinguish some category C_i from the other m-1.

Categorization by SVM III: Testing
- Given a query image, assign it to the category with the highest SVM output.

Experiments
Two databases:
- DB1: in-house. 77 images, 7 object classes: faces, buildings, trees, cars, phones, bikes. Some images contain objects from multiple classes, but a large proportion of each image is occupied by the target object.
- DB2: freely available from various sites. Object classes: faces, airplanes, cars (rear), cars (side), and motorbikes (side).

Performance Metrics
- Confusion matrix M: m_ij = number of images from category j identified by the classifier as category i.
- Overall error rate R: Accuracy = (total number of correctly classified test images) / (total number of test images); R = 1 - Accuracy.
- Mean rank MR: MR for category j = E[rank of class j in the classifier output | true class is j].
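The one-vs-rest SVM scheme and the metrics above can be sketched with scikit-learn. This is a stand-in, not the paper's setup: toy 2-D "histograms" replace real bag-of-keypoints vectors, and `SVC` replaces whatever solver the authors used.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)

# Toy histograms for 3 categories, clustered around different centers.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(30, 2))
               for c in ([0, 0], [3, 0], [0, 3])])
y = np.repeat([0, 1, 2], 30)

# m classes -> m binary SVMs, each separating one category from the other m-1;
# a query is assigned to the category whose SVM gives the highest output.
clf = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)

pred = clf.predict(X)
M = confusion_matrix(y, pred)     # M[j, i] counts true class j predicted as i
accuracy = np.trace(M) / M.sum()
error_rate = 1 - accuracy         # overall error rate R = 1 - Accuracy
print(round(error_rate, 2))
```

Swapping `kernel="linear"` for `kernel="poly"` with `degree=2` or `degree=3` reproduces the linear/quadratic/cubic comparison discussed in the results.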

Finding the Value of k: Naïve Bayes Results for DB1
- The error rate decreases with increasing k, but the decrease is small beyond a point; k is chosen at an operating point giving a good tradeoff between accuracy and speed.
- [Graph of error rate vs. k for Naïve Bayes on DB1, with the selected operating point marked; graph taken from []]
- [Confusion matrix and mean ranks for Naïve Bayes on DB1; table taken from []]
- Overall error rate = 8%.

SVM Results
- The linear SVM gives the best results of the linear, quadratic, and cubic kernels, except for cars, where the quadratic kernel is best.
- How do we know these choices will work for other categories? What if we have to use higher degrees? Only time and more experiments will tell.

SVM Results for DB1
- [Confusion matrix and mean ranks for SVM on DB1]
- Overall error rate = %. The error rate for faces is low, but there is an increased rate of confusion with other categories due to the larger number of faces in the training set.
- Multiple object instances: correctly classified. Partially visible objects: correctly classified. [Images taken from []]
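The "tradeoff between accuracy and speed" used to pick k can be made concrete: take the smallest vocabulary whose error rate is within a tolerance of the best observed. The error-rate numbers below are hypothetical; the real curve comes from running the full pipeline at each k.

```python
# Hypothetical error rate measured for each vocabulary size k.
error_by_k = {100: 0.34, 250: 0.30, 500: 0.29, 1000: 0.28, 2500: 0.28}

def choose_k(error_by_k, tol=0.01):
    best = min(error_by_k.values())
    # Smallest k whose error is within tol of the best: larger vocabularies
    # barely help but cost more at clustering and assignment time.
    return min(k for k, e in error_by_k.items() if e <= best + tol)

print(choose_k(error_by_k))  # 500
```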

Images with Multi-Category Objects [Images taken from []]

SVM Results for DB2
- [Confusion matrix and mean ranks for SVM on DB2, over faces (frontal), airplanes (side), cars (rear), cars (side), and motorbikes (side)]

Conclusions
- Good results on the 7-category database. However, timing information (for training and testing) is not provided!
- SVMs are superior to Naïve Bayes.
- The method is robust to background clutter.
- An extension is to test on databases where the target object does NOT form a large fraction of the image; this may require including geometric information.

References
1. G. Csurka, C. Bray, C. Dance, and L. Fan. Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, ECCV.
2. G. Csurka, J. Willamowski, C. Dance. Weak Geometry for Visual Categorization. Presentation slides, Xerox Research Centre Europe, Grenoble, France.
3. R. Mooney. Machine Learning: Text Categorization. Lecture slides, Computer Science Department, University of Texas at Austin.