OBJECT CATEGORIZATION

1 OBJECT CATEGORIZATION
Ing. Lorenzo Seidenari
Slides: Ing. Lamberto Ballan
November 18th, 2009

2 What is an Object?
Merriam-Webster definition: "Something material that may be perceived by the senses."
You already know two tasks about objects:
- Single object recognition: find that logo
- Object categorization: find a face

3 Why do we care about categorization?
Perception of function: we can perceive 3D shape, texture and material properties without knowing about objects. But the concept of category also encapsulates information about what we can do with those objects.
"We therefore include the perception of function as a proper (indeed, crucial) subject for vision science", from Vision Science, chapter 9, Palmer.
ICCV09 Short Course: Fei-Fei, Torralba, Fergus

4 The perception of function
Direct perception (affordances): flat surface, horizontal, knee-high -> sittable-upon
Affordance: a quality of an object that allows an action to be performed (Gibson 1977).
Mediated perception (categorization): flat surface, horizontal, knee-high -> chair -> sittable-upon
[Figure: example objects labelled "Chair", "Chair", "Chair?"]
ICCV09 Short Course: Fei-Fei, Torralba, Fergus

5 Direct perception
Some aspects of an object's function can be perceived directly
Functional form: some forms clearly indicate a function ("sittable-upon", "container", "cutting device", ...)
[Figure: example objects labelled "Sittable-upon"; one is captioned "It does not seem easy to sit upon this"]
ICCV09 Short Course: Fei-Fei, Torralba, Fergus

6 Text Indexing and Categorization
Text categorization: the task is to assign a document to one or more categories based on its content
- is it something about medicine/biology?
- is it a document about business?
Why is it useful?
- Detecting and indexing similar texts/documents in large corpora
- Clustering documents by topic
- Extracting mid/high-level concepts from documents
The Bag of Words (BoW) model, combined with advanced classification techniques, achieves state-of-the-art results
- A text, such as a sentence or a document, is represented as an unordered collection of words, disregarding grammar and even word order
- Three elements: i) a vocabulary, ii) a histogram representation of documents, iii) a classification method
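The first two elements (vocabulary and histogram) can be illustrated with a minimal sketch. The vocabulary and the two toy documents below are made up for illustration, and tokenization is assumed to be plain whitespace splitting; a real system would also handle punctuation, stop words and stemming.

```python
from collections import Counter

def bow_histogram(text, vocabulary):
    """Count how often each vocabulary word occurs in text (word order is ignored)."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

# Hypothetical vocabulary and documents, chosen only for illustration.
vocabulary = ["cell", "protein", "market", "stock"]
doc_a = "the protein binds the cell and the cell divides"
doc_b = "the stock market fell as the market reacted"

hist_a = bow_histogram(doc_a, vocabulary)  # [2, 1, 0, 0]
hist_b = bow_histogram(doc_b, vocabulary)  # [0, 0, 2, 1]
```

Shuffling the words of a document leaves its histogram unchanged, which is exactly the "unordered collection" property of the model.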

7 Same approach usable with visual data
An image can be treated as a document, and the features extracted from the image are considered the "visual words"
[Figure: images of object categories and their bags of visual words. D1: face, D2: bike, D3: violin]
Bag of (visual) Words: an image is represented as an unordered collection of visual words
Vocabulary (codewords)

8 Why is it useful? Object recognition and categorization
Bag of Visual Words models have been successfully used for object categorization in images (e.g. faces, cars, airplanes...) and, more recently, for action recognition in video sequences (e.g. running, walking, clapping...)
Aim: find (annotate) objects in this photo (a very optimistic result...)

9 Three stages
1. Codebook (vocabulary) formation and feature assignment
- Given a training set, local descriptors (e.g. SIFT) are collected and a clustering algorithm is used to quantize the feature space
- Then each cluster's center is used as an iconic word, and local descriptors are assigned to the nearest word using an appropriate distance (e.g. Euclidean); the result is a Bag-of-Words representation
2. Train a classifier to discriminate vectors corresponding to positive and negative training images
- Usually Support Vector Machines (SVM) are used as classifiers
3. Apply the trained classifier to the test image
Note: the approach is the same as for text, but the first stage (codebook formation and feature assignment) is really challenging because visual words have to be defined in advance using a clustering algorithm (e.g. k-means)

10 Note: we have to train a classifier (detector) for each object class...
[Figure: training images and test images. Courtesy A. Zisserman]

11 Feature detection
Given an image, feature detection is the process of extracting local patches (regions). There are several methods:
- Random sampling
- Regular grid (dense sampling): the image is segmented by horizontal and vertical lines. It shows very good results for natural scene categorization (Fei-Fei and Perona, CVPR 2005)
- Interest points (sparse sampling): local patches are detected by interest point detectors that select salient regions (such as edges, corners, blobs). Several different techniques exist (Mikolajczyk et al., IJCV 2005):
  - Harris corner detector
  - Difference of Gaussians (DoG), the SIFT detector (Lowe, IJCV 2004)
  - Affine covariant patches

12 Regular grid
It is probably the simplest method for feature detection
An evenly sampled grid with a given spacing (e.g. 10x10 pixels) is laid over the image
Despite its simplicity, it provides good results for textures and natural scenes because it describes more regions than interest-point techniques
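A minimal sketch of dense sampling, assuming square patches whose side equals the grid step and that patches must lie fully inside the image. The function name and the 40x30 image size are illustrative choices, not taken from the slides.

```python
def grid_patch_centers(width, height, step=10):
    """Return the (x, y) centers of step x step patches on an evenly spaced grid."""
    half = step // 2
    return [
        (x, y)
        for y in range(half, height - half + 1, step)
        for x in range(half, width - half + 1, step)
    ]

# A 40x30 image with a 10-pixel grid yields 4 columns x 3 rows of patches.
centers = grid_patch_centers(40, 30, step=10)
```

The number of patches depends only on the image size and the step, never on the image content, which is why this method "uses little information" about the image itself.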

13 Interest points
Local patches are detected at the most salient regions (such as the regions attracting human attention)
It uses more information about the image itself than random or grid sampling
[Figure: an example of local patches detected using affine covariant features]

14 Comparison: dense (grid) sampling vs sparse (interest points) sampling
Dense sampling
- Advantage: it is able to describe the global content of an image
- Disadvantage: it uses little information about the image itself
- It has been used successfully for texture and natural scene categorization
Sparse sampling
- Advantage: it is able to detect ("select") salient regions, which are the more attractive and informative regions
- Disadvantage: depending on the interest point technique and the type/resolution of the image, sometimes only a few regions are detected
- It has been used for specific object recognition and categorization (better for describing background/foreground)

15 Feature representation
Local features are represented by local descriptors
Several kinds of information can be used but, usually, edge or gradient orientation histograms are the most common choice
Common framework:
- Divide the local region into spatial cells
- Calculate the orientation of the image gradient at each pixel
- Pool quantized orientations over each cell: i) the descriptor contains an orientation histogram for each cell, ii) votes are weighted by gradient magnitude
Note: this is the basis of the popular SIFT, HOG and (Generalized) Shape Context methods
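The common framework can be sketched in a few lines. This is a simplified illustration, not SIFT or HOG themselves: it assumes a square grey-level patch given as a list of lists, central-difference gradients (so the one-pixel border is skipped), hard binning of orientations, and no Gaussian weighting or normalization. All names are illustrative.

```python
import math

def orientation_descriptor(patch, cells=2, bins=8):
    """Concatenate one gradient-orientation histogram per spatial cell."""
    size = len(patch)
    cell_size = size / cells
    hist = [[[0.0] * bins for _ in range(cells)] for _ in range(cells)]
    for y in range(1, size - 1):
        for x in range(1, size - 1):
            # Image gradient by central differences.
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % (2 * math.pi)
            # Quantize the orientation and find the spatial cell of the pixel.
            b = int(angle / (2 * math.pi) * bins) % bins
            cy = min(int(y / cell_size), cells - 1)
            cx = min(int(x / cell_size), cells - 1)
            # Vote weighted by gradient magnitude.
            hist[cy][cx][b] += magnitude
    return [v for row in hist for cell in row for v in cell]

# Toy patch: a horizontal ramp, so every gradient points along +x (bin 0).
patch = [[float(x) for x in range(8)] for _ in range(8)]
descriptor = orientation_descriptor(patch, cells=2, bins=8)
```

With `cells=4` and `bins=8` the same function returns 4x4x8 = 128 values, the layout used by the standard SIFT descriptor.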

16 SIFT descriptors
However, the most common choice is the SIFT descriptor (because it exhibits the highest matching accuracy)
- Standard SIFT is computed as a set of orientation histograms on 4x4 pixel neighborhoods (the contribution of each pixel is weighted by the gradient magnitude and by a Gaussian with sigma equal to 1.5 times the scale of the keypoint)
- Each histogram contains 8 bins (corresponding to 8 orientations)
- Each descriptor contains a 4x4 array of histograms (16 in total) around the keypoint
- This leads to a SIFT descriptor with 4x4x8 = 128 elements
[Figure: standard SIFT descriptor]

17 Combined feature descriptors
Local descriptors (like SIFT) are usually based only on luminance and shape, so they use grey-scale values and ignore color
- it is very difficult to select a color model that is sufficiently robust and general
- nevertheless, color is very important to describe/distinguish objects or scenes
Different types of descriptors can be combined to improve the representation; the most common combination is a local shape descriptor (e.g. SIFT) with a color descriptor (e.g. a color histogram in a suitable color space such as Luv or HSV)
Figure: an example of a color-SIFT descriptor (van de Weijer and Schmid, ECCV 2006). The combined descriptor is obtained by fusing standard SIFT with a hue descriptor calculated in a color-invariant space. Courtesy J. van de Weijer

18 Codebook formation
The Bag-of-Words model is built through the creation of a discrete visual vocabulary (codebook)
A vocabulary in the object/scene classification domain is commonly obtained by following one of two approaches:
- Annotation approach: a vocabulary is obtained by assigning meaningful labels to image patches (e.g. sky, water, vegetation, etc.)
- Data-driven approach: vector quantization is performed on large sets of feature vectors (usually in a high-dimensional space), by clustering the feature vectors

19 Data-driven approach
Visual words are defined by clustering feature vectors
[Figure: an example]

20 The performance of this approach depends on the quantization method and on the number of words that are selected
The most common quantization approach is k-means clustering: the main reasons are its simplicity and convergence speed
[Figure: examples of visual words. Courtesy A. Zisserman]

21 k-means clustering
It is an algorithm to cluster n objects, based on their feature-vector representation, into k < n partitions
The objective is to minimize the total intra-cluster variance, i.e. the squared error function:
$$\sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2$$
where $k$ is the number of clusters, $S_i$ ($i = 1, \dots, k$) are the cluster partitions, and $\mu_i$ is the centroid (or mean point) of all the points $x_j \in S_i$
The most common form of k-means is Lloyd's algorithm:
- The two names are often used synonymously, but in reality Lloyd's method is a heuristic for solving the k-means problem
- Other variations exist, but it has remained popular because it converges extremely rapidly in practice

22 k-means clustering: Lloyd's algorithm
Lloyd's algorithm is a heuristic iterative solution for the k-means problem
1. It starts by partitioning the n input points into k initial sets, either at random or using some heuristic
2. It then calculates the centroid $\mu_i$ of each set $S_i$ (with $i = 1, \dots, k$)
3. It constructs a new partition by associating each point with the closest centroid
4. Finally, the centroids are recalculated for the new clusters, and the algorithm repeats steps 3 and 4 alternately until convergence (reached when i) the points no longer switch cluster, or ii) the centroids no longer change)
[Figure: 1, 2) initial random centroids; 3) new partition obtained by associating points with the nearest centroid; 4) centroids are moved to the center of their clusters; convergence]
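The four steps can be sketched in plain Python. This is a minimal illustration under two assumptions that go beyond the slide: initialization picks k input points as centroids (instead of building an initial random partition), and an emptied cluster simply keeps its old centroid. The toy data are made up.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def lloyd_kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Steps 1-2 (simplified): use k distinct input points as initial centroids.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 3: associate each point with its closest centroid.
        new_assignment = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # convergence: no point switched cluster
        assignment = new_assignment
        # Step 4: move each centroid to the mean of its cluster.
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = mean(members)
    return centroids, assignment

# Two well-separated toy blobs of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, assignment = lloyd_kmeans(points, k=2)
```

On data this well separated the result does not depend on the seed; on real descriptors the outcome depends on the initialization, so several restarts are commonly used.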

23 k-means disadvantages
Despite its popularity, k-means clustering is not the optimal solution for codebook formation. It has some main disadvantages:
1. the number of visual words has to be known in advance
2. the clustering is not very robust w.r.t. outliers
3. cluster centers are attracted by the denser regions of the sample distribution, thus providing a more imprecise quantization for the vectors lying in these regions

24 This effect (3), due to the assumption of a uniform distribution of the features in the descriptor space, is even more pronounced in high-dimensional spaces
A representation of this effect can be obtained by visualizing a Voronoi tessellation of the feature space
[Figure: k-means (Voronoi tessellation); detail of a dense region that has been split into 4 clusters]
Note: Voronoi cells do not uniformly cover the feature space...

25 Radius-based clustering
Given n vectors, the algorithm starts with a uniform random subsample s of the original dataset (thus s is a subset of the n vectors)
- For each $x_i \in s$ (grey circles in fig.), a mean-shift procedure is initialized
- Mean-shift is a procedure for locating the modes of a sample distribution (in other words, it is able to find the densest regions of the distribution)
- Given a radius R, mean-shift clustering on s is used to find the modes
- A new cluster center is then allocated at the mode corresponding to the maximal-density region
- All vectors of the original set within a distance < R from the center are labeled as members of this cluster and eliminated from the following iterations; this prevents the algorithm from repeatedly assigning centers to the same high-density region
- It can be stopped when a sufficient number of clusters (words) has been identified
[Figure: illustration of the radius R]

26 In this way, cluster centers are allocated more uniformly
A representation of this effect can be obtained by visualizing a Voronoi tessellation of the feature space (compared with k-means)
[Figure: k-means clustering (Voronoi tessellation) vs radius-based clustering (Voronoi tessellation)]
Note: this dense region, which was split into 4 clusters by k-means, is now correctly coded by radius-based clustering

27 Meanshift
The mean-shift estimator finds distribution modes non-parametrically
Radius-based clustering
1. Subsample the data
2. Run a mean-shift estimator from each sampled point
3. The densest mode M is found
4. Each point within distance r of M is assigned to the new cluster
5. Assigned points are removed
6. Stop if enough clusters have been found or no more clusters can be found
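The six steps can be sketched as follows. This is a rough illustration rather than the exact procedure of reference (2): it assumes a flat (uniform) mean-shift kernel of the same radius, a fresh subsample drawn from the remaining points at every round, and density measured as the number of remaining points within the radius. Function names, parameters and the toy data are illustrative.

```python
import random

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def mean(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def mean_shift_mode(start, points, radius, max_iter=50, tol=1e-6):
    """Flat-kernel mean shift: repeatedly move to the mean of the neighbours."""
    current = start
    for _ in range(max_iter):
        neighbours = [p for p in points if dist(p, current) < radius]
        if not neighbours:
            break
        shifted = mean(neighbours)
        if dist(shifted, current) < tol:
            break
        current = shifted
    return current

def radius_based_codebook(points, radius, max_words, sample_size=50, seed=0):
    rng = random.Random(seed)
    remaining = list(points)
    centers = []
    while remaining and len(centers) < max_words:
        # Steps 1-2: subsample, then run mean shift from every sampled point.
        sample = rng.sample(remaining, min(sample_size, len(remaining)))
        modes = [mean_shift_mode(x, sample, radius) for x in sample]
        # Step 3: keep the mode with the most remaining points within the radius.
        best = max(modes, key=lambda m: sum(dist(p, m) < radius for p in remaining))
        centers.append(best)
        # Steps 4-5: points within the radius are assigned and removed.
        remaining = [p for p in remaining if dist(p, best) >= radius]
    return centers

# Toy data: a dense blob of six points and a sparser blob of three.
dense = [(0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5), (0.25, 0.25), (0.25, 0.0)]
sparse = [(10, 10), (10.5, 10), (10, 10.5)]
centers = radius_based_codebook(dense + sparse, radius=2.0, max_words=5)
```

Here the dense blob receives exactly one center (the first one found) and the sparse blob the second, after which no points remain and the loop stops before reaching `max_words`.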

28 Feature assignment
Represent an image as a histogram of visual word frequencies
Given the codebook generated in the training stage, each region extracted from the test image has to be assigned to the corresponding visual word
- usually regions are represented by SIFT descriptors
- usually these features are hard-assigned to the nearest word (in terms of Euclidean distance)
[Figure: feature detection, feature representation, feature (hard) assignment, BoW model: histogram of visual words. Courtesy A. Zisserman]
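Hard assignment amounts to a nearest-neighbour search followed by counting. The sketch below uses 2-D toy descriptors and a three-word codebook, both made up for illustration; real SIFT descriptors would be 128-dimensional.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def hard_assignment_histogram(descriptors, codebook):
    """Count, for each codeword, how many descriptors have it as nearest word."""
    histogram = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda i: squared_distance(d, codebook[i]))
        histogram[nearest] += 1
    return histogram

# Hypothetical codebook and image descriptors.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 0.1), (0.8, -0.1), (0.1, 0.9)]
histogram = hard_assignment_histogram(descriptors, codebook)  # [1, 2, 1]
```

Minimizing the squared Euclidean distance selects the same word as minimizing the Euclidean distance, so the square root can be skipped.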

29 Feature assignment
Drawbacks: hard assignment, which takes into account only the closest codeword, fails to consider two issues:
- codeword uncertainty: the problem of selecting the correct codeword when two or more candidates are relevant
- codeword plausibility: the problem of selecting a codeword when all codewords are too far away and none is representative
Figure:
- the small blue dots are image features
- the labeled red circles are codewords
- the yellow triangle represents an image feature that is correctly assigned to codeword b
- the green square is an example of the codeword uncertainty problem
- the light-blue diamond is an example of the codeword plausibility problem

30 A possible solution: a soft-assignment mechanism that is able to consider the information of two (or more) relevant candidates
Recently, solutions based on kernel density estimation have been applied to feature assignment in codebook models (van Gemert et al., ECCV 2008; Philbin et al., CVPR 2008)
In this way, the word frequency histogram is calculated by smoothing the hard assignment of features to the codeword vocabulary
[Figure: hard assignment vs soft assignment (Gaussian kernel)]
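One simple way to realize a Gaussian-kernel soft assignment is sketched below. It is an assumption-laden illustration, not the exact formulation of the cited papers: each feature spreads a total weight of 1 over all codewords, in proportion to a Gaussian of its distance to each of them. The bandwidth `sigma` and the toy data are arbitrary.

```python
import math

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def soft_assignment_histogram(descriptors, codebook, sigma=0.5):
    """Each descriptor votes for every codeword with a normalized Gaussian weight."""
    histogram = [0.0] * len(codebook)
    for d in descriptors:
        weights = [math.exp(-squared_distance(d, w) / (2 * sigma ** 2))
                   for w in codebook]
        total = sum(weights)
        if total == 0.0:
            continue  # numerically too far from every codeword
        for i, weight in enumerate(weights):
            histogram[i] += weight / total
    return histogram

# A feature halfway between the first two codewords and far from the third.
codebook = [(0.0, 0.0), (2.0, 0.0), (0.0, 10.0)]
histogram = soft_assignment_histogram([(1.0, 0.0)], codebook)
```

The ambiguous feature contributes about 0.5 to each of the two nearby codewords instead of a full count to one of them, which addresses the codeword uncertainty problem. Because of the normalization, this variant does not address plausibility; dropping the division by `total` would let far-away features contribute less.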

31 Word frequency distributions
In text classification, the problem of selecting a good vocabulary is only related to feature selection and vocabulary size
- Feature selection: pick only those terms that are really discriminative (e.g. using Mutual Information or Chi-Square statistics)
- Stop-word removal (the most frequent words, like "the", "of", "an", etc.)
- Stemming (the process of reducing inflected/derived words to their stem, base or root form)
Given a natural-language text corpus, the word frequency distribution follows the well-known Zipf's law
Zipf's law states that, given a corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table (an ideal Zipf distribution is a straight line on a log-log scale)
[Figure: word frequency vs rank, with the regions of "stop words" and "most useful words" marked]
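Whether a corpus follows Zipf's law can be checked by fitting a line to the rank-frequency curve in log-log scale; a slope close to -1 indicates an ideal Zipf distribution. The sketch below uses an ordinary least-squares fit and a synthetic corpus built so that the word of rank r occurs exactly 120/r times; both are illustrative assumptions.

```python
import math
from collections import Counter

def rank_frequency(words):
    """Return (rank, frequency) pairs, with rank 1 for the most frequent word."""
    counts = sorted(Counter(words).values(), reverse=True)
    return list(enumerate(counts, start=1))

def loglog_slope(rank_freq):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(r) for r, _ in rank_freq]
    ys = [math.log(f) for _, f in rank_freq]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Synthetic corpus: the word of rank r appears 120 // r times (120, 60, 40, 30, 24, 20).
words = []
for r in range(1, 7):
    words += ["w%d" % r] * (120 // r)

pairs = rank_frequency(words)
slope = loglog_slope(pairs)  # very close to -1 for this ideal case
```

The same two functions can be applied to a list of visual word indices to inspect how visual words are distributed.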

32 Visual words statistics
Zipf's law is one of the basic assumptions in text categorization
- according to this empirical evidence, we can consider words at intermediate frequencies to be the most informative for classification
Therefore, it is interesting to see how visual words are distributed in a visual corpus; in particular we want to know
- whether their distribution satisfies Zipf's law
- how their statistics are related to i) feature detection and ii) quantization techniques (see references [2,3] for more details)

33 An example: the distribution of visual word frequencies using k-means and radius-based quantization
Note: results are related to action categorization on the KTH dataset

34 Vocabulary size
Unlike the vocabulary of a text corpus, whose size is relatively fixed, the size of a visual-word vocabulary is controlled by the number of clusters
Choosing the right vocabulary size involves a trade-off between discriminativity and generalizability
- with small vocabularies, the visual word is not very discriminative because dissimilar features can map to the same codeword
- as the vocabulary size increases, the feature becomes more discriminative but less generalizable (similar features can map to different codewords)
There is no consensus as to the appropriate size of a visual vocabulary
- it can vary from several hundred to thousands and tens of thousands
- however, it is closely related to the dataset (e.g. image resolution) and the feature detection process
- usually the optimal size is fixed by experiments (see reference [3] for more details)

35 Usually in text categorization the vocabulary size is reduced, keeping only the most informative terms, using feature selection methods
- several methods exist: the best results are obtained using the Chi-Square statistic (CHImax) and Mutual Information (IG)
- in a text corpus, a good feature selection method is able to improve classification performance by reducing the vocabulary size... is it the same for a visual corpus?
[Figure: text categorization vs object categorization]

36 Classification
Many different approaches exist; state-of-the-art results using BoW models are obtained by Support Vector Machine (SVM) classifiers
- An SVM classifier constructs a separating hyperplane in the feature space, one which maximizes the margin between the two data sets
- SVM is a binary classifier but, usually, in the visual domain it is extended to multi-class problems
- the original algorithm can be adapted to non-linear classification problems using the kernel trick
[Figure: separating hyperplane with margin and support vectors]
[Equation: optimization problem]
C is a regularization (error-term) parameter, usually fixed by cross-validation on the training set
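The formula of the optimization problem did not survive in the transcription. Since the slide mentions the error-term constant C, it presumably showed the standard soft-margin formulation, which is reproduced here as an assumption; the symbols $w$, $b$, $\xi_i$, $x_i$, $y_i \in \{-1, +1\}$ and $N$ (number of training vectors) are the usual textbook notation, not taken from the slide.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\xi_i
\quad\text{subject to}\quad
y_i\,(w\cdot x_i + b)\ \ge\ 1-\xi_i,\qquad \xi_i\ \ge\ 0,\qquad i=1,\dots,N
```

Minimizing $\lVert w\rVert^2$ maximizes the margin, while the slack variables $\xi_i$ allow some training vectors to violate it; a larger C penalizes such violations more heavily.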

37 Non-linear classification
Image features are described by high-dimensional feature vectors; therefore, data are usually (always) not linearly separable...
- linearly separable data: a linear kernel is sufficient
- not linearly separable data: use a non-linear kernel (e.g. RBF)
Optimal kernel choice: state-of-the-art results are obtained by a Gaussian kernel using the Chi-square distance between histograms
[Equations: kernel and Chi-square distance]
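The two formulas were lost in the transcription. A commonly used form, given here as an assumption, is $\chi^2(h, h') = \frac{1}{2}\sum_i \frac{(h_i - h'_i)^2}{h_i + h'_i}$ for the distance and $K(h, h') = \exp(-\gamma\,\chi^2(h, h'))$ for the kernel. The factor 1/2 and the scale parameter `gamma` vary between papers, so both should be treated as illustrative choices.

```python
import math

def chi_square_distance(h1, h2):
    """0.5 * sum((a - b)^2 / (a + b)) over bins, skipping bins empty in both."""
    return 0.5 * sum((a - b) ** 2 / (a + b)
                     for a, b in zip(h1, h2) if a + b > 0)

def chi_square_kernel(h1, h2, gamma=1.0):
    """Gaussian-style kernel built on the Chi-square distance."""
    return math.exp(-gamma * chi_square_distance(h1, h2))

# Two toy normalized BoW histograms.
h_a = [0.5, 0.5, 0.0]
h_b = [0.25, 0.25, 0.5]
d_ab = chi_square_distance(h_a, h_b)
k_ab = chi_square_kernel(h_a, h_b)
```

Identical histograms have distance 0 and kernel value 1; the kernel value decreases towards 0 as the histograms differ more.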

38 A classification example (on two different datasets)
- results are closely related to the vocabulary size
- for big vocabularies, linear kernels are the best choice (also, obviously, for computational cost)

39 Weaknesses of the BoW model
No rigorous geometric information about the object's components
- It is intuitive that objects are made of parts... and relations between parts are really informative!
- An example (face detection): [Figure] Note: all have equal probability for BoW models
Not extensively tested yet for viewpoint and scale invariance
Segmentation and localization unclear
These methods can suffer from poor recall

40 Applications
Nowadays, BoW models have been successfully applied to the visual domain by several research groups; it is probably the most popular approach for large-scale categorization problems
- Object, scene and texture categorization in images
- Event and action categorization in videos

41 References
(1) J. Sivic and A. Zisserman. Video Google: a text retrieval approach to object matching in videos. In: Proc. of ICCV.
(2) F. Jurie and B. Triggs. Creating efficient codebooks for visual recognition. In: Proc. of ICCV.
(3) J. Yang, Y.-G. Jiang, A. G. Hauptmann and C.-W. Ngo. Evaluating bag-of-visual-words representation in scene classification. In: Proc. of MIR.
(4) L. Fei-Fei, R. Fergus, A. Torralba. Recognizing and learning object categories. CVPR 2007 short course (Slides, Matlab code, Datasets).


More information

Local features and image matching. Prof. Xin Yang HUST

Local features and image matching. Prof. Xin Yang HUST Local features and image matching Prof. Xin Yang HUST Last time RANSAC for robust geometric transformation estimation Translation, Affine, Homography Image warping Given a 2D transformation T and a source

More information

Large-scale visual recognition The bag-of-words representation

Large-scale visual recognition The bag-of-words representation Large-scale visual recognition The bag-of-words representation Florent Perronnin, XRCE Hervé Jégou, INRIA CVPR tutorial June 16, 2012 Outline Bag-of-words Large or small vocabularies? Extensions for instance-level

More information

Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions

Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions Akitsugu Noguchi and Keiji Yanai Department of Computer Science, The University of Electro-Communications, 1-5-1 Chofugaoka,

More information

Learning Representations for Visual Object Class Recognition

Learning Representations for Visual Object Class Recognition Learning Representations for Visual Object Class Recognition Marcin Marszałek Cordelia Schmid Hedi Harzallah Joost van de Weijer LEAR, INRIA Grenoble, Rhône-Alpes, France October 15th, 2007 Bag-of-Features

More information

Selection of Scale-Invariant Parts for Object Class Recognition

Selection of Scale-Invariant Parts for Object Class Recognition Selection of Scale-Invariant Parts for Object Class Recognition Gy. Dorkó and C. Schmid INRIA Rhône-Alpes, GRAVIR-CNRS 655, av. de l Europe, 3833 Montbonnot, France fdorko,schmidg@inrialpes.fr Abstract

More information

CEE598 - Visual Sensing for Civil Infrastructure Eng. & Mgmt.

CEE598 - Visual Sensing for Civil Infrastructure Eng. & Mgmt. CEE598 - Visual Sensing for Civil Infrastructure Eng. & Mgmt. Section 10 - Detectors part II Descriptors Mani Golparvar-Fard Department of Civil and Environmental Engineering 3129D, Newmark Civil Engineering

More information

Learning Visual Semantics: Models, Massive Computation, and Innovative Applications

Learning Visual Semantics: Models, Massive Computation, and Innovative Applications Learning Visual Semantics: Models, Massive Computation, and Innovative Applications Part II: Visual Features and Representations Liangliang Cao, IBM Watson Research Center Evolvement of Visual Features

More information

Feature Based Registration - Image Alignment

Feature Based Registration - Image Alignment Feature Based Registration - Image Alignment Image Registration Image registration is the process of estimating an optimal transformation between two or more images. Many slides from Alexei Efros http://graphics.cs.cmu.edu/courses/15-463/2007_fall/463.html

More information

IMPROVING SPATIO-TEMPORAL FEATURE EXTRACTION TECHNIQUES AND THEIR APPLICATIONS IN ACTION CLASSIFICATION. Maral Mesmakhosroshahi, Joohee Kim

IMPROVING SPATIO-TEMPORAL FEATURE EXTRACTION TECHNIQUES AND THEIR APPLICATIONS IN ACTION CLASSIFICATION. Maral Mesmakhosroshahi, Joohee Kim IMPROVING SPATIO-TEMPORAL FEATURE EXTRACTION TECHNIQUES AND THEIR APPLICATIONS IN ACTION CLASSIFICATION Maral Mesmakhosroshahi, Joohee Kim Department of Electrical and Computer Engineering Illinois Institute

More information

Sampling Strategies for Object Classifica6on. Gautam Muralidhar

Sampling Strategies for Object Classifica6on. Gautam Muralidhar Sampling Strategies for Object Classifica6on Gautam Muralidhar Reference papers The Pyramid Match Kernel Grauman and Darrell Approximated Correspondences in High Dimensions Grauman and Darrell Video Google

More information

By Suren Manvelyan,

By Suren Manvelyan, By Suren Manvelyan, http://www.surenmanvelyan.com/gallery/7116 By Suren Manvelyan, http://www.surenmanvelyan.com/gallery/7116 By Suren Manvelyan, http://www.surenmanvelyan.com/gallery/7116 By Suren Manvelyan,

More information

Evaluation of GIST descriptors for web scale image search

Evaluation of GIST descriptors for web scale image search Evaluation of GIST descriptors for web scale image search Matthijs Douze Hervé Jégou, Harsimrat Sandhawalia, Laurent Amsaleg and Cordelia Schmid INRIA Grenoble, France July 9, 2009 Evaluation of GIST for

More information

Local Features: Detection, Description & Matching

Local Features: Detection, Description & Matching Local Features: Detection, Description & Matching Lecture 08 Computer Vision Material Citations Dr George Stockman Professor Emeritus, Michigan State University Dr David Lowe Professor, University of British

More information

1 Case study of SVM (Rob)

1 Case study of SVM (Rob) DRAFT a final version will be posted shortly COS 424: Interacting with Data Lecturer: Rob Schapire and David Blei Lecture # 8 Scribe: Indraneel Mukherjee March 1, 2007 In the previous lecture we saw how

More information

Deformable Part Models

Deformable Part Models CS 1674: Intro to Computer Vision Deformable Part Models Prof. Adriana Kovashka University of Pittsburgh November 9, 2016 Today: Object category detection Window-based approaches: Last time: Viola-Jones

More information

Basic Problem Addressed. The Approach I: Training. Main Idea. The Approach II: Testing. Why a set of vocabularies?

Basic Problem Addressed. The Approach I: Training. Main Idea. The Approach II: Testing. Why a set of vocabularies? Visual Categorization With Bags of Keypoints. ECCV,. G. Csurka, C. Bray, C. Dance, and L. Fan. Shilpa Gulati //7 Basic Problem Addressed Find a method for Generic Visual Categorization Visual Categorization:

More information

arxiv: v3 [cs.cv] 3 Oct 2012

arxiv: v3 [cs.cv] 3 Oct 2012 Combined Descriptors in Spatial Pyramid Domain for Image Classification Junlin Hu and Ping Guo arxiv:1210.0386v3 [cs.cv] 3 Oct 2012 Image Processing and Pattern Recognition Laboratory Beijing Normal University,

More information

Analysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009

Analysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009 Analysis: TextonBoost and Semantic Texton Forests Daniel Munoz 16-721 Februrary 9, 2009 Papers [shotton-eccv-06] J. Shotton, J. Winn, C. Rother, A. Criminisi, TextonBoost: Joint Appearance, Shape and Context

More information

Comparing Local Feature Descriptors in plsa-based Image Models

Comparing Local Feature Descriptors in plsa-based Image Models Comparing Local Feature Descriptors in plsa-based Image Models Eva Hörster 1,ThomasGreif 1, Rainer Lienhart 1, and Malcolm Slaney 2 1 Multimedia Computing Lab, University of Augsburg, Germany {hoerster,lienhart}@informatik.uni-augsburg.de

More information

Lecture 10 Detectors and descriptors

Lecture 10 Detectors and descriptors Lecture 10 Detectors and descriptors Properties of detectors Edge detectors Harris DoG Properties of detectors SIFT Shape context Silvio Savarese Lecture 10-26-Feb-14 From the 3D to 2D & vice versa P =

More information

Using Geometric Blur for Point Correspondence

Using Geometric Blur for Point Correspondence 1 Using Geometric Blur for Point Correspondence Nisarg Vyas Electrical and Computer Engineering Department, Carnegie Mellon University, Pittsburgh, PA Abstract In computer vision applications, point correspondence

More information

Large scale object/scene recognition

Large scale object/scene recognition Large scale object/scene recognition Image dataset: > 1 million images query Image search system ranked image list Each image described by approximately 2000 descriptors 2 10 9 descriptors to index! Database

More information

Indexing local features and instance recognition May 14 th, 2015

Indexing local features and instance recognition May 14 th, 2015 Indexing local features and instance recognition May 14 th, 2015 Yong Jae Lee UC Davis Announcements PS2 due Saturday 11:59 am 2 We can approximate the Laplacian with a difference of Gaussians; more efficient

More information

Visual words. Map high-dimensional descriptors to tokens/words by quantizing the feature space.

Visual words. Map high-dimensional descriptors to tokens/words by quantizing the feature space. Visual words Map high-dimensional descriptors to tokens/words by quantizing the feature space. Quantize via clustering; cluster centers are the visual words Word #2 Descriptor feature space Assign word

More information

Supervised learning. y = f(x) function

Supervised learning. y = f(x) function Supervised learning y = f(x) output prediction function Image feature Training: given a training set of labeled examples {(x 1,y 1 ),, (x N,y N )}, estimate the prediction function f by minimizing the

More information

Pattern recognition (3)

Pattern recognition (3) Pattern recognition (3) 1 Things we have discussed until now Statistical pattern recognition Building simple classifiers Supervised classification Minimum distance classifier Bayesian classifier Building

More information

Category-level localization

Category-level localization Category-level localization Cordelia Schmid Recognition Classification Object present/absent in an image Often presence of a significant amount of background clutter Localization / Detection Localize object

More information

Category vs. instance recognition

Category vs. instance recognition Category vs. instance recognition Category: Find all the people Find all the buildings Often within a single image Often sliding window Instance: Is this face James? Find this specific famous building

More information

Feature Detection. Raul Queiroz Feitosa. 3/30/2017 Feature Detection 1

Feature Detection. Raul Queiroz Feitosa. 3/30/2017 Feature Detection 1 Feature Detection Raul Queiroz Feitosa 3/30/2017 Feature Detection 1 Objetive This chapter discusses the correspondence problem and presents approaches to solve it. 3/30/2017 Feature Detection 2 Outline

More information

HISTOGRAMS OF ORIENTATIO N GRADIENTS

HISTOGRAMS OF ORIENTATIO N GRADIENTS HISTOGRAMS OF ORIENTATIO N GRADIENTS Histograms of Orientation Gradients Objective: object recognition Basic idea Local shape information often well described by the distribution of intensity gradients

More information

Tensor Decomposition of Dense SIFT Descriptors in Object Recognition

Tensor Decomposition of Dense SIFT Descriptors in Object Recognition Tensor Decomposition of Dense SIFT Descriptors in Object Recognition Tan Vo 1 and Dat Tran 1 and Wanli Ma 1 1- Faculty of Education, Science, Technology and Mathematics University of Canberra, Australia

More information

BSB663 Image Processing Pinar Duygulu. Slides are adapted from Selim Aksoy

BSB663 Image Processing Pinar Duygulu. Slides are adapted from Selim Aksoy BSB663 Image Processing Pinar Duygulu Slides are adapted from Selim Aksoy Image matching Image matching is a fundamental aspect of many problems in computer vision. Object or scene recognition Solving

More information

SIFT - scale-invariant feature transform Konrad Schindler

SIFT - scale-invariant feature transform Konrad Schindler SIFT - scale-invariant feature transform Konrad Schindler Institute of Geodesy and Photogrammetry Invariant interest points Goal match points between images with very different scale, orientation, projective

More information

Large Scale Image Retrieval

Large Scale Image Retrieval Large Scale Image Retrieval Ondřej Chum and Jiří Matas Center for Machine Perception Czech Technical University in Prague Features Affine invariant features Efficient descriptors Corresponding regions

More information

Artistic ideation based on computer vision methods

Artistic ideation based on computer vision methods Journal of Theoretical and Applied Computer Science Vol. 6, No. 2, 2012, pp. 72 78 ISSN 2299-2634 http://www.jtacs.org Artistic ideation based on computer vision methods Ferran Reverter, Pilar Rosado,

More information

Local features: detection and description May 12 th, 2015

Local features: detection and description May 12 th, 2015 Local features: detection and description May 12 th, 2015 Yong Jae Lee UC Davis Announcements PS1 grades up on SmartSite PS1 stats: Mean: 83.26 Standard Dev: 28.51 PS2 deadline extended to Saturday, 11:59

More information

CS 4495 Computer Vision A. Bobick. CS 4495 Computer Vision. Features 2 SIFT descriptor. Aaron Bobick School of Interactive Computing

CS 4495 Computer Vision A. Bobick. CS 4495 Computer Vision. Features 2 SIFT descriptor. Aaron Bobick School of Interactive Computing CS 4495 Computer Vision Features 2 SIFT descriptor Aaron Bobick School of Interactive Computing Administrivia PS 3: Out due Oct 6 th. Features recap: Goal is to find corresponding locations in two images.

More information

Local Features and Kernels for Classifcation of Texture and Object Categories: A Comprehensive Study

Local Features and Kernels for Classifcation of Texture and Object Categories: A Comprehensive Study Local Features and Kernels for Classifcation of Texture and Object Categories: A Comprehensive Study J. Zhang 1 M. Marszałek 1 S. Lazebnik 2 C. Schmid 1 1 INRIA Rhône-Alpes, LEAR - GRAVIR Montbonnot, France

More information

Instance-level recognition II.

Instance-level recognition II. Reconnaissance d objets et vision artificielle 2010 Instance-level recognition II. Josef Sivic http://www.di.ens.fr/~josef INRIA, WILLOW, ENS/INRIA/CNRS UMR 8548 Laboratoire d Informatique, Ecole Normale

More information

Action recognition in videos

Action recognition in videos Action recognition in videos Cordelia Schmid INRIA Grenoble Joint work with V. Ferrari, A. Gaidon, Z. Harchaoui, A. Klaeser, A. Prest, H. Wang Action recognition - goal Short actions, i.e. drinking, sit

More information

Indexing local features and instance recognition May 16 th, 2017

Indexing local features and instance recognition May 16 th, 2017 Indexing local features and instance recognition May 16 th, 2017 Yong Jae Lee UC Davis Announcements PS2 due next Monday 11:59 am 2 Recap: Features and filters Transforming and describing images; textures,

More information

Video Google: A Text Retrieval Approach to Object Matching in Videos

Video Google: A Text Retrieval Approach to Object Matching in Videos Video Google: A Text Retrieval Approach to Object Matching in Videos Josef Sivic, Frederik Schaffalitzky, Andrew Zisserman Visual Geometry Group University of Oxford The vision Enable video, e.g. a feature

More information

Fitting: The Hough transform

Fitting: The Hough transform Fitting: The Hough transform Voting schemes Let each feature vote for all the models that are compatible with it Hopefully the noise features will not vote consistently for any single model Missing data

More information

Local Image Features

Local Image Features Local Image Features Computer Vision CS 143, Brown Read Szeliski 4.1 James Hays Acknowledgment: Many slides from Derek Hoiem and Grauman&Leibe 2008 AAAI Tutorial This section: correspondence and alignment

More information

Recognition. Topics that we will try to cover:

Recognition. Topics that we will try to cover: Recognition Topics that we will try to cover: Indexing for fast retrieval (we still owe this one) Object classification (we did this one already) Neural Networks Object class detection Hough-voting techniques

More information

Lecture 16: Object recognition: Part-based generative models

Lecture 16: Object recognition: Part-based generative models Lecture 16: Object recognition: Part-based generative models Professor Stanford Vision Lab 1 What we will learn today? Introduction Constellation model Weakly supervised training One-shot learning (Problem

More information

Motion illusion, rotating snakes

Motion illusion, rotating snakes Motion illusion, rotating snakes Local features: main components 1) Detection: Find a set of distinctive key points. 2) Description: Extract feature descriptor around each interest point as vector. x 1

More information

Local Feature Detectors

Local Feature Detectors Local Feature Detectors Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr Slides adapted from Cordelia Schmid and David Lowe, CVPR 2003 Tutorial, Matthew Brown,

More information

Local Features based Object Categories and Object Instances Recognition

Local Features based Object Categories and Object Instances Recognition Local Features based Object Categories and Object Instances Recognition Eric Nowak Ph.D. thesis defense 17th of March, 2008 1 Thesis in Computer Vision Computer vision is the science and technology of

More information

SURF. Lecture6: SURF and HOG. Integral Image. Feature Evaluation with Integral Image

SURF. Lecture6: SURF and HOG. Integral Image. Feature Evaluation with Integral Image SURF CSED441:Introduction to Computer Vision (2015S) Lecture6: SURF and HOG Bohyung Han CSE, POSTECH bhhan@postech.ac.kr Speed Up Robust Features (SURF) Simplified version of SIFT Faster computation but

More information

Recognition of Animal Skin Texture Attributes in the Wild. Amey Dharwadker (aap2174) Kai Zhang (kz2213)

Recognition of Animal Skin Texture Attributes in the Wild. Amey Dharwadker (aap2174) Kai Zhang (kz2213) Recognition of Animal Skin Texture Attributes in the Wild Amey Dharwadker (aap2174) Kai Zhang (kz2213) Motivation Patterns and textures are have an important role in object description and understanding

More information

CS 4495 Computer Vision Classification 3: Bag of Words. Aaron Bobick School of Interactive Computing

CS 4495 Computer Vision Classification 3: Bag of Words. Aaron Bobick School of Interactive Computing CS 4495 Computer Vision Classification 3: Bag of Words Aaron Bobick School of Interactive Computing Administrivia PS 6 is out. Due Tues Nov 25th, 11:55pm. One more assignment after that Mea culpa This

More information

Improved Spatial Pyramid Matching for Image Classification

Improved Spatial Pyramid Matching for Image Classification Improved Spatial Pyramid Matching for Image Classification Mohammad Shahiduzzaman, Dengsheng Zhang, and Guojun Lu Gippsland School of IT, Monash University, Australia {Shahid.Zaman,Dengsheng.Zhang,Guojun.Lu}@monash.edu

More information

Local features: detection and description. Local invariant features

Local features: detection and description. Local invariant features Local features: detection and description Local invariant features Detection of interest points Harris corner detection Scale invariant blob detection: LoG Description of local patches SIFT : Histograms

More information

Perception IV: Place Recognition, Line Extraction

Perception IV: Place Recognition, Line Extraction Perception IV: Place Recognition, Line Extraction Davide Scaramuzza University of Zurich Margarita Chli, Paul Furgale, Marco Hutter, Roland Siegwart 1 Outline of Today s lecture Place recognition using

More information

Combining Selective Search Segmentation and Random Forest for Image Classification

Combining Selective Search Segmentation and Random Forest for Image Classification Combining Selective Search Segmentation and Random Forest for Image Classification Gediminas Bertasius November 24, 2013 1 Problem Statement Random Forest algorithm have been successfully used in many

More information