OBJECT CATEGORIZATION
1 OBJECT CATEGORIZATION Ing. Lorenzo Seidenari Slides: Ing. Lamberto Ballan November 18th, 2009
2 What is an Object? Merriam-Webster Definition: Something material that may be perceived by the senses. You already know two tasks about objects: Single object recognition: find that logo Object categorization: find a face
3 Why do we care about categorization? Perception of function: "We can perceive the 3D shape, texture, material properties, without knowing about objects. But the concept of category also encapsulates information about what we can do with those objects. We therefore include the perception of function as a proper, indeed crucial, subject for vision science" (from Vision Science, chapter 9, Palmer). ICCV09 Short Course: Fei-Fei, Torralba, Fergus
4 The perception of function. Direct perception (affordances): flat surface, horizontal, knee-high, sittable upon. Affordance: a quality of an object that allows an action to be performed (Gibson 1977). Mediated perception (categorization): flat surface, horizontal, knee-high, chair, sittable upon. ICCV09 Short Course: Fei-Fei, Torralba, Fergus
5 Direct perception. Some aspects of an object's function can be perceived directly. Functional form: some forms clearly indicate a function (sittable-upon, container, cutting device, ...), though it does not always seem easy to act on them (e.g. to sit upon some "sittable" shapes). ICCV09 Short Course: Fei-Fei, Torralba, Fergus
6 Text Indexing and Categorization. Text categorization: the task is to assign a document to one or more categories based on its content (is it something about medicine/biology? is it a document about business?). Why is it useful? Detecting and indexing similar texts/documents in large corpora; clustering documents by topic; extracting mid/high-level concepts from documents. The Bag of Words (BoW) model, combined with advanced classification techniques, achieves state-of-the-art results. A text - such as a sentence or a document - is represented as an unordered collection of words, disregarding grammar and even word order. Three elements: i) a vocabulary, ii) a histogram representation of documents, iii) a classification method
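The first two elements above (vocabulary and histogram representation) can be sketched in a few lines of Python; the function names here are hypothetical, and the classification step is omitted:

```python
from collections import Counter

def build_vocabulary(documents):
    """Collect the distinct words across a corpus (the vocabulary)."""
    vocab = sorted({w for doc in documents for w in doc.lower().split()})
    return {word: idx for idx, word in enumerate(vocab)}

def bow_histogram(document, vocab):
    """Represent a document as an unordered word-count histogram."""
    counts = Counter(document.lower().split())
    return [counts.get(word, 0) for word in sorted(vocab, key=vocab.get)]

docs = ["the cat sat on the mat", "the dog ate the bone"]
vocab = build_vocabulary(docs)
hist = bow_histogram(docs[0], vocab)  # one bin per vocabulary word
```

Note that `hist` has one entry per vocabulary word and completely discards word order, which is exactly the BoW assumption.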
7 Same approach usable with visual data An image can be treated as a document, and features extracted from the image are considered as the "visual words"... image of an object category bag of visual words D1: face D2: bike D3: violin Bag of (visual) Words: an image is represented as an unordered collection of visual words Vocabulary (codewords)
8 Why is it useful? Object recognition and categorization. Bags of Visual Words have been successfully used for object categorization in images (e.g. faces, cars, airplanes...) and, more recently, for action recognition in video sequences (e.g. running, walking, clapping...). Aim: find (annotate) objects in this photo (a very optimistic result...)
9 Three stages. 1. Codebook (vocabulary) formation and feature assignment: given a training set, local descriptors (e.g. SIFT) are collected and a clustering algorithm is used to quantize the feature space. Each cluster's center is then used as an iconic word, and local descriptors are assigned to the nearest word using an appropriate distance (e.g. Euclidean); the result is a Bag-of-Words representation. 2. Train a classifier to discriminate vectors corresponding to positive and negative training images: usually Support Vector Machines (SVM) are used as classifiers. 3. Apply the trained classifier to the test image. Note: the approach is the same as for text... but the first stage - codebook formation and feature assignment - is really challenging, because visual words have to be defined in advance using a clustering algorithm (e.g. k-means)
10 Note: we have to train a classifier (detector) for each object class... Training images Test images Courtesy A. Zisserman
11 Feature detection Given an image, feature detection is the process of extracting local patches (regions) There are several methods: Random sampling Regular grid (dense sampling): the image is segmented by some horizontal and vertical lines It shows very good results for natural scene categorization (Fei-Fei and Perona, CVPR 2005) Interest Points (sparse sampling): local patches are detected by interest point detectors that are able to select salient regions (such as edges, corners, blobs); several different techniques (Mikolajczyk et al., IJCV 2005) Harris corner detector Difference of Gaussian (DoG); it is the SIFT detector (Lowe, IJCV 2004) Affine covariant patches
12 Regular grid. It is probably the simplest method for feature detection: an evenly sampled grid with a given spacing (e.g. 10x10 pixels) over the image. Despite its simplicity, it provides good results for textures and natural scenes, because it describes more regions than interest-point techniques do
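Dense grid sampling is simple enough to write out directly; a minimal sketch (the function name is hypothetical) that places a keypoint every `step` pixels:

```python
def dense_grid_keypoints(width, height, step=10):
    """Place keypoints on a regular grid, one every `step` pixels
    in x and y, offset by half a step so patches stay inside the image."""
    return [(x, y) for y in range(step // 2, height, step)
                   for x in range(step // 2, width, step)]

# For a 100x60 image with a 10-pixel grid this yields 10 x 6 = 60 keypoints
pts = dense_grid_keypoints(100, 60, step=10)
```

A local descriptor (e.g. SIFT) is then computed on a fixed-size patch around each grid point.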
13 Interest points. Local patches are detected at the most salient regions (such as the regions attracting human attention). It uses more information about the image itself than random or grid sampling. An example of local patches detected using affine covariant features
14 Comparison: dense (grid) sampling vs sparse (interest points) sampling. Dense sampling - Advantage: it describes the global content of an image. Disadvantage: it uses little information about the image itself. It has been used successfully for texture and natural scene categorization. Sparse sampling - Advantage: it detects ("selects") salient regions, which tend to be the most attractive and informative ones. Disadvantage: depending on the interest-point technique and the type/resolution of the image, sometimes only a few regions are detected. It has been used for specific object recognition and categorization (better for separating background/foreground)
15 Feature representation. Local features are represented by local descriptors. Several kinds of information can be used but, usually, edge or gradient orientation histograms are the most common choice. Common framework: divide the local region into spatial cells; calculate the orientation of the image gradient at each pixel; pool quantized orientations over each cell: i) the descriptor contains an orientation histogram for each cell, ii) votes are weighted by gradient magnitude. Note: this is the basis of the popular SIFT, HOG and (Generalized) Shape Context methods
16 SIFT descriptors. The most common choice is the SIFT descriptor, because it exhibits the highest matching accuracy. Standard SIFT is computed as a set of orientation histograms over 4x4-pixel neighborhoods (the contribution of each pixel is weighted by the gradient magnitude and by a Gaussian with sigma equal to 1.5 times the scale of the keypoint). Histograms contain 8 bins each (corresponding to 8 orientations); each descriptor contains a 4x4 array of 16 histograms around the keypoint; this leads to a SIFT descriptor with 4x4x8 = 128 elements. Fig: standard SIFT descriptor
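The 4x4-cells-times-8-bins pooling can be sketched as follows. This is a deliberately simplified, hypothetical version: real SIFT also applies Gaussian weighting, trilinear interpolation of votes, rotation to the dominant orientation, and clipping before renormalization, all omitted here:

```python
import numpy as np

def sift_like_descriptor(patch):
    """Simplified SIFT-style descriptor for a 16x16 patch:
    4x4 cells of 4x4 pixels, 8 orientation bins each -> 128-dim vector."""
    patch = patch.astype(float)
    gy, gx = np.gradient(patch)                      # image gradients
    mag = np.hypot(gx, gy)                           # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)      # orientation in [0, 2*pi)
    bins = np.minimum((ang / (2 * np.pi) * 8).astype(int), 7)  # 8 bins
    desc = np.zeros((4, 4, 8))
    for cy in range(4):
        for cx in range(4):
            cell = slice(cy * 4, cy * 4 + 4), slice(cx * 4, cx * 4 + 4)
            # vote each pixel's orientation bin, weighted by magnitude
            np.add.at(desc[cy, cx], bins[cell].ravel(), mag[cell].ravel())
    desc = desc.ravel()                              # 4*4*8 = 128 elements
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc

# A patch with a pure vertical intensity ramp: all votes land in one bin
d = sift_like_descriptor(np.outer(np.arange(16.0), np.ones(16)))
```

Even in this stripped-down form, the descriptor has the 128 elements of the standard layout.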
17 Combined feature descriptors. Local descriptors (like SIFT) are usually based only on luminance and shape, so they use grey-scale values and ignore color. It is very difficult to select a color model that is sufficiently robust and general; nevertheless, color is very important to describe/distinguish objects or scenes. Different types of descriptors can be combined to improve the representation; the most common combination is a local shape descriptor (e.g. SIFT) with a color descriptor (e.g. a color histogram in a smart color space like Luv or HSV). Figure: an example of color-SIFT descriptor (van de Weijer and Schmid, ECCV 2006). The combined descriptor is obtained by fusing standard SIFT with a hue descriptor computed in a color-invariant space. Courtesy J. van de Weijer
18 Codebook formation The Bag-of-Words model is built through the creation of a discrete visual vocabulary (codebook) A vocabulary in the object/scene classification domain is commonly obtained by following one of two approaches: Annotation approach Data-driven approach Annotation approach: A vocabulary is obtained by assigning meaningful labels to image patches (e.g. sky, water, vegetation, etc.) Data-driven approach: It is required to perform a vector quantization for large sets of feature-vectors (usually in a high-dimensional space) This is performed by clustering of feature vectors
19 Data-driven approach Visual words are defined by clustering of feature vectors. An example:
20 The performance of this approach depends on the quantization method and on the number of words that are selected The most common quantization approach is the use of k-means clustering: the main reasons are its simplicity and convergence speed Examples of visual words: Courtesy A. Zisserman
21 k-means clustering. It is an algorithm to cluster n objects, based on their feature-vector representation, into k < n partitions. It tries to minimize the global intra-cluster variance, i.e. the squared error function J = Σ_{i=1..k} Σ_{x_j ∈ S_i} ||x_j − μ_i||², where k is the number of clusters, S_i (i = 1,...,k) are the cluster partitions, and μ_i is the centroid (mean point) of all the points x_j ∈ S_i. The most common form of k-means is Lloyd's algorithm. The two names are often used synonymously, but in reality Lloyd's method is a heuristic for solving the k-means problem. Other variations exist, but it has remained popular because it converges extremely rapidly in practice
22 k-means clustering: Lloyd's algorithm. Lloyd's algorithm is a heuristic iterative solution for the k-means problem: 1. It starts by partitioning the n input points into k initial sets, either at random or using some heuristic. 2. It then calculates the centroid μ_i of each set S_i (with i = 1,...,k). 3. It constructs a new partition by associating each point with the closest centroid. 4. Finally, the centroids are recalculated for the new clusters, and the algorithm repeats by alternating these two steps until convergence (obtained when i) points no longer switch cluster, or ii) centroids no longer change). Fig: 1,2) initial random centroids; 3) new partition obtained by associating points with the nearest centroid; 4) centroids moved to the center of their clusters; convergence
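The alternation above can be sketched directly in numpy. For reproducibility this sketch uses a simple deterministic farthest-point initialization instead of the random one in step 1; real implementations typically initialize at random or with k-means++:

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100):
    """Lloyd's algorithm sketch: alternate assignment and centroid update."""
    centers = [X[0]]                                 # farthest-point init
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # assignment step: each point goes to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update step: recompute each centroid as the mean of its points
        new_centers = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                                else centers[i] for i in range(k)])
        if np.allclose(new_centers, centers):        # centroids unchanged
            break
        centers = new_centers
    return centers, labels

# Two well-separated blobs converge in a single iteration
X = np.vstack([np.zeros((10, 2)), np.full((10, 2), 5.0)])
centers, labels = lloyd_kmeans(X, k=2)
```

In codebook formation, `X` would be the matrix of training descriptors (e.g. 128-dim SIFT vectors) and `centers` the visual vocabulary.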
23 k-means disadvantages. Despite its popularity, the use of k-means clustering for codebook formation is not the optimal solution. It has some main disadvantages: 1. the number of visual words has to be known in advance; 2. the clustering is not very robust w.r.t. outliers; 3. cluster centers are attracted by the denser regions of the sample distribution, thus providing a more imprecise quantization for the vectors lying in those regions
24 This effect (3), due to the assumption of a uniform distribution of the features in the descriptor space, is even more pronounced in high-dimensional spaces. A representation of this effect can be obtained by visualizing a Voronoi tessellation of the feature space: k-means (Voronoi tessellation); detail of a dense region that has been split into 4 clusters. Note: Voronoi cells do not uniformly cover the feature space...
25 Radius-based clustering. Given n vectors, the algorithm starts with a uniform random subsample s of the original dataset (thus |s| < n). For each x_i ∈ s (grey circles in fig.), a mean-shift procedure is initialized: mean-shift is a procedure for locating the modes of a sample distribution (in other words, it finds the densest regions of the distribution). Given a radius R, mean-shift clustering on s is used to find the modes. A new cluster center is then allocated on the mode corresponding to the maximal-density region. All vectors of the original set within distance < R from the center are labeled as members of this cluster and eliminated from the following iterations: this prevents the algorithm from repeatedly assigning centers to the same high-density region. The algorithm can be stopped when a sufficient number of clusters (words) has been identified
26 In this way, cluster centers are allocated more uniformly. A representation of this effect can be obtained by visualizing a Voronoi tessellation of the feature space (compared to k-means): k-means clustering (Voronoi tessellation) vs radius-based clustering (Voronoi tessellation). Note: the dense region that was split into 4 clusters by k-means is now correctly coded by radius-based clustering
27 Mean-shift. The mean-shift estimator finds distribution modes non-parametrically. Radius-based clustering: 1. Subsample the dataset. 2. Run a mean-shift estimator from each point. 3. The densest mode M is found. 4. Each point within distance R of M is assigned to the new cluster. 5. Assigned points are removed. 6. Stop if enough clusters are found, or no more clusters are found
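The steps above can be sketched as follows. This is a simplified, hypothetical variant: instead of running a full mean-shift mode search on a subsample, it takes the densest remaining point (the one with the most neighbours within the radius) as the next word; the actual method (Jurie & Triggs, ICCV 2005) uses mean-shift proper:

```python
import numpy as np

def radius_based_codebook(X, radius, max_words=None):
    """Simplified radius-based clustering: repeatedly allocate a word at the
    densest remaining region, then remove everything within `radius` of it."""
    remaining = np.arange(len(X))
    centers = []
    while len(remaining) > 0 and (max_words is None or len(centers) < max_words):
        pts = X[remaining]
        # pairwise distances among the remaining vectors
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
        density = (d < radius).sum(axis=1)       # neighbour counts
        mode = density.argmax()                  # densest remaining point
        centers.append(pts[mode])
        # remove every vector within `radius` of the new center
        remaining = remaining[d[mode] >= radius]
    return np.array(centers)

# A dense blob of 20 points and a smaller one each get exactly one word,
# regardless of their very different densities
X = np.vstack([np.zeros((20, 2)), np.full((5, 2), 10.0)])
words = radius_based_codebook(X, radius=1.0)
```

The removal step is what keeps centers from piling up in the same high-density region, which is the key difference from k-means.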
28 Feature assignment. Represent an image as a histogram of visual-word frequencies. Given the codebook generated in the training stage, each region extracted from the test image has to be assigned to the corresponding visual word: usually regions are represented by SIFT descriptors, and these features are hard-assigned to the nearest word (in terms of Euclidean distance). Pipeline: feature detection, feature representation, feature (hard) assignment, BoW model: histogram of visual words. Courtesy A. Zisserman
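Hard assignment plus histogram building is a few lines of numpy; a minimal sketch with a hypothetical function name:

```python
import numpy as np

def bow_hard_assign(descriptors, codebook):
    """Assign each descriptor to its nearest codeword (Euclidean distance)
    and return the normalized visual-word frequency histogram."""
    d = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = d.argmin(axis=1)                     # index of nearest codeword
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                     # frequency histogram

codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
descs = np.array([[0.1, 0.2], [9.5, 10.1], [0.3, -0.1], [10.2, 9.9]])
h = bow_hard_assign(descs, codebook)
```

The resulting histogram `h` is the vector that is fed to the classifier in the next stage.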
29 Drawbacks: this hard assignment, which takes into account only the closest codeword, fails to consider two issues: codeword uncertainty, i.e. the problem of selecting the correct codeword when two or more candidates are relevant; codeword plausibility, i.e. the problem of selecting the correct codeword when all codewords are too far away and none is representative. Figure: the small blue dots are image features; the labeled red circles are codewords; the yellow triangle represents an image feature that is correctly assigned to codeword b; the green square is an example of the codeword-uncertainty problem; the light-blue diamond is an example of the codeword-plausibility problem
30 A possible solution: a soft-assignment mechanism that can take into account the information of two (or more) relevant candidates. Recently, solutions based on kernel density estimation have been applied to feature assignment in codebook models (van Gemert et al., ECCV 2008; Philbin et al., CVPR 2008). In this way, the word-frequency histogram is calculated by smoothing the hard assignment of features to the codeword vocabulary. Hard assignment vs soft assignment (Gaussian kernel)
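A Gaussian-kernel soft assignment, in the spirit of van Gemert et al., can be sketched like this (the exact normalization varies across papers; this version makes each descriptor contribute total weight 1):

```python
import numpy as np

def bow_soft_assign(descriptors, codebook, sigma=1.0):
    """Soft assignment: each descriptor spreads a Gaussian-weighted vote
    over all codewords instead of a single vote on the nearest one."""
    d = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    w = np.exp(-d ** 2 / (2 * sigma ** 2))       # Gaussian kernel weights
    w /= w.sum(axis=1, keepdims=True)            # each descriptor sums to 1
    hist = w.sum(axis=0)
    return hist / hist.sum()

# A descriptor exactly between two codewords splits its vote between them,
# which hard assignment cannot express (codeword uncertainty)
codebook = np.array([[0.0], [1.0], [10.0]])
h = bow_soft_assign(np.array([[0.5]]), codebook, sigma=1.0)
```

A descriptor far from every codeword still contributes near-uniform, low-confidence weights, which partially mitigates the plausibility problem as well.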
31 Word frequency distributions. In text classification, the problem of selecting a good vocabulary is only related to feature selection and vocabulary size. Feature selection: pick only those terms that are really discriminant (e.g. using Mutual Information or Chi-Square statistics); stop-word removal (the most frequent words like "the", "of", "an", etc.); stemming (the process of reducing inflected/derived words to their stem, base or root form). Given a natural-language textual corpus, the word frequency distribution follows the well-known Zipf's law, which states that the frequency of any word is inversely proportional to its rank in the frequency table (an ideal Zipf distribution appears as a straight line in log-log scale). The most frequent words are stop words; the most useful words are those at intermediate frequencies
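The rank-frequency table behind Zipf's law is trivial to compute; a toy sketch (a corpus this small only illustrates the ranking itself, the power-law shape emerges only on large corpora):

```python
from collections import Counter

def rank_frequency(corpus):
    """Word frequencies sorted by rank (rank 1 = most frequent word)."""
    counts = Counter(w for doc in corpus for w in doc.lower().split())
    return sorted(counts.values(), reverse=True)

corpus = ["the cat and the dog", "the dog and a bird", "a cat sees the bird"]
freqs = rank_frequency(corpus)  # the stop word "the" dominates rank 1
```

Plotting `log(rank)` against `log(freqs)` for a real corpus is the usual way to check how closely the distribution follows Zipf's law.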
32 Visual words statistics. Zipf's law is one of the basic assumptions in text categorization: according to this empirical evidence, we can consider words at intermediate frequencies as the most informative for classification. Therefore, it is interesting to see how visual words are distributed in a visual corpus; in particular we want to know whether their distribution satisfies Zipf's law, and how their statistics are related to i) the feature detection and ii) the quantization techniques (see references [2,3] for more details)
33 An example: the distribution of visual words frequency using k-means and radius-based quantization Note: results are related to action categorization on KTH dataset
34 Vocabulary size. Unlike the vocabulary of a text corpus, whose size is relatively fixed, the size of a visual-word vocabulary is controlled by the number of clusters. Choosing the right vocabulary size involves a trade-off between discriminativity and generalizability: with small vocabularies, visual words are not very discriminative, because dissimilar features can map to the same codeword; as the vocabulary size increases, words become more discriminative but less generalizable (similar features can map to different codewords). There is no consensus on the appropriate size of a visual vocabulary: it can vary from several hundreds to thousands and tens of thousands of words; however, it is closely related to the dataset (e.g. image resolution) and to the feature detection process; usually the optimal size is fixed by experiments (see reference [3] for more details)
35 Usually in text categorization the vocabulary size is reduced, keeping only the most informative terms, using feature selection methods. Among the several methods, the best results are obtained using the Chi-Square statistic (CHImax) and Information Gain (IG). In a text corpus, a good feature selection method is able to improve classification performance while reducing vocabulary size... is it the same for a visual corpus? Text categorization vs object categorization
36 Classification. Many different approaches exist; state-of-the-art results using BoW models are obtained with Support Vector Machine (SVM) classifiers. An SVM classifier constructs the separating hyperplane that maximizes the margin between the two classes. SVM is a binary classifier but, usually, in the visual domain it is extended to multi-class problems; the original algorithm can be adapted to non-linear classification problems using the kernel trick. Soft-margin optimization problem: min_{w,b,ξ} ½||w||² + C Σ_i ξ_i, subject to y_i (w · x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0, where C is a regularization error term, usually fixed by cross-validation on the training set
37 Non-linear classification: image features are described by high-dimensional feature vectors, so the data are usually (almost always) not linearly separable... For linearly separable data a linear kernel is sufficient; for non-linearly-separable data use a non-linear kernel (e.g. RBF). Optimal kernel choice: state-of-the-art results are obtained with a Gaussian kernel built on the Chi-square distance between histograms: K(H_1, H_2) = exp(−χ²(H_1, H_2) / A), with Chi-square distance χ²(H_1, H_2) = ½ Σ_i (H_1(i) − H_2(i))² / (H_1(i) + H_2(i)), where A is a scaling parameter (commonly set to the mean χ² distance over the training set)
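The chi-square distance and the Gaussian kernel built on it can be sketched as follows (the `eps` guard against empty bins and the simple nested-loop Gram-matrix construction are implementation choices of this sketch, not part of the definition):

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two histograms:
    0.5 * sum_i (h1_i - h2_i)^2 / (h1_i + h2_i)."""
    return 0.5 * float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

def chi2_gaussian_kernel(H1, H2, A=1.0):
    """Gram matrix of the Gaussian-chi-square kernel
    K(x, y) = exp(-chi2(x, y) / A); A is commonly set to the
    mean chi-square distance over the training set."""
    D = np.array([[chi2_distance(x, y) for y in H2] for x in H1])
    return np.exp(-D / A)

H = np.array([[0.5, 0.5], [1.0, 0.0]])
K = chi2_gaussian_kernel(H, H)  # 2x2 Gram matrix, ones on the diagonal
```

A precomputed Gram matrix like `K` can be passed directly to an SVM implementation that accepts custom kernels.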
38 A classification example (on two different datasets): results are closely related to the vocabulary size; for big vocabularies, linear kernels are the best choice (also, obviously, for computational cost)
39 Weakness of the BoW model. No rigorous geometric information about the object's components is retained. It is intuitive that objects are made of parts... and the relations between parts are really informative! An example (face detection): note that all spatial arrangements of the same parts have equal probability under a BoW model. Also: not extensively tested yet for viewpoint and scale invariance; segmentation and localization unclear; these methods can suffer from poor recall
40 Applications Nowadays, BoW models have been successfully applied to the visual domain by several research groups; it is probably the most popular approach for large-scale categorization problems Object, Scene and Texture categorization in images Event and Action categorization in videos
41 References. (1) J. Sivic and A. Zisserman. Video Google: a text retrieval approach to object matching in videos. In: Proc. of ICCV, 2003. (2) F. Jurie and B. Triggs. Creating efficient codebooks for visual recognition. In: Proc. of ICCV, 2005. (3) J. Yang, Y.-G. Jiang, A. G. Hauptmann and C.-W. Ngo. Evaluating bag-of-visual-words representations in scene classification. In: Proc. of MIR, 2007. (4) L. Fei-Fei, R. Fergus, A. Torralba. Recognizing and learning object categories. CVPR 2007 short course (slides, Matlab code, datasets)
More informationSampling Strategies for Object Classifica6on. Gautam Muralidhar
Sampling Strategies for Object Classifica6on Gautam Muralidhar Reference papers The Pyramid Match Kernel Grauman and Darrell Approximated Correspondences in High Dimensions Grauman and Darrell Video Google
More informationBy Suren Manvelyan,
By Suren Manvelyan, http://www.surenmanvelyan.com/gallery/7116 By Suren Manvelyan, http://www.surenmanvelyan.com/gallery/7116 By Suren Manvelyan, http://www.surenmanvelyan.com/gallery/7116 By Suren Manvelyan,
More informationEvaluation of GIST descriptors for web scale image search
Evaluation of GIST descriptors for web scale image search Matthijs Douze Hervé Jégou, Harsimrat Sandhawalia, Laurent Amsaleg and Cordelia Schmid INRIA Grenoble, France July 9, 2009 Evaluation of GIST for
More informationLocal Features: Detection, Description & Matching
Local Features: Detection, Description & Matching Lecture 08 Computer Vision Material Citations Dr George Stockman Professor Emeritus, Michigan State University Dr David Lowe Professor, University of British
More information1 Case study of SVM (Rob)
DRAFT a final version will be posted shortly COS 424: Interacting with Data Lecturer: Rob Schapire and David Blei Lecture # 8 Scribe: Indraneel Mukherjee March 1, 2007 In the previous lecture we saw how
More informationDeformable Part Models
CS 1674: Intro to Computer Vision Deformable Part Models Prof. Adriana Kovashka University of Pittsburgh November 9, 2016 Today: Object category detection Window-based approaches: Last time: Viola-Jones
More informationBasic Problem Addressed. The Approach I: Training. Main Idea. The Approach II: Testing. Why a set of vocabularies?
Visual Categorization With Bags of Keypoints. ECCV,. G. Csurka, C. Bray, C. Dance, and L. Fan. Shilpa Gulati //7 Basic Problem Addressed Find a method for Generic Visual Categorization Visual Categorization:
More informationarxiv: v3 [cs.cv] 3 Oct 2012
Combined Descriptors in Spatial Pyramid Domain for Image Classification Junlin Hu and Ping Guo arxiv:1210.0386v3 [cs.cv] 3 Oct 2012 Image Processing and Pattern Recognition Laboratory Beijing Normal University,
More informationAnalysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009
Analysis: TextonBoost and Semantic Texton Forests Daniel Munoz 16-721 Februrary 9, 2009 Papers [shotton-eccv-06] J. Shotton, J. Winn, C. Rother, A. Criminisi, TextonBoost: Joint Appearance, Shape and Context
More informationComparing Local Feature Descriptors in plsa-based Image Models
Comparing Local Feature Descriptors in plsa-based Image Models Eva Hörster 1,ThomasGreif 1, Rainer Lienhart 1, and Malcolm Slaney 2 1 Multimedia Computing Lab, University of Augsburg, Germany {hoerster,lienhart}@informatik.uni-augsburg.de
More informationLecture 10 Detectors and descriptors
Lecture 10 Detectors and descriptors Properties of detectors Edge detectors Harris DoG Properties of detectors SIFT Shape context Silvio Savarese Lecture 10-26-Feb-14 From the 3D to 2D & vice versa P =
More informationUsing Geometric Blur for Point Correspondence
1 Using Geometric Blur for Point Correspondence Nisarg Vyas Electrical and Computer Engineering Department, Carnegie Mellon University, Pittsburgh, PA Abstract In computer vision applications, point correspondence
More informationLarge scale object/scene recognition
Large scale object/scene recognition Image dataset: > 1 million images query Image search system ranked image list Each image described by approximately 2000 descriptors 2 10 9 descriptors to index! Database
More informationIndexing local features and instance recognition May 14 th, 2015
Indexing local features and instance recognition May 14 th, 2015 Yong Jae Lee UC Davis Announcements PS2 due Saturday 11:59 am 2 We can approximate the Laplacian with a difference of Gaussians; more efficient
More informationVisual words. Map high-dimensional descriptors to tokens/words by quantizing the feature space.
Visual words Map high-dimensional descriptors to tokens/words by quantizing the feature space. Quantize via clustering; cluster centers are the visual words Word #2 Descriptor feature space Assign word
More informationSupervised learning. y = f(x) function
Supervised learning y = f(x) output prediction function Image feature Training: given a training set of labeled examples {(x 1,y 1 ),, (x N,y N )}, estimate the prediction function f by minimizing the
More informationPattern recognition (3)
Pattern recognition (3) 1 Things we have discussed until now Statistical pattern recognition Building simple classifiers Supervised classification Minimum distance classifier Bayesian classifier Building
More informationCategory-level localization
Category-level localization Cordelia Schmid Recognition Classification Object present/absent in an image Often presence of a significant amount of background clutter Localization / Detection Localize object
More informationCategory vs. instance recognition
Category vs. instance recognition Category: Find all the people Find all the buildings Often within a single image Often sliding window Instance: Is this face James? Find this specific famous building
More informationFeature Detection. Raul Queiroz Feitosa. 3/30/2017 Feature Detection 1
Feature Detection Raul Queiroz Feitosa 3/30/2017 Feature Detection 1 Objetive This chapter discusses the correspondence problem and presents approaches to solve it. 3/30/2017 Feature Detection 2 Outline
More informationHISTOGRAMS OF ORIENTATIO N GRADIENTS
HISTOGRAMS OF ORIENTATIO N GRADIENTS Histograms of Orientation Gradients Objective: object recognition Basic idea Local shape information often well described by the distribution of intensity gradients
More informationTensor Decomposition of Dense SIFT Descriptors in Object Recognition
Tensor Decomposition of Dense SIFT Descriptors in Object Recognition Tan Vo 1 and Dat Tran 1 and Wanli Ma 1 1- Faculty of Education, Science, Technology and Mathematics University of Canberra, Australia
More informationBSB663 Image Processing Pinar Duygulu. Slides are adapted from Selim Aksoy
BSB663 Image Processing Pinar Duygulu Slides are adapted from Selim Aksoy Image matching Image matching is a fundamental aspect of many problems in computer vision. Object or scene recognition Solving
More informationSIFT - scale-invariant feature transform Konrad Schindler
SIFT - scale-invariant feature transform Konrad Schindler Institute of Geodesy and Photogrammetry Invariant interest points Goal match points between images with very different scale, orientation, projective
More informationLarge Scale Image Retrieval
Large Scale Image Retrieval Ondřej Chum and Jiří Matas Center for Machine Perception Czech Technical University in Prague Features Affine invariant features Efficient descriptors Corresponding regions
More informationArtistic ideation based on computer vision methods
Journal of Theoretical and Applied Computer Science Vol. 6, No. 2, 2012, pp. 72 78 ISSN 2299-2634 http://www.jtacs.org Artistic ideation based on computer vision methods Ferran Reverter, Pilar Rosado,
More informationLocal features: detection and description May 12 th, 2015
Local features: detection and description May 12 th, 2015 Yong Jae Lee UC Davis Announcements PS1 grades up on SmartSite PS1 stats: Mean: 83.26 Standard Dev: 28.51 PS2 deadline extended to Saturday, 11:59
More informationCS 4495 Computer Vision A. Bobick. CS 4495 Computer Vision. Features 2 SIFT descriptor. Aaron Bobick School of Interactive Computing
CS 4495 Computer Vision Features 2 SIFT descriptor Aaron Bobick School of Interactive Computing Administrivia PS 3: Out due Oct 6 th. Features recap: Goal is to find corresponding locations in two images.
More informationLocal Features and Kernels for Classifcation of Texture and Object Categories: A Comprehensive Study
Local Features and Kernels for Classifcation of Texture and Object Categories: A Comprehensive Study J. Zhang 1 M. Marszałek 1 S. Lazebnik 2 C. Schmid 1 1 INRIA Rhône-Alpes, LEAR - GRAVIR Montbonnot, France
More informationInstance-level recognition II.
Reconnaissance d objets et vision artificielle 2010 Instance-level recognition II. Josef Sivic http://www.di.ens.fr/~josef INRIA, WILLOW, ENS/INRIA/CNRS UMR 8548 Laboratoire d Informatique, Ecole Normale
More informationAction recognition in videos
Action recognition in videos Cordelia Schmid INRIA Grenoble Joint work with V. Ferrari, A. Gaidon, Z. Harchaoui, A. Klaeser, A. Prest, H. Wang Action recognition - goal Short actions, i.e. drinking, sit
More informationIndexing local features and instance recognition May 16 th, 2017
Indexing local features and instance recognition May 16 th, 2017 Yong Jae Lee UC Davis Announcements PS2 due next Monday 11:59 am 2 Recap: Features and filters Transforming and describing images; textures,
More informationVideo Google: A Text Retrieval Approach to Object Matching in Videos
Video Google: A Text Retrieval Approach to Object Matching in Videos Josef Sivic, Frederik Schaffalitzky, Andrew Zisserman Visual Geometry Group University of Oxford The vision Enable video, e.g. a feature
More informationFitting: The Hough transform
Fitting: The Hough transform Voting schemes Let each feature vote for all the models that are compatible with it Hopefully the noise features will not vote consistently for any single model Missing data
More informationLocal Image Features
Local Image Features Computer Vision CS 143, Brown Read Szeliski 4.1 James Hays Acknowledgment: Many slides from Derek Hoiem and Grauman&Leibe 2008 AAAI Tutorial This section: correspondence and alignment
More informationRecognition. Topics that we will try to cover:
Recognition Topics that we will try to cover: Indexing for fast retrieval (we still owe this one) Object classification (we did this one already) Neural Networks Object class detection Hough-voting techniques
More informationLecture 16: Object recognition: Part-based generative models
Lecture 16: Object recognition: Part-based generative models Professor Stanford Vision Lab 1 What we will learn today? Introduction Constellation model Weakly supervised training One-shot learning (Problem
More informationMotion illusion, rotating snakes
Motion illusion, rotating snakes Local features: main components 1) Detection: Find a set of distinctive key points. 2) Description: Extract feature descriptor around each interest point as vector. x 1
More informationLocal Feature Detectors
Local Feature Detectors Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr Slides adapted from Cordelia Schmid and David Lowe, CVPR 2003 Tutorial, Matthew Brown,
More informationLocal Features based Object Categories and Object Instances Recognition
Local Features based Object Categories and Object Instances Recognition Eric Nowak Ph.D. thesis defense 17th of March, 2008 1 Thesis in Computer Vision Computer vision is the science and technology of
More informationSURF. Lecture6: SURF and HOG. Integral Image. Feature Evaluation with Integral Image
SURF CSED441:Introduction to Computer Vision (2015S) Lecture6: SURF and HOG Bohyung Han CSE, POSTECH bhhan@postech.ac.kr Speed Up Robust Features (SURF) Simplified version of SIFT Faster computation but
More informationRecognition of Animal Skin Texture Attributes in the Wild. Amey Dharwadker (aap2174) Kai Zhang (kz2213)
Recognition of Animal Skin Texture Attributes in the Wild Amey Dharwadker (aap2174) Kai Zhang (kz2213) Motivation Patterns and textures are have an important role in object description and understanding
More informationCS 4495 Computer Vision Classification 3: Bag of Words. Aaron Bobick School of Interactive Computing
CS 4495 Computer Vision Classification 3: Bag of Words Aaron Bobick School of Interactive Computing Administrivia PS 6 is out. Due Tues Nov 25th, 11:55pm. One more assignment after that Mea culpa This
More informationImproved Spatial Pyramid Matching for Image Classification
Improved Spatial Pyramid Matching for Image Classification Mohammad Shahiduzzaman, Dengsheng Zhang, and Guojun Lu Gippsland School of IT, Monash University, Australia {Shahid.Zaman,Dengsheng.Zhang,Guojun.Lu}@monash.edu
More informationLocal features: detection and description. Local invariant features
Local features: detection and description Local invariant features Detection of interest points Harris corner detection Scale invariant blob detection: LoG Description of local patches SIFT : Histograms
More informationPerception IV: Place Recognition, Line Extraction
Perception IV: Place Recognition, Line Extraction Davide Scaramuzza University of Zurich Margarita Chli, Paul Furgale, Marco Hutter, Roland Siegwart 1 Outline of Today s lecture Place recognition using
More informationCombining Selective Search Segmentation and Random Forest for Image Classification
Combining Selective Search Segmentation and Random Forest for Image Classification Gediminas Bertasius November 24, 2013 1 Problem Statement Random Forest algorithm have been successfully used in many
More information