Visual Object Recognition - 67777
Instructor: Daphna Weinshall, daphna@cs.huji.ac.il
Office: Ross 211
Office hours: Sunday 12:00-13:00
Sources
Recognizing and Learning Object Categories, ICCV 2005 short courses: Li Fei-Fei (UIUC), Rob Fergus (Oxford-MIT), Antonio Torralba (MIT), http://people.csail.mit.edu/torralba/iccv2005
Object Recognition (UPenn, CSE399b, Spring 2007): Jianbo Shi (UPenn)
Recognition and Matching Based on Local Invariant Features: Cordelia Schmid (INRIA), David Lowe (UBC)
Visual Object Recognition, AAAI tutorial, July 2008: Bastian Leibe (ETH Zurich), Kristen Grauman (U Texas, Austin)
What is Visual Recognition About?
Finding and recognizing objects in a picture
  o This is a car (class recognition)
  o Look, there is Obama (object identification)
  o Shmuel is the person on the left (localization)
Problems: the same object may appear very different
  o Change of viewpoint (facial profile vs. fronto-parallel)
  o Change of illumination
  o Object articulation (a person lifting up his hand)
  o Intra-class variability (there are many types of cars)
Object Identification?
Object Categorization
How to recognize ANY car
How to recognize ANY cow
What could be done with recognition algorithms?
There is a wide range of applications, including:
  o Autonomous robots
  o Navigation, driver safety
  o Content-based retrieval and analysis for images and videos
  o Situated search
  o Medical image analysis
Object Categorization
Task description
  o Given a small number of training images of a category, recognize a-priori unknown instances of that category and assign the correct category label.
Which categories are feasible visually?
  o Extensively studied in Cognitive Psychology, e.g. [Brown 58]
(figure: levels of abstraction for one object: Fido, German shepherd, dog, animal, living being)
Visual Object Categories
Basic-level categories in human categorization [Rosch 76, Lakoff 87]
  o The highest level at which category members have similar perceived shape
  o The highest level at which a single mental image reflects the entire category
  o The level at which human subjects are usually fastest at identifying category members
  o The first level named and understood by children
  o The highest level at which a person uses similar motor actions for interaction with category members
Visual Object Categories
Basic-level categories in humans seem to be defined predominantly visually.
There is evidence that humans (usually) start with basic-level categorization before doing identification.
  o Basic-level categorization is easier and faster for humans than object identification!
  o Most promising starting point for visual classification
  o Individual-level recognition is easier for most automatic methods
(figure: category hierarchy running from abstract levels (animal, quadruped) through the basic level (dog, cat, cow) down to the individual level (German shepherd, Doberman, Fido))
Other Types of Categories
Functional categories
  o e.g., chairs = something you can sit on
Other Types of Categories
Ad-hoc categories
  o e.g., something you can find in an office environment
Levels of Object Categorization
(example images: cow, car, motorbike)
Different levels of recognition
  o Which object class is in the image? (object/image classification)
  o Where is it in the image? (detection/localization)
  o Where exactly, which pixels? (figure/ground segmentation)
What makes visual object recognition difficult?
  o Geometrical transformations (viewpoint: translation, rotation, scale)
  o Photometric transformations (illumination)
  o Deformations
  o Intra-class variations
  o Scene obstructions: occlusion, clutter
pose (Michelangelo, 1475-1564)
scale
illumination (slide credit: S. Ullman)
deformations (Xu Beihong, 1943)
intra-class variation
occlusion (Magritte, 1957)
background clutter (Klimt, 1913)
Context
Context (image credit: D. Hoiem)
Context
Example: Detection in Crowded Scenes
Learn object variability
  o Changes in appearance, scale, and articulation
Compensate for clutter, overlap, and occlusion
Teachers are scarce: how much supervision?
(figure: spectrum of supervision, from less to more)
Rough evolution of focus in recognition research
1980s: rigid planar objects
1990s to early 2000s
Currently: higher level categories
Recognition system
Feature extraction: use a set of basic features
Learning: learn a representation of objects from a set of training images (the set is sometimes of minimal size, sometimes larger)
Recognition: decide whether the object exists in the image
  o Detection
  o Localization
  o Segmentation
Object or image representation
A picture is a function (array) of grey level values I(x,y)
(figure: plots of the grey level I(x,y) over the image coordinates x and y)
First challenge: obtain an informative representation
  o Global representation
  o Local features representation
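To make "a picture is a function (array) of grey level values" concrete, here is a minimal Python sketch. The tiny image and the helper names `intensity`, `global_vector`, and `local_patch` are illustrative assumptions, not part of the course material.

```python
# A tiny grey-level image stored as a 2D array: image[y][x] holds I(x, y) in 0..255.
# The pixel values are made up for illustration.
image = [
    [10,  10,  10, 10],
    [10, 200, 250, 10],
    [10, 200, 250, 10],
    [10,  10,  10, 10],
]

def intensity(img, x, y):
    """Evaluate the image function I(x, y)."""
    return img[y][x]

def global_vector(img):
    """Global representation: every pixel of the image in one ordered vector."""
    return [value for row in img for value in row]

def local_patch(img, x, y, size):
    """Local representation: the size-by-size patch whose top-left corner is (x, y)."""
    return [row[x:x + size] for row in img[y:y + size]]
```

The global vector changes whenever any pixel changes, while a local patch depends only on the pixels it covers.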
Global representations: limitations
All parts of the image or window impact the description -> sensitive to occlusion, clutter, viewpoint
Local representations
Describe component regions or patches separately:
  o SIFT [Lowe 99]
  o Salient regions [Kadir 01]
  o Shape context [Belongie 02]
  o Harris-Affine [Mikolajczyk 04]
  o Superpixels [Ren et al.]
  o Spin images [Johnson 99]
  o Maximally Stable Extremal Regions [Matas 02]
  o Geometric Blur [Berg 05]
Main issues with feature detection:
1. Where features are detected:
  o A single feature for the whole object (global representation)
  o On a grid
  o Random sampling
  o At interest points
2. How feature regions are described:
  o Fixed window (suitable for global representation)
  o Vary size of window
  o Vary shape of window
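Two of the sampling strategies listed above, grid and random sampling, can be sketched in a few lines of Python. The function names and parameters are hypothetical. Interest-point detection is left out because it needs an actual detector.

```python
import random

def grid_positions(width, height, step):
    """Feature locations on a regular grid with the given spacing."""
    return [(x, y) for y in range(0, height, step) for x in range(0, width, step)]

def random_positions(width, height, count, seed=0):
    """Feature locations sampled uniformly at random (seeded for repeatability)."""
    rng = random.Random(seed)
    return [(rng.randrange(width), rng.randrange(height)) for _ in range(count)]
```

For example, `grid_positions(4, 4, 2)` yields the four locations (0, 0), (2, 0), (0, 2), (2, 2). A fixed-size window can then be described at each location.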
Learning representation
When the number of features is not prohibitive (e.g., number of pixels, quantized color values); suitable for global representation:
  o Vector (ordered list) of all features
  o Histogram (distribution) of feature values
When the group of possible features is very large:
  o Set of features in the object
  o Set of object features, related to each other by a graphical model
  o Set of object features, where the image is some projection transformation of those features
This representation is learned, or computed, from a set of training images
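As a sketch of the histogram option, the following Python function computes a normalized grey-level histogram. The bin count and the name `grey_histogram` are assumptions chosen for illustration.

```python
def grey_histogram(img, bins=4, max_value=256):
    """Normalized histogram of grey levels: the fraction of pixels falling in each bin."""
    hist = [0] * bins
    bin_width = max_value / bins
    count = 0
    for row in img:
        for value in row:
            # Clamp so that a value equal to max_value still lands in the last bin.
            hist[min(int(value / bin_width), bins - 1)] += 1
            count += 1
    return [h / count for h in hist]
```

Unlike the ordered vector, the histogram discards pixel positions: `[[0, 0], [255, 128]]` and `[[0, 255], [128, 0]]` give the same result with `bins=2`.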
Recognition
Global representation: sliding window
Car/non-car classifier
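The sliding-window scheme can be sketched as follows. The classifier here is a hypothetical stand-in (a mean-brightness threshold) used only so the example runs. A real car/non-car classifier would be learned from training images.

```python
def sliding_windows(width, height, win, step):
    """Yield the top-left corner of every win-by-win window that fits in the image."""
    for y in range(0, height - win + 1, step):
        for x in range(0, width - win + 1, step):
            yield x, y

def toy_classifier(window):
    """Hypothetical stand-in for a trained classifier: fires on bright windows."""
    values = [v for row in window for v in row]
    return sum(values) / len(values) > 128

def detect(img, win, step=1):
    """Run the classifier on every window and return the corners where it fires."""
    height, width = len(img), len(img[0])
    hits = []
    for x, y in sliding_windows(width, height, win, step):
        window = [row[x:x + win] for row in img[y:y + win]]
        if toy_classifier(window):
            hits.append((x, y))
    return hits

# Toy scene: a bright 2x2 "object" on a dark background (values are made up).
scene = [
    [10,  10,  10, 10],
    [10, 200, 200, 10],
    [10, 200, 200, 10],
    [10,  10,  10, 10],
]
```

Only the window that covers the bright block exactly has a mean above the threshold, so `detect(scene, win=2)` returns the single corner (1, 1).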
Discussion
Global representations have major limitations
Alternatively, describe and match only local regions
Increased robustness to:
  o Occlusions
  o Articulation
  o Intra-category variations
(figure labels: d, φ, θ, q)
Recognition
Representation with local features:
  o Compute local features in the image
  o Intersect the set of image features with the set of object features, or
  o Compute the probability of the image features based on the object's graphical model, or
  o Check whether the image is a permissible transformation of the object: invariants, indexing + verification, geometric hashing (suitable for recognition at the individual level)
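The set-intersection option can be sketched as below, assuming each local feature has already been quantized to a discrete label. The string labels, the scoring rule, and the threshold are all illustrative assumptions rather than the method taught in the course.

```python
def match_score(image_features, object_features):
    """Fraction of the object's features that also appear in the image."""
    object_set = set(object_features)
    return len(set(image_features) & object_set) / len(object_set)

def recognize(image_features, models, threshold=0.5):
    """Return the best-matching object name, or None if no model scores high enough."""
    scores = {name: match_score(image_features, feats) for name, feats in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

# Hypothetical object models, each a set of quantized feature labels.
models = {
    "car": {"wheel", "window", "headlight"},
    "cow": {"horn", "leg", "tail", "spot"},
}
```

With these toy models, an image containing the features `{"wheel", "window", "tree"}` matches two of the three car features and is labeled "car". An image containing only `{"tree"}` matches nothing and yields `None`.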
Organization of the course
About 4 classes of frontal lectures, discussing:
  o Features: detectors (part 1) and descriptors (part 2)
  o Representation of objects: vectors, histograms, bag of visual words, constellation models, 3D models (learning)
  o Recognition of objects: probabilistic approaches, geometrical approaches (inference)
Remaining classes: student presentations of recent papers on object recognition (seminar format), 2-3 papers per class
Final project