Lecture 12 Recognition


Davide Scaramuzza
Institute of Informatics, Institute of Neuroinformatics

Today's lab exercise is replaced by the Deep Learning Tutorial (room ETH HG E 1.1, from 13:15 to 15:00).
The optional lab exercise is online: K-means clustering and Bag of Words place recognition.

Outline
Recognition applications and challenges
Recognition approaches
Classifiers
K-means clustering
Bag of words

Application: large-scale image retrieval
[Figure: a query image and the closest results from a database of 100 million images]

Application: recognition for mobile phones
Smartphone: Lincoln (Microsoft Research), Point & Find (Nokia), SnapTell.com (Amazon), Google Goggles

Application: Face recognition
See iPhoto, Google Photos, Facebook.
[Figure: face detection followed by recognition, with the recognized face labeled "Sally"]

Application: Face recognition
Detection works by using four basic types of feature detectors. The white areas are subtracted from the black ones.
A special representation of the sample, called the integral image, makes feature extraction faster.
P. Viola, M. Jones: Robust Real-time Object Detection, Int. Journal of Computer Vision, 2001
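To make the integral image idea concrete, here is a minimal sketch in Python with NumPy. It assumes a zero-padded summed-area table, and the function names (`integral_image`, `box_sum`, `two_rect_feature`) are illustrative rather than taken from the Viola-Jones paper. Once the table is built, the sum over any rectangle costs four lookups, which is why rectangle features become cheap to evaluate.

```python
import numpy as np

def integral_image(img):
    """Zero-padded summed-area table: ii[r, c] = sum of img[:r, :c]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def two_rect_feature(ii, top, left, height, width):
    """Illustrative two-rectangle feature: left-half sum minus right-half sum."""
    half = width // 2
    left_sum = box_sum(ii, top, left, height, half)
    right_sum = box_sum(ii, top, left + half, height, half)
    return left_sum - right_sum

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
inner = box_sum(ii, 1, 1, 2, 2)            # same as img[1:3, 1:3].sum()
feature = two_rect_feature(ii, 0, 0, 4, 4)
```

The table is computed once per image; every later rectangle sum is independent of the rectangle's size.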

Application: Optical character recognition (OCR)
Technology to convert scanned documents to text. If you have a scanner, it probably came with OCR software.
Digit recognition, AT&T Labs, using a CNN, by Yann LeCun (1993): http://yann.lecun.com/
License plate readers: http://en.wikipedia.org/wiki/automatic_number_plate_recognition

Application: pedestrian recognition
Detector: Histograms of oriented gradients (HOG)
Credit: Van Gool's lab, ETH Zurich

Challenges: object intra-class variations
How to recognize ANY car?
How to recognize ANY cow?
How to recognize ANY chair?

Challenges: context and human experience


Research progress in recognition
1960-1990: polygonal objects
1990-2000: faces, characters, planar objects
2000-today: any kind of object

Two schools of approaches
Model based: tries to fit a model (2D or 3D) using a set of corresponding features (lines, point features). Example: SIFT matching and RANSAC for model validation.
Appearance based: the model is defined by a set of images representing the object. Example: template matching can be thought of as a simple object recognition algorithm (the template is the object to recognize).

Example of 2D model-based approach
Q: Is this Book present in the Scene?
Extract keypoints in both images and look for corresponding matches.
Most of the Book's keypoints are present in the Scene.
A: The Book is present in the Scene.

Example of appearance-based approach: Simple 2D template matching
The model of the object is simply an image.
A simple example: template matching. Shift the template over the image and compare (e.g. NCC or SSD).
Problem: works only if template and object are identical.
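The shift-and-compare step can be sketched as a brute-force search. This version uses SSD as the comparison score (lower is better) and assumes grayscale NumPy arrays; the function name `match_template_ssd` is illustrative only.

```python
import numpy as np

def match_template_ssd(image, template):
    """Slide the template over the image; return the best (row, col) and its SSD."""
    ih, iw = image.shape
    th, tw = template.shape
    best_score, best_pos = np.inf, (0, 0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            patch = image[r:r + th, c:c + tw].astype(np.float64)
            score = np.sum((patch - template) ** 2)  # sum of squared differences
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score

rng = np.random.default_rng(0)
image = rng.random((20, 20))
template = image[5:9, 7:12].copy()   # cut the template straight out of the image
pos, score = match_template_ssd(image, template)
```

Because the template here is an exact crop of the image, the SSD at the true location is zero. Any change in scale, rotation, or illumination would raise the score, which is the limitation the slide points out.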


What is the goal of object recognition?
Goal: classify!
Binary classifier: say yes/no as to whether an object is present in an image (e.g. is this or is this not a car?).
Multi-class classifier: categorize an object, i.e. determine what class it belongs to (e.g. car, apple, etc.).
How to display the result to the user: bounding box on the object, or full segmentation.

Detection via classification: Main idea
Basic component: a binary classifier (car/non-car) that answers "No, not a car" or "Yes, a car".
More in detail, we need to:
1. Obtain training data
2. Define features
3. Define classifier
[Figure: training examples, feature extraction, car/non-car classifier]
Consider all subwindows in an image: sample at multiple scales and positions.
Make a decision per window: does this contain object X or not?
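The per-window decision loop can be sketched as follows. The window size, scales, and stride are arbitrary assumptions, and `classifier` stands for any function that returns True or False for an image patch; a real detector would plug a trained car/non-car classifier in here.

```python
import numpy as np

def sliding_windows(img_h, img_w, base=32, scales=(1.0, 2.0), stride_frac=0.5):
    """Yield (top, left, size) for square windows at several scales and positions."""
    for s in scales:
        size = int(round(base * s))
        stride = max(1, int(size * stride_frac))
        for top in range(0, img_h - size + 1, stride):
            for left in range(0, img_w - size + 1, stride):
                yield top, left, size

def detect(image, classifier):
    """Run the binary classifier on every window and keep the positive ones."""
    h, w = image.shape[:2]
    return [(t, l, s) for t, l, s in sliding_windows(h, w)
            if classifier(image[t:t + s, l:l + s])]

# Toy image: a bright 32x32 square on a dark background.
image = np.zeros((64, 64))
image[16:48, 16:48] = 1.0

# Stand-in classifier (an assumption): "object present" if the patch is mostly bright.
is_bright = lambda patch: patch.mean() > 0.9

windows = list(sliding_windows(64, 64))
detections = detect(image, is_bright)
```

Even this toy setting evaluates 10 windows on a 64x64 image; on real images the number of windows grows quickly with resolution and the number of scales.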

Generalization: the machine learning approach
Apply a prediction function to a feature representation of the image to get the desired output:
f(image of an apple) = apple
f(image of a tomato) = tomato
f(image of a cow) = cow

The machine learning framework
$y = f(x)$, where $y$ is the output, $f$ is the prediction function, and $x$ is the input (image features).
Training: given a training set of labeled examples $\{(x_1, y_1), \dots, (x_N, y_N)\}$, estimate the prediction function $f$ by minimizing the prediction error (loss $= y - f(x)$) on the training set.
Testing: apply $f$ to a never-before-seen test example $x$ and output the predicted value $y = f(x)$.

Recognition task and supervision
Images in the training set must be labeled with the correct answer that the model is expected to produce (e.g. "Contains a motorbike").

Examples of possible features
Blob features
Image histograms
Histograms of oriented gradients (HOG)

Classifiers: Nearest neighbor
Features are represented in the descriptor space. (Exercise: what is the dimensionality of the descriptor space for SIFT features?)
[Figure: training examples from class 1, training examples from class 2, and a test example]
f(x) = label of the training example nearest to x
No training required! All we need is a distance function for our inputs.
Problem: we need to compute distances to all training examples! (What if you have 1 million training images and 1 thousand features per image?)
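A minimal sketch of the nearest-neighbor rule, assuming Euclidean distance and toy 2D descriptors (standard SIFT descriptors would be 128-dimensional). The data and the function name are made up for illustration.

```python
import numpy as np

def nearest_neighbor_label(x, train_X, train_y):
    """Return the label of the training example closest to x (Euclidean distance)."""
    d2 = np.sum((train_X - x) ** 2, axis=1)   # squared distance to every example
    return train_y[int(np.argmin(d2))]

# Toy training set: two examples per class.
train_X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.2]])
train_y = np.array([1, 1, 2, 2])

label_a = nearest_neighbor_label(np.array([0.1, 0.0]), train_X, train_y)
label_b = nearest_neighbor_label(np.array([1.1, 1.0]), train_X, train_y)
```

Every query touches every training example, so with 1 million images of 1 thousand features each, a single query feature is compared against $10^9$ stored descriptors. This is the cost that motivates clustering and indexing.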

Classifiers: Linear
Find a linear function to separate the classes: $f(x) = \mathrm{sgn}(w \cdot x + b)$
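As a small illustration of the linear decision rule, the sketch below evaluates sgn(w · x + b) for hand-picked weights. The values of `w` and `b` are assumptions, not learned parameters, and the case where the score is exactly zero is mapped to +1 by convention.

```python
import numpy as np

def linear_classify(x, w, b):
    """Return +1 or -1 depending on which side of the hyperplane w.x + b = 0 x lies."""
    return 1 if np.dot(w, x) + b >= 0 else -1

w = np.array([1.0, -1.0])   # hand-picked weights: separates points by x0 > x1
b = 0.0

side_a = linear_classify(np.array([2.0, 1.0]), w, b)
side_b = linear_classify(np.array([1.0, 2.0]), w, b)
```

Classifying a new example costs a single dot product, independent of the size of the training set.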

Classifiers: non-linear
[Figure: a good classifier versus a bad classifier (overfitting)]


How do we define a classifier?
We first need to cluster the training data.
Then, we need a distance function to determine which cluster the query image belongs to.

K-means clustering
k-means clustering is an algorithm to partition $n$ observations into $k$ clusters, in which each observation $x$ belongs to the cluster $S_i$ with center $m_i$.
It minimizes the sum of squared Euclidean distances between the points $x$ and their nearest cluster centers $m_i$:
$$D(X, M) = \sum_{i=1}^{k} \sum_{x \in S_i} (x - m_i)^2$$
Algorithm:
Randomly initialize $k$ cluster centers.
Iterate until convergence:
  Assign each data point $x_j$ to the nearest center $m_i$.
  Recompute each cluster center as the mean of all points assigned to it.
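The two alternating steps can be sketched in a few lines of NumPy. This is a plain version under some assumptions: centers are initialized by picking k distinct data points at random, an empty cluster keeps its previous center, and iteration stops when the centers no longer move. The function name and toy data are illustrative.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Basic k-means; returns (centers, labels, cost) with cost = D(X, M)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(np.float64)
    for _ in range(n_iter):
        # Assignment step: squared distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points.
        new_centers = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    cost = d2.min(axis=1).sum()   # sum of squared distances to the nearest center
    return centers, labels, cost

# Two well-separated toy blobs.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, labels, cost = kmeans(X, k=2)
```

k-means only finds a local minimum of the cost, so on less clearly separated data the result can depend on the random initialization; running it several times and keeping the lowest cost is a common remedy.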

K-means demo
Source: http://shabal.in/visuals/kmeans/1.html


Application: large-scale image retrieval
[Figures: a query image and the results on a database of 100 million images. Slide credit: Nister]

Fast visual search
Query of 1 image in a database of 100 million images in 6 seconds.
"Video Google", Sivic and Zisserman, ICCV 2003
"Scalable Recognition with a Vocabulary Tree", Nister and Stewenius, CVPR 2006

Bag of Words
Extension to scene/place recognition: Is this image in my database?
Robot: Have I been to this place before?

Visual Place Recognition
Goal: find the images most similar to a query image in a database of $N$ images.
Complexity: $N^2 M^2 / 2$ feature comparisons (assuming each image has $M$ features), because each image must be compared with all other images. $N$ is the number of all images collected by a robot.
Example: assume your Roomba robot takes 1 image every meter to cover a 100 m² house, with 100 SIFT features per image. Then $M = 100$, $N = 100$, and $N^2 M^2 / 2 = 50$ million feature comparisons!
Solution: use an inverted file index! Complexity reduces to $N \cdot M$.
["Video Google", Sivic & Zisserman, ICCV 03] ["Scalable Recognition with a Vocabulary Tree", Nister & Stewenius, CVPR 06] [See also FABMAP and Galvez-Lopez '12 (DBoW2)]
University of Zurich, Robotics and Perception Group - rpg.ifi.uzh.ch
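The arithmetic of the Roomba example can be checked directly. The snippet below just evaluates the two complexity expressions from the slide for N = 100 and M = 100; the variable names are illustrative.

```python
N = 100   # images in the database (one per square meter of a 100 m^2 house)
M = 100   # features per image

brute_force = N**2 * M**2 // 2   # all-against-all feature comparisons
inverted = N * M                 # with an inverted file index

speedup = brute_force // inverted
```

For these values the brute-force count is 50,000,000 comparisons against 10,000 with the inverted file index, a factor of 5,000.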

Indexing local features: inverted file text
For text documents, an efficient way to find all pages in which a word occurs is to use an index.
We want to find all images in which a feature occurs.
How many distinct SIFT or BRISK features exist?
SIFT: infinite
BRISK-128: $2^{128} \approx 3.4 \times 10^{38}$
Since the number of image features may be infinite, before we build our visual vocabulary we need to map our features to visual words.
Using analogies from text retrieval, we should:
Define a Visual Word
Define a vocabulary of Visual Words
This approach is known as Bag of Words (BOW).

Building the Visual Vocabulary
Pipeline: image collection, extract features, cluster descriptors in the descriptor space.
What is a visual word? A visual word is the centroid of a cluster!
[Figure: examples of features belonging to the same clusters]
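Once the cluster centroids are known, describing an image with visual words amounts to assigning each descriptor to its nearest centroid and counting. The sketch below assumes a tiny hand-made vocabulary of three 2D centroids; in practice the centroids would come from k-means on real descriptors, and the function names are illustrative.

```python
import numpy as np

def quantize(descriptors, vocabulary):
    """Map each descriptor to the index of its nearest centroid (its visual word)."""
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bow_histogram(descriptors, vocabulary):
    """Count how often each visual word occurs in one image."""
    words = quantize(descriptors, vocabulary)
    return np.bincount(words, minlength=len(vocabulary))

# Hypothetical vocabulary: three centroids in a 2D descriptor space.
vocabulary = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
# Hypothetical descriptors extracted from one image.
descriptors = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [4.7, 5.2]])

words = quantize(descriptors, vocabulary)
hist = bow_histogram(descriptors, vocabulary)
```

The histogram has one bin per visual word, so its length is fixed by the vocabulary size regardless of how many features an image contains.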

Inverted File index
The Inverted File Index lists all visual words in the vocabulary (extracted at training time).
Each word points to a list of images, from the entire image database (DB), in which that word appears. The DB grows as the robot navigates and collects new images.
Voting array: has as many cells as there are images in the DB. Each word in the query image votes for every image in its list.
[Figure: query image Q with visual words 101, 103, 105, 105, 180, 180, 180; inverted file index with visual words 0, ..., 101, 102, 103, 104, 105, ..., each pointing to the list of images in which this word appears; voting array for Q receiving +1 votes]
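The index and the voting array can be sketched with a dictionary from visual word to the set of images containing it. The query words are the ones shown on the slide; the three database images and their words are invented for illustration, and this sketch assumes that every occurrence of a word in the query casts one vote.

```python
from collections import defaultdict

def build_inverted_index(db_images):
    """db_images maps image_id -> visual words; returns word -> set of image ids."""
    index = defaultdict(set)
    for image_id, words in db_images.items():
        for w in words:
            index[w].add(image_id)
    return index

def vote(index, query_words, n_images):
    """Each query word adds +1 to every database image that contains it."""
    votes = [0] * n_images
    for w in query_words:
        for image_id in index.get(w, ()):
            votes[image_id] += 1
    return votes

# Hypothetical database of three images.
db = {0: [101, 102], 1: [103, 105, 180], 2: [7, 105]}
index = build_inverted_index(db)

query = [101, 103, 105, 105, 180, 180, 180]   # visual words in Q (from the slide)
votes = vote(index, query, n_images=len(db))
best = max(range(len(votes)), key=votes.__getitem__)
```

Only the lists of the words that actually occur in the query are visited, so images sharing no word with the query are never touched.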

Populating the vocabulary
[Figure sequence: feature descriptor space]

Building the inverted file index
[Figure sequence]

Recognition

Robust object/scene recognition
The Visual Vocabulary discards the spatial relationships between features: two images with the same features shuffled around will return a 100% match when using only appearance information.
This can be overcome using geometric verification: test the h most similar images to the query image for geometric consistency (e.g. using 5- or 8-point RANSAC) and retain the image with the smallest reprojection error and the largest number of inliers.
Further reading (out of scope of this course): [Cummins and Newman, IJRR 2011] [Stewénius et al, ECCV 2012]
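The "shuffled features" problem is easy to reproduce. The sketch below builds two synthetic images that contain the same visual words at different positions and shows that their Bag of Words histograms are identical. All data is randomly generated for illustration, and the geometric verification step itself (RANSAC) is not implemented here.

```python
import numpy as np

def bow_histogram(features, vocab_size):
    """features: list of (word_id, x, y); the positions x, y are deliberately ignored."""
    words = np.array([w for w, _, _ in features])
    return np.bincount(words, minlength=vocab_size)

rng = np.random.default_rng(0)
word_ids = rng.integers(0, 10, size=50)   # one visual word per feature
pos_a = rng.random((50, 2))               # feature positions in image A
pos_b = rng.permutation(pos_a)            # same positions, reassigned to other features

image_a = [(int(w), x, y) for w, (x, y) in zip(word_ids, pos_a)]
image_b = [(int(w), x, y) for w, (x, y) in zip(word_ids, pos_b)]

hist_a = bow_histogram(image_a, 10)
hist_b = bow_histogram(image_b, 10)
same_appearance = bool(np.array_equal(hist_a, hist_b))
```

Since the histograms are equal, any appearance-only similarity score reports a perfect match, which is why the top candidates are re-checked for geometric consistency.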

Video Google System
1. Collect all words within the query region
2. Use the inverted file index to find relevant frames
3. Compare word counts
4. Spatial verification
[Figure: query region and retrieved frames]
Sivic & Zisserman, ICCV 2003
Demo online at: http://www.robots.ox.ac.uk/~vgg/research/vgoogle/

More words is better
[Plots: more words improves retrieval and improves speed; axis: branch factor]

FABMAP [Cummins and Newman, IJRR 2011]
Place recognition for robot localization.
Uses training images to build the BoW database.
Captures the spatial dependencies of visual words to distinguish the most characteristic structure of each scene.
Probabilistic model of the world. At a new frame, compute P(being at a known place) and P(being at a new place).
Very high performance.
Binaries available online.
Open FABMAP

Things to remember
Appearance-based object recognition
Classifiers
K-means clustering
Bag of Words approach
What a visual word is
Inverted file index
How it works
Chapter 14 of Szeliski's book