
Recognition
Topics that we will try to cover:
- Indexing for fast retrieval (we still owe this one)
- Object classification (we did this one already): Neural Networks
- Object class detection: Hough-voting techniques; Support Vector Machine (SVM) detector on HOG features; Deformable part-based model (DPM); R-CNN (detector with Neural Networks)
- Segmentation: unsupervised segmentation (bottom-up techniques); supervised segmentation (top-down techniques)
Sanja Fidler, CSC420: Intro to Image Understanding, 1/31

Recognition: Indexing for Fast Retrieval

Recognizing or Retrieving Specific Objects
Example: Visual search in feature films. Demo: http://www.robots.ox.ac.uk/~vgg/research/vgoogle/ [Source: J. Sivic, slide credit: R. Urtasun]

Recognizing or Retrieving Specific Objects
Example: Search photos on the web for particular places. [Source: J. Sivic, slide credit: R. Urtasun]

Why is it Difficult?
Objects can undergo large changes in scale, viewpoint and lighting, as well as partial occlusion. [Source: J. Sivic, slide credit: R. Urtasun]

Why is it Difficult?
There is a huge amount of data.

Our Case: Matching with Local Features
For each image in our database we extract local descriptors (e.g., SIFT).
Let's focus on the descriptors only (vectors of, e.g., 128 dimensions for SIFT).

Indexing!

Indexing Local Features: Inverted File Index
For text documents, an efficient way to find all pages on which a word occurs is to use an index. We want to find all images in which a feature occurs. To use this idea, we'll need to map our features to visual words. Why? [Source: K. Grauman, slide credit: R. Urtasun]
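To make the text-retrieval analogy concrete, here is a minimal inverted file index sketched in Python (all names and data are made up for illustration): each visual word id maps to the set of database images that contain it, so finding candidate images for a query is just a few set lookups.

```python
# Toy inverted file index: visual word id -> set of images containing it.
from collections import defaultdict

def build_inverted_index(db_words):
    """db_words: {image_id: list of visual word ids found in that image}."""
    index = defaultdict(set)
    for image_id, words in db_words.items():
        for w in words:
            index[w].add(image_id)
    return index

def candidate_images(index, query_words):
    """All database images sharing at least one visual word with the query."""
    hits = set()
    for w in set(query_words):
        hits |= index.get(w, set())
    return hits

db = {"img1": [3, 7, 7, 12], "img2": [5, 7], "img3": [1, 2]}
index = build_inverted_index(db)
print(candidate_images(index, [7, 9]))  # img1 and img2 share word 7
```

Lookup cost depends on how many images each query word touches, not on the total database size, which is what makes the index attractive at scale.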

How would visual words help us?

But What Are Our Visual Words?

Visual Words
All example patches on the right belong to the same visual word. [Source: R. Urtasun]

Now We Can Do Our Fast Matching

Inverted File Index
Now we have found all images in the database that have at least one visual word in common with the query image. But this can still give us lots of images... What can we do?
Idea: Compute a meaningful similarity (efficiently) between the query image and the retrieved images. Then match the query only to the top K most similar images and forget about the rest.
How can we compute a meaningful similarity, and do it fast?

Relation to Documents [Slide credit: R. Urtasun]

Bags of Visual Words
Summarize the entire image based on its distribution (histogram) of word occurrences. Analogous to the bag-of-words representation commonly used for documents. [Slide credit: R. Urtasun]

Compute a Bag-of-Words Description
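The bag-of-words step can be sketched in a few lines of Python. This is an illustrative toy (2-D "descriptors", hand-picked "visual words"); a real system would use k-means centers and a fast nearest-neighbor library instead of the brute-force loop here.

```python
# Minimal bag-of-words sketch: assign each descriptor to its nearest
# visual word (cluster center) and count occurrences per word.

def nearest_word(desc, words):
    """Index of the visual word (center) closest to descriptor desc."""
    dists = [sum((d - w) ** 2 for d, w in zip(desc, word)) for word in words]
    return dists.index(min(dists))

def bow_histogram(descriptors, words):
    """Histogram with one bin per visual word."""
    hist = [0] * len(words)
    for desc in descriptors:
        hist[nearest_word(desc, words)] += 1
    return hist

words = [(0.0, 0.0), (1.0, 1.0)]             # two "visual words" in 2-D
descs = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.9)]  # three image descriptors
print(bow_histogram(descs, words))            # [1, 2]
```

The histogram has as many dimensions as there are visual words, regardless of how many descriptors the image produced.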

Comparing Images
Compute the similarity as the normalized dot product between the image representations (vectors):
sim(t_j, q) = <t_j, q> / (||t_j|| ||q||)
Rank the images in the database by this similarity score (the higher the better). Take the top K ranked images and do spatial verification (compute a transformation and count inliers).
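The normalized dot product above is just cosine similarity. A small, self-contained sketch (toy BoW vectors, illustrative image names) of scoring and ranking a database against a query:

```python
# Cosine similarity between BoW vectors, then rank a toy database by score.
import math

def cosine_sim(t, q):
    """Normalized dot product <t, q> / (||t|| ||q||)."""
    dot = sum(a * b for a, b in zip(t, q))
    norm = math.sqrt(sum(a * a for a in t)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

query = [2, 0, 1]
db = {"img1": [2, 0, 1], "img2": [0, 3, 0], "img3": [1, 0, 1]}
ranked = sorted(db, key=lambda k: cosine_sim(db[k], query), reverse=True)
print(ranked)  # ['img1', 'img3', 'img2']: img2 shares no words with the query
```

Normalizing by the vector lengths keeps an image from ranking highly merely because it contains many features.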

Compute a Better Bag-of-Words Description
Instead of a raw histogram, for retrieval it's better to re-weight the image description vector t = [t_1, t_2, ..., t_i, ...] with term frequency-inverse document frequency (tf-idf), a standard trick in document retrieval:
t_i = (n_id / n_d) * log(N / n_i)
where:
n_id is the number of occurrences of word i in image d
n_d is the total number of words in image d
n_i is the number of occurrences of word i in the whole database
N is the number of documents (images) in the whole database
The weighting is a product of two terms: the word frequency n_id / n_d, and the inverse document frequency log(N / n_i).
Intuition: the word frequency upweights words that occur often in a particular document and thus describe it well, while the inverse document frequency downweights words that occur often across the full dataset.
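The tf-idf formula above translates directly into code. A sketch with made-up counts, following the slide's definitions of n_id, n_d, n_i and N:

```python
# tf-idf re-weighting of raw BoW counts: t_i = (n_id / n_d) * log(N / n_i).
import math

def tfidf(raw_hist, word_db_counts, num_images):
    """raw_hist[i] = n_id, word_db_counts[i] = n_i, num_images = N."""
    n_d = sum(raw_hist)  # total number of words in this image
    return [
        (n_id / n_d) * math.log(num_images / n_i) if n_i else 0.0
        for n_id, n_i in zip(raw_hist, word_db_counts)
    ]

hist = [4, 1, 0]          # n_id: word counts in one image
db_counts = [100, 3, 10]  # n_i: occurrences of each word across the database
print(tfidf(hist, db_counts, num_images=100))
# Word 0 occurs everywhere (n_i = N), so its weight is 0; rare word 1 dominates.
```

This shows the intuition numerically: the frequent-everywhere word is zeroed out even though it dominates the raw histogram.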

Comparing Images
Compute the similarity as the normalized dot product between the tf-idf representations (vectors):
sim(t_j, q) = <t_j, q> / (||t_j|| ||q||)
Rank the images in the database by this similarity score (the higher the better). Take the top K ranked images and do spatial verification (compute a transformation and count inliers).

Spatial Verification
Both image pairs have many visual words in common, but only some of the matches are mutually (geometrically) consistent. [Source: O. Chum]

Visual Words / Bags of Words
Good:
- flexible to geometry / deformations / viewpoint
- compact summary of image content
- provides a vector representation for sets
- good results in practice
Bad:
- background and foreground mixed when the bag covers the whole image
- optimal vocabulary formation remains unclear
- basic model ignores geometry: must verify afterwards, or encode it via features

Summary: Stuff You Need To Know
Fast image retrieval:
- Compute features in all database images and in the query image.
- Cluster the descriptors from the database images (e.g., with k-means) to get k clusters. These cluster centers are vectors that live in the same dimensional space as the descriptors. We call them visual words.
- Assign each descriptor in the database and in the query image to its closest cluster.
- Build an inverted file index.
- For a query image, look up all its visual words in the inverted file index to get a list of images that share at least one visual word with the query.
- Compute a bag-of-words (BoW) vector for each retrieved image and for the query. This vector simply counts the number of occurrences of each word; it has as many dimensions as there are visual words. Weight the vector with tf-idf.
- Compute the similarity between the query BoW vector and all retrieved image BoW vectors. Sort (highest to lowest). Take the top K most similar images (e.g., 100).
- Do spatial verification on all top K retrieved images (RANSAC + a homography; remove images with too few inliers).

Summary: Stuff You Need To Know
Matlab function:
[IDX, W] = kmeans(X, k);
where the rows of X are descriptors, the rows of W are the visual word vectors, and IDX holds the assignment of each row of X to a visual word.
Once you have W, you can quickly compute IDX via the dist2 function (Assignment 2):
D = dist2(X, W); [~, IDX] = min(D, [], 2);
A much faster way of computing the closest cluster (IDX) is the FLANN library: http://www.cs.ubc.ca/research/flann/
Since X is typically very large, kmeans may run for days... One solution is to randomly sample a subset of the descriptors in X and cluster those. Another great option is: http://www.robots.ox.ac.uk/~vgg/software/fastanncluster/

Even Faster?
Can we make the retrieval process even more efficient?

Vocabulary Trees
Hierarchical clustering for large vocabularies [Nister et al., 2006]. k defines the branch factor (number of children of each node) of the tree.
First, an initial k-means process is run on the training data, defining k cluster centers (same as we did before). The same process is then recursively applied to each group. The tree is built level by level, up to some maximum number of levels L.
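The recursive construction can be sketched compactly. The toy below uses 1-D "descriptors" and a deliberately simple k-means (fixed iteration count, random init); the structure, not the clustering quality, is the point, and all names are illustrative.

```python
# Hierarchical k-means for a vocabulary tree: cluster, then recurse
# into each of the k groups until depth L (or a group gets too small).
import random

def assign(points, centres):
    """Split points into one group per centre (nearest-centre rule)."""
    groups = [[] for _ in centres]
    for p in points:
        j = min(range(len(centres)), key=lambda i: (p - centres[i]) ** 2)
        groups[j].append(p)
    return groups

def kmeans(points, k, iters=20):
    centres = random.sample(points, k)
    for _ in range(iters):
        groups = assign(points, centres)
        centres = [sum(g) / len(g) if g else centres[i]
                   for i, g in enumerate(groups)]
    return centres, assign(points, centres)

def build_tree(points, k, levels):
    """Node = k centres plus one subtree per centre; leaves are the words."""
    if levels == 0 or len(points) < k:
        return None
    centres, groups = kmeans(points, k)
    return {"centres": centres,
            "children": [build_tree(g, k, levels - 1) for g in groups]}

random.seed(0)
points = [0.0, 0.1, 0.2, 9.9, 10.0, 10.1]   # two obvious 1-D clusters
tree = build_tree(points, k=2, levels=2)
print([round(c, 1) for c in sorted(tree["centres"])])  # [0.1, 10.0]
```

With branch factor k and L levels, the tree defines up to k^L leaf words while each level only ever runs k-means on the points of one group.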

Constructing the Tree
Offline phase: hierarchical clustering (e.g., k-means at each level) produces the vocabulary tree.

Assigning Descriptors to Words
Each descriptor vector is propagated down the tree: at each level we compare the descriptor vector to the k candidate cluster centers (represented by the k children of the current node) and choose the closest one.
The tree allows us to efficiently match a descriptor against a very large vocabulary.
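The descent described above is a short recursion. In this 1-D toy (hand-built tree, illustrative values), a lookup costs k comparisons per level, i.e. k*L in total, instead of the k^L comparisons a flat vocabulary of the same size would need.

```python
# Descending the vocabulary tree: at each level pick the closest of the
# k children; the sequence of branch choices identifies the visual word.

def nearest(x, centres):
    return min(range(len(centres)), key=lambda i: (x - centres[i]) ** 2)

def lookup(tree, x, path=()):
    """Return the tuple of branch choices; at a leaf, that is the word id."""
    if tree is None:
        return path
    j = nearest(x, tree["centres"])
    return lookup(tree["children"][j], x, path + (j,))

# Hand-built 1-D tree: root splits around 0 vs 10; each side splits again.
tree = {
    "centres": [0.0, 10.0],
    "children": [
        {"centres": [-1.0, 1.0], "children": [None, None]},
        {"centres": [9.0, 11.0], "children": [None, None]},
    ],
}
print(lookup(tree, 10.8))  # (1, 1): right branch at the root, then right again
```

Note the descent is greedy: a descriptor near a boundary can end up in a slightly "wrong" leaf, which is one price paid for the speed.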

Vocabulary Size
Complexity is governed by the branching factor k and the number of levels L. Most important for retrieval quality is to have a large vocabulary.

Next Time: Object Detection