Colorado School of Mines. Computer Vision. Professor William Hoff, Dept of Electrical Engineering & Computer Science.


Professor William Hoff, Dept of Electrical Engineering & Computer Science. http://inside.mines.edu/~whoff/

Object Recognition in Large Databases. Some material for these slides comes from www.cs.utexas.edu/~grauman/courses/spring2011/slides/lecture18_index.pptx

Object Recognition. We have seen methods to do object recognition by matching a query image to a database image: extract features from each image, match their descriptors, and impose a constraint (such as a homography or the fundamental matrix) to eliminate mismatches. If a sufficient number of matches remain, we have found our object! Problem: as the database gets large, the time it takes to match a new image against each database image can become prohibitive.

Example Application: Location Recognition. Match a new image to a database of images to determine where the camera was when it took the picture. What uses can this have?

Indoor Localization. GPS is not available indoors. Chris Card, Qualitative Image Based Localization in a Large Building, MS Thesis, 2015.

Similarity of Appearance. In a large building, there can be many locations that have a similar appearance: walls and floors often have little or no texture, and doors look very similar.

Brown Hall Mapping (Chris Card). 1st, 2nd, and 3rd floors of Brown Hall; 1,382 images taken at known locations. Given a new image, match it to an image in the database. C. Card and W. Hoff, "Qualitative Image Based Localization in a Large Building," Proc. of 19th International Conference on Image Processing, Computer Vision, & Pattern Recognition, 2015.


Approach. Instead of comparing the query image to every image in the database one at a time, first narrow down the search to a few likely candidate matching images. Then use a more detailed verification stage on those.

Approach. Create an index of feature descriptors: a table containing the features, along with the images in which the features appeared, similar to the index in a text document. We'll look at two methods: hashing and bag of words.

Hashing. Transform a feature descriptor into a shorter key that indexes into a table. Store the feature keypoint there, along with the id of the image it came from. For example, the hash table might hold entries such as (feature 11, image 3), (feature 21, image 7), and (feature 65, image 7).
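The storage side of this idea can be sketched in a few lines. This is only an illustration, not Card's actual scheme: the hypothetical `hash_key` function simply keeps the high bits of each descriptor dimension, so similar descriptors often (but not always) land in the same bucket.

```python
def hash_key(descriptor, bits=4):
    """Map a descriptor (tuple of 0-255 ints) to a short key by keeping high bits."""
    return tuple(d >> (8 - bits) for d in descriptor)

def build_index(features):
    """features: list of (descriptor, image_id) pairs -> dict from key to image ids."""
    table = {}
    for desc, image_id in features:
        table.setdefault(hash_key(desc), []).append(image_id)
    return table

# Toy 3-dimensional descriptors; the first two are near-identical,
# so they hash to the same bucket.
features = [((200, 16, 240), 3),   # feature from image 3
            ((201, 17, 241), 7),   # similar feature from image 7
            ((10, 250, 5), 7)]
index = build_index(features)
print(index[hash_key((200, 16, 240))])  # images sharing this bucket -> [3, 7]
```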

Matching. To match a query image, extract feature descriptors and map them into the hash table. Retrieve the stored features (and their corresponding image ids) from those locations in the table. Images with a high number of matching features are taken to be candidate matching images. Problem: image noise can perturb a feature descriptor so that it no longer exactly matches the descriptor from the corresponding database image. This can cause the hash function to map the query descriptor to a completely different location in the hash table.

Locality Sensitive Hashing (LSH). In LSH, the hash function preserves the locality of feature descriptors: if two features are close in feature space, then their hashes will also be close. The difference between hashes is equivalent to distance in feature space [1]. If noise perturbs a query feature descriptor, it will map to a location in the hash table that is close to the correct location. So when mapping a query descriptor to the hash table, you should also retrieve entries from the table that are near the mapped location. [1] M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni, "Locality-sensitive hashing scheme based on p-stable distributions," in Proceedings of the 20th Annual Symposium on Computational Geometry, pp. 253-262, 2004.
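A minimal sketch of one common LSH family, random-hyperplane hashing (a different family from the p-stable projections of Datar et al., but it shows the same locality property): each bit of the key records which side of a random hyperplane the descriptor falls on, so nearby descriptors share most key bits.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Fixed random hyperplanes; one hash bit per hyperplane."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def lsh_key(desc, planes):
    """Bit = which side of each hyperplane the descriptor lies on."""
    return tuple(1 if sum(p * d for p, d in zip(plane, desc)) >= 0 else 0
                 for plane in planes)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

planes = make_hyperplanes(dim=4, n_bits=16)
a = (1.0, 2.0, 3.0, 4.0)
b = (1.1, 2.0, 2.9, 4.1)    # a slightly perturbed copy of a
c = (-4.0, 3.0, -1.0, 0.5)  # an unrelated descriptor
# The perturbed copy typically gets a much closer key than the unrelated
# descriptor, so a query can also probe the few adjacent buckets.
print(hamming(lsh_key(a, planes), lsh_key(b, planes)),
      hamming(lsh_key(a, planes), lsh_key(c, planes)))
```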

Example Features: ORB. ORB consists of a feature detector and a descriptor: FAST is the feature detector and BRIEF is the feature descriptor. The descriptor is 32 bytes per feature (SURF uses 64, SIFT uses 128).

Feature Matching. Ratio test: when matching a query feature, the distance (in feature space) to the closest matching feature in a database image must be less than 80% of the distance to the second-closest feature from the same database image. Spatial consistency: neighbors of the query point (in image space) should have matches that are neighbors of the database point.
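The ratio test above can be sketched as follows, using toy 2-D "descriptors" for illustration:

```python
import math

def ratio_test_match(query_desc, db_descs, ratio=0.8):
    """Return the index of the best match in db_descs, or None if ambiguous.

    A match is kept only if the best distance is less than `ratio` times
    the second-best distance (the 80% threshold from the slide).
    """
    order = sorted(range(len(db_descs)),
                   key=lambda i: math.dist(query_desc, db_descs[i]))
    best, second = order[0], order[1]
    if math.dist(query_desc, db_descs[best]) < ratio * math.dist(query_desc, db_descs[second]):
        return best
    return None

db = [(0.0, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(ratio_test_match((0.1, 0.0), db))   # clear winner -> 0
print(ratio_test_match((5.05, 5.0), db))  # two near-equal candidates -> None
```

Rejecting matches whose two best distances are nearly equal discards exactly the ambiguous cases that cause mismatches in repetitive indoor scenes.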

Verification. The N database images with the highest number of matches are candidate images (N = 2 in Card's work). Each candidate image is then sent through a verification step, fitting a fundamental matrix using RANSAC.

Evaluation. 1,382 images; 1,073,903 feature points; test set of 70 images. The threshold on the number of inliers to the fundamental matrix is varied. For a threshold of 16, the TPR is 94% and the FPR is 17%.

Examples of true positives were shown (query and retrieved database images, with corresponding epipolar lines), along with one false negative and several false positives as query/retrieved image pairs.

Bag of Words. In a large database, there can be a lot of features to store. Instead of storing all of them, we can quantize the descriptors into visual words. The number of possible words (the vocabulary) is relatively small. Then each image can be described by a histogram of the visual words (i.e., a "bag of words").

Visual words: main idea. Extract some local features from a number of images, e.g., SIFT. Each point in descriptor space is a local descriptor, e.g., a 128-dimensional SIFT vector. Slide credit: D. Nister, CVPR 2006

Visual words. Map high-dimensional descriptors to tokens/words by quantizing the feature space. Quantize via clustering, and let the cluster centers be the prototype words. Determine which word to assign to each new image region by finding the closest cluster center. Kristen Grauman
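The assignment step can be sketched directly. In practice the vocabulary (cluster centers) would come from k-means over many training descriptors; the 2-D prototypes below are made up for illustration.

```python
import math

# Hypothetical vocabulary of three prototype "words" in a 2-D feature space.
vocabulary = [(0.0, 0.0),   # word 0
              (10.0, 0.0),  # word 1
              (0.0, 10.0)]  # word 2

def assign_word(descriptor, vocab):
    """Assign a descriptor to its nearest cluster center (visual word)."""
    return min(range(len(vocab)), key=lambda w: math.dist(descriptor, vocab[w]))

print(assign_word((9.2, 1.1), vocabulary))  # nearest center is word 1
```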

Visual words. Example: each group of patches belongs to the same visual word. Figure from Sivic & Zisserman, ICCV 2003. Kristen Grauman

Similarity to textons. This idea was first explored for texture and material representations. A texton is a cluster center of filter responses over a collection of images. Textures and materials are described based on the distribution of prototypical texture elements. Leung & Malik, 1999; Varma & Zisserman, 2002. Kristen Grauman

Texture representation example. Statistics summarize patterns in small windows, here dimension 1 = mean d/dx value and dimension 2 = mean d/dy value. Windows with small gradient in both directions, primarily horizontal edges, primarily vertical edges, or both, fall in different regions of this space. Example values (mean d/dx, mean d/dy): window #1: (4, 10); window #2: (18, 7); ... window #9: (20, 20). Kristen Grauman

Inverted file index Database images are loaded into the index, mapping words to image numbers Kristen Grauman

Inverted file index A new query image is mapped to indices of database images that share a word. Kristen Grauman
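The two inverted-index operations above (loading database images, then querying) can be sketched with a plain dictionary; the word ids and image ids are made up for illustration.

```python
from collections import Counter

def build_inverted_index(image_words):
    """image_words: dict image_id -> set of visual word ids.
    Returns dict word_id -> list of image ids containing that word."""
    index = {}
    for image_id, words in image_words.items():
        for w in words:
            index.setdefault(w, []).append(image_id)
    return index

def query(index, query_words):
    """Rank database images by how many query words they share."""
    votes = Counter()
    for w in query_words:
        for image_id in index.get(w, []):
            votes[image_id] += 1
    return [image_id for image_id, _ in votes.most_common()]

db = {1: {3, 7, 9}, 2: {7, 12}, 3: {3, 9, 12, 20}}
index = build_inverted_index(db)
print(query(index, {3, 9, 20}))  # image 3 shares all three words -> [3, 1]
```

Only images sharing at least one word with the query are ever touched, which is what makes the index fast on large databases.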

Matching Images. Once we have extracted visual words from a query image, how do we find matching images in the database? One way is to simply look at the image ids of the matching features and retrieve those images whose ids occur most often (i.e., Chris Card's method). Another way is to look at the distribution (histogram) of the visual words in each image: the histograms from the query image and a matching database image should be very similar.

Analogy to documents. A document can be summarized by its distinctive words. First example passage: "Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image." Key words: sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical, nerve, image, Hubel, Wiesel.
Second example passage: "China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with an 18% rise in imports to $660bn. The trade figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value." Key words: China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, value. ICCV 2005 short course, L. Fei-Fei

Bags of visual words. Summarize the entire image based on its distribution (histogram) of word occurrences. This is analogous to the bag-of-words representation commonly used for documents.

Comparing bags of words. Rank frames by the normalized scalar product between their (possibly weighted) occurrence counts: a nearest-neighbor search for similar images. For a vocabulary of V words, the similarity between a database vector d_j and a query vector q is sim(d_j, q) = (d_j . q) / (||d_j|| ||q||), e.g., d_j = [1 8 1 4] and q = [5 1 1 0]. Kristen Grauman
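The normalized scalar product (cosine similarity) between two word-count vectors can be computed directly, here using the slide's example counts:

```python
import math

def cosine_similarity(d, q):
    """Normalized scalar product between two word-count vectors."""
    dot = sum(x * y for x, y in zip(d, q))
    norm_d = math.sqrt(sum(x * x for x in d))
    norm_q = math.sqrt(sum(y * y for y in q))
    return dot / (norm_d * norm_q)

d_j = [1, 8, 1, 4]   # word counts for a database frame (from the slide)
q   = [5, 1, 1, 0]   # word counts for the query
print(round(cosine_similarity(d_j, q), 3))  # -> 0.298
```

Normalizing by the vector lengths makes the score insensitive to the total number of features detected, so a close-up and a wide shot of the same object can still score highly.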

tf-idf weighting. Term frequency-inverse document frequency: describe a frame by the frequency of each word within it, and downweight words that appear often in the database (the standard weighting for text retrieval). The weight for word i in document d is t_i = (n_id / n_d) * log(N / n_i), where n_id is the number of occurrences of word i in document d, n_d is the number of words in document d, N is the total number of documents in the database, and n_i is the number of documents in the whole database in which word i occurs. Kristen Grauman
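The weighting is a one-line formula; the counts below are made-up numbers for illustration:

```python
import math

def tf_idf(n_id, n_d, N, n_i):
    """tf-idf weight for word i in document d.

    n_id: occurrences of word i in document d
    n_d:  total words in document d
    N:    total documents in the database
    n_i:  documents containing word i
    """
    return (n_id / n_d) * math.log(N / n_i)

# A word frequent in this image but rare in the database gets a high weight;
# a word that occurs in every database image gets weight zero.
print(tf_idf(n_id=5, n_d=100, N=1000, n_i=10))    # rare word -> positive weight
print(tf_idf(n_id=5, n_d=100, N=1000, n_i=1000))  # ubiquitous word -> 0.0
```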

Bags of words for content-based image retrieval. Sivic, Josef, and Andrew Zisserman, "Efficient visual search of videos cast as text retrieval," IEEE Transactions on Pattern Analysis and Machine Intelligence 31.4 (2009): 591-606. Slide from Andrew Zisserman; Sivic & Zisserman, ICCV 2003

Slide from Andrew Zisserman; Sivic & Zisserman, ICCV 2003

Additional Checks. Stop words: create a stop list of the most common visual words; these words are dropped from further consideration. Spatial consistency: for every matching feature, count the number of the k = 15 nearest adjacent features that also match between the two documents; this count is added to the score.

Video Google System. 1. Collect all words within the query region. 2. Use the inverted file index to find relevant frames. 3. Compare word counts. 4. Spatial verification. Sivic & Zisserman, ICCV 2003. Demo online at: http://www.robots.ox.ac.uk/~vgg/research/vgoogle/index.html K. Grauman, B. Leibe


Scoring retrieval quality. Suppose the database has 10 images, of which 5 are relevant to the query, and results are returned in ranked order. Then precision = #relevant returned / #returned, and recall = #relevant returned / #total relevant. Plotting precision against recall as more results are returned gives a precision-recall curve. Slide credit: Ondrej Chum
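The two definitions can be computed directly; the specific result list below is hypothetical, not taken from the slide:

```python
def precision_recall(returned, relevant):
    """Precision and recall for a returned result list against a relevant set."""
    hits = len(set(returned) & set(relevant))
    precision = hits / len(returned)
    recall = hits / len(relevant)
    return precision, recall

# 5 relevant images in the database; suppose the top 4 returned results
# contain 3 of them.
relevant = {1, 2, 3, 4, 5}
p, r = precision_recall([1, 2, 7, 3], relevant)
print(p, r)  # -> 0.75 0.6
```

Sweeping a cutoff down the ranked list and recomputing both quantities at each cutoff traces out the precision-recall curve.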

Vocabulary Trees. A very large vocabulary can be organized into a tree for greater efficiency. Each descriptor vector is compared to a few prototypes at a given level of the tree, and the branch with the closest prototype is selected for further refinement. Only a few comparisons at each level are needed to quantize each descriptor.

Vocabulary Trees: hierarchical clustering for large vocabularies. Tree construction: [Nister & Stewenius, CVPR 2006]. Slide credit: David Nister
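The descent described above can be sketched with a toy two-level tree (the nested-dict layout and the 2-D centers are made up for illustration; a real tree comes from hierarchical k-means):

```python
import math

def leaf(center):
    return {"center": center, "children": []}

def quantize(descriptor, node, path=()):
    """Descend the tree, choosing the closest child at each level.
    Returns the path of branch choices, which identifies a leaf (visual word)."""
    if not node["children"]:
        return path
    best = min(range(len(node["children"])),
               key=lambda i: math.dist(descriptor, node["children"][i]["center"]))
    return quantize(descriptor, node["children"][best], path + (best,))

tree = {"center": (0.0, 0.0), "children": [
    {"center": (0.0, 10.0), "children": [leaf((0.0, 8.0)), leaf((2.0, 12.0))]},
    {"center": (10.0, 0.0), "children": [leaf((8.0, 0.0)), leaf((12.0, 2.0))]},
]}
print(quantize((11.0, 1.5), tree))  # branch choices -> (1, 1)
```

With branching factor k and depth L, quantization costs k*L comparisons instead of the k^L comparisons a flat vocabulary of the same size would need.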


Bags of words: pros and cons.
+ flexible to geometry / deformations / viewpoint
+ compact summary of image content
+ provides vector representation for sets
+ very good results in practice
- basic model ignores geometry; must verify afterwards, or encode it via features
- background and foreground are mixed when the bag covers the whole image
- optimal vocabulary formation remains unclear

Summary. Matching local invariant features is useful not only to provide matches for multi-view geometry, but also to find objects and scenes. Bag-of-words representation: quantize the feature space to make a discrete set of visual words; summarize each image by its distribution of words; index individual words. Inverted index: pre-compute an index to enable faster search at query time.