Recognition with Bag-of-Words (borrowing heavily from tutorial slides by Li Fei-Fei)



Recognition. So far, we've worked on recognizing edges. Now, we'll work on recognizing objects. We will use a bag-of-words approach.

Object Bag of words

Analogy to documents. Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a stepwise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image. (Highlighted words: sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel.)

China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with an 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value. (Highlighted words: China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, value.)
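The analogy can be made concrete in a few lines of code: a text document becomes a vector of word counts over a fixed vocabulary, ignoring word order entirely. A minimal sketch (the excerpt and vocabulary below are made up for illustration):

```python
from collections import Counter
import re

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text,
    ignoring word order -- the 'bag' in bag of words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return {word: counts[word] for word in vocabulary}

excerpt = ("Hubel and Wiesel studied how the brain builds visual "
           "perception from the retinal image in the eye")
vocab = ["brain", "visual", "retinal", "eye", "trade", "yuan"]
print(bag_of_words(excerpt, vocab))
# the vision excerpt scores on vision words and zero on trade words
```

The count vector alone is enough to tell the vision excerpt apart from the trade excerpt, which is exactly the property the visual bag-of-words model borrows.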

A clarification: definition of BoW. Looser definition: independent features.

A clarification: definition of BoW. Looser definition: independent features. Stricter definition: independent features + histogram representation.

Learning and recognition pipeline. Learning: feature detection & representation → image representation (via the codewords dictionary) → category models (and/or) classifiers. Recognition: feature detection & representation → image representation → category decision.

Representation: 1. feature detection & representation; 2. codewords dictionary; 3. image representation.

1. Feature detection and representation


1. Feature detection and representation. Regular grid: Vogel & Schiele, 2003; Fei-Fei & Perona, 2005. Interest point detector: Csurka, Bray, Dance & Fan, 2004; Fei-Fei & Perona, 2005; Sivic, Russell, Efros, Freeman & Zisserman, 2005. Other methods: random sampling (Vidal-Naquet & Ullman, 2002); segmentation-based patches (Barnard, Duygulu, Forsyth, de Freitas, Blei & Jordan, 2003).

1. Feature detection and representation: detect patches [Mikolajczyk & Schmid, 02; Matas, Chum, Urban & Pajdla, 02; Sivic & Zisserman, 03], normalize each patch, then compute a SIFT descriptor [Lowe 99]. Slide credit: Josef Sivic
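The patch-normalization step can be sketched in a few lines. This is only a toy photometric normalization (zero mean, unit variance over a patch's intensities), not the full geometric normalization used on detected regions:

```python
def normalize_patch(patch):
    """Normalize a patch (flat list of intensity values) to zero mean and
    unit variance, giving some invariance to brightness and contrast."""
    n = len(patch)
    mean = sum(patch) / n
    centered = [p - mean for p in patch]
    var = sum(c * c for c in centered) / n
    std = var ** 0.5 or 1.0  # avoid division by zero on flat patches
    return [c / std for c in centered]

bright = [110, 120, 130, 140]  # same pattern at higher brightness/contrast
dark = [10, 15, 20, 25]
print(normalize_patch(bright))  # matches normalize_patch(dark)
```

After normalization the two patches, which differ only in brightness and contrast, map to the same vector, which is why the subsequent descriptor comparison becomes meaningful.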


2. Codewords dictionary formation

2. Codewords dictionary formation: vector quantization. Slide credit: Josef Sivic
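Vector quantization assigns each descriptor to its nearest codeword in the dictionary (the dictionary itself is typically learned by clustering the descriptors, e.g. with k-means). A minimal sketch, with made-up 2-D descriptors standing in for 128-D SIFT vectors:

```python
def quantize(descriptor, codewords):
    """Return the index of the nearest codeword (Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codewords)), key=lambda i: dist2(descriptor, codewords[i]))

codewords = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]  # toy 2-D dictionary
print(quantize((9.0, 1.0), codewords))  # -> 1, nearest to (10, 0)
```

Every patch in the image is mapped this way, so an image reduces to a list of codeword indices regardless of how many patches it had.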

2. Codewords dictionary formation Fei-Fei et al. 2005

Image patch examples of codewords Sivic et al. 2005

3. Image representation: a histogram of codeword frequencies (y-axis: frequency; x-axis: codewords).
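Given the quantized assignments, the image representation is just a (normalized) count of how often each codeword occurs. A sketch, with made-up assignments:

```python
def codeword_histogram(assignments, dictionary_size):
    """Count how often each codeword index occurs, then normalize so the
    histogram sums to 1 (robust to the number of detected patches)."""
    hist = [0.0] * dictionary_size
    for idx in assignments:
        hist[idx] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

# indices produced by quantizing each patch against a 4-word dictionary
assignments = [0, 2, 2, 1, 2, 0]
print(codeword_histogram(assignments, 4))
# -> [0.333..., 0.166..., 0.5, 0.0]
```

This fixed-length vector is what the later learning stage consumes, independent of image size or patch count.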

Representation: 1. feature detection & representation; 2. codewords dictionary; 3. image representation.

One of the keys to success is a good representation of features. Raw pixels are a bad representation: pixel intensities are affected by many different things, such as rotation, scaling, perspective, illumination changes, and reordering of scenes. We want a way of characterizing image patches that is somewhat robust to these effects.

Scale-Invariant Local Features Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters SIFT Features (slide from Lowe)

Advantages of invariant local features Locality: features are local, so robust to occlusion and clutter (no prior segmentation) Distinctiveness: individual features can be matched to a large database of objects Quantity: many features can be generated for even small objects Efficiency: close to real-time performance Extensibility: can easily be extended to wide range of differing feature types, with each adding robustness (slide from Lowe)

Think back to bag of words. Two key problems. Problem 1: What parts of the image do I look at? Problem 2: How do I represent the patches of pixels?

Build scale-space pyramid. All scales must be examined to identify scale-invariant features. An efficient approach is to compute the Difference of Gaussian (DoG) pyramid (Burt & Adelson, 1983): repeatedly blur, subtract, and resample. (slide from Lowe)
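The blur-and-subtract step can be sketched in 1-D. This toy version (made-up step signal, clamped borders, no resampling across octaves) stands in for the 2-D image pyramid:

```python
import math

def gaussian_kernel(sigma, radius):
    k = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def blur(signal, sigma, radius=4):
    """Convolve a 1-D signal with a normalized Gaussian, clamping at borders."""
    k = gaussian_kernel(sigma, radius)
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(k):
            idx = min(max(i + j - radius, 0), n - 1)  # clamp at the borders
            acc += w * signal[idx]
        out.append(acc)
    return out

def difference_of_gaussian(signal, sigma=1.0, k=1.6):
    """DoG: blur at two nearby scales and subtract; extrema mark structure
    at that scale (a 1-D stand-in for the image pyramid)."""
    a = blur(signal, sigma)
    b = blur(signal, k * sigma)
    return [x - y for x, y in zip(a, b)]

step = [0.0] * 8 + [1.0] * 8  # a 1-D edge between indices 7 and 8
dog = difference_of_gaussian(step)
print(max(range(len(dog)), key=lambda i: abs(dog[i])))  # extremum near the edge
```

In the real pipeline this is done in 2-D, one octave at a time, with resampling between octaves.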

Scale space processed one octave at a time (slide from Lowe)

Keypoint localization: detect maxima and minima of the difference-of-Gaussian in scale space. (slide from Lowe)

Sampling frequency for scale More points are found as sampling frequency increases, but accuracy of matching decreases after 3 scales/octave (slide from Lowe)

Select canonical orientation. Create a histogram of local gradient directions (over 0 to 2π) computed at the selected scale. Assign the canonical orientation at the peak of the smoothed histogram. Each key specifies stable 2D coordinates (x, y, scale, orientation). (slide from Lowe)
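The orientation-assignment step can be sketched as a magnitude-weighted histogram of gradient directions. The gradients below are made up, and the histogram smoothing and parabolic peak interpolation of the real method are omitted:

```python
import math

def canonical_orientation(gradients, n_bins=36):
    """Histogram gradient directions (weighted by magnitude) over [0, 2*pi)
    and return the orientation at the histogram peak, in radians."""
    hist = [0.0] * n_bins
    for dx, dy in gradients:
        mag = math.hypot(dx, dy)
        angle = math.atan2(dy, dx) % (2 * math.pi)
        hist[int(angle / (2 * math.pi) * n_bins) % n_bins] += mag
    peak = max(range(n_bins), key=lambda i: hist[i])
    return (peak + 0.5) * 2 * math.pi / n_bins  # center of the peak bin

# made-up gradients: most point roughly 'up' (90 degrees), one outlier
grads = [(0.0, 1.0), (0.1, 0.9), (-0.1, 1.1), (1.0, 0.0)]
print(math.degrees(canonical_orientation(grads)))  # close to 90
```

Rotating all descriptor measurements by this canonical orientation is what makes the resulting feature rotation invariant.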

Example of keypoint detection. Threshold on value at DoG peak and on ratio of principal curvatures (Harris approach). (a) 233×189 image; (b) 832 DoG extrema; (c) 729 left after peak value threshold; (d) 536 left after testing ratio of principal curvatures. (slide from Lowe)

Detecting Keypoints is not always better (From L. Fei-Fei and Perona)

SIFT vector formation. Thresholded image gradients are sampled over a 16×16 array of locations in scale space. Create an array of orientation histograms: 8 orientations × 4×4 histogram array = 128 dimensions. (slide from Lowe)
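The 16×16 → 4×4×8 pooling can be sketched as follows. This toy version omits the Gaussian weighting, trilinear interpolation, and clipping/renormalization of the real SIFT descriptor:

```python
import math

def sift_like_descriptor(grad):
    """grad: 16x16 grid of (dx, dy) gradients. Pool into a 4x4 grid of
    cells, each an 8-bin orientation histogram -> 4*4*8 = 128 values."""
    desc = [0.0] * 128
    for y in range(16):
        for x in range(16):
            dx, dy = grad[y][x]
            mag = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)
            bin8 = int(angle / (2 * math.pi) * 8) % 8
            cell = (y // 4) * 4 + (x // 4)  # which of the 4x4 cells
            desc[cell * 8 + bin8] += mag
    norm = math.sqrt(sum(v * v for v in desc)) or 1.0
    return [v / norm for v in desc]  # unit length for illumination robustness

# made-up gradients: every pixel pointing right
grad = [[(1.0, 0.0)] * 16 for _ in range(16)]
d = sift_like_descriptor(grad)
print(len(d))  # 128
```

Pooling over 4×4 cells rather than per-pixel positions is what buys tolerance to small shifts of the patch content.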

Feature stability to noise Match features after random change in image scale & orientation, with differing levels of image noise Find nearest neighbor in database of 30,000 features (slide from Lowe)

Feature stability to affine change Match features after random change in image scale & orientation, with 2% image noise, and affine distortion Find nearest neighbor in database of 30,000 features (slide from Lowe)
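The matching experiments above reduce to a nearest-neighbor search over descriptors. A brute-force sketch with made-up 2-D vectors (a real system with tens of thousands of 128-D features would use an approximate index such as a k-d tree):

```python
def nearest_neighbor(query, database):
    """Return (index, squared distance) of the database descriptor
    closest to the query, by exhaustive search."""
    best_i, best_d = -1, float("inf")
    for i, d in enumerate(database):
        dist = sum((a - b) ** 2 for a, b in zip(query, d))
        if dist < best_d:
            best_i, best_d = i, dist
    return best_i, best_d

db = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]  # toy descriptor database
print(nearest_neighbor((0.9, 1.2), db))  # index 1 is closest
```

Exhaustive search is O(N) per query, which is why the indexing structures discussed in retrieval settings matter at database scale.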

Distinctiveness of features Vary size of database of features, with 30 degree affine change, 2% image noise Measure % correct for single nearest neighbor match (slide from Lowe)

Sony Aibo (Evolution Robotics) SIFT usage: Recognize charging station Communicate with visual cards

SIFT is just the beginning. Authors have proposed other feature point detectors (e.g. Harris-Laplace) and other feature descriptors (e.g. ColorSIFT, SURF). The Koen executable implements many of these.

Learning and recognition: codewords dictionary → category models (and/or) classifiers → category decision.

Learning and recognition. 1. Generative method: graphical models. 2. Discriminative method: SVM. (category models and/or classifiers)


Discriminative methods based on bag of words representation Decision boundary Zebra Non-zebra

Discriminative methods based on bag-of-words representation. Grauman & Darrell, 2005, 2006: SVM with pyramid match kernels. Others: Csurka, Bray, Dance & Fan, 2004; Serre & Poggio, 2005.
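As a stand-in for an SVM, a tiny perceptron illustrates the discriminative idea: learn a linear decision boundary over bag-of-words histograms. The 3-bin histograms below are made up, and a real system would train a proper max-margin SVM on much higher-dimensional histograms:

```python
def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Tiny perceptron as a stand-in for an SVM: both learn a linear
    decision boundary, here over bag-of-words histograms."""
    dim = len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):  # y is +1 or -1
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified: nudge the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# made-up 3-bin histograms: 'zebra' images heavy on codeword 0,
# 'non-zebra' images heavy on codeword 2
zebra = [[0.7, 0.2, 0.1], [0.8, 0.1, 0.1]]
other = [[0.1, 0.2, 0.7], [0.1, 0.1, 0.8]]
w, b = train_perceptron(zebra + other, [1, 1, -1, -1])
print(predict(w, b, [0.75, 0.15, 0.1]))  # lands on the zebra side
```

The zebra/non-zebra decision boundary of the slide is exactly this kind of learned separator, just in a far higher-dimensional (and often kernelized) space.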

Learning and recognition pipeline. Learning: feature detection & representation → image representation (via the codewords dictionary) → category models (and/or) classifiers. Recognition: feature detection & representation → image representation → category decision.

What about spatial info??

What about spatial info? Feature level; generative models; discriminative methods (Lazebnik, Schmid & Ponce, 2006).

Invariance issues: scale and rotation. Implicit in the detectors and descriptors. Kadir & Brady, 2003.

Invariance issues: scale and rotation; occlusion. Implicit in the models: codeword distribution tolerates small variations; (in theory) theme (z) distribution captures different occlusion patterns.

Invariance issues: scale and rotation; occlusion; translation. Encode (relative) location information: Sudderth, Torralba, Freeman & Willsky, 2005, 2006; Niebles & Fei-Fei, 2007.

Invariance issues: scale and rotation; occlusion; translation; viewpoint (in theory). Codewords: detector and descriptor. Theme distributions: different viewpoints. Fergus, Fei-Fei, Perona & Zisserman, 2005.

Model properties: intuitive analogy to documents (the Hubel & Wiesel text excerpt shown earlier, represented by its highlighted words).

Model properties: intuitive; analogy to documents; analogy to human vision (Olshausen & Field, 2004; Fei-Fei & Perona, 2005).

Model properties: intuitive; generative models are convenient for weakly supervised or unsupervised, incremental training; prior information; flexibility (e.g. HDP). Sivic, Russell, Efros, Freeman & Zisserman, 2005; Li, Wang & Fei-Fei, CVPR 2007 (dataset, incremental model learning, classification).

Model properties: intuitive; generative models; discriminative method: computationally efficient (Grauman et al., CVPR 2005).

Model properties: intuitive; generative models; discriminative method: learning and recognition are relatively fast compared to other methods.

Weaknesses of the model: no rigorous geometric information about the object components (it's intuitive to most of us that objects are made of parts, but the model carries no such information); not yet extensively tested for viewpoint invariance or scale invariance; segmentation and localization are unclear.