Bridging the Gap Between Local and Global Approaches for 3D Object Recognition. Isma Hadji G. N. DeSouza

Outline
Introduction
Motivation
Proposed Methods:
1. LEFT Keypoint Detector
2. LGS Feature Descriptor
The LEFT-LGS combination
Conclusion and Future Work

Introduction
Object recognition algorithms usually rely on three main steps.
Descriptors play the most critical role in the object recognition pipeline.
The techniques adopted for object recognition can be divided into two main categories: global versus local.

Introduction
The techniques adopted for 3D object description can be divided into two main categories:
Global descriptors: e.g. 1-NN
Local descriptors: e.g. 1-NN, e.g. CG algorithms

Local Approaches

Global Approaches

Local Versus Global
Local approaches
Pros:
1. No segmentation needed.
2. Local descriptors have proved more attractive in cluttered or occluded scenes.
3. Straightforward pose estimation.
Cons:
1. Less discriminating, due to the limited scope of the local neighborhood.
2. An additional step is required to reject bad correspondences.
Global approaches
Pros:
1. Global descriptors are usually more discriminating, as they encode the entire structure of objects.
2. Simple nearest-neighbor matching is enough to find correspondences.
Cons:
1. Require segmentation of the scene.
2. Strongly affected by occlusions and clutter.
3. Require complicated algorithms for estimating poses.

The Local-to-Global Approach
Solution? The Local-to-Global approach.
Why local? Local in the sense that each feature vector describes a single keypoint rather than the entire object. It follows the local object recognition pipeline.
Why global? Global as it looks beyond the local neighborhood (support regions) to detect and describe the object: global detection and description.

Keypoint Detection
There are two main motivations for finding keypoints:
1. We want to reduce the computational overhead.
2. We only want to rely on distinctive parts of an object to recognize it.
The main challenge when selecting keypoints is to maximize:
1. Distinctiveness.
2. Repeatability.

LEFT Detector - Motivation
How? First estimate some low-level features describing each point in the cloud.
Traditionally, keypoint detectors rely on local threshold-based algorithms.
In this research, we question both:
1. The local characteristic of traditional 3D keypoint detectors.
2. The threshold-based approaches.

LEFT Detector - Algorithm
The proposed keypoint detection scheme was inspired by the way animals detect important information. The less frequent a specific shape is, the more representative it is of the object!
1. Low-level features describing a 3D point cloud or mesh are estimated for each point p_i in the object.
2. The most similar features are grouped together, quantizing the feature space.
3. The histogram built in the previous step is sorted in ascending order, so the features are ordered from least to most frequent.
4. Points with the least frequent feature occurrence in the object are selected as keypoints!

LEFT Experimental Evaluation
To evaluate the repeatability of the keypoints detected using LEFT, we performed experiments using 4 benchmark datasets.
a) The Retrieval Dataset
b) The Stereo Dataset
We also compare our results to 3 SOTA detectors: ISS, LSP and KPQ.

LEFT Experimental Evaluation
A good keypoint detector is characterized by high repeatability. The steps followed for measuring repeatability are as follows:
1. Find keypoints on both scene and model.
2. Rotate and translate the model based on the ground-truth transformation.
3. Find correspondences between detected keypoints in the model and in the scene.
4. Calculate the relative repeatability as the number of correspondences divided by the total number of unoccluded keypoints detected in the model.
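The repeatability measure described in steps 2 to 4 can be sketched as follows. This is a minimal sketch that assumes keypoints are already detected (step 1) and given as N x 3 arrays, that a correspondence means "a scene keypoint lies within a distance `tol` of the transformed model keypoint", and that occlusion is supplied as an optional boolean mask. The function name, `tol` and `visible` are hypothetical.

```python
import numpy as np

def relative_repeatability(model_kp, scene_kp, R, t, tol=0.01, visible=None):
    """Fraction of unoccluded model keypoints re-detected in the scene."""
    model_kp = np.asarray(model_kp, dtype=float)
    scene_kp = np.asarray(scene_kp, dtype=float)
    # Step 2: apply the ground-truth rotation R and translation t.
    moved = model_kp @ np.asarray(R, dtype=float).T + np.asarray(t, dtype=float)
    if visible is not None:
        moved = moved[np.asarray(visible, dtype=bool)]
    if len(moved) == 0 or len(scene_kp) == 0:
        return 0.0
    # Step 3: a model keypoint has a correspondence if some scene
    # keypoint lies within tol of its transformed position.
    dists = np.linalg.norm(moved[:, None, :] - scene_kp[None, :, :], axis=2)
    repeated = dists.min(axis=1) <= tol
    # Step 4: correspondences divided by unoccluded model keypoints.
    return float(repeated.sum()) / len(moved)
```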

LEFT Performance LEFT repeatability for variable number of keypoints.

Feature Description
The role of the descriptors is to encode the geometry of the object. In the case of feature descriptors, the main challenges are:
1. Maximizing the discrimination power of the descriptors.
2. Robustness to occlusion and clutter.

LGS Descriptor - Motivation

LGS Descriptor - Motivation
[Figure: GFPFH feature vectors for Dragon, Armadillo, and a partial view of Armadillo]

LGS Descriptor
More specifically, the LGS was built around the following five main ideas:
1. Relying on surface point classification to capture the entire geometry of the object (global property).
2. Describing keypoints (local property), but using both local and global support regions grouped by the same surface class.
3. Using signatures to avoid loss of information and mitigate the effects of occlusion (local property).
4. Using distributions of L2 distances to encode the relationship between the keypoint and the global support regions, while increasing robustness to noise by eliminating the use of sensitive features such as surface normals (global property).
5. Using the confidence in the relationship above to improve matching during object recognition (local property).

LGS Descriptor Step 1: Point Classification
To classify the points, we use the radius-based surface classification proposed in the RSD descriptor.
The original RSD used both the minimum and maximum radii to classify points as belonging to one of the geometric primitive shapes.
In our implementation, we chose to classify points from very sharp to very smooth. Why? This gives the ability to find a pre-defined number of classes independently of the object considered, i.e. we can vary the number of classes by simply varying the ranges of sharpness and smoothness.
Using the minimal radius of each point on the object's surface, the algorithm splits the radii values into N different ranges, representing the N classes of the object's surfaces.
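Splitting the minimal radii into N ranges might look like the sketch below. It assumes the minimal radius of every point has already been estimated (for example by an RSD implementation) and that the N ranges are equal-width between the smallest and largest radius. The slides do not say how the ranges are chosen, so the equal-width split and the name `classify_by_radius` are assumptions.

```python
import numpy as np

def classify_by_radius(min_radii, n_classes=3):
    """Assign each point a class from 0 (sharpest) to n_classes-1 (smoothest).

    Assumption: equal-width radius ranges; the slides do not specify them.
    """
    radii = np.asarray(min_radii, dtype=float)
    # n_classes + 1 edges delimit n_classes consecutive radius ranges.
    edges = np.linspace(radii.min(), radii.max(), n_classes + 1)
    # Only the interior edges are used, so labels stay in 0..n_classes-1.
    labels = np.digitize(radii, edges[1:-1])
    return labels, edges
```

Changing `n_classes` changes only the ranges, which mirrors the point made above that the number of classes is independent of the object.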

LGS Descriptor Step 2: Class Confidence Estimation
LGS uses continuous ranges of radii values, so fuzzy regions may emerge.
Fuzzy regions contain points that could belong to either of two consecutive ranges (unstable points).
Solution? We assign both a class and a membership in that class to each point in the cloud.
A low confidence indicates that the point belongs to a fuzzy or noisy region. This determines whether the algorithm gives high or low weights to the points used in the next steps.
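The slides do not give the membership function, so the following is purely a hypothetical illustration of the idea: confidence drops linearly to 0 as a point's radius approaches a boundary between two consecutive ranges and saturates at 1 half a range-width away from it. The name `class_confidence` and the linear ramp are assumptions; `edges` is taken to be the array of equal-width range boundaries.

```python
import numpy as np

def class_confidence(min_radii, edges):
    """Hypothetical membership: 0 on a class boundary, 1 far from any boundary."""
    radii = np.asarray(min_radii, dtype=float)
    edges = np.asarray(edges, dtype=float)
    interior = edges[1:-1]  # boundaries shared by two consecutive classes
    if interior.size == 0:
        # A single class has no fuzzy region.
        return np.ones_like(radii)
    half_width = (edges[1] - edges[0]) / 2.0  # assumes equal-width ranges
    # Distance from each radius to its nearest class boundary.
    dist = np.abs(radii[:, None] - interior[None, :]).min(axis=1)
    return np.clip(dist / half_width, 0.0, 1.0)
```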

LGS Descriptor Step 3: Signature Construction
The LGS descriptor is constructed in a signature-based fashion using K neighbors in each one of the N classes (e.g. N=3).
[Figure: L2 distances and confidences, sorted in D clusters, concatenated into a D*N feature vector]

LGS Descriptor Step 3 Signature Construction Another example: N=5 classes.

LGS Descriptor Step 4: LGS Descriptor Matching
The confidences estimated in step 2 are used as weights during the matching stage, when the LGS computes the distance d_ij between a pair of signatures (i, j).
The distance between each entry of the pair of signatures is multiplied by the corresponding minimum confidence. This allows LGS to reduce the effect of unstable points located in fuzzy regions.
Mathematically, the weighted distance used to compare LGS descriptors is given by:
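One possible reading of the weighting described above is sketched here. It assumes the per-entry distance is an absolute difference and that the weighted entries are summed; the slides only state that each entry distance is multiplied by the minimum of the two confidences, so the aggregation and the name `weighted_signature_distance` are assumptions.

```python
import numpy as np

def weighted_signature_distance(sig_i, conf_i, sig_j, conf_j):
    """Confidence-weighted distance d_ij between two signatures.

    Assumption: absolute per-entry differences, summed over all entries.
    """
    sig_i, sig_j = np.asarray(sig_i, dtype=float), np.asarray(sig_j, dtype=float)
    conf_i, conf_j = np.asarray(conf_i, dtype=float), np.asarray(conf_j, dtype=float)
    # Each entry is weighted by the smaller of the two confidences, so
    # an unstable point on either side reduces that entry's influence.
    weights = np.minimum(conf_i, conf_j)
    return float(np.sum(weights * np.abs(sig_i - sig_j)))
```

Using the minimum rather than, say, the mean means a single low-confidence entry is enough to suppress its contribution.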

LGS Experimental Evaluation
To evaluate the robustness of the LGS descriptor, we performed experiments using the previous datasets.
We also compare our results to SOTA descriptors: 3 local descriptors (SI, FPFH, SHOT) and 2 global ones (VFH, GFPFH).
A good descriptor is characterized by a high percentage of correct matches.
1. Find and describe all keypoints in both the scene and the model.
2. For each feature vector in the scene, find its nearest neighbor in the model.
3. Find the true correspondences using the provided ground-truth transformation between model and scene.
4. Compare the correspondences from step 2 and step 3 and count the total number of true/false matches.
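Steps 2 to 4 of this protocol can be sketched as below, assuming descriptors are rows of a 2D array, nearest neighbors are taken under the plain Euclidean distance, and a match counts as true when the matched model keypoint, moved by the ground-truth transformation, lands within `tol` of the scene keypoint. The function name and `tol` are hypothetical, and the Euclidean distance stands in for whatever metric each descriptor actually uses.

```python
import numpy as np

def count_matches(scene_desc, model_desc, scene_pts, model_pts, R, t, tol=0.01):
    """Return (true_matches, false_matches) for nearest-neighbor matching."""
    scene_desc = np.asarray(scene_desc, dtype=float)
    model_desc = np.asarray(model_desc, dtype=float)
    scene_pts = np.asarray(scene_pts, dtype=float)
    model_pts = np.asarray(model_pts, dtype=float)
    # Step 2: nearest model descriptor for every scene descriptor.
    d = np.linalg.norm(scene_desc[:, None, :] - model_desc[None, :, :], axis=2)
    nn = d.argmin(axis=1)
    # Step 3: ground truth places each model keypoint in the scene frame.
    moved = model_pts @ np.asarray(R, dtype=float).T + np.asarray(t, dtype=float)
    # Step 4: a match is true if the matched keypoint lands on the scene keypoint.
    err = np.linalg.norm(moved[nn] - scene_pts, axis=1)
    n_true = int((err <= tol).sum())
    return n_true, len(scene_pts) - n_true
```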

LGS Experimental Evaluation
Effect of the number of classes N used to construct the signature.
As expected:
1. A small number of classes does not capture enough of the variations in the structure of these objects.
2. Too many classes also lead to reduced discriminating power. Why? Possible misclassification errors.

LGS Experimental Evaluation
Classification accuracy.
Again, as expected, adding more classes caused a drop in the classification accuracy, which ultimately caused a drop in the number of good correspondences.

LGS Experimental Evaluation
Effect of the number of neighbors in each class.
Interestingly, the LGS descriptor is not very sensitive to the use of larger neighborhoods. We attribute this to the advantages of using a signature-based feature vector.

LGS Performance Quantitative Comparison of LGS with respect to Local SOTA 3D descriptors

LEFT-LGS Performance Quantitative Comparison of LEFT-LGS with respect to Local SOTA 3D descriptors

LEFT-LGS for Object Recognition
A good object recognition algorithm should have a high TPR and a low FPR.
1. Find keypoints and their corresponding descriptors for both scene and models.
2. Find potential matches using simple nearest-neighbor matching.
3. Prune correspondences using a Correspondence Grouping algorithm!
4. Count the number of consistent correspondences to decide upon model presence or absence.

LEFT-LGS Compared to Local Descriptors LEFT-LGS combined in the object recognition pipeline compared to Local SOTA 3D descriptors

LEFT-LGS Compared to Global Descriptors
We can compare the performance of LEFT-LGS to global descriptors directly in terms of correctly recognized objects.
1. Segment the scene and describe each segment.
2. For each segment descriptor, find its closest model descriptor.
3. Confirm model presence or absence according to the ground-truth scene annotation.

LEFT-LGS Compared to Global Descriptors LEFT-LGS combined in the object recognition pipeline compared to Global SOTA 3D descriptors.

Conclusion and Future Work
This work?
Built a bridge between the local and global approaches used in the field of object recognition.
Introduced the concept of detecting keypoints globally: LEFT.
Proposed a hybrid descriptor that uses the advantages of local and global approaches while minimizing the shortcomings of each: LGS.
Future work?
Make the detection stage completely independent of the segmentation step.
Address the problem of surface misclassification.
Devise a method for automatic selection of the number of classes N.

Thank you! QUESTIONS?