Real-time Scalable 6DOF Pose Estimation for Textureless Objects

Zhe Cao (1), Yaser Sheikh (1), Natasha Kholgade Banerjee (2)
(1) Robotics Institute, Carnegie Mellon University, PA, USA
(2) Department of Computer Science, Clarkson University, NY, USA

Object Pose Estimation for Robotic Manipulation. Object detection is not enough: a manipulation task requires 3D object pose estimation. (Robot image from Toyota America Research Center.)

Real-time GPU-based Pose Estimation for Textureless Objects. [Figure labels: Moving Camera RGB Stream; Object Pose Estimation Result.]

Related Work. Three families of methods: feature-based pose estimation, template-based pose estimation, and RGBD-based pose estimation. Works cited on the slide: Collet et al., 2011; Hinterstoisser et al., 2010; Song et al., 2014; Xie et al., 2013; Choi et al., 2012; Hinterstoisser et al., 2013.

Model-based Object Pose Estimation for Textureless Objects. [Figure labels: Camera Frame; 3D Model; Pose Estimation Result.]

Challenges in Model-based Methods. Viewpoint variance, scale variance, and illumination variance between the camera frame and the 3D model.

GPU-based Exhaustive Search for Viewpoint and Scale. [Figure labels: Scale; Viewpoint; 3D Model; Camera Frame; Rendered Image (Template).]

Transformation Function for Illumination Robustness. Transformation function: $f(\cdot) = (f_{\mathrm{mvnorm}} \circ f_{\mathrm{LoG}})(\cdot)$, where $f_{\mathrm{mvnorm}}(\cdot)$ is the mean-variance normalization and $f_{\mathrm{LoG}}(\cdot)$ is the Laplacian of Gaussian (LoG) transformation. [Figure labels: Scale; Viewpoint; 3D Model; Transformed Image; Transformed Templates.]
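The composition above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' GPU implementation: the kernel size, `sigma`, the edge padding, and the function names `log_kernel`, `f_log`, `f_mvnorm`, and `transform` are all assumptions made for the example.

```python
import numpy as np

def log_kernel(sigma: float = 1.0, radius: int = 3) -> np.ndarray:
    """Sampled Laplacian-of-Gaussian kernel (constant factor dropped)."""
    ax = np.arange(-radius, radius + 1, dtype=np.float64)
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    k = (r2 - 2.0 * sigma**2) / sigma**4 * np.exp(-r2 / (2.0 * sigma**2))
    # Shift so the kernel sums to zero: constant image regions give no response.
    return k - k.mean()

def f_log(image: np.ndarray, sigma: float = 1.0, radius: int = 3) -> np.ndarray:
    """LoG filtering by direct sliding-window correlation (kernel is symmetric)."""
    k = log_kernel(sigma, radius)
    padded = np.pad(image.astype(np.float64), radius, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, k.shape)
    return np.einsum("ijkl,kl->ij", windows, k)

def f_mvnorm(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Mean-variance normalization: zero mean, unit standard deviation."""
    return (x - x.mean()) / (x.std() + eps)

def transform(image: np.ndarray) -> np.ndarray:
    """f = f_mvnorm composed with f_LoG (LoG first, then normalization)."""
    return f_mvnorm(f_log(image))
```

In this sketch, a global brightness offset is removed by the zero-sum LoG kernel and a global positive gain is removed by the normalization, which is one way to read the "illumination robustness" in the slide title.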

Normalized Cross-correlation (NCC)
CPU-based NCC [1]: sequential matching. Easy but slow.
GPU-based NCC [2]: parallel over pixels. Does not fully utilize the modern GPU.
Our Vectorized-NCC: parallel over templates. Fastest.
[1] Lewis, J. P. Fast normalized cross-correlation. Vision Interface, 1995.
[2] Babenko, P., Shah, M. MinGPU: a minimum GPU library for computer vision. 2008.

Template Matrix Construction. Pipeline: rendered templates (3D model over viewpoints), then normalized LoG feature, then vectorization, then the template matrix $T' = [t'_1\ t'_2\ \cdots\ t'_n]^{\top}$, where $t'_i$ is the $i$-th vectorized template.

Image Patch Matrix Construction. Pipeline: image pyramid (Scale 1, Scale 2, ..., Scale n), then normalized LoG feature, then vectorization, then the image patch matrix $P' = [p'_1\ p'_2\ \cdots\ p'_m]$, where $p'_j$ is the $j$-th vectorized image patch.
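The two construction steps can be sketched as below, assuming the input images have already been LoG-transformed. The helper names, the per-vector normalization to zero mean and unit length, and the crude decimation pyramid are illustrative assumptions; the slides do not specify these details.

```python
import numpy as np

def unit_rows(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Normalize each row to zero mean and unit L2 norm."""
    x = x - x.mean(axis=1, keepdims=True)
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def template_matrix(templates: np.ndarray) -> np.ndarray:
    """templates: (n, h, w) -> T' of shape (n, h*w), one template per row."""
    n = templates.shape[0]
    return unit_rows(templates.reshape(n, -1).astype(np.float64))

def patch_matrix(image: np.ndarray, h: int, w: int, stride: int = 1) -> np.ndarray:
    """image: (H, W) -> P' of shape (h*w, m), one image patch per column."""
    windows = np.lib.stride_tricks.sliding_window_view(image, (h, w))
    windows = windows[::stride, ::stride]
    patches = windows.reshape(-1, h * w).astype(np.float64)
    return unit_rows(patches).T

def pyramid(image: np.ndarray, n_scales: int = 3) -> list:
    """Crude pyramid by 2x decimation (no anti-alias filter; illustration only)."""
    return [image[:: 2**s, :: 2**s] for s in range(n_scales)]
```

Calling `patch_matrix` on each level of `pyramid(image)` yields one patch matrix per scale, which mirrors the Scale 1 to Scale n layout on the slide.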

Vectorized Normalized Cross-correlation (VNCC). Score matrix $S = T P$ (template matrix times image matrix), where entry $S_{ij}$ is the cross-correlation value between template $t_i$ and image patch $p_j$. By reshaping the template set and the image, we reformulate large-scale template matching as one matrix product. Our VNCC is 20 times faster than previous GPU-based NCC.
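A small CPU sketch of the matrix-product formulation, with a pairwise reference function to check one entry against. The names `normalize`, `vncc_scores`, and `ncc_pair` are hypothetical, and NumPy stands in for the GPU matrix multiply used in the talk.

```python
import numpy as np

def normalize(x: np.ndarray, axis: int, eps: float = 1e-8) -> np.ndarray:
    """Zero mean and unit L2 norm along the given axis."""
    x = x - x.mean(axis=axis, keepdims=True)
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def vncc_scores(T: np.ndarray, P: np.ndarray) -> np.ndarray:
    """T: (n, d) templates as rows; P: (d, m) patches as columns.

    Returns S of shape (n, m) with S[i, j] = NCC(template i, patch j),
    computed for all pairs in a single matrix product.
    """
    return normalize(T, axis=1) @ normalize(P, axis=0)

def ncc_pair(t: np.ndarray, p: np.ndarray) -> float:
    """Reference NCC for one template and one patch (the sequential way)."""
    t = t - t.mean()
    p = p - p.mean()
    return float(t @ p / (np.linalg.norm(t) * np.linalg.norm(p)))
```

The best match for patch `j` is then `np.argmax(S[:, j])`; a patch that is a positively scaled and shifted copy of a template scores close to 1 against it.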

SVD-based Dimensionality Reduction. Score matrix $S = T P$. To further speed up the computation, we perform SVD on the template matrix: $T = U D V^{\top} = A Z$, where $A$ holds the weights and $Z$ the bases. This decreases the runtime by 25%.
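One way to read the factorization is sketched below: keep the first `k` singular vectors, project the image patches onto the bases once, then combine with the weights. The truncation rank `k` and the function names are assumptions; the slides do not say how many bases are kept.

```python
import numpy as np

def svd_reduce(T: np.ndarray, k: int):
    """Factor T (n, d) as A @ Z with A = U_k * D_k (weights) and Z = V_k^T (bases)."""
    U, d, Vt = np.linalg.svd(T, full_matrices=False)
    A = U[:, :k] * d[:k]  # (n, k) weights
    Z = Vt[:k]            # (k, d) bases
    return A, Z

def reduced_scores(A: np.ndarray, Z: np.ndarray, P: np.ndarray) -> np.ndarray:
    """Approximate S = T @ P as A @ (Z @ P).

    The product Z @ P costs k*d*m multiplications and A @ (...) costs n*k*m,
    compared with n*d*m for the full product, so the saving grows as k shrinks
    relative to n and d.
    """
    return A @ (Z @ P)
```

When the templates are strongly correlated (neighbouring viewpoints of the same object tend to be), a small `k` can reproduce `T @ P` closely; with `k` equal to the rank of `T` the result is exact up to rounding.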

RGB-based Pose Estimation Results. 1. 2. computationally expensive

Real-time RGB-based Object Pose Estimation. [Figure labels: response map of matched template over the image; matched template; detected object contours (multiple hypotheses); select pose hypotheses number.]

RGB-D Object Scale Prior. Imposed object scale prior: camera projection based on the depth value.
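A hedged sketch of how a depth value could constrain the scale search, using the standard pinhole relation (projected size = focal length × physical size / depth). The slide gives no formula, so the functions, parameters, and the tolerance `tol` below are all hypothetical.

```python
def expected_size_px(object_size_m: float, depth_m: float, focal_px: float) -> float:
    """Projected size in pixels of an object of known physical size at a given depth."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * object_size_m / depth_m

def select_scales(scales, template_size_px, object_size_m, depth_m, focal_px, tol=0.2):
    """Keep only the pyramid scales whose template size matches the depth-based prior.

    A scale s is kept when it lies within a relative tolerance `tol` of the
    target scale (expected projected size divided by the template size).
    """
    target = expected_size_px(object_size_m, depth_m, focal_px) / template_size_px
    return [s for s in scales if abs(s - target) <= tol * target]
```

For instance, with a 0.1 m object at 1 m depth, a 500-pixel focal length, and 100-pixel templates, the target scale is 0.5, so only scales near 0.5 would need to be searched under these assumptions.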

Multiple Object Pose Estimation. $S = T P$ (template matrix times image matrix). With principal component analysis: $S = A Z P$.
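The sketch below shows one plausible reading of the multi-object slides: templates of all objects are stacked into a single template matrix, so one matrix product scores every object at once. The stacking scheme, the `owner` bookkeeping, and the function names are assumptions, not details confirmed by the slides.

```python
import numpy as np

def stack_objects(templates_by_object: dict):
    """Stack per-object template matrices (n_i, d) into one T, remembering row owners."""
    names = list(templates_by_object)
    T = np.vstack([templates_by_object[name] for name in names])
    owner = np.concatenate(
        [np.full(len(templates_by_object[name]), i) for i, name in enumerate(names)]
    )
    return T, owner, names

def best_match_per_object(S: np.ndarray, owner: np.ndarray, names: list) -> dict:
    """For each object, return (template row in T, patch column, score) of its best match."""
    best = {}
    for i, name in enumerate(names):
        rows = np.flatnonzero(owner == i)
        sub = S[rows]
        r, c = np.unravel_index(np.argmax(sub), sub.shape)
        best[name] = (int(rows[r]), int(c), float(sub[r, c]))
    return best
```

Because the image patch matrix is shared, adding an object only adds rows to `T`; after a PCA or SVD factorization of the stacked matrix, the projection `Z @ P` is also shared across objects.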

RGB-D Object Pose Estimation Results. [Figure labels: Eggbox; Duck toy.]

RGB-D Object Pose Estimation. [Figure labels: color stream for the teapot and sugar bag; real-time 3D model alignment result.]

Application on Real Robot. Robot in Toyota America Research Center.

Runtime for Different Numbers of Objects

Runtime (ms)
# objects   VNCC-PCA   VNCC   DDT-3D [1]   Linemod [2]
one         26.3       34.4   55.1         119
two         27.4       40.7   70.2         218
five        32.9       66.2   107.4        535
ten         38.5       78.0   172.7        985
fifteen     43.2       89.5   238.1        1388

[Plot: runtime (ms) against number of objects for VNCC-PCA, VNCC, and DDT-3D. Annotation: sub-linear increase.]

[1] Rios-Cabrera et al. Discriminatively trained templates for 3D object detection: A real time scalable approach. ICCV 2013.
[2] Hinterstoisser et al. Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. ACCV 2012.
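To make the "sub-linear increase" concrete, the short script below computes, from the table values only, how much each method's runtime grows from one to fifteen objects. The dictionary layout and the `growth` helper are just a convenient arrangement for this example.

```python
# Runtimes in ms, copied from the table (keys are the number of objects).
runtime_ms = {
    "VNCC-PCA": {1: 26.3, 2: 27.4, 5: 32.9, 10: 38.5, 15: 43.2},
    "VNCC": {1: 34.4, 2: 40.7, 5: 66.2, 10: 78.0, 15: 89.5},
    "DDT-3D": {1: 55.1, 2: 70.2, 5: 107.4, 10: 172.7, 15: 238.1},
    "Linemod": {1: 119.0, 2: 218.0, 5: 535.0, 10: 985.0, 15: 1388.0},
}

def growth(times: dict):
    """Return (runtime ratio from 1 to 15 objects, average ms per added object)."""
    ratio = times[15] / times[1]
    per_added_object = (times[15] - times[1]) / (15 - 1)
    return ratio, per_added_object

for method, times in runtime_ms.items():
    ratio, per_object = growth(times)
    print(f"{method:9s} x{ratio:5.2f} runtime for 15x objects, "
          f"{per_object:6.2f} ms per added object")
```

With fifteen times as many objects, VNCC-PCA takes roughly 1.6 times as long, whereas Linemod takes roughly 11.7 times as long.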

3D Mesh Model Dataset for Evaluation. 10 textureless models and 6 textured objects. 13 models are created from Autodesk 123D Catch; 3 models are from an online repository.

Accuracy comparison

Average error in our dataset
Method         Pitch   Roll   Yaw    X      Y      Runtime
VNCC-5         2.71    5.43   6.35   5.27   4.28   162 ms
Line2D-5 [5]   3.05    6.12   7.88   9.35   7.24   288 ms
VNCC-PCA-5     2.92    5.56   6.42   5.43   4.47   119 ms

Accuracy in ACCV12 dataset
Method               Accuracy
DDT-3D [1]           97.2%
Hinterstoisser [2]   96.6%
VNCC-PCA-10          96.0%
VNCC-10              96.2%
VNCC-1               84.2%
Linemod [3]          83.0%
Drost [4]            79.3%

Future Work. Patch-based matching. Non-rigid pose estimation: deformable objects and articulated objects. Object tracking based on particle filtering.