Real-time Scalable 6DOF Pose Estimation for Textureless Objects
Zhe Cao (1), Yaser Sheikh (1), Natasha Kholgade Banerjee (2)
(1) Robotics Institute, Carnegie Mellon University, PA, USA
(2) Department of Computer Science, Clarkson University, NY, USA

Object Pose Estimation for Robotic Manipulation
Object detection is not enough
3D object pose estimation
Manipulation task: robot image from Toyota America Research Center

Real-time GPU-based Pose Estimation for Textureless Objects
[Figure panels: Moving Camera RGB Stream; Object Pose Estimation Result]

Related Work
Approaches: Feature-based Pose Estimation; Template-based Pose Estimation; RGBD-based Pose Estimation
Cited work: Collet et al., 2011; Hinterstoisser et al., 2010; Song et al., 2014; Xie et al., 2013; Choi et al., 2012; Hinterstoisser et al., 2013

Model-based Object Pose Estimation for Textureless Objects
[Figure panels: Camera Frame; 3D Model; Pose Estimation Result]

Challenges in Model-based Methods
Viewpoint variance
Scale variance
Illumination variance
[Figure panels: Camera Frame; 3D Model]

GPU-based Exhaustive Search for Viewpoint and Scale
[Figure labels: Scale; Viewpoint; 3D Model; Camera Frame; Rendered Image (Template)]

Transformation Function for Illumination Robustness
Transformation function: $f(\cdot) = (f_{\mathrm{mvnorm}} \circ f_{\mathrm{LoG}})(\cdot)$
where $f_{\mathrm{mvnorm}}(\cdot)$ is the mean-variance normalization and $f_{\mathrm{LoG}}(\cdot)$ is the Laplacian of Gaussian (LoG) transformation.
[Figure labels: Scale; Viewpoint; 3D Model; Transformed Image; Transformed Templates]

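The composition on this slide (LoG filtering followed by mean-variance normalization) can be illustrated with a minimal NumPy sketch. The kernel size, `sigma`, edge padding, and all function names are assumptions chosen for illustration; the slides do not give these details.

```python
import numpy as np

def log_kernel(sigma: float, radius: int) -> np.ndarray:
    """Sampled Laplacian-of-Gaussian kernel, shifted so that it sums to zero."""
    ax = np.arange(-radius, radius + 1, dtype=np.float64)
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    k = (r2 - 2.0 * sigma**2) / sigma**4 * np.exp(-r2 / (2.0 * sigma**2))
    return k - k.mean()  # zero-sum kernel: constant offsets give no response

def log_filter(img: np.ndarray, sigma: float = 1.0, radius: int = 3) -> np.ndarray:
    """Apply the LoG kernel at every pixel (edge padding is an assumption)."""
    k = log_kernel(sigma, radius)
    padded = np.pad(img.astype(np.float64), radius, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, k.shape)
    return np.einsum("ijkl,kl->ij", windows, k)

def mvnorm(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Mean-variance normalization: zero mean, unit standard deviation."""
    x = x - x.mean()
    return x / (x.std() + eps)

def transform(img: np.ndarray) -> np.ndarray:
    """f = f_mvnorm applied after f_LoG, as on the slide."""
    return mvnorm(log_filter(img))
```

Under these assumptions, a global brightness change of the form `a * img + b` with `a > 0` leaves `transform(img)` essentially unchanged: the zero-sum kernel removes `b`, and the normalization removes `a`.
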
Normalized Cross-correlation (NCC)
CPU-based NCC [1]: sequential matching. Easy but slow.
GPU-based NCC [2]: parallel over pixels. Does not fully utilize the modern GPU.
Our Vectorized-NCC: parallel over templates. Fastest.
[1] Lewis, J. P. Fast normalized cross-correlation. Vision Interface, 1995.
[2] Babenko, P., Shah, M. MinGPU: a minimum GPU library for computer vision. 2008.

Template Matrix Construction
Pipeline stages: Rendered Templates, Normalized LoG Feature, Vectorized Template Matrix
$T' = [\, t'_1 \;\; t'_2 \;\; \cdots \;\; t'_n \,]$
[Figure labels: Viewpoint; 3D Model; $t_i$; $T$]

Image Patch Matrix Construction
Pipeline stages: Image Pyramid, Normalized LoG Feature, Vectorized Image Patch Matrix
$P' = [\, p'_1 \;\; p'_2 \;\; \cdots \;\; p'_m \,]$
[Figure labels: Scale 1; Scale 2; Scale n; $p_i$; $P$]

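The two construction slides can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes the inputs are already LoG-transformed, that all templates share one size, that templates become rows of `T` and patches become columns of `P`, and that the pyramid is built by plain 2x subsampling. All function names are hypothetical.

```python
import numpy as np

def unit_rows(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Make every row zero-mean and unit-norm, so row dot products are NCC values."""
    x = x - x.mean(axis=1, keepdims=True)
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def template_matrix(templates) -> np.ndarray:
    """Stack n templates of shape (h, w) into T with shape (n, h*w)."""
    flat = np.stack([t.ravel() for t in templates]).astype(np.float64)
    return unit_rows(flat)

def patch_matrix(image: np.ndarray, h: int, w: int, stride: int = 1) -> np.ndarray:
    """Collect the (h, w) patches of one pyramid level into P with shape (h*w, m)."""
    win = np.lib.stride_tricks.sliding_window_view(image, (h, w))
    win = win[::stride, ::stride]
    patches = win.reshape(-1, h * w).astype(np.float64)
    return unit_rows(patches).T

def pyramid_patch_matrix(image, h, w, n_levels=3, stride=1) -> np.ndarray:
    """Concatenate patch matrices over a crude pyramid (assumed 2x subsampling)."""
    levels = [image[:: 2**k, :: 2**k] for k in range(n_levels)]
    usable = [lvl for lvl in levels if lvl.shape[0] >= h and lvl.shape[1] >= w]
    return np.hstack([patch_matrix(lvl, h, w, stride) for lvl in usable])
```

With this layout, every scale contributes extra columns to `P`, so searching over scale only widens the matrix rather than adding a separate matching pass.
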
Vectorized Normalized Cross-correlation (VNCC)
Score matrix: $S = T P$, where $T$ is the template matrix and $P$ is the image matrix.
Entry $S_{ij}$ is the cross-correlation value between template $t_i$ and image patch $p_j$.
By reshaping the template set and the image, we reformulate large-scale template matching as one matrix product.
Our VNCC is 20 times faster than previous GPU-based NCC.

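A small NumPy sketch shows why a single matrix product yields every template-patch NCC value at once. It runs on the CPU with random data purely for illustration; the function names and array sizes are assumptions, and the GPU implementation described in the slides is not reproduced here.

```python
import numpy as np

def unit_rows(x, eps: float = 1e-8) -> np.ndarray:
    """Zero-mean, unit-norm rows."""
    x = np.asarray(x, dtype=np.float64)
    x = x - x.mean(axis=1, keepdims=True)
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def vncc(templates, patches) -> np.ndarray:
    """Score matrix S = T P; S[i, j] is the NCC of template i and patch j."""
    T = unit_rows(templates)   # shape (n, N): one flattened template per row
    P = unit_rows(patches).T   # shape (N, m): one flattened patch per column
    return T @ P

def ncc_pair(a, b) -> float:
    """Reference NCC for a single template/patch pair (sequential baseline)."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
templates = rng.random((4, 25))   # 4 flattened 5x5 templates (made-up data)
patches = rng.random((7, 25))     # 7 flattened 5x5 image patches (made-up data)
S = vncc(templates, patches)
i, j = np.unravel_index(np.argmax(S), S.shape)  # best template and patch
```

Each entry of `S` agrees with `ncc_pair` on the corresponding pair, so the matrix product replaces the nested loop over templates and patches.
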
SVD-based Dimensionality Reduction
Score matrix: $S = T P$ (template matrix times image matrix).
To further speed up the computation, we perform singular value decomposition (SVD) on the template matrix:
$T = U D V^{\top} = A Z$
where $A$ holds the weights and $Z$ the bases.
This decreases the runtime by 25%.

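The factorization can be sketched with `numpy.linalg.svd`. The truncation rank `k` and the function names are assumptions; the slides do not state how many bases are kept. With templates of length N, n templates, and m patches, the direct product costs about n·N·m multiplications, while the factorized form costs about k·N·m + n·k·m.

```python
import numpy as np

def factor_templates(T: np.ndarray, k: int):
    """Rank-k factorization T ~ A Z from the SVD T = U D V^T."""
    U, d, Vt = np.linalg.svd(T, full_matrices=False)
    A = U[:, :k] * d[:k]   # weights, shape (n, k)
    Z = Vt[:k]             # bases, shape (k, N)
    return A, Z

def reduced_scores(A: np.ndarray, Z: np.ndarray, P: np.ndarray) -> np.ndarray:
    """Project the patches onto the bases first, then combine with the weights."""
    return A @ (Z @ P)
```

The saving depends on `k` being much smaller than the number of templates; if `k` equals the rank of `T`, the scores are exact, and smaller values trade accuracy for speed.
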
RGB-based Pose Estimation Results 1. 2. computationally expensive 17
Real-time RGB-based Object Pose Estimation
[Figure panels: Response map of matched template over the image; Matched template; Detected object contours (multiple hypotheses); select pose hypotheses number]

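Keeping multiple hypotheses can be illustrated by taking the k highest entries of a score matrix. The slides do not describe the actual selection rule, so the top-k criterion and the function name below are assumptions.

```python
import numpy as np

def top_hypotheses(S: np.ndarray, k: int):
    """Return the k best (template index, patch index, score) triples, best first."""
    flat = S.ravel()
    k = max(1, min(k, flat.size))
    idx = np.argpartition(flat, -k)[-k:]        # unordered indices of the k largest
    idx = idx[np.argsort(flat[idx])[::-1]]      # sort those k by descending score
    rows, cols = np.unravel_index(idx, S.shape)
    return [(int(r), int(c), float(S[r, c])) for r, c in zip(rows, cols)]
```

The template index identifies the rendered viewpoint and the patch index identifies the image location and pyramid level, so each triple corresponds to one candidate pose.
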
RGB-D Object Scale Prior
Imposed object scale prior
Camera projection based on depth value

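One way to read the scale prior is through the pinhole camera model: an object of physical size D at depth z projects to roughly f·D/z pixels, which limits the pyramid scales worth searching. The sketch below assumes that model; the parameter names, the tolerance, and the convention that a scale is an image resize factor are hypothetical.

```python
def expected_pixel_size(object_size_m: float, depth_m: float, focal_px: float) -> float:
    """Projected object size in pixels under a pinhole model (an assumption)."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * object_size_m / depth_m

def allowed_scales(scales, template_px, object_size_m, depth_m, focal_px, tol=0.2):
    """Keep the image resize factors that bring the object near the template size."""
    # After resizing the image by s, the object spans s * expected pixels,
    # so the template matches when s is close to template_px / expected.
    target = template_px / expected_pixel_size(object_size_m, depth_m, focal_px)
    return [s for s in scales if abs(s - target) <= tol * target]
```

For example, with a 500 px focal length, a 0.1 m object at 1 m depth spans about 50 px, so a 25 px template is only compared against the pyramid level near scale 0.5.
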
Multiple Object Pose Estimation
Score matrix: S = Template Matrix × Image Matrix
With Principal Component Analysis: S = A × Z × Image Matrix

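The multi-object case can be sketched by stacking the template matrices of all objects and applying PCA once to the stacked matrix. This is an illustrative reading of the slides: the mean-centering step, the choice of `k`, and every name below are assumptions.

```python
import numpy as np

def stack_objects(template_sets):
    """Stack per-object template matrices; owner[i] is the object id of row i."""
    T = np.vstack(template_sets)
    counts = [len(t) for t in template_sets]
    owner = np.repeat(np.arange(len(template_sets)), counts)
    return T, owner

def fit_pca(T: np.ndarray, k: int):
    """PCA via SVD of the mean-centered templates: T ~ A Z + mu."""
    mu = T.mean(axis=0)
    U, d, Vt = np.linalg.svd(T - mu, full_matrices=False)
    return U[:, :k] * d[:k], Vt[:k], mu

def pca_scores(A, Z, mu, P) -> np.ndarray:
    """S ~ A (Z P) + mu P; the mean term is shared by all template rows."""
    return A @ (Z @ P) + mu @ P
```

In this sketch the product `Z @ P` is computed once for all objects, so adding an object only adds rows to `A` as long as the number of bases `k` stays fixed.
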
RGB-D Object Pose Estimation Results
[Figure panels: Eggbox; Duck toy]

RGB-D Object Pose Estimation
[Figure panels: Color stream for the teapot and sugar bag; Real-time 3D model alignment result]

Application on a Real Robot
Robot in Toyota America Research Center

Runtime for Different Numbers of Objects
Runtime (ms):
# objects   VNCC-PCA   VNCC   DDT-3D [1]   Linemod [2]
one         26.3       34.4   55.1         119
two         27.4       40.7   70.2         218
five        32.9       66.2   107.4        535
ten         38.5       78.0   172.7        985
fifteen     43.2       89.5   238.1        1388
[Plot: Runtime (ms) versus Number of objects for VNCC-PCA, VNCC, and DDT-3D; annotated "Sub-linear increase"]
[1] Rios-Cabrera et al. Discriminatively trained templates for 3D object detection: A real time scalable approach. ICCV 2013.
[2] Hinterstoisser et al. Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. ACCV 2012.

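The "sub-linear increase" annotation can be checked against the table with a few lines of arithmetic: compare the runtime at fifteen objects with the runtime at one object. The numbers are copied from the slide; the dictionary layout and function name are only for illustration.

```python
# Runtime in ms by number of objects, copied from the slide's table.
runtime_ms = {
    "VNCC-PCA": {1: 26.3, 2: 27.4, 5: 32.9, 10: 38.5, 15: 43.2},
    "VNCC": {1: 34.4, 2: 40.7, 5: 66.2, 10: 78.0, 15: 89.5},
    "DDT-3D": {1: 55.1, 2: 70.2, 5: 107.4, 10: 172.7, 15: 238.1},
    "Linemod": {1: 119, 2: 218, 5: 535, 10: 985, 15: 1388},
}

def growth_factor(series: dict) -> float:
    """Runtime at the largest object count divided by runtime at the smallest."""
    return series[max(series)] / series[min(series)]

factors = {name: growth_factor(series) for name, series in runtime_ms.items()}
```

Going from one to fifteen objects multiplies the VNCC-PCA runtime by about 1.6, compared with about 11.7 for Linemod.
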
3D Mesh Model Dataset for Evaluation
10 textureless models
6 textured objects
13 models are created from Autodesk 123D Catch
3 models are from an online repository

Accuracy Comparison
Average error in our dataset:
Method         Pitch   Roll   Yaw    X      Y      Runtime
VNCC-5         2.71    5.43   6.35   5.27   4.28   162 ms
Line2D-5 [5]   3.05    6.12   7.88   9.35   7.24   288 ms
VNCC-PCA-5     2.92    5.56   6.42   5.43   4.47   119 ms
Accuracy in ACCV12 dataset:
Method               Accuracy
DDT-3D [1]           97.2%
Hinterstoisser [2]   96.6%
VNCC-PCA-10          96.0%
VNCC-10              96.2%
VNCC-1               84.2%
Linemod [3]          83.0%
Drost [4]            79.3%

Future Work
Patch-based matching
Non-rigid pose estimation: deformable objects and articulated objects
Object tracking based on particle filtering