Semantic RGB-D Perception for Cognitive Robots


Semantic RGB-D Perception for Cognitive Robots. Sven Behnke, Computer Science Institute VI: Autonomous Intelligent Systems.

Our Domestic Service Robots Dynamaid and Cosero. Size: 100-180 cm, weight: 30-35 kg, 36 articulated joints. Equipped with PC, laser scanners, Kinect, and microphone.

RoboCup 2013, Eindhoven.

Analysis of Table-top Scenes and Grasp Planning. Detection of object clusters above the horizontal support plane. Two grasp types (top, side) allow flexible grasping of many unknown objects. [Stückler, Steffens, Holz, Behnke: Robotics and Autonomous Systems 2012]
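The cluster-detection step above can be sketched as follows: keep the points above the support plane and group them by Euclidean proximity. This is a minimal illustration; the function name and all thresholds are assumptions, not the cited system's actual parameters.

```python
import math

def cluster_above_plane(points, plane_z=0.0, min_height=0.01, max_gap=0.05):
    """Group 3D points above a horizontal support plane into clusters.

    points: list of (x, y, z) tuples; plane_z: height of the support plane;
    min_height: points closer to the plane are treated as part of it;
    max_gap: maximum distance between neighbors within one cluster.
    Illustrative sketch only.
    """
    candidates = [p for p in points if p[2] > plane_z + min_height]
    clusters = []
    unassigned = set(range(len(candidates)))
    while unassigned:
        seed = unassigned.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            # Grow the cluster by all unassigned points within max_gap
            near = [j for j in unassigned
                    if math.dist(candidates[i], candidates[j]) <= max_gap]
            for j in near:
                unassigned.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append([candidates[i] for i in cluster])
    return clusters
```

Two well-separated groups of table-top points then come out as two clusters, while points on the plane itself are discarded.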

3D-Mapping with Surfels

3D-Mapping and Localization. Registration of 3D laser scans; representation of point distributions in voxels; drivability assessment through region growing; robust localization using 2D laser scans. [Kläß, Stückler, Behnke: Robotik 2012]
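Representing point distributions in voxels can be sketched as accumulating per-voxel statistics. The sketch below keeps only counts and means; a full surfel map would also keep covariances. Names and the voxel size are illustrative assumptions.

```python
from collections import defaultdict

def voxel_statistics(points, voxel_size=0.1):
    """Summarize a point cloud by per-voxel point distributions.

    Returns {voxel_index: (count, mean_point)}. A surfel representation
    would additionally store per-voxel covariance matrices. Sketch only.
    """
    sums = defaultdict(lambda: [0, 0.0, 0.0, 0.0])
    for x, y, z in points:
        # Integer voxel index via floor division by the voxel edge length
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        acc = sums[key]
        acc[0] += 1
        acc[1] += x; acc[2] += y; acc[3] += z
    return {k: (n, (sx / n, sy / n, sz / n))
            for k, (n, sx, sy, sz) in sums.items()}
```

Compressing the cloud to one distribution per voxel is what makes registration and drivability checks fast compared to operating on raw points.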

3D Mapping by RGB-D SLAM. Modelling of shape and color distributions in voxels with local multiresolution (e.g., 2.5 cm and 5 cm voxels). Efficient registration of views on the CPU; global optimization. [Stückler, Behnke: Journal of Visual Communication and Image Representation 2013] Multi-camera SLAM [Stoucken, diploma thesis 2013]

Learning and Tracking Object Models. Modeling of objects by RGB-D SLAM; real-time registration with the current RGB-D image.

Deformable RGB-D Registration. Based on the Coherent Point Drift method [Myronenko & Song, PAMI 2010]; the Multiresolution Surfel Map allows real-time registration. [Stückler, Behnke, ICRA 2014]

Transformation of Poses on the Object. Derived from the deformation field. [Stückler, Behnke, ICRA 2014]

Grasp & Motion Skill Transfer. Demonstration at RoboCup 2013. [Stückler, Behnke, ICRA 2014]

Tool Use: Bottle Opener. Tool-tip perception, extension of the arm kinematics, perception of the crown cap, and motion adaptation. [Stückler, Behnke, Humanoids 2014]

Picking Sausage, Bimanual Transport. Perception of tool tip and sausage; alignment with the main axis of the sausage. Our team NimbRo won the RoboCup@Home league in three consecutive years. [Stückler, Behnke, Humanoids 2014]

Hierarchical Object Discovery through Motion Segmentation. Motion is a strong segmentation cue, from both camera and object motion. Segment-wise registration of a sequence; inference of a segment hierarchy. [Stückler, Behnke: IJCAI 2013]

Semantic Mapping. Pixel-wise classification of RGB-D images by random forests: inner nodes compare color/depth of regions, with size normalization; training and recall on GPU. 3D fusion through RGB-D SLAM. Evaluation on our own data set and NYU Depth v2 [Stückler, Biresev, Behnke: IROS 2012].

Accuracy in %           Ø Classes   Ø Pixels
Silberman et al. 2012   59.6        58.6
Couprie et al. 2013     63.5        64.5
Random forest           65.0        68.1
3D fusion               66.8        70.6

[Stückler et al., Journal of Real-Time Image Processing 2014]
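The size normalization of the forest's region comparisons can be sketched as depth-scaled probe offsets, in the spirit of depth-invariant features: the pixel offset shrinks with distance so the probe covers roughly the same physical extent everywhere. Function name, offsets, and the out-of-image convention are illustrative assumptions, not the exact feature of the cited system.

```python
def depth_normalized_feature(depth, u, v, offset1, offset2):
    """Difference of depth values at two probe offsets around pixel (u, v),
    with the offsets scaled by 1/depth (size normalization). Sketch only."""
    d = depth[v][u]

    def probe(off):
        du, dv = off
        uu = u + int(round(du / d))
        vv = v + int(round(dv / d))
        if 0 <= vv < len(depth) and 0 <= uu < len(depth[0]):
            return depth[vv][uu]
        return float('inf')  # outside the image: treat as far background

    return probe(offset1) - probe(offset2)
```

A decision-tree node then thresholds this scalar; because the probes scale with 1/depth, the same threshold works for near and far objects.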

Learning Depth-sensitive CRFs. SLIC+depth superpixels. Unary features: random forest, height feature. Pairwise features: color contrast, vertical alignment, depth difference, normal differences (similarity between superpixel normals). [Müller and Behnke, ICRA 2014]
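A contrast-sensitive pairwise term of the kind listed above can be sketched as follows: the weight between two superpixels is large when their color and depth agree (encouraging a shared label) and small across strong discontinuities. The specific exponential form and the beta values are assumptions for illustration, not the paper's learned potentials.

```python
import math

def pairwise_weight(color_i, color_j, depth_i, depth_j,
                    beta_color=0.5, beta_depth=2.0):
    """Contrast-sensitive pairwise weight between two superpixels.
    Illustrative sketch of a common CRF smoothness term."""
    color_dist2 = sum((a - b) ** 2 for a, b in zip(color_i, color_j))
    depth_dist2 = (depth_i - depth_j) ** 2
    # Weight decays with squared color and depth differences
    return math.exp(-beta_color * color_dist2 - beta_depth * depth_dist2)
```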

Object Class Detection in RGB-D. Hough forests not only make an object-class decision, but also describe the object center. RGB-D Objects data set; color and depth features; training with rendered scenes; detection of object position and orientation. Depth helps a lot. [Badami, Stückler, Behnke: SPME 2013]

Bin Picking. Known objects in a transport box. Matching of graphs of 2D and 3D shape primitives; grasp and motion planning combine offline and online stages. [Nieuwenhuisen et al.: ICRA 2013]

Learning of Object Models. Scan multiple objects in different poses; find the support plane and remove it; segment the views; register the views using ICP; recognize geometric primitives; reconstruct the surface from the registered views.
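The ICP registration step above can be sketched in its simplest form. The sketch below does a single iteration and solves only for translation (real ICP also estimates rotation, typically via SVD, and iterates to convergence); it is a didactic illustration, not the cited pipeline.

```python
def icp_translation_step(source, target):
    """One simplified ICP iteration, translation only: match each source
    point to its nearest target point, then shift the source cloud by the
    mean correspondence offset. Didactic sketch."""
    def nearest(p):
        return min(target, key=lambda q: sum((a - b) ** 2 for a, b in zip(p, q)))

    matches = [nearest(p) for p in source]
    n = len(source)
    shift = tuple(sum(q[k] - p[k] for p, q in zip(source, matches)) / n
                  for k in range(3))
    moved = [tuple(a + s for a, s in zip(p, shift)) for p in source]
    return moved, shift
```

Running this repeatedly alternates correspondence search and alignment, which is exactly the expectation-maximization-like structure of full ICP.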

Active Object Perception [Holz et al. STAR 2014]

Active Object Perception. Detected cylinders, partial occlusions, detected object: efficient exploration of the part arrangement in the transport boxes to handle occlusions. [Holz et al. STAR 2014]

Active Object Perception. Next best view.

Active Object Perception. Two more detected objects. [Holz et al. STAR 2014]

Industrial Application: Depalletizing. Using a work-space RGB-D camera; the initial pose of the transport box is roughly known. Detect the dominant horizontal plane above the ground, cluster the points above the support plane, and estimate the main axes. [Holz et al. IROS 2015]
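The main-axes estimation is essentially PCA on the clustered points. For the horizontal axes a closed-form 2x2 eigen decomposition suffices, sketched below; the function name and the restriction to the x-y plane are illustrative assumptions.

```python
import math

def main_axis_2d(points):
    """Dominant horizontal axis of a point cluster: the principal
    eigenvector of the 2x2 covariance of (x, y) coordinates, computed in
    closed form. Sketch of the PCA step, not the paper's exact estimator."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Angle of the principal axis of a symmetric 2x2 matrix
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    return (math.cos(theta), math.sin(theta))
```

For a cluster elongated along x this returns approximately (1, 0); the second axis is its perpendicular.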

Object View Registration. The wrist RGB-D camera is moved above the innermost object candidate. Object views are represented as a Multiresolution Surfel Map; registration of the object view with the current measurements uses soft assignments; verification is based on registration quality. [Holz et al. IROS 2015]

Part Detection and Grasping. [Holz et al. IROS 2015]

Depalletizing Results: 10 Runs. Total time, component times, and success rates. [Holz et al. IROS 2015]

Part Verification Results. Parts used for verification; detection confidences. [Holz et al. IROS 2015]

Different Lighting Conditions. Artificial light and daylight, only daylight, and low light: in all cases, the pallet was successfully cleared. [Holz et al. IROS 2015]

Deep Learning. [Schulz and Behnke, KI 2012]

GPU Implementations (CUDA). Affordable parallel computers; general-purpose programming. Convolutional networks [Scherer & Behnke, 2009]; local connectivity [Uetz & Behnke, 2009].

Image Categorization: NORB. 10 categories, jittered-cluttered. Max-pooling, cross-entropy training. Test error: 5.6% (LeNet7: 7.8%). [Scherer, Müller, Behnke, ICANN 2010]
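The max-pooling operation used in these networks can be sketched directly: each 2x2 block of the feature map is reduced to its maximum, halving the resolution while keeping the strongest activations. A minimal pure-Python illustration (assuming even dimensions):

```python
def max_pool_2x2(feature_map):
    """2x2 max-pooling with stride 2 over a 2D feature map given as a list
    of lists with even dimensions. Minimal sketch of the operation."""
    return [[max(feature_map[i][j], feature_map[i][j + 1],
                 feature_map[i + 1][j], feature_map[i + 1][j + 1])
             for j in range(0, len(feature_map[0]), 2)]
            for i in range(0, len(feature_map), 2)]
```

Pooling gives the network a degree of translation invariance, which is one reason it helps on the jittered NORB variant.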

Image Categorization: LabelMe. 50,000 color images (256x256); 12 classes + clutter (50%). Error: 3.77% training, 16.27% test. Recall: 1,356 images/s. [Uetz, Behnke, ICIS 2009]

Object-class Segmentation. Class annotation per pixel; multi-scale input channels. Evaluated on the MSRC-9/21 and INRIA Graz-02 data sets. [Schulz, Behnke 2012]

Object Detection in Images. Bounding-box annotation; a structured loss directly maximizes the overlap of the prediction with the ground-truth bounding boxes. Evaluated on two of the Pascal VOC 2007 classes. [Schulz, Behnke, ICANN 2014]
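The overlap measure such a structured loss optimizes is the standard intersection-over-union of two boxes, which can be sketched as:

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes given as
    (x1, y1, x2, y2). Standard overlap criterion for detection."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Training against this overlap directly, rather than a per-pixel or per-corner surrogate, is what distinguishes the structured loss from plain regression.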

RGB-D Object-Class Segmentation. Scale the input according to depth; compute pixel height. Evaluated on NYU Depth V2. [Schulz, Höft, Behnke, ESANN 2015]
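Scaling the input according to depth can be sketched as choosing a window size inversely proportional to the measured depth, so the window covers roughly the same physical extent at every distance. The base size and reference depth below are illustrative assumptions.

```python
def scaled_window(depth_m, base_size_px=32, reference_depth_m=1.0):
    """Window side length (pixels) that shrinks inversely with depth,
    keeping the covered physical extent approximately constant. Sketch."""
    return max(1, int(round(base_size_px * reference_depth_m / depth_m)))
```

An object at 2 m is then sampled with half the pixel window of the same object at 1 m, which is the size normalization the slide refers to.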

Neural Abstraction Pyramid [Behnke, LNCS 2766, 2003]. Data-driven analysis (feature extraction) and model-driven synthesis (feature expansion) map between signals and abstract features; grouping, competition, and completion.

Iterative Interpretation. Interpret the most obvious parts first; use the partial interpretation as context to resolve local ambiguities. [Behnke, LNCS 2766, 2003]

Local Recurrent Connectivity. Forward, lateral, and backward projections connect feature maps of processor elements across layers, from less abstract to more abstract representations; hyper columns and hyper neighborhoods organize the connectivity. [Behnke, LNCS 2766, 2003]

Neural Abstraction Pyramid for RGB-D Video Object-class Segmentation. NYU Depth V2 contains RGB-D video sequences; recursive computation is efficient for temporal integration. [Pavel, Schulz, Behnke, IJCNN 2015]

Geometric and Semantic Features for RGB-D Object-class Segmentation. New geometric feature: distance from the wall. Semantic features pretrained on ImageNet. Both help significantly. [Husain et al., under review]

Semantic Segmentation Priors for Object Discovery. Combine bottom-up object discovery and semantic priors; semantic segmentation is used to classify color and depth superpixels. Higher recall, more precise object borders. [Garcia et al., under review]

RGB-D Object Recognition and Pose Estimation. Use pretrained features from ImageNet. [Schwarz, Schulz, Behnke, ICRA 2015]

Canonical View, Colorization. Objects are viewed from different elevations; render a canonical view; colorization based on the distance from the vertical axis through the object center. [Schwarz, Schulz, Behnke, ICRA 2015]
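The colorization idea can be sketched as mapping each vertex to a scalar by its distance from the vertical axis through the object centroid, then looking that scalar up in a colormap. The sketch below stops at the normalized scalar; the function name and normalization are illustrative assumptions.

```python
def colorize_by_axis_distance(vertices):
    """For each (x, y, z) vertex, return a value in [0, 1] proportional to
    its distance from the vertical axis through the centroid. A colormap
    lookup would turn these values into colors. Sketch of the idea."""
    n = len(vertices)
    cx = sum(v[0] for v in vertices) / n
    cy = sum(v[1] for v in vertices) / n
    dists = [((v[0] - cx) ** 2 + (v[1] - cy) ** 2) ** 0.5 for v in vertices]
    dmax = max(dists) or 1.0  # avoid division by zero for degenerate input
    return [d / dmax for d in dists]
```

Because the coloring depends only on geometry, it is consistent across viewpoints, which is what makes depth renderings comparable to the pretrained network's RGB inputs.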

Features Disentangle the Data. t-SNE embedding. [Schwarz, Schulz, Behnke, ICRA 2015]

Recognition Accuracy. Improved both category and instance recognition. Typical confusions: pitcher vs. coffee mug, peach vs. sponge. [Schwarz, Schulz, Behnke, ICRA 2015]

Conclusion. Semantic perception in everyday environments is challenging; simple methods rely on strong assumptions (e.g., a support plane). Depth helps with segmentation and allows for size normalization, geometric features, and shape descriptors. Deep learning methods work well, including transfer of features from large data sets. Many open problems remain, e.g., total scene understanding and incorporating physics.

Thanks for your attention! Questions?