Learning Semantic Environment Perception for Cognitive Robots


Learning Semantic Environment Perception for Cognitive Robots. Sven Behnke, University of Bonn, Germany, Computer Science Institute VI: Autonomous Intelligent Systems

Some of Our Cognitive Robots: equipped with many sensors and DoFs, demonstrated in complex scenarios. MAV, soccer robot, service robot, exploration robot, picking robot. 2

3D Environment Perception: 3D laser scanner, dual wide-angle stereo cameras, ultrasound, quad-core i7. Autonomous navigation close to structures, 3D mapping [Droeschel et al. JFR 2016] 3

3D Mapping Registering 3D laser scans 4 [Droeschel et al. 2016]

Mobile Manipulation in Mars-like Environment 5

Autonomous Mission Execution 3D mapping, localization, mission and navigation planning 3D object perception and grasping 6 [Schwarz et al. Frontiers 2016]

Cognitive Service Robot Cosero 7

Table-top Analysis and Grasp Planning: detection of clusters above a horizontal plane; two grasp types (top, side); flexible grasping of many unknown objects. [Stückler et al., Robotics and Autonomous Systems, 2013] 8
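The cluster detection above a horizontal plane can be sketched as follows. This is a minimal illustration with NumPy, assuming a point cloud with z up; the function name, the histogram-based plane estimate, and all thresholds are illustrative choices, not the paper's implementation:

```python
import numpy as np

def segment_tabletop(points, plane_eps=0.01, cluster_radius=0.05, min_cluster=10):
    """Find object clusters above the dominant horizontal plane.

    points: (N, 3) array in metres; the table is assumed roughly
    horizontal (constant z). Returns a list of index arrays into points.
    """
    z = points[:, 2]
    # Estimate the table height as the most populated 1 cm z-slab.
    bins = np.arange(z.min(), z.max() + 0.01, 0.01)
    hist, edges = np.histogram(z, bins=bins)
    table_z = edges[np.argmax(hist)] + 0.005

    # Keep only points clearly above the supporting plane.
    above = np.where(z > table_z + plane_eps)[0]
    pts = points[above]

    # Greedy Euclidean clustering: grow each cluster by breadth-first
    # search over neighbours within cluster_radius.
    unvisited = set(range(len(pts)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue, members = [seed], [seed]
        while queue and unvisited:
            i = queue.pop()
            cand = list(unvisited)
            dist = np.linalg.norm(pts[cand] - pts[i], axis=1)
            near = [c for c, d in zip(cand, dist) if d < cluster_radius]
            unvisited.difference_update(near)
            queue.extend(near)
            members.extend(near)
        if len(members) >= min_cluster:
            clusters.append(above[np.array(members)])
    return clusters
```

Each returned cluster is a grasp candidate; the top/side grasp choice would then follow from the cluster's extent.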

3D Mapping by RGB-D SLAM: modelling of shape and color distributions in voxels; local multiresolution (5 cm / 2.5 cm); efficient registration of views on CPU; global optimization. Multi-camera SLAM [Stoucken]. [Stückler, Behnke: Journal of Visual Communication and Image Representation 2013] 9
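The voxel statistics behind such a map can be sketched as a grid of running sums from which each voxel's mean and covariance (its surfel) follow. This is a single-resolution sketch only; the class name and API are hypothetical, and the actual multiresolution surfel maps store considerably more (color distributions, multiple levels):

```python
import numpy as np
from collections import defaultdict

class SurfelGrid:
    """One level of a surfel map: each occupied voxel accumulates
    point count, sum, and sum of outer products."""

    def __init__(self, resolution):
        self.res = resolution
        self.sum = defaultdict(lambda: np.zeros(3))
        self.sq = defaultdict(lambda: np.zeros((3, 3)))
        self.n = defaultdict(int)

    def insert(self, points):
        for p in points:
            key = tuple(np.floor(p / self.res).astype(int))
            self.sum[key] += p
            self.sq[key] += np.outer(p, p)
            self.n[key] += 1

    def surfel(self, key):
        """Return (mean, covariance) of the points in a voxel."""
        n, s, sq = self.n[key], self.sum[key], self.sq[key]
        mean = s / n
        cov = sq / n - np.outer(mean, mean)
        return mean, cov
```

Because only sums are stored, new frames can be fused incrementally without keeping raw points.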

Learning and Tracking Object Models Modeling of objects by RGB-D-SLAM Real-time registration with current RGB-D frame 10

Deformable RGB-D Registration: based on the Coherent Point Drift method [Myronenko & Song, PAMI 2010]; the Multiresolution Surfel Map allows real-time registration. 11
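The soft-assignment EM idea behind Coherent Point Drift can be illustrated in a few lines. This sketch estimates a translation only; real CPD additionally estimates a rotation or a coherent deformation field, anneals the variance, and models outliers:

```python
import numpy as np

def em_align_translation(X, Y, sigma2=0.002, iters=20):
    """Soft-assignment EM alignment, translation-only for brevity.

    X: (N, 3) target points, Y: (M, 3) source points.
    Returns t such that Y + t is aligned to X.
    """
    t = np.zeros(X.shape[1])
    for _ in range(iters):
        # E-step: Gaussian responsibility of each source for each target.
        diff = X[None, :, :] - (Y[:, None, :] + t)            # (M, N, 3)
        w = np.exp(-np.sum(diff ** 2, axis=2) / (2 * sigma2))
        w /= w.sum(axis=0, keepdims=True) + 1e-12             # normalise per target
        # M-step: move by the responsibility-weighted mean displacement.
        t = t + (w[:, :, None] * diff).sum(axis=(0, 1)) / w.sum()
    return t
```

No hard correspondences are ever committed, which is what makes the approach robust to noise and partial overlap.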

Transformation of Poses on the Object: derived from the deformation field [Stückler, Behnke, ICRA 2014] 12

Grasp & Motion Skill Transfer [Stückler, Behnke, ICRA 2014] 13

Tool use: Bottle Opener Tool tip perception Extension of arm kinematics Perception of crown cap Motion adaptation [Stückler, Behnke, Humanoids 2014] 14

Hierarchical Object Discovery through Motion Segmentation Simultaneous object modeling and motion segmentation Inference of a segment hierarchy 15 [Stückler, Behnke: IJCAI 2013]

Semantic Mapping: pixel-wise classification of RGB-D images by random forests (comparing color/depth of regions, with size normalization); 3D fusion through RGB-D SLAM; evaluated on NYU Depth v2 [Stückler, Biresev, Behnke: IROS 2012]. Accuracy in % (class average / pixel average): Silberman et al. 2012: 59.6 / 58.6; segmentation of Couprie et al. 2013: 63.5 / 64.5; random forest: 65.0 / 68.1; 3D fusion: 66.8 / –. 16
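The size normalization used by such forest features can be sketched as a pair comparison with depth-scaled offsets: dividing the pixel offsets by the depth at the query pixel makes the probe span the same metric extent near and far. The exact features in the paper compare region averages; this single-pixel probe version is a simplified illustration with hypothetical names:

```python
import numpy as np

def depth_normalized_feature(channel, depth, p, u, v):
    """Pair-comparison feature f(p) = I(p + u/d(p)) - I(p + v/d(p)).

    channel: 2D image channel, depth: 2D depth map in metres,
    p: (row, col) query pixel, u/v: offsets in pixel-metres.
    """
    y, x = p
    d = depth[y, x]

    def probe(off):
        # Scale the offset by inverse depth, then clamp to the image.
        oy = int(round(y + off[0] / d))
        ox = int(round(x + off[1] / d))
        oy = min(max(oy, 0), channel.shape[0] - 1)
        ox = min(max(ox, 0), channel.shape[1] - 1)
        return float(channel[oy, ox])

    return probe(u) - probe(v)
```

Doubling the depth halves the pixel offset, so the same tree threshold remains meaningful at any object distance.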

Learning Depth-sensitive CRFs: SLIC+depth superpixels. Unary features: random forest, height feature. Pairwise features: color contrast, vertical alignment, depth difference, normal differences. Results: random forest vs. CRF prediction vs. ground truth. 17
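How such pairwise cues enter a CRF can be sketched as a contrast-sensitive Potts weight: strong color, depth, or normal differences lower the weight, so the model smooths less across likely object boundaries. The function and the theta scales are illustrative, not the learned potentials of the paper:

```python
import numpy as np

def pairwise_weight(ci, cj, di, dj, ni, nj,
                    theta_c=10.0, theta_d=0.05, theta_n=0.5):
    """Contrast-sensitive Potts weight for neighbouring superpixels i, j.

    ci/cj: color vectors, di/dj: mean depths in metres,
    ni/nj: unit surface normals. Returns a weight in (0, 1].
    """
    dc = np.sum((np.asarray(ci, float) - np.asarray(cj, float)) ** 2)
    dd = (di - dj) ** 2
    dn = 1.0 - float(np.dot(ni, nj))   # 0 for parallel normals
    return float(np.exp(-dc / theta_c - dd / theta_d - dn / theta_n))
```

In the full CRF this weight multiplies the label-disagreement penalty between neighbouring superpixels.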

Deep Learning: learning layered representations [Schulz, Behnke, KI 2012] 18

Object-class Segmentation [Schulz, Behnke, ESANN 2012] Class annotation per pixel Multi-scale input channels Evaluated on MSRC-9/21 and INRIA Graz-02 data sets 19 Input Output Truth

Object Detection in Natural Images: bounding-box annotation; a structured loss that directly maximizes the overlap of the prediction with ground-truth bounding boxes; evaluated on two of the Pascal VOC 2007 classes. [Schulz, Behnke, ICANN 2014] 20
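The overlap measure that such a structured loss maximizes is the standard intersection-over-union between predicted and ground-truth boxes, which can be computed as:

```python
def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0
```

Training against 1 - IoU (rather than a per-coordinate regression loss) directly optimizes the quantity used for evaluation.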

RGB-D Object-Class Segmentation: covering windows segmented with a CNN; input scaled according to depth; pixel height computed as an additional feature. Inputs: RGB, depth, height; output vs. truth. [Schulz, Höft, Behnke, ESANN 2015] 21
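Both depth-derived quantities follow from the pinhole model: the image size of a fixed metric window shrinks with depth, and the metric height of a pixel follows from backprojecting its row. A minimal sketch, assuming a level camera with y pointing down and known focal length (the function names and the level-camera assumption are simplifications):

```python
def window_size_px(metric_size, depth, f):
    """Pixel size of a window covering metric_size metres at the given
    depth (pinhole camera, focal length f in pixels)."""
    return f * metric_size / depth

def pixel_height(v, depth, fy, cy, camera_height):
    """Metric height above the floor of the point seen at image row v."""
    y_cam = (v - cy) * depth / fy   # camera-frame y (down) via backprojection
    return camera_height - y_cam
```

Scaling the network input by window_size_px makes objects appear at a canonical size regardless of distance.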

Neural Abstraction Pyramid: abstract features [Behnke, Rojas, IJCNN 1998] [Behnke, LNCS 2766, 2003]. Bottom-up: data-driven analysis, feature extraction. Top-down: model-driven synthesis, feature expansion. Grouping, competition, completion of signals. 22

Iterative Image Interpretation Interpret most obvious parts first Use partial interpretation as context to resolve local ambiguities 23

Neural Abstraction Pyramid for RGB-D Video Object-class Segmentation Recursive computation is efficient for temporal integration Neural Abstraction Pyramid [Pavel, Schulz, Behnke, Neural Networks 2017] 24

Geometric and Semantic Features for RGB-D Object-class Segmentation: new geometric feature: distance from wall; semantic features pretrained on ImageNet; both help significantly [Husain et al. RA-L 2016]. Figure: RGB, truth, DistWall, output without / with DistWall. 25

Semantic Segmentation Priors for Object Discovery Combine bottom-up object discovery and semantic priors Semantic segmentation used to classify color and depth superpixels Higher recall, more precise object borders [Garcia et al. ICPR 2016] 26

RGB-D Object Recognition and Pose Estimation [Schwarz, Schulz, Behnke, ICRA 2015] 27

Canonical View, Colorization: objects viewed from different elevations; render a canonical view; colorization based on distance from the vertical axis through the object center. 28 [Schwarz, Schulz, Behnke, ICRA 2015]
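The colorization idea can be sketched as assigning each point a scalar proportional to its distance from the vertical axis through the centroid; mapping that scalar through a color map then yields the pose-invariant coloring. The exact scheme in the paper differs in detail; this is an illustrative reduction:

```python
import numpy as np

def colorize_by_axis_distance(points):
    """Scalar in [0, 1] per point: normalized distance from the vertical
    axis (z assumed up) through the object centroid."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    radial = np.linalg.norm(pts[:, :2] - centroid[:2], axis=1)  # ignore z
    return radial / radial.max() if radial.max() > 0 else radial
```

Because the scalar depends only on object geometry, the rendered coloring is consistent across viewpoints, which is what lets pretrained RGB features transfer to depth data.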

Pretrained Features Disentangle Data: t-SNE embedding [Schwarz, Schulz, Behnke, ICRA 2015] 29

Recognition Accuracy Improved both category and instance recognition Confusions: 1: pitcher / coffee mug; 2: peach / sponge [Schwarz, Schulz, Behnke, ICRA 2015] 30

Amazon Picking Challenge: large variety of objects, unordered in a shelf or tote; picking and stowing tasks [Schwarz et al. ICRA 2017] 31

Deep Learning Semantic Segmentation Adapted from our segmentation of indoor scenes [Husain et al. RA-L 2016] [Schwarz et al. ICRA 2017] 32

DenseCap Object Detection [Schwarz et al. ICRA 2017] [Johnson et al., CVPR 2016] 33

Combined Detection and Segmentation: detection × segmentation [Schwarz et al. IJRR 2017] 34

Stowing 35

Picking 36

NimbRo Picking APC 2016 Results: 2nd place stowing (186 points), 3rd place picking (97 points) [Schwarz et al. IJRR 2017] 37

Detection of Tools 38 [Schwarz et al. IJRR 2017]

MBZIRC Challenge 2 39

Wrench Selection: Detection of Tool Ends 40

Amazon Robotics Challenge 2017 Training with rendered scenes 41

Conclusions: Semantic perception is challenging; simple methods rely on strong assumptions. Depth helps with segmentation and allows for size normalization, geometric features, and shape descriptors. Deep learning methods work well: transfer of features from large data sets, synthetic training. Many open problems remain, e.g. total scene understanding and incorporating physics. 42

Questions? 43