Tactile-Visual Integration for Task-Aware Grasping. Mabel M. Zhang, Andreas ten Pas, Renaud Detry, Kostas Daniilidis

Molyneux's Question
William Molyneux's famous question to John Locke in 1688: "Suppose a Man born blind, and now adult, and taught by his touch to distinguish between a Cube, and a Sphere of the same metal, and nighly of the same bigness, so as to tell, when he felt one and t'other; which is the Cube, which the Sphere. Suppose then the Cube and Sphere placed on a Table, and the Blind Man to be made to see. Quaere, Whether by his sight, before he touch'd them, he could now distinguish, and tell, which is the Globe, which the Cube."

Development of Touch
Streri 1986, 1987, 1988, 2000, 2003, 2004: newborns, 2- and 5-month-olds; touch-only shape discrimination; vision-touch transfer
Meltzoff 1979: 1-month-olds; oral touch
Cowey 1975: rhesus monkeys; food/sand in the dark
Icons by Freepik, Smashicons from flaticon.com

Sensory Substitution & Active Exploration
Bach-y-Rita 1969, White 1970
Also: active exploration

Why touch?
Perception: darkness; transparency; underwater; inside a bag; smoke; reflection; occlusion
Manipulation: complex task; slippage; adaptation; nonprehensile
Image sources: journalstar.com, whaleshark.org.au, brede-art.com, alibaba.com, kjpargeter from Freepik, goir/shutterstock.com, IROS 2017 Manip. Challenge, Hang 2016

Broadly-Related Work
Transplant vision techniques to touch:
Schneider 2009: BoW
Pezzementi 2011: BoW, moments, SIFT
Strub 2014: moments
Luo 2015: tactile SIFT
Luo 2016: ICP, BoW
Yuan 2017: CNN
Calandra 2017: CNN
Hollis 2018: compress

Intuition of Approach
Object pose-independent; geometric; holistic; sparse contacts (low cost $)
Figure labels: end-effector pose; contact points (x, y, z)1, (x, y, z)2, (x, y, z)3; triangle (l0, l1, a0)
Zhang et al. IROS 2016
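
The triangle idea on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `triangle_parameters` and the labeling convention (l0 = |p0 p1|, l1 = |p0 p2|, l2 = |p1 p2|, with a_i the interior angle at p_i) are assumptions chosen here so that (l0, l1, a0) is a side-angle-side description. Only distances and angles between contacts are used, so the values do not change when the object is translated or rotated.

```python
import math


def _angle_at(vertex, a, b):
    """Interior angle (radians) at `vertex` between the rays to `a` and `b`."""
    va = [ai - vi for ai, vi in zip(a, vertex)]
    vb = [bi - vi for bi, vi in zip(b, vertex)]
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(x * x for x in vb))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("degenerate triangle: two contact points coincide")
    cos_angle = sum(x * y for x, y in zip(va, vb)) / (norm_a * norm_b)
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.acos(max(-1.0, min(1.0, cos_angle)))


def triangle_parameters(p0, p1, p2):
    """Return ((l0, l1, l2), (a0, a1, a2)) for three 3D contact points.

    Assumed convention (not taken from the slides):
    l0 = |p0 p1|, l1 = |p0 p2|, l2 = |p1 p2|; a_i is the angle at p_i.
    """
    sides = (math.dist(p0, p1), math.dist(p0, p2), math.dist(p1, p2))
    angles = (_angle_at(p0, p1, p2), _angle_at(p1, p0, p2), _angle_at(p2, p0, p1))
    return sides, angles
```

Any three of the six values that determine the triangle can serve as a descriptor, for example (l0, l1, a0).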

Histogram of Triangles
Build a 3D histogram of triangle parameters. Move hand; repeat.
The histogram is the input to a classifier.
Zhang et al. IROS 2016
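
A rough sketch of the histogram step follows. It is an illustration under stated assumptions rather than the published method: it enumerates every triple of contacts (the slides do not say how triangles are chosen), uses the descriptor (l0, l1, a0) with l0 = |p0 p1|, l1 = |p0 p2| and a0 the angle at p0, and bins lengths over [0, `max_length`] and angles over [0, pi]. The names `sas_descriptor`, `triangle_histogram`, and `max_length` are hypothetical.

```python
import math
from itertools import combinations


def sas_descriptor(p0, p1, p2):
    """Return (l0, l1, a0), or None if a side at p0 has zero length."""
    l0, l1 = math.dist(p0, p1), math.dist(p0, p2)
    if l0 == 0.0 or l1 == 0.0:
        return None
    dot = sum((a - o) * (b - o) for o, a, b in zip(p0, p1, p2))
    a0 = math.acos(max(-1.0, min(1.0, dot / (l0 * l1))))
    return l0, l1, a0


def triangle_histogram(contacts, bins=10, max_length=1.0):
    """Normalized sparse 3D histogram {(i, j, k): fraction of triangles}.

    Values above the assumed upper bounds fall into the last bin.
    """
    upper = (max_length, max_length, math.pi)
    counts = {}
    for triple in combinations(contacts, 3):
        desc = sas_descriptor(*triple)
        if desc is None:
            continue  # skip degenerate triangles
        index = tuple(min(int(v / u * bins), bins - 1) for v, u in zip(desc, upper))
        counts[index] = counts.get(index, 0) + 1
    total = sum(counts.values())
    return {index: n / total for index, n in counts.items()}
```

Under this sketch, "Move hand; Repeat" amounts to appending the new contact points to `contacts` before rebuilding the histogram.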

Histogram Parameters
[Two plots: accuracy for various histogram parameters. x-axis: number of bins per 3D histogram dimension; y-axis: average accuracy over 100 train-test splits.]
Mesh cloud: highest 90.3% from (l1, l2, a2) and 10 bins
Physics simulation: highest 73.9% from (a1, a2, l0) and 20 bins
Zhang et al. IROS 2016

Qualitative Contact Clouds
Zhang et al. IROS 2016

Pairwise Distances
Least similar objects; most similar objects (bottle); in-between objects
Zhang et al. IROS 2016

Intuition of Approach
Objects share similar local features; there's a distribution.
Actions are associated with local geometric observations.
Figure labels: observations z1, z2; wrist poses p1, p2; action a
Zhang et al. IROS 2017

Active Pose Selection
Poses selected at test time (recognized in 2-9 moves)
Zhang et al. IROS 2017

Active Pose Selection (continued)
Zhang et al. IROS 2017

On a Continuum Manipulator
Simulation (recognition under 3 wraps)
Real (no sensors)
Mao*, Zhang*, et al. IROS 2017

Drawbacks
Enclosure contacts only
Lederman & Klatzky 1987: exploratory procedures

Visuotactile Integration
(2 x 3) x 3 TakkTile barometric sensors

Problem: Grasp success from vision + touch, with task semantics (beyond pick and place)

Visuotactile Representation
1. Spatial correspondence?
2. Leverage state of the art in vision?
Diagram: Input; CNN (Conv, FC); Grasp Success (0 or 1)

Related Work
Yuan CVPR 2017: fabric material classification; Kinect + camera-based touch
Calandra CoRL 2017: grasp success probability; RGB + camera-based touch
Varley arXiv 2018: 3D CNN on voxels; depth + tactile point cloud

Visuotactile Representation
3. Semantic task?
Implementation from Detry et al. IROS 2017

Visuotactile Representation
Correspondence: camera frame
Output: grasp success

6DOF Tactile Grasp Collection
Random scene; point cloud; RGB view; off-the-shelf grasps; tactile simulation; grasp and lift

Tactile Input & Grasp Label
Contact readings: good grasp; bad grasp
Grasp success label: good lift; slipped lift

Task Label
Binary task labels in CAD (Detry et al. IROS 2017): pour, handover
Transform contact points to object frame
Task label from CAD
Grasp
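
The frame change named on this slide can be written out as a short sketch. It assumes the object pose is known as a rotation matrix R and translation t mapping object coordinates into the sensing frame, so a contact point p maps back via p_obj = R^T (p - t). The helper `nearest_task_label` and the format of `labeled_points` are hypothetical; the slides do not describe how labels are looked up on the CAD model, and nearest-point lookup is only one plausible choice.

```python
import math


def to_object_frame(point, rotation, translation):
    """Map a point into the object frame: p_obj = R^T (p - t).

    `rotation` is a 3x3 nested list (object-to-sensing-frame rotation),
    `translation` the object origin expressed in the sensing frame.
    """
    d = [p - t for p, t in zip(point, translation)]
    # Row i of R^T is column i of R.
    return tuple(sum(rotation[k][i] * d[k] for k in range(3)) for i in range(3))


def nearest_task_label(point_obj, labeled_points):
    """Return the label of the annotated CAD point closest to `point_obj`.

    `labeled_points` is an assumed format: an iterable of ((x, y, z), label).
    """
    _, label = min(labeled_points, key=lambda item: math.dist(item[0], point_obj))
    return label
```

With binary labels per task, each `label` could be a small mapping such as `{"pour": True, "handover": False}`.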

Tactile Heatmap Visualization
Successful grasps; unsuccessful grasps

Dex-Net 2.0 Planar Grasps
Crop to local grasp; center row of pixels aligned to grasp axis
Successful grasps; unsuccessful grasps
Mahler et al. RSS 2017
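
To make "center row of pixels aligned to grasp axis" concrete, here is a small sketch of a grasp-aligned crop. It is not the Dex-Net code: it uses nearest-neighbor sampling on a nested-list image, and the function name, the (row, col) center, and the angle convention (radians, measured from the image column axis toward the row axis) are assumptions made for illustration.

```python
import math


def grasp_aligned_crop(image, center, angle, size=5, fill=0.0):
    """Sample a size x size crop whose center row follows the grasp axis.

    `image` is a list of rows, `center` is (row, col), `size` should be odd.
    Pixels that fall outside the image are set to `fill`.
    """
    half = size // 2
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    rows, cols = len(image), len(image[0])
    crop = []
    for v in range(-half, half + 1):        # offset perpendicular to the axis
        out_row = []
        for u in range(-half, half + 1):    # offset along the grasp axis
            r = round(center[0] + u * sin_t + v * cos_t)
            c = round(center[1] + u * cos_t - v * sin_t)
            inside = 0 <= r < rows and 0 <= c < cols
            out_row.append(image[r][c] if inside else fill)
        crop.append(out_row)
    return crop
```

With `angle = 0` the crop is an ordinary axis-aligned window; with `angle = pi / 2` the center row of the crop runs down an image column.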

Simulated Tactile Heat Map
2D normal + thickness
[Figure: successful and unsuccessful grasps; channels: thickness, normal; color scale from -0.1 to 0.1]

Results on Dex-Net Adv-Synth Subset

Ongoing Work
Currently collected 10,000 grasps. 10x?
Adv-Synth has 189,300 grasps, 1/6 of Dex-Net 2.0

Challenges
Visuotactile representation
Sim-to-real transfer: sensory input; physics
Rusu 2016, James 2017, Inoue 2017

Future Work
Courtesy of K. Queen