Efficient Grasping from RGBD Images: Learning Using a New Rectangle Representation. Yun Jiang, Stephen Moseson, Ashutosh Saxena Cornell University



Problem. Goal: figure out a way to pick up the object. Approach: grip, then pick up. Question: where and how to grasp?

How to Perceive Objects. RGBD cameras give an RGB image plus depth information. Stereo cameras (~$1000): Bumblebee. Kinect camera (~$140). Outputs: RGB image, depth map, 3D point cloud.

Our Formulation. Input: RGBD image. Output: a proper grasp, i.e. the configuration of the gripper at the final grasp stage: 3D location, 3D orientation, opening width.

Traditional Approaches. Control/planning: force and form closure (Nguyen 1986, Lakshminarayana 1978) requires full 3D knowledge of grippers and objects. Disadvantages: a complete 3D model is not always available; sensors are noisy; friction is difficult to model; the search is over an enormous configuration space; and the method does not apply to deformable grippers.

Learning Approaches. Learning provides generalization to novel objects and robustness to noise and variations in the environment. Previous learning approaches have a representation problem: the 3D orientation of the gripper is not represented well (Saxena et al., NIPS 2006; Le et al., ICRA 2010).

Representation. Should contain the full 7-dimensional gripper configuration (3D location, 3D orientation, gripper opening width) and specifically model the gripper's physical size.

New Representation: Grasping Rectangle. Contains the full 7-dimensional gripper configuration, specifically models the gripper's physical size, and strictly constrains the boundary of features.

Define the Score Function. phi(G): the feature vector of a possible grasp G. Score of grasp G: f(G) = w . phi(G). Best grasp: the highest-score rectangle in the image, G* = argmax_G w . phi(G).
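As a minimal sketch of this linear scoring scheme (the function names, toy features, and weights below are illustrative, not the paper's actual feature extraction):

```python
import numpy as np

def best_grasp(rectangles, phi, w):
    """Return the candidate rectangle whose linear score w . phi(G) is highest.

    rectangles: candidate grasping rectangles
    phi: maps a rectangle to its feature vector
    w: weight vector learned by the ranking SVM
    """
    scores = [np.dot(w, phi(G)) for G in rectangles]
    return rectangles[int(np.argmax(scores))]
```

Any representation of a rectangle works here, as long as `phi` can turn it into a fixed-length feature vector.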

Learning the Score Function. Learning algorithm: SVM-Rank. Ranking, not classification, because the boundary between good and bad grasps is vague. Training data: labeled rectangles in images.

Inference. Search over all possible rectangles.

Search for the Highest-score Rectangle. Image: n x m; features: k per rectangle. Brute-force search: O(n^2 m^2) rectangles, O(nmk) to compute features, so O(n^3 m^3 k) for one orientation. To accelerate, compute features incrementally: O(n^2 m^2 k). Even faster? Can phi(G + delta-G) be computed directly from phi(G)?

Fast Search. Condition: features are independent at the pixel level, i.e. the score of a rectangle decomposes into the scores of its pixels. This is the classical maximum-sum submatrix problem. In one dimension: array 3, -4, 5, 2, -5, 5, 9, -8; running sums 3, 0, 5, 7, 2, 7, 16, 8 (maximum 16). In our problem, this reduces the time complexity to O(nmk + n^2 m).
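A sketch of this reduction: the 1-D case is Kadane's algorithm (the running sum resets to 0 whenever it goes negative, exactly the sequence on the slide), and the 2-D maximum-sum submatrix problem reduces to one 1-D scan per pair of top/bottom rows. Function names here are illustrative:

```python
def max_subarray(a):
    # Kadane's algorithm: the running sum resets to 0 whenever it goes
    # negative; the best value it reaches is the maximum contiguous sum.
    best = cur = 0
    for x in a:
        cur = max(cur + x, 0)
        best = max(best, cur)
    return best

def max_submatrix(m):
    # For every pair of top/bottom rows, collapse the rows in between into
    # a per-column sum array and run Kadane's algorithm on it.
    rows, cols = len(m), len(m[0])
    best = 0
    for top in range(rows):
        colsum = [0] * cols
        for bottom in range(top, rows):
            for j in range(cols):
                colsum[j] += m[bottom][j]
            best = max(best, max_subarray(colsum))
    return best

# The slide's 1-D example: running sums 3, 0, 5, 7, 2, 7, 16, 8
print(max_subarray([3, -4, 5, 2, -5, 5, 9, -8]))  # -> 16
```

With per-pixel scores precomputed in O(nmk), the 2-D scan above costs O(n^2 m), matching the slide's O(nmk + n^2 m) total.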

Histogram Features for Fast Search. Histograms from 15 filters capture color, texture and edges. Spatial histogram features: divide a rectangle into 3 sub-rectangles.
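A minimal sketch of such a spatial histogram, assuming a single-channel filter response and an even vertical split into 3 strips (the split direction, bin count, and value range are illustrative assumptions):

```python
import numpy as np

def spatial_histogram(patch, bins=8, vmax=255):
    # Split the rectangle into 3 vertical sub-rectangles and concatenate one
    # histogram of filter responses per strip, preserving coarse layout.
    strips = np.array_split(patch, 3, axis=1)
    hists = [np.histogram(s, bins=bins, range=(0, vmax))[0] for s in strips]
    return np.concatenate(hists)
```

Because each strip's histogram is a sum over its pixels, the feature stays pixel-decomposable and thus compatible with the fast search above.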

Advanced Features. Histograms are fast but cannot capture the correlations among the 3 sub-rectangles (with depths d1, d2, d3). E.g., one criterion: d1 > d2 and d2 < d3. Non-linear features, e.g. d = d1*d3/d2^2, are expressive but not applicable to the fast search.

Two-step Process. Two models: the first step is fast but not accurate (good for pruning); the second step is accurate but slow. Step 2 re-ranks: top 100 rectangles survive the 1st step, top 3 rectangles survive the 2nd step.
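The two-step process can be sketched as a generic prune-then-rerank loop; the scoring functions and cutoffs (100 and 3 from the slide) are parameters here:

```python
def two_step_grasp(rectangles, fast_score, slow_score, k1=100, k2=3):
    # Step 1: rank all candidates with the cheap histogram-feature model
    # and keep only the top k1 (pruning).
    survivors = sorted(rectangles, key=fast_score, reverse=True)[:k1]
    # Step 2: re-rank the survivors with the expensive non-linear-feature
    # model and return the top k2.
    return sorted(survivors, key=slow_score, reverse=True)[:k2]
```

The expensive model is evaluated only k1 times regardless of how many candidate rectangles the image contains, which is what makes the second model's cost tolerable.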

Summary. RGBD images. Representation: oriented rectangle. Learning with an efficient two-step process: fast search with histogram features, then re-ranking with more sophisticated features.

Experiments. Tested on novel objects. Offline: 128 images. Robot: 12 objects, multiple tries each.

Results on Offline Test. Evaluation 1: rectangle metric.

Results on Offline Test. Evaluation 2: point metric [Saxena 2008].

Robotic Experiments. Adept Viper s850 arm with a parallel-plate gripper.

Results on Robotic Experiments.

Universal Jamming Gripper: Robotic Experiment and Analysis.

After Grasp: Learning to Place. Challenges: enormous search space; placing under preference. An efficient learning approach identifies good placements. Goal: correct location and preferred orientation. Result on robotic experiments: 92% for new objects in new environments.

Thank you! Yun Jiang, Stephen Moseson and Ashutosh Saxena, "Efficient Grasping from RGBD Images: Learning Using a New Rectangle Representation," ICRA 2011. Yun Jiang, Changxi Zheng, Marcus Lim and Ashutosh Saxena, "Learning to Place New Objects," ICRA 2012; first appeared in the RSS Workshop on Mobile Manipulation, June 2011.

Video

Future Work

Advanced Features. Histograms are fast but cannot capture the correlations among the 3 sub-rectangles (with depths d1, d2, d3). E.g., one criterion: d1 > d2 and d2 < d3. Non-linear features: histogram of a non-linear feature d = d1*d3/d2^2.

Spatial Histogram for Fast Search. The time complexity is only multiplied by 3.