Team the Amazon Robotics Challenge 1st place in stowing task



1 Grasping

2 Team the Amazon Robotics Challenge 1st place in stowing task
Andy Zeng, Shuran Song, Kuan-Ting Yu, Elliott Donlon, Francois Hogan, Maria Bauza, Daolin Ma, Orion Taylor, Melody Liu, Eudald Romo, Nima Fazeli, Ferran Alet, Nikhil Dafle, Rachel Holladay, Isabella Morona, Prem Qu Nair, Druck Green, Ian Taylor, Weber Liu, Thomas Funkhouser, Alberto Rodriguez

3-4 From model-based to model-free
Model-based grasping (pose estimation, then grasp planning) works well with known objects in structured environments, but can't handle novel objects in unstructured environments because it depends on pose estimation.
Model-free grasping (visual data, then grasp planning) uses local geometric features, ignores object identity, and is end-to-end. The shift is motivated by industry.

5-7 Recent work on model-free grasping
Grasp Pose Detection (M. Gualtieri et al., '17), Supersizing Self-Supervision (L. Pinto and A. Gupta, '16), and DexNet 1.0-3.0 (J. Mahler et al., '17).
These methods handle clutter and novel objects, but in tabletop scenarios and with objects selected beforehand.
Common limitations: low grasp sample density and small neural network sizes.

8-12 In this talk
A model-free grasping method with four properties.
It handles dense clutter in a tabletop bin/box scenario. We rethink dense clutter: objects are not only tightly packed, but also tossed and stacked on top of each other, and they sit in corners and on bin edges.
It works for novel objects of all kinds (i.e., any household object should be fair game). 90-95% grasping accuracy is not enough. Objects without depth data...
It is fast and efficient. The standard approach samples grasps; ours ranks grasps using dense pixel-wise predictions.
It took 1st place in the stowing task at the Amazon Robotics Challenge '17 (i.e., it works). The Beast from the East: setup and competition footage.

13-14 Overview: multi-affordance grasping
Input: multi-view RGB-D images.
Output: dense grasp proposals and affordance scores for 4 primitive grasping behaviors: suction down, suction side, grasp down, and flush grasp.
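The slides do not specify how the four affordance maps become one action. A minimal sketch, assuming the robot simply executes the single highest-scoring pixel across all four maps; the map keys, the tiny 2 x 2 grids, and the plain argmax rule are illustrative assumptions, not the team's actual selection policy:

```python
from typing import Dict, List, Tuple

# Hypothetical container: one H x W grid of affordance scores per primitive behavior.
AffordanceMaps = Dict[str, List[List[float]]]


def best_proposal(maps: AffordanceMaps) -> Tuple[str, int, int, float]:
    """Return (primitive, row, col, score) for the highest-scoring pixel over all maps.

    Ties keep the first candidate encountered (dict insertion order, then row-major).
    """
    best = ("", -1, -1, float("-inf"))
    for primitive, grid in maps.items():
        for r, row in enumerate(grid):
            for c, score in enumerate(row):
                if score > best[3]:
                    best = (primitive, r, c, score)
    return best


# Toy 2 x 2 maps for the four primitives named on the slide (values are made up).
maps = {
    "suction_down": [[0.1, 0.7], [0.2, 0.3]],
    "suction_side": [[0.0, 0.1], [0.4, 0.2]],
    "grasp_down": [[0.5, 0.2], [0.9, 0.1]],
    "flush_grasp": [[0.0, 0.0], [0.1, 0.0]],
}
print(best_proposal(maps))  # ('grasp_down', 1, 0, 0.9)
```

Because every pixel of every map is scored, the same loop could instead return a full ranked list, so a failed attempt can fall back to the next proposal.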

15-21 Dense pixel-wise affordances with FCNs
Input RGB-D images are passed through a fully convolutional ResNet-50, which outputs dense pixel-wise affordances for suction down and suction side.
What about grasping? The RGB-D images are converted into RGB-D heightmaps, from which the FCN outputs affordances for grasp down and flush grasp; it predicts horizontal grasp affordances.
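The slides convert the RGB-D images into RGB-D heightmaps before predicting grasp affordances. A minimal sketch of the height channel only, assuming the multi-view point clouds have already been fused into one list of (x, y, z) points in the bin frame; the workspace bounds, cell size, and helper names are hypothetical. Since the network scores horizontal grasps, other grasp angles can be covered by rotating the heightmap; only 90-degree rotations are shown here to keep the example dependency-free, which is a simplification.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the bin frame, metres (assumed)
Grid = List[List[float]]


def heightmap(points: Sequence[Point], x_range: Tuple[float, float],
              y_range: Tuple[float, float], cell: float) -> Grid:
    """Top-down orthographic projection: each cell keeps the highest z that falls inside it."""
    cols = int(round((x_range[1] - x_range[0]) / cell))
    rows = int(round((y_range[1] - y_range[0]) / cell))
    # Empty cells stay at floor height 0.0; points below the floor are clipped to 0.0.
    grid = [[0.0] * cols for _ in range(rows)]
    for x, y, z in points:
        c = int((x - x_range[0]) // cell)
        r = int((y - y_range[0]) // cell)
        if 0 <= r < rows and 0 <= c < cols:  # drop points outside the workspace
            grid[r][c] = max(grid[r][c], z)
    return grid


def rot90(grid: Grid) -> Grid:
    """Rotate a grid 90 degrees counter-clockwise (same convention as numpy.rot90)."""
    return [list(row) for row in zip(*grid)][::-1]


# Fusing views amounts to concatenating their point lists before projection.
view_a = [(0.05, 0.05, 0.10), (0.25, 0.15, 0.03)]
view_b = [(0.06, 0.04, 0.12)]
hm = heightmap(view_a + view_b, x_range=(0.0, 0.3), y_range=(0.0, 0.2), cell=0.1)
print(hm)  # [[0.12, 0.0, 0.0], [0.0, 0.0, 0.03]]

rotations = [hm]
for _ in range(3):
    rotations.append(rot90(rotations[-1]))  # 0, 90, 180, 270 degree inputs for the grasp FCN
```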

22 Training data
Manual labeling covering ~100 different household/office objects: suctionable areas and parallel-jaw grasps.
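The slides say only that suctionable areas and parallel-jaw grasps were labeled by hand. One common way to turn such labels into a training signal for a dense predictor is a per-pixel binary cross-entropy. The sketch below assumes that loss and assumes unlabeled pixels (marked None) are ignored; neither choice is stated on the slides, and the toy grids are made up.

```python
import math
from typing import List, Optional

Grid = List[List[float]]
Labels = List[List[Optional[int]]]  # 1 = good (e.g. suctionable), 0 = bad, None = unlabeled


def pixelwise_bce(pred: Grid, labels: Labels, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over labeled pixels; unlabeled pixels are skipped."""
    total, count = 0.0, 0
    for pred_row, label_row in zip(pred, labels):
        for p, y in zip(pred_row, label_row):
            if y is None:
                continue
            p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count if count else 0.0


pred = [[0.9, 0.2], [0.5, 0.7]]  # network's per-pixel affordance scores
labels = [[1, 0], [None, 1]]     # hand labels for the same pixels
print(round(pixelwise_bce(pred, labels), 4))  # 0.2284
```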

23 Generalization from hardware capabilities
High-powered deployable suction and an actuated spatula.

24-27 Pros and cons
Advantages: fast runtime speeds from efficient convolution; uses both color and depth information; can leverage large pre-trained networks; higher good-grasp recall (standard: grasp sampling; ours: grasp ranking).
Limitations: considers only top-down parallel-jaw grasps (though it can trivially extend to more grasp angles); limited to grasping behaviors for which you can define affordances (no real planning); open-loop.
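The "higher good-grasp recall" point contrasts scoring a limited set of sampled grasp candidates with ranking every pixel. A toy sketch of why, assuming a fixed score per candidate and treating the top-scoring candidates as the "good" grasps; the candidate count, sample budget, and scores are made up for illustration:

```python
import random
from typing import List, Sequence


def rank_all(scores: Sequence[float], k: int) -> List[int]:
    """Dense ranking: every candidate is scored, so the global top-k is always found."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def sample_then_rank(scores: Sequence[float], k: int, budget: int,
                     rng: random.Random) -> List[int]:
    """Sampling: only `budget` random candidates are evaluated, then the best k of those kept."""
    sampled = rng.sample(range(len(scores)), budget)
    return sorted(sampled, key=lambda i: scores[i], reverse=True)[:k]


def recall(selected: Sequence[int], good: Sequence[int]) -> float:
    """Fraction of the good grasps that appear among the selected candidates."""
    return len(set(selected) & set(good)) / len(good)


rng = random.Random(0)
scores = [rng.uniform(0.0, 0.5) for _ in range(1000)]  # 1000 candidate pixels, mostly poor
good = [10, 200, 450, 700, 999]                          # a few clearly good grasps
for i in good:
    scores[i] = 0.9

dense = recall(rank_all(scores, k=5), good)
sampled = recall(sample_then_rank(scores, k=5, budget=50, rng=rng), good)
print(dense, sampled)  # dense is 1.0; sampled depends on which 50 candidates were drawn
```

The gap shrinks as the sample budget grows, whereas dense ranking covers every pixel directly from the FCN output.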

28-30 Future work
Model-based grasping: pose estimation, then grasp planning. Model-free grasping: visual data, then grasp planning.
How can we improve model-free grasping by making it more like model-based? For example: Semantic Scene Completion from a Single Depth Image [Song et al., CVPR '17].

31-32 Takeaways
A model-free grasping method that uses FCNs to compute dense affordance predictions for multiple grasping behaviors (suction, parallel-jaw).
Multiple grasping primitive behaviors handle dense clutter in the bin/box scenario.
Multi-view color and depth, diverse training data, and robust hardware handle novel objects of all kinds.
Using FCNs for grasping affordance predictions gives efficiency and high grasp recall.
Paper and code are available: arc.cs.princeton.edu

33 Recognition of novel objects without retraining
Match real images of novel objects to their product images (available at test time). After isolating an object from clutter with model-free grasping, perform recognition.

34-35 Cross-domain image matching (training)
Product images and observed images are mapped into a feature embedding trained with an ℓ2 distance ratio loss that decides whether a pair matches. A softmax loss is added for K-Net only.
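The slides name an ℓ2 distance ratio loss but not its form. A minimal sketch, assuming the softmax-ratio triplet formulation: the distances from an observed-image embedding (anchor) to a matching product image (positive) and to a non-matching one (negative) are normalized with a softmax, and the loss pushes the ratios toward (0, 1). This formulation, the 2-D toy embeddings, and the function name are assumptions; the additional softmax classification loss used for K-Net is omitted.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def distance_ratio_loss(anchor: Vec, positive: Vec, negative: Vec) -> float:
    """Softmax-ratio triplet loss on l2 distances (assumed form, see lead-in)."""
    d_pos = math.dist(anchor, positive)  # observed image vs. matching product image
    d_neg = math.dist(anchor, negative)  # observed image vs. non-matching product image
    m = max(d_pos, d_neg)                # subtract the max before exp() for stability
    e_pos, e_neg = math.exp(d_pos - m), math.exp(d_neg - m)
    r_pos = e_pos / (e_pos + e_neg)
    r_neg = e_neg / (e_pos + e_neg)
    # Target ratios are (0, 1): the match should be much closer than the non-match.
    return r_pos ** 2 + (r_neg - 1.0) ** 2


observed = (0.0, 0.0)
product_same = (0.1, 0.0)   # product image of the same object
product_other = (3.0, 4.0)  # product image of a different object
print(distance_ratio_loss(observed, product_same, product_other))  # close to 0
print(distance_ratio_loss(observed, product_other, product_same))  # close to 2 (wrong order)
```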

36-39 Cross-domain image matching (testing)
An input image is mapped into the feature embedding alongside the product images of known and novel objects, and is matched to the closest product image ("match!"). Pre-trained ImageNet features.
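At test time the slides match an input image to known and novel objects in the learned embedding. A minimal sketch, assuming plain nearest-neighbor search under ℓ2 distance; the product names and 2-D embeddings are hypothetical stand-ins for the network's outputs. Adding a novel object only means adding its product-image embedding to the dictionary, which is what allows recognition without retraining.

```python
import math
from typing import Dict, Sequence, Tuple

Vec = Sequence[float]


def match(query: Vec, products: Dict[str, Vec]) -> Tuple[str, float]:
    """Return (product name, l2 distance) of the product embedding nearest to the query."""
    return min(((name, math.dist(query, emb)) for name, emb in products.items()),
               key=lambda item: item[1])


products = {
    "known/tape": (0.9, 0.1),
    "known/sponge": (0.1, 0.9),
}
products["novel/mug"] = (0.5, 0.5)  # novel object: product image embedded at test time

observed = (0.45, 0.55)  # embedding of the grasped object's observed image
name, dist = match(observed, products)
print(name)  # novel/mug
```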

arXiv:1710.01330v3 [cs.RO] 20 Feb 2018
Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching
Andy Zeng¹, Shuran Song¹, Kuan-Ting Yu², Elliott