Lighting- and Occlusion-robust View-based Teaching/Playback for Model-free Robot Programming
*Yusuke MAEDA (Yokohama National University), Yoshito SAITO (Ricoh Corp.)
Background
- Conventional teaching/playback is still widely used.
- It is model-free: neither task-specific nor object-specific models are necessary.
- However, it assumes constant task conditions (e.g., the initial pose of the object does not change).
When the Initial Object Pose Is Not Constant
- Objects must be localized with cameras, which requires model-based image processing:
  - geometric feature extraction (edges, vertices, etc.)
  - pattern matching
- Such processing is object-specific, so model-freeness is lost.
Motivation
- To develop a model-free robot programming method that can cope with changes in task conditions.
- View-based teaching/playback: robot programming with view-based image processing [Maeda 2011 ICRA].
Model-based vs. View-based
- Model-based approach: with object-specific models; accurate, but cumbersome.
- View-based (appearance-based) approach: without object-specific models; versatile.
View-based Teaching/Playback [Maeda 2011 ICRA]
- View-based image processing using PCA (a sketch follows below):
  - not object-specific
  - no need for camera calibration
- Adaptability to changes of the initial object pose:
  - uses the generalization ability of neural networks
  - generalizes from multiple demonstrations
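To make the PCA step concrete, here is a minimal Python sketch, assuming demonstration frames are stored as flattened grayscale arrays; the component count and all identifiers are illustrative, not taken from the original system.

```python
# Minimal sketch of the view-based image compression step: PCA maps raw
# pixel data to low-dimensional factor scores FS(t). The component count
# (20) and all identifiers are illustrative assumptions.
from sklearn.decomposition import PCA

def fit_view_pca(frames, n_components=20):
    """frames: (n_frames, height * width) array of flattened demonstration images."""
    pca = PCA(n_components=n_components)
    factor_scores = pca.fit_transform(frames)  # FS(t) for every recorded frame
    return pca, factor_scores

def factor_score(pca, image):
    """Project one new scene image onto the learned subspace during playback."""
    return pca.transform(image.reshape(1, -1))[0]
```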
Related Work
- [Zhang et al. 2000]: view-based fine positioning for grippers
- [Zhao et al. 2008]: view-based visual servoing for relative positioning
- [Levine et al. 2016]: view-based grasping of novel objects through massive learning
Overview of View-based Teaching/Playback (1/3): Human Demonstration
- Record all the scene images and the corresponding robot motions.
Overview of View-based Teaching/Playback (2/3): Mapping Acquisition
- Acquire a mapping from scene images to robot motions.
Overview of View-based Teaching/Playback (3/3): View-based Playback
- Execute the task autonomously using the acquired image-to-motion mapping.
Neural Network for Mapping
- Raw pixel data (numerous pixels) is compressed by PCA into factor scores FS(t).
- Input layer: FS(t), hand configuration Config(t), and previous motion Move(t-δt).
- Output layer (via a hidden layer): robot motion Move(t).
- (See the sketch below.)
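A minimal sketch of such a mapping network, using a generic multilayer perceptron; the hidden-layer size and all identifiers are assumptions for illustration, not the exact network of [Maeda 2011 ICRA].

```python
# Sketch of the image-to-motion mapping network. Inputs: factor scores FS(t),
# hand configuration Config(t), previous motion Move(t-δt); output: robot
# motion Move(t). The hidden-layer size is an assumption.
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_motion_net(fs, config, prev_move, move):
    """Each argument is an (n_samples, dim) array recorded during demonstration."""
    X = np.hstack([fs, config, prev_move])
    net = MLPRegressor(hidden_layer_sizes=(30,), max_iter=5000)
    net.fit(X, move)
    return net

def next_motion(net, fs_t, config_t, move_prev):
    """Predict Move(t) for one playback step (1-D inputs)."""
    x = np.concatenate([fs_t, config_t, move_prev]).reshape(1, -1)
    return net.predict(x)[0]
```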
Advantage of View-based Teaching/Playback
- Can cope with changes of task conditions (to some extent):
  - uses the generalization ability of the neural network
  - model-free
- Multiple demonstrations enable interpolative playback (e.g., playback trajectories between those of Teach 1 and Teach 2).
Objective
- The original view-based teaching/playback [Maeda 2011 ICRA] is:
  - sensitive to lighting conditions
  - sensitive to occlusions
- Goal: make view-based teaching/playback lighting- and occlusion-robust through:
  - the use of range images
  - the use of subimages
Experimental Setup
- Target task: pushing an object on a plane.
- A Kinect captures both grayscale and range images.
- [Video: teaching demonstrations (x2)]
Change of Lighting Condition (playback with grayscale images)
- [Videos: teaching (x2) and playback (x2)]
Change of Lighting Condition (playback with range images)
- [Videos: teaching (x2) and playback (x2)]
Position Errors at Goal for Each Initial Position
- [Two workspace scatter plots (x: 260-330 mm, y: 5-125 mm), one for grayscale images and one for range images; markers show the demonstrations, the goal, and error bands: < 20 mm, < 30 mm, < 40 mm, >= 40 mm.]
- Larger errors were found for range images due to their noise.
Grayscale Images vs. Range Images
- Grayscale images: less noisy, but less robust to lighting changes.
- Range images: noisy, but robust to lighting changes.
- Adaptive online switching of the images used achieves both accuracy and robustness in view-based teaching/playback.
Switching Grayscale/Range Images (1/2)
- Neural network for robot motion: input layer takes FS(t), Config(t), and Move(t-δt); the output layer gives the relative motion Move(t).
- Additional neural network for inconsistency detection: input layer takes FS(t) and Config(t); the output layer gives the absolute hand configuration Config(t+δt).
- Are the two resulting predictions of Config(t+δt) consistent?
Switching Grayscale/Range Images (2/2)
- Compare Config(t+δt) obtained via the NN for robot motion with Config(t+δt) predicted by the NN for inconsistency detection.
- If the error is below a threshold T_A, the grayscale image is usable; otherwise, an unexpected situation is detected and range images are used (see the sketch below).
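The consistency test can be expressed roughly as in the sketch below, assuming Move(t) is a relative motion added to the current configuration and reusing the hypothetical next_motion helper from the earlier sketch; T_A is a task-dependent threshold.

```python
import numpy as np

# Sketch of the consistency test between the two networks. Both predict the
# next hand configuration Config(t+δt): one by integrating the commanded
# relative motion, the other directly from the image (inconsistency-detection
# net). A disagreement above T_A flags an unexpected situation, e.g., a
# lighting change, and triggers the switch to range images.
def grayscale_usable(motion_net, incons_net, fs_gray, config, move_prev, t_a):
    move = next_motion(motion_net, fs_gray, config, move_prev)
    config_from_motion = config + move  # assumes Move(t) is a relative motion
    x = np.concatenate([fs_gray, config]).reshape(1, -1)
    config_from_image = incons_net.predict(x)[0]
    return np.linalg.norm(config_from_motion - config_from_image) < t_a
```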
Lighting-robust View-based Playback (light added)
- [Videos: playback (x2) and the images used (x2)]
- Range images were used automatically due to the change of lighting condition.
Lighting-robust View-based Playback (light on/off)
- [Videos: playback (x2) and the images used (x2)]
- The system successfully adapted to the light being turned on and off by switching neural networks.
Making View-based Teaching/Playback Occlusion-robust
- Neural networks are trained not only for full images but also for subimages to overcome partial occlusions (a sketch of per-region training follows below).
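One plausible way to prepare per-region training data, sketched under the assumption of a fixed grid of subimages; the grid size and helper names are illustrative.

```python
# Sketch: split each frame into a fixed grid of subimages; a separate
# (PCA, motion net, inconsistency net) set is then trained per region,
# exactly as for the full image. The 2x2 grid is an illustrative choice.
def split_into_subimages(frame, rows=2, cols=2):
    """frame: (height, width) array -> list of flattened subimages."""
    h, w = frame.shape
    return [frame[i * h // rows:(i + 1) * h // rows,
                  j * w // cols:(j + 1) * w // cols].ravel()
            for i in range(rows) for j in range(cols)]
```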
Switching Full/Sub Images
- At each control step: if no inconsistency is found for the full image, use it; otherwise, use the subimage with minimum inconsistency (see the sketch below).
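A sketch of that per-step selection, reusing the hypothetical helpers from the earlier sketches; each entry of sub_nets pairs a motion network with an inconsistency-detection network for one image region.

```python
import numpy as np

def inconsistency(motion_net, incons_net, fs, config, move_prev):
    """Disagreement between motion-integrated and image-predicted Config(t+δt)."""
    move = next_motion(motion_net, fs, config, move_prev)
    x = np.concatenate([fs, config]).reshape(1, -1)
    return np.linalg.norm((config + move) - incons_net.predict(x)[0])

def select_region(full_nets, sub_nets, fs_full, fs_subs, config, move_prev, t_a):
    """Prefer the full image; otherwise pick the least-inconsistent subimage."""
    if inconsistency(*full_nets, fs_full, config, move_prev) < t_a:
        return full_nets, fs_full
    errors = [inconsistency(*nets, fs, config, move_prev)
              for nets, fs in zip(sub_nets, fs_subs)]
    best = int(np.argmin(errors))
    return sub_nets[best], fs_subs[best]
```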
Occlusion-robust View-based Playback
- Uses subimage-based neural networks.
- [Videos: playback (x2) and the images used (x2)]
Switching among Grayscale Full Image, Range Full Image, and Range Subimages
- At each control step: if no inconsistency is found for the grayscale full image, use it; otherwise check the range full image; if that is also inconsistent, fall back to the range subimages (see the sketch below).
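The combined cascade might look like the following sketch; the final fall-back to the least-inconsistent range subimage is an assumption consistent with the previous slides, and all helpers are the hypothetical ones defined above.

```python
import numpy as np

def choose_networks(gray_nets, range_nets, range_sub_nets,
                    fs_gray, fs_range, fs_range_subs,
                    config, move_prev, t_a):
    """Cascade: grayscale full image -> range full image -> range subimages."""
    if inconsistency(*gray_nets, fs_gray, config, move_prev) < t_a:
        return gray_nets, fs_gray    # most accurate source
    if inconsistency(*range_nets, fs_range, config, move_prev) < t_a:
        return range_nets, fs_range  # lighting-robust fallback
    errors = [inconsistency(*nets, fs, config, move_prev)  # occlusion-robust fallback
              for nets, fs in zip(range_sub_nets, fs_range_subs)]
    best = int(np.argmin(errors))
    return range_sub_nets[best], fs_range_subs[best]
```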
Lighting- and Occlusion-robust View-based Playback
- [Videos: playback (x2) and the images used (x2)]
Conclusion
- Lighting-robust view-based teaching/playback:
  - use of range images
  - switching between range and grayscale images for better accuracy
- Occlusion-robust view-based teaching/playback:
  - use of subimages
Future Work
- Application to various robotic tasks that require higher DOF.
- Incorporation of other sensor modalities, e.g., force information [Nakagawa et al. 2016 IAS-14].