Target-driven Visual Navigation in Indoor Scenes Using Deep Reinforcement Learning [Zhu et al. 2017]


REINFORCEMENT LEARNING FOR ROBOTICS Target-driven Visual Navigation in Indoor Scenes Using Deep Reinforcement Learning [Zhu et al. 2017] A. James E. Cagalawan james.cagalawan@gmail.com University of Waterloo June 27, 2018 (Note: videos in the slide deck used in the presentation have been replaced with YouTube links.)

TARGET-DRIVEN VISUAL NAVIGATION IN INDOOR SCENES USING DEEP REINFORCEMENT LEARNING Paper Authors Yuke Zhu 1 Roozbeh Mottaghi 2 Eric Kolve 2 Joseph J. Lim 1 Abhinav Gupta 2,3 Li Fei-Fei 1 Ali Farhadi 2,4 1 Stanford University 2 Allen Institute for AI 3 Carnegie Mellon University 4 University of Washington

MOTIVATION Navigating the Grid World (Legend icons: free space, occupied space, agent, goal, danger.)

MOTIVATION Navigating the Grid World Assign Numerical Rewards Can assign numerical rewards to the goal and the dangerous grid cells: +1 for the goal, -100 for danger.

MOTIVATION Navigating the Grid World Learn a Policy Can assign numerical rewards to the goal and the dangerous grid cells, then learn a policy, as in the sketch below.
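To make this concrete, here is a minimal tabular Q-learning sketch for such a grid world. The grid layout, reward placements (+1 goal, -100 danger), and hyperparameters are illustrative assumptions, not values from the paper.

```python
# Minimal tabular Q-learning on a toy grid world (a sketch, not the
# paper's method). Layout, rewards, and hyperparameters are assumed.
import random

ROWS, COLS = 4, 4
GOAL, DANGER = (0, 3), (1, 3)                 # hypothetical cell placements
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Apply an action; bumping into a wall leaves the state unchanged."""
    r = min(max(state[0] + action[0], 0), ROWS - 1)
    c = min(max(state[1] + action[1], 0), COLS - 1)
    nxt = (r, c)
    if nxt == GOAL:
        return nxt, 1.0, True
    if nxt == DANGER:
        return nxt, -100.0, True
    return nxt, 0.0, False

Q = {((r, c), a): 0.0 for r in range(ROWS) for c in range(COLS)
     for a in range(len(ACTIONS))}
alpha, gamma, epsilon = 0.1, 0.95, 0.1

for episode in range(5000):
    state = (ROWS - 1, 0)                     # start in the bottom-left corner
    for t in range(100):                      # cap episode length
        # epsilon-greedy action selection
        if random.random() < epsilon:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q[(state, i)])
        nxt, reward, done = step(state, ACTIONS[a])
        best_next = max(Q[(nxt, i)] for i in range(len(ACTIONS)))
        # one-step Q-learning update
        Q[(state, a)] += alpha * (reward + gamma * best_next - Q[(state, a)])
        state = nxt
        if done:
            break

# The greedy policy over Q now routes the agent to the goal while
# steering it away from the danger cell.
```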

MOTIVATION Visual Navigation Applications: Treasure Hunting Robotic visual navigation has many applications. Treasure hunting.

MOTIVATION Visual Navigation Applications: Robot Soccer Robotic visual navigation has many applications. Treasure hunting. Soccer robots getting to the ball first.

MOTIVATION Visual Navigation Applications: Pizza Delivery Robotic visual navigation has many applications. Treasure hunting. Soccer robots getting to the ball first. Drones delivering pizzas to your lecture hall.

MOTIVATION Visual Navigation Applications: Self-Driving Cars Robotic visual navigation has many applications. Treasure hunting. Soccer robots getting to the ball first. Drones delivering pizzas to your lecture hall. Autonomous cars driving people to their homes.

MOTIVATION Visual Navigation Applications: Search and Rescue Robotic visual navigation has many applications. Treasure hunting. Soccer robots getting to the ball first. Drones delivering pizzas to your lecture hall. Autonomous cars driving people to their homes. Search and rescue robots finding missing people.

MOTIVATION Visual Navigation Applications: Domestic Robots Robotic visual navigation has many applications. Treasure hunting. Soccer robots getting to the ball first. Drones delivering pizzas to your lecture hall. Autonomous cars driving people to their homes. Search and rescue robots finding missing people. Domestic robots navigating their way around houses to help with chores.

PROBLEM Domestic Robot: Setup There are multiple locations in an indoor scene that our robot must navigate to. Actions consist of moving forwards and backwards and turning left and right.

PROBLEM Domestic Robot: Problems There are multiple locations in an indoor scene that our robot must navigate to. Actions consist of moving forwards and backwards and turning left and right. Problem 1: Navigating to multiple targets. Problem 2: Using high-dimensional visual inputs is challenging and time-consuming to train on. Problem 3: Training on a real robot is expensive.

PROBLEM Multi Target: Can't We Just Use Q-Learning? We can already navigate grid mazes using Q-learning by assigning rewards for finding a target. Assigning rewards to multiple locations on the grid does not allow specification of different targets.

PROBLEM Multi Target: Can't We Just Use Q-Learning? Let's Try It We can already navigate grid mazes using Q-learning by assigning rewards for finding a target. Assigning rewards to multiple locations on the grid does not allow specification of different targets. +1 +1 +1

PROBLEM Multi Target: Can't We Just Use Q-Learning? Uh-oh We can already navigate grid mazes using Q-learning by assigning rewards for finding a target. Assigning rewards to multiple locations on the grid does not allow specification of different targets. Would end up at a target, but not any specific target. +1 +1 +1

PROBLEM Multi Target: A Policy for Every Target We can already navigate grid mazes using Q-learning by assigning rewards for finding a target. Assigning rewards to multiple locations on the grid does not allow specification of different targets. Would end up at a target, but not any specific target. Could train multiple policies, but that wouldn't scale with the number of targets.
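The way out, which the paper builds on, is to make the target part of the input rather than part of the reward definition. A toy way to see this in the tabular setting is to index the value function by the target as well, so one learner serves all targets. The names below are illustrative only.

```python
# Toy illustration: condition the value function on the target,
# Q(state, target, action), so a single learner covers all targets.
# This is the tabular analogue of feeding the target image into the
# network as a second input, as the paper's architecture does.
from collections import defaultdict

Q = defaultdict(float)  # keys: (state, target, action)

def greedy_action(state, target, actions):
    # The best action now depends on which target we were asked to reach.
    return max(actions, key=lambda a: Q[(state, target, a)])
```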

PROBLEM From Navigation to Visual Navigation (Diagram: a sensor image and a target image go in; a step-or-turn action comes out.)

PROBLEM Visual Navigation Decomposition: Overview The visual navigation problem can be broken up into pieces with specialized algorithms solving each piece. Image → Perception → World Modelling → Planning → Action (Autonomous Mobile Robots by Roland Siegwart and Illah R. Nourbakhsh, 2011)

PROBLEM Visual Navigation Decomposition: Image Image → Perception → World Modelling → Planning → Action

PROBLEM Visual Navigation Decomposition: Perception (Localization and Mapping) Image → Perception → World Modelling → Planning → Action

PROBLEM Visual Navigation Decomposition: Perception (Object Detection) Image → Perception → World Modelling → Planning → Action

PROBLEM Visual Navigation Decomposition: World Modelling Image → Perception → World Modelling → Planning → Action

PROBLEM Visual Navigation Decomposition: Planning (Choosing the End Position) Image → Perception → World Modelling → Planning → Action

PROBLEM Visual Navigation Decomposition: Planning (Searching for a Path) Image → Perception → World Modelling → Planning → Action

PROBLEM Visual Navigation Decomposition: Action Image → Perception → World Modelling → Planning → Action

PROBLEM Visual Navigation Decomposition: Result This decomposition is effective, but each step requires a different algorithm. Image → Perception → World Modelling → Planning → Action

PROBLEM Visual Navigation with Reinforcement Learning Design a deep reinforcement learning architecture to handle visual navigation from raw pixels. Image → Reinforcement Learning → Action

PROBLEM Robot Learning: Reinforcement Learning with a Robot Image → Reinforcement Learning → Action

PROBLEM Robot Learning: Data Efficiency and Transfer Learning Idea: Train in simulation first, then fine-tune the learned policy on a real robot. Image → Reinforcement Learning → Action

PROBLEM Robot Learning: Goal Design a deep reinforcement learning architecture to handle visual navigation from raw pixels with high data efficiency. Image → Reinforcement Learning → Action

IMPLEMENTATION DETAILS Architecture Overview Similar to the actor-critic A3C method, which outputs a policy and a value while running multiple threads. Each thread trains on a different target rather than on a copy of the same target. A fixed ResNet-50 pretrained on ImageNet generates embeddings for the observation and the target. The embeddings are fused into a joint feature vector from which an action and a value are produced.

IMPLEMENTATION DETAILS AI2-THOR Simulation Environment The House Of inteRactions (THOR). 32 scenes of household environments. Can freely move around a 3D environment. High fidelity compared to other simulators (comparison images: SYNTHIA, Virtual KITTI, AI2-THOR). Video: https://youtu.be/smbxmdiorvs?t=18

IMPLEMENTATION DETAILS Handling Multiple Scenes A single layer shared across all scenes might not generalize as well, so a different set of scene-specific weights is trained for every scene in a vine-like model. Scenes are discretized with a constant step length of 0.5 m and a turning angle of 90°, effectively a grid, to simplify training and evaluation. A sketch of this architecture follows.
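Here is a minimal PyTorch-style sketch of the architecture described in the last two slides: frozen ResNet-50 features for the observation and the target pass through a shared (siamese) projection, are fused, and feed scene-specific actor-critic heads. Layer sizes, the number of scenes, and all names are illustrative assumptions, not the authors' exact configuration.

```python
# A sketch of a target-driven siamese actor-critic network with
# scene-specific output layers. Dimensions and names are assumed.
import torch
import torch.nn as nn

class TargetDrivenNav(nn.Module):
    def __init__(self, num_scenes=20, feat_dim=2048, embed_dim=512,
                 num_actions=4):
        super().__init__()
        # Shared (siamese) projection applied to both image streams.
        self.project = nn.Sequential(nn.Linear(feat_dim, embed_dim),
                                     nn.ReLU())
        # Fusion layer over the concatenated embeddings.
        self.fuse = nn.Sequential(nn.Linear(2 * embed_dim, embed_dim),
                                  nn.ReLU())
        # One actor-critic head per scene ("scene-specific layers").
        self.policy_heads = nn.ModuleList(
            [nn.Linear(embed_dim, num_actions) for _ in range(num_scenes)])
        self.value_heads = nn.ModuleList(
            [nn.Linear(embed_dim, 1) for _ in range(num_scenes)])

    def forward(self, obs_feat, target_feat, scene_id):
        # obs_feat / target_feat: precomputed features from a frozen
        # ImageNet-pretrained ResNet-50 (2048-d for its pooled output).
        joint = self.fuse(torch.cat([self.project(obs_feat),
                                     self.project(target_feat)], dim=-1))
        logits = self.policy_heads[scene_id](joint)  # action distribution
        value = self.value_heads[scene_id](joint)    # state-value estimate
        return logits, value
```

In A3C-style training, each worker thread would roll out episodes for one (scene, target) pair and push gradients into this shared model; for a given step, only the head matching scene_id receives gradients, while the siamese and fusion layers are shared across all scenes.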

EXPERIMENTS Navigation Results: Performance of Our Method Against Baselines (Table I of the paper shown.)

EXPERIMENTS Navigation Results: Data Efficiency Target-driven Method is More Data Efficient

EXPERIMENTS Navigation Results: t-SNE Embedding It Learned an Implicit Map of the Environment (Figure: bird's-eye view alongside a t-SNE visualization of the embedding vectors.)

EXPERIMENTS Generalization Across Targets: The Method Generalizes Across Targets

EXPERIMENTS Generalization Across Scenes: Training On More Scenes Reduces Trajectory Length Over Training Frames

EXPERIMENTS Method Works in Continuous World The method was evaluated with continuous action spaces. The robot could now collide with things, and its movement had noise, no longer always aligning with a grid. Result: required 50M more training frames to train on a single target. Could reach the door in an average of 15 steps, while a random agent took 719 steps. Video: https://youtu.be/smbxmdiorvs?t=169

EXPERIMENTS Robot Experiment: Video Video: https://youtu.be/smbxmdiorvs?t=180

EXPERIMENTS Summary of Contributions and Results Designed a new deep reinforcement learning architecture using a siamese network with scene-specific layers. It generalizes to multiple scenes and targets. Works with continuous actions. The policy trained in simulation can be run in the real world. Video: https://youtu.be/smbxmdiorvs?t=98

CONCLUSIONS Future Work Physical interaction and object manipulation. Situation: Moving obstacles out of the path. Situation: Opening containers for objects. Situation: Turning on the lights when the robot enters a dark room and can't see.

CREDIT Thank You Thanks to Twemoji from Twitter, used under the CC-BY 4.0 license. The school image was changed to carry the University of Waterloo logo. The quadcopter was created from the Helicopter and Robot Face images, the goose from the Duck image, the red robot from the Robot image, and the dumbbell from the Nuts and Bolt image.

RECAP
Motivation: Navigating the Grid World, Assign Numerical Rewards, Extract a Policy. Applications: Treasure Hunting, Robot Soccer, Pizza Delivery, Self-Driving Cars, Search and Rescue, Domestic Robots.
Problem: Domestic Robot. Multi Target: Can't We Just Use Q-Learning? From Navigation to Visual Navigation. Visual Navigation Decomposition: Image, Perception, World Modelling, Planning, Action, Results. Goal of Visual Navigation. Robot Learning Considerations.
Neural Network Architecture: Embedding of Scene and Target using ResNet, Siamese Network, Scene-Specific Layers.
AI2-THOR Simulator: 32 household scenes with interactions; trained on a discretized representation with no interaction required.
Experiments: Shorter Average Trajectory Length, More Data Efficient, Training on More Targets Improves Success, Training on More Scenes Reduces Trajectory Length, Works in Continuous World, Transfer Learning Works.
Future Work: More Sophisticated Interaction.