Learning Motor Behaviors: Past & Present Work


Learning Motor Behaviors: Past & Present Work
Stefan Schaal
Computer Science & Neuroscience, University of Southern California, Los Angeles
& ATR Computational Neuroscience Laboratory, Kyoto, Japan
sschaal@usc.edu, http://www-clmc.usc.edu

Joint Work With: Auke Ijspeert, Aaron D'Souza, Jun Nakanishi, Jan Peters, Michael Mistry, Dimitris Pongas

How are Motor Skills Generated? A Question Shared by Biological and Robotics Research Movies from collaborations with C. Atkeson, S. Kotosaka, S. Vijayakumar

How are Motor Skills Generated? A Question Shared by Biological and Robotics Research. Unfortunately, each of these skills required manual generation of representations, control policies, and learning mechanisms. Movies from collaborations with C. Atkeson, S. Kotosaka, S. Vijayakumar

What Motor Behaviors Exist? (roughly in order of increasing difficulty)
Tracking tasks, e.g., tracing a figure-8 on a piece of paper
Regulator tasks, e.g., balance control (pole balancing, biped balancing, helicopter hover)
Discrete tasks, e.g., reach for a cup, tennis forehand, basketball shot
Periodic tasks, e.g., legged locomotion, swimming, dancing
Complex sequences and superpositions of the above, e.g., assembly tasks, emptying the dishwasher, playing tennis, almost every daily-life behavior

Learning Motor Behaviors: Control Policies. The general goal of motor learning: learn a control policy u(t) = p(x(t), t, α)

How Are Control Policies Used in Robotics? Direct Control (Model-Free) vs. Indirect Control (Model-Based)

Approaches to Learning Motor Behaviors in Robotics (Past to Present)
Supervised Learning: direct inverse model learning, forward model learning, distal teacher, feedback error learning
Reinforcement Learning: value function-based approaches, policy gradients
Motor Primitives: schemas, basis behaviors, units of action, macros, options, parameterized policies
Imitation Learning: learning a policy from observation, learning the task goal from observation (inverse RL), learning an initial strategy for self-improvement

Supervised Learning of Motor Behaviors
Given: a parameterized policy, a task goal, a measure of (signed) error. Usually applied to discrete tasks.
Goal: learn a task-level controller that produces the right motor command for the given goal from all initial conditions.

Supervised Learning of Motor Behaviors
Approaches: learning task models; direct inverse learning; forward model learning & search; the distal teacher (Jordan & Rumelhart); feedback error learning (Kawato).
[Block diagram: the desired state x_desired drives an inverse model producing the feedforward command u_ff, which is summed with the feedback command u_fb from a feedback controller and sent to the robot, producing output y.]
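To make the feedback-error-learning scheme concrete, here is a minimal sketch (not from the talk): the feedback command u_fb serves as the training error for an inverse model that supplies the feedforward command u_ff. The toy 1-DOF plant, feature vector, and gains are hypothetical placeholders.

```python
import numpy as np

# Minimal feedback-error-learning sketch (illustrative, not the talk's implementation).
# A linear-in-features inverse model supplies u_ff; a PD feedback controller supplies u_fb;
# u_fb is used as the training error signal for the inverse model (Kawato's scheme).

def features(x_des, xd_des, xdd_des):
    # hypothetical feature vector for the inverse model
    return np.array([x_des, xd_des, xdd_des, 1.0])

w = np.zeros(4)                      # inverse-model parameters
Kp, Kd, lr, dt = 25.0, 5.0, 0.01, 0.01
x, xd = 0.0, 0.0                     # toy 1-DOF plant state

for t in np.arange(0.0, 2.0, dt):
    x_des, xd_des, xdd_des = np.sin(t), np.cos(t), -np.sin(t)   # desired trajectory
    phi = features(x_des, xd_des, xdd_des)
    u_ff = w @ phi                                   # feedforward from learned inverse model
    u_fb = Kp * (x_des - x) + Kd * (xd_des - xd)     # feedback controller
    u = u_ff + u_fb
    # toy plant: unit mass with viscous friction (stand-in for real robot dynamics)
    xdd = u - 0.5 * xd
    xd += xdd * dt
    x += xd * dt
    # feedback error learning: the feedback command trains the inverse model
    w += lr * u_fb * phi
```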

Supervised Learning of Motor Behaviors Example: Learning Devilsticking

Supervised Learning of Motor Behaviors Example: Learning Polebalancing

Approaches to Learning Motor Behaviors in Robotics
Supervised Learning: direct inverse model learning, forward model learning, distal teacher, feedback error learning
Reinforcement Learning: value function-based approaches, policy gradients
Motor Primitives: schemas, basis behaviors, units of action, macros, options, parameterized policies
Imitation Learning: learning a policy from observation, learning the task goal from observation (inverse RL), learning an initial strategy for self-improvement

Reinforcement Learning: Value Function Based
Q-Learning or SARSA (Watkins; Sutton): requires function approximation for the action-value function; usually only discrete actions are considered; applied only to low-dimensional robotic systems, e.g., the acrobot.
Q^π(x, u) = E{ r_1 + γ r_2 + γ² r_3 + … | x_0 = x, u_0 = u }
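For reference, a minimal tabular Q-learning sketch showing the update behind this slide (the tabular case is the degenerate form of function approximation with one indicator feature per state); the chain environment and parameters are hypothetical.

```python
import numpy as np

# Minimal tabular Q-learning sketch (illustrative, not from the talk): a 5-state chain
# with discrete actions left/right and reward 1 for reaching the right end, showing
# the update  Q(x,u) <- Q(x,u) + eta * ( r + gamma * max_u' Q(x',u') - Q(x,u) ).

n_states, gamma, eta, eps = 5, 0.95, 0.1, 0.2
Q = np.zeros((n_states, 2))                     # actions: 0 = left, 1 = right
rng = np.random.default_rng(0)

for episode in range(300):
    x = int(rng.integers(n_states - 1))         # random non-terminal start state
    while x != n_states - 1:
        u = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[x]))
        x_next = max(0, x - 1) if u == 0 else min(n_states - 1, x + 1)
        r = 1.0 if x_next == n_states - 1 else 0.0
        Q[x, u] += eta * (r + gamma * np.max(Q[x_next]) - Q[x, u])
        x = x_next
```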

Reinforcement Learning: Value Function Based
RL in Continuous Time and Space (Doya, Morimoto, Kimura): a continuous version of actor-critic systems; yields a closed-form solution for the optimal action for motor systems of the form ẋ = f(x) + g(x)u, i.e., u* ∝ g(x)^T ∂V/∂x; particularly useful for model-based RL.
V^π(x) = E{ r_1 + γ r_2 + γ² r_3 + … | x_0 = x }
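A sketch of how such a closed-form greedy action can be evaluated once a value function is available, assuming a quadratic control cost with weight c; the value function and input matrix below are hypothetical stand-ins, not the talk's system.

```python
import numpy as np

# Sketch of the closed-form greedy action for control-affine dynamics
# xdot = f(x) + g(x) u with quadratic control cost (Doya-style continuous RL):
#   u* = (1/c) * g(x)^T * dV/dx
# The value function V and input matrix g below are hypothetical placeholders.

def V(x):                       # stand-in for a learned value function
    return -0.5 * x @ x

def grad_V(x, h=1e-5):          # numerical gradient of V
    e = np.eye(len(x))
    return np.array([(V(x + h * e[i]) - V(x - h * e[i])) / (2 * h) for i in range(len(x))])

def greedy_action(x, g, c=1.0):
    return (1.0 / c) * g(x).T @ grad_V(x)

g = lambda x: np.array([[0.0], [1.0]])      # toy single-input, two-state system
print(greedy_action(np.array([0.3, -0.1]), g))
```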

Reinforcement Learning: Value Function Based RL in Continuous Time and Space: Example

Reinforcement Learning: Policy Gradients
Motivation for policy gradients: value function approximation is too hard in complex motor systems, so avoid the value function; smooth policy improvement instead of greedy jumps; even useful for hidden-state systems; useful for parsimoniously parameterized policies. E.g.,
∇_θ J(π_θ) = ∫_X d^π(x) ∫_U ∇_θ π(u | x) ( Q^π(x, u) − b(x) ) du dx,   θ ← θ + α ∇_θ J(π_θ)
Note that policy gradients can only achieve local optimization.
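One minimal member of this family is episodic REINFORCE with a baseline; the sketch below uses a hypothetical one-dimensional Gaussian policy and toy reward, chosen only to show the stochastic gradient estimate ∇_θ log π(u | x) · (r − b).

```python
import numpy as np

# Minimal episodic policy-gradient sketch (REINFORCE with a baseline), one concrete
# instance of the gradient on this slide. Task: a Gaussian policy over a scalar
# action u; the reward peaks at u = 2 (hypothetical toy problem).

rng = np.random.default_rng(0)
theta, sigma, alpha = 0.0, 0.5, 0.02     # policy mean, fixed exploration noise, step size
baseline = 0.0

for iteration in range(500):
    u = rng.normal(theta, sigma)                       # sample action from pi_theta
    r = -(u - 2.0) ** 2                                # reward (hypothetical)
    grad_log_pi = (u - theta) / sigma ** 2             # d/dtheta log N(u | theta, sigma^2)
    theta += alpha * grad_log_pi * (r - baseline)      # stochastic gradient ascent on J
    baseline = 0.9 * baseline + 0.1 * r                # running-average baseline b

print(theta)   # ends up near 2.0
```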

Reinforcement Learning: Policy Gradients
Examples: robot peg-in-hole insertion and tuning of biped locomotion (Gullapalli; Tedrake; Benbrahim & Franklin). More results available, e.g., see Andrew Ng, Drew Bagnell, etc.

Approaches to Learning Motor Behaviors in Robotics
Supervised Learning: direct inverse model learning, forward model learning, distal teacher, feedback error learning
Reinforcement Learning: value function-based approaches, policy gradients
Motor Primitives: schemas, basis behaviors, units of action, macros, options, parameterized policies
Imitation Learning: learning a policy from observation, learning the task goal from observation (inverse RL), learning an initial strategy for self-improvement

Motor Primitives
Motivation 1: Divide & Conquer
Motivation 2: Suitable Parameterization
u(t) = p(x(t), t, α)

What is a Good Motor Primitive? From the view of biological research, previous suggestions included:
Organizational principles: 2/3 power law, piecewise planarity, speed-accuracy tradeoff
Optimization of energy, jerk, torque change, motor command change, task variance, stochastic feedback control, effort, etc.
Equilibrium point/trajectory hypotheses
VITE model of trajectory planning
Force fields
Pattern generators and dynamic systems theory: focusing mostly on coupling phenomena (e.g., inter-limb, perception-action, intra-limb) and the necessary interaction of control and musculoskeletal dynamics
Contraction theory: a version of control theory for modular control
and many more

What is a Good Motor Primitive? From the view of machine learning/robotics, previous suggestions included:
Handcrafted basis behaviors of some level of generality, e.g., flocking, dispersing, door finding, object pick-up, closed-loop policies, etc.
Automatic regular coarse partitioning of the world, e.g., a very coarse grid, potentially with hidden state
Automatic detection of basis behaviors from examining the statistics of the world, e.g., states with drastic changes of value gradients, states that are common on successful trials, etc.

Movement Primitives as Attractor Systems
Note the similarity between a generic control policy u(t) = p(x(t), t, α) and nonlinear differential equations u(t) = ẋ_desired(t) = p(x_desired(t), goal, α). This view creates a natural distinction between two major movement classes: rhythmic movement and discrete movement.

Rhythmic & Discrete Movement Representation in the Brain
[fMRI figure: activations in PMdr, M1/S1, BA40, BA7, BA44, BA47 for the discrete-rhythmic and rhythmic-discrete contrasts.]
Joint work with Dagmar Sternad, Rieko Osu, and Mitsuo Kawato. Nature Neuroscience 7: 1137-1144, 2004.

Movement Primitives as Attractor Systems: Goals
ẋ = f(x, goal)
A class of dynamic systems that can code:
Point-to-point and periodic behavior as their attractor
Multi-dimensional systems that require phase locking
Attractors that have rather complex shape (e.g., complex phase relationships, movement reversals)
Learning and optimization
Coupling phenomena
Timing (without requiring explicit time)
Generalization (structural equivalence under parameter changes)
Robustness to disturbances and interactions with the environment
Stability guarantees

A Dynamic Systems Model for Discrete Movement
A learnable nonlinear point attractor with guaranteed stability properties.
Behavioral phase:
  v̇ = α_v ( β_v (g − x) − v )
  ẋ = α_x v
Trajectory plan dynamics, driven by the nonlinear function f(x, v):
  ż = α_z ( β_z (g − y) − z )
  ẏ = α_y ( f(x, v) + z )

A Dynamic Systems Model for Discrete Movement
Use Gaussian basis functions to build the nonlinear learning system (a local linear model approximation):
Trajectory plan dynamics:
  ż = α_z ( β_z (g − y) − z )
  ẏ = α_y ( f(x, v) + z )
Canonical dynamics:
  v̇ = α_v ( β_v (g − x) − v )
  ẋ = α_x v
where
  f(x, v) = ( Σ_{i=1}^k w_i b_i v ) / ( Σ_{i=1}^k w_i ),
  w_i = exp( −(1/2) d_i ( x̃ − c_i )² ),  x̃ = (x − x_0) / (g − x_0)
The function f is linear in the learning parameters b_i.
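A minimal numerical sketch of this discrete primitive (my reconstruction of the equations above, not the original implementation): Euler integration of the canonical and trajectory-plan dynamics with Gaussian basis functions; all constants and basis placements are hypothetical choices.

```python
import numpy as np

# Sketch of the discrete dynamic movement primitive above (Euler integration).
# Constants and basis-function placement are hypothetical choices.

dt, g, x0 = 0.001, 1.0, 0.0
alpha_v, beta_v, alpha_x = 25.0, 25.0 / 4.0, 25.0 / 3.0
alpha_z, beta_z, alpha_y = 25.0, 25.0 / 4.0, 1.0

k = 10
c = np.linspace(0.0, 1.0, k)          # basis-function centers in normalized x
d = np.full(k, 50.0)                  # basis-function widths
b = np.zeros(k)                       # learnable parameters (zero -> plain point attractor)

x, v = x0, 0.0                        # canonical system state
y, z = x0, 0.0                        # trajectory-plan state

for _ in range(int(1.0 / dt)):
    x_norm = (x - x0) / (g - x0)
    w = np.exp(-0.5 * d * (x_norm - c) ** 2)
    f = (w @ (b * v)) / (w.sum() + 1e-10)         # f(x, v), linear in b
    # canonical dynamics
    v += alpha_v * (beta_v * (g - x) - v) * dt
    x += alpha_x * v * dt
    # trajectory-plan dynamics
    z += alpha_z * (beta_z * (g - y) - z) * dt
    y += alpha_y * (f + z) * dt

print(y)   # converges toward the goal g
```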

An Example
[Plots: desired position, desired velocity, basis functions over time, phase velocity, phase.]

Extension to Periodic Systems
A learnable nonlinear limit cycle attractor with guaranteed stability properties.
Behavioral phase: a phase oscillator with amplitude A:
  φ̇ = ω
  ṙ = α_r ( A − r )
Trajectory plan dynamics, driven by the nonlinear function f(r, φ):
  ż = α_z ( β_z (g − y) − z )
  ẏ = α_y ( f(r, φ) + z )
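A corresponding sketch for the periodic case (again a reconstruction with hypothetical constants), using von Mises-style periodic basis functions over the phase as one plausible choice for representing f(r, φ).

```python
import numpy as np

# Sketch of the periodic (limit-cycle) primitive above: a phase oscillator with
# amplitude dynamics driving the same second-order transformation system, using
# von Mises basis functions over the phase. All constants are hypothetical.

dt, omega, A, g = 0.001, 2 * np.pi, 1.0, 0.0     # one cycle per second, baseline g
alpha_r, alpha_z, beta_z, alpha_y = 5.0, 25.0, 25.0 / 4.0, 1.0

k = 10
c = np.linspace(0.0, 2 * np.pi, k, endpoint=False)   # basis centers over the phase
h = np.full(k, 10.0)                                  # basis widths
b = np.random.default_rng(0).normal(0.0, 0.2, k)      # arbitrary "learned" parameters

phi, r = 0.0, 0.0
y, z = 0.0, 0.0

for _ in range(int(2.0 / dt)):                # integrate two cycles
    w = np.exp(h * (np.cos(phi - c) - 1.0))   # von Mises basis functions
    f = r * (w @ b) / (w.sum() + 1e-10)       # f(r, phi), modulated by amplitude r
    # behavioral phase: oscillator with amplitude A
    phi += omega * dt
    r += alpha_r * (A - r) * dt
    # trajectory-plan dynamics
    z += alpha_z * (beta_z * (g - y) - z) * dt
    y += alpha_y * (f + z) * dt
```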

Example: Policy Gradients with Movement Primitives Goal: Hit ball precisely Note: about 150 trials are needed.

Approaches to Learning Motor Behaviors in Robotics
Supervised Learning: direct inverse model learning, forward model learning, distal teacher, feedback error learning
Reinforcement Learning: value function-based approaches, policy gradients
Motor Primitives: schemas, basis behaviors, units of action, macros, options, parameterized policies
Imitation Learning: learning a policy from observation, learning the task goal from observation (inverse RL), learning an initial strategy for self-improvement

Imitation Learning
What can be learned from imitation?
Control policies (assuming actions are observable)
Internal models
Reward criteria, e.g., inverse reinforcement learning (Ng et al.)
Using the demonstration as a soft constraint
Value functions

Imitation Learning: Example. Learning an internal model from demonstration.

Imitation Learning: Example. Using the demonstrated behavior as a soft constraint.

Imitation Learning with Motor Primitives
Given: a desired trajectory y_demo, ẏ_demo, ÿ_demo.
Algorithm (using the trajectory plan dynamics, canonical dynamics, and basis-function model of f(x, v) from the previous slides):
1. Extract the movement duration and the movement goal g from the demonstration.
2. Adjust the time constants of the canonical dynamics to the movement duration.
3. Use locally weighted learning to solve the nonlinear function approximation problem with regression target
   f_target = ẏ_demo / α_y − z,
   where z can be calculated by integrating the differential equation with the desired trajectory information.
Note: this is a one-shot learning problem, i.e., no iterations!
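A sketch of this one-shot fit (a reconstruction with hypothetical constants and a stand-in demonstration, not the original code): integrate the canonical dynamics and z along the demonstration, form the target f_target = ẏ_demo/α_y − z, and solve for each basis weight b_i by weighted least squares.

```python
import numpy as np

# Sketch of the one-shot imitation step above. Given a demonstrated trajectory
# y_demo, ydot_demo sampled at dt, compute f_target = ydot_demo / alpha_y - z and
# fit the basis-function weights b by locally weighted least squares.

dt = 0.001
t = np.arange(0.0, 1.0, dt)
y_demo = 0.5 * (1.0 - np.cos(np.pi * t))        # stand-in demonstration (smooth reach)
ydot_demo = np.gradient(y_demo, dt)

y0, g = y_demo[0], y_demo[-1]                   # extracted start and movement goal
alpha_v, beta_v, alpha_x = 25.0, 25.0 / 4.0, 25.0 / 3.0
alpha_z, beta_z, alpha_y = 25.0, 25.0 / 4.0, 1.0

# integrate canonical dynamics and z along the demonstration
x, v, z = y0, 0.0, 0.0
X, V, Z = [], [], []
for y in y_demo:
    X.append(x); V.append(v); Z.append(z)
    v += alpha_v * (beta_v * (g - x) - v) * dt
    x += alpha_x * v * dt
    z += alpha_z * (beta_z * (g - y) - z) * dt
X, V, Z = map(np.array, (X, V, Z))

f_target = ydot_demo / alpha_y - Z              # regression target for f(x, v)

# locally weighted regression, one weight per basis function: f ≈ (Σ w_i b_i v) / (Σ w_i)
k = 10
c = np.linspace(0.0, 1.0, k)
d = np.full(k, 50.0)
x_norm = (X - y0) / (g - y0)
W = np.exp(-0.5 * d * (x_norm[:, None] - c[None, :]) ** 2)   # (T, k) basis activations
b = np.array([
    (W[:, i] * V * f_target).sum() / ((W[:, i] * V ** 2).sum() + 1e-10)
    for i in range(k)
])   # per-basis weighted least squares: one shot, no iterations
```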

Example: A Tennis Forehand as a Movement Primitive

Example: A Tennis Forehand as a Dynamic Primitive

Example: Various Rhythmic Movement Primitives

Example: Imitation Learning with Self-Improvement Goal: Hit ball precisely Note: about 150 trials are needed.

Movement Primitives for Planar Walking

Coupling of Mechanics and Control

Movement Primitives in Interaction with Sound

Discussion
The amount of learning research in manipulator robotics is small!
Reinforcement learning in this domain is very hard!
Finding good reward functions is hard!
Policy gradients are of some use, at the cost of giving up global optimality and the discovery of new strategies.
Imitation learning is great for initializing policies.
Well-designed motor primitives can facilitate learning tremendously.
But there is no autonomous learning framework yet...