Learning Motor Behaviors: Past & Present Work


1 Learning Motor Behaviors: Past & Present Work. Stefan Schaal, Computer Science & Neuroscience, University of Southern California, Los Angeles, & ATR Computational Neuroscience Laboratory, Kyoto, Japan

2 Joint Work With: Auke Ijspeert, Aaron D'Souza, Jun Nakanishi, Jan Peters, Michael Mistry, Dimitris Pongas

3 How are Motor Skills Generated? A Question Shared by Biological and Robotics Research. Movies from collaborations with C. Atkeson, S. Kotosaka, S. Vijayakumar

4 How are Motor Skills Generated? A Question Shared by Biological and Robotics Research. Unfortunately, each of these skills required manual generation of representations, control policies, and learning mechanisms. Movies from collaborations with C. Atkeson, S. Kotosaka, S. Vijayakumar

5 What Motor Behaviors Exist? (in increasing level of difficulty)
- Tracking tasks, e.g., tracing a figure-8 on a piece of paper
- Regulator tasks, e.g., balance control (pole balancing, biped balancing, helicopter hover)
- Discrete tasks, e.g., reach for a cup, tennis forehand, basketball shot
- Periodic tasks, e.g., legged locomotion, swimming, dancing
- Complex sequences and superpositions of the above, e.g., assembly tasks, emptying the dishwasher, playing tennis, almost every daily-life behavior

6 Learning Motor Behaviors: Control Policies. The General Goal of Motor Learning: Control Policies u(t) = p(x(t), t, α)
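To make this concrete, here is a minimal sketch (not from the talk) of such a parameterized policy in Python; the linear-feedback form and the parameters K and x_des are illustrative assumptions:

```python
import numpy as np

def policy(x, t, alpha):
    """u = p(x, t, alpha): here an assumed linear feedback law toward a desired state."""
    K, x_des = alpha["K"], alpha["x_des"]   # gains and target are the parameters alpha
    return K @ (x_des - x)                  # motor command u(t)

alpha = {"K": np.array([[2.0, 0.5]]), "x_des": np.array([1.0, 0.0])}
u = policy(np.array([0.0, 0.0]), t=0.0, alpha=alpha)  # -> array([2.0])
```

Learning a motor behavior then amounts to adjusting α so that the commands u(t) accomplish the task.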

7 How Are Control Policies Used in Robotics? Direct Control (Model-Free) vs. Indirect Control (Model-Based)

8 Approaches to Learning Motor Behaviors in Robotics: Past to Present
- Supervised Learning: direct inverse model learning, forward model learning, distal teacher, feedback error learning
- Reinforcement Learning: value function-based approaches, policy gradients
- Motor Primitives: schemas, basis behaviors, units of action, macros, options; parameterized policies
- Imitation Learning: learning a policy from observation, learning the task goal from observation (inverse RL), learning an initial strategy for self-improvement

9 Supervised Learning of Motor Behaviors. Given: a parameterized policy, a task goal, a measure of (signed) error. Usually applied to discrete tasks. Goal: learn a task-level controller that produces the right motor command for the given goal from all initial conditions.

10 Supervised Learning of Motor Behaviors. Approaches: learn task models; direct inverse learning; forward model learning & search; distal teacher (Jordan & Rumelhart); feedback error learning (Kawato). [Block diagram: an inverse model maps x_desired to a feedforward command u_ff; a feedback controller contributes u_fb; their sum Σ drives the robot, producing output y.]
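As a hedged illustration of the feedback error learning idea, the sketch below trains a linear inverse model online, using the feedback command u_fb as the teaching signal for the feedforward command u_ff; the first-order toy plant, features, gains, and learning rate are assumptions for illustration, not the talk's setup:

```python
import numpy as np

a_plant, dt = 2.0, 0.05        # assumed leaky first-order plant: xdot = -a*x + u
Kp, lr = 5.0, 0.02             # feedback gain and learning rate (assumed)

def features(x_des):           # simple features of the desired state
    return np.array([1.0, x_des])

w = np.zeros(2)                # inverse-model parameters, learned online
x, x_des = 0.0, 1.0
for _ in range(2000):
    u_ff = w @ features(x_des)               # feedforward command from inverse model
    u_fb = Kp * (x_des - x)                  # feedback command = the teaching signal
    x += dt * (-a_plant * x + u_ff + u_fb)   # composite control u_ff + u_fb drives the plant
    w += lr * u_fb * features(x_des)         # update driven by the feedback error
# after training, u_ff alone holds x near x_des (here w @ [1, 1] -> approx. 2.0)
```

As the inverse model improves, the feedback contribution shrinks toward zero, which is the hallmark of this scheme.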

11 Supervised Learning of Motor Behaviors. Example: Learning Devil-Sticking

12 Supervised Learning of Motor Behaviors. Example: Learning Pole Balancing

13 Approaches to Learning Motor Behaviors in Robotics
- Supervised Learning: direct inverse model learning, forward model learning, distal teacher, feedback error learning
- Reinforcement Learning: value function-based approaches, policy gradients
- Motor Primitives: schemas, basis behaviors, units of action, macros, options; parameterized policies
- Imitation Learning: learning a policy from observation, learning the task goal from observation (inverse RL), learning an initial strategy for self-improvement

14 Reinforcement Learning: Value Function Based. Q-Learning or SARSA: requires function approximation for the action value function; usually only discrete actions considered; only low-dimensional robotic systems, e.g., acrobot. Q^π(x,u) = E{ r_1 + γ r_2 + γ² r_3 + … | x_0 = x, u_0 = u } (Watkins; Sutton)
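A minimal sketch of the Q-learning update on a toy discrete chain MDP; the environment, exploration rate, and table representation are illustrative assumptions (real motor systems need continuous function approximation, which is exactly the difficulty noted above):

```python
import numpy as np

n_states, n_actions, gamma, lr = 5, 2, 0.9, 0.1
Q = np.zeros((n_states, n_actions))            # tabular action value function

def step(s, a):                                # toy MDP: action 1 moves right,
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == n_states - 1)       # reward at the rightmost state

for episode in range(500):
    s = 0
    for _ in range(20):
        # epsilon-greedy exploration
        a = np.random.randint(n_actions) if np.random.rand() < 0.1 else int(Q[s].argmax())
        s2, r = step(s, a)
        # Q-learning target: r + gamma * max_u' Q(x', u')
        Q[s, a] += lr * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2
```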

15 Reinforcement Learning: Value Function Based. RL in Continuous Time and Space: continuous version of actor-critic systems; closed-form solution for the optimal action for motor systems of the form ẋ = f(x) + g(x)u, i.e., u* ∝ g(x)^T ∂V(x)/∂x; particularly useful for model-based RL. V^π(x) = E{ r_1 + γ r_2 + γ² r_3 + … | x_0 = x } (Doya, Morimoto, Kimura)
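As a sketch of the closed-form action: assuming a quadratic control cost u^T R u (an assumption for this example), maximizing the Hamiltonian gives u* = (1/2) R^{-1} g(x)^T ∂V/∂x. The dynamics and value gradient below are illustrative placeholders:

```python
import numpy as np

def greedy_action(x, g, dV_dx, R_inv):
    """u* = (1/2) R^{-1} g(x)^T dV/dx, for reward rate r(x) - u^T R u (assumed)."""
    return 0.5 * R_inv @ g(x).T @ dV_dx(x)

g = lambda x: np.array([[0.0], [1.0]])     # control enters through the velocity state
dV_dx = lambda x: -2.0 * x                 # gradient of an assumed quadratic value
u = greedy_action(np.array([0.2, -0.1]), g, dV_dx, np.array([[1.0]]))  # -> [0.1]
```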

16 Reinforcement Learning: Value Function Based RL in Continuous Time and Space: Example

17 Reinforcement Learning: Policy Gradients. Motivation for Policy Gradients:
- value function approximation is too hard in complex motor systems, thus avoid value functions
- smooth policy improvement instead of greedy jumps
- even useful for hidden-state systems
- useful for parsimoniously parameterized policies
E.g., ∂J(θ)/∂θ = ∫_X d^π(x) ∫_U (∂π(u|x)/∂θ) (Q^π(x,u) − b(x)) du dx, with update θ ← θ + α ∂J(θ)/∂θ. Note that policy gradients can only achieve local optimization.
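A minimal REINFORCE-style sketch of this gradient, estimated from sampled rollouts with a baseline b to reduce variance; the one-step Gaussian-policy task is an illustrative assumption:

```python
import numpy as np

theta, sigma, alpha = 0.0, 0.5, 0.05
for it in range(200):
    grads, returns = [], []
    for _ in range(10):                        # sample rollouts from pi_theta
        u = theta + sigma * np.random.randn()  # Gaussian policy pi(u) = N(theta, sigma^2)
        R = -(u - 2.0) ** 2                    # reward: act close to 2.0
        grads.append((u - theta) / sigma**2)   # d log pi(u) / d theta
        returns.append(R)
    b = np.mean(returns)                       # baseline b(x)
    grad_J = np.mean([g * (R - b) for g, R in zip(grads, returns)])
    theta += alpha * grad_J                    # smooth, local policy improvement
```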

18 Reinforcement Learning: Policy Gradients. Examples: robot peg-in-hole insertion (Gullapalli), tuning biped locomotion (Benbrahim & Franklin; Tedrake). More results available, e.g., see Andrew Ng, Drew Bagnell, etc.

19 Approaches to Learning Motor Behaviors in Robotics
- Supervised Learning: direct inverse model learning, forward model learning, distal teacher, feedback error learning
- Reinforcement Learning: value function-based approaches, policy gradients
- Motor Primitives: schemas, basis behaviors, units of action, macros, options; parameterized policies
- Imitation Learning: learning a policy from observation, learning the task goal from observation (inverse RL), learning an initial strategy for self-improvement

20 Motor Primitives. Motivation 1: Divide & Conquer. Motivation 2: Suitable Parameterization. u(t) = p(x(t), t, α)

21 What is a Good Motor Primitive? From the view of biological research, previous suggestions included:
- Organizational principles: 2/3 power law, piecewise planarity, speed-accuracy tradeoff
- Optimization of energy, jerk, torque change, motor command change, task variance, stochastic feedback control, effort, etc.
- Equilibrium point/trajectory hypotheses
- VITE model of trajectory planning
- Force fields
- Pattern generators and dynamical systems theory, focusing mostly on coupling phenomena (e.g., inter-limb, perception-action, intra-limb) and the necessary interaction of control and musculoskeletal dynamics
- Contraction theory: a version of control theory for modular control
- and many more

22 What is a Good Motor Primitive? From the view of machine learning/robotics, previous suggestions included:
- handcrafted basis behaviors that have some level of generality, e.g., flocking, dispersing, door finding, object pick-up, closed-loop policies, etc.
- automatic regular coarse partitioning of the world, e.g., a very coarse grid, potentially with hidden state
- automatic detection of basis behaviors from examining the statistics of the world, e.g., states with drastic changes of value gradients, states that are common on successful trials, etc.

23 Movement Primitives as Attractor Systems. Note the similarity between a generic control policy u(t) = p(x(t), t, α) and nonlinear differential equations u(t) = ẋ_desired(t) = p(x_desired(t), goal, α). This view creates a natural distinction between two major movement classes: Rhythmic Movement and Discrete Movement.

24 Rhythmic & Discrete Movement Representation in the Brain. [Brain imaging figure: discrete-vs-rhythmic and rhythmic-vs-discrete contrasts involving areas PMdr, M1, S1, BA40, BA7, BA44, BA47.] Joint work with Dagmar Sternad, Rieko Osu, and Mitsuo Kawato. Nature Neuroscience 7, 2004

25 Movement Primitives as Attractor Systems: Goals. ẋ = f(x, goal). A class of dynamic systems that can code:
- point-to-point and periodic behavior as their attractor
- multi-dimensional systems that require phase locking
- attractors that have rather complex shape (e.g., complex phase relationships, movement reversals)
- learning and optimization
- coupling phenomena
- timing (without requiring explicit time)
- generalization (structural equivalence for parameter changes)
- robustness to disturbances and interactions with the environment
- stability guarantees

26 A Dynamic Systems Model for Discrete Movement. A learnable nonlinear point attractor with guaranteed stability properties. Behavioral phase: v̇ = α_v(β_v(g − x) − v), ẋ = α_x v. Nonlinear function f(x, v). Trajectory plan dynamics: ż = α_z(β_z(g − y) − z), ẏ = α_y(f(x, v) + z).

27 A Dynamic Systems Model for Discrete Movement. Use Gaussian basis functions to build a nonlinear learning system. Canonical dynamics: v̇ = α_v(β_v(g − x) − v), ẋ = α_x v. Trajectory plan dynamics: ż = α_z(β_z(g − y) − z), ẏ = α_y(f(x, v) + z). Local linear model approximation: f(x, v) = (Σ_{i=1..k} w_i b_i v) / (Σ_{i=1..k} w_i), where w_i = exp(−(1/2) d_i (x̃ − c_i)²) and x̃ = (x − x_0)/(g − x_0). Note: f is linear in the learning parameters b_i.
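A minimal numerical sketch of this system, under assumed constants (critically damped choices β = α/4, unit α_x and α_y) and assumed basis placements; a sketch of the equations above, not the talk's implementation:

```python
import numpy as np

def dmp_rollout(b, g, y0=0.0, dt=0.001, n_steps=1000,
                a_v=25.0, b_v=25.0/4, a_x=1.0, a_z=25.0, b_z=25.0/4, a_y=1.0):
    k = len(b)
    c = np.linspace(0.0, 1.0, k)              # basis centers in normalized phase
    d = np.full(k, 4.0 * k ** 2)              # basis widths (assumed)
    v, x, z, y = 0.0, y0, 0.0, y0
    ys = []
    for _ in range(n_steps):
        x_n = (x - y0) / (g - y0)             # normalized phase x~ (requires g != y0)
        w = np.exp(-0.5 * d * (x_n - c) ** 2) # Gaussian basis activations w_i
        f = (w * b).sum() * v / w.sum()       # f(x, v): linear in b, gated by v
        v += dt * a_v * (b_v * (g - x) - v)   # behavioral phase: point attractor
        x += dt * a_x * v                     #   at x = g, v = 0
        z += dt * a_z * (b_z * (g - y) - z)   # trajectory plan dynamics, forced
        y += dt * a_y * (z + f)               #   by f, converging to y = g
        ys.append(y)
    return np.array(ys)

traj = dmp_rollout(b=np.random.randn(10), g=1.0)  # a shaped reach toward g
```

Because f is gated by the phase velocity v, the forcing vanishes as the phase system converges, and the point attractor at g guarantees stability regardless of the weights b.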

28 An Example. [Figure: a fitted primitive shown as desired position, desired velocity, basis function activations in time, phase, and phase velocity.]

29 Extension to Periodic Systems. A learnable nonlinear limit cycle attractor with guaranteed stability properties. Behavioral phase: a phase oscillator with amplitude A, ṙ = α_r(A − r), φ̇ = ω. Nonlinear function f(r, φ). Trajectory plan dynamics: ż = α_z(β_z(g − y) − z), ẏ = α_y(f(r, φ) + z).
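A matching sketch for the periodic case; the von-Mises-like periodic basis functions and all constants are assumed concrete choices, not the talk's values:

```python
import numpy as np

def rhythmic_rollout(b, g=0.0, A=1.0, omega=2 * np.pi, dt=0.001, n_steps=3000,
                     a_r=5.0, a_z=25.0, b_z=25.0 / 4, a_y=1.0):
    k = len(b)
    c = np.linspace(0.0, 2 * np.pi, k, endpoint=False)  # centers around the cycle
    h = np.full(k, 10.0)                                # basis widths (assumed)
    r, phi, z, y = A, 0.0, 0.0, g
    ys = []
    for _ in range(n_steps):
        w = np.exp(h * (np.cos(phi - c) - 1.0))  # periodic (von Mises-like) bases
        f = (w * b).sum() * r / w.sum()          # f(r, phi), amplitude-gated
        r += dt * a_r * (A - r)                  # amplitude relaxes to A
        phi = (phi + dt * omega) % (2 * np.pi)   # phase advances at frequency omega
        z += dt * a_z * (b_z * (g - y) - z)      # anchor the oscillation around g
        y += dt * a_y * (z + f)
        ys.append(y)
    return np.array(ys)

cycle = rhythmic_rollout(b=np.random.randn(8))   # a periodic pattern around g
```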

30 Example: Policy Gradients with Movement Primitives Goal: Hit ball precisely Note: about 150 trials are needed.

31 Approaches to Learning Motor Behaviors in Robotics
- Supervised Learning: direct inverse model learning, forward model learning, distal teacher, feedback error learning
- Reinforcement Learning: value function-based approaches, policy gradients
- Motor Primitives: schemas, basis behaviors, units of action, macros, options; parameterized policies
- Imitation Learning: learning a policy from observation, learning the task goal from observation (inverse RL), learning an initial strategy for self-improvement

32 Imitation Learning. What can be learned from imitation? Control policies (assuming actions are observable), internal models, reward criteria (e.g., inverse reinforcement learning, Ng et al.), use of the demonstration as a soft constraint, value functions.

33 Imitation Learning: Example. Learning an internal model from demonstration

34 Imitation Learning: Example. Using the demonstrated behavior as a soft constraint

35 Imitation Learning with Motor Primitives. Given: a desired trajectory y_demo, ẏ_demo, ÿ_demo. Algorithm (canonical dynamics, trajectory plan dynamics, and basis functions as on slide 27):
- extract the movement duration and the movement goal g
- adjust the time constants of the canonical dynamics to the movement duration
- use locally weighted learning to solve the nonlinear function approximation problem with target y_target = ẏ_demo/α_y − z = f(x, v), where z can be calculated by integrating the differential equation with the desired trajectory information
Note: this is a one-shot learning problem, i.e., no iterations!
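A hedged sketch of this one-shot fit, reusing the constants of the discrete-primitive sketch above: integrate the phase and z dynamics along the demonstration, form y_target, and solve for each weight b_i in closed form by locally weighted regression:

```python
import numpy as np

def fit_primitive(y_demo, dt, k=10, a_v=25.0, b_v=25.0/4, a_x=1.0,
                  a_z=25.0, b_z=25.0/4, a_y=1.0):
    y0, g = y_demo[0], y_demo[-1]              # extract start and movement goal
    yd = np.gradient(y_demo, dt)               # demonstrated velocity
    c = np.linspace(0.0, 1.0, k)               # bases as in the rollout sketch;
    d = np.full(k, 4.0 * k ** 2)               #   duration assumed already matched
    v, x, z = 0.0, y0, 0.0
    V, W, F = [], [], []
    for y_t, yd_t in zip(y_demo, yd):
        x_n = (x - y0) / (g - y0)
        W.append(np.exp(-0.5 * d * (x_n - c) ** 2))
        V.append(v)
        F.append(yd_t / a_y - z)               # y_target from the slide
        v += dt * a_v * (b_v * (g - x) - v)    # integrate the phase dynamics
        x += dt * a_x * v
        z += dt * a_z * (b_z * (g - y_t) - z)  # integrate z with demo information
    V, W, F = np.array(V), np.array(W), np.array(F)
    # locally weighted regression: one closed-form solution per basis function
    b = (W * (V * F)[:, None]).sum(0) / ((W * (V ** 2)[:, None]).sum(0) + 1e-10)
    return b, g, y0

t = np.arange(0, 1, 0.001)
b, g, y0 = fit_primitive(np.sin(0.5 * np.pi * t), dt=0.001)  # smooth demo reach
```

Running the earlier dmp_rollout with the fitted b, g, and y0 should then approximately reproduce the demonstration, with no iterative optimization involved.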

36 Example: A Tennis Forehand as a Movement Primitive

37 Example: A Tennis Forehand as a Dynamic Primitive

38 Example: Various Rhythmic Movement Primitives

39 Example: Imitation Learning with Self-Improvement Goal: Hit ball precisely Note: about 150 trials are needed.

40 Movement Primitives for Planar Walking

41 Coupling of Mechanics and Control

42 Movement Primitives in Interaction with Sound

43 Discussion
- The amount of learning research in manipulator robotics is poor!
- Reinforcement learning in this domain is very hard!
- Finding good reward functions is hard!
- Policy gradients are of some use, at the cost of giving up global optimality and the discovery of new strategies.
- Imitation learning is great for initializing policies.
- Well-designed motor primitives can facilitate learning tremendously.
- But no autonomous learning framework yet...
