Incremental Imitation Learning of Context-Dependent Motor Skills


Marco Ewerton¹, Guilherme Maeda¹, Gerrit Kollegger², Josef Wiemeyer² and Jan Peters¹,³

Abstract— Teaching motor skills to robots through human demonstrations, an approach called imitation learning, is an alternative to hand coding each new robot behavior. Imitation learning is relatively cheap in terms of time and labor and is a promising route to give robots the necessary functionalities for widespread use in households, stores, hospitals, etc. However, current imitation learning techniques struggle with a number of challenges that prevent their wide usability. For instance, robots might not be able to accurately reproduce every human demonstration and it is not always clear how robots should generalize a movement to new contexts. This paper addresses those challenges by presenting a method to incrementally teach context-dependent motor skills to robots. The human demonstrates trajectories for different contexts by moving the links of the robot and partially or fully refines those trajectories by disturbing the movements of the robot while it executes the behavior it has learned so far. A joint probability distribution over trajectories and contexts can then be built based on those demonstrations and refinements. Given a new context, the robot computes the most probable trajectory, which can also be refined by the human. The joint probability distribution is incrementally updated with the refined trajectories. We have evaluated our method with experiments in which an elastically actuated robot arm with four degrees of freedom learns how to reach a ball at different positions.

I. INTRODUCTION

Since the beginning of the 1980s, a large amount of research has been done to make robots capable of learning motor skills by imitating human demonstrations. This strategy, called imitation learning [1], might become a viable route to quickly train general-purpose robots to perform new tasks on demand in environments such as households, stores, hospitals, etc. However, a number of challenges hinder the widespread application of imitation learning in those environments. For example, the robot might not be able to accurately reproduce some of the movements demonstrated by the human, and the robot should be able to generalize the motor skills it has learned so far to different contexts. In those cases, it may be helpful to structure the imitation learning as an incremental process. In this process, the human could incrementally refine the robot trajectories in order to make it accomplish a certain task or incrementally correct trajectories inferred by the robot in the face of new contexts.

¹ M. Ewerton, G. Maeda and J. Peters are with the Intelligent Autonomous Systems group, Department of Computer Science, Technische Universität Darmstadt, Hochschulstr. 10, Darmstadt, Germany. {ewerton, maeda, peters}@ias.tu-darmstadt.de
² G. Kollegger and J. Wiemeyer are with the Institute of Sport Science, Technische Universität Darmstadt, Magdalenenstr. 27, Darmstadt, Germany. {kollegger, wiemeyer}@sport.tu-darmstadt.de
³ J. Peters is also with the Max Planck Institute for Intelligent Systems, Spemannstr. 38, 72076 Tübingen, Germany. jan.peters@tuebingen.mpg.de

(a) Experimental setup. (b) Kinesthetic teaching.
Fig. 1: Images of an experiment involving the BioRob, a 4-DoF elastically actuated robot arm. The objective in this experiment is to teach the robot how to reach a ball at each of the positions pointed to by the arrows. (a) The blue arrow points to the current position of the ball, while the red arrows point to other possible ball positions. (b) The human demonstrates through kinesthetic teaching how to reach the ball at a certain position.
In this way, the human would not have to demonstrate a movement all over again when the robot has made a mistake. Instead, the human could just give small incremental feedback that would be interpreted by the robot as necessary changes to its movements and to the relation between movements and contexts. Following this perspective, the main contribution of this paper is an algorithm that allows humans to teach robots context-dependent motor skills through demonstrations in an incremental manner. Our demonstrations have been obtained through kinesthetic teaching, i.e. by letting the human grasp and move the links of the robot. The proposed algorithm allows the human to incrementally refine the movement of the robot, a desirable feature if the robot cannot imitate the original demonstration of the human or if the movement executed by the robot does not yet solve the task at hand. In our method, a joint probability distribution over refined trajectories and context variables is built. Having built this joint distribution, the robot is able to infer from previous experiences what movement it should execute to accomplish a certain task given a new context. The contexts used in our experiments have been in the form of via points. More generally, the context could for example be positions, orientations or other properties of objects in the workspace of the robot.

The human can also incrementally correct the inferences of the robot by intervening in its movements. Each new refined trajectory and context is used to update the joint distribution. The algorithm proposed in this work is fairly general and may be applied to teach different motor skills to robots. In our experiments, this algorithm has been applied in a 2D problem to make a particle pass through certain via points; with the BioRob [2], an elastically actuated robot arm with four degrees of freedom (DoFs), to make it reach a ball at different positions; and in a minigolf-like task involving the same robot.

The remainder of this paper is organized as follows: Section II presents related work. Section III introduces the proposed method for incremental imitation learning of context-dependent motor skills. Our method is explained starting with a procedure to learn trajectories for a single context. After that, the necessary extensions to deal with context-dependent motor skills are described. Section IV presents our experiments. Finally, Section V summarizes the paper and discusses ideas for future work.

II. RELATED WORK

While in this work we have been specifically dealing with demonstrations through kinesthetic teaching, another form of imitation learning involves observation. In this case, the movements of a human are recorded by a camera or by a motion capture system. Those movements are mapped to the kinematics of the robot, which reproduces the demonstration or learns motion primitives from multiple demonstrations as in [3]. Often, though, this mapping does not produce the most preferable motions. For this reason, researchers have been investigating the idea of incremental imitation learning through kinesthetic teaching to refine the movements initially learned by the robot through observation.

Calinon and Billard [4] presented an approach based on Principal Component Analysis (PCA) and Gaussian Mixture Models (GMMs) to teach gestures to a humanoid robot. The gestures are demonstrated by a human in two ways: the human moves while sensors attached to his/her body record his/her movements, or the human performs kinesthetic teaching by grasping and moving the arms of the robot. The robot learns new gestures from multiple human demonstrations and its movements can be incrementally refined by the human. When performing kinesthetic teaching, the human can decide which DoFs he/she wants to drive and which DoFs should be autonomously driven by the robot. The DoFs driven by the human are set in passive mode. In contrast, in our method, the human disturbs the movements of the robot without setting any DoFs in passive mode. In our work, the disturbances introduced by the human lead to changes in the behavior of the robot. The human does not need to demonstrate the whole movement of a DoF again in case it is not yet moving correctly. He/she just needs to apply incremental changes to this movement.

Lee and Ott [5] also presented a method to teach motor skills to a humanoid robot. In their method, first the robot observes the movements of a human. Subsequently, a human may incrementally refine the movement of the robot through kinesthetic teaching. Their work uses the concept of a motion refinement tube, which is a region around the nominal trajectory followed by the robot where the controller has low stiffness, allowing the human to refine the trajectory. Movements are represented in their method by Hidden Markov Models (HMMs).
Changes introduced in a trajectory by the human translate into changes in the parameters of the HMM representing that trajectory. The updated HMM generates an updated trajectory, accounting for the refinements introduced by the human. Their approach offers desirable properties, such as the possibility of defining the refinement tube in such a way that changes in the trajectory of one joint do not result in accidental disturbances in the trajectories of other joints. In contrast to our work, their approach requires an accurate model of the robot dynamics in order to identify human intervention. Our approach allows robots to learn context-dependent motor skills in the absence of an accurate model of the robot dynamics, at the expense of having the robot execute a movement with and without human intervention at each iteration of the algorithm.

Another approach to incremental imitation learning aims at eliciting human preferences from the feedback he/she gives to the robot. Jain et al. [6] have developed a framework that enables robots to learn grocery checkout tasks with the help of human feedback. In their work, it is assumed that the user has an unknown score function quantifying the quality of trajectories in different contexts. This score function is a weighted sum of predefined features describing: 1) interactions between objects, 2) robot arm configurations, 3) orientation and temporal behavior of the object being manipulated, and 4) interactions between the object being manipulated and the environment. The robot learns the weights for each of those features from human feedback. The user gives feedback by re-ranking a set of trajectories proposed by the robot or by changing waypoints of trajectories, setting the robot in zero-force gravity-compensation mode. In our method, when giving physical feedback to the robot, the user does not change specific waypoints. Instead, the user disturbs the movement of the robot while it is moving. By doing so, the user can directly interfere in the speed profile of the trajectory of the robot, teaching it for example to hit a golf ball faster or slower. Our work also differs from the cited one by computing a joint probability distribution over trajectories and contexts. While computing this distribution requires modeling assumptions (Gaussians, GMMs, etc.), it does not require the design of a score function.

Akgun et al. [7] conducted a study evaluating kinesthetic teaching by demonstrating trajectories as well as kinesthetic teaching by demonstrating keyframes. They concluded that both trajectory and keyframe demonstrations are viable methods to teach motor skills to humanoid robots and have their specific advantages. When keyframes were demonstrated, the trajectories were generated by splines connecting the keyframes.

They presented a way to demonstrate keyframes iteratively to the robot, but did not present a way to demonstrate trajectories iteratively. In this paper, we present a method for demonstrating trajectories iteratively. The method presented here might be of interest to future usability studies such as the one presented in [7].

The idea of incremental imitation learning has also been used in conjunction with tactile feedback to allow humans to teach grasp positioning and adaptation to robots [8], [9].

As explained in Section III-B, we use as part of our algorithm the framework of Probabilistic Movement Primitives (ProMPs) [10] to achieve generalization. This framework offers a straightforward way of inferring trajectories given contexts. Other methods, such as the task-parameterized models discussed in [11] and [12], have very good generalization properties as well. In task-parameterized models, demonstrations are observed from different frames. Different contexts are then represented by different positions and orientations of these frames. For example, by changing the position and orientation of an object of interest in the workspace of the robot, the position and orientation of the frame associated with this object also changes. The product of the distributions over trajectories observed from the different frames determines the trajectories that should be executed in a new context. The novelty of our work resides not in the generalization to different contexts in itself but in offering the human an intuitive way to teach trajectories to a robot in an incremental manner. The human feedback also changes how the robot responds to different contexts. The incremental learning aspect of this work could be combined with task-parameterized models as well.

Reinforcement learning methods, as in [13], have been used to make robots find successful movements for solving a given task and to generalize those movements to new situations. However, in the absence of a good model of the robot and of its environment, which would allow for optimization in simulation, it might take too long to find a successful policy with the real robot. In this case, our method may be helpful by allowing the human to influence the search for good policies.

III. INCREMENTAL IMITATION LEARNING

In this section, our method for incremental imitation learning of context-dependent motor skills is explained. First, Section III-A describes a procedure that allows a human to teach a trajectory to a robot through kinesthetic teaching and to incrementally introduce adjustments to that trajectory. Afterwards, Section III-B explains how this procedure can be combined with a probabilistic framework to provide the robot with the capability of learning context-dependent motor skills from human demonstrations and incremental refinements.

A. Incremental Imitation Learning of a Trajectory for a Single Context

The workflow depicted in Fig. 2 describes our procedure to incrementally teach a trajectory to the robot for a single context. First, an initial desired trajectory τ_D is defined.

Fig. 2: Workflow of the procedure to incrementally teach a trajectory to the robot for a single context. The feedback controller tracks τ_D, yielding the executed trajectory τ_R and the control sequence u_R; u_R is then replayed in open loop while the human may intervene, yielding τ_HR; the desired trajectory is updated as τ_D^new = τ_D^old + α(τ_HR − τ_R).

This initial desired trajectory could have been defined by a vector of Cartesian or joint positions entered by the user or by kinesthetic teaching. We assume the existence of a feedback controller with potentially imperfect dynamics compensation.
The robot tries to track τ_D but in general executes a different trajectory τ_R. The sequence of forces or torques u_R generated by the controller at each time step is recorded. The robot then returns to its initial position and u_R is executed. While the robot executes u_R, the human can disturb the trajectory of the robot. The resulting trajectory τ_HR is recorded. We propose an update of the form

τ_D^new = τ_D^old + α (τ_HR − τ_R),   (1)

where α ∈ [0, 1] is a scalar that defines how much the difference between the trajectories τ_HR and τ_R should change the desired trajectory τ_D. In our experiments, α has been set equal to 1. Smaller values of α decrease the update step, which might be useful in cases where the human tends to exaggerate his or her feedback.

The update (1) resembles the format of iterative learning control (ILC) [14], [15]. Fundamentally, ILC learns tracking commands with respect to a given trajectory that the robot should execute. The challenge addressed by our method relates to the acquisition of the desired trajectory updates, which are assumed to be implicitly provided by the human interventions. As in ILC, the control signal, in this case the desired trajectory τ_D for the feedback controller, is updated according to an error, in our case (τ_HR − τ_R). In other words, the difference between the trajectory that the robot executes without disturbance, τ_R, and the trajectory that the robot executes with human disturbance, τ_HR, indicates how the control signal τ_D should be changed in order to perform the movement that the human wishes. While in ILC the reference trajectory used in the computation of the tracking error is predefined, in update (1) the reference trajectory τ_HR changes at each iteration in which the human is allowed to disturb the trajectory of the robot. The human decides when to stop disturbing the trajectory of the robot, in which case τ_HR = τ_R and τ_D^new = τ_D^old, which means that the desired trajectory does not change anymore.
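To make the refinement step concrete, the following minimal Python/NumPy sketch implements one pass through the loop of Fig. 2 under our own simplifying assumptions: trajectories are stored as T×D arrays of joint positions, and track_with_feedback and replay_with_human are hypothetical placeholders for the robot-specific tracking and open-loop replay routines. It illustrates update (1); it is not the authors' implementation.

```python
import numpy as np

def refine_desired_trajectory(tau_D, track_with_feedback, replay_with_human, alpha=1.0):
    """One iteration of the refinement loop of Fig. 2 (illustrative sketch).

    tau_D               : (T, D) array, current desired trajectory (joint positions over time).
    track_with_feedback : callable tau_D -> (tau_R, u_R); executes feedback tracking and
                          records the executed trajectory and the control sequence.
    replay_with_human   : callable u_R -> tau_HR; replays u_R in open loop while the human
                          may disturb the robot, and records the resulting trajectory.
    alpha               : step size in [0, 1]; alpha = 1 was used in the experiments above.
    """
    tau_R, u_R = track_with_feedback(tau_D)       # feedback tracking of tau_D
    tau_HR = replay_with_human(u_R)               # open-loop replay with human disturbance
    tau_D_new = tau_D + alpha * (tau_HR - tau_R)  # update (1)
    return tau_D_new
```

As noted above, the loop terminates naturally: once the human stops intervening, τ_HR equals τ_R and the update leaves τ_D unchanged.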

B. Learning Context-Dependent Motor Skills from Incremental User Feedback

This section presents our method for incremental imitation learning of context-dependent motor skills. This method is based on the procedure explained in the previous section and uses Probabilistic Movement Primitives (ProMPs) [10].

1) Probabilistic Movement Primitives: A ProMP is a probability distribution over trajectories. This probability distribution can be built from multiple demonstrated trajectories. In the ProMP framework, each demonstrated trajectory, which has a duration of T time steps, is approximated by a weighted sum of N normalized Gaussian basis functions. This approximation can be represented by the equation

τ = Ψ w + ε,   (2)

where ε is zero-mean i.i.d. Gaussian noise, i.e. ε ~ N(0, σ² I_{T×T}), and

Ψ = [ ψ_1(1)  ψ_2(1)  ···  ψ_N(1)
      ψ_1(2)  ψ_2(2)  ···  ψ_N(2)
        ⋮        ⋮              ⋮
      ψ_1(T)  ψ_2(T)  ···  ψ_N(T) ].   (3)

The terms ψ_n(t) correspond to basis functions indexed by n and evaluated at time t. The centers of those basis functions are positioned at regular intervals along the time axis. The vector of weights w = [w_1, w_2, ..., w_N]^T, containing the weight w_n for each basis function ψ_n, can be computed through linear least squares, according to

w = (Ψ^T Ψ)^{-1} Ψ^T τ.   (4)

Once the weight vector w for each demonstrated trajectory τ has been computed, a probability distribution p(w) over weight vectors can be defined. In this work, p(w) is assumed to be a Gaussian with mean µ_w and covariance Σ_w, i.e.

p(w) = N(µ_w, Σ_w).   (5)

With this assumption, it is possible to compute in closed form the probability distribution p(τ) over trajectories by integrating out the weight vectors w, as in the equation

p(τ) = ∫ p(τ | w) p(w) dw,   (6)

which results in

p(τ) = N(µ_τ, Σ_τ),   (7)

where

µ_τ = Ψ µ_w,   (8)
Σ_τ = σ² I_{T×T} + Ψ Σ_w Ψ^T.   (9)

2) Modeling Context-Dependent Motor Skills: In this work, ProMPs are used to infer the most probable trajectory for a given context. In order to achieve this, a normal joint probability distribution over weight vectors w and the corresponding contexts, here represented by the vectors c, is created,

p(w, c) = N(µ_joint, Σ_joint),   (10)

where

µ_joint = [ µ_w ]        Σ_joint = [ Σ_ww  Σ_wc ]
          [ µ_c ],                  [ Σ_cw  Σ_cc ].

Given a specific context c, it is then possible to compute the conditional probability distribution

p(w | c) = N(µ_{w|c}, Σ_{w|c}),   (11)

where

µ_{w|c} = µ_w + Σ_wc Σ_cc^{-1} (c − µ_c),   (12)
Σ_{w|c} = Σ_ww − Σ_wc Σ_cc^{-1} Σ_cw.   (13)

Next, the conditional probability distribution p(τ | c) can be computed by solving the equation

p(τ | c) = ∫ p(τ | w) p(w | c) dw,   (14)

which results in

p(τ | c) = N(µ_{τ|c}, Σ_{τ|c}),   (15)

with

µ_{τ|c} = Ψ µ_{w|c},   (16)
Σ_{τ|c} = σ² I_{T×T} + Ψ Σ_{w|c} Ψ^T.   (17)
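The ProMP machinery above reduces to a few lines of linear algebra. The sketch below is a minimal Python/NumPy illustration under our own assumptions (normalized Gaussian bases with a hand-picked width, plain least squares equivalent to (4), and standard Gaussian conditioning as in (12), (13) and (16)); the function and variable names are ours, not those of a ProMP library.

```python
import numpy as np

def basis_matrix(T, N, width=0.02):
    """T x N matrix of normalized Gaussian basis functions with centers placed at
    regular intervals along the (normalized) time axis, as in Eq. (3)."""
    t = np.linspace(0, 1, T)[:, None]
    centers = np.linspace(0, 1, N)[None, :]
    phi = np.exp(-0.5 * (t - centers) ** 2 / width)
    return phi / phi.sum(axis=1, keepdims=True)

def fit_weights(Psi, tau):
    """Least-squares weight vector of one demonstrated trajectory (solution of Eq. (4))."""
    return np.linalg.lstsq(Psi, tau, rcond=None)[0]

def condition_on_context(mu_joint, Sigma_joint, c, n_w):
    """Condition the joint Gaussian over [w; c] on a given context c, Eqs. (12)-(13)."""
    mu_w, mu_c = mu_joint[:n_w], mu_joint[n_w:]
    S_ww = Sigma_joint[:n_w, :n_w]
    S_wc = Sigma_joint[:n_w, n_w:]
    S_cc = Sigma_joint[n_w:, n_w:]
    K = S_wc @ np.linalg.inv(S_cc)        # gain mapping the context deviation to the weights
    mu_w_c = mu_w + K @ (c - mu_c)        # Eq. (12)
    Sigma_w_c = S_ww - K @ S_wc.T         # Eq. (13)
    return mu_w_c, Sigma_w_c

# The mean trajectory for a new context then follows from Eq. (16): tau_mean = Psi @ mu_w_c.
```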
3) Online Learning with Human Feedback: This section explains the proposed algorithm, which allows robots to learn context-dependent motor skills in an online fashion from human demonstrations and refinements. This algorithm uses the refinement procedure illustrated as a workflow in Fig. 2 in conjunction with the probabilistic modeling of context-dependent motor skills previously explained in Section III-B.2.

The robot starts with an initial joint probability distribution p(w, c) over weights w and contexts c. Given this prior and a certain context c, the robot computes p(τ | c) = N(µ_{τ|c}, Σ_{τ|c}), as in Section III-B.2. The robot's desired trajectory τ_D is set equal to the mean µ_{τ|c}. The algorithm iterates over the refinement loop depicted in Fig. 2 as many times as the human wants. At the end of this loop, the weights w of the new desired trajectory τ_D are computed using (4). This new weight vector w_M and the given context vector c_M are concatenated to form the vector x_M = [w_M^T, c_M^T]^T, where M is the number of situations experienced so far. The joint distribution p(w, c) = N(µ_joint, Σ_joint) is then updated according to Welford's method for computing mean and covariance online [16]. According to this method, the mean µ_joint is updated with

µ_joint^new = µ_joint^old + (x_M − µ_joint^old) / M   (18)

and the covariance matrix Σ_joint with

S^new(i, j) = S^old(i, j) + ((M − 1) / M) (x_M(i) − µ_joint^old(i)) (x_M(j) − µ_joint^old(j)),   (19)

Σ_joint^new(i, j) = S^new(i, j) / (M − 1),   (20)

where S = (M − 1) Σ_joint is an auxiliary matrix.
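A compact way to maintain µ_joint and Σ_joint is to keep the auxiliary matrix S and apply Eqs. (18)-(20) whenever a new pair (w_M, c_M) arrives. The following Python/NumPy sketch follows Welford's method [16] as used here; the class interface and the initialization from a few demonstrations are our own illustration, not the authors' code.

```python
import numpy as np

class OnlineJointGaussian:
    """Online mean and covariance of the concatenated vector x = [w; c], Eqs. (18)-(20)."""

    def __init__(self, mu_init, Sigma_init, M_init):
        # Initialized from a few demonstrations or predefined values (Algorithm 1, line 1).
        self.mu = np.array(mu_init, dtype=float)                  # current mean mu_joint
        self.S = (M_init - 1) * np.array(Sigma_init, dtype=float) # auxiliary S = (M - 1) * Sigma
        self.M = M_init                                           # situations experienced so far

    def update(self, w_M, c_M):
        x = np.concatenate([w_M, c_M])                 # x_M = [w_M^T, c_M^T]^T
        self.M += 1
        delta = x - self.mu                            # x_M - mu_joint_old
        self.mu += delta / self.M                      # Eq. (18)
        self.S += ((self.M - 1) / self.M) * np.outer(delta, delta)  # Eq. (19)

    @property
    def Sigma(self):
        return self.S / (self.M - 1)                   # Eq. (20)
```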

Afterwards, the whole procedure is repeated for a new context. Algorithm 1 summarizes the proposed method.

Algorithm 1: Incremental Imitation Learning of Context-Dependent Motor Skills
1: Initialize µ_joint and Σ_joint with a few demonstrations or with predefined values
2: for each new context c
3:   compute µ_{w|c} and Σ_{w|c} (Eqs. 12 and 13)
4:   compute µ_{τ|c} and Σ_{τ|c} (Eqs. 16 and 17)
5:   τ_D = µ_{τ|c}
6:   refinement loop (until the human decides to stop)
7:     robot tracks τ_D with the feedback controller (τ_R and u_R are recorded)
8:     robot executes u_R in feedforward and the human is allowed to disturb the trajectory (τ_HR is recorded)
9:     τ_D^new = τ_D^old + α (τ_HR − τ_R)
10:  end
11:  w_M = (Ψ^T Ψ)^{-1} Ψ^T τ_D
12:  update µ_joint and Σ_joint (Eqs. 18, 19 and 20)
13: end

IV. EXPERIMENTS

This section presents two experimental evaluations of the proposed algorithm. First, a simple 2D problem is presented. In this problem, the human can incrementally teach trajectories to the learning system, which can infer what to do in the face of new contexts. Afterwards, experiments are described that show the applicability of our method for teaching motor skills to a real robot.

In all the experiments described in this paper, the number N of Gaussian basis functions, introduced in Section III-B.1, was 20. This value was determined empirically by increasing the number of basis functions until the trajectories could be approximated to a desired level of detail.

A. 2D Problem

We designed a simple 2D problem in order to facilitate the understanding of the algorithm and the visualization of results. In this problem, a particle moves with constant velocity in the x-direction. Initially, its velocity in the y-direction is zero. The user can introduce an acceleration in the y-direction by pressing the up or down keys on the keyboard. The objective is to teach the system a trajectory that passes through two via points. The y-coordinates of the via points constitute in this case the context, which is represented by the vector [y_1, y_2]^T, where y_1 is the y-coordinate of the via point located at x = 2 and y_2 is the y-coordinate of the via point located at x = 3.6. After being trained for a number of different configurations of the via points, the system should be able to infer the right trajectory to pass through the via points both in the configurations that it has already experienced as well as in a previously unseen configuration. The human can iteratively apply disturbances to the trajectory of the particle. This way, the human can correct the trajectory initially inferred by the system for a given context. See Fig. 3 for an illustration of this 2D problem. In this problem, u_R is a time-indexed sequence of forces in the y-direction, while the trajectories τ_R, τ_HR and τ_D are time-indexed sequences of (x, y) positions.

Fig. 4 shows the prior and the posterior probability distributions over trajectories after the human had trained the system for two contexts. In this simple problem, after experiencing only two examples, the system could already infer trajectories that passed close to the via points. The progress in the generalization ability of the system is depicted in Figs. 5 and 6. Fig. 5 shows the prior and the posterior over trajectories without any training, after the human had trained the system for one context, and so on until the system had previously observed eight different contexts.
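For intuition, the toy dynamics and the via-point error used in this evaluation can be written down in a few lines. The Python/NumPy sketch below is our own illustration: the particle advances with a constant x-velocity while a sequence of y-accelerations (playing the role of u_R here) is integrated, and the resulting trajectory is scored against the two via points; the numerical values for velocity and time step are hypothetical, not taken from the experiments.

```python
import numpy as np

def simulate_particle(u_y, dt=0.01, vx=1.0):
    """Integrate the 2D toy dynamics: constant velocity in x, commanded acceleration in y."""
    T = len(u_y)
    x = vx * dt * np.arange(T)            # constant-velocity motion in x
    vy = np.cumsum(u_y) * dt              # integrate acceleration to y-velocity
    y = np.cumsum(vy) * dt                # integrate velocity to y-position
    return np.stack([x, y], axis=1)       # (T, 2) sequence of (x, y) positions

def via_point_error(traj, y1, y2):
    """Half the sum of squared distances to the via points at x = 2 and x = 3.6,
    i.e. the MSE measure defined in Eq. (21) below."""
    y_at = lambda xq: np.interp(xq, traj[:, 0], traj[:, 1])
    return 0.5 * ((y_at(2.0) - y1) ** 2 + (y_at(3.6) - y2) ** 2)
```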
Fig. 6 shows the mean squared error

MSE = (1/2) [ (y(2) − y_1)² + (y(3.6) − y_2)² ]   (21)

of the trajectories executed by the particle when observing each context for the first time. Here, y(2) and y(3.6) are the y-coordinates of the particle at x = 2 and x = 3.6, respectively. The MSE for the first context is 100, because the trajectory is 10 units apart from each via point. After the human has taught the trajectory for two different contexts through the refinement loop, the MSE of the trajectories executed by the particle for previously unseen contexts already drops to almost zero.

B. Real Robot Experiments

We tested our algorithm in experiments with the BioRob [2], a 4-DoF elastically actuated robot arm. It is a biologically inspired robot whose cables and springs roughly mimic the functionality of antagonistic pairs of muscles. The BioRob is light and compliant, but nevertheless challenging to model and control.¹ In these experiments, the objective of the human was to teach the robot how to reach a ball positioned on the steps of a ladder, as shown in Fig. 1. We defined six different positions for the ball (see Fig. 7). We chose to represent each position by two values, expressing the two different heights with respect to the ground and the three different left-to-right positions. This choice was motivated by the fact that the relations between the values assumed by the context variables are important, not their absolute values.

¹ The robot is controlled by a proportional-derivative (PD) controller with feedforward terms to compensate for gravity and friction/stiction. In these experiments, only three DoFs are actually being used.

Fig. 3: 2D problem (left: without human intervention; right: with human intervention). Left plot: the red circles represent two via points through which the trajectory should pass; the black line represents the trajectory currently being tracked by the particle; the blue line represents the trajectory executed by the particle up to the current instant. Right plot: the red circles represent the same two desired via points; the red line represents the trajectory that the particle would execute without any human interference; the blue curve represents the trajectory executed by the particle with human interference.

In this setup, there was no camera detecting the ball. The values representing the different contexts were manually provided to the robot. In our experiments with the BioRob, u_R is a time-indexed sequence of torques, while the trajectories τ_R, τ_HR and τ_D are time-indexed sequences of joint angles. By working with trajectories in joint space, we avoid any possible problems related to kinematic redundancy that would need to be addressed if our trajectories were sequences of Cartesian end-effector positions.

In order to initialize the prior probability distribution p(w, c) over weight vectors and contexts, the refinement loop depicted in Fig. 2 was executed for four different positions of the ball: top left, represented by the vector [1, 1]^T; top right, represented by [1, 3]^T; down left, represented by [2, 1]^T; and down right, represented by [2, 3]^T. After building the prior, the for loop described in Algorithm 1 was executed. The robot computed the most probable trajectory τ for the context top middle, represented by [1, 2]^T. The human improved the trajectory of the robot through the refinement loop until the robot could successfully reach the ball at the top middle position. After the human decided to quit the refinement loop, the joint probability distribution p(w, c) over weights and contexts was updated according to (18), (19) and (20). Then the robot used its updated prior to compute a trajectory for the context down middle, represented by [2, 2]^T. Having built the prior based on trajectories that reach the ball at each of the four corner positions, the robot was able to compute trajectories to reach or pass close to the ball at the two previously untrained middle positions. Using our refinement loop, the human could correct the trajectories inferred by the robot. In a second pass of the algorithm over all the six positions, the robot was able to reach the ball every time without needing any further refinement from the human. Please refer to our accompanying video for a recording of the full experiment.

Fig. 4: Prior and posterior over trajectories for the 2D problem after the human had trained the system for two contexts. Left plot: the light gray curves represent the trajectories executed by the particle after refinement provided by the human for two different configurations of the via points, represented by the light gray circles; the blue curve represents the prior mean and the blue shade represents two standard deviations of the prior.
Right plot: the configuration of the via points represented by the red circles is different from the configurations that have been observed so far by the system, which are represented by the light gray circles; the black curve represents the prior mean and the gray shade represents two standard deviations of the prior (the prior in this plot is the same as in the left plot); the blue curve represents the posterior mean and the blue shade represents two standard deviations of the posterior.

Fig. 5: Prior and posterior (with mean and two standard deviations) over trajectories given contexts, without any training (Prior 1 and Posterior 1), after training for one context (Prior 2 and Posterior 2), and so on until the system had previously observed eight different contexts (Prior 9 and Posterior 9). Only the first two contexts needed refinement from the human. After that, the system was already able to infer trajectories intercepting the via points.

Fig. 8: Three iterations of the refinement loop. Upper row: trajectories in joint space of the first DoF of the BioRob (desired and executed joint angle over time). Lower row: the corresponding end-effector trajectories in Cartesian space, with the ball position marked.

Fig. 6: Mean squared error (MSE) of the trajectory executed by the particle when the system observes each context for the first time, plotted against the number of observed contexts.

Fig. 7: Ball positions in an experiment involving the real robot. The positions represented by the vectors written in blue ([1, 1]^T, [1, 3]^T, [2, 1]^T, [2, 3]^T) are the contexts that were used to initialize the prior probability distribution p(w, c) over weight vectors and contexts. Having built this prior, the robot infers trajectories to reach the ball at the positions represented by the vectors written in red ([1, 2]^T, [2, 2]^T).

Fig. 8 shows three iterations of the refinement loop to teach the robot how to reach the ball at the top left position when the robot had no previous experience. The figure shows the trajectories in joint space of the first DoF of the BioRob and the corresponding end-effector trajectories in Cartesian space. In the first iteration, the robot arm simply hung down. Then, the human demonstrated through kinesthetic teaching how to reach the ball. In the second iteration, the robot tried to track the new desired trajectory. Since the controller of the robot is not able to track a desired trajectory accurately, the robot only passed close to the ball but still did not reach it (see the middle plot in the lower row).

The human refined this trajectory further by interfering in the movement of the robot. In the third iteration, the robot was able to reach the ball on its own and the human did not give any further feedback. The small differences between the red curve (trajectory generated by feedback tracking) and the blue curve (trajectory generated by reproducing the sequence of torques generated by the feedback tracking controller) are, in the third iteration, due to small errors in the repeatability of the robot.

We also performed experiments in which the human taught the robot how to putt in a minigolf-like task. In this case, there was only one context. In the phases in which the human was allowed to disturb the trajectory, the human kept his hands close to the robot and applied forces to its end effector when he judged it necessary. After a few iterations, the desired trajectory τ_D tracked by the feedback controller was such that it led to the robot being able to sink the ball. This experiment was recorded and is also included in the accompanying video.

V. CONCLUSION AND FUTURE WORK

This paper presented an algorithm that allows humans to incrementally teach robots context-dependent motor skills. This algorithm is particularly relevant when the robot cannot track desired trajectories accurately or when trajectories initially computed by the robot given a new context do not solve the task at hand. In those cases, our algorithm offers the human an intuitive way of refining the trajectories of the robot. Moreover, the refined trajectories and the new contexts are used to update the probability distribution used by the robot to compute trajectories given contexts. A 2D problem and experiments involving a 4-DoF elastically actuated robot arm demonstrate the effectiveness of the proposed algorithm.

As it is, the presented method for the human to teach the robot is not suitable when the robot moves very fast, is too heavy or manipulates a dangerous object. For those cases, an alternative way for the human to introduce changes in the trajectory of the robot would be necessary. Instead of physically interacting with the robot while it moves, the human could for instance use a graphical interface to change the trajectory or use teleoperation.

In this work, we modeled the joint probability distribution p(w, c) over weights and context variables as a single Gaussian. This model entails that w and c are linearly correlated, which is a reasonable assumption for the simple tasks we have dealt with so far. On the other hand, in a task such as playing table tennis, in which the robot would have to execute forehand and backhand strokes, this assumption would probably not hold. In order to deal with those cases, an alternative would be to model p(w, c) as a Gaussian Mixture Model.

The contexts addressed so far in our work have been only in the form of via points. Other possible contexts could be the weight or the size of objects manipulated by the robot, the goal position of an object thrown by the robot, positions and orientations of objects in the workspace, etc. For the simple contexts evaluated so far, the human could teach the robot in a few iterations how to solve the task at hand, and the generalization capabilities of the algorithm also helped to reduce the amount of human intervention needed. Further evaluations shall be made in order to determine whether the human can successfully teach skills with other types of context as well.
VI. ACKNOWLEDGMENTS

The research leading to these results has received funding from the project BIMROB of the Forum für interdisziplinäre Forschung (FiF) of the TU Darmstadt and from the European Community's Seventh Framework Programme (FP7-ICT-2013-10) under the grant agreement for the project 3rdHand.

REFERENCES

[1] S. Schaal, "Is imitation learning the route to humanoid robots?" Trends in Cognitive Sciences, vol. 3, no. 6, pp. 233–242, 1999.
[2] T. Lens and O. von Stryk, "Design and dynamics model of a lightweight series elastic tendon-driven robot arm," in Proceedings of the International Conference on Robotics and Automation (ICRA). IEEE, 2013.
[3] D. Kulić, C. Ott, D. Lee, J. Ishikawa, and Y. Nakamura, "Incremental learning of full body motion primitives and their sequencing through human motion observation," International Journal of Robotics Research, vol. 31, no. 3, 2012.
[4] S. Calinon and A. Billard, "Incremental learning of gestures by imitation in a humanoid robot," in Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction. ACM, 2007.
[5] D. Lee and C. Ott, "Incremental kinesthetic teaching of motion primitives using the motion refinement tube," Autonomous Robots, vol. 31, no. 2-3, 2011.
[6] A. Jain, B. Wojcik, T. Joachims, and A. Saxena, "Learning trajectory preferences for manipulators via iterative improvement," in Advances in Neural Information Processing Systems (NIPS), 2013.
[7] B. Akgun, M. Cakmak, J. W. Yoo, and A. L. Thomaz, "Trajectories and keyframes for kinesthetic teaching: A human-robot interaction perspective," in Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction. ACM, 2012.
[8] B. D. Argall, E. L. Sauser, and A. G. Billard, "Tactile guidance for policy refinement and reuse," in Proceedings of the International Conference on Development and Learning (ICDL). IEEE, 2010.
[9] E. L. Sauser, B. D. Argall, G. Metta, and A. G. Billard, "Iterative learning of grasp adaptation through human corrections," Robotics and Autonomous Systems, vol. 60, no. 1, 2012.
[10] A. Paraschos, C. Daniel, J. Peters, and G. Neumann, "Probabilistic movement primitives," in Advances in Neural Information Processing Systems (NIPS), 2013.
[11] S. Calinon, T. Alizadeh, and D. G. Caldwell, "On improving the extrapolation capability of task-parameterized movement models," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2013.
[12] S. Calinon, "A tutorial on task-parameterized movement learning and retrieval," Intelligent Service Robotics, vol. 9, no. 1, pp. 1–29, 2016.
[13] J. Kober, A. Wilhelm, E. Oztop, and J. Peters, "Reinforcement learning to adjust parametrized motor primitives to new situations," Autonomous Robots, vol. 33, no. 4, 2012.
[14] S. Arimoto, S. Kawamura, and F. Miyazaki, "Bettering operation of robots by learning," Journal of Robotic Systems, vol. 1, no. 2, 1984.
[15] D. Bristow, M. Tharayil, and A. Alleyne, "A survey of iterative learning control," IEEE Control Systems Magazine, vol. 26, no. 3, 2006.
[16] B. Welford, "Note on a method for calculating corrected sums of squares and products," Technometrics, vol. 4, no. 3, pp. 419–420, 1962.


Lecture VI: Constraints and Controllers. Parts Based on Erin Catto s Box2D Tutorial Lecture VI: Constraints and Controllers Parts Based on Erin Catto s Box2D Tutorial Motion Constraints In practice, no rigid body is free to move around on its own. Movement is constrained: wheels on a

More information

2.2 Absolute Value Functions

2.2 Absolute Value Functions . Absolute Value Functions 7. Absolute Value Functions There are a few was to describe what is meant b the absolute value of a real number. You ma have been taught that is the distance from the real number

More information

Suguden functions that enable a user. These original functions were first installed in the 2016 summer models of. NTT DOCOMO smartphones.

Suguden functions that enable a user. These original functions were first installed in the 2016 summer models of. NTT DOCOMO smartphones. Tapless Phone Operations b Natural Actions! Suguden Functions Suguden Sensor Motion NTT DOCOMO has developed Suguden functions that enable a user to operate a phone using onl natural actions without having

More information

Jacobian: Velocities and Static Forces 1/4

Jacobian: Velocities and Static Forces 1/4 Jacobian: Velocities and Static Forces /4 Advanced Robotic - MAE 6D - Department of Mechanical & Aerospace Engineering - UCLA Kinematics Relations - Joint & Cartesian Spaces A robot is often used to manipulate

More information

Tubes are Fun. By: Douglas A. Ruby Date: 6/9/2003 Class: Geometry or Trigonometry Grades: 9-12 INSTRUCTIONAL OBJECTIVES:

Tubes are Fun. By: Douglas A. Ruby Date: 6/9/2003 Class: Geometry or Trigonometry Grades: 9-12 INSTRUCTIONAL OBJECTIVES: Tubes are Fun B: Douglas A. Rub Date: 6/9/2003 Class: Geometr or Trigonometr Grades: 9-2 INSTRUCTIONAL OBJECTIVES: Using a view tube students will conduct an eperiment involving variation of the viewing

More information

13. Learning Ballistic Movementsof a Robot Arm 212

13. Learning Ballistic Movementsof a Robot Arm 212 13. Learning Ballistic Movementsof a Robot Arm 212 13. LEARNING BALLISTIC MOVEMENTS OF A ROBOT ARM 13.1 Problem and Model Approach After a sufficiently long training phase, the network described in the

More information

Laurie s Notes. Overview of Section 6.3

Laurie s Notes. Overview of Section 6.3 Overview of Section.3 Introduction In this lesson, eponential equations are defined. Students distinguish between linear and eponential equations, helping to focus on the definition of each. A linear function

More information

Tracking of Dynamic Objects Based on Optical Flow

Tracking of Dynamic Objects Based on Optical Flow Tracking of Dnamic Objects Based on Optical Flow Torsten Radtke, Volker Zerbe Facult of Informatics and Automation Ilmenau Technical Universit P.O.Bo 10 05 65, 98684 Ilmenau German Abstract In this paper

More information

A New Algorithm for Measuring and Optimizing the Manipulability Index

A New Algorithm for Measuring and Optimizing the Manipulability Index DOI 10.1007/s10846-009-9388-9 A New Algorithm for Measuring and Optimizing the Manipulability Index Ayssam Yehia Elkady Mohammed Mohammed Tarek Sobh Received: 16 September 2009 / Accepted: 27 October 2009

More information

Using Artificial Neural Networks for Prediction Of Dynamic Human Motion

Using Artificial Neural Networks for Prediction Of Dynamic Human Motion ABSTRACT Using Artificial Neural Networks for Prediction Of Dynamic Human Motion Researchers in robotics and other human-related fields have been studying human motion behaviors to understand and mimic

More information

Inverse Kinematics Learning by Modular Architecture Neural Networks with Performance Prediction Networks

Inverse Kinematics Learning by Modular Architecture Neural Networks with Performance Prediction Networks Proceedings of the IEEE International Conference on Robotics & Automation Seoul, Korea Ma -6, Inverse Kinematics Learning b Modular Architecture Neural works with Prediction works Eimei OYAMA and Nak Young

More information

Investigation Free Fall

Investigation Free Fall Investigation Free Fall Name Period Date You will need: a motion sensor, a small pillow or other soft object What function models the height of an object falling due to the force of gravit? Use a motion

More information

LINEAR PROGRAMMING. Straight line graphs LESSON

LINEAR PROGRAMMING. Straight line graphs LESSON LINEAR PROGRAMMING Traditionall we appl our knowledge of Linear Programming to help us solve real world problems (which is referred to as modelling). Linear Programming is often linked to the field of

More information

5.2 Graphing Polynomial Functions

5.2 Graphing Polynomial Functions Locker LESSON 5. Graphing Polnomial Functions Common Core Math Standards The student is epected to: F.IF.7c Graph polnomial functions, identifing zeros when suitable factorizations are available, and showing

More information

Reinforcement Learning to Adjust Robot Movements to New Situations

Reinforcement Learning to Adjust Robot Movements to New Situations Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence Reinforcement Learning to Adjust Robot Movements to New Situations Jens Kober MPI Tübingen, Germany jens.kober@tuebingen.mpg.de

More information

Stance Selection for Humanoid Grasping Tasks by Inverse Reachability Maps

Stance Selection for Humanoid Grasping Tasks by Inverse Reachability Maps Stance Selection for Humanoid Grasping Tasks b Inverse Reachabilit Maps Feli Burget Maren Bennewit Abstract In grasping tasks carried out with humanoids, knowledge about the robot s reachable workspace

More information

Realtime Avoidance of Fast Moving Objects: A Dynamical System-based Approach

Realtime Avoidance of Fast Moving Objects: A Dynamical System-based Approach Realtime Avoidance of Fast Moving Objects: A Dynamical System-based Approach S. Mohammad Khansari-Zadeh and Aude Billard Ecole Polytechnique Federale de Lausanne, LASA Laboratory {mohammad.khansari,aude.billard}@epfl.ch

More information

Using a Table of Values to Sketch the Graph of a Polynomial Function

Using a Table of Values to Sketch the Graph of a Polynomial Function A point where the graph changes from decreasing to increasing is called a local minimum point. The -value of this point is less than those of neighbouring points. An inspection of the graphs of polnomial

More information

Trajectory Clustering for Motion Prediction

Trajectory Clustering for Motion Prediction Trajector Clustering for Motion Prediction Cnthia Sung, Dan Feldman, Daniela Rus Abstract We investigate a data-driven approach to robotic path planning and analze its performance in the contet of interception

More information

Tutorial of Motion Estimation Based on Horn-Schunk Optical Flow Algorithm in MATLAB

Tutorial of Motion Estimation Based on Horn-Schunk Optical Flow Algorithm in MATLAB AU J.T. 1(1): 8-16 (Jul. 011) Tutorial of Motion Estimation Based on Horn-Schun Optical Flow Algorithm in MATLAB Darun Kesrarat 1 and Vorapoj Patanavijit 1 Department of Information Technolog, Facult of

More information

Edexcel Mechanics 2 Kinematics of a particle. Section 1: Projectiles

Edexcel Mechanics 2 Kinematics of a particle. Section 1: Projectiles Edecel Mechanics Kinematics of a particle Section 1: Projectiles Notes and Eamples These notes contain subsections on Investigating projectiles Modelling assumptions General strateg for projectile questions

More information

Motion Planning for a Mobile Manipulator with Imprecise Locomotion

Motion Planning for a Mobile Manipulator with Imprecise Locomotion Motion Planning for a Mobile Manipulator with Imprecise Locomotion Dong Hun Shin, Bradle S. Hamner, Sanjiv Singh, and Mung Hwangbo Dept. of Mechanical Eng. Universit of Seoul shin@uos.ac.kr The Robotics

More information

Robotics: Science and Systems 2010 Zaragoza, Spain June 27-30, 2010

Robotics: Science and Systems 2010 Zaragoza, Spain June 27-30, 2010 Robotics: Science and Sstems Zaragoza, Spain June 7-, Sensor Placement for Improved Robotic Navigation Michael P. Vitus Department of Aeronautics and Astronautics Stanford Universit Email: vitus@stanford.edu

More information

Data-Efficient Generalization of Robot Skills with Contextual Policy Search

Data-Efficient Generalization of Robot Skills with Contextual Policy Search Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence Data-Efficient Generalization of Robot Skills with Contextual Policy Search Andras Gabor Kupcsik Department of Electrical and

More information

Integrating ICT into mathematics at KS4&5

Integrating ICT into mathematics at KS4&5 Integrating ICT into mathematics at KS4&5 Tom Button tom.button@mei.org.uk www.mei.org.uk/ict/ This session will detail the was in which ICT can currentl be used in the teaching and learning of Mathematics

More information

Serial Manipulator Statics. Robotics. Serial Manipulator Statics. Vladimír Smutný

Serial Manipulator Statics. Robotics. Serial Manipulator Statics. Vladimír Smutný Serial Manipulator Statics Robotics Serial Manipulator Statics Vladimír Smutný Center for Machine Perception Czech Institute for Informatics, Robotics, and Cybernetics (CIIRC) Czech Technical University

More information

AUTONOMOUS PLANETARY ROVER CONTROL USING INVERSE SIMULATION

AUTONOMOUS PLANETARY ROVER CONTROL USING INVERSE SIMULATION AUTONOMOUS PLANETARY ROVER CONTROL USING INVERSE SIMULATION Kevin Worrall (1), Douglas Thomson (1), Euan McGookin (1), Thaleia Flessa (1) (1)University of Glasgow, Glasgow, G12 8QQ, UK, Email: kevin.worrall@glasgow.ac.uk

More information

LESSON 3.1 INTRODUCTION TO GRAPHING

LESSON 3.1 INTRODUCTION TO GRAPHING LESSON 3.1 INTRODUCTION TO GRAPHING LESSON 3.1 INTRODUCTION TO GRAPHING 137 OVERVIEW Here s what ou ll learn in this lesson: Plotting Points a. The -plane b. The -ais and -ais c. The origin d. Ordered

More information

M O T I O N A N D D R A W I N G

M O T I O N A N D D R A W I N G 2 M O T I O N A N D D R A W I N G Now that ou know our wa around the interface, ou re read to use more of Scratch s programming tools. In this chapter, ou ll do the following: Eplore Scratch s motion and

More information

Learning of a ball-in-a-cup playing robot

Learning of a ball-in-a-cup playing robot Learning of a ball-in-a-cup playing robot Bojan Nemec, Matej Zorko, Leon Žlajpah Robotics Laboratory, Jožef Stefan Institute Jamova 39, 1001 Ljubljana, Slovenia E-mail: bojannemec@ijssi Abstract In the

More information

Section 2.2: Absolute Value Functions, from College Algebra: Corrected Edition by Carl Stitz, Ph.D. and Jeff Zeager, Ph.D. is available under a

Section 2.2: Absolute Value Functions, from College Algebra: Corrected Edition by Carl Stitz, Ph.D. and Jeff Zeager, Ph.D. is available under a Section.: Absolute Value Functions, from College Algebra: Corrected Edition b Carl Stitz, Ph.D. and Jeff Zeager, Ph.D. is available under a Creative Commons Attribution-NonCommercial-ShareAlike.0 license.

More information

PROBLEM SOLVING WITH EXPONENTIAL FUNCTIONS

PROBLEM SOLVING WITH EXPONENTIAL FUNCTIONS Topic 21: Problem solving with eponential functions 323 PROBLEM SOLVING WITH EXPONENTIAL FUNCTIONS Lesson 21.1 Finding function rules from graphs 21.1 OPENER 1. Plot the points from the table onto the

More information

Automatic Control Industrial robotics

Automatic Control Industrial robotics Automatic Control Industrial robotics Prof. Luca Bascetta (luca.bascetta@polimi.it) Politecnico di Milano Dipartimento di Elettronica, Informazione e Bioingegneria Prof. Luca Bascetta Industrial robots

More information

1.1 Horizontal & Vertical Translations

1.1 Horizontal & Vertical Translations Unit II Transformations of Functions. Horizontal & Vertical Translations Goal: Demonstrate an understanding of the effects of horizontal and vertical translations on the graphs of functions and their related

More information

An Iterative Multiresolution Scheme for SFM

An Iterative Multiresolution Scheme for SFM An Iterative Multiresolution Scheme for SFM Carme Julià, Angel Sappa, Felipe Lumbreras, Joan Serrat, and Antonio Lópe Computer Vision Center and Computer Science Department, Universitat Autònoma de Barcelona,

More information

Learning and Generalization of Motor Skills by Learning from Demonstration

Learning and Generalization of Motor Skills by Learning from Demonstration Learning and Generalization of Motor Skills by Learning from Demonstration Peter Pastor, Heiko Hoffmann, Tamim Asfour, and Stefan Schaal Abstract We provide a general approach for learning robotic motor

More information

Assistance Adaptation during Human-Robot Haptic Joint Manipulation Task

Assistance Adaptation during Human-Robot Haptic Joint Manipulation Task Assistance Adaptation during Human-Robot Haptic Joint Manipulation Task Julie Dumora, Franck Geffard, Catherine idard, Nikos A. Aspragathos and Philippe Fraisse CEA, LIST, Interactive Robotics Laboratory,

More information

Learning Multiple Models of Non-Linear Dynamics for Control under Varying Contexts

Learning Multiple Models of Non-Linear Dynamics for Control under Varying Contexts Learning Multiple Models of Non-Linear Dynamics for Control under Varying Contexts Georgios Petkos, Marc Toussaint, and Sethu Vijayakumar Institute of Perception, Action and Behaviour, School of Informatics

More information