Learning Physical Models of Robots

Jochen Mück
Technische Universität Darmstadt
jochen.mueck@googlemail.com

Abstract

In robotics, good physical models are needed to provide appropriate motion control for different types of robots. Classical robotics falls short when models are hand-crafted, since unmodeled features as well as noise cause problems and inaccuracy. Based on the paper "Scalable Techniques from Nonparametric Statistics for Real Time Robot Learning" [4] by Stefan Schaal, Christopher G. Atkeson and Sethu Vijayakumar, this review paper discusses the model learning problem, describes algorithms for locally weighted learning and presents some real-world applications.

1 Introduction

Motion control is one of the most important basic tasks in robotics. The classical robotics approach, where a physical model of the robot is hand-crafted by an engineer before the robot is actually used, quickly reaches its limits because of the lack of accuracy in the model and noise caused by unmodeled features and changes in characteristics due to environmental influences. In classical robotics, simple control laws (e.g. PID controllers) are used to compensate for these errors. In contrast, the robot learning approach is to constantly learn and improve a physical model of the robot using data recorded from the robot's joints. The concept of self-improvement has the flexibility to handle noise and unmodeled features. Hence, learned models can help to generate better control laws, so motion-control tasks can be performed with higher accuracy. Challenges in model learning for robots are the high dimensionality of the problem (e.g. up to 90 dimensions for a humanoid robot) and the need to compute and improve the model in real time given a continuous stream of data. Furthermore, on-line robot learning on autonomous systems requires fast algorithms, since high-performance hardware usually is not available. There are many different learning methods to solve a learning problem.
But since large amounts of data have to be handled and inexpensive algorithms are preferred for on-line model learning, not all of them can be applied to model learning for robots. In this paper, locally weighted learning (LWL) methods are described, which are suitable for robot learning problems. Before the algorithms are presented in section 3, some foundations of model learning are discussed in the next section, which also gives a short overview of the statistics involved, different types of models and learning architectures. Finally, real-world model learning applications are described in section 4.
2 Foundations of Model Learning

This section gives an overview of model learning. First, the model learning problem and a solution using regression methods and gradient descent search are shown. Afterwards, different types of models are explained. Finally, learning architectures which make use of models are described.

2.1 The Model Learning Problem

In robot control, several different model learning problems are of interest:

Forward Kinematics:  x = f(q),  ẋ = J(q)q̇,  ẍ = J(q)q̈ + J̇(q)q̇  (1)
Inverse Kinematics:  q = f⁻¹(x)  (2)
Forward Dynamics:  q̈ = M⁻¹(q)(u − c(q, q̇) − g(q))  (3)
Inverse Dynamics:  u = M(q)q̈_d + c(q, q̇) + g(q)  (4)

All of these model learning problems except the inverse kinematics can be solved using regression methods. The forward dynamics equation (3) describes the joint accelerations q̈ given the applied torques u, using the inertia matrix M(q), the Coriolis and centripetal forces c(q, q̇) and the gravity term g(q). In contrast, the inverse dynamics equation (4) describes which torques have to be applied to the actuators to achieve the desired joint accelerations q̈_d. Hence, the task of learning a physical model of a robot is to determine the terms M(q), c(q, q̇) and g(q). More generally, the model can be written as:

y = f_θ(x) + ε  (5)

The function f_θ(x) can also be written as φ(x)ᵀθ with parameters θ and features φ; ε denotes Gaussian distributed noise. The goal of model learning is to find parameters θ such that a cost function J is minimal. J usually is defined as a least-squares cost function:

J = 1/2 Σ_{i=1}^{N} (y_i − f_θ(x_i))²  (6)

which can also be written in matrix form as:

J = 1/2 (Y − Φθ)ᵀ(Y − Φθ)  (7)

Finding the parameters θ that minimize the cost function J can be done with gradient descent search (see section 2.1.1). A closed-form solution can also be derived and written as:

θ = (ΦᵀΦ)⁻¹ΦᵀY  (8)

Equations (5) to (8) show what a model learning problem looks like and how it can be solved using basic linear regression methods.
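The closed-form solution in equation (8) can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper; the function name `fit_linear_model` and the small ridge term guarding against a singular ΦᵀΦ are assumptions made here.

```python
import numpy as np

def fit_linear_model(Phi, Y, reg=1e-8):
    """Closed-form least squares: theta = (Phi^T Phi)^{-1} Phi^T Y (equation 8).

    Phi: (N, d) feature matrix, Y: (N,) targets.
    `reg` is a small ridge term (an assumption made here, not in the paper)
    that keeps the normal equations solvable when Phi^T Phi is near-singular.
    """
    d = Phi.shape[1]
    # Solve the regularized normal equations instead of forming an explicit inverse.
    return np.linalg.solve(Phi.T @ Phi + reg * np.eye(d), Phi.T @ Y)
```

With features φ(x) = [x, 1], the sketch recovers the slope and offset of noisy linear data.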
2.1.1 Gradient Descent Search

Figure 1: Gradient Descent Search

Gradient descent search is an iterative method to find a local minimum of a multi-variable function F(x). The idea is, starting from a random point, to iteratively take small steps in the direction of the negative gradient of the function at the current point x_k. The step size can be adjusted with a parameter γ. The iteration step can be written as:

x_{k+1} = x_k − γ∇F(x_k)  (9)

Figure 1 shows how gradient descent search works on a quadratic function.
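The update rule in equation (9) translates directly into code. The following is a minimal sketch assuming the caller supplies the gradient function; the name `gradient_descent` and its default values are hypothetical choices made here, not part of the reviewed paper.

```python
import numpy as np

def gradient_descent(grad_F, x0, gamma=0.1, steps=100):
    """Iterate x_{k+1} = x_k - gamma * grad_F(x_k) (equation 9).

    grad_F: callable returning the gradient of F at x.
    gamma:  step-size parameter; steps: fixed iteration count
            (a real implementation would use a convergence test).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - gamma * grad_F(x)
    return x
```

For the quadratic F(x) = (x − 3)², with ∇F(x) = 2(x − 3), the iterates converge toward the minimum at x = 3.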
2.3 Learning Architectures

Depending on the type of model and the learning problem, different learning architectures can be defined. The learning architecture describes which quantities are observed to learn the model (see [1], section 2.2).

Using Direct Modeling, a model is learned by observing the system's inputs and outputs. The idea is to obtain the system's behavior, and therefore the model, by observing an input action and the resulting output state of the system. This learning architecture can basically be used to learn all types of models if the learning problem is well-defined.

An Indirect Modeling approach is feedback error learning, where a feedforward controller is learned from a feedback controller's error signals. If the learned feedforward model is perfect, the feedback controller will not influence the control system any more; otherwise the error signal is used to update the model. Thus this learning architecture always tries to minimize the error. Moreover, it can deal with ill-posed problems such as inverse kinematics learning. One drawback of this architecture is that it has to be applied on-line in order to get the actual feedback controller's error signals. Indirect Modeling can be used to learn inverse models and mixed models.

In the Distal Teacher Approach, a unique forward model acts as a teacher to obtain a learned inverse model's error and thereby helps to update the inverse model. The goal is to minimize this error. The idea is that the inverse model yields a correct solution for a desired trajectory when the error between the output of the forward model and the input of the inverse model is minimized. The Distal Teacher Approach can only be applied for learning mixed models.

3 Locally Weighted Learning

Locally Weighted Learning (LWL) is one approach to learn models from training data. As described before, the aim is to find the function f_θ(x) in the model equation (5). The idea of LWL methods is to approximate this non-linear function by means of piecewise linear models. The main challenge is to find the region where each local model is valid (its receptive field).
Here, for each linear model the following receptive field is used:

w_k = exp(−1/2 (x − c_k)ᵀ D_k (x − c_k))  (10)

The advantage of LWL methods is that no hand-crafted feature vector φ is needed. The remainder of this section describes four different LWL methods.

3.1 Locally Weighted Regression

Locally Weighted Regression (LWR) extends the standard regression method of equation (8) by a weight matrix which determines how much influence the data near the query point has on the local linear model. The learning system only needs to keep sufficient training data in memory, so the model can easily be updated by adding new training data. An algorithm for computing a prediction ŷ_q for a query point x_q is shown below (Algorithm 1). This algorithm seems quite complex at first glance, but since the weight matrix W ensures that data points far from the query point receive weights close to zero, the matrix multiplication and pseudo-inverse simplify considerably. Nevertheless, the complexity rises with the dimensionality of the system. The yet undefined variable in the algorithm is the parameter D, the distance matrix of the receptive field, which describes how big the region of validity around a query point is. D can be optimized using Leave-One-Out Cross Validation (Algorithm 2) once sufficient training data is recorded. To reduce the number of parameters, D is assumed to be a global diagonal matrix multiplied by a scaling factor h. This scaling factor is then the only parameter to be optimized. Leave-One-Out Cross Validation predicts a value for a query point which is left out of the training data and compares
Algorithm 1 Locally Weighted Regression
Given: x_q (query point), p training points {x_i, y_i}
Compute weight matrix W:
  w_i = exp(−1/2 (x_i − x_q)ᵀ D (x_i − x_q))
Build matrix X and vector y such that:
  X = (x̃_1, x̃_2, ..., x̃_p)ᵀ where x̃_i = [(x_i − x_q)ᵀ 1]ᵀ
  y = (y_1, y_2, ..., y_p)ᵀ
Compute locally linear model:
  β = (XᵀWX)⁻¹ XᵀWy
Compute prediction for x_q:
  ŷ_q = β_{n+1}

the prediction afterwards to the actual sample value, resulting in an error. This is repeated for all training points. The factor h is chosen to achieve a minimal error.

Algorithm 2 Leave-One-Out Cross Validation
Given: a set H of reasonable values h_r
for all h_r ∈ H do
  sse_r = 0
  for i = 1 : p do
    x_q = x_i
    Temporarily exclude {x_i, y_i} from the training data
    Compute the LWR prediction ŷ_q with the reduced data
    sse_r = sse_r + (y_i − ŷ_q)²
  end for
end for
Choose the optimal h_r such that sse_r is minimal

3.2 Locally Weighted Partial Least Squares

Since the complexity of the LWR algorithm (Algorithm 1) rises with the input dimensionality, LWR can be slow for higher dimensions. Furthermore, the matrix inversion step can become numerically unstable if there are redundant input dimensions. Locally Weighted Partial Least Squares (LWPLS) takes care of these problems. In this approach, Partial Least Squares (PLS) is used to reduce the complexity of the problem. PLS is based on a linear transformation from the high-dimensional input to a new variable space based on lower-dimensional orthogonal factors. This means that those orthogonal factors are independent linear combinations of the original input. These projections are used to calculate a prediction for a query point (see Algorithm 3). The only undefined parameter is the number of projections r. Since the squared error should be reduced by each new projection, adding new projections can be stopped as soon as the condition res_i²/res_{i−1}² < φ no longer holds, i.e. when the rate of error reduction is not high enough any more. In [4], φ = 0.5 was used for all learning tasks.
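Algorithm 1 can be sketched directly in Python. This is an illustrative implementation under assumptions made here: the name `lwr_predict` and a small ridge term stabilizing the matrix inversion are not from the paper.

```python
import numpy as np

def lwr_predict(x_q, X, y, D, reg=1e-8):
    """Locally Weighted Regression prediction for one query point (Algorithm 1).

    x_q: (n,) query point, X: (p, n) training inputs, y: (p,) targets,
    D: (n, n) distance metric of the receptive field.
    `reg` is an assumed ridge term guarding the inversion of X^T W X.
    """
    diff = X - x_q                                              # center data at the query point
    w = np.exp(-0.5 * np.einsum('ij,jk,ik->i', diff, D, diff))  # Gaussian weights w_i
    Xt = np.hstack([diff, np.ones((X.shape[0], 1))])            # augmented inputs [(x_i - x_q), 1]
    W = np.diag(w)
    A = Xt.T @ W @ Xt + reg * np.eye(Xt.shape[1])
    beta = np.linalg.solve(A, Xt.T @ W @ y)                     # local linear model
    return beta[-1]                                             # offset term = prediction at x_q
```

Because the data is centered at x_q, the last component of β is directly the prediction ŷ_q; on exactly linear data the sketch reproduces the underlying function.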
Algorithm 3 Locally Weighted Partial Least Squares
Given: x_q (query point), p training points {x_i, y_i}
Compute weight matrix W:
  w_i = exp(−1/2 (x_i − x_q)ᵀ D (x_i − x_q))
Build matrix X and vector y such that:
  x̄ = Σ_{i=1}^p w_i x_i / Σ_{i=1}^p w_i
  β_0 = Σ_{i=1}^p w_i y_i / Σ_{i=1}^p w_i
  X = (x̃_1, x̃_2, ..., x̃_p)ᵀ where x̃_i = x_i − x̄
  y = (ỹ_1, ỹ_2, ..., ỹ_p)ᵀ where ỹ_i = y_i − β_0
Compute locally linear model:
  Initialize: Z_0 = X, res_0 = y
  for i = 1 : r do
    u_i = Z_{i−1}ᵀ W res_{i−1}
    s_i = Z_{i−1} u_i
    β_i = (s_iᵀ W res_{i−1}) / (s_iᵀ W s_i)
    p_i = (s_iᵀ W Z_{i−1}) / (s_iᵀ W s_i)
    res_i = res_{i−1} − s_i β_i
    Z_i = Z_{i−1} − s_i p_i
  end for
Compute prediction for x_q:
  Initialize: z_0 = x_q − x̄, ŷ_q = β_0
  for i = 1 : r do
    s_i = z_{i−1}ᵀ u_i
    ŷ_q ← ŷ_q + s_i β_i
    z_i = z_{i−1} − s_i p_iᵀ
  end for

3.3 Receptive Field Weighted Regression

When training data is received constantly by the learning system, as in on-line learning scenarios, the data set becomes very large. In this case LWR and LWPLS fall short because of high computational cost. Instead of computing a local model whenever a prediction has to be made, Receptive Field Weighted Regression (RFWR) iteratively builds new local models when training data is added. The prediction for a query point can then be computed as the weighted average over the predictions of all K local models:

ŷ_q = Σ_{k=1}^K w_k ŷ_{q,k} / Σ_{k=1}^K w_k  (11)

Algorithm 4 shows how the local models are updated.

Algorithm 4 Receptive Field Weighted Regression
Given: a training point (x, y)
Update all K local models:
  w_k = exp(−1/2 (x − c_k)ᵀ D_k (x − c_k))
  β_k^{n+1} = β_k^n + w_k P_k^{n+1} x̃ e_{cv,k}
  where x̃ = [(x − c_k)ᵀ 1]ᵀ,
  P_k^{n+1} = (1/λ) (P_k^n − (P_k^n x̃ x̃ᵀ P_k^n) / (λ/w_k + x̃ᵀ P_k^n x̃))
  and e_{cv,k} = y − β_k^{n ᵀ} x̃
Compute prediction for x_q:
  ŷ_k = β_kᵀ x̃

Like in LWR and LWPLS, the only open parameter is the distance matrix D. Since RFWR uses several local models, a different distance matrix D_k can be used for each of these models. In [3] the following cost function
for a gradient descent update of D was used:

J = (1 / Σ_{i=1}^p w_i) Σ_{i=1}^p w_i (y_i − ŷ_i)² / (1 − w_i x̃_iᵀ P x̃_i)² + γ Σ_{i,j=1}^n D_ij²  (12)

Note that RFWR is in essence an incremental version of LWR; thus it also cannot deal with very high-dimensional problems.

3.4 Locally Weighted Projection Regression

For higher-dimensional problems, LWPLS was used to reduce the dimensionality and therefore the complexity of the problem. Using LWPLS on-line with huge data sets leads to the same problem as with LWR, which was solved there by incremental model updates leading to RFWR. The idea of Locally Weighted Projection Regression (LWPR) is to formulate an incremental version of LWPLS which can deal with both high dimensionality and huge data sets. Algorithm 5 shows how one local model is updated.

Algorithm 5 Locally Weighted Projection Regression
Given: a training point (x, y)
Update the means of inputs and outputs:
  x̄_0^{n+1} = (λ W^n x̄_0^n + w x) / W^{n+1}
  β_0^{n+1} = (λ W^n β_0^n + w y) / W^{n+1}
  where W^{n+1} = λ W^n + w
Update the local model:
  Initialize: z_0 = x − x̄_0^{n+1}, res_0 = y − β_0^{n+1}
  for i = 1 : r do
    u_i^{n+1} = λ u_i^n + w z_{i−1} res_{i−1}
    s_i = z_{i−1}ᵀ u_i^{n+1}
    SS_i^{n+1} = λ SS_i^n + w s_i²
    SR_i^{n+1} = λ SR_i^n + w s_i res_{i−1}
    SZ_i^{n+1} = λ SZ_i^n + w z_{i−1} s_i
    β_i^{n+1} = SR_i^{n+1} / SS_i^{n+1}
    p_i^{n+1} = SZ_i^{n+1} / SS_i^{n+1}
    z_i = z_{i−1} − s_i p_i^{n+1}
    res_i = res_{i−1} − s_i β_i^{n+1}
    SSE_i^{n+1} = λ SSE_i^n + w res_i²
  end for

Like in RFWR, the distance matrix D is updated using gradient descent for each local model (see [4]).
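Both RFWR and LWPR answer a query by blending their local models with the weighted average of equation (11). A minimal sketch of this combination step, with hypothetical names (`rfwr_predict` and the argument layout are choices made here, not from the paper):

```python
import numpy as np

def rfwr_predict(x_q, centers, metrics, betas):
    """Combine K local linear models into one prediction (equation 11).

    Each model k has a center c_k, a distance metric D_k and parameters
    beta_k for the augmented input [(x - c_k), 1].
    """
    num, den = 0.0, 0.0
    for c_k, D_k, beta_k in zip(centers, metrics, betas):
        diff = x_q - c_k
        w_k = np.exp(-0.5 * diff @ D_k @ diff)   # receptive field activation w_k
        y_k = beta_k @ np.append(diff, 1.0)      # local linear prediction yhat_{q,k}
        num += w_k * y_k
        den += w_k
    return num / den
```

For two 1-D models that both locally represent y = x, the blended prediction agrees with the line wherever their receptive fields overlap.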
4 Model Learning Applications

In this section, some very different model learning applications are shown. For every application, information about the task to be learned and the type of model learning method used, as well as the results, are presented. The first three example applications use different LWL methods, while the last example uses Gaussian process regression (GPR). This example intends to show that other model learning methods can be applied to specific problems, but it will not give a detailed look into GPR.

4.1 Learning Devil Sticking

Figure 2: Devil Sticking Task a) and Robot Layout b) (see [4])

Task Description: Juggling a center stick between two control sticks. The center stick is lifted alternately by the two control sticks to juggle it. The task is to learn a continuous left-right-left pattern. This task is modeled as a discrete function that maps impact states from one hand to the other. A state is described as a vector x = (p, θ, ẋ, ẏ, θ̇)ᵀ with the impact position, the angle, the velocities of the center of the center stick and its angular velocity. The task command u = (x_h, y_h, θ, v_x, v_y)ᵀ consists of a catch position, an angular trigger velocity and the two-dimensional throw direction.

Type of Model Learning: The robot learns a forward model, predicting the next state given the current state and action. This problem has a 10-dimensional input and a five-dimensional output and therefore is ideally suited for LWR methods. Furthermore, training data is only generated at 1-2 Hz, every time the center stick hits one of the control sticks.

Results: More than 1000 consecutive hits were counted as a successful trial, which was achieved in about 40-80 trials (see Figure 3). This is a remarkable result, since devil sticking is a difficult task even for humans, and an untrained human needs a lot of trials.

Figure 3: Devil Sticking Results (see [4])
4.2 Learning Pole Balancing

Figure 4: Pole Balancing a) and Results b) (see [4])

Task Description: In this application, balancing a pole upright on a robot's finger is the learning task (see Figure 4a). The robot arm has 7 degrees of freedom. Given the inverse dynamics model of the robot, the goal of the learning problem was to learn task-level commands, i.e. Cartesian accelerations of the robot's finger.

Type of Model Learning: On-line learning using RFWR was used with input data from a stereo camera system observing the pole. The input data is 12-dimensional: 3 positions of the lower pole end, 2 angular positions, the 5 corresponding velocities and the two horizontal accelerations of the robot's finger. The output that predicts the next state of the pole therefore is 10-dimensional. Thus RFWR was used to learn a forward model of the pole to predict its next state.

Results: While learning the model from scratch took the system about 10-20 trials before it could keep the pole upright for longer than a minute, observing a human teacher who demonstrated pole balancing for 30 seconds helped to extract the forward model, so that the system fulfilled the task in a single trial (see Figure 4b). This could also be shown using different poles and demonstrations from different people.
4.3 Inverse Dynamics Learning

Task Description: Computing an inverse dynamics model (equation 4) by hand from rigid-body dynamics does not lead to a satisfying solution, since many of the system's properties cannot be modeled accurately. Learning the inverse dynamics model of a 7-degree-of-freedom robot arm was accomplished in this example application (= 21 input dimensions). In a second example, the authors learned the inverse dynamics of a humanoid robot with 30 degrees of freedom (= 90 input dimensions).

Type of Model Learning: In both inverse dynamics learning problems LWPR was applied, which fits the need to handle high-dimensional input and to cope with big data sets.

Results: For the 7-degree-of-freedom robot arm, Figure 5 shows the results compared to a parametric inverse dynamics model. The normalized mean squared error (nMSE) converges to nMSE = 0.042 after about 500,000 training points. During learning, LWPR created about 300 local models.

Figure 5: Inverse dynamics learning results with the 7-DOF robot (see [4])

The results of the second example are shown in Figure 6. Here many more training points were needed, but LWPR outperformed the parametric model quickly. On this high-dimensional learning problem LWPR created more than 2000 local models.

Figure 6: Inverse dynamics learning results with the 30-DOF robot (see [4])
4.4 Bio-Inspired Motion Control Based on a Learned Inverse Dynamics Model

Task Description: The previous example shows that learning an inverse dynamics model leads to better results in comparison to manually computed models. This was shown on a rigid-body robot with stiff joints. The problem of accurately modeling all system properties gets even worse when elastic (bio-inspired) joints are used. In this example, a different model learning approach was used to learn an inverse dynamics model of the bio-inspired robot BioBiped1 (see Figure 7).

Type of Model Learning: The authors decided to use Gaussian process regression (GPR) to learn the inverse model off-line using recorded training data. As shown in [2], GPR is well suited for off-line model learning.

Figure 7: Bio-inspired robot BioBiped1 (see [5])

Results: Using the inverse dynamics model, a model-based feed-forward controller was implemented and compared to a standard PD controller. Furthermore, a combination of the feed-forward controller and the feedback controller was implemented. All three approaches were evaluated in an experiment where the robot stands on the ground with both feet and performs a periodic up-and-down swinging motion using both legs. Figure 8 shows the joint error of all three approaches.

Figure 8: Bio-Inspired Motion Control Results (see [5])
5 Conclusion

Model learning can help to find properties of a system, which in this case is a robot. In section 2.1, the model learning problem was discussed and a regression solution was shown by minimizing a cost function. Different types of models were categorized in more detail in section 2.2, together with the learning architectures of section 2.3 which can be utilized to learn a specific model. Based on the regression solution to the model learning problem, locally weighted learning (LWL) methods were discussed in section 3, beginning with a very straightforward approach (Algorithm 1), which was then modified in Algorithm 3 to reduce the dimensionality of the problem and in Algorithms 4 and 5 to deal with big data sets. The first three examples show that applying LWL methods to different learning scenarios is successful for learning forward models as well as inverse dynamics models. The last application example (see 4.4) shows that other learning methods (here GPR) can also be applied successfully to a learning problem. However, the challenge in model learning remains to deal with high dimensionality, complex algorithms and big data sets. On-line learning is preferred for self-improvement while performing a certain task, but it needs very efficient algorithms (e.g. GPR cannot be applied on-line).

References

[1] Nguyen-Tuong, D., and Peters, J. Model learning in robotics: a survey. Cognitive Processing, 12(4) (2011), 319-340.

[2] Nguyen-Tuong, D., Peters, J., Seeger, M., and Schölkopf, B. Learning inverse dynamics: a comparison. In Proceedings of the European Symposium on Artificial Neural Networks (ESANN) (2008).

[3] Schaal, S., and Atkeson, C. G. Constructive incremental learning from only local information. Neural Computation 10 (1997), 2047-2084.

[4] Schaal, S., Atkeson, C. G., and Vijayakumar, S. Scalable techniques from nonparametric statistics for real time robot learning. Applied Intelligence 17 (2002), 49-60. DOI 10.1023/A:1015727715131.

[5] Scholz, D., Kurowski, S., Radkhah, K., and von Stryk, O.
Bio-inspired motion control of the musculoskeletal BioBiped1 robot based on a learned inverse dynamics model. In Proc. 11th IEEE-RAS Intl. Conf. on Humanoid Robots (Bled, Slovenia, Oct. 26-28, 2011).