
Learning Physical Models of Robots

Jochen Mück
Technische Universität Darmstadt
jochen.mueck@googlemail.com

Abstract

In robotics, good physical models are needed to provide appropriate motion control for different types of robots. Classical robotics falls short when models are hand-crafted, and unmodeled features as well as noise cause problems and inaccuracy. Based on the paper "Scalable Techniques from Nonparametric Statistics for Real Time Robot Learning" [4] by Stefan Schaal, Christopher G. Atkeson and Sethu Vijayakumar, this review paper discusses the model learning problem, describes algorithms for locally weighted learning and presents some real-world applications.

1 Introduction

Motion control is one of the most important basic tasks in robotics. The classical robotics approach, where a physical model of the robot is hand-crafted by an engineer before the robot is actually used, quickly reaches its limits because of the lack of accuracy in the model and because of noise caused by unmodeled features and changes in characteristics due to environmental influence. In classical robotics, simple control laws (e.g. PID controllers) are used to compensate for these errors. In contrast, the robot learning approach is to constantly learn and improve a physical model of the robot using data recorded from the robot's joints. The concept of self-improvement has the flexibility to handle noise and unmodeled features. Hence, learned models can help generate better control laws, so that motion-control tasks can be performed with higher accuracy. Challenges in model learning for robots are the high dimensionality of the problem (e.g. up to 90 dimensions for a humanoid robot) and the need to compute and improve the model in real time given a continuous stream of data. Furthermore, using on-line robot learning on autonomous systems requires fast algorithms, since high-performance hardware is usually not available. There are many different learning methods to solve a learning problem.
But since large amounts of data have to be handled and inexpensive algorithms are preferred for on-line model learning, not all of them can be applied to model learning for robots. In this paper, locally weighted learning (LWL) methods are described, which are suitable for robot learning problems. Before the algorithms are presented in section 3, some foundations of model learning are discussed in the next section, which also gives a short overview of the underlying statistics, different types of models and learning architectures. Finally, real-world model learning applications are described in section 4.

2 Foundations of Model Learning

This section gives an overview of model learning. First, the model learning problem and a solution using regression methods and gradient descent search are shown. Afterwards, different types of models are explained. Finally, learning architectures which make use of models are described.

2.1 The Model Learning Problem

In robot control, several different model learning problems are of interest:

Forward Kinematics:  x = f(q),  ẋ = J(q)q̇,  ẍ = J(q)q̈ + J̇(q)q̇   (1)
Inverse Kinematics:  q = f⁻¹(x)   (2)
Forward Dynamics:    q̈ = M⁻¹(q)(u − c(q̇, q) − g(q))   (3)
Inverse Dynamics:    u = M(q)q̈_d + c(q̇, q) + g(q)   (4)

All of these model learning problems except the inverse kinematics can be solved using regression methods. The forward dynamics equation (3) describes the joint accelerations q̈ given the applied torques u, using the physical parameters M⁻¹(q), c(q̇, q) and g(q). In contrast, the inverse dynamics equation (4) describes which torques have to be applied to the actuators to achieve the desired joint accelerations q̈_d. Hence, the task of learning a physical model of a robot is to determine the parameters M(q), c(q̇, q) and g(q). More generally, the model can be written as:

y = f_θ(x) + ε   (5)

The function f_θ(x) can also be written as φ(x)^T θ, with parameters θ and features φ. ε denotes Gaussian distributed noise. The goal of model learning is to find parameters θ such that a cost function J is minimal. J is usually defined as a least-squares cost function:

J = 1/2 Σ_{i=1}^{N} (y_i − f_θ(x_i))²   (6)

which can also be written as:

J = 1/2 (Y − φθ)^T (Y − φθ)   (7)

Finding the parameters θ that minimize the cost function J can be done with gradient descent search (see section 2.1.1). A closed-form solution can also be derived and written as:

θ = (φ^T φ)⁻¹ φ^T Y   (8)

Equations (5) to (8) show what a model learning problem looks like and how it can be solved using basic linear regression methods.
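As a minimal sketch of equations (5) to (8), the following Python snippet fits θ by least squares. The feature choice φ(x) = [x, 1] and the synthetic noisy data are illustrative assumptions, not from the paper:

```python
import numpy as np

def fit_linear_model(Phi, Y):
    """Least-squares parameters theta = (Phi^T Phi)^{-1} Phi^T Y (equation 8)."""
    # lstsq solves the same normal equations but is numerically
    # preferable to forming the matrix inverse explicitly.
    theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return theta

# Illustrative data: y = 2x + 1 plus Gaussian noise epsilon (equation 5)
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
y = 2.0 * x + 1.0 + 0.01 * rng.normal(size=100)

Phi = np.column_stack([x, np.ones_like(x)])  # features phi(x) = [x, 1]
theta = fit_linear_model(Phi, y)             # theta close to [2, 1]
```

With little noise, the recovered θ is close to the true parameters (2, 1); the same closed form is what the locally weighted methods in section 3 reweight per query point.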

2.1.1 Gradient Descent Search

Gradient descent search is an iterative method to find a local minimum of a multi-variable function F(x). The idea is to iteratively take small steps in the direction of the function's negative gradient at the current point x_k, given a random starting point. The step size can be modified with a parameter γ. The iteration step can be written as:

x_{k+1} = x_k − γ∇F(x_k)   (9)

Figure 1 shows how the gradient descent search method works on a quadratic function.

Figure 1: Gradient Descent Search

2.2 Types of Models

The goal of model learning is to learn the behavior of the system given some observed quantities. Therefore, missing information has to be predicted. Depending on what kind of information is missing, different types of models can be defined (see [1], section 2.1).

Forward Models predict a future state of the system s_{k+1} given the current state s_k and action a_k. Since the forward model directly describes the physical properties of the system, which represent a causal relationship between states and actions, learning a forward model is a well-defined problem.

Inverse Models, in contrast, are used to compute an action a_k which is needed to get from state s_k to s_{k+1}. Since this mapping is in general not unique, learning an inverse model is not always a well-defined problem. For a robot's inverse dynamics, however, it is well-defined, and thus the robot's inverse dynamics model can be learned using regression methods.

Forward and inverse models are the most important types of models, although a combination of both can be useful: the forward model can help to create a unique mapping for an inverse model learning problem. These models are called Mixed Models. Furthermore, in some applications it is important to know not just the next state of the system, but several states into the future. Models that provide this information are called Multi-Step Prediction Models.
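The iteration in equation (9) can be sketched in a few lines of Python. The quadratic test function and the step size γ = 0.1 below are illustrative assumptions:

```python
import numpy as np

def gradient_descent(grad_F, x0, gamma=0.1, steps=200):
    """Iterate x_{k+1} = x_k - gamma * grad F(x_k) (equation 9)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - gamma * grad_F(x)
    return x

# Illustrative quadratic F(x) = ||x - m||^2 with gradient 2(x - m);
# its unique minimum is at m, so the iteration should converge there.
m = np.array([1.0, -2.0])
x_min = gradient_descent(lambda x: 2.0 * (x - m), x0=[5.0, 5.0])
```

On this convex quadratic the error shrinks by a constant factor per step; for the non-convex cost functions of model learning, the same update only reaches a local minimum, as stated above.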

2.3 Learning Architectures

Depending on the type of model and the learning problem, different learning architectures can be defined. The learning architecture describes which quantities are observed to learn the model (see [1], section 2.2).

Using Direct Modeling, a model is learned by observing the system's inputs and outputs. The idea is to obtain the system's behavior, and therefore the model, by observing an input action and the resulting output state of the system. This learning architecture can basically be used to learn all types of models, as long as the learning problem is well-defined.

An Indirect Modeling approach is feedback error learning, where a feedforward controller is learned from a feedback controller's error signals. If the learned forward model is perfect, the feedback controller no longer influences the control system; otherwise the error signal is used to update the model. This architecture therefore always tries to minimize the error, and it can deal with ill-posed problems such as inverse kinematics learning. One drawback is that it has to be applied on-line in order to obtain the actual feedback controller's error signals. Indirect Modeling can be used to learn inverse models and mixed models.

In the Distal Teacher Approach, a unique forward model acts as a teacher to obtain a learned inverse model's error and thereby helps to update the inverse model. The goal is to minimize this error: the inverse model yields a correct solution for a desired trajectory when the error between the output of the forward model and the input of the inverse model is minimized. The Distal Teacher Approach can only be applied for learning mixed models.

3 Locally Weighted Learning

Locally Weighted Learning (LWL) is one approach to learning models from training data. As described before, the aim is to find the function f_θ(x) in the model equation (5). The idea of LWL methods is to approximate this non-linear function by means of piecewise linear models. The main challenge is to find the region where each local model is valid (its receptive field).
Here, for each linear model the following receptive field is used:

w_k = exp(−1/2 (x − c_k)^T D_k (x − c_k))   (10)

The advantage of LWL methods is that we do not need a feature vector φ, which usually has to be hand-crafted. The remainder of this section describes four different LWL methods.

3.1 Locally Weighted Regression

Locally Weighted Regression (LWR) extends the standard regression method shown in equation (8) by a weight matrix which determines how much influence the data near the query point has on the local linear model. This only requires the learning system to keep sufficient training data in memory; the model can thus easily be updated by adding new training data. An algorithm for computing a prediction ŷ_q at a query point x_q is shown below (Algorithm 1). The algorithm seems quite complex at first view, but since the weight matrix W gives data points that are not close to the query point weights of approximately zero, the matrix multiplication and pseudo-inverse simplify a lot. Nevertheless, the complexity rises with the dimensionality of the system.

Algorithm 1: Locally Weighted Regression
  Given: x_q (query point), p training points {x_i, y_i}
  Compute weight matrix W:  w_i = exp(−1/2 (x_i − x_q)^T D (x_i − x_q))
  Build matrix X and vector y such that:
    X = (x̃_1, x̃_2, ..., x̃_p)^T  where  x̃_i = [(x_i − x_q)^T  1]^T
    y = (y_1, y_2, ..., y_p)^T
  Compute locally linear model:  β = (X^T W X)⁻¹ X^T W y
  Compute prediction at x_q:  ŷ_q = β_{n+1}

The only not-yet-defined variable in the above algorithm is the parameter D, the distance matrix of the receptive field. This parameter describes how big the region of validity around a query point is. D can be optimized using leave-one-out cross validation (Algorithm 2) once sufficient training data has been recorded. In order to reduce the number of parameters, D is assumed to be a global diagonal matrix multiplied by a scaling factor h; this scaling factor is then the only parameter to be optimized. Leave-one-out cross validation predicts a value for a query point which is left out of the training data and compares the prediction afterwards to the actual sample value, resulting in an error. This is repeated for all training points. The factor h is chosen to achieve a minimal error.

Algorithm 2: Leave-One-Out Cross Validation
  Given: a set H of reasonable values h_r
  for all h_r ∈ H do
    sse_r = 0
    for i = 1 : p do
      x_q = x_i
      Temporarily exclude {x_i, y_i} from the training data
      Compute the LWR prediction ŷ_q with the reduced data
      sse_r = sse_r + (y_i − ŷ_q)²
    end for
  end for
  Choose the optimal h_r such that sse_r is minimal

3.2 Locally Weighted Partial Least Squares

Since the complexity of the LWR algorithm (Algorithm 1) rises with the input dimensionality, LWR can be slow in higher dimensions. Furthermore, the matrix inversion step can become numerically unstable if there are redundant input dimensions. Locally Weighted Partial Least Squares (LWPLS) takes care of these problems. In this approach, Partial Least Squares (PLS) is used to reduce the complexity of the problem. PLS is based on a linear transformation from the high-dimensional input to a new variable space based on lower-dimensional orthogonal factors. This means that those orthogonal factors are independent linear combinations of the original input. These projections are used to calculate a prediction for a query point (see Algorithm 3). The only undefined parameter is the number of projections r. Since each new projection should reduce the squared error, adding new projections can be stopped once the error-reduction criterion res_i²/res_{i−1}² < φ is no longer fulfilled. In [4], φ = 0.5 was used for all learning tasks.

Algorithm 3: Locally Weighted Partial Least Squares
  Given: x_q (query point), p training points {x_i, y_i}
  Compute weight matrix W:  w_i = exp(−1/2 (x_i − x_q)^T D (x_i − x_q))
  Build matrix X and vector y such that:
    x̄ = Σ_{i=1}^{p} w_i x_i / Σ_{i=1}^{p} w_i,   β_0 = Σ_{i=1}^{p} w_i y_i / Σ_{i=1}^{p} w_i
    X = (x̃_1, x̃_2, ..., x̃_p)^T  where  x̃_i = x_i − x̄
    y = (ỹ_1, ỹ_2, ..., ỹ_p)^T  where  ỹ_i = y_i − β_0
  Compute locally linear model:
    Initialize: Z_0 = X, res_0 = y
    for i = 1 : r do
      u_i = Z_{i−1}^T W res_{i−1}
      s_i = Z_{i−1} u_i
      β_i = (s_i^T W res_{i−1}) / (s_i^T W s_i)
      p_i = (s_i^T W Z_{i−1}) / (s_i^T W s_i)
      res_i = res_{i−1} − s_i β_i
      Z_i = Z_{i−1} − s_i p_i
    end for
  Compute prediction at x_q:
    Initialize: z_0 = x_q − x̄, ŷ_q = β_0
    for i = 1 : r do
      s_i = z_{i−1}^T u_i
      ŷ_q ← ŷ_q + s_i β_i
      z_i = z_{i−1} − s_i p_i^T
    end for

3.3 Receptive Field Weighted Regression

When training data is received constantly by the learning system, as in on-line learning scenarios, the data set becomes very large. In this case, LWR and LWPLS fall short because of their high computational cost. Instead of computing a local model whenever a prediction has to be made, Receptive Field Weighted Regression (RFWR) iteratively builds new local models as training data is added. The prediction for a query point can then be computed as the weighted average over the predictions of all local models:

ŷ_q = Σ_{k=1}^{K} w_k ŷ_{q,k} / Σ_{k=1}^{K} w_k   (11)

Algorithm 4 shows how the local models are updated.

Algorithm 4: Receptive Field Weighted Regression
  Given: a training point (x, y)
  Update all K local models:
    w_k = exp(−1/2 (x − c_k)^T D_k (x − c_k))
    β_k^{n+1} = β_k^n + w_k P_k^{n+1} x̃ e_{cv,k}   where  x̃ = [(x − c_k)^T  1]^T,
    P_k^{n+1} = (1/λ)(P_k^n − (P_k^n x̃ x̃^T P_k^n) / (λ/w_k + x̃^T P_k^n x̃))
    and  e_{cv,k} = y − (β_k^n)^T x̃
  Prediction at a query point:  ŷ_k = β_k^T x̃

Like in LWR and LWPLS, the only open parameter is the distance matrix D. Since RFWR uses several local models, a different distance matrix D_k can be used for each of them. In [3] the following cost function for a gradient descent update of D was used:

J = (1 / Σ_{i=1}^{p} w_i) Σ_{i=1}^{p} w_i ‖y_i − ŷ_i‖² / (1 − w_i x̃_i^T P x̃_i) + γ Σ_{i,j=1}^{n} D_{ij}²   (12)

Note that RFWR is a kind of incremental version of LWR, so it also cannot deal with very high-dimensional problems.

3.4 Locally Weighted Projection Regression

For higher-dimensional problems, LWPLS was used to reduce the dimensionality and therefore the complexity of the problem. Using LWPLS on-line with huge data sets runs into the same problem as LWR, which was solved there by incremental model updates, leading to RFWR. The idea of Locally Weighted Projection Regression (LWPR) is to formulate an incremental version of LWPLS which can deal with both high dimensionality and huge data sets. Algorithm 5 shows how one local model is updated.

Algorithm 5: Locally Weighted Projection Regression
  Given: a training point (x, y)
  Update the means of inputs and outputs:
    x̄_0^{n+1} = (λ W^n x̄_0^n + w x) / W^{n+1}
    β_0^{n+1} = (λ W^n β_0^n + w y) / W^{n+1}
    where  W^{n+1} = λ W^n + w
  Update the local model:
    Initialize: z_0 = x − x̄_0^{n+1}, res_0 = y − β_0^{n+1}
    for i = 1 : r do
      u_i^{n+1} = λ u_i^n + w z_{i−1} res_{i−1}
      s_i = z_{i−1}^T u_i^{n+1}
      SS_i^{n+1} = λ SS_i^n + w s_i²
      SR_i^{n+1} = λ SR_i^n + w s_i res_{i−1}
      SZ_i^{n+1} = λ SZ_i^n + w z_{i−1} s_i
      β_i^{n+1} = SR_i^{n+1} / SS_i^{n+1}
      p_i^{n+1} = SZ_i^{n+1} / SS_i^{n+1}
      z_i = z_{i−1} − s_i p_i^{n+1}
      res_i = res_{i−1} − s_i β_i^{n+1}
      SSE_i^{n+1} = λ SSE_i^n + w res_i²
    end for

Like in RFWR, the distance matrix D is updated using gradient descent for each local model (see [4]).
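To make the locally weighted idea underlying these four methods concrete, here is a small Python sketch of the LWR prediction of Algorithm 1. The one-dimensional sine data, the bandwidth D, and the use of a pseudo-inverse are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def lwr_predict(x_q, X, y, D):
    """Algorithm 1: locally weighted linear prediction at query point x_q."""
    diff = X - x_q                        # (x_i - x_q) for every training point
    # Receptive-field weights w_i = exp(-1/2 (x_i - x_q)^T D (x_i - x_q))
    w = np.exp(-0.5 * np.einsum('ij,jk,ik->i', diff, D, diff))
    Xt = np.hstack([diff, np.ones((len(X), 1))])  # rows [(x_i - x_q)^T, 1]
    W = np.diag(w)
    beta = np.linalg.pinv(Xt.T @ W @ Xt) @ (Xt.T @ W @ y)
    return beta[-1]                       # prediction is the offset term beta_{n+1}

# Illustrative 1-D example: approximate sin(x) from noisy samples
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 2.0 * np.pi, size=(200, 1))
y = np.sin(X[:, 0]) + 0.01 * rng.normal(size=200)
D = np.array([[25.0]])                    # narrow Gaussian receptive field
y_hat = lwr_predict(np.array([np.pi / 2]), X, y, D)  # close to sin(pi/2) = 1
```

Because x̃ at the query point itself is [0, ..., 0, 1], the prediction reduces to the last component of β, exactly as in Algorithm 1; shrinking D widens the receptive field and smooths the fit, which is what Algorithm 2 tunes.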

4 Model Learning Applications

In this section, several quite different model learning applications are shown. For each application, information about the task to be learned and the type of model learning method used, as well as the results, are presented. The first three examples use different LWL methods, while the last example uses Gaussian process regression (GPR). That example is intended to show that other model learning methods can be applied to specific problems; it will not give a detailed look into GPR.

4.1 Learning Devil Sticking

Figure 2: Devil Sticking Task a) and Robot Layout b) (see [4])

Task description: Juggling a center stick between two control sticks. The center stick is lifted alternately by the two control sticks to juggle it. The task is to learn a continuous left-right-left pattern. The task is modeled as a discrete function that maps impact states from one hand to the other. A state is described as a vector x = (p, θ, ẋ, ẏ, θ̇)^T with the impact position, the angle, the velocities of the center of the center stick and its angular velocity. The task command is u = (x_h, y_h, θ, v_x, v_y)^T with a catch position, an angular trigger velocity and the two-dimensional throw direction.

Type of model learning: The robot learns a forward model, predicting the next state given the current state and action. This problem has a 10-dimensional input and a five-dimensional output and is therefore ideally suited for LWR methods. Furthermore, training data is only generated at 1-2 Hz, every time the center stick hits one of the control sticks.

Results: More than 1000 consecutive hits were counted as a successful trial, which was achieved in about 40-80 trials (see Figure 3). This is a remarkable result, since devil sticking is a difficult task even for humans, and an untrained human needs a lot of trials.

Figure 3: Devil Sticking Results (see [4])

4.2 Learning Pole Balancing

Figure 4: Pole Balancing a) and Results b) (see [4])

Task description: In this application, balancing a pole upright on a robot's finger is the learning task (see Figure 4a). The robot arm has 7 degrees of freedom. Given the inverse dynamics model of the robot, the goal of the learning problem was to learn task-level commands such as Cartesian accelerations of the robot's finger.

Type of model learning: On-line learning using RFWR was used, with input data from a stereo camera system observing the pole. The input data is thus 12-dimensional: 3 positions of the lower pole end, 2 angular positions, the 5 corresponding velocities and the two horizontal accelerations of the robot's finger. The output that predicts the next state of the pole is therefore 10-dimensional. RFWR was thus used to learn a forward model of the pole to predict its next state.

Results: While learning the model from scratch took the system about 10-20 trials before it could keep the pole upright for longer than a minute, observing a human teacher who demonstrated pole balancing for 30 seconds helped to extract the forward model so well that the system fulfilled the task on a single trial (see Figure 4b). This could also be shown using different poles and demonstrations from different people.

4.3 Inverse Dynamics Learning

Task description: Computing an inverse dynamics model (equation 4) by hand from rigid-body dynamics does not lead to a satisfying solution, since many of the system's properties cannot be modeled accurately. In this example application, the inverse dynamics model of a 7-degree-of-freedom robot arm was learned (21 input dimensions). In a second example, the authors learned the inverse dynamics of a humanoid robot's shoulder motor with 30 degrees of freedom (90 input dimensions).

Type of model learning: In both inverse dynamics learning problems, LWPR was applied. This fits the need to handle high-dimensional input and to cope with big data sets.

Results: For the 7-degree-of-freedom robot arm, Figure 5 shows the results compared to a parametric inverse dynamics model. The normalized mean squared error (nMSE) converges to nMSE = 0.042 after about 500,000 training points. During learning, LWPR created about 300 local models.

Figure 5: Inverse dynamics learning results with 7-DOF robot (see [4])

The results of the second example are shown in Figure 6. Here, many more training points were needed, but LWPR quickly outperformed the parametric model. On this high-dimensional learning problem, LWPR created more than 2000 local models.

Figure 6: Inverse dynamics learning results with 30-DOF robot (see [4])

4.4 Bio-Inspired Motion Control Based on a Learned Inverse Dynamics Model

Task description: The previous example shows that learning an inverse dynamics model leads to better results than manually derived models. This was shown on a rigid-body robot with stiff joints. The problem of accurately modeling all system properties gets even worse when elastic (bio-inspired) joints are used. In this example, a different model learning approach was used to learn an inverse dynamics model of the bio-inspired robot BioBiped1 (see Figure 7).

Type of model learning: The authors decided to use Gaussian process regression (GPR) to learn the inverse model off-line using recorded training data. As shown in [2], GPR is well suited for off-line model learning.

Figure 7: Bio-inspired robot BioBiped1 (see [5])

Results: Using the inverse dynamics model, a model-based feed-forward controller was implemented and compared to a standard PD controller. Furthermore, a combination of the feed-forward controller and the feedback controller was implemented. All three approaches were evaluated in an experiment where the robot stands on the ground with both feet and performs a periodic up-and-down swinging motion using both legs. Figure 8 shows the joint error of all three approaches.

Figure 8: Bio-Inspired Motion Control Results (see [5])

5 Conclusion

Model learning can help to find the properties of a system, in this case a robot. In section 2.1, the model learning problem was discussed and a regression solution, minimizing a cost function, was shown. Different types of models were categorized in more detail in section 2.2, together with the learning architectures which can be utilized to learn a specific model. Based on the regression solution to the model learning problem, locally weighted learning (LWL) methods were discussed in section 3, beginning with a very straightforward approach (Algorithm 1), which was then modified in Algorithm 3 to reduce the system's dimensionality and in Algorithms 4 and 5 to deal with big data sets. The first three application examples show that applying LWL methods to different learning scenarios is successful for learning forward models as well as inverse dynamics models. The last application example (see 4.4) shows that other learning methods (here GPR) can also be applied successfully to a learning problem. However, the challenge in model learning remains to deal with high dimensionality, complex algorithms and big data sets. On-line learning is preferred for self-improvement while performing a certain task, but it needs very efficient algorithms (e.g. GPR cannot be applied on-line).

References

[1] Nguyen-Tuong, D., and Peters, J. Model learning in robotics: a survey. Cognitive Processing 12(4) (2011), 319-340.

[2] Nguyen-Tuong, D., Peters, J., Seeger, M., and Schölkopf, B. Learning inverse dynamics: a comparison. In Proceedings of the European Symposium on Artificial Neural Networks (ESANN) (2008).

[3] Schaal, S., and Atkeson, C. G. Constructive incremental learning from only local information. Neural Computation 10 (1997), 2047-2084.

[4] Schaal, S., Atkeson, C. G., and Vijayakumar, S. Scalable techniques from nonparametric statistics for real time robot learning. Applied Intelligence 17 (2002), 49-60.

[5] Scholz, D., Kurowski, S., Radkhah, K., and von Stryk, O. Bio-inspired motion control of the musculoskeletal BioBiped1 robot based on a learned inverse dynamics model. In Proc. 11th IEEE-RAS Intl. Conf. on Humanoid Robots (Bled, Slovenia, Oct. 26-28, 2011).