An Introduction to Distributed Robotic Systems Alcherio Martinoli School of Architecture, Civil and Environmental Engineering Distributed Intelligent Systems and Algorithms Laboratory École Polytechnique Fédérale de Lausanne CH-1015 Lausanne, Switzerland http://disal.epfl.ch/ alcherio.martinoli@epfl.ch EPFL, Topics in Autonomous Robotics, April 23, 2012
Outline Part I (9:15 am - noon) Distributed Robotics: Background Machine-Learning Methods for Distributed Control Particle Swarm Optimization (PSO) Noise-resistant algorithms Collective-specific issues for distributed control design and optimization Particle Swarm Optimization applied to multi-robot systems Conclusion Part I
Outline Part II (2:15-5 p.m.) Model-Based Methods Bottom-up and top-down approaches Multi-level approach Model Calibration Examples Linear example Nonlinear example Conclusion Part II
Background: Multi-Robot Systems
Typical Tasks for Multirobot Teams Mapping and exploration Target tracking Carrying objects Playing team games [CMPack, 2002] Construction [Caloud et al., 1990] Inspection Sensing and monitoring From N. Kalra, PhD CMU, 2006
Why Multiple Robots? Faster execution More robust Simpler robots More solutions Task requires it From N. Kalra, PhD CMU, 2006
Why Not Multiple Robots? Need comms More complexity Harder to test N x the trouble Expensive From N. Kalra, PhD CMU, 2006
Goal of a Multirobot Team Coordinate to distribute tasks and resources in a way that leads to efficient and robust mission completion.
Challenges Dynamic events Changing task demands Resource failures Adversaries Limited sensing, planning, and actuation Limited communication Diverse team
Machine-Learning: Rationale and Terminology
Why Machine-Learning? Complementary to model-based/engineering approaches: when low-level details matter (optimization) and/or good models do not exist (design)! When the design/optimization space is too big (infinite) or too computationally expensive (e.g. NP-hard) to be systematically searched Automatic design and optimization techniques Role of the engineer refocused to performance specification, problem encoding, and customization of algorithmic parameters (and perhaps operators)
Why Machine-Learning? There are design and optimization techniques robust to noise, nonlinearities, discontinuities Possible search spaces: parameters, rules, SW or HW structures/architectures Individual real-time adaptation to new environmental conditions; i.e. increased individual flexibility when environmental conditions are not known/cannot be predicted a priori
ML Techniques: Classification Axis 1 Supervised learning: off-line, a teacher is available Unsupervised learning: off-line, teacher not available Reinforcement-based (or evaluative) learning: online, no pre-established training and evaluation data sets
ML Techniques: Classification Axis 2 In simulation: reproduces the real scenario in simulation and applies machine-learning techniques there; the learned solutions are then downloaded onto real hardware when certain criteria are met Hybrid: most of the time in simulation (e.g. 90%), last period (e.g. 10%) of the learning process on real hardware Hardware-in-the-loop: from the beginning on real hardware (no simulation); depending on the algorithm, more or less rapid
ML Techniques: Classification Axis 3 ML algorithms often require substantial computational resources (in particular population-based algorithms), therefore a further classification is: On-board: the machine-learning algorithm runs on the system to be learned (no external unit) Off-board: the machine-learning algorithm runs off-board and the system to be learned just serves as embodied implementation of a candidate solution
ML Techniques: Classification Axis 4 Population-based ("multi-agent"): a population of candidate solutions is maintained by the algorithm, for instance Genetic Algorithms (individuals), Particle Swarm Optimization (particles), Particle Filter (particles), Ant Colony Optimization (tour solution set), etc.; it can also refer to the fact that multiple agents are needed to build the solution set, for instance virtual ants as the core multi-agent principle in Ant Colony Optimization Hill-climbing ("single-agent"): the machine-learning algorithm works on a single candidate solution and tries to improve on it; for instance, (stochastic) gradient-based methods, reinforcement learning algorithms
Particle Swarm Optimization
Flocking and Swarming Inspiration Principles: Imitate Evaluate Compare
PSO: Terminology Population: set of candidate solutions tested in one time step, consists of m particles (e.g., m = 20) Particle: represents a candidate solution; it is characterized by a velocity vector v and a position vector x in the hyperspace of dimension D Evaluation span: evaluation period of each candidate solution during a single time step (algorithm iteration); as in GA the evaluation span might take more or less time depending on the experimental scenario Life span: number of iterations a candidate solution survives Fitness function: measurement of efficacy of a given candidate solution during the evaluation span Population manager: updates velocities and positions for each particle according to the main PSO loop
Evolutionary Loop: Several Generations Start → Initialize particles → Perform main PSO loop → End criterion met? (N: repeat loop; Y: End) Ex. of end criteria: # of time steps, best solution performance
Initialization: Positions and Velocities
The Main PSO Loop Parameters and Variables Functions: rand() = uniformly distributed random number in [0,1] Parameters: w: velocity inertia (positive scalar) c_p: personal best coefficient/weight (positive scalar) c_n: neighborhood best coefficient/weight (positive scalar) Variables: x_ij(t): position of particle i in the j-th dimension at time step t (j = 1,…,D) v_ij(t): velocity of particle i in the j-th dimension at time step t x*_ij(t): position of particle i in the j-th dimension with maximal fitness up to iteration t x*'_ij(t): position in the j-th dimension having achieved the maximal fitness up to iteration t in the neighborhood of particle i
The Main PSO Loop (Eberhart, Kennedy, and Shi, 1995, 1998) At each time step t, for each particle i, for each component j: update the velocity v_ij(t+1) = w·v_ij(t) + c_p·rand()·(x*_ij(t) − x_ij(t)) + c_n·rand()·(x*'_ij(t) − x_ij(t)) then move x_ij(t+1) = x_ij(t) + v_ij(t+1)
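The update rules above can be sketched in Python. This is an illustrative implementation only: the data layout, function names, and the maximization convention are assumptions, not taken from the slides.

```python
import random

def pso_step(swarm, w, c_p, c_n, fitness, neighbors_of):
    """One iteration of the main PSO loop (lbest or gbest via neighbors_of).

    swarm: list of dicts with keys 'x', 'v', 'pbest_x', 'pbest_f'
    (illustrative layout). Fitness is maximized in this sketch.
    """
    for i, p in enumerate(swarm):
        # Neighborhood best: best personal best among particle i's neighbors
        nbest = max((swarm[k] for k in neighbors_of(i)),
                    key=lambda q: q['pbest_f'])
        for j in range(len(p['x'])):
            # Velocity update: inertia + personal + neighborhood attraction
            p['v'][j] = (w * p['v'][j]
                         + c_p * random.random() * (p['pbest_x'][j] - p['x'][j])
                         + c_n * random.random() * (nbest['pbest_x'][j] - p['x'][j]))
            # Position update
            p['x'][j] += p['v'][j]
        # Evaluate the new position and update the personal best (memory)
        f = fitness(p['x'])
        if f > p['pbest_f']:
            p['pbest_f'] = f
            p['pbest_x'] = list(p['x'])
```

The `neighbors_of` callback decides the topology: returning all indices gives gbest; returning a small index window gives the lbest ring used later in these slides.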
The Main PSO Loop - Vector Visualization [Figure: particle i at its current position x_i ("Here I am!") with velocity v_i, its personal best position x*_i (position with optimal fitness up to date), and the neighborhood best x*'_i (position with optimal fitness of its neighbors up to date)]
Neighborhood Types Size: the neighborhood size also counts the particle itself Local: only k neighbors considered over m particles in the population (1 < k < m); k = 1 means no information from other particles is used in the velocity update Global: m neighbors Topology: Geographical Social Indexed Random
Neighborhood Examples: Geographical vs. Social geographical social
Neighborhood Example: Indexed and Circular (lbest) [Figure: particle 1's 3-neighbourhood on a virtual circle of 8 indexed particles]
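The indexed circular (lbest) neighborhood can be sketched as a small helper; a sketch, assuming k counts the particle itself, as on the slide above (the function name is illustrative).

```python
def lbest_ring_neighbors(i, m, k):
    """Indices of particle i's k-neighborhood on a virtual ring of m particles.

    k includes particle i itself; e.g. k = 3 yields {i-1, i, i+1} modulo m.
    """
    half = (k - 1) // 2
    return [(i + d) % m for d in range(-half, k - half)]
```

For example, particle 1 on a ring of 8 particles with a 3-neighbourhood gets particles 0, 1, and 2 (indices wrap around the virtual circle).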
A Simple PSO Animation M. Clerc
GA vs. PSO in Absence of Noise
GA vs. PSO Overview According to most recent research, PSO outperforms GA on most (but not all!) continuous optimization problems GA still much more widely used in general research community and robust to continuous and discrete optimization problems Because of random aspects, very difficult to analyze either metaheuristic or make guarantees about performance
Benchmark Functions Goal: minimization of a given f(x) Standard benchmark functions with thirty terms (n = 30) and a fixed number of iterations All x i constrained to [-5.12, 5.12]
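A minimal sketch of four standard benchmark functions used in this comparison (sphere, generalized Rosenbrock, Rastrigin, Griewank), all to be minimized; the definitions below are the usual textbook forms, which the slide does not spell out.

```python
import math

def sphere(x):
    return sum(xi * xi for xi in x)

def rosenbrock(x):
    # Generalized Rosenbrock: coupled terms over consecutive dimensions
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (x[i] - 1.0) ** 2
               for i in range(len(x) - 1))

def rastrigin(x):
    # Highly multimodal: cosine ripples on top of the sphere
    return sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) + 10.0 for xi in x)

def griewank(x):
    # Quadratic bowl plus a product of cosines
    s = sum(xi * xi for xi in x) / 4000.0
    p = 1.0
    for i, xi in enumerate(x, start=1):
        p *= math.cos(xi / math.sqrt(i))
    return s - p + 1.0
```

With n = 30 and all x_i in [-5.12, 5.12] as on the slide, the global minimum of each function is 0 (at the origin, except Rosenbrock at x = (1, …, 1)).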
Performance with Small Populations GA: Roulette Wheel for selection, mutation applies numerical adjustment to gene PSO: lbest ring topology with neighborhood of size 3 Algorithm parameters used (but not thoroughly optimized):
Performance with Small Populations Bold: best results; 30 runs; no noise on the performance function
Function | GA (mean ± std dev) | PSO (mean ± std dev)
Sphere | 0.02 ± 0.01 | 0.00 ± 0.00
Generalized Rosenbrock | 34.6 ± 18.9 | 7.38 ± 3.27
Rastrigin | 157 ± 21.8 | 48.3 ± 14.4
Griewank | 0.01 ± 0.01 | 0.03 ± 0.03
Expensive Optimization and Noise Resistance
Expensive Optimization Problems Two fundamental reasons making robot control design and optimization expensive in terms of time: 1. Time for evaluation of candidate solutions (e.g., tens of seconds) >> time for application of metaheuristic operators (e.g., milliseconds) 2. Noisy performance evaluation disrupts the adaptation process and requires multiple evaluations to assess actual performance Multiple evaluations at the same point in the search space yield different results Noise causes decreased convergence speed and residual error
Reducing Evaluation Time General recipe: exploit more abstracted, calibrated representations (models and simulation tools) See also part II
Dealing with Noisy Evaluations Better information about a candidate solution can be obtained by combining multiple noisy evaluations We could systematically evaluate each candidate solution a fixed number of times, but this is not efficient from a computational perspective We want to dedicate more computational time to evaluating promising solutions and to eliminate the "lucky" ones as quickly as possible; as a consequence, each candidate solution might have been evaluated a different number of times when compared In GA good and robust candidate solutions survive over generations; in PSO they survive in the individual memory Use dedicated functions for aggregating multiple evaluations: e.g., minimum and average or generalized aggregation functions (e.g., quasi-linear weighted means), perhaps combined with a statistical test for comparing resulting aggregated performances
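The aggregation idea above can be sketched as follows: each time a candidate survives (in a particle's memory or across GA generations), it is re-evaluated and all of its noisy samples are combined. The helper name, the id-based bookkeeping, and the default choice of `min` (pessimistic for a maximization problem, filtering out "lucky" evaluations) are illustrative assumptions, not the exact scheme from the slides.

```python
def noise_resistant_value(candidate, evaluate, history, aggregate=min):
    """Re-evaluate a surviving candidate and aggregate all its noisy samples.

    history maps a candidate id to its list of noisy fitness samples;
    aggregate combines them (min, averaging, or a quasi-linear weighted mean).
    Keying by id() is a sketch-level shortcut, fine only while the candidate
    object stays alive.
    """
    samples = history.setdefault(id(candidate), [])
    samples.append(evaluate(candidate))
    return aggregate(samples)
```

Because samples accumulate only for candidates that keep being re-selected, promising solutions naturally receive more evaluation time than ones discarded early.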
[Figure: noise-resistant variants of the GA and PSO algorithmic loops]
Testing Noise-Resistant GA and PSO on Benchmarks Benchmark 1 : generalized Rosenbrock function 30 real parameters Minimize objective function Expensive only because of noise Benchmark 2: obstacle avoidance on a robot 22 real parameters Maximize objective function Expensive because of noise and evaluation time
Benchmark 1: Gaussian Additive Noise on Generalized Rosenbrock Fair test: same number of candidate-solution evaluations for all algorithms (i.e. n generations/iterations of the standard versions compared with n/2 of the noise-resistant ones)
Benchmark 2: Obstacle Avoidance on a Mobile Robot Similar to [Floreano and Mondada 1996]: Discrete-time, single-layer, artificial recurrent neural network controller Shaping of neural weights and biases (22 real parameters) Fitness function: rewards speed, straight movement, avoiding obstacles Different from [Floreano and Mondada 1996]: Environment: bounded open space of 2x2 m instead of a maze Actual results: [Pugh J., EPFL PhD Thesis No. 4256, 2008] Similar results: [Pugh et al, IEEE SIS 2005] V = average wheel speed, Δv = difference between wheel speeds, i = value of most active proximity sensor
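The variables listed on the slide suggest the classic Floreano-Mondada fitness; a sketch assuming the form f = V·(1 − √Δv)·(1 − i) with all three terms normalized to [0, 1] (the exact normalization used in the cited thesis is not shown on the slide).

```python
import math

def obstacle_avoidance_fitness(V, dv, i):
    """Assumed Floreano-Mondada-style fitness for obstacle avoidance.

    V  : average normalized wheel speed in [0, 1]  -> rewards speed
    dv : normalized wheel-speed difference in [0, 1] -> rewards straight motion
    i  : value of the most active proximity sensor in [0, 1]
         -> rewards staying away from obstacles
    """
    return V * (1.0 - math.sqrt(dv)) * (1.0 - i)
```

A fast, straight run far from any obstacle scores near 1; turning on the spot or hugging a wall drives the fitness toward 0.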
Baseline Experiment: Extended-Time Adaptation [Pugh J., EPFL PhD Thesis No. 4256, 2008] Compare the basic algorithms with their corresponding noise-resistant versions Population size 100, 100 iterations, evaluation span 300 seconds (150 s for noise-resistant algorithms) → 34.7 days Fair test: same total evaluation time for all the algorithms Realistic simulation (Webots) Best evolved solutions averaged over 30 evolutionary runs Best candidate solution in the final pool selected based on 5 runs of 30 s each; performance tested over 40 runs of 30 s each Similar performance for all algorithms
Where Can Noise-Resistant Algorithms Make the Difference? Limited adaptation time Sudden change in noise distribution Large amount of noise Notes: all examples from shaping obstacle avoidance behavior best evolved solution averaged over multiple runs fair tests: same total amount of evaluation time for all the different algorithms (standard and noise-resistant)
Limited-Time Adaptation Especially good with small populations Varying population size vs. number of iterations Total adaptation time = 8.3 hours (1/100 of previous learning time) Trade-offs: population size, number of iterations, evaluation span, number of re-evaluations Realistic simulation (Webots) [Pugh J., EPFL PhD Thesis No. 4256, 2008]
Sudden Change of Noise Distributions Move from realistic simulation (Webots) to real robots after 90% of the learning (even faster evolution) Noise resistance helps speed up learning after the transition [Pugh J., EPFL PhD Thesis No. 4256, 2008]
Increased Noise Level Set-Up Scenario 1: One robot learning obstacle avoidance Scenario 2: One robot learning obstacle avoidance, one robot running pre-evolved obstacle avoidance Scenario 3: Two robots co-learning obstacle avoidance 1x1 m arena, PSO, 50 iterations, scenario 3 Idea: more robots → more noise (as perceived by an individual robot); no standard communication between the robots, but in scenario 3 information sharing through the population manager! [Pugh et al, IEEE SIS 2005]
Increased Noise Level Sample Results [Pugh et al, IEEE SIS 2005]
From Single to Multi-Unit Systems: Co-Adaptation in a Shared World
Adaptation in Multi-Robot Scenarios Collective: fitness becomes noisy due to partial perception and independent parallel actions
Credit Assignment Problem With limited communication, no communication at all, or partial perception:
Co-Adaptation in a Collaborative Framework
Co-Shaping Collaborative Behavior Three orthogonal axes to consider (extremities and balanced solutions are possible): 1. Performance evaluation: individual vs. group fitness or reinforcement 2. Solution sharing: private vs. public policies 3. Team diversity: homogeneous vs. heterogeneous learning
Multi-Agent Algorithms for Multi-Robot Systems
Policy | Performance | Sharing | Diversity
i-pr-he | individual | private | heterogeneous
i-pr-ho | individual | private | homogeneous
i-pu-he | individual | public | heterogeneous
i-pu-ho | individual | public | homogeneous
g-pr-he | group | private | heterogeneous
g-pr-ho | group | private | homogeneous
g-pu-he | group | public | heterogeneous
g-pu-ho | group | public | homogeneous
MA Algorithms for MR Systems Legend: do not make sense (inconsistent) / possible but not scalable / interesting (consistent)
Policy | Performance | Sharing | Diversity
i-pr-he | individual | private | heterogeneous
i-pr-ho | individual | private | homogeneous
i-pu-he | individual | public | heterogeneous
i-pu-ho | individual | public | homogeneous
g-pr-he | group | private | heterogeneous
g-pr-ho | group | private | homogeneous
g-pu-he | group | public | heterogeneous
g-pu-ho | group | public | homogeneous
MA Algorithms for MR Systems Example of collaborative co-learning with binary encoding of 100 candidate solutions and 2 robots
Co-Shaping Competitive Behavior [Figure: two co-adapting competitors with fitness functions f1 and f2]
Co-Shaping Obstacle Avoidance Effects of robotic group size Effects of communication constraints
MA Algorithms for MR Systems
Distributed Robotic Adaptation For an individual-public-heterogeneous policy: Standard solution: off-board adaptation using a centralized population manager New solution [Pugh, SIJ 2009]: on-board adaptation using a distributed population manager Fully scalable solution, no single point of failure Why PSO: appears interesting since this metaheuristic works well with small pools of candidate solutions: candidate pool size = robot team size; limited particle neighborhood sizes → scalable, on-board operation
Varying the Robotic Group Size
Varying the Robotic Group Size Evolving vs. Testing Environment PSO only; number of particles = 10 Gradually increase number of robots on team Up to 10x faster learning with little performance loss Arena 3x3 m [Pugh and Martinoli, Swarm Intelligence J., 2009]
Communication-Based Neighborhoods Default neighborhood - ring topology, 2 neighbors for each robot Problem for real robots: a neighbor could be very far away Possible solutions: use the two closest robots in the arena (capacity limitation), use all robots within some radius r (range limitation); reality is often affected by both How will this affect the performance?
Communication-Based Neighborhoods
Communication-Based Neighborhoods Ring Topology - Standard
Communication-Based Neighborhoods 2-closest Model A
Communication-Based Neighborhoods Within range r Model B
Communication-Based Neighborhoods [Pugh and Martinoli, Swarm Intelligence J., 2009] Realistic simulation results (Webots) 10 Khepera III robots (one particle per robot) Standard neighborhood is 2 fixed particles New particle neighborhoods defined by communication limitations: Model A: particles from closest two robots Model B: particles from all robots within range r = 120 cm No drawback to communication-based neighborhoods
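The two communication-limited neighborhood models can be sketched as follows; positions, helper names, and the 2-D distance assumption are illustrative, with r defaulting to the 120 cm from the slide.

```python
import math

def neighbors_model_a(positions, i):
    """Model A: particles from the two closest robots (capacity limitation)."""
    dist_to = lambda k: math.dist(positions[i], positions[k])
    others = sorted((k for k in range(len(positions)) if k != i), key=dist_to)
    return others[:2]

def neighbors_model_b(positions, i, r=1.2):
    """Model B: particles from all robots within range r meters (range limitation)."""
    return [k for k in range(len(positions))
            if k != i and math.dist(positions[i], positions[k]) <= r]
```

Model A always yields exactly two neighbors regardless of distance, while Model B can yield anywhere from zero neighbors (an isolated robot) to the whole team, which is what makes the communication range a tunable learning parameter.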
Varying the Communication Range Arena: 3x3 m, 10 robots total
Varying the Communication Range Realistic simulation results (Webots) Low communication ranges → learning too slow High communication ranges → fitness plateaus too early Limited communication range gives best performance → scalable solution [Pugh and Martinoli, Swarm Intelligence J., 2009]
Distributed Adaptation with Real Robots - Enabling Technology On-board relative positioning system Principle: belt of IR emitters (LEDs) and receivers (photodiodes) IR LEDs used as antennas; modulated light (10.7 MHz carrier); RF chip behind measures RSSI Measures range & bearing of the neighboring robot (relative positioning); can be coupled with an RF channel (e.g. 802.11) for heading assessment Can also be used as a 20 kbit/s communication channel [Pugh et al., IEEE Trans. on Mechatronics, 2009]
Distributed Adaptation with Real Robots - Sample Results All operations on-board using relative positioning system Effective learning of obstacle avoidance behavior Similar results for communication-based neighborhoods Arena 3 x 3 m [Pugh and Martinoli, Swarm Intelligence J., 2009]
Distributed Adaptation with Real Robots Before adaptation (5x speed-up)
Distributed Adaptation with Real Robots After adaptation (5x speed-up)
Conclusion
Take Home Messages Evaluative machine-learning techniques are key for robotic adaptation Computationally efficient, noise-resistant algorithms can be obtained with a simple aggregation criterion in the main algorithmic loop Several successful strategies have been designed for dealing with collective-specific problems; they differentiate along three axes: public/private, homogeneous/heterogeneous, individual/group Multi-robot platforms can be exploited for testing multiple candidate solutions in parallel; PSO appears to be well suited for fully distributed on-board operation, and strategies based on local, proximity-based neighborhoods have been successfully tested
Additional Literature Part I Books Kennedy J. and Eberhart R. C. with Y. Shi, Swarm Intelligence. Morgan Kaufmann Publisher, 2001. Clerc M., Particle Swarm Optimization. ISTE Ltd., London, UK, 2006. Engelbrecht A. P., Fundamentals of Computational Swarm Intelligence. John Wiley & Sons, 2006. Papers Pugh J. and Martinoli A., Distributed Scalable Multi-Robot Learning using Particle Swarm Optimization. Swarm Intelligence Journal, 3(3): 203-222, 2009. Pugh J., Raemy X., Favre C., Falconi R., and Martinoli A., A Fast On-Board Relative Positioning Module for Multi-Robot Systems. Special issue on Mechatronics in Multi-Robot Systems, Chow M.-Y., Chiaverini S., Kitts C., editors, IEEE Trans. on Mechatronics, 14(2): 151-162, 2009. Murciano A. and Millán J. del R., "Specialization in Multi-Agent Systems Through Learning". Biological Cybernetics, 76: 375-382, 1997. Mataric M. J., Learning in behavior-based multi-robot systems: Policies, models, and other agents. Special Issue on Multi-disciplinary studies of multiagent learning, Ron Sun, editor, Cognitive Systems Research, 2(1): 81-93, 2001.