Reinforcement learning algorithms for non-stationary environments. Devika Subramanian, Rice University


Reinforcement learning algorithms for non-stationary environments
Devika Subramanian, Rice University. Joint work with Peter Druschel and Johnny Chen of Rice University. Supported by a grant from Southwestern Bell.

Outline
- A short introduction to reinforcement learning
- Modeling routing as a distributed reinforcement learning problem
- The evolution of algorithms for routing in dynamic networks
- Conclusion

Reinforcement learning
Learning optimal policies for sequential decision-making tasks by repeated interaction with the environment.
[Figure: the learner L emits an action a to the environment E; the environment returns a reward r and a new state s.]

Example: finding shortest paths
[Figure: a four-node network with edge costs A-B = 1, B-D = 1, A-C = 2, C-D = 4.]
- State: the packet's location.
- Action: send the packet to a node, e.g., send packet to B.
- Reward: -1 x hop cost.
- Routing action choices: e.g., packet at node A, route to B or C?

Example: continued
Goal: find shortest paths to node D.
Interaction with the environment:
- World: packet at node x; choices for routing are nodes y, z, ...
- Learner: send packet to y.
- World: that cost you distance(x,y).
- World: packet at node y; choices for routing are nodes ...
- Learner: send packet to ...

Example: continued
The reinforcement learner samples paths in the network:
- (A,B,1), (B,D,1)
- (A,C,2), (C,D,4)
- (A,C,2), (C,A,2), (A,B,1), (B,D,1) ...
The reinforcement learner acquires a policy that maps each node to the neighbor it must route a packet to in order to minimize the distance to node D.

Reinforcement learning: basics
Associate with each node s a value function V_D(s): the cost of the shortest path from s to the goal node D. Update V_D online as follows, upon observing (s, s', r):

    V_D(s) = (1 - α) V_D(s) + α (r + V_D(s'))

where α is the learning rate, V_D(s) on the right-hand side is the current estimate of the distance from s to D, and the left-hand side is the new estimate.

Why does this equation work? It is an online approximation of Bellman's dynamic programming:

    V_D(s) = min_{s' in Nbs(s)} [ r(s,s') + V_D(s') ]

Bellman's equation updates V(s) according to the best neighbor of s, while reinforcement learning updates V(s) according to the observed neighbor of s in the sampled trajectory. In the limit, both converge to the same V.
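To make the update concrete, here is a minimal Python sketch (the function name and the α value are illustrative choices, not from the talk); hop costs play the role of the rewards being minimized:

```python
def td_update(V, s, s_next, r, alpha=0.5):
    """One online update of the cost-to-go estimate toward D:
    V(s) <- (1 - alpha) * V(s) + alpha * (r + V(s_next))."""
    V[s] = (1 - alpha) * V[s] + alpha * (r + V[s_next])

# Observing the sampled hop (A, B, 1) from the example network:
V = {"A": 0.0, "B": 1.0, "C": 6.0, "D": 0.0}   # current estimates of cost to D
td_update(V, "A", "B", 1.0)                     # new V["A"] = 0.5*0 + 0.5*(1 + 1) = 1.0
```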

The reinforcement learning algorithm
Initialize V(s) = 0 for all s in S.
Repeat until V converges:
    s = s_0
    while s is not a terminal state do
        Pick the best action a in s, consistent with current V estimates (can be stochastic).
        Do a, and transition to state s' with reward r.
        Update V(s) using (s, s', r).
    end while

Policies and value functions
Once the algorithm converges on the optimal value function V_D*, each node can calculate its best policy as follows:

    π_D*(s) = argmin_{s' in Nbs(s)} [ r(s,s') + V_D*(s') ]

This is the best neighbor of s to forward a packet headed for node D.
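A hedged end-to-end sketch of this loop on the slides' example network (the exploration rate, learning rate, and episode count are illustrative, not from the talk):

```python
import random

EDGES = {("A", "B"): 1, ("B", "D"): 1, ("A", "C"): 2, ("C", "D"): 4}
NBS = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}

def cost(x, y):
    return EDGES.get((x, y), EDGES.get((y, x)))

def pick_next(V, s, eps=0.2):
    """Best action under the current V estimates, with occasional exploration."""
    if random.random() < eps:
        return random.choice(NBS[s])
    return min(NBS[s], key=lambda n: cost(s, n) + V[n])

V = {n: 0.0 for n in NBS}                      # initialize V(s) = 0 for all s
for _ in range(500):                           # repeat until V converges
    s = random.choice(["A", "B", "C"])
    while s != "D":                            # D is the terminal state
        n = pick_next(V, s)
        r = cost(s, n)                         # do a, observe the hop cost
        V[s] = 0.5 * V[s] + 0.5 * (r + V[n])   # update V(s) using (s, s', r)
        s = n

# Policy extraction: pi(s) = argmin over neighbors of [r(s, s') + V(s')].
policy = {s: min(NBS[s], key=lambda n: cost(s, n) + V[n]) for s in "ABC"}
print(V, policy)   # expect A -> B: path A-B-D costs 2, vs. A-C-D costs 6
```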

Condition for convergence of RL
RL learns the value function V by repeated biased random sampling of trajectories. Convergence requires that the environment be stationary:
- Network topology does not change with time.
- Network edge costs do not change with time.

The Internet routing problem
[Figure: the four-node example network, with each router's forwarding table mapping destination to next hop; e.g., A's table is B→B, C→C, D→B.]

Challenge
- System state is distributed; obtaining global state requires communication.
- The system is non-stationary: the topology changes with time (link/router failure or recovery), and link costs change with time (congestion).

Modeling routing as a distributed reinforcement learning problem
[Figure: the network as the system, with a reinforcement learner at each node.]

Distributing the value function
For each destination d, a node maintains V_d, the cost of the cheapest path to d; π_d is the best neighbor to forward a packet to for d. Question: how do we update these local projections V_d at each node s?

    V_d(s) = min_{s' in Nbs(s)} [ r(s,s') + V_d(s') ]

A space of RL algorithms for routing
- Information propagation strategy for V estimation:
  - Indirect path sampling: get neighbors to supply r(s,s') and V_d(s').
  - Direct path sampling: s sends probes to d and gets the path cost from s to d directly.
- Action choice: deterministic or stochastic routing.
- Policy computation:
  - Indirectly computed from the value function.
  - Direct policy update (no value function maintained).

Direct & indirect path sampling
[Figure: a network with source x, neighbor n, and destination d.]
- INDIRECT PATH SAMPLING: V(x,d) = min_n [ r(x,n) + V(n,d) ], where n ranges over the neighbors of x. Neighbors exchange value functions (classical RL).
- DIRECT PATH SAMPLING: probes travel end-to-end from x to d, informing the routers in between of their cost to x.

Stochastic sampling & flooding for direct path sampling
- STOCHASTIC PATH SAMPLING: when a probe reaches node n, it is probabilistically routed to ONE of the neighbors of n.
- FLOODING: when a probe reaches node n, it is forwarded to all neighbors of n except the one it was received from.

Traditional RL solution: the distance vector algorithm
- Information propagation: indirect path sampling.
- Action choice: deterministic routing.
- Policy computation: indirect, via the value function update
  V(x,d) = min_n [ r(x,n) + V(n,d) ], where n ranges over the neighbors of x.

Routing with ants (IJCAI)
- Information propagation: direct stochastic sampling.
- Action choice: stochastic routing policy at every router. Probabilistic forwarding table entry at x: (s, (y_1,p_1), ..., (y_n,p_n)).
- Policy computation: direct policy update; routing probabilities are modified using the path costs announced by ants.
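For contrast with the online sketch earlier, here is a synchronous, round-based sketch of the distance-vector update for the single destination D (the round structure is an illustrative simplification of the asynchronous protocol):

```python
INF = float("inf")
NBS = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
COST = {("A", "B"): 1, ("B", "D"): 1, ("A", "C"): 2, ("C", "D"): 4}

def r(x, n):
    return COST.get((x, n), COST.get((n, x)))

# Each round, every node recomputes V(x, D) = min_n [r(x, n) + V(n, D)]
# from its neighbors' previously advertised estimates.
V = {x: (0 if x == "D" else INF) for x in NBS}
for _ in range(len(NBS) - 1):   # Bellman-Ford: |nodes| - 1 rounds suffice
    V = {x: (0 if x == "D" else min(r(x, n) + V[n] for n in NBS[x])) for x in NBS}
print(V)   # {'A': 2, 'B': 1, 'C': 4, 'D': 0}  (C-D direct and C-A-B-D both cost 4)
```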

Routing with ants: the details (1)
Each node s periodically sends an ant [s, d, C]. When node x receives ant [s, d, C] from neighbor y_i:
- x updates C to C' by adding the cost of link (x, y_i);
- x updates its probabilistic forwarding table entry for node s:

      p_i = (p_i + Δp) / (1 + Δp)
      p_j = p_j / (1 + Δp), for 1 <= j <= n, j != i
      where Δp = k / C', k > 0

Routing with ants: the details (2)
- Regular ants: x forwards ant [s, d, C'] to a neighbor according to its data-packet forwarding policy for node d (a probabilistic policy). The ant is destroyed when it reaches node d.
- Uniform ants: x forwards ant [s, d, C'] to one of its neighbors with equal probability. The ant is destroyed when C' exceeds a threshold.
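A small sketch of the table update above (the constant k and the initial probabilities are illustrative):

```python
def ant_update(probs, i, c_prime, k=1.0):
    """Reinforce neighbor i in the probabilistic entry for source s,
    given the ant's accumulated path cost C'; the entry stays normalized."""
    dp = k / c_prime                 # cheaper announced paths give bigger boosts
    for j in probs:
        probs[j] = (probs[j] + dp) / (1 + dp) if j == i else probs[j] / (1 + dp)

entry = {"B": 0.5, "C": 0.5}         # x's entry for source s: (s, (B, .5), (C, .5))
ant_update(entry, "B", c_prime=2.0)  # an ant from s arrived via B with cost C' = 2
print(entry)                         # {'B': 0.666..., 'C': 0.333...}; still sums to 1
```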

Regular ants (theoretical analysis)
Consider two paths a and b with costs C_a and C_b, reinforcements Δp_a = k/C_a and Δp_b = k/C_b, and let p(t) and q(t) be the probabilities of routing via a at two coupled nodes:

    p(t+1) = (p(t) + Δp_a) / (1 + Δp_a)   with prob q(t)
    p(t+1) = p(t) / (1 + Δp_b)            with prob 1 - q(t)

    q(t+1) = (q(t) + Δp_a) / (1 + Δp_a)   with prob p(t)
    q(t+1) = q(t) / (1 + Δp_b)            with prob 1 - p(t)

If C_a < C_b, then p(t) → 1 and q(t) → 1 as t → infinity.

Properties of regular ants
Bad news travels fast. Good news travels slow.
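A simulation sketch of this recurrence as reconstructed above (constants are illustrative); the coupling of p and q through each other's branch probabilities reflects regular ants following the current policy:

```python
import random

def simulate_regular_ants(c_a=1.0, c_b=2.0, k=0.01, steps=100_000):
    """p, q: probabilities at two coupled nodes of routing via path a."""
    p = q = 0.5
    da, db = k / c_a, k / c_b
    for _ in range(steps):
        # The ant that updates p was forwarded per the other node's policy q.
        p = (p + da) / (1 + da) if random.random() < q else p / (1 + db)
        q = (q + da) / (1 + da) if random.random() < p else q / (1 + db)
    return p, q

print(simulate_regular_ants())   # with C_a < C_b, both p and q drift toward 1
```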

Uniform ants (theoretical analysis)
The same recurrence, but each branch is taken with probability 0.5:

    p(t+1) = (p(t) + Δp_a) / (1 + Δp_a)   with prob 0.5
    p(t+1) = p(t) / (1 + Δp_b)            with prob 0.5

    q(t+1) = (q(t) + Δp_a) / (1 + Δp_a)   with prob 0.5
    q(t+1) = q(t) / (1 + Δp_b)            with prob 0.5

Traffic is divided in inverse proportion to the link costs. Multi-path routing! (A simulation sketch follows the next slide.)

Routing performance study
We test routing traffic scalability by:
- Fixing the convergence times for the algorithms: uniform and regular ants, and two state-of-the-art routing algorithms, Distance Vector and Link State.
- Creating a parametric Internet-like network topology: a tree with cross edges.
- Measuring routing traffic in bytes/sec/link as the size of the network increases.
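Referring back to the uniform-ant recurrence above, a simulation sketch (constants illustrative) showing the inverse-cost traffic split:

```python
import random

def simulate_uniform_ants(c_a=1.0, c_b=2.0, k=0.01, steps=200_000):
    """p: probability of routing via path a; ants arrive via a or b
    with equal probability, since uniform ants ignore the policy."""
    p = 0.5
    for _ in range(steps):
        if random.random() < 0.5:
            dp = k / c_a                 # ant came via a: boost a
            p = (p + dp) / (1 + dp)
        else:
            dp = k / c_b                 # ant came via b: boost b, shrink a
            p = p / (1 + dp)
    return p

print(simulate_uniform_ants())   # ~2/3 = (1/C_a) / (1/C_a + 1/C_b): inverse to costs
```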

Path convergence times
[Figure: path convergence times.]

Routing traffic with Internet-like topology
[Figure: routing traffic vs. network size on the Internet-like topology.]

Routing traffic with topology changes
[Figure: routing traffic under topology changes.]

Resilience to state corruption
Both regular and uniform ants perform exceedingly well in the face of state corruption in routers.

Lessons learned
- Direct policy updates are slow in responding to topology changes compared to the state-of-the-art solution (distance vector).
- Stochastic path sampling (biased random or uniform random) requires an order of magnitude more routing resources than the conventional solution.
- Stochastic path sampling offers a degree of resilience unmatched by the conventional solution.

The Scout routing algorithm
- Information propagation: direct flooding.
- Action choice: deterministic routing.
- Policy computation: indirect, via value function update. V(x,d): the cost of the path from x to d, revised by scouts.
Ants with all the stochastic stuff thrown away!

Scout routing algorithm (the details)
A node x sends scout [x, 0] to each of its neighbors. When node y receives scout [x, C] from neighbor z:
- it updates C to C' by adding the cost of link (y, z) to C;
- it compares its current estimate of the path cost to x with C'; if C' is smaller, it updates its next hop to x to be z;
- it forwards [x, C'] to every neighbor other than z.

Scout performance on Internet-like topology
[Figure: note Scout's fast convergence.] Scout convergence times can be adjusted by varying the sampling rate.
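A hedged sketch of the flood for a single source, using a queue in place of real message passing (the data structures are illustrative):

```python
from collections import deque

NBS = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
COST = {("A", "B"): 1, ("B", "D"): 1, ("A", "C"): 2, ("C", "D"): 4}

def link(y, z):
    return COST.get((y, z), COST.get((z, y)))

def scout_flood(x):
    best = {x: 0.0}                            # best known path cost to x
    nexthop = {}                               # each router's next hop toward x
    q = deque((n, x, 0.0) for n in NBS[x])     # x sends scout [x, 0] to neighbors
    while q:
        y, z, c = q.popleft()                  # y receives scout [x, C] from z
        c2 = c + link(y, z)                    # C' = C + cost of link (y, z)
        if c2 < best.get(y, float("inf")):     # better path to x found
            best[y], nexthop[y] = c2, z
            q.extend((n, y, c2) for n in NBS[y] if n != z)   # flood onward
    return best, nexthop

print(scout_flood("D"))   # costs {'D': 0, 'B': 1, 'C': 4, 'A': 2} and next hops
```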

Scout routing costs
Both in terms of messages and bytes, Scout requires 1-2 orders of magnitude more resources for the same routing performance.

Lessons learned
Direct path sampling is very expensive because route computations are not aggregated.

Adapting to network congestion: the traditional solution
- Link costs depend on traffic conditions.
- Path recomputation is triggered when the change in a link's cost exceeds a specified threshold.
- All paths affected by the cost change are recalculated.
- Recalculations occur quasi-simultaneously.
First implemented in the early 1980s on the ARPANET, providing a 50% improvement in round-trip delay and an 18% improvement in throughput.

Problems with the traditional solution
- All-pairs and quasi-simultaneous path recomputations cause routing instabilities (e.g., oscillations).
- Routing traffic is data-traffic dependent (thresholds and hold-downs are needed to control routing traffic overhead).
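A tiny sketch of the threshold trigger described above (the relative threshold and its value are illustrative placeholders):

```python
def should_recompute(old_cost, new_cost, threshold=0.25):
    """Trigger path recomputation only when a link's cost changes
    by more than `threshold` relative to its previous value."""
    return abs(new_cost - old_cost) > threshold * old_cost

print(should_recompute(10.0, 11.0))   # False: a 10% change is absorbed
print(should_recompute(10.0, 14.0))   # True: a 40% change triggers recomputation
```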

Lessons learned from the Scout algorithm
Direct path sampling allows path recomputations to be uncorrelated across time, so that all nodes in the network do not simultaneously switch paths in response to congestion (route oscillation). Therefore, direct path sampling is excellent for adapting paths to network congestion.

Harnessing the power of direct path sampling
Sample paths to a limited number of hot destinations.

Network traffic locality
Analyzed traffic traces from DEC and LBL:
- The top 10% of destinations receive over 90% of the traffic.
- The top 1% of destinations receive over 60% of the traffic.

Hybrid Scout (INFOCOM)
- Direct path sampling with flooding (Scout) for hot destinations alone.
- Indirect path sampling (Distance Vector) for all other nodes.
- Deterministic routing and indirect policy computation via value function update in both cases.
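An illustrative sketch of the hybrid's dispatch: rank destinations by observed traffic and reserve Scout-style direct sampling for the hot set (the cutoff fraction is a made-up placeholder):

```python
from collections import Counter

def pick_hot_destinations(packet_log, fraction=0.10):
    """Return the top `fraction` of destinations by traffic volume."""
    counts = Counter(packet_log)
    n_hot = max(1, int(len(counts) * fraction))
    return {d for d, _ in counts.most_common(n_hot)}

log = ["D"] * 90 + ["B"] * 6 + ["C"] * 4       # toy traffic trace
hot = pick_hot_destinations(log)               # {'D'}
for dest in sorted({"B", "C", "D"}):
    proto = "Scout (direct sampling)" if dest in hot else "Distance Vector"
    print(dest, "->", proto)
```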

Routing costs vs. performance
[Figures: routing cost vs. performance.]

Lessons learned
- Indirect path sampling performs poorly in non-stationary environments: route oscillation problems, and an inability to solve selective portions of the all-pairs shortest-path problem.
- Direct path sampling overcomes the stability problems in non-stationary environments, but its performance requirements restrict its use to important nodes.

Reinforcement learning & routing
A taxonomy of the algorithms studied, by sampling strategy, action choice, and update type:
- Indirect path sampling, deterministic actions, value function update: Distance Vector.
- Direct path sampling, deterministic actions, value function update: Scouts.
- Direct path sampling, stochastic actions, direct policy update: Ants.

Conclusions
- Reinforcement learning algorithms for non-stationary environments can be built, and they can be very effective.
- Blending direct and indirect path sampling is the key to handling non-stationarity:
  - Direct path sampling decouples path recomputations in a changing network, providing stability and convergence.
  - Indirect path sampling's efficiency is hard to beat under stationary conditions!
- Stochastic action choice cannot compete with deterministic action choice in Internet routing.
- Direct policy update is inefficient compared to value function updates.

Open questions
- How do we formulate convergence criteria for non-stationary environments?
- How do we prove convergence of the direct-sampling reinforcement learning methods?