Smart Routing with Learning-based QoS-aware Meta-strategies
Ying Zhang, Markus Fromherz, Lukas Kuhn (Ludwig Maximilian University)
October 2004
Outline
- Message-initiated Constraint-based Routing
- Learning-based Meta-routing
- Performance Evaluations
Message-initiated Constraint-Based Routing (MCBR)

Routing in traditional networks: address-based, table-driven, statically configured, off-line optimization.
Routing in wireless ad-hoc sensor networks: attribute-based, message-initiated, dynamically configured, on-line optimization.
Network properties: embedded, dynamic, dense, asymmetric.

MCBR specification:
- Destination constraints (e.g. an embedded, dynamic destination such as the pursuer location)
- Local route constraints (e.g. prefer nodes high on energy, avoid nodes with an evader reading)
- Global route objectives (e.g. minimum route length from source to destination)

Attributes:
- Constants: Unit Cost, Address, Group ID
- Variables: Location, Time, Sensors, Hops, Energy
Learning-based Meta-Routing

Node architecture: each packet carries a cost specification and data; on the node, a learning component feeds a meta-strategy that performs the routing.

Specification:
- Destination constraints
- Local route constraints
- Global route objectives

MCBR strategies: Real-time Search, Constrained Flooding, Adaptive Tree, Ant-routing
Learning: Q-learning, Ant learning
Cost: projection of the global objective to the local node
Q-learning-based Routing

Routing core: reinforcement learning.
- QValue: estimated cost from the current node to the destination (any cost function)
- NQValue: an estimate of each neighbor's QValue, recorded whenever a message is received from that neighbor
- Learning process (1): QValue = (1 − α)·QValue + α·(min NQValues + lcost)
  - α: learning rate
  - lcost: attribute-based local cost function
- Every message includes the node's QValue
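The update rule (1) on this slide can be sketched in a few lines. This is an illustrative reconstruction, not the original implementation; the function name, argument names, and the default learning rate are assumptions.

```python
ALPHA = 0.5  # learning rate (illustrative value; the slides do not fix one)

def q_update(q_value, neighbor_q_values, lcost, alpha=ALPHA):
    """Equation (1): QValue = (1 - alpha) * QValue + alpha * (min NQValues + lcost).

    q_value:           the node's current estimated cost to the destination
    neighbor_q_values: the recorded NQValues of all known neighbors
    lcost:             attribute-based local cost of this node
    """
    return (1 - alpha) * q_value + alpha * (min(neighbor_q_values) + lcost)
```

For example, a node with QValue 10, neighbors reporting NQValues 4 and 6, and local cost 1 moves halfway toward 4 + 1 = 5.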
Q-learning Meta-strategies

Routing meta-strategies: search-based, flood-based, tree-based.

With reinforcement learning added:
- Structured: Source-Destination Path; Spanning Tree → Adaptive Spanning Tree
- Connectionless: Real-time Search → Constraint-based Search; Flooding → Constrained Flooding

Routing phases:
- Initialization: establishing structures
- Routing: passing packets
- Learning: updating QValue and NQValues
Search-based Meta-strategy

Initialization (required): establish neighborhood structures by sending out Hello messages.
Routing: pass a received designated packet to the neighbor with the best QValue.
Learning:
- update NQValue_i for all received or overheard packets from neighbor i
- set NQValue_i to max(NQValues) + 1 if the node didn't hear neighbor i deliver the designated packet
- calculate the node's own QValue according to (1)
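The routing and penalty steps above can be sketched as follows. This is a minimal sketch under the assumption that NQValues are kept in a per-node dictionary; the function names are illustrative, not from the original system.

```python
def best_neighbor(nq_values):
    """Routing step: pick the neighbor with the lowest recorded NQValue."""
    return min(nq_values, key=nq_values.get)

def penalize_unheard(nq_values, neighbor):
    """Learning step: if the chosen neighbor was not overheard delivering the
    designated packet, make it the least attractive choice."""
    nq_values[neighbor] = max(nq_values.values()) + 1
```

After a penalty, the next routing decision automatically falls back to a different neighbor.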
[Figure: search algorithm example, showing initialization, learning, and the routing of the 1st and 2nd messages; received and lost messages are marked.]
Flood-based Meta-strategy

Initialization (not required): establish initial QValues by propagation from the destination.
Routing: let Δ = data.QValue − QValue (for received data);
- if Δ + T > 0 (temperature T), transmit after delay f(Δ), with f decreasing as Δ increases
- else ignore
Learning:
- update NQValue_i for all received or overheard packets from neighbor i
- calculate the node's own QValue according to (1)
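The forwarding decision above can be sketched as follows. The particular decreasing delay function is an assumption for illustration; the slides only require f to decrease as Δ grows, so that nodes closer to the destination (larger improvement Δ) send earlier.

```python
def should_forward(sender_q, own_q, temperature):
    """Forward only if Delta + T > 0, where Delta = sender's QValue - own QValue."""
    delta = sender_q - own_q
    return delta + temperature > 0

def forward_delay(sender_q, own_q, base=1.0):
    """One possible f(Delta): strictly decreasing in Delta, so better-placed
    nodes transmit first and suppress the rest of the flood."""
    delta = sender_q - own_q
    return base / (1.0 + max(delta, 0.0))
```

A node that hears the same message (same sequence id) already sent, or already received by the destination, cancels its pending transmission.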
[Figure: flood algorithm example. A node that has already sent the message with a given sequence id (sid) will not send it again; the node with the smallest QValue has the shortest delay and sends first; once the destination has received the message, the remaining nodes suppress their retransmissions.]
Tree-based Meta-strategy

Initialization (required): establish a spanning tree from the destination by flooding from the destination; each node selects the neighbor i with minimum NQValue_i as its parent.
Routing: pass a received designated packet on to the parent.
Learning:
- update NQValue_i for all received or overheard packets from neighbor i
- set NQValue_i to max(NQValues) + 1 if the node didn't hear neighbor i deliver the designated packet
- calculate the node's own QValue according to (1)
- reselect the neighbor i with minimum NQValue_i as parent
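The initialization step above can be sketched as a flood from the destination that assigns hop-count QValues, after which every node picks its minimum-NQValue neighbor as parent. This is a simplified sketch (hop counts stand in for the general cost function); the graph representation and function name are assumptions.

```python
from collections import deque

def init_tree(adj, dest):
    """Flood from dest to assign initial QValues (here: hop counts),
    then let each node pick its lowest-QValue neighbor as parent.

    adj: dict mapping node -> list of neighbor nodes (connected graph)
    """
    q = {dest: 0}
    frontier = deque([dest])
    while frontier:
        n = frontier.popleft()
        for m in adj[n]:
            if m not in q:          # first arrival = shortest hop count
                q[m] = q[n] + 1
                frontier.append(m)
    parent = {n: min(adj[n], key=lambda m: q[m]) for n in adj if n != dest}
    return q, parent
```

Routing then just follows `parent` links; the learning step may later change a node's parent as NQValues are updated.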
[Figure: tree algorithm example, showing initialization, learning, and the routing of the 1st message along the spanning tree; received messages are marked.]
MCBR Take-away Messages

Flexible framework:
- generic encoding of application requirements and metrics
- multiple learning methods possible
- multiple meta-strategies, with re-use of the representation and learning core

Adaptive routing:
- learning of best routes, constantly improving
- adapting to dynamic networks and applications

Low overhead:
- learning while routing
- implicit information exchange
- no extra maintenance needed
Performance Metrics

- Latency (s): T_received − T_sent
- Throughput (p/s): R/T
- Loss rate: L/(L + R)
- Success rate: ΣR/ΣS
- Energy use: ΣU
- Energy efficiency: ΣR/ΣU
- Lifetime prediction: E_max/(Ū + σ), where Ū = ΣU/N and σ² = Σ(U − Ū)²/N

T: time, R: received packets, L: lost packets, S: original packets, U: used energy, N: total nodes, E: energy
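The metrics on this slide can be computed from a run's raw counters as sketched below. The function name and the dictionary keys are illustrative; the lifetime formula follows the reconstruction E_max/(Ū + σ) with population statistics over per-node energy use.

```python
from statistics import mean, pstdev

def run_metrics(R, L, S, per_node_energy, T, E_max):
    """Compute the slide's metrics for one simulation run.

    R: received packets, L: lost packets, S: original (sent) packets,
    per_node_energy: energy used by each of the N nodes,
    T: elapsed time, E_max: initial energy budget per node.
    """
    U = sum(per_node_energy)
    u_bar = mean(per_node_energy)            # U-bar = sum(U)/N
    sigma = pstdev(per_node_energy)          # sigma^2 = sum((U - U-bar)^2)/N
    return {
        "throughput": R / T,                 # packets per second
        "loss_rate": L / (L + R),
        "success_rate": R / S,
        "energy_use": U,
        "energy_efficiency": R / U,          # packets delivered per unit energy
        "lifetime": E_max / (u_bar + sigma), # predicted node lifetime
    }
```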
Application Scenario: Pursuer/Evader Game (PEG)

- Source: dynamic (0.2 d/s), rate: 1 p/s
- Destination: mobile (0.2 d/s) — the pursuer location
- Simulation time: 15 s; total runs: 10

Route constraints in the PEG application: prefer nodes high on energy, avoid nodes with an evader reading; global objective: minimum route length from source to destination.
Simulation Model Assumptions

P_rec(i,j) = P_rec,ideal(d(i,j)) · (1 + α(i,j)) · (1 + β(t))
- α: N(0, σ_α), σ_α = 0.45
- β: N(0, σ_β), σ_β = 0.02
- a transmission from i to j is received if P_rec(i,j) > γ
[Figure: performance comparison on the PEG scenario over 0–15 s of simulation time, for real-time search, constrained flooding, adaptive tree, and AODV. Four panels: success rate, latency (seconds), energy consumption, and energy efficiency.]
Conclusions

- Separate specification from routing
  - Routing spec: geographical, energy-aware, congestion-aware
- Routing while learning
  - Meta-strategies: Q-learning, real-time search, constrained flooding, adaptive tree
- Performance tradeoffs