A Hybrid Systems Modeling Framework for Fast and Accurate Simulation of Data Communication Networks

Stephan Bohacek (University of Delaware), João P. Hespanha (University of California, Santa Barbara), Junsoo Lee (University of Southern California), Katia Obraczka (University of California, Santa Cruz)

Motivation

Why model network traffic?
- to validate designs through simulation (scalability, performance)
- to analyze and design protocols (throughput, fairness, security, etc.)
- to tune network parameters (queue sizes, bandwidths, etc.)
Types of models

Packet-level modeling
- tracks individual data packets as they travel across the network
- ignores the data content of individual packets
- sub-millisecond time accuracy
- captures fast dynamics even for a small number of flows
- provides information about average, peak, and instantaneous resource utilization (queues, bandwidth, etc.)
- computationally very intensive

Fluid-based modeling
- tracks the time/ensemble-average packet rate across the network
- does not explicitly model individual events (acknowledgments, drops, queues becoming empty, etc.)
- time accuracy of a few seconds for time-averages
- only suitable to model many similar flows for ensemble-averages
- computationally very efficient (at least for first-order statistics)

Hybrid modeling
- keeps track of the packet rate of each flow, averaged over small time scales
- explicitly models some discrete events (drops, queues becoming empty, etc.)
- time accuracy of a few milliseconds (one round-trip time)
- computationally efficient
Talk outline
- Modeling, 1st pass: dumbbell topology & simplified TCP
- Modeling, 2nd pass: general topology, TCP and UDP models
- Validation
- Simulation complexity
- Conclusions and future work

1st pass: dumbbell topology

[Diagram: flows f1, f2, f3 with sending rates r1, r2, r3 bps feed a single queue of size q(t), drained at the link rate B bps.]

- several flows follow the same path and compete for bandwidth in a single bottleneck link
- prototypical network to study congestion control
- routing is trivial
- single queue
Queue dynamics

[Diagram: flows f1, f2, f3 with rates r1, r2, r3 bps feed the queue of size q(t), drained at rate B bps.]

- the queue evolves as dq/dt = sum_f r_f - B, saturated so that 0 <= q <= q_max
- when sum_f r_f exceeds B, the queue fills and data is lost (drops)
- a drop is a discrete event relevant for congestion control

Hybrid automaton representation:
- each discrete mode has its own continuous dynamics for q(t)
- transition enabling conditions determine when the automaton switches modes
- the drop is an exported discrete event
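The single-queue hybrid automaton above can be sketched in a few lines of Python. This is our illustrative reconstruction, not the authors' Modelica implementation; all names and parameters (simulate_queue, q_max, dt, the Euler step) are assumptions.

```python
# Minimal sketch of the single-queue hybrid automaton:
# continuous dynamics dq/dt = sum(rates) - B, two discrete modes
# (queue-not-full / queue-full), and an exported "drop" event.

def simulate_queue(rates, B, q_max, dt=0.001, steps=5000):
    """Forward-Euler integration of the fluid queue.

    rates: list of per-flow sending rates (bits/s)
    B:     link bandwidth (bits/s)
    q_max: queue capacity (bits)
    Returns the queue trajectory and the times at which the 'drop'
    discrete event fires (queue full while inflow exceeds B).
    """
    q = 0.0
    trajectory, drop_times = [], []
    for k in range(steps):
        inflow = sum(rates)
        if q >= q_max and inflow > B:
            # queue-full mode: queue saturates, excess data is dropped
            q = q_max
            drop_times.append(k * dt)
        else:
            # queue-not-full mode: ordinary fluid integration, clipped to [0, q_max]
            q = min(max(q + (inflow - B) * dt, 0.0), q_max)
        trajectory.append(q)
    return trajectory, drop_times
```

For example, two 3 Mbps flows sharing a 5 Mbps link overflow a small queue and start exporting drop events, while a single 1 Mbps flow leaves the queue empty.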
Window-based rate adjustment

w_f (window size): the number of packets that can remain unacknowledged by the destination.

[Timeline diagram for w_f = 3: source f sends packets 1-3 at times t0, t1, t2; the destination acknowledges each on reception at times tau0, tau1, tau2; the 4th packet can only be sent at t3, when the 1st acknowledgment is received.]

w_f effectively determines the sending rate r_f:

    r_f ~ w_f / RTT_f,   RTT_f = propagation delay + time in queue until transmission (round-trip time)

TCP congestion avoidance
1. While there are no drops, increase w_f by 1 on each RTT (additive increase)
2. When a drop occurs, divide w_f by 2 (multiplicative decrease)
(the congestion controller constantly probes the network for more bandwidth)

[Plot: sawtooth evolution of w_f, alternating additive increase and multiplicative decrease.]

disclaimer: this is a very simplified version of TCP Reno; better models later
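The AIMD rule and the window-rate relation above can be sketched as follows; this is an illustrative Python fragment under our own naming (aimd_window, sending_rate, drop_rtts), not code from the paper.

```python
# Sketch of the simplified AIMD rule from the slides:
# +1 packet per RTT without drops, halve the window on a drop.

def aimd_window(drop_rtts, w0=1.0, n_rtts=20):
    """Evolve the congestion window over n_rtts round-trip times.

    drop_rtts: set of RTT indices at which a drop is detected.
    Returns the window trajectory (one value per RTT).
    """
    w = w0
    history = []
    for k in range(n_rtts):
        if k in drop_rtts:
            w = w / 2.0      # multiplicative decrease
        else:
            w = w + 1.0      # additive increase
        history.append(w)
    return history

def sending_rate(w, packet_size_bits, rtt):
    """The slides' window-rate relation: r_f ~ w_f * packet size / RTT."""
    return w * packet_size_bits / rtt
```

With no drops the window grows linearly; a drop at RTT index 2 produces the familiar sawtooth.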
Queuing model + TCP congestion avoidance

[Block diagram: the queuing model (producing drops and RTTs) in feedback with the TCP congestion-avoidance law: additive increase of the rate r_f over each RTT, multiplicative decrease on drops.]

2nd pass: general topology

A communication network can be viewed as the interconnection of several blocks with specific dynamics:
a) Routing: maps the servers' sending rates into in-node rates at the network nodes (and carries acknowledgments back)
b) Queuing: maps in-queue rates into out-queue rates and queue sizes
c) End-to-end congestion control: maps acknowledgments & drops into the servers' sending rates
Routing

Routing determines the sequence of links followed by each flow f.

Conservation of flows: the in-queue rate of flow f at a node equals either the end-to-end sending rate of f (at the first hop) or the out-queue rate of f at the upstream node; the node indexes are determined by the routing tables.

[Diagrams: the same relation extends to multicast (one upstream out-queue rate feeding several downstream nodes n1, n2) and to multi-path routing (traffic from node n split across nodes n', n'').]
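For the single-path case, the conservation-of-flows relation can be illustrated with a short sketch; the data structures (a path as an ordered list of link ids, an out_rates dict) are our assumptions, not the paper's notation.

```python
# Sketch of conservation of flows along one path: the in-queue rate of
# flow f at each link equals the out-queue rate of f at the upstream
# link, and the first hop sees the flow's end-to-end sending rate.

def in_queue_rates(path, send_rate, out_rates):
    """path:      ordered list of link ids traversed by the flow
    send_rate: end-to-end sending rate of the flow (bits/s)
    out_rates: dict link -> out-queue rate of this flow at that link
    Returns a dict link -> in-queue rate of the flow at that link."""
    rates = {}
    upstream = send_rate             # first hop: the source's rate
    for link in path:
        rates[link] = upstream
        upstream = out_rates[link]   # next hop sees this link's out-rate
    return rates
```

If a congested link forwards less than it receives, every downstream link's in-queue rate reflects that reduction.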
Queue dynamics

- q(t): total queue size; q_f(t): queue size due to flow f
- in-queue rates, out-queue rates, drops, link bandwidth B

Discrete modes:
- queue empty: no drops; out-queue rates equal in-queue rates
- queue not empty/full: no drops; the out-queue rate of each flow is proportional to its fraction of the packets in the queue, i.e., (q_f / q) B
- queue full: drops proportional to each flow's fraction of the total in-queue rate; out-queue rates as above
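The proportional-sharing rules above can be written out directly; this is a hedged sketch (per_flow_out_rates, per_flow_drop_rates, and the dict interfaces are our inventions), not the paper's equations.

```python
# Sketch of the per-flow queue rules: out-queue rates proportional to
# each flow's share of the queue, and (in queue-full mode) drops
# proportional to each flow's share of the total in-queue rate.

def per_flow_out_rates(q_flows, B):
    """q_flows: dict flow -> queue size due to that flow.
    Each flow drains at B times its fraction of the total queue."""
    total = sum(q_flows.values())
    if total == 0:
        return {f: 0.0 for f in q_flows}
    return {f: B * qf / total for f, qf in q_flows.items()}

def per_flow_drop_rates(r_in, B, queue_full):
    """r_in: dict flow -> in-queue rate. When the queue is full, the
    inflow in excess of B is dropped, split among flows in proportion
    to their in-queue rates; otherwise nothing is dropped."""
    total = sum(r_in.values())
    if not queue_full or total <= B:
        return {f: 0.0 for f in r_in}
    excess = total - B
    return {f: excess * r / total for f, r in r_in.items()}
```

For instance, a flow holding 3/4 of the queued packets receives 3/4 of the link bandwidth, and when a full queue is overdriven each flow loses drops in proportion to its inflow.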
Hybrid queue model

- discrete modes: queue-not-full and queue-full
- transition enabling conditions switch between the modes
- drops are exported discrete events
- see paper for a model considering drop bursts

Random Early Drop (RED) active queuing
- same discrete modes (queue-not-full, queue-full)
- a stochastic counter determines when early drops are generated
- see paper for a model considering drop bursts
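The slides model RED with a stochastic counter; as a simpler hedged illustration of the underlying idea, the classic RED drop-probability profile (no drops below a minimum threshold, probability rising linearly up to a maximum threshold) can be sketched as follows. The function and parameter names are ours.

```python
# Illustrative sketch of the standard RED drop-probability profile,
# an approximation of the stochastic-counter model in the paper.

def red_drop_probability(avg_q, min_th, max_th, max_p):
    """avg_q:  (averaged) queue size
    min_th: below this, no packets are dropped
    max_th: above this, every packet is dropped
    max_p:  drop probability reached at max_th."""
    if avg_q < min_th:
        return 0.0
    if avg_q >= max_th:
        return 1.0
    # linear ramp between the two thresholds
    return max_p * (avg_q - min_th) / (max_th - min_th)
```

Early drops begin well before the queue is full, which lets TCP sources back off before the queue-full mode is ever reached.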
Network dynamics & congestion control

[Block diagram: routing maps sending rates into in-queue rates; the queue dynamics produce out-queue rates and drops; the drops feed back into the end-to-end congestion control, which sets the sending rates.]

Additive Increase/Multiplicative Decrease (AIMD)
1. While there are no drops, increase w_f by 1 on each RTT (additive increase)
2. When a drop occurs, divide w_f by 2 (multiplicative decrease)
(the congestion controller constantly probes the network for more bandwidth)

- congestion-avoidance mode; drops are imported discrete events
- propagation delays accumulate over the set of links traversed by flow f
- TCP Reno is based on AIMD but uses other discrete modes to improve performance
Slow start
3. Until a drop occurs (or a threshold ssth_f is reached), double w_f on each RTT
4. When a drop occurs, divide w_f and the threshold ssth_f by 2
- the controller then transitions from slow-start to congestion-avoidance
- especially important for short-lived flows

Fast recovery, timeouts, drop-detection delay
5. During retransmission, data is sent at a rate consistent with a window size of w_f/2
6. When a drop is detected through a timeout:
   a. the slow-start threshold ssth_f is set equal to half the window size,
   b. the window size is reduced to one,
   c. the controller transitions to slow-start
(TCP SACK version)
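The mode logic in rules 3-6 can be sketched as a small per-RTT state machine. This is our simplified illustration of the slides' rules (tcp_step and the event names are assumptions; it omits the fast-recovery rate details of rule 5).

```python
# Hedged sketch of the per-RTT discrete modes described above:
# slow-start (double w each RTT), congestion-avoidance (+1 per RTT),
# drop -> halve w and ssth (rule 4), timeout -> w = 1, ssth = w/2,
# back to slow-start (rule 6).

def tcp_step(mode, w, ssth, event):
    """Advance one RTT. event is None, 'drop', or 'timeout'.
    Returns the next (mode, window, ssthresh)."""
    if event == 'timeout':
        return 'slow-start', 1.0, w / 2.0       # rule 6
    if event == 'drop':
        return 'cong-avoid', w / 2.0, ssth / 2.0  # rule 4 (simplified)
    if mode == 'slow-start':
        w = w * 2.0                              # rule 3: double per RTT
        if w >= ssth:
            return 'cong-avoid', w, ssth         # threshold reached
        return 'slow-start', w, ssth
    return 'cong-avoid', w + 1.0, ssth           # additive increase
```

Starting from w = 2 with ssth = 16, two drop-free RTTs of slow-start already bring the window to the threshold, after which growth becomes linear until a drop or timeout resets it.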
Network dynamics & congestion control

[Block diagram: routing, queue dynamics, and end-to-end congestion control interconnected through in-queue/out-queue rates, RTTs, and drops.]

see paper for on/off TCP & UDP models

Validation methodology

Compared simulation results from:
- the ns-2 packet-level simulator
- hybrid models implemented in Modelica

Plots in the following slides refer to two test topologies:

dumbbell
- 10ms propagation delay
- drop-tail queuing
- 5-500Mbps bottleneck throughput
- 0-10% UDP on/off background traffic

Y-topology
- 45, 90, 135, 180ms propagation delays
- drop-tail queuing
- 5-500Mbps bottleneck throughput
- 0-10% UDP on/off background traffic
Simulation traces

Single TCP flow, 5Mbps bottleneck throughput, no background traffic.

[Plots: cwnd of TCP 1 and queue size (packets) over 20 seconds, hybrid model vs. ns-2.]

slow-start, fast recovery, and congestion avoidance are accurately captured

Four competing TCP flows (starting at different times), 5Mbps bottleneck throughput, no background traffic.

[Plots: cwnd of TCP 1-4 and sizes of queues Q1, Q2 (packets) over 20 seconds, hybrid model vs. ns-2.]

the hybrid model accurately captures flow synchronization
Simulation traces

Four competing TCP flows (different propagation delays), 5Mbps bottleneck throughput, 10% UDP background traffic (exponentially distributed on/off times).

[Plots: cwnd of TCP 1-4 (propagation delays 45, 90, 135, 180ms) and sizes of queues Q1, Q3 (packets) over 20 seconds, hybrid model vs. ns-2.]

Average throughputs and RTTs

45, 90, 135, 180ms propagation delays; drop-tail queuing; 5Mbps bottleneck throughput; 10% UDP on/off background traffic; four competing TCP flows.

                 Thru.1   Thru.2   Thru.3   Thru.4   RTT1    RTT2    RTT3    RTT4
ns-2             1.873    1.184    .836     .673     .0969   .141    .184    .227
hybrid model     1.824    1.091    .823     .669     .0879   .132    .180    .223
relative error   2.6%     7.9%     1.5%     .7%      9.3%    5.9%    3.6%    2.1%

the hybrid model accurately captures TCP unfairness for different propagation delays
Empirical distributions

[Plots: empirical distributions of cwnd for TCP 1-4 and of the queue 3 size, hybrid model vs. ns-2.]

L-1 difference between the hybrid-model and ns-2 distributions:

                   dumbbell   Y-shape
cwnd 1             .71%       .34%
cwnd 2             .67%       .44%
cwnd 3             .71%       .25%
cwnd 4             .66%       .33%
bottleneck queue   1.1%       .54%

the hybrid model captures the whole distribution of congestion windows and queue sizes

Execution time

[Plot: execution time for 10 minutes of simulated time vs. bottleneck bandwidth (5-500Mbps) and number of flows (1 and 3), ns-2 vs. hybrid model.]

- ns-2 complexity approximately scales with the number of packets, i.e., with per-flow throughput
- hybrid simulator complexity is approximately insensitive to per-flow throughput
- hybrid models are particularly suitable for large, high-bandwidth simulations (satellite, fiber optics, backbone)
Conclusions

Hybrid systems provide a promising approach to model network traffic:
- they retain the low dimensionality of continuous approximations to traffic flow
- they are sufficiently expressive to represent event-based control mechanisms with high accuracy, even at small time scales
- their complexity scales inversely with throughput and RTT, making them especially useful for simulating high-bandwidth networks
- they are also amenable to formal analysis

Current and future work:
1. Construct models for other forms of congestion control, queuing policies, drop models (e.g., wireless), etc.
2. Development of a freeware simulator
3. Further validation using more complex topologies
4. Extension of these techniques to other levels of abstraction to scale up to very large networks (e.g., abstract away drops but keep discrete events such as changes in routing)