Network Performance: Queuing
EE 122: Intro to Communication Networks
Fall 2007 (WF 4-5:30 in Cory 277)
Vern Paxson
TAs: Lisa Fowler, Daniel Killebrew & Jorge Ortiz
http://inst.eecs.berkeley.edu/~ee122/
Materials with thanks to Jennifer Rexford, Ion Stoica, and colleagues at Princeton and UC Berkeley

Announcements
- Next Wednesday's lecture (wireless) will be given by Jorge
  o I will be away next Wednesday and won't have office hours that day
  o But will have my usual Friday office hour
- Reminder: Phase 1 of Project #2 due Tuesday @ 11PM
Goals of Today's Lecture
- Finish discussion of TCP performance: Window Scaling
- TCP Throughput Equation
  o Computes approximate long-running TCP performance for a given packet loss probability p
- Relationship between performance and queuing
  o Router architecture
  o FIFO queuing
  o Active Queue Management - RED
  o Explicit Congestion Notification - ECN
- Modeling of queuing systems & Little's Law

Going Fast
- Q: On a path with RTT = 100 msec, what's the absolute fastest rate that TCP can achieve?
- Q: What's the absolute largest sliding window that TCP can use?
- A: The advertised window is 16 bits, so at most 65,535 bytes
- Thus: max speed = 65,535 bytes / 100 msec ≈ 655 KB/s
- Q: How can we fix this problem?
- A: We need a larger window
- Q: How do we make the window larger?
- A: Using a TCP option
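The 16-bit window arithmetic above can be checked directly. A minimal sketch, using the numbers from the slide:

```python
# Maximum TCP throughput with the original 16-bit advertised window:
# at most 65,535 bytes can be in flight per round-trip time.
MAX_WINDOW = 2**16 - 1      # 65,535 bytes, the largest 16-bit value
rtt = 0.100                 # 100 ms round-trip time, as in the example

max_rate = MAX_WINDOW / rtt # bytes per second
print(f"max rate = {max_rate / 1000:.0f} KB/s")  # ~655 KB/s
```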
Window Scaling Option (RFC 1323)
[Figure: TCP header (ports, sequence number, acknowledgment, flags, advertised window, checksum, urgent pointer) with the 4-byte options field highlighted. HdrLen specifies 4 bytes of options; Kind=3 indicates Window Scaling; Kind=0 indicates end of options.]

  +---------+---------+---------+---------+
  | Kind=3  |Length=3 |shift.cnt| Kind=0  |
  +---------+---------+---------+---------+

Window Scaling, cont'd
- Sent in initial SYN
- If the server's SYN-ACK also includes a Window Scaling option, then scaling is in effect
  o The server including the option confirms its use
- shift.cnt specifies the scaling factor for the units used in the window advertisement
  o E.g., shift.cnt = 5 means the advertised window is in 2^5 = 32-byte units
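The shift.cnt semantics above amount to a left shift of the 16-bit advertised value; a minimal sketch (the function name is illustrative):

```python
def scaled_window(advertised_window, shift_cnt):
    """Effective window when Window Scaling is in effect: the 16-bit
    advertised value is interpreted in units of 2**shift_cnt bytes."""
    return advertised_window << shift_cnt

# shift.cnt = 5: the advertised window is in 32-byte units
print(scaled_window(1, 5))       # 32
# maximum permitted scaling (14) on the maximum 16-bit value:
print(scaled_window(65535, 14))  # 1073725440, just under 2^30 (~1 GB)
```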
Window Scaling, cont'd
- Q: Now how large can the window be?
- A: Clearly, it must not exceed 2^32
  o If it does, then we can't disambiguate data in flight
  o So, scaling factor ≤ 16
- In fact, somewhat subtle requirements limit the window to 2^30, to allow the receiver to determine whether data fits in the offered window
  o So, scaling factor ≤ 14

Window Scaling, cont'd
- Now we can go fast. Suppose a high-speed LAN, RTT = 1 msec. How fast can we transmit?
  o 1 GB/msec = 1 TB/sec
- What problem arises if packets are occasionally delayed in the network for 10 msec?
  o Sequence number wrap: can't tell earlier, delayed segments from later instances
- Fix: another TCP option to associate (high-resolution) timestamps with TCP segments
  o Essentially, adds more bits to the sequence space
  o (Side effect: no more need for the Karn/Partridge restriction not to compute RTT for ACKs of retransmitted packets)
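The wrap problem above can be quantified: at the slide's 1 TB/s rate, how long until the 32-bit sequence space repeats? A sketch:

```python
def wrap_time(rate_bytes_per_sec):
    """Seconds until the 32-bit sequence space wraps at a given rate."""
    return 2**32 / rate_bytes_per_sec

# At 1 TB/s (the 1 GB window / 1 ms RTT example), the sequence space
# wraps in ~4.3 ms -- well under the 10 ms delay in the question,
# which is why the timestamp option is needed.
print(f"{wrap_time(1e12) * 1000:.1f} ms")  # 4.3 ms
```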
Relationship of Performance & Loss
- For packets of B bytes and packet loss rate p, throughput is:

    T = (B * sqrt(1.5)) / (RTT * sqrt(p))

- Implications:
  o Long-term throughput falls as 1/RTT
  o Long-term throughput falls as 1/sqrt(p)
  o Non-TCP transport can use the equation to provide TCP-friendly congestion control

Where Does Loss (= p) Come From, Anyway?
Routers & Queuing
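The throughput equation T = sqrt(1.5) * B / (RTT * sqrt(p)) is easy to evaluate; a minimal sketch (the example numbers are illustrative, not from the slides):

```python
import math

def tcp_throughput(B, rtt, p):
    """Approximate long-run TCP throughput in bytes/sec, for packet size
    B (bytes), round-trip time rtt (sec), and loss probability p."""
    return (B * math.sqrt(1.5)) / (rtt * math.sqrt(p))

# e.g. 1500-byte packets, 100 ms RTT, 1% loss
print(f"{tcp_throughput(1500, 0.1, 0.01) / 1000:.0f} KB/s")  # ~184 KB/s
```

Note the 1/RTT and 1/sqrt(p) scalings: quadrupling the loss rate halves throughput.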
Generic Router Architecture
- Input and output interfaces are connected through an interconnect
- The interconnect can be implemented by:
  o Shared memory - low-capacity routers (e.g., PC-based routers)
  o Shared bus - medium-capacity routers
  o Point-to-point (switched) bus - high-capacity routers; packets fragmented into cells; essentially a network inside the router!
[Figure: input interfaces connected to output interfaces through an interconnect]

Shared Memory (1st Generation)
[Figure: line interfaces (MACs) attached to a shared backplane with CPU, route table, and buffer memory]
- Typically < 0.5 Gbps aggregate capacity
- Limited by the rate of the shared memory
(* Slide by Nick McKeown)
Shared Bus (2nd Generation)
[Figure: line cards, each with local buffer memory, forwarding cache, and MAC, attached to a shared bus; CPU, route table, and buffer memory on a central card]
- Typically < 5 Gb/s aggregate capacity
- Limited by the shared bus
(* Slide by Nick McKeown)

Point-to-Point Switch (3rd Generation)
[Figure: line cards, each with local buffer memory and forwarding table, attached to a switched backplane; a separate CPU card holds the routing table]
- Typically ~100 Gbps aggregate capacity
(* Slide by Nick McKeown)
What a Router Looks Like
- Cisco GSR 12416: 19" wide, 6 ft tall, 2.5 ft deep; capacity 160 Gb/s; power 4.2 kW; 8M lines of code (!) (circa year 2000)
- Juniper M160: 19" wide, 3 ft tall, 2 ft deep; capacity 80 Gb/s; power 2.6 kW
(Slide courtesy of Nick McKeown)

Input Interface
- Packet forwarding: decide to which output interface to forward each packet, based on the information in the packet header
  o Examine the packet header
  o Look up in the forwarding table
  o Update the packet header
- Question: do we send the packet to the output interface immediately?
[Figure: input interfaces connected to output interfaces through an interconnect]
Output Functions
- Buffer management: decide when and which packet to drop
- Scheduler: decide when and which packet to transmit
[Figure: packets from flows 1 and 2 enter a buffer and are served by a scheduler]

Output Queued Routers
- Only output interfaces store packets
- Advantages:
  o Easy to design algorithms: only one congestion point
- Disadvantages:
  o Requires an output speedup R_o = C × N, where N is the number of interfaces - not feasible
[Figure: backplane delivering traffic at rate R_o to output interfaces running at line rate C]
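The infeasibility of the speedup R_o = C × N is clear from small examples; a trivial sketch (port count and line rate are illustrative):

```python
def output_speedup(line_rate_gbps, n_ports):
    """Worst-case rate an output-queued router's memory must sustain:
    all N inputs may target one output simultaneously, so R_o = C * N."""
    return line_rate_gbps * n_ports

# A modest 16-port router with 10 Gb/s lines already needs memory
# (and a fabric path) running at 160 Gb/s:
print(output_speedup(10, 16))  # 160
```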
Input Queued Routers
- Input interfaces store packets
- Easier to build, since interfaces only need to run at the line rate (R = C)
  o Though they need to implement back pressure to know when to send
- But harder to build efficiently, due to contention and head-of-line blocking
[Figure: backplane connecting input interfaces (line rate C) to output interfaces (rate R)]

Head-of-Line Blocking
- A cell at the head of an input queue cannot be transferred, thus blocking the cells behind it
[Figure: three input queues and three outputs; one head cell cannot be transferred because it is blocked by another cell bound for the same output, another because the output buffer has overflowed]
- Modern high-speed routers use a combination of input & output queuing, with flow control & multiple virtual queues
5 Minute Break
Questions Before We Proceed?

Simple Queuing - FIFO and Drop Tail
- Most of today's routers
- Transmission via FIFO scheduling
  o First-in first-out queue
  o Packets transmitted in the order they arrive
- Buffer management: drop-tail
  o If the queue is full, drop the incoming packet
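The FIFO/drop-tail discipline above can be sketched in a few lines (class name and capacity are illustrative):

```python
from collections import deque

class DropTailQueue:
    """Minimal FIFO queue with drop-tail buffer management: serve in
    arrival order, and drop arrivals that find the buffer full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.q = deque()
        self.drops = 0

    def enqueue(self, pkt):
        if len(self.q) >= self.capacity:  # buffer full: drop the arrival
            self.drops += 1
            return False
        self.q.append(pkt)
        return True

    def dequeue(self):
        return self.q.popleft() if self.q else None

q = DropTailQueue(capacity=3)
for i in range(5):          # 5 back-to-back arrivals, no departures
    q.enqueue(i)
print(q.drops)              # 2: the last two arrivals were dropped
print(q.dequeue())          # 0: FIFO order
```

Note how the drops come in a burst once the buffer fills, which is exactly the behavior the next slides criticize.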
Refinements to FIFO
- Random Early Detection (RED)
- Explicit Congestion Notification (ECN)

Bursty Loss From Drop-Tail Queuing
- TCP depends on packet loss
  o Packet loss is the indication of congestion
  o In fact, TCP drives the network into packet loss by continuing to increase its sending rate
- Drop-tail queuing leads to bursty loss
  o When a link becomes congested, many arriving packets encounter a full queue
  o And, as a result, many flows perceive congestion and in fact tend to become synchronized by near-simultaneous loss
Slow Feedback from Drop Tail
- Feedback comes only when the buffer is completely full, even though the buffer has been filling for a while
- Plus, the filling buffer is increasing the RTT and the variance in the RTT
- Idea: give early feedback
  o Get one or two flows to slow down, not all of them
  o Get these flows to slow down before it is too late
  o Spread out congestion detection to break synchronization

Random Early Detection (RED)
- Basic idea of RED
  o Router notices that the queue is getting backlogged and randomly drops arriving packets to signal congestion
- Packet drop probability
  o Drop probability increases with average queue length
  o If the buffer is below some level, don't drop anything; otherwise, set the drop probability as a function of queue length
[Figure: drop probability vs. average queue length - zero below a threshold, then rising]
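The drop-probability curve described above can be sketched as a function of the average queue length. This shows only the basic ramp, using RED's conventional parameter names (min/max thresholds, max drop probability); real RED adds queue-length averaging and inter-drop spacing details:

```python
def red_drop_prob(avg_qlen, min_th, max_th, max_p):
    """RED-style drop probability: 0 below min_th, rising linearly to
    max_p at max_th, and 1 (force drop) beyond max_th."""
    if avg_qlen < min_th:
        return 0.0
    if avg_qlen >= max_th:
        return 1.0
    return max_p * (avg_qlen - min_th) / (max_th - min_th)

print(red_drop_prob(10, 20, 80, 0.1))  # 0.0  -- below the min threshold
print(red_drop_prob(50, 20, 80, 0.1))  # 0.05 -- halfway up the ramp
print(red_drop_prob(90, 20, 80, 0.1))  # 1.0  -- queue past the max threshold
```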
Properties of RED
- Drops packets before the queue is full
  o In the hope of reducing the rates of some flows
- Drops packets in proportion to each flow's rate
  o High-rate flows have more packets and, hence, a higher chance of being selected
- Drops are spaced out in time
  o Helps desynchronize TCP senders
- Tolerant of burstiness in the traffic
  o By basing the decisions on average queue length

RED In Practice
- Hard to get the tunable parameters just right
  o How early to start dropping packets?
  o What slope for the increase in drop probability?
  o What time scale for averaging the queue length?
- RED is implemented in practice
  o E.g., modern routers have a config option to turn it on
  o But it's not clear it's used (due to the tuning challenges)
  o Many variations proposed ("Blue", "FRED")
- More generally, think of RED as an example of how the network can give more refined feedback to end systems regarding the current available capacity
  o An example of Active Queue Management (AQM)
Explicit Congestion Notification
- Early dropping of packets
  o Good: gives early/refined feedback
  o Bad: costs a packet drop to give the feedback
- Explicit Congestion Notification (ECN)
  o The router instead marks the packet with an ECN bit, which the end system interprets as a sign of congestion
- Surmounting the challenges
  o Must be supported by both end hosts as well as routers
  o Requires two bits in the IP header (one for the ECN mark, one to indicate the sender understands ECN)
    - Solution: taken from the Type-Of-Service bits in the IPv4 packet header
  o Also requires 2 TCP header bits (to echo the mark back to the sender)

Queuing Theory
Queuing Example
[Figure: packets of P = 1 Kbit arrive every 0.5 ms to a link of rate R = 1 Mbps, so the transmission time P/R = 1 ms; the queue Q(t) grows by 0.5 Kb per arrival]
- Delay for a packet that arrives at time t: d(t) = Q(t)/R + P/R
  o packet 1: d(0) = 1 ms
  o packet 2: d(0.5) = 1.5 ms
  o packet 3: d(1) = 2 ms

Queuing Theory
- An enormous literature exists on mathematical modeling of queues
- Abstractions:
  o Arrivals ("customers") come to the system according to some probability distribution F of interarrival times
    - The arrival rate is designated λ (e.g., packets/sec)
  o The system has a set of servers (≥ 1)
  o Each arrival has a service time taken from another probability distribution, G
- Questions: given F, G, λ, how much delay do customers see?
- Unfortunately, most solutions require an F with only weak correlations - far from the case for networking
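The delay formula d(t) = Q(t)/R + P/R from the queuing example can be checked against the slide's numbers directly; a minimal sketch:

```python
# Per-packet delay at a FIFO link: the Q(t) bits already queued drain
# first, then the packet's own P bits are transmitted.
P = 1_000        # packet size: 1 Kbit
R = 1_000_000    # link rate: 1 Mbps

def delay_ms(Q_bits):
    return (Q_bits / R + P / R) * 1000   # convert seconds to ms

# Packets arriving every 0.5 ms find 0, 0.5 Kb, 1 Kb already queued:
print(delay_ms(0))      # 1.0 ms  (packet 1)
print(delay_ms(500))    # 1.5 ms  (packet 2)
print(delay_ms(1000))   # 2.0 ms  (packet 3)
```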
Generalized Delay Diagram
[Figure: "latest bit seen by time t" curves at the sender (point 1) and the receiver (point 2); the horizontal gap between the curves is the delay in the network/system/router]

Little's Theorem
- Assume a system where packets arrive at rate λ
- Let d be the mean delay of a packet, i.e., the mean time a packet spends in the system
- Q: What is N, the mean # of packets in the system?
  o N = the average occupancy
  o E.g., for a router, N would give the size of the queue
- A: N = λ × d
[Figure: a system with mean arrival rate λ and mean delay d]
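Little's Theorem N = λ × d can be sanity-checked against the classic M/M/1 queue formulas. These formulas (mean delay 1/(μ−λ), mean occupancy λ/(μ−λ)) come from standard queuing theory rather than the slides, and the rates are illustrative:

```python
# Check Little's Theorem, N = lambda * d, against the M/M/1 results.
lam, mu = 5.0, 8.0            # arrival rate, service rate (customers/sec)

d = 1.0 / (mu - lam)          # M/M/1 mean time in system: 1/3 sec
N = lam / (mu - lam)          # M/M/1 mean number in system: 5/3

print(N, lam * d)             # both ~1.667: Little's law holds
```

Note the theorem itself makes no distributional assumptions; M/M/1 is just a convenient case where both sides can be computed independently.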
Little's Theorem: Proof Sketch
[Figure: "latest bit seen by time t" curves at the sender (1) and receiver (2) over an interval of length T; d(i) = delay of packet i, x(t) = number of bits in transit (in the system) at time t]
- What is the system occupancy, i.e., the average number of bits in transit between 1 and 2?
- Let S = the area between the two curves over the interval T
- Average occupancy = S/T
Little's Theorem: Proof Sketch, cont'd
[Figure: the area S decomposed into one strip per packet; packet i's strip has height P (the packet size, in bits) and width d(i)]
- S = S(1) + S(2) + ... + S(N) = P × (d(1) + d(2) + ... + d(N))
- Average occupancy = S/T = (P × (d(1) + d(2) + ... + d(N))) / T
                          = ((P × N)/T) × ((d(1) + d(2) + ... + d(N))/N)
  o The first factor is the average arrival rate (in bits/sec); the second is the average delay
Little's Theorem: Proof Sketch, cont'd
- Average occupancy = (average arrival rate) × (average delay)

Summary
- TCP Throughput Equation: if the connection is not window-limited, then performance scales
  o As 1/RTT
  o As 1/sqrt(p) for loss probability p
- Router architecture: a fabric interconnects input interfaces with output interfaces
  o Implications of input-queued vs. output-queued designs
- Queuing in practice: simple FIFO queuing predominates
  o RED as exemplar of refined Active Queue Management
  o And ECN as a way to avoid using drops for feedback
- Queuing in theory
  o Little's Law relates arrival rate, delay, and # in system
- Next lecture: Quality of Service (QoS)