Congestion Control & Resource Allocation
- Issues in Resource Allocation
- Queuing Discipline
- TCP Congestion Control
- Reacting to Congestion
- Avoiding Congestion
- QoS Issues

Issues in Resource Allocation
Resource allocation (RA) & congestion control are two sides of the same coin:
- pre-allocate resources so as to avoid congestion; precise allocation is difficult
- control congestion if (and when) it occurs
- the bottleneck router is where both matter most
RA: the process by which network elements try to meet the competing demands that applications have for resources
- link bandwidth & buffer space
Two points of RA implementation (end users use a signaling protocol to convey requests):
- hosts at the edges of the network (transport protocol)
- routers inside the network (queuing discipline)
Underlying service model of our discussion:
- best-effort (assume for now)
- multiple qualities of service (later)

Framework
Our focus: IP connectionless datagrams + the TCP end-to-end connection abstraction, not X.25 (resource reservation at each router)
Connectionless flows (host-to-host, or process-to-process):
- a sequence of packets sent between a source/destination pair, following the same path
- routers maintain soft state per flow to make RA (queuing discipline) decisions about the flow
Soft vs. hard state:
- soft state need not always be explicitly created or removed by signaling; it is the middle ground between a purely connectionless and a purely connection-oriented network; the correct operation of the network does not depend on soft state being present
- a flow can be implicitly defined (the router inspects the addresses in the header and treats the packets as belonging to the same flow for the purpose of congestion control) or explicitly established (the source sends a flow setup message; this differs little from a connection-oriented network, but the state exists only for the purpose of resource allocation)

Taxonomy of resource allocation mechanisms (reading assignment: p462-464)
- router-centric vs. host-centric (not mutually exclusive)
- reservation-based vs.
feedback-based (a reservation-based scheme implies a router-centric resource allocation mechanism; a feedback-based scheme can be either router- or host-centric)
- window-based (e.g. TCP) vs. rate-based (an open area of research); applies to both flow control & congestion control
Note: rate-based control could be used to support video; it is the logical choice in a reservation-based system that supports different QoS levels.

More on Soft State
Evaluation Criteria
Effectiveness:
- Throughput: allow as many packets into the network as possible to increase network resource utilization; however, longer queues -> larger delay
- Power (ratio of throughput to delay)
  - assumption 1: M/M/1 queue
  - assumption 2: defined for a single connection; not clear how to extend it to multiple flows (n-dimensional?)
  - is there a better metric?
Fairness: does fair mean equal? (quantification is difficult)

Ideal Performance
Ideal goal of network utilization (assuming infinite buffers):
- throughput = full capacity even as traffic load reaches full capacity
- normalized by the maximum theoretical throughput
- Power = throughput/delay

Queuing Discipline
The queuing policy can be thought of as allocating both bandwidth (which packets get transmitted) and buffer space (which packets get discarded).
First-In-First-Out (FIFO): a scheduling discipline + a drop policy
- does not discriminate between traffic sources
- variation: priority queuing via the IP TOS field; routers maintain multiple FIFO queues, one for each priority -> needs economic pushback (e.g., charging more for higher priority)
Fair Queuing (FQ)
- explicitly segregates traffic based on flows
- ensures no flow captures more than its share of the link capacity
- to be used in conjunction with an end-to-end congestion control mechanism
- variation: weighted fair queuing (WFQ) (e.g., in DiffServ)
- packet-by-packet round-robin vs. bit-by-bit round-robin (the latter is not feasible to implement; why?)

FQ Algorithm (packet-by-packet): approximating bit-by-bit round-robin
For a single flow:
- suppose the clock ticks each time a bit is transmitted
- let P_i denote the length of packet i
- let S_i denote the time the router starts to transmit packet i
- let F_i denote the time the router finishes transmitting packet i
- F_i = S_i + P_i
When does the router start transmitting packet i?
- if it arrives before the router finishes packet i-1 from this flow, then immediately after the last bit of i-1 (at F_{i-1})
- if the flow has no packets queued, then transmission starts on arrival (call this time A_i)
Thus: F_i = MAX(F_{i-1}, A_i) + P_i
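The single-flow finish-time rule above can be sketched in a few lines; this is an illustrative example, not the book's code, and the variable names are my own.

```python
def finish_time(prev_finish, arrival, length):
    """F_i = MAX(F_{i-1}, A_i) + P_i, measured in bit-clock ticks."""
    return max(prev_finish, arrival) + length

# Back-to-back packets: the second must wait for the first to finish.
f1 = finish_time(0, 0, 100)    # F_1 = 0 + 100
f2 = finish_time(f1, 20, 50)   # arrives at 20 but starts at F_1 = 100
# After an idle period, transmission starts on arrival (A_i dominates).
f3 = finish_time(f2, 400, 30)
```

Here `f2` comes out to 150 (the packet queues behind its predecessor), while `f3` is 430 (the flow was idle, so A_i = 400 wins the MAX).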
FQ Algorithm (cont)
For multiple flows (with a slower, shared clock):
- calculate F_i for each packet that arrives on each flow
- treat all the F_i's as timestamps
- the next packet to transmit is the one with the lowest timestamp
Another version of fair queuing: byte-by-byte
[Figure: byte-by-byte fair queuing across flows A-E, each packet tagged with its finishing time (C finishes at 8, B at 16, D at 17, E at 18, A at 20); ref: book by Tanenbaum]

FQ Algorithm (cont)
FQ is work-conserving: the link is never left idle if the queue is not empty.
If the link is fully loaded with n flows sending data, no flow can use more than 1/n of the link bandwidth.

TCP Congestion Control
Flow control: a sender avoids overrunning a slow receiver.
Congestion control: a set of senders avoids overrunning the network's resources.
General congestion control: prevent or respond to network overload conditions, commonly with fairness concerns.
TCP's strategy: without reservations, react to observable events
- assumes a best-effort network (routers may run FIFO or FQ); each source determines the available network capacity for itself
- uses implicit feedback (timeout, retransmission)
- ACKs pace transmission (self-clocking)
Challenges:
- determining the available capacity in the first place
- adjusting to changes in the available capacity (dynamically)

Additive Increase/Multiplicative Decrease (AIMD)
Objective: adjust to changes in the available capacity
New state variable per connection: CongestionWindow
- limits how much data the source has in transit:
  MaxWin = MIN(CongestionWindow, AdvertisedWindow)
  EffWin = MaxWin - (LastByteSent - LastByteAcked)
- unlike AdvertisedWindow, CongestionWindow is set by the TCP source based on the observed congestion level
Idea:
- increase CongestionWindow when congestion goes down
- decrease CongestionWindow when congestion goes up
Question: how does the source determine whether or not the network is congested?
Answer: a timeout occurs
- a timeout signals that a packet was lost
- packets are seldom lost due to transmission errors
- therefore a lost packet implies congestion
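The multi-flow rule (transmit the packet with the lowest finish tag) can be sketched as below. This is a simplified illustration: a full implementation advances a shared virtual clock so that a newly active flow's tags do not lag behind; this sketch uses raw arrival times, and all names are my own.

```python
def fq_transmit_order(flows):
    """Given {flow_id: [(arrival, length), ...]}, compute per-flow tags
    F_i = max(F_{i-1}, A_i) + P_i and return (flow_id, packet_index)
    pairs in increasing-tag order, i.e. the transmission order."""
    tags = []
    for fid, pkts in flows.items():
        f_prev = 0
        for i, (arrival, length) in enumerate(pkts):
            f_prev = max(f_prev, arrival) + length
            tags.append((f_prev, fid, i))
    return [(fid, i) for _, fid, i in sorted(tags)]

# Flow B's short packets finish "earlier" in bit-by-bit terms, so its
# first packet is sent before flow A's long one.
order = fq_transmit_order({'A': [(0, 100)], 'B': [(0, 60), (0, 60)]})
```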
AIMD (cont)
Algorithm:
- increase CongestionWindow by one packet per RTT (linear increase)
- divide CongestionWindow by two whenever a timeout occurs (multiplicative decrease)
In practice, increment a little for each ACK:
- Increment = MSS * (MSS/CongestionWindow) = MSS * (1 / # of ACKs per RTT)
- CongestionWindow += Increment
Trace: sawtooth behavior

AIMD (cont)
Reason for being conservative: the consequences of too large a window are much worse than those of too small a window; e.g., dropped packets must be retransmitted, adding to the congestion.
Note: an accurate timeout mechanism is desirable; in TCP the timeout is a function of the RTT (Karn/Partridge algorithm, Jacobson/Karels algorithm).
Problem: it takes too long to ramp up a connection when it is starting from scratch (cold start).

Slow Start
Slow start increases the congestion window exponentially rather than linearly.
Idea:
- begin with CongestionWindow = 1 packet
- double CongestionWindow each RTT (increment by 1 packet for each ACK)

Slow Start (cont)
Exponential growth is still slower than sending everything at once (the original TCP behavior).
Used:
- when first starting a connection
- when the connection goes dead waiting for a timeout (the sender has more knowledge in this case)
Trace of TCP CongestionWindow: the interplay of slow start & AIMD (packet transmissions, retransmissions when no ACK arrives, timeouts)
Why "slow"? Compare with the original behavior of TCP (sending a full advertised window at once), not with the linear growth of AIMD.
Problem: the sender may lose up to half a CongestionWindow's worth of data.
Idea of the so-called "packet pair": use the gap between the ACKs to estimate capacity.
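The per-ACK window rules above (slow start below a threshold, additive increase above it, halving on timeout) can be sketched as follows. This is an illustrative simplification; real TCP also resets the threshold on timeout and restarts slow start, which is omitted here, and MSS is an arbitrary example value.

```python
MSS = 1000  # illustrative maximum segment size, in bytes

def on_ack(cwnd, ssthresh):
    """Grow the window: exponentially below ssthresh (slow start:
    +1 MSS per ACK), otherwise by MSS*MSS/cwnd per ACK, which adds
    roughly one MSS per RTT (AIMD additive increase)."""
    if cwnd < ssthresh:
        return cwnd + MSS
    return cwnd + MSS * MSS // cwnd

def on_timeout(cwnd):
    """Multiplicative decrease: halve the window (never below one MSS)."""
    return max(MSS, cwnd // 2)
```

For example, with `ssthresh = 8000`, an ACK at `cwnd = 1000` doubles progress (returns 2000), while an ACK at `cwnd = 8000` adds only 125 bytes; a timeout at 8000 drops the window to 4000.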
Fast Retransmit and Fast Recovery
Problem: coarse-grained TCP timeouts lead to idle periods (the flat parts in the previous trace)
- EffWin = MaxWin - (LastByteSent - LastByteAcked)
- if the sender has been too aggressive, once one packet is lost many may be lost, and there may not be enough duplicate ACKs to trigger fast retransmission
Fast retransmit: use duplicate ACKs to trigger retransmission; when the sender sees some number of duplicate ACKs, it retransmits the missing packet without waiting for a timeout. In TCP, the sender waits for three duplicate ACKs.
Fast recovery: reading assignment, p485
Note: for a small window size, there will not be enough packets in transit to cause enough duplicate ACKs to be delivered. Given the current 64 KB maximum advertised window size, TCP's fast retransmit mechanism is able to detect up to three dropped packets per window in practice.
Evolution: AIMD -> Slow Start -> Fast Retransmit -> Fast Recovery, each an improvement on the last.

Congestion Avoidance
TCP's strategy:
- control congestion once it happens
- repeatedly increase the load in an effort to find the point at which congestion occurs, and then back off
Alternative strategy:
- predict when congestion is about to happen
- reduce the rate before packets start being discarded
- call this congestion avoidance, as opposed to congestion control
Two possibilities:
- router-centric: put a small amount of additional functionality into the router to assist the end node in the prediction, e.g., DECbit and RED gateways
- host-centric: the effort comes purely from the end nodes, e.g., TCP Vegas

DECbit
Add a binary congestion bit to each packet header.
The router monitors the average queue length over the last busy+idle cycle plus the current cycle.
The router sets the congestion bit if the average queue length is > 1
- attempts to balance throughput against delay -> optimize the power function
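The three-duplicate-ACK trigger described above can be sketched as a small state machine. This is an illustrative sketch only; it ignores SACK, the recovery phase, and further duplicates after the retransmission fires, and the names are my own.

```python
DUP_THRESHOLD = 3  # TCP retransmits after three duplicate ACKs

def on_ack_received(ack_no, last_ack, dup_count):
    """Return (new_last_ack, new_dup_count, retransmit_now).
    A repeat of the last cumulative ACK counts as a duplicate; the
    third duplicate triggers fast retransmit of the missing packet."""
    if ack_no == last_ack:
        dup_count += 1
        return last_ack, dup_count, dup_count == DUP_THRESHOLD
    return ack_no, 0, False

# Four identical ACKs: the first sets the baseline, the next three are
# duplicates, and the third duplicate fires the retransmission.
last, dups, flags = 0, 0, []
for a in [100, 100, 100, 100]:
    last, dups, rtx = on_ack_received(a, last, dups)
    flags.append(rtx)
```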
DECbit: End Hosts
The destination copies the bit and echoes it back to the source.
The source records how many packets resulted in a set bit:
- if less than 50% of the last window's worth had the bit set, increase CongestionWindow by 1 packet
- if 50% or more of the last window's worth had the bit set, decrease CongestionWindow to 0.875 times its previous value

Random Early Detection (RED)
Similar to DECbit: each router monitors its own queue length and notifies the source once it senses congestion.
Notification is implicit (different from DECbit):
- just drop the packet (TCP will time out or see duplicate ACKs)
- used in conjunction with TCP, which currently detects congestion via timeouts or duplicate ACKs
- could be made explicit by marking the packet, as in DECbit
Early random drop:
- rather than wait for the queue to become full, drop each arriving packet with some drop probability whenever the queue length exceeds some drop level

RED Details
1) Compute the average queue length dynamically:
- AvgLen = (1 - Weight) * AvgLen + Weight * SampleLen
- 0 < Weight < 1 (usually 0.002)
- SampleLen is the queue length each time a packet arrives
Note: use the average rather than the instantaneous length because of the bursty nature of Internet traffic.

RED Details (cont)
2) Two queue length thresholds:
- if AvgLen <= MinThreshold, queue the packet
- if MinThreshold < AvgLen < MaxThreshold, calculate probability P and drop the arriving packet with probability P
- if MaxThreshold <= AvgLen, drop the arriving packet
[Figure: queue with MinThreshold and MaxThreshold marked against AvgLen]
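The averaging and threshold logic above can be sketched directly; the Weight value is the one given in the text, the thresholds are illustrative, and the drop-probability calculation itself is left to the next step.

```python
WEIGHT = 0.002  # EWMA weight from the text

def update_avg(avg_len, sample_len):
    """AvgLen = (1 - Weight) * AvgLen + Weight * SampleLen."""
    return (1 - WEIGHT) * avg_len + WEIGHT * sample_len

def red_region(avg_len, min_th, max_th):
    """Classify the arriving packet's fate by the averaged queue length."""
    if avg_len <= min_th:
        return "enqueue"
    if avg_len < max_th:
        return "probabilistic-drop"  # drop with probability P
    return "drop"
```

With Weight = 0.002 the average moves very slowly: a sample of 20 against an average of 10 nudges the average only to 10.02, which is exactly why short bursts do not trigger drops.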
RED Details (cont)
3) Computing probability P:
- TempP = MaxP * (AvgLen - MinThreshold) / (MaxThreshold - MinThreshold)
- P = TempP / (1 - count * TempP)
Note: count keeps track of how many newly arriving packets have been queued (not dropped) while AvgLen has been between the two thresholds, so drops are spread out over time.
Reading assignment: p487-p492
4) Drop probability curve

Tuning RED: A Research Area (reading assignment: p492-p493, skipped)
- the probability of dropping a particular flow's packet(s) should be roughly proportional to the share of the bandwidth that flow is currently getting -> a fairness concern
- MaxP is typically set to 0.02, meaning that when the average queue size is halfway between the two thresholds, the gateway drops roughly one out of 50 packets
- if traffic is bursty, MinThreshold should be sufficiently large to allow link utilization to be maintained at an acceptably high level
- the difference between the two thresholds should be larger than the typical increase in the calculated average queue length in one RTT; setting MaxThreshold to twice MinThreshold is reasonable for traffic on today's Internet

Source-Based Congestion Avoidance: TCP Vegas
Idea: the source watches for signs that some router's queue is building up and congestion will happen, e.g.:
- the RTT grows for each successive packet
- the average sending rate (average throughput) flattens
- the bottleneck router is where the queue builds
Algorithm:
- let BaseRTT be the minimum of all measured RTTs (commonly the RTT of the first packet)
- if the connection is not overflowing the path, ExpectedRate = CongestionWindow / BaseRTT
- the source calculates its actual sending rate (ActualRate) once per RTT
- the source compares ActualRate with ExpectedRate (note: β > α > 0):
  Diff = ExpectedRate - ActualRate
  if Diff < α (the real RTT is close to BaseRTT -> no congestion), increase CongestionWindow linearly
  else if Diff > β (the real RTT has grown -> congestion), decrease CongestionWindow linearly
  else leave CongestionWindow unchanged
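The per-RTT Vegas comparison above can be sketched as a single function. This is illustrative only: the alpha/beta defaults echo the parameter values given later in the text, the window is adjusted by one packet for simplicity, and the names are my own.

```python
def vegas_adjust(cwnd, base_rtt, actual_rate, alpha=1.0, beta=3.0):
    """Diff = ExpectedRate - ActualRate; grow the window if Diff < alpha
    (queues are empty), shrink it linearly if Diff > beta (queues are
    building), otherwise leave it alone."""
    expected_rate = cwnd / base_rtt
    diff = expected_rate - actual_rate
    if diff < alpha:
        return cwnd + 1
    if diff > beta:
        return cwnd - 1
    return cwnd
```

With a window of 10 packets and BaseRTT of 1 s, an actual rate of 9.8 pkt/s gives Diff = 0.2 (grow), 5.0 gives Diff = 5 (shrink), and 8.0 gives Diff = 2 (hold).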
Parameters: α = 1 packet/sec, β = 3 packets/sec

Trace of TCP Vegas (ExpectedRate vs. ActualRate):
- when the actual rate falls below the shaded region, decrease CongestionWindow
- when the actual rate rises above the shaded region, increase CongestionWindow

Algorithm (cont)
Although the algorithm above uses an (early) linear decrease, once a timeout occurs (hopefully it does not), Vegas falls back to multiplicative decrease.

Quality of Service
- Application Requirements
  - Real-time Audio Example
  - Taxonomy of Real-Time Applications
- Approaches to QoS Support
- Integrated Services (RSVP)
  - Service Classes
  - Overview of Mechanisms: Flowspecs, Admission Control, Packet Classifying & Scheduling, Reservation Protocol
- Differentiated Services

Real-time Applications
Require on-time delivery; assurances must come from inside the network.
Playback path: microphone -> sampler, A/D converter -> network -> buffer, D/A -> speaker
Example application (audio):
- sample voice once every 125 us (8000 Hz)
- each sample has a playback time: the time at which the sample is played back
- packets experience variable delay in the network: jitter
- add a constant offset to the playback time: the playback point
Question: how large an offset? Answer: < 300 ms
[Figure: sequence number vs. time, showing packet generation, network delay, buffering at the receiver, packet arrival, and playback]
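The playback-point arithmetic in the audio example above can be made concrete; the 100 ms offset below is an illustrative choice (the text only bounds it at under 300 ms), and the function names are my own.

```python
SAMPLE_INTERVAL_MS = 0.125   # one sample every 125 us at 8000 Hz
PLAYBACK_OFFSET_MS = 100     # assumed constant offset, below the 300 ms bound

def playback_time_ms(sample_index):
    """Playback point = generation time of the sample + the fixed offset."""
    return sample_index * SAMPLE_INTERVAL_MS + PLAYBACK_OFFSET_MS

def plays_on_time(sample_index, arrival_ms):
    """A sample is usable only if it arrives before its playback point;
    later arrivals are effectively lost despite being delivered."""
    return arrival_ms <= playback_time_ms(sample_index)
```

The offset trades latency for loss: a larger offset tolerates more jitter but pushes interactive conversation toward the 300 ms comfort limit.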
Example: Distribution of Delays for an Internet Connection
[Figure: histogram of packet delay in milliseconds for one connection; 90% of packets arrive within roughly 50 ms, with 97%, 98%, and 99% reached at successively larger delays out to 200 ms]

Taxonomy of Applications
- Real-time
  - Tolerant (of occasional loss)
    - Adaptive: delay-adaptive, rate-adaptive
    - Nonadaptive
  - Intolerant
- Elastic
  - Interactive
  - Interactive bulk
  - Asynchronous
Reading assignment: p503-505

Approaches to QoS Support
- Fine-grained approach: provide QoS to individual applications or flows (e.g., Integrated Services, associated with RSVP)
- Coarse-grained approach: provide QoS to large classes of data or aggregated traffic (e.g., Differentiated Services)

Integrated Services (RSVP): IntServ (defined by the IETF around 1995-97)
Service classes:
- guaranteed: for intolerant applications; guarantees a maximum delay
- controlled-load: for tolerant/adaptive applications, which run well on networks that are not heavily loaded, e.g., the audio application vat
Mechanisms for QoS implementation:
- flowspec: a set of information about the flow (the packets within a single application flow share common requirements)
- admission control: say no when necessary
- signaling protocol: by which users & network components exchange information such as service requests, flowspecs, and admission decisions; in ATM it is called signaling, in the Internet it is called resource reservation
- packet scheduling: how routers and switches behave based on the admission control decision

Flowspec
Rspec: describes the service requested from the network
- controlled-load
- guaranteed: delay target/bound
Tspec: describes the flow's traffic characteristics (which vary with time)
- average bandwidth + burstiness: the token bucket filter
  - token rate r: a long-term average rate
  - bucket depth B
  - must have a token to send a byte; must have n tokens to send n bytes
  - tokens accumulate at a rate of r per second, starting with no tokens
  - can accumulate no more than B tokens (a big B corresponds to large bursts)
[Figure: bandwidth (MBps) over time for two flows, A and B, with the same average rate but different burstiness]
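The token bucket filter described above can be sketched in a few lines; the class and its parameter values are illustrative, not a prescribed API.

```python
class TokenBucket:
    """Tokens accumulate at rate r per second, capped at depth B;
    sending n bytes spends n tokens (one token per byte)."""

    def __init__(self, rate, depth):
        self.rate, self.depth = rate, depth
        self.tokens, self.last = 0.0, 0.0  # start with no tokens

    def allow(self, nbytes, now):
        """Refill for the elapsed time, then spend tokens if available."""
        self.tokens = min(self.depth,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False

# rate = 100 bytes/s, depth = 200 bytes: a 50-byte send after 1 s passes,
# but an immediate 200-byte burst must wait for the bucket to refill.
tb = TokenBucket(rate=100, depth=200)
```

The depth B is what permits bursts: a flow that stays idle can later send up to B bytes back to back, which is why a big B corresponds to large variance in the instantaneous rate.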
QoS Mechanisms (reading: p509-516)
Admission Control:
- based on the Tspec & Rspec, decide if a new flow can be supported
- the answer depends on the service class
- not the same as policing (e.g., granting priority only when resources are available)
Packet Processing: how the router handles packets after the network decides to admit the traffic and has made reservations on the routers
- classification: associate each packet with the appropriate reservation (check source/destination addresses, ports, and the protocol number)
- scheduling: manage queues so each packet receives the requested service

Reservation Protocol
- called signaling in ATM
- proposed Internet standard: RSVP
- consistent with the robustness of today's connectionless model
- uses soft state (refreshed periodically, removed on timeout)
- designed to support multicast
- receiver-oriented
- two messages: PATH and RESV
- the source transmits PATH messages every 30 seconds
- the destination responds with a RESV message
- requirements are merged in the multicast case
- can specify the number of simultaneous speakers

RSVP Example: skipped
[Figure: Sender 1 and Sender 2 send PATH messages downstream through the routers; Receiver A and Receiver B send RESV messages back upstream, which are merged at the routers]

RSVP vs. ATM (Q.2931): skipped
RSVP:
- the receiver generates the reservation
- soft state (refresh/timeout)
- separate from route establishment
- QoS can change dynamically
- receiver heterogeneity
ATM:
- the sender generates the connection request
- hard state (explicit delete)
- concurrent with route establishment
- QoS is static for the life of the connection
- uniform QoS to all receivers
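The soft-state behavior above (reservations survive only while refreshes keep arriving) can be sketched as a small table. This is a hypothetical illustration: the 30-second refresh period is from the text, but the three-missed-refreshes timeout and all names are my own assumptions, not RSVP's specified values.

```python
REFRESH_PERIOD = 30            # seconds between PATH messages (from the text)
TIMEOUT = 3 * REFRESH_PERIOD   # assumed: expire after a few missed refreshes

class SoftStateTable:
    """Per-flow reservation state that silently ages out, so a router
    never needs an explicit teardown message to stay correct."""

    def __init__(self):
        self.last_refresh = {}

    def refresh(self, flow, now):
        self.last_refresh[flow] = now

    def expire(self, now):
        """Drop every reservation that has gone unrefreshed too long."""
        self.last_refresh = {f: t for f, t in self.last_refresh.items()
                             if now - t < TIMEOUT}

    def active(self, flow):
        return flow in self.last_refresh
```

This captures why soft state suits a connectionless network: if a receiver crashes or a route changes, the stale reservation simply evaporates.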
Differentiated Services (DiffServ)
Problem with Integrated Services: the scalability of per-flow state.
Idea: support a small number of classes of packets, e.g., premium vs. best-effort.
Mechanisms:
- packets carry an "in" or "out" bit; edge routers tag packets
- the DSCP (6 bits) indicates the PHB (per-hop behavior), e.g., EF (expedited forwarding) and AF (assured forwarding)
- RIO (RED with In and Out), run at the intermediate routers, is the root of AF: two RED drop curves, with thresholds (Min_in, Max_in) for "in" packets and a more aggressive pair (Min_out, Max_out) for "out" packets; shown here for two classes, but it can be generalized
[Figure: drop probability P(drop) vs. AvgLen, rising to MaxP; the "out" curve starts at Min_out, before the "in" curve at Min_in]
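The RIO idea above (two RED threshold pairs, with "out" packets dropped earlier) can be sketched by reusing RED's region test per class; the threshold values are illustrative, not from the text.

```python
# Assumed example thresholds: "out" packets face a lower, more
# aggressive (Min, Max) pair than "in" packets.
THRESH = {"in": (60, 120), "out": (20, 60)}

def rio_region(avg_len, tag):
    """RED-style regime selection, but using the arriving packet's
    in/out tag to pick which threshold pair applies."""
    lo, hi = THRESH[tag]
    if avg_len <= lo:
        return "enqueue"
    if avg_len < hi:
        return "probabilistic-drop"
    return "drop"
```

At the same average queue length of 40, an "in" packet is still enqueued while an "out" packet already risks a drop, which is how the premium class is protected under load.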