Adaptive or Active Queue Management Prof. C. Tschudin, M. Sifalakis, M. Monti, U. Schnurrenberger University of Basel CS321 - HS2014
Overview: Queue management inside the Internet's routers; issues arising with queue management; what is Active Queue Management; old AQM: RED, FRED, CHOKe, ARED, BLUE; Bufferbloat: the new threat of the Internet; new AQM: Codel, PIE
Inside Internet Routers The forwarding plane of every packet-handling router has: Queues + a queue management policy: allocate buffer space to incoming flows; absorbers of transient load (bursts of traffic). A scheduling discipline: allocate bandwidth to enqueued flows; ensure flow multiplexing. Collectively referred to as the router's Queue Management.
Typical Internet Queue Management E.g. single queue, FIFO schedule (+ tail-drop policy), single class of traffic. [Figure: incoming flows enter a FIFO (First In, First Out) queue of max length; with tail-drop, packets arriving at a full queue are dropped at the tail.]
Single Queue, FIFO + Tail-drop: Problems
No separation between different types of flows: aggressive flows get more packets through; bursty flows react differently than CBR flows; poor flow interleaving/mixing.
Lockout: a few flows can monopolize the queue space, not letting new flows be admitted.
Synchronization to events: end hosts react (congestion avoidance) at the same time to the same events (maybe at different scales); periodic load surges appear and become resident-periodic.
Typical Internet Queue Management E.g. multi-queue, fair schedule (+ tail-drop policy). [Figure: a hash-based classifier maps incoming flows to N queues of max length, served by (Deficit) Round Robin; any full queue tail-drops.] Fair Queueing: each incoming flow hashed to its own queue (Nagle). Stochastic FQ: all incoming flows hash-multiplexed across N queues (more realistic for core routers).
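A minimal Python sketch of the stochastic fair queueing idea above: flows are hashed onto a fixed set of queues and each full queue tail-drops. The queue count, per-queue limit and the hash salt are illustrative assumptions, not values from the slides.

```python
import hashlib
from collections import deque

N_QUEUES = 8      # illustrative number of queues
MAX_QLEN = 100    # illustrative per-queue limit (packets)

queues = [deque() for _ in range(N_QUEUES)]

def classify(src, dst, sport, dport, proto, salt=0):
    """Hash the 5-tuple (plus a salt that could be re-drawn periodically)
    to one of N queues, as in stochastic fair queueing."""
    key = f"{src}|{dst}|{sport}|{dport}|{proto}|{salt}".encode()
    return int.from_bytes(hashlib.sha1(key).digest()[:4], "big") % N_QUEUES

def enqueue(pkt, flow):
    q = queues[classify(*flow)]
    if len(q) >= MAX_QLEN:
        return False   # tail-drop: the chosen queue is full
    q.append(pkt)
    return True
```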
Some so-called optimisations
Router serves individual queues to exhaustion: play market games, credit the Deficit! (in DRR); fewer context switches for the scheduler; manufacturers can sell faster backplanes (win the benchmark).
Router employs very large queues: reduces tail-drops during bursts; on highly variable-bandwidth links it gives TCP time to exit the slow-start phase (successful flow admission); Telcos can advertise high bandwidth in their networks.
Are they really optimisations?... Or just marketing games?
What (TCP) communication endpoints see Bandwidth-delay product estimation, aka the "pipe": sender-perceived pipe capacity = (Tx rate) x (RTT/2). Adjust the transmission rate accordingly every RTT. On-path queueing affects the RTT (bigger perceived pipe size). Tail-drops are signals to slow down (reduce the Tx window). [Figure: sender sends data, receiver returns ACKs.]
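A small hedged example of the pipe estimate: it uses the standard bandwidth-delay product (rate x RTT) with made-up numbers, deliberately ignoring the factor-of-two question in the slide's formula, to show how a standing queue at the bottleneck inflates the RTT and therefore the amount of data the sender keeps in flight.

```python
# Minimal sketch with hypothetical numbers.
RATE_BPS = 10_000_000   # 10 Mbit/s bottleneck (assumed)
BASE_RTT = 0.050        # 50 ms propagation-only RTT (assumed)

def pipe_bytes(rate_bps, rtt_s):
    # standard bandwidth-delay product, in bytes
    return rate_bps / 8 * rtt_s

print(pipe_bytes(RATE_BPS, BASE_RTT))           # ~62.5 KB: the "true" pipe
# A 200 ms standing queue at the bottleneck looks like a longer path,
# so the sender keeps roughly 5x more data in flight:
print(pipe_bytes(RATE_BPS, BASE_RTT + 0.200))   # ~312.5 KB
```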
Video 1: netfpga
Issues that can emerge
As queues build up, RTT increases and congestion becomes hard to remediate (notice this has the opposite effect from the original intent): it takes longer for endpoints to react (TxWin adjustment); ACK spacing can get compressed, increasing the burstiness of transmissions.
The larger the capacity of the queue, the more detached the endpoints are from the truth about congestion, and the more likely the queues become persistent: bufferbloat (long-standing delay is treated as the norm).
Issues that can emerge
Add to this the serving of queues to exhaustion: high variability in avg. queue sizes (and thus delay); RTT estimation at the end nodes does not stabilize; likely poor flow mixing; more ACK compression.
Butterfly effects start to appear and dominate... (see video). Overall the network can become slower and slower, up to a halting point (collapse).
Video 2: Van Jacobson on flow mixing
Active Queue Management
Not all problems were realised at once. As queues build up, RTT increases: it takes longer for endpoints to react (TxWin adjustment); ACK spacing can get compressed, leading to burstiness of transmissions.
Early AQM approaches aimed to address this problem only: provide feedback about imminent congestion while the endpoint can still react fast (i.e. before the RTT increase due to the queue blocks the feedback).
Approaches that we discuss in this lecture: RED and variants, BLUE and variants, CHOKe, ECN.
Active Queue Management
Eventually the other issues became apparent: too-long queues (with resident load); high-frequency congestion-signal fluctuations (due to scheduling "optimisations").
New AQMs try to serve their purpose around these nuisances: CoDel, PIE (the ones we discuss here). Plus they overcome a big shortcoming of early approaches: the generic tuning of configuration parameters!
AQM Design Objectives
Maximize throughput: keep lines busy at all times (queues not staying empty).
Minimize delay: queues at steady state almost empty.
Serve transient load surges: queue size should reflect its ability to absorb bursts, not resident load.
No flow lockout: packet drops should affect already-admitted flows rather than newly arriving ones.
AQM as a Control System
[Figure: control loop between sender, bottleneck router queue, and receiver. The arrows are not data traffic but the embodiment of signalling (e.g. the measured parameter is not necessarily the departure rate). Labels: objective function (parameter), queue dynamics, action embedding in the queue input, feedback or lack thereof (ACKs).]
Measure at the router: routers can distinguish between propagation and persistent queueing delays; routers can decide on transient congestion based on workload.
Act at the sender: conveying the action to the sender is the big challenge!
EARLY AQM
Lock-out Problem: Easier to solve Random drop policy Packet arriving when queue is full causes some random packet to be dropped Drop front policy On full queue, drop packet at head of queue SFQ + Tail drop Same effect as Random drop Solving the lock-out problem does not address the full-queues problem
Full Queues Problem: Bigger challenge
Notify the sender before the queue becomes full (early drop). Notify = drop packets: takes > 1 RTT to be sensed. Notify = mark packets: takes <= 1 RTT but the mark can be lost. Notice that how fast the signal arrives at the sender depends on the net weather (congestion/losses).
Challenges: when to notify... or how often; whom to notify (cannot afford per-flow monitoring).
RED Model
Maintain an exponential moving average (EWMA) of the queue length. Byte mode vs. packet mode, depending on whether the Tx delay is a function of the packet length or not.
For each packet arrival:
  if (avgq < min_th)            do nothing
  if (min_th <= avgq < max_th)  calculate the probability P_m and mark(P_m, packet)
  if (avgq >= max_th)           mark(packet)
Marked packets are either dropped or ECN-flagged (discussed soon).
[Figures: a queue with min and max thresholds, showing actual vs. average queue length on packet arrival; and the drop probability vs. average queue length curve: 0 below min_th, rising linearly to P_max at max_th, then jumping to 1.0.]
RED Parameters
EWMA of the queue size, computed at every packet arrival (not periodically!):
  avgq_now = w_q * qlen_now + (1 - w_q) * avgq_prev
Special condition if the queue was idle, i.e. qlen_now = 0: treat it as if there had been 100% link utilisation with zero queue, approximating that m small packets were processed:
  m = (t_now - t_last_arrival) / pkt_size_nominal
  avgq_now = (1 - w_q)^m * avgq_prev
The packet marking probability is a function of the utilisation of the region between the thresholds:
  P_m = P_max * (avgq_now - min_th) / (max_th - min_th)
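A minimal Python sketch of the arrival-time RED logic described on the last two slides, with illustrative parameter values; it omits refinements from the full RED paper (e.g. the count-based spreading of marks) that the slides do not cover.

```python
import random
import time

W_Q, MIN_TH, MAX_TH, P_MAX = 0.002, 5, 15, 0.1   # illustrative constants
SMALL_PKT_TX_TIME = 0.001   # the slide's pkt_size_nominal: assumed service time of a small packet (s)

avgq = 0.0
last_arrival = time.monotonic()

def on_arrival(qlen):
    """Return True if the arriving packet should be marked/dropped."""
    global avgq, last_arrival
    now = time.monotonic()
    if qlen == 0:
        # idle period: decay the average as if m small packets had been served
        m = (now - last_arrival) / SMALL_PKT_TX_TIME
        avgq = (1 - W_Q) ** m * avgq
    else:
        avgq = W_Q * qlen + (1 - W_Q) * avgq
    last_arrival = now

    if avgq < MIN_TH:
        return False
    if avgq >= MAX_TH:
        return True
    p_m = P_MAX * (avgq - MIN_TH) / (MAX_TH - MIN_TH)
    return random.random() < p_m
```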
Issues with RED: configurability
avgq is an EWMA: w_q adjusts the lag, trend and window of the averaging. Short window: fast sensing, but vulnerable to transients. Long window: slow adaptation.
min_th adjusts the power of the network. Too close to 0: the queue is likely to have idle periods (bandwidth not used). Too far from 0: increases path latency and delays the feedback signal to the endpoints.
max_th - min_th adjusts the frequency of marking. Too small: the AQM becomes spasmodic in its reaction and forces flows to sync. Must be larger than the typical avgq increase in an RTT.
A visual analogy: which vessel size for which sea condition? Think of traffic bursts like the swell of the sea (height, length), and of the AQM as a speed boat or a tanker depending on the swell.
Issues with RED: configurability
Average queue size oscillation; difficult to control congestion when there are many flows, esp. unresponsive ones.
[Figure: actual vs. average queue length over time, for 8 flows and for 32 flows, oscillating between 0 and Qlen_max.]
RED & variants: FRED
Fair RED or Flow RED: fairness among flow types. Flow-type differentiation based on queue use: non-adaptive (UDP), fragile (sensitive to loss), robust.
Per-active-flow (present in the queue buffer) accounting and loss regulation: all flows are entitled to admit min_q packets without loss; adjust min_q based on the avg. per-flow occupancy (avgcq) of the queue; set the upper capacity per flow type to max_q and count violations (strikes); then for frequent violators lower max_q to avgcq (to punish them more often).
RED & variants: CHOKe
CHOose and Kill unresponsive flows... or... CHOose and Keep responsive flows.
Compare the incoming packet with a randomly selected packet in the queue; aggressive flows become more likely to be selected.
[Figure: flowchart: on packet arrival, randomly select a packet from the queue (between min and max thresholds); if the flow IDs match, both packets are dropped; if not, apply RED (variant).]
RED & variants: ARED
Adaptive RED: reduces delay variance + introduces parameter auto-tuning.
Adapt P_max periodically, slowly, and with an AIMD policy.
Fix max_th = 3 * min_th. Fix w_q = 1 - exp(-1/Link_capacity).
Goal: keep avgq roughly constant, at about (max_th + min_th)/2.
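A hedged sketch of the periodic AIMD adaptation of P_max; the target band, ALPHA and BETA constants are assumptions in the spirit of the ARED proposal, not values given on the slide.

```python
# Minimal sketch of ARED's periodic adaptation of P_max (hypothetical constants).
ALPHA, BETA = 0.01, 0.9   # additive increase / multiplicative decrease

def adapt_p_max(p_max, avgq, min_th, max_th):
    # target band around the midpoint (max_th + min_th)/2
    target_lo = min_th + 0.4 * (max_th - min_th)
    target_hi = min_th + 0.6 * (max_th - min_th)
    if avgq > target_hi and p_max <= 0.5:
        return p_max + ALPHA   # marking too gentle: increase additively
    if avgq < target_lo and p_max >= 0.01:
        return p_max * BETA    # marking too aggressive: back off multiplicatively
    return p_max
```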
RED & variants: SRED
Stabilised RED: eliminates the need for avgq (and w_q). P_m = f(inst. queue length, # of active flows, rank of flow).
Zombie list: a history of K seen flows with hit counters. On packet arrival, pick a Zombie flow randomly; if the flows match, hits++; else replace the Zombie with probability P_r.
Statistical counting of flows based on hit frequency. Rank of flow = its hit counter on a match.
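A minimal sketch of the zombie-list bookkeeping just described; the list size K and the replacement probability P_REPLACE are illustrative assumptions.

```python
import random

K, P_REPLACE = 1000, 0.25   # illustrative list size and replacement probability
zombies = [{"flow": None, "hits": 0} for _ in range(K)]

def on_arrival(flow_id):
    """Return the hit count ('rank') if the packet matched a zombie, else 0."""
    z = random.choice(zombies)
    if z["flow"] == flow_id:
        z["hits"] += 1          # a hit: frequent flows accumulate hits
        return z["hits"]
    if random.random() < P_REPLACE:
        z["flow"], z["hits"] = flow_id, 0   # a miss: sometimes recycle the slot
    return 0
```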
RED & variants: BLUE
Putting past insights in a new light: avoid the parameter-tuning nightmare; avoid the effects of avg-queue fluctuation on the AQM.
Adaptive marking probability: P_m = f(packet loss, link idle events).
  [Pkt loss]  if (t_now - t_last_update > freeze_period): P_m = P_m + d1
  [Idle link] if (t_now - t_last_update > freeze_period): P_m = P_m - d2
d1 >> d2: faster reaction to a congestion up-rise than to its decrease.
freeze_period works as a sort of A/D discretizer (step-hold): it filters out high-frequency transient oscillations and adjusts the parameter at packet arrival times mod a fixed quantum.
SFBlue uses ideas of SFQ and FRED to discriminate flows.
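A minimal sketch of BLUE's two update rules above; d1, d2 and freeze_period are illustrative values only (the slide only states d1 >> d2).

```python
import time

D1, D2, FREEZE_PERIOD = 0.02, 0.002, 0.1   # illustrative step sizes and hold time (s)

p_m = 0.0
last_update = 0.0

def on_loss():
    """Called when the queue overflows (congestion signal)."""
    global p_m, last_update
    now = time.monotonic()
    if now - last_update > FREEZE_PERIOD:
        p_m = min(1.0, p_m + D1)   # react quickly to congestion
        last_update = now

def on_idle():
    """Called when the link goes idle (queue empty)."""
    global p_m, last_update
    now = time.monotonic()
    if now - last_update > FREEZE_PERIOD:
        p_m = max(0.0, p_m - D2)   # back off slowly
        last_update = now
```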
Explicit Congestion Notification
Works with TCP traffic. Instead of packet dropping, packet marking. An old idea, called the DEC-bit, from DECnet (an early-day TCP/IP competitor).
[Figure: with packet dropping, the sender transmits segments 1..7, one is lost, and the receiver's ACKs stall as duplicates (1 2 2 2 2 2); with ECN, all segments arrive (marked) and the ACKs progress normally (1 2 3 4 5 6 7).]
ECN: how marking works
At the IP header: a signal from the router to the receiver. The former TOS byte holds the 6-bit Differentiated Services (DS) field plus a 2-bit ECN field. ECT: ECN-Capable Transport; CE: Congestion Experienced.
  ECT CE  Interpretation
   0   0  Not-ECT (Not ECN-Capable Transport)
   0   1  ECT(1) (ECN-Capable Transport (1))
   1   0  ECT(0) (ECN-Capable Transport (0))
   1   1  CE (Congestion Experienced)
[Figure: IPv4 header layout with the DS/ECN byte highlighted.]
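A small illustration of the codepoints in the table above, treated as the two low-order bits of the former IPv4 TOS byte (per RFC 3168); the helper functions are made up for this sketch and are not part of any packet library.

```python
# ECN codepoints in the low two bits of the TOS/DS byte
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

def ecn_codepoint(tos_byte: int) -> int:
    return tos_byte & 0b11

def mark_ce(tos_byte: int) -> int:
    """What an AQM router does instead of dropping: set CE, keep the DS bits."""
    if ecn_codepoint(tos_byte) == NOT_ECT:
        raise ValueError("flow is not ECN-capable; the router must drop instead")
    return (tos_byte & 0b11111100) | CE

tos = (0b000000 << 2) | ECT0   # default DS class, ECN-capable transport
print(bin(mark_ce(tos)))       # 0b11 -> Congestion Experienced
```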
ECN: how marking works
At the TCP header: a signal from the receiver back to the sender, using two bits of the former reserved field next to the standard flags (URG, ACK, PSH, RST, SYN, FIN). CWR: Congestion Window Reduced flag; ECE: ECN-Echo flag.
[Figure: TCP header layout with the CWR and ECE bits highlighted.]
ECN vs. packet drop as a feedback signal
Packet drop is effective even with full queues, while ECN only makes sense before the queues get full.
Packet drop: 3 DUP ACKs or a timeout before the sender acts; ECN delivers the feedback faster.
Packet drop => retransmissions; a "Judas kiss": communicating a signal through an impairment (B. Briscoe). ECN is just a signal: it yields better goodput.
MODERN AQM
Bufferbloat: The new Internet threat?
What is it? A constant residue of packets in queues that never goes away; it adds a constant delay component to the e2e path latency.
Where does it come from? A combination of the following: senders transmit at a higher rate than the bottleneck link can sustain; excessively large queues increase the e2e RTT and delay TCP's feedback; the sender's response is phase-shifted with respect to the congestion epoch.
Why is it a threat? Excessive delays become resident even on high-speed networks; it confuses TCP's flow/congestion control algorithm.
Solutions? Modern AQM tries to address primarily this, plus the old problems.
Illustrating Bufferbloat
[Figure: three queue-occupancy-over-time sketches: a good queue operating at link speed (rate); an also-good oscillating queue (typical of a delayed-ACK scheme, or when serving synchronised flows); and a bloated queue that cannot get rid of its resident load: bufferbloat.]
Illustrating Bufferbloat (cont.)
The oscillating good queue and the bloated queue can both have the same constant average length of N packets over an RTT! What distinguishes them, however, is the average minimum length, taken over a large enough window ("large enough" being on the order of 1 RTT or more).
Codel: Controlled delay
A time-based model of the queue dynamics instead of a spatial one.
Objective: monitor how long the minimum queue length remains above a threshold (the desired minimum). Less descriptive than the avg. minimum queue length, yet sufficient, and computationally simpler.
Metric: sojourn time as a measure of the instantaneous queue length, i.e. how long packets stay in the queue: the time delta between a packet's departure and arrival times.
Works with a single queue or multiple queues. Works for variable link rates (e.g. wireless links), well, statistically speaking! Simple to measure, easy to implement.
Codel: Controlled delay
On packet arrival: timestamp(packet)
On packet departure:
  sojourn = now - packet.tstamp
  if (sojourn < Target)
    if (drp_mode == 1)
      drp_mode = 0
      exit_drp = now
    if (now - exit_drp >= Interval)
      drp_count = 0
  else                                      // sojourn >= Target
    if (drp_mode == 0 && now - exit_drp < Interval)
      drp_mode = 1                          // resume last drop rate
      if (now >= next_drp)
        drp(packet)
        drp_count++
        next_drp = now + Interval/sqrt(drp_count)
    else if (drp_mode == 0)                 // start dropping
      drp_mode = 1
      drp(packet)
      drp_count = 1
      next_drp = now + Interval/sqrt(drp_count)
    else                                    // already in drp_mode
      if (now >= next_drp)
        drp(packet)
        drp_count++
        next_drp = now + Interval/sqrt(drp_count)
Notes: 1. The dropping rate increases linearly with the measured RTT. 2. If the last epoch of drops is not too long in the past, the last good drop rate is also remembered!
Codel: Controlled delay (walk-through of the pseudocode above)
Sojourn falls below Target: leave drop mode (drp_mode = 0) and remember exit_drp.
Only reset the drop-rate memory (drp_count = 0) if the sojourn stays below Target for a whole Interval.
Sojourn above Target again after a temporary improvement: resume the last drop rate.
Sojourn above Target for the first time: after an Interval, start dropping.
Sojourn continues to remain above Target: continue dropping.
Codel: Controlled delay
Significantly less configuration magic involved:
Interval: const (~1 RTT).
Target (delay): const, max{ the equivalent of 1-2 packets' worth of queue, 5% of the worst-case RTT }.
Drop/Mark rate: constant acceleration within an Interval, following an inverse-square-root progression, i.e. a linear increase of drops per RTT; the dropping speed-up is independent of the queue accumulation speed!?
fq_codel combines Codel with SFQ: it treats different traffic classes fairly and gives starting flows a good head start.
Sojourn measurement does require blocking of the queue (by contrast to queue-length averaging); Drop/Mark at the head of the queue, not the tail.
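A tiny numeric sketch of the inverse-square-root drop spacing mentioned above; the Interval value is illustrative. Successive drops within a drop epoch are spaced Interval/sqrt(count) apart, so the drop rate grows linearly with the time spent above Target.

```python
from math import sqrt

INTERVAL = 0.100   # ~1 RTT (100 ms), assumed

def next_drop_times(start, n):
    t, count, times = start, 0, []
    for _ in range(n):
        count += 1
        t += INTERVAL / sqrt(count)   # drops get closer together
        times.append(round(t, 3))
    return times

print(next_drop_times(0.0, 6))
# e.g. [0.1, 0.171, 0.228, 0.278, 0.323, 0.364]
```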
PIE: Proportional Integral Enhanced
On packet arrival: decide the packet's fate
  mark/drop(P_drop, pkt)
On packet departure: estimate the output (drain) rate
  if (q_len > pkt_threshold)
    byte_count = byte_count + pkt_bytes          // start counting bytes contributing to bufferbloat once the threshold is reached
    if (byte_count > pkt_threshold)
      inst_rate = byte_count / (now - last)      // once the bufferbloat in bytes is counted, compute the queue drain rate
      avg_rate = (1-w)*avg_rate + w*inst_rate    // exponentially weighted moving average of the rate
      last = now
      byte_count = 0
On Interval expiration (periodically): update the drop probability
  q_delay = interval * q_len / avg_rate          // Little's law
  P_drop = P_drop + a*(q_delay - ref_delay)      // deviation from the desired delay
                  + b*(q_delay - q_delay_old)    // delay change within one interval
  q_delay_old = q_delay
  Auto-tuning of the gains:
  if (P_drop < 1%)        a = A/8, b = B/8
  else if (P_drop < 10%)  a = A/2, b = B/2
  else                    a = A,   b = B
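A small numeric sketch of the periodic P_drop update above; the gains A, B and the reference delay are hypothetical values chosen only to show the gain scheduling and the proportional/derivative terms at work.

```python
A, B = 0.125, 1.25      # assumed base gains, scaled down while P_drop is small
REF_DELAY = 0.020       # 20 ms target queueing delay (assumed)

def update_p_drop(p_drop, q_delay, q_delay_old):
    # gain scheduling: be gentler while the drop probability is still small
    if p_drop < 0.01:
        a, b = A / 8, B / 8
    elif p_drop < 0.1:
        a, b = A / 2, B / 2
    else:
        a, b = A, B
    p_drop += a * (q_delay - REF_DELAY) + b * (q_delay - q_delay_old)
    return min(max(p_drop, 0.0), 1.0)

# Example: the delay sits above target and keeps growing -> probability ramps up.
p = 0.0
for q_old, q_new in [(0.020, 0.030), (0.030, 0.040), (0.040, 0.040)]:
    p = update_p_drop(p, q_new, q_old)
    print(round(p, 4))
```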
PIE: Proportional Integral Enhanced
Like Codel, it controls delay instead of queue length.
Drops at the tail of the queue to save buffer space, instead of at the head (Codel).
3 modes of operation for 3 different traffic classes: adjustment of the parameters a, b; quite a lot of other magic numbers (in contrast to Codel).
Queue delay is predicted from the queue size and the smoothed output rate, instead of actually measured (Codel).
The drop probability takes into account the deviation from the nominal value and corrects/improves the effect of the previous action (direction/magnitude of change), instead of the binary accelerate-or-switch-off of Codel.
Some links on Bufferbloat
How can I tell if I'm suffering from bufferbloat? http://gettys.wordpress.com/2010/12/06/whose-house-is-of-glasse-must-not-throw-stones-at-another/
Can I do anything personally to reduce my suffering from bufferbloat? http://gettys.wordpress.com/2010/12/13/mitigations-and-solutions-of-bufferbloat-in-home-routers-and-operating-systems/
Bufferbloat triggered the network neutrality debate: http://gettys.wordpress.com/2010/12/07/bufferbloat-and-network-neutrality-back-to-the-past/
Questions?