CS 194: Distributed Systems
Resource Allocation

Scott Shenker and Ion Stoica
Computer Science Division
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley
Berkeley, CA 94720-1776

Goals and Approach
Goal: achieve predictable performance
Three steps:
  1) Estimate the application's resource needs (not in this lecture)
  2) Admission control
  3) Resource allocation

Type of Resources
CPU
Storage: memory, disk
Bandwidth
Devices (e.g., video camera, speakers)
Others:
  - File descriptors
  - Locks

Allocation Models
Shared: multiple applications can share the resource
  - E.g., CPU, memory, bandwidth
Non-shared: only one application can use the resource at a time
  - E.g., devices

Not in this Lecture
How applications determine their resource needs
How users pay for resources and how they negotiate resources
Dynamic allocation, i.e., the application allocates resources as it needs them

In this Lecture
Focus on bandwidth allocation
  - CPU similar
  - Storage allocation usually done in fixed chunks
Assume the application requests all resources at once
Two Models
Integrated Services
  - Fine grained allocation; per-flow allocation
Differentiated Services
  - Coarse grained allocation (both in time and space)

Integrated Services Example
Achieve per-flow bandwidth and delay guarantees
  - Example: guarantee 1 MBps and < 100 ms delay to a flow
Flow: a stream of packets between two applications or endpoints

Integrated Services Example: Control Path
Allocate resources: perform per-flow admission control
Install per-flow state

Integrated Services Example: Data Path
Per-flow classification
Integrated Services Example: Data Path
Per-flow buffer management
Per-flow scheduling

How Things Fit Together
[Figure: router architecture. Control plane: routing messages feed Routing, which fills the Forwarding Table; RSVP messages feed RSVP and Admission Control, which fill the Per Flow QoS Table. Data plane: Data In -> Route Lookup -> Classifier -> Scheduler -> Data Out.]

Service Classes
Multiple service classes
Service: contract between network and communication client
  - End-to-end service
  - Other service scopes possible
Three common services
  - Best-effort ("elastic" applications)
  - Hard real-time ("real-time" applications)
  - Soft real-time ("tolerant" applications)

Hard Real Time: Guaranteed Services
Service contract
  - Network to client: guarantee a deterministic upper bound on delay for each packet in a session
  - Client to network: the session does not send more than it specifies
Algorithm support
  - Admission control based on worst-case analysis
  - Per-flow classification/scheduling at routers

Soft Real Time: Controlled Load Service
Service contract
  - Network to client: performance similar to that of an unloaded best-effort network
  - Client to network: the session does not send more than it specifies
Algorithm support
  - Admission control based on measurement of aggregates
  - Scheduling for aggregates possible
Role of RSVP in the Architecture
Signaling protocol for establishing per-flow state
Carry resource requests from hosts to routers
Collect needed information from routers to hosts
At each hop
  - Consult admission control and policy module
  - Set up admission state or inform the requester of failure

RSVP Design Features
IP Multicast centric design (not discussed here)
Receiver initiated reservation
Different reservation styles
Soft state inside network
Decouple routing from reservation

RSVP Basic Operations
Sender: sends PATH message via the data delivery path
  - Set up the path state at each router, including the address of the previous hop
Receiver: sends RESV message on the reverse path
  - Specifies the reservation style and the QoS desired
  - Set up the reservation state at each router
Things to notice
  - Receiver initiated reservation
  - Decouple routing from reservation
  - Two types of state: path and reservation

Route Pinning
Problem: asymmetric routes
  - You may reserve resources on R -> S3 -> S5 -> S4 -> S1 -> S, but data travels on S -> S1 -> S2 -> S3 -> R!
Solution: use PATH to remember the direct path from S to R, i.e., perform route pinning
[Figure: sender S, routers S1 to S5 and receiver R, showing the IP routing path, the PATH messages and the RESV messages]

PATH and RESV messages
PATH also specifies
  - Source traffic characteristics: use token bucket
  - Reservation style: specifies whether a RESV message will be forwarded to this server
RESV specifies
  - Queueing delay and bandwidth requirements
  - Source traffic characteristics (from PATH)
  - Filter specification, i.e., which senders can use the reservation
  - Based on these, routers perform the reservation

Token Bucket and Arrival Curve
Parameters
  - r: average rate, i.e., rate at which tokens fill the bucket
  - b: bucket depth
  - R: maximum link capacity or peak rate (optional parameter)
A bit is transmitted only when there is an available token
Arrival curve: maximum number of bits transmitted within an interval of time of size t
[Figure: token bucket filled at r bps, depth b bits, followed by a regulator limiting output to <= R bps. The arrival curve has slope R up to b*R/(R-r) bits, then slope r.]
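The arrival curve of an (r, b, R) token bucket can be written as a small function. This is a minimal sketch, assuming the usual fluid reading of the figure: over any interval of length t the source sends at most min(R*t, b + r*t) bits. The function names and the sample values (r = 1, b = 1, R = 2) are illustrative and not part of the slides.

```python
def arrival_curve(t, r, b, R=None):
    """Upper bound on the bits an (r, b, R) token bucket lets through
    in any interval of length t (any consistent units)."""
    if t < 0:
        raise ValueError("interval length must be non-negative")
    burst_bound = b + r * t          # full bucket plus tokens earned during t
    if R is None:                    # no peak-rate regulator
        return burst_bound
    return min(R * t, burst_bound)   # peak rate limits short intervals


def knee(r, b, R):
    """Point where the curve's slope changes from R to r:
    (b/(R-r), b*R/(R-r)). Requires R > r."""
    t_knee = b / (R - r)
    return t_knee, R * t_knee


if __name__ == "__main__":
    for t in (0.5, 1, 2, 3):
        print(t, arrival_curve(t, r=1, b=1, R=2))
    print("knee:", knee(r=1, b=1, R=2))
```

The knee height b*R/(R-r) is the label shown on the vertical axis of the arrival-curve figure.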
How Is the Token Bucket Used?
Can be enforced by
  - End-hosts (e.g., cable modems)
  - Routers (e.g., ingress routers in a Diffserv domain)
Can be used to characterize the traffic sent by an end-host

Traffic Enforcement: Example
r = 100 Kbps; b = 3 Kb; R = 500 Kbps
  (a) T = 0: bucket holds 3 Kb; 1 Kb packet arrives
  (b) T = 2 ms: packet transmitted; bucket = 3 Kb - 1 Kb + 2 ms * 100 Kbps = 2.2 Kb
  (c) T = 4 ms: bucket holds 2.4 Kb; 3 Kb packet arrives
  (d) T = 10 ms: bucket holds 3 Kb; the packet needs to wait until enough tokens are in the bucket!
  (e) T = 16 ms: packet transmitted; bucket holds 0.6 Kb

Source Traffic Characterization
Arrival curve: maximum amount of bits transmitted during an interval of time t
Use token bucket to bound the arrival curve

Source Traffic Characterization: Example
(R = 2, b = 1, r = 1)
[Figure: source rate in bps versus time, and the resulting arrival curve in bits versus interval length t]

QoS Guarantees: Per-hop Reservation
End-host: specify
  - The arrival rate, characterized by a token bucket with parameters (b, r, R)
  - The maximum admissible delay D
Router: allocate bandwidth r_a and buffer space B_a such that
  - No packet is dropped
  - No packet experiences a delay larger than D
[Figure: arrival curve (slope R up to b*R/(R-r) bits, then slope r) against a service line of slope r_a; B_a is the maximum vertical distance between the two, D the maximum horizontal distance]

End-to-End Reservation
When the receiver R gets the PATH message it knows
  - Traffic characteristics (tspec): (r, b, R)
  - Number of hops
R sends back this information + worst-case delay in RESV
Each router along the path provides a per-hop delay guarantee and forwards RESV with updated info
  - In the simplest case routers split the delay
[Figure: path S -> S1 -> S2 -> S3 -> R. PATH carries (b, r, R) plus the number of hops and reaches R as (b, r, R, 3). RESV carries (b, r, R, num hops, worst-case delay): it leaves R as (b, r, R, 3, D), continues as (b, r, R, 2, D-d1) and (b, r, R, 1, D-d1-d2), and reaches S as (b, r, R, 0, 0).]
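The traffic enforcement example (r = 100 Kbps, b = 3 Kb, R = 500 Kbps) can be replayed with a short simulation. This is a sketch under assumptions read off the example's numbers rather than stated on the slides: the bucket starts full, a packet starts only once the bucket holds tokens for the whole packet, it is then sent at the peak rate R, and tokens keep accumulating (capped at b) while it is being sent. The function name `enforce` is hypothetical. Rates are in bits per millisecond, which equals Kbps.

```python
def enforce(packets, r, b, R):
    """Replay a token-bucket regulator.

    packets: list of (arrival_ms, size_bits) in arrival order.
    r, R: token rate and peak rate in bits/ms (numerically Kbps).
    b: bucket depth in bits.
    Returns one (start_ms, finish_ms, tokens_after_bits) per packet.
    """
    tokens = float(b)   # assumption: the bucket starts full
    now = 0.0           # time at which `tokens` was last updated
    result = []
    for arrival, size in packets:
        if size > b:
            raise ValueError("a packet larger than the bucket can never be sent")
        start = max(arrival, now)                     # wait for the previous packet
        tokens = min(b, tokens + r * (start - now))   # refill while idle
        if tokens < size:                             # wait for missing tokens
            start += (size - tokens) / r
            tokens = float(size)
        duration = size / R                           # sent at the peak rate
        finish = start + duration
        tokens = min(b, tokens - size + r * duration)
        now = finish
        result.append((start, finish, tokens))
    return result


if __name__ == "__main__":
    for start, finish, left in enforce([(0, 1000), (4, 3000)], r=100, b=3000, R=500):
        print(f"start {start} ms, finish {finish} ms, bucket {left / 1000} Kb")
```

With these inputs the first packet finishes at 2 ms leaving 2.2 Kb, and the second waits until 10 ms and finishes at 16 ms leaving 0.6 Kb, matching panels (b), (d) and (e).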
Differentiated Services (Diffserv)
Built around the concept of a domain
Domain: a contiguous region of the network under the same administrative ownership
Differentiate between edge and core routers
Edge routers
  - Perform per-aggregate shaping or policing
  - Mark packets with a small number of bits; each bit encoding represents a class (subclass)
Core routers
  - Process packets based on packet marking
Far more scalable than Intserv, but provides weaker services

Diffserv Architecture
Ingress routers
  - Police/shape traffic
  - Set the Differentiated Service Code Point (DSCP) in the Diffserv (DS) field
Core routers
  - Implement a Per Hop Behavior (PHB) for each DSCP
  - Process packets based on DSCP
[Figure: two domains, DS-1 and DS-2, each with ingress and egress edge routers surrounding core routers]

Differentiated Services
Two types of service
  - Assured service
  - Premium service
Plus, best-effort service

Assured Service [Clark & Wroclawski 97]
Defined in terms of a user profile: how much assured traffic a user is allowed to inject into the network
Network: provides a lower loss rate than best-effort
  - In case of congestion best-effort packets are dropped first
User: sends no more assured traffic than its profile
  - If it sends more, the excess traffic is converted to best-effort

Assured Service
Large spatial granularity service
Theoretically, the user profile is defined irrespective of destination
  - All other services we have learnt are end-to-end, i.e., we know the destination(s) a priori
This makes the service very useful, but hard to provision (why?)
[Figure: a traffic profile applied at the ingress router]
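The assured-service rule (traffic within the profile stays assured, excess is converted to best-effort) can be sketched as a marker at the ingress router. The slides do not say how the profile is expressed; this sketch assumes, purely for illustration, that it is a token bucket with a rate and a burst depth. The function name, parameters and sample traffic are all hypothetical.

```python
def mark_assured(packets, rate, depth):
    """Mark each packet 'assured' if it fits the user's profile,
    otherwise 'best-effort'.

    packets: list of (arrival_time, size) in arrival order.
    rate, depth: ASSUMED token-bucket form of the profile
                 (size units per time unit, and maximum burst).
    """
    tokens = depth      # assumption: the profile starts with a full bucket
    last = None         # arrival time of the previous packet
    marks = []
    for arrival, size in packets:
        if last is not None:
            tokens = min(depth, tokens + rate * (arrival - last))
        last = arrival
        if size <= tokens:
            tokens -= size                 # in profile: consume tokens
            marks.append("assured")
        else:
            marks.append("best-effort")    # excess: demoted, not dropped
    return marks


if __name__ == "__main__":
    traffic = [(0, 1000), (1, 500), (10, 500)]   # (time in ms, size in bits)
    print(mark_assured(traffic, rate=100, depth=1000))
```

Out-of-profile packets are demoted rather than dropped or delayed, which is what lets the network drop them first under congestion.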
Premium Service [Jacobson 97]
Provides the abstraction of a virtual pipe between an ingress and an egress router
Network: guarantees that premium packets are not dropped and that they experience low delay
User: does not send more than the size of the pipe
  - If it sends more, excess traffic is delayed, and dropped when the buffer overflows

Edge Router
[Figure: ingress edge router. Data traffic enters a Classifier (per-aggregate classification, e.g., per user), which feeds Traffic conditioner Class 1, Traffic conditioner Class 2 and a Best-effort path; the marked traffic then goes to the Scheduler.]

Assumptions
Assume two bits
  - P-bit denotes premium traffic
  - A-bit denotes assured traffic
Traffic conditioner (TC) implements
  - Metering
  - Marking
  - Shaping

Control Path
Each domain is assigned a Bandwidth Broker (BB)
  - Usually used to perform ingress-egress bandwidth allocation
BB is responsible for performing admission control in the entire domain
BB is not easy to implement
  - Requires complete knowledge about the domain
  - Single point of failure, may be a performance bottleneck
  - Designing a BB is still a research problem

Example
Achieve an end-to-end bandwidth guarantee
[Figure: sender and receiver connected across three domains, each with a Bandwidth Broker (BB) and a profile; the signaling steps are numbered 1 to 9]

Comparison to Best-Effort and Intserv
Service
  - Diffserv: per-aggregate isolation; per-aggregate guarantee
  - Intserv: per-flow isolation; per-flow guarantee
Service scope
  - Diffserv: domain
  - Intserv: end-to-end
Complexity
  - Diffserv: long-term setup
  - Intserv: per-flow setup
Scalability
  - Diffserv: scalable (edge routers maintain per-aggregate state; core routers per-class state)
  - Intserv: not scalable (each router maintains per-flow state)
Weighted Fair Queueing (WFQ)
The scheduler of choice to implement bandwidth and CPU sharing
Implements max-min fairness: each flow receives min(r_i, f), where
  - r_i: flow arrival rate
  - f: link fair rate (see next slide)
Weighted Fair Queueing (WFQ): associate a weight with each flow

Fair Rate Computation: Example 1
If the link is congested, compute f such that $\sum_i \min(r_i, f) = C$
Flows with arrival rates 8, 6 and 2; f = 4:
  min(8, 4) = 4
  min(6, 4) = 4
  min(2, 4) = 2
(the allocations sum to C = 10)

Fair Rate Computation: Example 2
Associate a weight w_i with each flow i
If the link is congested, compute f such that $\sum_i \min(r_i, f \cdot w_i) = C$
Same flows with weights w_1 = 3, w_2 = 1, w_3 = 1; f = 2:
  min(8, 2*3) = 6
  min(6, 2*1) = 2
  min(2, 2*1) = 2
Flow i is guaranteed to be allocated a rate >= $w_i C / \sum_k w_k$
If $\sum_k w_k \le C$, flow i is guaranteed to be allocated a rate >= w_i

Fluid Flow System
Flows can be served one bit at a time
WFQ can be implemented using bit-by-bit weighted round robin
  - During each round, from each flow that has data to send, send a number of bits equal to the flow's weight

Fluid Flow System: Example 1
                             Flow 1 (w_1 = 1)   Flow 2 (w_2 = 1)
Packet size (bits)                 1000               500
Packet inter-arrival time (ms)       10                10
Rate (Kbps)                         100                50
[Figure: arrival traffic of Flow 1 (packets 1 to 5) and Flow 2 (packets 1 to 6), and the service each receives in the fluid flow system on a link of capacity C, over time 0 to 80 ms. Area (C x transmission_time) = packet size.]

Fluid Flow System: Example 2
The red flow sends packets between time 0 and 10
  - Backlogged flow: the flow's queue is not empty
Other flows send packets continuously
All packets have the same size
[Figure: six flows with weights 5, 1, 1, 1, 1, 1 sharing a link; time axis 0, 2, 4, 6, 8, 10, 15]
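The fair rate f in both fair-rate examples can be computed by satisfying the flows with the smallest demand per unit weight first. This is a minimal sketch of one standard way to solve $\sum_i \min(r_i, f \cdot w_i) = C$; the slides give only the condition, not an algorithm, and the function names are illustrative.

```python
def fair_rate(rates, weights, capacity):
    """Return f such that sum(min(r_i, f * w_i)) == capacity,
    or None if the link is not congested."""
    if sum(rates) <= capacity:
        return None                       # every flow simply gets r_i
    # Visit flows by increasing demand per unit weight.
    order = sorted(range(len(rates)), key=lambda i: rates[i] / weights[i])
    cap_left = capacity
    weight_left = sum(weights)
    for i in order:
        f = cap_left / weight_left        # fair rate if all remaining flows are limited
        if rates[i] / weights[i] >= f:    # this flow, and all later ones, are limited by f
            return f
        cap_left -= rates[i]              # flow i is fully satisfied
        weight_left -= weights[i]
    raise AssertionError("unreachable when the link is congested")


def allocations(rates, weights, capacity):
    """Per-flow rates min(r_i, f * w_i)."""
    f = fair_rate(rates, weights, capacity)
    if f is None:
        return list(rates)
    return [min(r, f * w) for r, w in zip(rates, weights)]


if __name__ == "__main__":
    print(fair_rate([8, 6, 2], [1, 1, 1], 10), allocations([8, 6, 2], [1, 1, 1], 10))
    print(fair_rate([8, 6, 2], [3, 1, 1], 10), allocations([8, 6, 2], [3, 1, 1], 10))
```

With equal weights this reproduces f = 4 and allocations 4, 4, 2; with weights 3, 1, 1 it reproduces f = 2 and allocations 6, 2, 2.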
Implementation In Packet System
Packet (real) system: packet transmission cannot be preempted
Solution: serve packets in the order in which they would have finished being transmitted in the fluid flow system

Packet System: Example 1
Select the first packet that finishes in the fluid flow system
[Figure: service in the fluid flow system over time (ms), and below it the resulting transmission order in the packet system]

Packet System: Example 2
Select the first packet that finishes in the fluid flow system
[Figure: service in the fluid flow system and in the packet system; time axis 0, 2, 4, 6, 8, 10]

Implementation Challenge
Need to compute the finish time of a packet in the fluid flow system, but the finish time may change as new packets arrive!
Need to update the finish times of all packets that are in service in the fluid flow system when a new packet arrives
  - But this is very expensive; a high-speed router may need to handle hundreds of thousands of flows!

Example
Four flows, each with weight 1
[Figure: Flows 1 to 4. Finish times as first computed at time 0 (axis 0 to 3), and as re-computed later (axis 0 to 4).]

Solution: Virtual Time
Key observation: while the finish times of packets may change when a new packet arrives, the order in which packets finish doesn't!
  - Only the order is important for scheduling
Solution: instead of the packet finish time, maintain the number of rounds needed to send the remaining bits of the packet (virtual finishing time)
  - The virtual finishing time doesn't change when a new packet arrives
System virtual time: index of the round in the bit-by-bit round robin scheme
System Virtual Time: V(t)
Measure service, instead of time
V(t) slope: normalized rate at which every backlogged flow receives service in the fluid flow system
  - C: link capacity
  - N(t): total weight of backlogged flows in the fluid flow system at time t

$$\frac{\partial V(t)}{\partial t} = \frac{C}{N(t)}$$

System Virtual Time (V(t)): Example 1
V(t) increases inversely proportionally to the sum of the weights of the backlogged flows
[Figure: Flow 1 (w_1 = 1) and Flow 2 (w_2 = 1); V(t) has slope C/2 while both flows are backlogged and slope C while only one is]

System Virtual Time: Example
w_1 = 4, w_2 = 1, w_3 = 1, w_4 = 1, w_5 = 1
[Figure: V(t) over time 0 to 16 with successive slopes C/4, C/8, C/4]

Fair Queueing Implementation
Define
  - $F_i^k$: virtual finishing time of packet k of flow i
  - $a_i^k$: arrival time of packet k of flow i
  - $L_i^k$: length of packet k of flow i
  - $w_i$: weight of flow i
The finishing time of packet k+1 of flow i is

$$F_i^{k+1} = \max\left(V(a_i^{k+1}),\, F_i^k\right) + \frac{L_i^{k+1}}{w_i}$$

Properties of WFQ
Guarantees that any packet is transmitted within packet_length/link_capacity of its transmission time in the fluid flow system
  - Can be used to provide guaranteed services
Achieves max-min fair allocation
  - Can be used to protect well-behaved flows against malicious flows

Hierarchical Link Sharing
Resource contention/sharing at different levels
Resource management policies should be set at different levels, by different entities
  - Resource owner
  - Service providers
  - Organizations
  - Applications
[Figure: link-sharing tree. A 155 Mbps link is split between Provider 1 (100 Mbps) and Provider 2 (55 Mbps); Provider 1 between Berkeley (50 Mbps) and Stanford (50 Mbps); Berkeley among EECS (20 Mbps), Math (10 Mbps) and Campus; the leaves are seminar video, seminar audio and WEB.]
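The finishing-time recurrence from the Fair Queueing Implementation slide can be exercised with a small tagger that keeps the last virtual finishing time per flow. This is a sketch, not a full WFQ scheduler: it assumes the caller supplies the system virtual time at each arrival, V(a), and it does not track V(t) itself. The class name, flow names and packet sizes are illustrative.

```python
import heapq


class WfqTagger:
    """Assign virtual finishing times F = max(V(a), F_prev) + L / w."""

    def __init__(self, weights):
        self.weights = dict(weights)                  # flow -> weight w_i
        self.last_finish = {f: 0.0 for f in weights}  # flow -> F_i^k

    def tag(self, flow, length, v_arrival):
        """Return the virtual finishing time of the flow's next packet.
        v_arrival is V(a), supplied by the caller (assumption)."""
        start = max(v_arrival, self.last_finish[flow])
        finish = start + length / self.weights[flow]
        self.last_finish[flow] = finish
        return finish


def service_order(arrivals, weights):
    """arrivals: list of (flow, length, v_arrival) in arrival order.
    Returns packets as (flow, length) in increasing finish-tag order."""
    tagger = WfqTagger(weights)
    heap = []
    for seq, (flow, length, v_arrival) in enumerate(arrivals):
        # seq breaks ties in favour of the earlier arrival
        heapq.heappush(heap, (tagger.tag(flow, length, v_arrival), seq, flow, length))
    return [(flow, length) for _, _, flow, length in
            (heapq.heappop(heap) for _ in range(len(heap)))]


if __name__ == "__main__":
    # All packets arrive while V = 0; flow B has twice the weight of flow A.
    arrivals = [("A", 1000, 0), ("A", 1000, 0),
                ("B", 600, 0), ("B", 600, 0), ("B", 600, 0)]
    print(service_order(arrivals, {"A": 1, "B": 2}))
```

In this run flow A's packets are tagged 1000 and 2000 and flow B's 300, 600 and 900, so all three B packets are served before A's first. Because each tag depends only on V at arrival and on the flow's own previous tag, later arrivals never force earlier tags to be recomputed.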
Packet Approximation of H-WFQ
[Figure: the same link-sharing tree shown twice, as Fluid Flow H-WFQ and as Packetized H-WFQ. The root has rate 10, its children rates 6 and 4, and the leaves rates 1, 2 and 3; in the packetized version a WFQ scheduler runs at every node.]
Idea 1
  - Select the packet finishing first in H-WFQ, assuming there are no future arrivals
  - Problem: the finish order in the system depends on future arrivals
    Virtual time implementation won't work
Idea 2
  - Use a hierarchy of WFQ schedulers to approximate H-WFQ