Datacenter Traffic Measurement and Classificatin Speaker: Lin Wang Research Advisr: Biswanath Mukherjee
Grup meeting 6/15/2017 Datacenter Traffic Measurement and Analysis Data Cllectin Cllect netwrk events frm each f 1500 servers Fr ver tw mnths. Traffic Characteristics Server pairs within the same rack mre likely t exchange mre bytes. 21% prbability t exchange data within the same rack. 0.5% prbability t exchange data in different racks. Kandula S, Sengupta S, Greenberg A, et al. The nature f data center traffic: measurements & analysis[c]//prceedings f the 9th ACM SIGCOMM cnference n Internet measurement cnference. ACM, 2009: 202-208.
Grup meeting 6/15/2017 Datacenter Traffic Measurement and Analysis Traffic Characteristics Server either talks t almst all the ther servers within the rack (the bump near 1 in figure left) r fewer than 25% f servers within the rack. Server either desn t talk t servers utside its rack (the spike at zer in figure right) r it talks t abut 1-10% f utside servers.
Grup meeting 6/15/2017 Datacenter Traffic Measurement and Analysis Traffic Characteristics Cmpare the rates f flws that verlap high utilizatin perids. Rates d nt change appreciably (see cdf belw). Errrs(e.g. flw timeuts r failure) is nt visible in flw rates. Hence we crrelate high utilizatin epchs directly with applicatin level lgs.
Grup meeting 6/15/2017 Datacenter Traffic Measurement and Analysis Traffic Characteristics Traffic mix changes frequently. The figure plts the duratins f millin flws (a day s wrth f flws) in the cluster. Mst flws cme and g (80% last less than 10s) and there are few lng running flws (less than 0.1% last lnger than 200s).
Grup meeting 6/15/2017 Datacenter Traffic Classificatin Why d we need t identify elephant flw? Previus paper shws that a large fractin f datacenter traffic is carried in a small fractin f flws. 90% f the flws carry less than 1MB f data >90% f bytes transferred are in flws greater than 100MB. Hash-based flw frwarding techniques (e.g. Equal-Cst Multi-Path (ECMP) ruting) wrks well nly fr mice flws and n elephant flws.
Grup meeting 6/15/2017 Mice flw VS elephant flw Small size packet Shrt flw Large number Shrt-lived Large size packet Large vlume flw Small number Lng lasting
Grup meeting 6/15/2017 Mice flw VS elephant flw If we nly care abut the number f packets in the queue, elephant flw transmissin is easy t be degraded. If we nly care abut the ttal size f packets in the queue, mice flw transmissin is easy t be degraded. < >
Grup meeting 6/15/2017 Datacenter Traffic Classificatin Slutin: Mahut, a lw-verhead yet effective traffic management system. End-hst-based elephant detectin. Advantages f detecting elephant flw at end hst. Netwrk behavir f a flw is affected by hw rapidly the end-pint applicatins are generating data, and this is nt biased by cngestin in the netwrk. In cntrast t in-netwrk mnitrs, the end hst OS has better visibility int the behavir f applicatins. In datacenters, it is pssible t augment the end hst OS; this is enabled by the single administrative dmain and sftware unifrmity typical f mdern datacenters. Use very little verhead. In cntrast, using an in-netwrk mechanism t mnitr is infeasible, even n an edge switch, and even mre s n a cre switch. Curtis A R, Kim W, Yalagandula P. Mahut: Lw-verhead datacenter traffic management using end-hst-based elephant detectin[c]//infocom, 2011 Prceedings IEEE. IEEE, 2011: 1629-1637.
Grup meeting 6/15/2017 Datacenter Traffic Classificatin Mahut algrithm: Usea shim layer in the end hsts t mnitr the scket buffers.
Grup meeting 6/15/2017 Datacenter Traffic Classificatin Simulatin parameters:
Grup meeting 6/15/2017 Datacenter Traffic Classificatin Simulatin results:
Grup meeting 6/15/2017 Machine Learning fr Traffic Flw Classificatin Machine Learning Algrithms: Naïve-Bayes (NBD, NBK) C4.5 Decisin Tree Bayesian Netwrk Naïve Bayes Tree Flw and Feature Definitins Limitatin: Packet paylad independent Transprt layer independent Cntext limited t a single flw (i.e. n feature spanning multiple flws) Simple t cmpute Williams N, Zander S, Armitage G. A preliminary perfrmance cmparisn f five machine learning algrithms fr practical IP traffic flw classificatin[j]. ACM SIGCOMM Cmputer Cmmunicatin Review, 2006, 36(5): 5-16.
Grup meeting 6/15/2017 Machine Learning fr Traffic Flw Classificatin Feature Candidates Prtcl Flw duratin Flw vlume in bytes and packets Packet length (minimum, mean, maximum and standard deviatin) Inter-arrival time between packets (minimum, mean, maximum and standard deviatin). Feature Reductin Use CFS and CON t chse features:
Grup meeting 6/15/2017 Machine Learning fr Traffic Flw Classificatin Simulatin results
Grup meeting 6/15/2017 Machine Learning fr Traffic Flw Classificatin Simulatin results
Grup meeting 6/15/2017 amlwang@ucdavis.edu