Amdahl s Law in the Datacenter Era! A Market for Fair Processor Allocation!

Size: px
Start display at page:

Download "Amdahl s Law in the Datacenter Era! A Market for Fair Processor Allocation!"

Transcription

1 Amdahl s Law in the Datacenter Era! A Market for Fair Processor Allocation! Seyed Majid Zahedi* (Duke University), Qiuyun Llull* (VMware/ Duke University), Benjamin C. Lee (Duke University)! *Equal Contributions! 1

2 Sharing in Federated Data Centers! Users pool resources in non-profit data centers! E.g., research groups within university! Users are entitled to portion of resources! Based on contributions to shared pool! 2

3 Challenges for Modern Data Centers! u 1 u 2 u 3 u 4 m 3 m 1 m 2 m 4 m 5 m 6 m 7 m 8 m 9 m 10 Computing resources are physically distributed across servers! Users run complex jobs with diverse characteristics! Users jobs are assigned to different servers! Users prefer specific allocations on specific servers! 3

4 <latexit sha1_base64="j7ox3knopfj/72fcwzryw2vyiqw=">aaab7nicbva9swnbej3zm8avqkxnyhbiey42ahewsuzam4hkchubvwtj3t65oyeei/krnhyqtv4eoztlf4abj0ithww83pthzl6qsghqdt+dldw19y3n3fz+e2d3b79wchhn4lqz7rfyxrozumolunxdgzi3e81pfejecabxe7/xwlursbrfycl9ipaucawjakvm2hgl8xh81iku3bi7bvkmltkpvkn9+wsaap3cr7sbszticpmkxrqqboj+rjukjvko304ntygb0b5vwapoxi2fte8dkvordekya1skyvt9pzhryjhhfnjoiglflhot8t+vlwj46wdcjslyxwalwlqsjmnkedivmjouq0so08leslifasrqrps3ivqwx14m3nn5quzwbrhvmcehx3acjajabvthbmrgaqmjj/aml8698+s8om+z1hvnpnmef+c8/wcteziy</latexit> sha1_base64="nhpxp/whim54upyttrhtmdvdgvm=">aaab7nicbvdlsgnbeoz1lrhfuy9ebomql2fxepuw8oixadcekixmtmatiboz6zyescqf4cwdile/x5s/ih6gk8dbewsaiqpuurvcldolxfftwvldw9/i5tclw9s7u3vf/ym7lrhjqe8snshmibxltfbfm81pm5uuxygnjxbwpfebd1qqlohbpuxpeooeybejwfupatqspb6ptzvfkltxp0dlxjutuhxvv7/yufnap/jr7ibexfrowrfslc9ndzbhqrnhdfrog0vttaa4r1uwchxtfwtte0foxcpdfcxsltboqv6eyhcs1daobwemdv8tehpxp69ldhqzzeykrlnbzosiw5fo0or51gwses2hlmaimb0vkt6wmggbucgg4c2+vez8s8pvxa3bmkowqx6o4bjk4mefvoegauadaq6p8awvzr3z5lw6b7pwfwc+cwh/4lz/apwckmi=</latexit> <latexit sha1_base64="q+ghma6roe0s8xkilzzqvzorxay=">aaab8nicbvbnswmxfhxbv2r9qvbojvget2xxi3orepfywbwf7lkyabyntbjlkhxluvbxepgg4tvf40hwr3gy2/agrqobyey93msildntxpflka2srq1vldcrw9s7u3vv/ynbnwskuj8kpfgdcgvkmas+yybttqoofhgn7wh0wfjto6o0s+sngac0fhggwcwinlykaohnmirz+x6b9kp1t+fogzajnyf1zu3z+weawr3qr9bpscaonirjrbuem5owx8owwumkemsappim8ib2lzvyub3m08wtdgyvpootzz80akr+3six0hosijtzznslxih+53uze5+hoznpzqgks0nxxpfjufea6jnfiefjszbrzgzfzigvjsbwvleleitfxib+aeoi4v7bmpowqxko4qhowimzamivtmahaik8wjo8ojnz5lw6b7prkjpfqcefoo8/cycuaa==</latexit> <latexit sha1_base64="nux5he22fct/qypbk6q4xzzpfdc=">aaab8nicbvdlssnafl2pr1pf1s7ddbbbhztejboruhfzwdhce8pkommhtizhzikwki0f4cafilu/xoxgh/gnrpy0xwjrgyhdofdyz5wg4uxp2/60skvlk6tr5fxkxubw9k51d+9gxakk1cuxj2unwipyjqirmea0k0iko4dtdjc6kpz2lzwkxejajxpqr3ggwmgi1kbyvajryrbmdz2w96p1u2fpgbajmyp1zu3j+/4rp271qu9epyzpriumhcvvdexe+xmwmhfo84qxkppgmsid2jvu4igqp5tkztghufoojkv5qqoj+nsjw5fs4ygwk0vgne8v4n9en9xhmz8xkasacji9fkyc6rgvbaa+k5ropjyee8lmvksgwgkitu0vu4iz/+vf4p40zhv2lsmjcvouyr8o4agcoiumxeilxccqwam8wbovwo/wi/u6hs1zs50a/ih19gomzzx7</latexit> <latexit sha1_base64="8xlx2t6xkelwxp5icggjlb6c8io=">aaab6xicbvbnt8jaej36ieah4thlrmliirqv6o3ei0cmvkigidtlwjzst83u1oq0/aqvhtr49r9589+4bq4kvmssl/dmmjmvsaxxxnw/ny3nre2d3djeubj/chhupa496irtdd2wiet1aqprcime4uzgl1vi40bgn5jcfn73czxmixww0xt9meash5xry6xoimjhte423dniomkusb1vq3raanaevr8go4rlmurdbnw633rt4+dugc4ezsqdtgnk2yrg2ldu0hi1n89pnzfzq4ximchb0pc5+nsip7hw0ziwnte1y73qfej/xj8z4bwfc5lmbivblaozquxcir/jictkrkwtouxxeythy6oomzadsg2hufryoveugzcn996gusrroasncayx0iqramedtmedbhe8wyu8ocj5cd6dj0xrhrocoye/cd5/aoq3jn0=</latexit> <latexit sha1_base64="qbntltonwrwpndkmmf1zvmr42lc=">aaab6xicbvbns8naej3ur374uevrs7aihqqkxtrbwyvhso0ttkfstpn06wytdjdccf0jxjyoepuhcd78n24/dtr6yodx3gwz84kum6ud59sqbgxube8us+xk7t7+qfww9qcstfl0amit2q2iqs4eepppjt1uiokdjp1gfdpzo48ofuvevz6k6mckeixklggjtfsrdqp1p+hmya8td0nqzvql3sqdf7yg1a/+mkfzjejttptquu6q/zxizsjhabmfkuwjhzmie4ykeqpy8/mpu/vukem7tkqpoe25+nsij7fskzgwnthri7xqzct/vf6mwys/zylnnaq6wbrm3najpfvbhjkjvpojiyrkzm616yhiqrvjp2xccfdfxifereo64dyzmjqwqbgo4qtowivlamittmadche8wqu8wtx6tt6s90vrwvrohmefwb8/wdopkq==</latexit> <latexit sha1_base64="qbntltonwrwpndkmmf1zvmr42lc=">aaab6xicbvbns8naej3ur374uevrs7aihqqkxtrbwyvhso0ttkfstpn06wytdjdccf0jxjyoepuhcd78n24/dtr6yodx3gwz84kum6ud59sqbgxube8us+xk7t7+qfww9qcstfl0amit2q2iqs4eepppjt1uiokdjp1gfdpzo48ofuvevz6k6mckeixklggjtfsrdqp1p+hmya8td0nqzvql3sqdf7yg1a/+mkfzjejttptquu6q/zxizsjhabmfkuwjhzmie4ykeqpy8/mpu/vukem7tkqpoe25+nsij7fskzgwnthri7xqzct/vf6mwys/zylnnaq6wbrm3najpfvbhjkjvpojiyrkzm616yhiqrvjp2xccfdfxifereo64dyzmjqwqbgo4qtowivlamittmadche8wqu8wtx6tt6s90vrwvrohmefwb8/wdopkq==</latexit> <latexit sha1_base64="j7ox3knopfj/72fcwzryw2vyiqw=">aaab7nicbva9swnbej3zm8avqkxnyhbiey42ahewsuzam4hkchubvwtj3t65oyeei/krnhyqtv4eoztlf4abj0ithww83pthzl6qsghqdt+dldw19y3n3fz+e2d3b79wchhn4lqz7rfyxrozumolunxdgzi3e81pfejecabxe7/xwlursbrfycl9ipaucawjakvm2hgl8xh81iku3bi7bvkmltkpvkn9+wsaap3cr7sbszticpmkxrqqboj+rjukjvko304ntygb0b5vwapoxi2fte8dkvordekya1skyvt9pzhryjhhfnjoiglflhot8t+vlwj46wdcjslyxwalwlqsjmnkedivmjouq0so08leslifasrqrps3ivqwx14m3nn5quzwbrhvmcehx3acjajabvthbmrgaqmjj/aml8698+s8om+z1hvnpnmef+c8/wcteziy</latexit> sha1_base64="nhpxp/whim54upyttrhtmdvdgvm=">aaab7nicbvdlsgnbeoz1lrhfuy9ebomql2fxepuw8oixadcekixmtmatiboz6zyescqf4cwdile/x5s/ih6gk8dbewsaiqpuurvcldolxfftwvldw9/i5tclw9s7u3vf/ym7lrhjqe8snshmibxltfbfm81pm5uuxygnjxbwpfebd1qqlohbpuxpeooeybejwfupatqspb6ptzvfkltxp0dlxjutuhxvv7/yufnap/jr7ibexfrowrfslc9ndzbhqrnhdfrog0vttaa4r1uwchxtfwtte0foxcpdfcxsltboqv6eyhcs1daobwemdv8tehpxp69ldhqzzeykrlnbzosiw5fo0or51gwses2hlmaimb0vkt6wmggbucgg4c2+vez8s8pvxa3bmkowqx6o4bjk4mefvoegauadaq6p8awvzr3z5lw6b7pwfwc+cwh/4lz/apwckmi=</latexit> <latexit sha1_base64="qkxjkfrzggvhx519ka1nvrqmzka=">aaab8nicbvbns8nafhzxs9avao9efovgqsre1fvbi8ckxhaaujbbl3bpzhn2n0ijbx+ffw8qxv01hgt/iic3bq/aorawzlzhm50wfvwb1/1yvlbx1jc2s1vl7z3dvf3kwegdtjlf0gejsfq7pbofl+gbbgs2u4u0dgw2wtfv4bfuuwmeyfsztreb04hkewfuwckiymqgyzrjj096lzpbd6cgy8sbk1qj+vn9aadnxuuj6ccsi1eajqjwhc9nttenynamcfiomo0pzsm6wi6lksaou/k084scwkvpoktzjw2zqr83chprpy5do1lk1iteif7ndtitxxrzltpmogszq1emieliuqdpc4xmilellclusxi2pioyy2sq2xk8xs8ve/+sfll3b2wzdzihbedwdkfgwtk04bqa4aodfb7hgv6czhlyxp232eikm9+pwh847z9wlprv</latexit> <latexit sha1_base64="pomo6cyeo/pqmhvbduxnjuw3ifi=">aaab8nicbvdlssnafl3xweur2qwbybfcsendqlucg5cvjc00ouymn+3qystmtiqssvej3lhqcevxubd8a7/blzo2c209mha4517umrmkncntoj/w0vlk6tp6aao8ubw9s1vz279vcsopujtmsewercfnal3nnmdoipfeacd2mlos/pydssvicaphcforgqgwmkq0ktwvinoyhbn2wn6r1jy6m4g9sbozumtwp77vv/ktvq/y7vvjmkyonoveqw7dsbsfeakz5zixvvrhquiidlbrqcarkj+bzm7ti6p07tcw5gltt9tfgxmjlbphgzksmqp5rxd/87qpds/9jikk1sjo9fcyclvhdlga3wcsqezjqwivzgs16zbiqrwpqwxkamx/ezg4p/wlunntymjcfcu4gem4hgacqrouoauuuejgaz7g2uqtr+vfep2ollmznsr8gfx2a2/bleg=</latexit> <latexit sha1_base64="pomo6cyeo/pqmhvbduxnjuw3ifi=">aaab8nicbvdlssnafl3xweur2qwbybfcsendqlucg5cvjc00ouymn+3qystmtiqssvej3lhqcevxubd8a7/blzo2c209mha4517umrmkncntoj/w0vlk6tp6aao8ubw9s1vz279vcsopujtmsewercfnal3nnmdoipfeacd2mlos/pydssvicaphcforgqgwmkq0ktwvinoyhbn2wn6r1jy6m4g9sbozumtwp77vv/ktvq/y7vvjmkyonoveqw7dsbsfeakz5zixvvrhquiidlbrqcarkj+bzm7ti6p07tcw5gltt9tfgxmjlbphgzksmqp5rxd/87qpds/9jikk1sjo9fcyclvhdlga3wcsqezjqwivzgs16zbiqrwpqwxkamx/ezg4p/wlunntymjcfcu4gem4hgacqrouoauuuejgaz7g2uqtr+vfep2ollmznsr8gfx2a2/bleg=</latexit> Management Properties! Work Conservation! Never leave servers idle if there are unsatisfied user demands! Sharing Incentives! Guarantee users at least the utility from their entitlements! u i ( ) u <latexit sha1_base64="nux5he22fct/qypbk6q4xzzpfdc=">aaab8nicbvdlssnafl2pr1pf1s7ddbbbhztejboruhfzwdhce8pkommhtizhzikwki0f4cafilu/xoxgh/gnrpy0xwjrgyhdofdyz5wg4uxp2/60skvlk6tr5fxkxubw9k51d+9gxakk1cuxj2unwipyjqirmea0k0iko4dtdjc6kpz2lzwkxejajxpqr3ggwmgi1kbyvajryrbmdz2w96p1u2fpgbajmyp1zu3j+/4rp271qu9epyzpriumhcvvdexe+xmwmhfo84qxkppgmsid2jvu4igqp5tkztghufoojkv5qqoj+nsjw5fs4ygwk0vgne8v4n9en9xhmz8xkasacji9fkyc6rgvbaa+k5ropjyee8lmvksgwgkitu0vu4iz/+vf4p40zhv2lsmjcvouyr8o4agcoiumxeilxccqwam8wbovwo/wi/u6hs1zs50a/ih19gomzzx7</latexit> i ( ) x i e i 4

5 Management Questions! How can we model users demands for processors?! How can we fairly allocate processors?! Fig. downtownrecruiting.com! 5

6 Roadmap! Model user utilities! Operationalize Amdahl s Law for data center workloads! Propose Amdahl utility using Karp-Flatt metric! Design market mechanism! Design market for processor allocation! Propose Amdahl bidding procedure using closed-form equations! Find market equilibrium to guarantee fair division! Conclude! 6

7 Amdahl s Law! [G. Amdahl 1967]! Architects use it to estimate upper bounds on speedups! Speedup(x) = T 1 T x = T 1 (1 F )T 1 + F T 1 x = x x(1 F )+F 7

8 <latexit sha1_base64="f9zvgm5q2uujumfw66uwofcxkic=">aaacaxicbvdlsgnbeoynrxhfuu+ih8egjihh14t6uakb4dgcawljemyns8mq2qczs5kwbc/+ihcpkl79c2/e/bqnj4mmfjquvd10d7krz1kz5perwlhcwl5jr2bw1jc2t7lbo3cyjawhngl5kooulpszgnqkku7rkadydzmtub3yyk/duyfzgnyqquqdh3cc5jgclzza2t2z7xfqjwp6apokp0z6eeukujiudfvznfk0x0dzxjqsxonqopwnanvw9rpzdkns00arjqvswgaknaqlxqinw0wzljtcpic7tkfpgh0qnwt8whadaawnvfdochqaq78neuxlofbd3elj1zwz3kj8z2veyjt3ehzesaibmszyyo5uiez5odytlcg+0aqtwfstihsxdkpp1di6bgv25xlinxyviuandqmee6rhhw4hdxacqqmuoqo2ehiaj3ibv+predbejpdja8qyzuzchxgfp08slyy=</latexit> sha1_base64="1dlfmwgm00v+igotyomlaizgrte=">aaacaxicbvdlssnafj34rpuvdswkdbahrsxjn+pckrskywrwftpqj9njo3qyctmtaqnbjb/ixowkw//cnr/gfzh9llt1wixdofdy7z1uykhulvvlzm0vlc4tp1bsq2vrg5vm1vatdckbsruhlbb1f0nckcdvrruj9vaq5lum1nxeaejx7omqnoa3ahasx0cdtj2kkdjsy9yv2x4oxscmjxco+0ncz9on5dxxowmzgstvjqbnit0hmellfum7chbxazmfzxaai59whrmssmfboxjijbtfjctpzirjihapduhdu458ip149eicj7tshl4gdhefr+rvirj5ug58v3f6shxltdcu//makflonjjymfke4/eil2jqbxcyb2xtqbbia00qfltfcnex6tcuti2tq7cnx54l1ul+pg9d6zckyiwu2aohiatscaqk4apuqbvg8acewat4nr6nz+pneb+3zhmtmr3wb8bhd0awl9s=</latexit> What Portion of Code is Parallelizable?! [Allen Karp and Horace Flatt 1990]! Expert programmers may not know! Fortunately, we can measure speedup! Measure! Set! Calculate! s(x) = x x(1 F )+F Karp-Flatt! 1 1 F = 1 1?! Metric! s(x) x 8 Fig. criticallyrated.files.wordpress.com

9 Karp-Flatt Metric in Practice! decision correlation gradient svm F Number of Cores F Number of Cores Number of Cores F Number of Cores For many Spark and PARSEC workloads, Karp-Flatt has low variance! Abundant, fine-grained parallelism! Few serial bottlenecks! Constant Karp-Flatt metric indicates accuracy of Amdahl s Law! 9

10 Limitations of Karp-Flatt Metric! Graph processing: overhead increases with x Small dataset: limited parallelism Var(F) Parallel Fraction (Variance) Application ID Multi-threaded: Scheduling and intensive inter-thread communication overhead There are some exceptions! High correlation between serial and parallel portion! High scheduling and inter-thread communication! Very limited parallelism! 10

11 Amdahl s Law in Practice! Execution Time Prediction Accuracy for Decision Tree Execution Time (s) Measured Estimated Core Allocation Measured performance tracks estimated speedup! Amdahl s Law can drive processor allocation! 11

12 !!! <latexit sha1_base64="ifpnje6nxycxowpjq/81wmswyco=">aaacphicbvc7tsmwfl3htxkvgggwqeifoupygagpgourjapitykc16fu7ssyhacywrfwhsz8axsbcwmgvmbclihxlax7do65uj4ntdlt2nufnahhkdgx8ynj0tt0zoxcex7hrcwzjlroep7isxaryllm65ppts9ssbeiot0nu/v9+ekllyol8bhupbqp8exmikawtlrqps4cvvef1u0wmtcby9frlvijiynxvsyc09n18nmj8qvase5+o4pwus7aev6vka/kq27vlqr9bd4xwk3tbszfasbhuh7wwwnjbi014viphuemummw1ixwmpf8tneuky6+oa0lyyyoaprcfy7wlnncusltizuq2o8bbguleik0yr5p9xvwj/+bntidbtcni9nm05gmdkuzrzpb/shri0lkno9zgilk9q+itlgnttvaszye77flv6c+wd2pukc2jbomagkwyauq4mew1oaadqeobo7gcv7g1bl3np03530ghxk+dhbhrzkfnxnxsrc=</latexit> sha1_base64="dhojcdhqws/7sorj6lncsqykgxc=">aaacphicbvc7tsmwfhv4lviqmmjgusevhiphaqakchbgiruuqsmr4zrfycer7uark/wm/ailc//axsbcaiivgtdlanorwffo3hn1fy4fmyqvbt9by+mtk1ptuzn87nz8wmjhafleronapi4jfolth0ncaejqiipgtmnbepczafixh/1544oisaowpnoxaxhucwlamvkg8gq1xkmllyn17ge669f0e+5dnxaia1cm3nmx+056pnl67wl6kd7irjw6wdtmr4psr1c0y3zwcbg4p6byodhau727b1s9wppbjndcsagwq1i2httwly2eopirno8mksqix6ioaroyik5ks2fuu7hhmdymimfeqgdg/t7qievz475r9n3k/7m+owrwtfsw29i0jbnfqjw4fcqmqgj2o4rtkghwrgcawokav0j8jkx0ygsenye4/y0pg/p2ea9sh5swkmbqobak1kejogahvmarqii6woabvia38g49wq/wh/u5ki5zpzsr4e9zx99d0brq</latexit> <latexit sha1_base64="fb45zasxggtvlwmazb5mikaw550=">aaaclhicdvbdsxtbfl1rbzumh27toy+dorapdbsxgh0odsjiy6rnfziqzid3dxt2g5m7krdsh+pl/4oifvdx1d/hbnzcw9odwz1zzr3m3bokshryvgtn6cny02fpay/ql1+9frpivl39zpjmcxyircx6ooaglyxxqjiuhqcaerqoparod0v/6ak1kun8leypjin+estqck5wmrh7i8iz5v9sxgmwfpncnhxn2ajsfbqfmou8uhx5/qkyd6zpf6z4xqm1crtea2d7q93zyl7l87p+2y9ju9vz7ddfkiuavc97llj0j+7vajqilmkyholgdh0vpxhonumhskipmompf+f8biewxjxcm84x2xbsvvwmley0ptgxhfr7rm4jy+zrydsjtqfmb68u/+unmwq3x7mm04wwftvdyayyjaymjk2lrkfqbgkxwtq/mnhkburka67beh5tyv5pbu3wtss7tgh0oein1madmubdf3pwah0ygidvcanxcop8ch46t85d1brkpm68gz/g3d8a1tgqpa==</latexit> <latexit sha1_base64="e1owbqg67/jcchk0grg0zs3mkxg=">aaaclhicdvbdsxtbfj31qzatmuqjl4nsiihhdw1jfbadsvhr0kafjitzyd1kdpadmbtiwpbf+oslf0ukfbdfv/+f0nmsgoo9mnwz59zlzd1elivg276zpqznzuc+zh8sffq8slhu/rj8rknecwjzsebq1gmapaihjqilnmykwobjopho93p/5akuflh4e8cx9ai2diuvoemj9cshxyrlth/eaimkzvqpomsql5oysdv1fenpccvsb5nkn2nf2sr4xppvl6/b1z1m3a3vqv217ybjojlxg7xtgnwmkmo9txew9hg1u3xul//qdikebbail0zrjmph2euzqselzkvuoifm/jwnownoyalqvxsybua/gmva/uizeykdqc8nuhzopq480xkwhom3xi6+53us9ju9virxghdy4ie/krqjmkdhb0ibrzk2hhelzf8phzetezqasyae503p/0nbre5u7e8mjbypme9wyrqpeic0sisckipsjpxck1tyr/5yn9zv6691x7rowu8zk+qvrid/riasdg==</latexit> <latexit sha1_base64="e1owbqg67/jcchk0grg0zs3mkxg=">aaaclhicdvbdsxtbfj31qzatmuqjl4nsiihhdw1jfbadsvhr0kafjitzyd1kdpadmbtiwpbf+oslf0ukfbdfv/+f0nmsgoo9mnwz59zlzd1elivg276zpqznzuc+zh8sffq8slhu/rj8rknecwjzsebq1gmapaihjqilnmykwobjopho93p/5akuflh4e8cx9ai2diuvoemj9cshxyrlth/eaimkzvqpomsql5oysdv1fenpccvsb5nkn2nf2sr4xppvl6/b1z1m3a3vqv217ybjojlxg7xtgnwmkmo9txew9hg1u3xul//qdikebbail0zrjmph2euzqselzkvuoifm/jwnownoyalqvxsybua/gmva/uizeykdqc8nuhzopq480xkwhom3xi6+53us9ju9virxghdy4ie/krqjmkdhb0ibrzk2hhelzf8phzetezqasyae503p/0nbre5u7e8mjbypme9wyrqpeic0sisckipsjpxck1tyr/5yn9zv6691x7rowu8zk+qvrid/riasdg==</latexit> Amdahl Utility! Measures normalized progress across servers! Work completed for user i Speedup ij (x ij )= in unit of time on one core of server j u i (x i )= P m j=1 w ij s ij (x ij ) P m j=1 w ij x ij F ij +(1 F ij )x ij 12

13 Roadmap! Model user utilities! Operationalize Amdahl s Law for data center workloads! Propose Amdahl utility using Karp-Flatt metric! Design market mechanism! Design market for processor allocation! Propose Amdahl bidding procedure using closed-form equations! Find market equilibrium to guarantee fair division! Conclude! 13

14 !!! <latexit sha1_base64="jv1doaxj8l0xdk8fgkjhper2gs4=">aaacyxicbvhbbtqwehuchrkgtcuxf4svvueosnqbhpbw4skxscyttf4ixzvzurwd1j6gjaz8jddoxpgqvnsvki0jwxp6741m5rlqlxsy5z+j+mhdruept58kt58939ln9/a/uqazaiaiuy09r7gdjq1mukkc89yc15wcs+rq40o/+w7wycz8wb6fmeyli2spoaaqthtwwuiad9egw8v7n0ncdxncer3my2ygh7qr5rhthc+q2i+hur5+y9hfk8twbwku06w//fam37welqwxl0mbiieybde0kmxcwmxvzsntuz7l66l3qbebi7kp0zl9wean6dqyfio7ny3yfmeew5rcwzcwzkhlxrvfwdrawzw4mv9hnnbxgznturhhgarr9nah59q5xlfbutrv3dvw5p+0ayf1+5mxpu0qjlgzvhekyknxedo5tcbq9qfwywxylyolbrna8ctjckg4e/j9mdnotrl88/fopn6ksu0oyetyraryjozjj3jkjksqx9fwtbptrr/jje7j/rtrhg16xpb/kj74awrzto4=</latexit> Market for Fair Allocation! Users receive budgets in proportion to their entitlements! Market sets prices for processors on each server! Users demand processors that maximize their utility! max. u i (x i ), s.t. mx x ij p j apple b i j=1 At equilibrium prices, market clears! nx x ij = C j i=1 14

15 !! <latexit sha1_base64="tougdpwschzghr9mig6oc5yuu8c=">aaacknicbvc7tsmwfl3htxkvgfksebifquq6afsfc2orkcc1ves4dhicxng3qbwvnt9h4vcyyickhyepwu078dqspanz7th1pygswqdr9pyr0bhxicmp6clm7nz8qnfx6dgkqwa8zhkz6noagi5fzosoupjtptmnaslpgsv9vn9yzbursxyehcvbet2lrsgyrsv5xb3az8rfdwo3vbjpkp0otejtxgnmwty5u37w90vdu5tcmipa7tbxkvnfnbfs5ib/itcka9xk/eyhant84noznba04jeysy1pek7cvky1ciz5t9bmdveuxdiz3ra0phe3rsy/tuvwrdimyalti5hk6vderinjolfgjyok5+a31xf/8xophjuttmqqrr6zwaiwlct20s+otixmdgxhesq0sh8l7jxqytdww7alel9p/kvqlfju2t20zvrhgclygvxyaa+2oqohuim6mhiaj3ifn+frexf6zvtgdmqzzpbhb5zplwoxqpq=</latexit> <latexit sha1_base64="2pvliw9qwwuqkqnkowepn0wy+9u=">aaacknicbvbnswmxfmz6wetx1ztegijyhblbi3orevgoyk3qliwbzm1sdjcmb9wythf/ir78kx70oexw5a8x3fagrqobyeynl288kbgg2+5ze5nt0zozmbns/mli0njuzfvcr7girewjeallj2gmemjkwegws6kyctzbkl7ruo9xbpnsparpos1zpsbxifc5jwakn3fkuqm/7uzanpphnakicrgu6rsfiz86xekmft/f6d6lgh4e7oe5fn7nbdkfowuej86qbjwkd7sf619pp27utdaiabywekggwlcdw0i9iqo4faytrcwasujb5ipvdq1jwhq9sw/t4g2jnlafkfncwkn6o5gqqot24jnjgebtj3p98t+vgon/ue94kgngir0s8mobtr/94ncdk0zbta0hvhhzv0ybrbekpt6skcezpxmcliufw4j9zsoooqeyaantoh3koh1uqifofjurry/obb2jd+vzern61udgdmiaztbqh1jfpyp9rc0=</latexit> <latexit sha1_base64="2pvliw9qwwuqkqnkowepn0wy+9u=">aaacknicbvbnswmxfmz6wetx1ztegijyhblbi3orevgoyk3qliwbzm1sdjcmb9wythf/ir78kx70oexw5a8x3fagrqobyeynl288kbgg2+5ze5nt0zozmbns/mli0njuzfvcr7girewjeallj2gmemjkwegws6kyctzbkl7ruo9xbpnsparpos1zpsbxifc5jwakn3fkuqm/7uzanpphnakicrgu6rsfiz86xekmft/f6d6lgh4e7oe5fn7nbdkfowuej86qbjwkd7sf619pp27utdaiabywekggwlcdw0i9iqo4faytrcwasujb5ipvdq1jwhq9sw/t4g2jnlafkfncwkn6o5gqqot24jnjgebtj3p98t+vgon/ue94kgngir0s8mobtr/94ncdk0zbta0hvhhzv0ybrbekpt6skcezpxmcliufw4j9zsoooqeyaantoh3koh1uqifofjurry/obb2jd+vzern61udgdmiaztbqh1jfpyp9rc0=</latexit> <latexit sha1_base64="57tnqvnpae4kca3itbcl07yl4a4=">aaacdhicbzc7tgjbfibp4g3xhlrateqtbgdxri1isggsmrehadzmdgmmzm5uzmznygzlgxtfxczcjymvd2dn2zhccgx/zjiv/zknz87vhzwpbdvfvmppewv1lb2e2djc2t7j7u7dqccshnziwapz8lcinala00xz2gglxb7had0bvsb1+h2vigxiwo9c2vzxt7aui1gby80ehe4gr09qcbvu5lsxkznjbsws5bkejkiiku7azebsgj0rwgrnbrly7r74aqbvn/vv6gqk8qnqhgolmo4d6nampwae0yttihqnmrnihm0afninqh1prknqsxe6qbti84rge/f3rix9pua+zzp9rptqvjy2/6s1i909b8dmhjgmgkwxdsoodidg0aaok5ropjkaiwtmr4j0scremwazjgrn/urfqj0wlgr2lzmrl2gqnbzaietbgtmowyvuoqyehuajxudversertfrfdqasmyz+/bh1ucpbn+b/q==</latexit> sha1_base64="ighuhzc5fdxkcyiwumqqotcz/vg=">aaacdhicbzc7tsmwfiydrqxcaowsfggplg3cagyvinvhlbkhldosoa7bunwcyhaqqigjcwuvwsiacayghocnt8g9dndys5y+/ecchz8/ibmvyra/jyxfpewv1dxafn1jc2vb3nm9lleimpfwxcjrd5akjhlikaoyqcecodbgpbymkqn67zyissn+pyyxaywoy2mhyqs05zuhsd8vqgnyhk2zhh5ky052k/imbpr7gszbit/3tcsu2mpbexcmylnwxemd9n+rvvnvbec4cqlxmcepg44dq1akhkkykszftcsjer6glmlo5cgkspwor8ngkxbasbmj/bicy/f3ripckydhodtdphpytjyy/6s1etu5a6wux4kihe8wdrigvqrh0ca2fqqrntsaskd6rxd3keby6qdzogrn9ur58e6k50x70rfcf0yua/vgabsaa06bcy5afxgag3vwcj7bi/fgpbmvxtukdcgyzuybpzi+fwayi52m</latexit> Amdahl Bidding Procedure! Users iteratively bid for processors using closed-form equation! q b ij (t + 1) / f ij p j (t) w ij s ij (x ij (t)) Market sets prices based on bids! p j (t) = P n i=1 b ij/c j Iterate until prices are stationary! 15 Fig. searchengineland.com

16 ! Properties of Amdahl Bidding Procedure! Allocations are work-conserving! Market guarantees sharing incentives! Users bid truthfully in large, competitive systems! Market does all of these with low overhead! 16

17 Mechanisms for Evaluation! Proportional Sharing (PS)! Allocate cores in proportion to entitlements on each server! Upper-Bound (UB)! Allocate cores to maximize system progress! 17

18 Sharing Incentives! Normalized Performance PS AB UB Class 1 Class 2 Class 3 Class 4 Class 5 PS provides SI by definition! UB treats users unfairly, starves users with low entitlements! AB provides SI with market equilibrium, low overhead! 18

19 System Performance! Normalized Performance PS AB UB 4 App/Ser 8 App/Ser 12 App/Ser 16 App/Ser 20 App/Ser 24 App/Ser PS ignores demands! UB achieves highest performance! AB outperforms PS, low overhead! 19

20 Computational Overheads! Updating bids: 0.1ms! Termination check and setting prices: 0.85ms! Convergence rate of 10 iterations, on average! Overhead of 12.5ms with 0.25ms network delay! 20

21 Summary and Future Direction! Amdahl utility measures progress using Karp-Flatt metric! Amdahl bidding procedure finds market equilibrium! Allocations are work-conserving! Market guarantees sharing incentives! Users bid truthfully in large, competitive systems! 21

22 Thank You! 22

Amdahl s Law in the Datacenter Era: A Market for Fair Processor Allocation

Amdahl s Law in the Datacenter Era: A Market for Fair Processor Allocation Amdahl s Law in the Datacenter Era: A Market for Fair Processor Allocation Seyed Majid Zahedi, Qiuyun Llull, Benjamin C. Lee Duke University, Durham, NC, USA VMware Inc., Palo Alto, CA, USA {seyedmajid.zahedi,

More information

Example: CPU-bound process that would run for 100 quanta continuously 1, 2, 4, 8, 16, 32, 64 (only 37 required for last run) Needs only 7 swaps

Example: CPU-bound process that would run for 100 quanta continuously 1, 2, 4, 8, 16, 32, 64 (only 37 required for last run) Needs only 7 swaps Interactive Scheduling Algorithms Continued o Priority Scheduling Introduction Round-robin assumes all processes are equal often not the case Assign a priority to each process, and always choose the process

More information

Parallel Programming with MPI and OpenMP

Parallel Programming with MPI and OpenMP Parallel Programming with MPI and OpenMP Michael J. Quinn (revised by L.M. Liebrock) Chapter 7 Performance Analysis Learning Objectives Predict performance of parallel programs Understand barriers to higher

More information

Performance analysis. Performance analysis p. 1

Performance analysis. Performance analysis p. 1 Performance analysis Performance analysis p. 1 An example of time measurements Dark grey: time spent on computation, decreasing with p White: time spent on communication, increasing with p Performance

More information

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Structure Page Nos. 2.0 Introduction 4 2. Objectives 5 2.2 Metrics for Performance Evaluation 5 2.2. Running Time 2.2.2 Speed Up 2.2.3 Efficiency 2.3 Factors

More information

The typical speedup curve - fixed problem size

The typical speedup curve - fixed problem size Performance analysis Goals are 1. to be able to understand better why your program has the performance it has, and 2. what could be preventing its performance from being better. The typical speedup curve

More information

CMSC Computer Architecture Lecture 12: Multi-Core. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 12: Multi-Core. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 12: Multi-Core Prof. Yanjing Li University of Chicago Administrative Stuff! Lab 4 " Due: 11:49pm, Saturday " Two late days with penalty! Exam I " Grades out on

More information

Introduction to parallel computing

Introduction to parallel computing Introduction to parallel computing 3. Parallel Software Zhiao Shi (modifications by Will French) Advanced Computing Center for Education & Research Vanderbilt University Last time Parallel hardware Multi-core

More information

Mark Sandstrom ThroughPuter, Inc.

Mark Sandstrom ThroughPuter, Inc. Hardware Implemented Scheduler, Placer, Inter-Task Communications and IO System Functions for Many Processors Dynamically Shared among Multiple Applications Mark Sandstrom ThroughPuter, Inc mark@throughputercom

More information

Computer Architecture: Parallel Processing Basics. Prof. Onur Mutlu Carnegie Mellon University

Computer Architecture: Parallel Processing Basics. Prof. Onur Mutlu Carnegie Mellon University Computer Architecture: Parallel Processing Basics Prof. Onur Mutlu Carnegie Mellon University Readings Required Hill, Jouppi, Sohi, Multiprocessors and Multicomputers, pp. 551-560 in Readings in Computer

More information

Outline. CSC 447: Parallel Programming for Multi- Core and Cluster Systems

Outline. CSC 447: Parallel Programming for Multi- Core and Cluster Systems CSC 447: Parallel Programming for Multi- Core and Cluster Systems Performance Analysis Instructor: Haidar M. Harmanani Spring 2018 Outline Performance scalability Analytical performance measures Amdahl

More information

Datacenter Simulation Methodologies Case Studies

Datacenter Simulation Methodologies Case Studies This work is supported by NSF grants CCF-1149252, CCF-1337215, and STARnet, a Semiconductor Research Corporation Program, sponsored by MARCO and DARPA. Datacenter Simulation Methodologies Case Studies

More information

Balancing Fairness and Efficiency in Tiered Storage Systems with Bottleneck-Aware Allocation

Balancing Fairness and Efficiency in Tiered Storage Systems with Bottleneck-Aware Allocation Balancing Fairness and Efficiency in Tiered Storage Systems with Bottleneck-Aware Allocation Hui Wang, Peter Varman Rice University FAST 14, Feb 2014 Tiered Storage Tiered storage: HDs and SSDs q Advantages:

More information

Analyzing Performance Asymmetric Multicore Processors for Latency Sensitive Datacenter Applications

Analyzing Performance Asymmetric Multicore Processors for Latency Sensitive Datacenter Applications Analyzing erformance Asymmetric Multicore rocessors for Latency Sensitive Datacenter Applications Vishal Gupta Georgia Institute of Technology vishal@cc.gatech.edu Ripal Nathuji Microsoft Research ripaln@microsoft.com

More information

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1

More information

1.3 Data processing; data storage; data movement; and control.

1.3 Data processing; data storage; data movement; and control. CHAPTER 1 OVERVIEW ANSWERS TO QUESTIONS 1.1 Computer architecture refers to those attributes of a system visible to a programmer or, put another way, those attributes that have a direct impact on the logical

More information

COL862: Low Power Computing Maximizing Performance Under a Power Cap: A Comparison of Hardware, Software, and Hybrid Techniques

COL862: Low Power Computing Maximizing Performance Under a Power Cap: A Comparison of Hardware, Software, and Hybrid Techniques COL862: Low Power Computing Maximizing Performance Under a Power Cap: A Comparison of Hardware, Software, and Hybrid Techniques Authors: Huazhe Zhang and Henry Hoffmann, Published: ASPLOS '16 Proceedings

More information

Scheduling. Multiple levels of scheduling decisions. Classes of Schedulers. Scheduling Goals II: Fairness. Scheduling Goals I: Performance

Scheduling. Multiple levels of scheduling decisions. Classes of Schedulers. Scheduling Goals II: Fairness. Scheduling Goals I: Performance Scheduling CSE 451: Operating Systems Spring 2012 Module 10 Scheduling Ed Lazowska lazowska@cs.washington.edu Allen Center 570 In discussing processes and threads, we talked about context switching an

More information

CSE 451: Operating Systems Winter Module 10 Scheduling

CSE 451: Operating Systems Winter Module 10 Scheduling CSE 451: Operating Systems Winter 2017 Module 10 Scheduling Mark Zbikowski mzbik@cs.washington.edu Allen Center 476 2013 Gribble, Lazowska, Levy, Zahorjan Scheduling In discussing processes and threads,

More information

An Introduction to Parallel Programming

An Introduction to Parallel Programming An Introduction to Parallel Programming Ing. Andrea Marongiu (a.marongiu@unibo.it) Includes slides from Multicore Programming Primer course at Massachusetts Institute of Technology (MIT) by Prof. SamanAmarasinghe

More information

Extracting Performance and Scalability Metrics from TCP. Baron Schwartz April 2012

Extracting Performance and Scalability Metrics from TCP. Baron Schwartz April 2012 Extracting Performance and Scalability Metrics from TCP Baron Schwartz April 2012 Agenda Fundamental Metrics of Performance Capturing TCP Data Part 1: Black-Box Performance Analysis Detecting Stalls and

More information

Outline. Speedup & Efficiency Amdahl s Law Gustafson s Law Sun & Ni s Law. Khoa Khoa học và Kỹ thuật Máy tính - ĐHBK TP.HCM

Outline. Speedup & Efficiency Amdahl s Law Gustafson s Law Sun & Ni s Law. Khoa Khoa học và Kỹ thuật Máy tính - ĐHBK TP.HCM Speedup Thoai am Outline Speedup & Efficiency Amdahl s Law Gustafson s Law Sun & i s Law Speedup & Efficiency Speedup: S = Time(the most efficient sequential Efficiency: E = S / algorithm) / Time(parallel

More information

Understanding Parallelism and the Limitations of Parallel Computing

Understanding Parallelism and the Limitations of Parallel Computing Understanding Parallelism and the Limitations of Parallel omputing Understanding Parallelism: Sequential work After 16 time steps: 4 cars Scalability Laws 2 Understanding Parallelism: Parallel work After

More information

Staged Memory Scheduling

Staged Memory Scheduling Staged Memory Scheduling Rachata Ausavarungnirun, Kevin Chang, Lavanya Subramanian, Gabriel H. Loh*, Onur Mutlu Carnegie Mellon University, *AMD Research June 12 th 2012 Executive Summary Observation:

More information

Outline. Speedup & Efficiency Amdahl s Law Gustafson s Law Sun & Ni s Law. Khoa Khoa học và Kỹ thuật Máy tính - ĐHBK TP.HCM

Outline. Speedup & Efficiency Amdahl s Law Gustafson s Law Sun & Ni s Law. Khoa Khoa học và Kỹ thuật Máy tính - ĐHBK TP.HCM Speedup Thoai am Outline Speedup & Efficiency Amdahl s Law Gustafson s Law Sun & i s Law Speedup & Efficiency Speedup: S = T seq T par - T seq : Time(the most efficient sequential algorithm) - T par :

More information

Fair Network Bandwidth Allocation in IaaS Datacenters via a Cooperative Game Approach

Fair Network Bandwidth Allocation in IaaS Datacenters via a Cooperative Game Approach SUBMITTED TO IEEE/ACM TRANSACTIONS ON NETWORKING 1 Fair Network Bandwidth Allocation in IaaS Datacenters via a Cooperative Game Approach Jian Guo, Fangming Liu, Member, IEEE, John C.S. Lui, Fellow, IEEE,

More information

Multidimensional Scheduling (Polytope Scheduling Problem) Competitive Algorithms from Competitive Equilibria

Multidimensional Scheduling (Polytope Scheduling Problem) Competitive Algorithms from Competitive Equilibria Multidimensional Scheduling (Polytope Scheduling Problem) Competitive Algorithms from Competitive Equilibria Sungjin Im University of California, Merced (UC Merced) Janardhan Kulkarni (MSR) Kamesh Munagala

More information

Parallel Programming Concepts. Parallel Algorithms. Peter Tröger

Parallel Programming Concepts. Parallel Algorithms. Peter Tröger Parallel Programming Concepts Parallel Algorithms Peter Tröger Sources: Ian Foster. Designing and Building Parallel Programs. Addison-Wesley. 1995. Mattson, Timothy G.; S, Beverly A.; ers,; Massingill,

More information

Computer Architecture Lecture 27: Multiprocessors. Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 4/6/2015

Computer Architecture Lecture 27: Multiprocessors. Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 4/6/2015 18-447 Computer Architecture Lecture 27: Multiprocessors Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 4/6/2015 Assignments Lab 7 out Due April 17 HW 6 Due Friday (April 10) Midterm II April

More information

Managing Performance Variance of Applications Using Storage I/O Control

Managing Performance Variance of Applications Using Storage I/O Control Performance Study Managing Performance Variance of Applications Using Storage I/O Control VMware vsphere 4.1 Application performance can be impacted when servers contend for I/O resources in a shared storage

More information

Using Alluxio to Improve the Performance and Consistency of HDFS Clusters

Using Alluxio to Improve the Performance and Consistency of HDFS Clusters ARTICLE Using Alluxio to Improve the Performance and Consistency of HDFS Clusters Calvin Jia Software Engineer at Alluxio Learn how Alluxio is used in clusters with co-located compute and storage to improve

More information

Parallel Programming with MPI and OpenMP

Parallel Programming with MPI and OpenMP Parallel Programming with MPI and OpenMP Michael J. Quinn Chapter 7 Performance Analysis Learning Objectives n Predict performance of parallel programs n Understand barriers to higher performance Outline

More information

LogCA: A High-Level Performance Model for Hardware Accelerators

LogCA: A High-Level Performance Model for Hardware Accelerators Everything should be made as simple as possible, but not simpler Albert Einstein LogCA: A High-Level Performance Model for Hardware Accelerators Muhammad Shoaib Bin Altaf* David A. Wood University of Wisconsin-Madison

More information

Announcements. Reading. Project #1 due in 1 week at 5:00 pm Scheduling Chapter 6 (6 th ed) or Chapter 5 (8 th ed) CMSC 412 S14 (lect 5)

Announcements. Reading. Project #1 due in 1 week at 5:00 pm Scheduling Chapter 6 (6 th ed) or Chapter 5 (8 th ed) CMSC 412 S14 (lect 5) Announcements Reading Project #1 due in 1 week at 5:00 pm Scheduling Chapter 6 (6 th ed) or Chapter 5 (8 th ed) 1 Relationship between Kernel mod and User Mode User Process Kernel System Calls User Process

More information

What s New in VMware vsphere 4.1 Performance. VMware vsphere 4.1

What s New in VMware vsphere 4.1 Performance. VMware vsphere 4.1 What s New in VMware vsphere 4.1 Performance VMware vsphere 4.1 T E C H N I C A L W H I T E P A P E R Table of Contents Scalability enhancements....................................................................

More information

Chapter 13 Strong Scaling

Chapter 13 Strong Scaling Chapter 13 Strong Scaling Part I. Preliminaries Part II. Tightly Coupled Multicore Chapter 6. Parallel Loops Chapter 7. Parallel Loop Schedules Chapter 8. Parallel Reduction Chapter 9. Reduction Variables

More information

18-447: Computer Architecture Lecture 30B: Multiprocessors. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/22/2013

18-447: Computer Architecture Lecture 30B: Multiprocessors. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/22/2013 18-447: Computer Architecture Lecture 30B: Multiprocessors Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/22/2013 Readings: Multiprocessing Required Amdahl, Validity of the single processor

More information

Harnessing Multicore Hardware to Accelerate PrimeTime Static Timing Analysis

Harnessing Multicore Hardware to Accelerate PrimeTime Static Timing Analysis White Paper Harnessing Multicore Hardware to Accelerate PrimeTime Static Timing Analysis December 2009 Steve Hollands R&D Manager, Synopsys George Mekhtarian PMM, Synopsys Why Multicore Analysis? In the

More information

Parallel Programs. EECC756 - Shaaban. Parallel Random-Access Machine (PRAM) Example: Asynchronous Matrix Vector Product on a Ring

Parallel Programs. EECC756 - Shaaban. Parallel Random-Access Machine (PRAM) Example: Asynchronous Matrix Vector Product on a Ring Parallel Programs Conditions of Parallelism: Data Dependence Control Dependence Resource Dependence Bernstein s Conditions Asymptotic Notations for Algorithm Analysis Parallel Random-Access Machine (PRAM)

More information

The Computational Sprinting Game

The Computational Sprinting Game The Computational Sprinting Game Songchun Fan Seyed Majid Zahedi * Benjamin C. Lee Duke University {songchun.fan, seyedmajid.zahedi, benjamin.c.lee}@duke.edu Abstract Computational sprinting is a class

More information

Overview Computer Networking What is QoS? Queuing discipline and scheduling. Traffic Enforcement. Integrated services

Overview Computer Networking What is QoS? Queuing discipline and scheduling. Traffic Enforcement. Integrated services Overview 15-441 15-441 Computer Networking 15-641 Lecture 19 Queue Management and Quality of Service Peter Steenkiste Fall 2016 www.cs.cmu.edu/~prs/15-441-f16 What is QoS? Queuing discipline and scheduling

More information

Designing for Performance. Patrick Happ Raul Feitosa

Designing for Performance. Patrick Happ Raul Feitosa Designing for Performance Patrick Happ Raul Feitosa Objective In this section we examine the most common approach to assessing processor and computer system performance W. Stallings Designing for Performance

More information

Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, Ion Stoica. University of California, Berkeley nsdi 11

Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, Ion Stoica. University of California, Berkeley nsdi 11 Dominant Resource Fairness: Fair Allocation of Multiple Resource Types Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, Ion Stoica University of California, Berkeley nsdi 11

More information

Agenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2

Agenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2 Lecture 3: Processes Agenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2 Process in General 3.3 Process Concept Process is an active program in execution; process

More information

Bottleneck Identification and Scheduling in Multithreaded Applications. José A. Joao M. Aater Suleman Onur Mutlu Yale N. Patt

Bottleneck Identification and Scheduling in Multithreaded Applications. José A. Joao M. Aater Suleman Onur Mutlu Yale N. Patt Bottleneck Identification and Scheduling in Multithreaded Applications José A. Joao M. Aater Suleman Onur Mutlu Yale N. Patt Executive Summary Problem: Performance and scalability of multithreaded applications

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 1. Computer Abstractions and Technology

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 1. Computer Abstractions and Technology COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology The Computer Revolution Progress in computer technology Underpinned by Moore

More information

Last Class: Processes

Last Class: Processes Last Class: Processes A process is the unit of execution. Processes are represented as Process Control Blocks in the OS PCBs contain process state, scheduling and memory management information, etc A process

More information

Multiprocessor scheduling

Multiprocessor scheduling Chapter 10 Multiprocessor scheduling When a computer system contains multiple processors, a few new issues arise. Multiprocessor systems can be categorized into the following: Loosely coupled or distributed.

More information

2 TEST: A Tracer for Extracting Speculative Threads

2 TEST: A Tracer for Extracting Speculative Threads EE392C: Advanced Topics in Computer Architecture Lecture #11 Polymorphic Processors Stanford University Handout Date??? On-line Profiling Techniques Lecture #11: Tuesday, 6 May 2003 Lecturer: Shivnath

More information

Introduction to Parallel Computing

Introduction to Parallel Computing Introduction to Parallel Computing This document consists of two parts. The first part introduces basic concepts and issues that apply generally in discussions of parallel computing. The second part consists

More information

15-740/ Computer Architecture Lecture 4: Pipelining. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 4: Pipelining. Prof. Onur Mutlu Carnegie Mellon University 15-740/18-740 Computer Architecture Lecture 4: Pipelining Prof. Onur Mutlu Carnegie Mellon University Last Time Addressing modes Other ISA-level tradeoffs Programmer vs. microarchitect Virtual memory Unaligned

More information

Introduction to Parallel Programming

Introduction to Parallel Programming Introduction to Parallel Programming David Lifka lifka@cac.cornell.edu May 23, 2011 5/23/2011 www.cac.cornell.edu 1 y What is Parallel Programming? Using more than one processor or computer to complete

More information

Introduction to Parallel Programming

Introduction to Parallel Programming Introduction to Parallel Programming January 14, 2015 www.cac.cornell.edu What is Parallel Programming? Theoretically a very simple concept Use more than one processor to complete a task Operationally

More information

15-740/ Computer Architecture Lecture 20: Main Memory II. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 20: Main Memory II. Prof. Onur Mutlu Carnegie Mellon University 15-740/18-740 Computer Architecture Lecture 20: Main Memory II Prof. Onur Mutlu Carnegie Mellon University Today SRAM vs. DRAM Interleaving/Banking DRAM Microarchitecture Memory controller Memory buses

More information

Determining the Number of CPUs for Query Processing

Determining the Number of CPUs for Query Processing Determining the Number of CPUs for Query Processing Fatemah Panahi Elizabeth Soechting CS747 Advanced Computer Systems Analysis Techniques The University of Wisconsin-Madison fatemeh@cs.wisc.edu, eas@cs.wisc.edu

More information

Performance evaluation. Performance evaluation. CS/COE0447: Computer Organization. It s an everyday process

Performance evaluation. Performance evaluation. CS/COE0447: Computer Organization. It s an everyday process Performance evaluation It s an everyday process CS/COE0447: Computer Organization and Assembly Language Chapter 4 Sangyeun Cho Dept. of Computer Science When you buy food Same quantity, then you look at

More information

ECE Spring 2017 Exam 2

ECE Spring 2017 Exam 2 ECE 56300 Spring 2017 Exam 2 All questions are worth 5 points. For isoefficiency questions, do not worry about breaking costs down to t c, t w and t s. Question 1. Innovative Big Machines has developed

More information

Exercise 1 Due 02.November 2010, 12:15pm

Exercise 1 Due 02.November 2010, 12:15pm Computer Architecture Exercise 1 Due 02.November 2010, 12:15pm Part 1. Case Study - Chip Fabrication Cost There are many factors involved in the price of a computer chip. New, smaller technologies give

More information

IMPLEMENTING TASK AND RESOURCE ALLOCATION ALGORITHM BASED ON NON-COOPERATIVE GAME THEORY IN CLOUD COMPUTING

IMPLEMENTING TASK AND RESOURCE ALLOCATION ALGORITHM BASED ON NON-COOPERATIVE GAME THEORY IN CLOUD COMPUTING DOI: http://dx.doi.org/10.26483/ijarcs.v9i1.5389 ISSN No. 0976 5697 Volume 9, No. 1, January-February 2018 International Journal of Advanced Research in Computer Science RESEARCH PAPER Available Online

More information

Parallel Computing Concepts. CSInParallel Project

Parallel Computing Concepts. CSInParallel Project Parallel Computing Concepts CSInParallel Project July 26, 2012 CONTENTS 1 Introduction 1 1.1 Motivation................................................ 1 1.2 Some pairs of terms...........................................

More information

Erasure Codes for Heterogeneous Networked Storage Systems

Erasure Codes for Heterogeneous Networked Storage Systems Erasure Codes for Heterogeneous Networked Storage Systems Lluís Pàmies i Juárez Lluís Pàmies i Juárez lpjuarez@ntu.edu.sg . Introduction Outline 2. Distributed Storage Allocation Problem 3. Homogeneous

More information

Why do we care about parallel?

Why do we care about parallel? Threads 11/15/16 CS31 teaches you How a computer runs a program. How the hardware performs computations How the compiler translates your code How the operating system connects hardware and software The

More information

Exploring Practical Benefits of Asymmetric Multicore Processors

Exploring Practical Benefits of Asymmetric Multicore Processors Exploring Practical Benefits of Asymmetric Multicore Processors Jon Hourd, Chaofei Fan, Jiasi Zeng, Qiang(Scott) Zhang Micah J Best, Alexandra Fedorova, Craig Mustard {jlh4, cfa18, jza48, qsz, mbest, fedorova,

More information

Analytical Modeling of Parallel Programs

Analytical Modeling of Parallel Programs Analytical Modeling of Parallel Programs Alexandre David Introduction to Parallel Computing 1 Topic overview Sources of overhead in parallel programs. Performance metrics for parallel systems. Effect of

More information

CSCE Operating Systems Scheduling. Qiang Zeng, Ph.D. Fall 2018

CSCE Operating Systems Scheduling. Qiang Zeng, Ph.D. Fall 2018 CSCE 311 - Operating Systems Scheduling Qiang Zeng, Ph.D. Fall 2018 Resource Allocation Graph describing the traffic jam CSCE 311 - Operating Systems 2 Conditions for Deadlock Mutual Exclusion Hold-and-Wait

More information

Towards Energy-Proportional Datacenter Memory with Mobile DRAM

Towards Energy-Proportional Datacenter Memory with Mobile DRAM Towards Energy-Proportional Datacenter Memory with Mobile DRAM Krishna Malladi 1 Frank Nothaft 1 Karthika Periyathambi Benjamin Lee 2 Christos Kozyrakis 1 Mark Horowitz 1 Stanford University 1 Duke University

More information

Chapter 7. Multicores, Multiprocessors, and Clusters. Goal: connecting multiple computers to get higher performance

Chapter 7. Multicores, Multiprocessors, and Clusters. Goal: connecting multiple computers to get higher performance Chapter 7 Multicores, Multiprocessors, and Clusters Introduction Goal: connecting multiple computers to get higher performance Multiprocessors Scalability, availability, power efficiency Job-level (process-level)

More information

Measuring Performance. Speed-up, Amdahl s Law, Gustafson s Law, efficiency, benchmarks

Measuring Performance. Speed-up, Amdahl s Law, Gustafson s Law, efficiency, benchmarks Measuring Performance Speed-up, Amdahl s Law, Gustafson s Law, efficiency, benchmarks Why Measure Performance? Performance tells you how you are doing and whether things can be improved appreciably When

More information

HOW TO WRITE PARALLEL PROGRAMS AND UTILIZE CLUSTERS EFFICIENTLY

HOW TO WRITE PARALLEL PROGRAMS AND UTILIZE CLUSTERS EFFICIENTLY HOW TO WRITE PARALLEL PROGRAMS AND UTILIZE CLUSTERS EFFICIENTLY Sarvani Chadalapaka HPC Administrator University of California Merced, Office of Information Technology schadalapaka@ucmerced.edu it.ucmerced.edu

More information

Missing Data Analysis for the Employee Dataset

Missing Data Analysis for the Employee Dataset Missing Data Analysis for the Employee Dataset 67% of the observations have missing values! Modeling Setup For our analysis goals we would like to do: Y X N (X, 2 I) and then interpret the coefficients

More information

Lecture 27 Programming parallel hardware" Suggested reading:" (see next slide)"

Lecture 27 Programming parallel hardware Suggested reading: (see next slide) Lecture 27 Programming parallel hardware" Suggested reading:" (see next slide)" 1" Suggested Readings" Readings" H&P: Chapter 7 especially 7.1-7.8" Introduction to Parallel Computing" https://computing.llnl.gov/tutorials/parallel_comp/"

More information

K-means Clustering of Proportional Data Using L1 Distance

K-means Clustering of Proportional Data Using L1 Distance K-means Clustering of Proportional Data Using L1 Distance Hisashi Kashima IBM Research, Tokyo Research Laboratory Jianying Hu Bonnie Ray Moninder Singh IBM Research, T.J. Watson Research Center Copyright

More information

IX: A Protected Dataplane Operating System for High Throughput and Low Latency

IX: A Protected Dataplane Operating System for High Throughput and Low Latency IX: A Protected Dataplane Operating System for High Throughput and Low Latency Belay, A. et al. Proc. of the 11th USENIX Symp. on OSDI, pp. 49-65, 2014. Reviewed by Chun-Yu and Xinghao Li Summary In this

More information

CS 293S Parallelism and Dependence Theory

CS 293S Parallelism and Dependence Theory CS 293S Parallelism and Dependence Theory Yufei Ding Reference Book: Optimizing Compilers for Modern Architecture by Allen & Kennedy Slides adapted from Louis-Noël Pouche, Mary Hall End of Moore's Law

More information

PARALLEL AND DISTRIBUTED COMPUTING

PARALLEL AND DISTRIBUTED COMPUTING PARALLEL AND DISTRIBUTED COMPUTING 2013/2014 1 st Semester 2 nd Exam January 29, 2014 Duration: 2h00 - No extra material allowed. This includes notes, scratch paper, calculator, etc. - Give your answers

More information

Parallel Computing. Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides)

Parallel Computing. Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides) Parallel Computing 2012 Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides) Parallel Algorithm Design Outline Computational Model Design Methodology Partitioning Communication

More information

Research Article Average Bandwidth Allocation Model of WFQ

Research Article Average Bandwidth Allocation Model of WFQ Modelling and Simulation in Engineering Volume 2012, Article ID 301012, 7 pages doi:10.1155/2012/301012 Research Article Average Bandwidth Allocation Model of WFQ TomášBaloghandMartinMedvecký Institute

More information

Homework # 1 Due: Feb 23. Multicore Programming: An Introduction

Homework # 1 Due: Feb 23. Multicore Programming: An Introduction C O N D I T I O N S C O N D I T I O N S Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.86: Parallel Computing Spring 21, Agarwal Handout #5 Homework #

More information

CPU THREAD PRIORITIZATION USING A DYNAMIC QUANTUM TIME ROUND-ROBIN ALGORITHM

CPU THREAD PRIORITIZATION USING A DYNAMIC QUANTUM TIME ROUND-ROBIN ALGORITHM CPU THREAD PRIORITIZATION USING A DYNAMIC QUANTUM TIME ROUND-ROBIN ALGORITHM Maysoon A. Mohammed 1, 2, Mazlina Abdul Majid 1, Balsam A. Mustafa 1 and Rana Fareed Ghani 3 1 Faculty of Computer System &

More information

Parallel Systems. Part 7: Evaluation of Computers and Programs. foils by Yang-Suk Kee, X. Sun, T. Fahringer

Parallel Systems. Part 7: Evaluation of Computers and Programs. foils by Yang-Suk Kee, X. Sun, T. Fahringer Parallel Systems Part 7: Evaluation of Computers and Programs foils by Yang-Suk Kee, X. Sun, T. Fahringer How To Evaluate Computers and Programs? Learning objectives: Predict performance of parallel programs

More information

CS 6453: Parameter Server. Soumya Basu March 7, 2017

CS 6453: Parameter Server. Soumya Basu March 7, 2017 CS 6453: Parameter Server Soumya Basu March 7, 2017 What is a Parameter Server? Server for large scale machine learning problems Machine learning tasks in a nutshell: Feature Extraction (1, 1, 1) (2, -1,

More information

SoftNAS Cloud Performance Evaluation on Microsoft Azure

SoftNAS Cloud Performance Evaluation on Microsoft Azure SoftNAS Cloud Performance Evaluation on Microsoft Azure November 30, 2016 Contents SoftNAS Cloud Overview... 3 Introduction... 3 Executive Summary... 4 Key Findings for Azure:... 5 Test Methodology...

More information

Performance, Power, Die Yield. CS301 Prof Szajda

Performance, Power, Die Yield. CS301 Prof Szajda Performance, Power, Die Yield CS301 Prof Szajda Administrative HW #1 assigned w Due Wednesday, 9/3 at 5:00 pm Performance Metrics (How do we compare two machines?) What to Measure? Which airplane has the

More information

Multiprocessor and Real-Time Scheduling. Chapter 10

Multiprocessor and Real-Time Scheduling. Chapter 10 Multiprocessor and Real-Time Scheduling Chapter 10 1 Roadmap Multiprocessor Scheduling Real-Time Scheduling Linux Scheduling Unix SVR4 Scheduling Windows Scheduling Classifications of Multiprocessor Systems

More information

Memshare: a Dynamic Multi-tenant Key-value Cache

Memshare: a Dynamic Multi-tenant Key-value Cache Memshare: a Dynamic Multi-tenant Key-value Cache ASAF CIDON*, DANIEL RUSHTON, STEPHEN M. RUMBLE, RYAN STUTSMAN *STANFORD UNIVERSITY, UNIVERSITY OF UTAH, GOOGLE INC. 1 Cache is 100X Faster Than Database

More information

Chapter 18 - Multicore Computers

Chapter 18 - Multicore Computers Chapter 18 - Multicore Computers Luis Tarrataca luis.tarrataca@gmail.com CEFET-RJ Luis Tarrataca Chapter 18 - Multicore Computers 1 / 28 Table of Contents I 1 2 Where to focus your study Luis Tarrataca

More information

Why Study Multimedia? Operating Systems. Multimedia Resource Requirements. Continuous Media. Influences on Quality. An End-To-End Problem

Why Study Multimedia? Operating Systems. Multimedia Resource Requirements. Continuous Media. Influences on Quality. An End-To-End Problem Why Study Multimedia? Operating Systems Operating System Support for Multimedia Improvements: Telecommunications Environments Communication Fun Outgrowth from industry telecommunications consumer electronics

More information

High Performance Computing Systems

High Performance Computing Systems High Performance Computing Systems Shared Memory Doug Shook Shared Memory Bottlenecks Trips to memory Cache coherence 2 Why Multicore? Shared memory systems used to be purely the domain of HPC... What

More information

Analytical Modeling of Parallel Programs

Analytical Modeling of Parallel Programs 2014 IJEDR Volume 2, Issue 1 ISSN: 2321-9939 Analytical Modeling of Parallel Programs Hardik K. Molia Master of Computer Engineering, Department of Computer Engineering Atmiya Institute of Technology &

More information

SoftNAS Cloud Performance Evaluation on AWS

SoftNAS Cloud Performance Evaluation on AWS SoftNAS Cloud Performance Evaluation on AWS October 25, 2016 Contents SoftNAS Cloud Overview... 3 Introduction... 3 Executive Summary... 4 Key Findings for AWS:... 5 Test Methodology... 6 Performance Summary

More information

Design of Parallel Algorithms. Course Introduction

Design of Parallel Algorithms. Course Introduction + Design of Parallel Algorithms Course Introduction + CSE 4163/6163 Parallel Algorithm Analysis & Design! Course Web Site: http://www.cse.msstate.edu/~luke/courses/fl17/cse4163! Instructor: Ed Luke! Office:

More information

Parallel Computation/Program Issues

Parallel Computation/Program Issues Parallel Computation/Program Issues Dependency Analysis: Types of dependency Dependency Graphs Bernstein s s Conditions of Parallelism Asymptotic Notations for Algorithm Complexity Analysis Parallel Random-Access

More information

New Optimal Load Allocation for Scheduling Divisible Data Grid Applications

New Optimal Load Allocation for Scheduling Divisible Data Grid Applications New Optimal Load Allocation for Scheduling Divisible Data Grid Applications M. Othman, M. Abdullah, H. Ibrahim, and S. Subramaniam Department of Communication Technology and Network, University Putra Malaysia,

More information

Workloads Programmierung Paralleler und Verteilter Systeme (PPV)

Workloads Programmierung Paralleler und Verteilter Systeme (PPV) Workloads Programmierung Paralleler und Verteilter Systeme (PPV) Sommer 2015 Frank Feinbube, M.Sc., Felix Eberhardt, M.Sc., Prof. Dr. Andreas Polze Workloads 2 Hardware / software execution environment

More information

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Motivation And Intro Programming Model Spark Data Transformation Model Construction Model Training Model Inference Execution Model Data Parallel Training

More information

April 3, 2012 T.C. Havens

April 3, 2012 T.C. Havens April 3, 2012 T.C. Havens Different training parameters MLP with different weights, number of layers/nodes, etc. Controls instability of classifiers (local minima) Similar strategies can be used to generate

More information

DISTRIBUTED HIGH-SPEED COMPUTING OF MULTIMEDIA DATA

DISTRIBUTED HIGH-SPEED COMPUTING OF MULTIMEDIA DATA DISTRIBUTED HIGH-SPEED COMPUTING OF MULTIMEDIA DATA M. GAUS, G. R. JOUBERT, O. KAO, S. RIEDEL AND S. STAPEL Technical University of Clausthal, Department of Computer Science Julius-Albert-Str. 4, 38678

More information

Operating System Support for Multimedia. Slides courtesy of Tay Vaughan Making Multimedia Work

Operating System Support for Multimedia. Slides courtesy of Tay Vaughan Making Multimedia Work Operating System Support for Multimedia Slides courtesy of Tay Vaughan Making Multimedia Work Why Study Multimedia? Improvements: Telecommunications Environments Communication Fun Outgrowth from industry

More information

High Performance Computing

High Performance Computing The Need for Parallelism High Performance Computing David McCaughan, HPC Analyst SHARCNET, University of Guelph dbm@sharcnet.ca Scientific investigation traditionally takes two forms theoretical empirical

More information

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620 Introduction to Parallel and Distributed Computing Linh B. Ngo CPSC 3620 Overview: What is Parallel Computing To be run using multiple processors A problem is broken into discrete parts that can be solved

More information