Towards Fair and Efficient SMP Virtual Machine Scheduling
1 Towards Fair and Efficient SMP Virtual Machine Scheduling
Jia Rao and Xiaobo Zhou
University of Colorado, Colorado Springs
2 Executive Summary
- Problem: unfairness and inefficiency in consolidating SMP VMs
  - Existing VM schedulers favor VMs with more virtual CPUs (vCPUs)
  - Fairness mechanisms hurt parallel performance
- Flex is a scheduling framework that:
  - Adaptively adjusts vCPU weights for VM-level fairness
  - Flexibly schedules vCPUs to minimize unnecessary spinning
- Results: no more than 5% error relative to the ideally fair allocation, 30%+ performance improvement for parallel workloads, ~1% overhead in Xen
3 SMP VM Consolidation
- Abundant hardware parallelism in data centers
- SMP VMs are prevalent in the cloud
  - 26 out of 29 instance types in Amazon EC2 have more than two vCPUs
  - Support parallel applications
- Heterogeneous consolidation is common, e.g., Amazon EC2
4 Unfair VM CPU Allocation
- CPU allocation is NOT proportional to VM weights
- VMs with more vCPUs gain an advantage
- Common issue in all hypervisors
[Figure: a 3-vCPU VM and a 4-vCPU VM placed on pCPU0-pCPU3; fair share = two CPU cores. Bar chart of normalized CPU consumption under Xen, KVM, and VMware; the values 1.59 and 1.42 are legible.]
5 Causes of Unfairness
- Per-CPU scheduling
  - Independent scheduler on each CPU
  - Each allocates CPU time based on relative vCPU weights
  - Per-vCPU weight depends on the VM weight and the number of vCPUs
  - Scalable, but makes VM-level fairness hard
- Per-vCPU allocations yield VM-level fairness if and only if the total weight on each CPU is the same
[Figure: vCPUs of equal-weight VMs placed on pCPU0-pCPU3; box size reflects per-vCPU weight.]
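The effect described on this slide can be reproduced with a small calculation. The sketch below is illustrative only: the function name, the VM names, and the placement (one vCPU of each VM per pCPU, with the 4-vCPU VM alone on the last pCPU) are assumptions chosen to mirror the 3-vCPU/4-vCPU example, not code from Xen or Flex.

```python
from fractions import Fraction

def per_cpu_allocation(placement, vm_weight, vm_vcpus):
    """Total CPU (in cores) each VM gets when every pCPU independently
    splits its time in proportion to the weights of the vCPUs it hosts."""
    totals = {vm: Fraction(0) for vm in vm_weight}
    for vms_on_cpu in placement:
        # Per-vCPU weight = VM weight / number of vCPUs of that VM.
        vcpu_w = [Fraction(vm_weight[vm], vm_vcpus[vm]) for vm in vms_on_cpu]
        total_w = sum(vcpu_w)
        for vm, w in zip(vms_on_cpu, vcpu_w):
            totals[vm] += w / total_w
    return totals

# Hypothetical setup: two equal-weight VMs on four pCPUs.
vm_weight = {"vm3": 1, "vm4": 1}
vm_vcpus = {"vm3": 3, "vm4": 4}
# Each inner list names the VMs that have one vCPU on that pCPU.
placement = [["vm3", "vm4"], ["vm3", "vm4"], ["vm3", "vm4"], ["vm4"]]

alloc = per_cpu_allocation(placement, vm_weight, vm_vcpus)
for vm, cores in alloc.items():
    print(vm, float(cores))
```

Under these assumptions the 4-vCPU VM ends up with 16/7 cores and the 3-vCPU VM with 12/7 cores, although the fair share is two cores each, because the total weight differs across pCPUs.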
6 Existing Solutions
- Capping VM CPU consumption (Cap)
  - Requires pre-calculation of the fair share
  - Non-work-conserving
- Load balancing (LB)
  - Tries to achieve equal weights on CPUs
  - Balance weight vs. balance run queue length
- These solutions introduce significant inefficiencies to parallel applications
7 Cap on Busy-waiting-based Workloads
- Busy-waiting synchronization
  - Dynamic task assignment
  - Tasks stay in a busy loop waiting for lock release
  - Avoids context switches
- Lock holder preemption (LHP)
  - Preemption of vCPUs holding locks
  - Long synchronization latency
- Capping exacerbates the LHP issue in virtualized environments
- Since spinning wastes CPU cycles, capping may mistakenly preempt vCPUs that are doing useful work.
[Figure: CPU time (%) split into useful work and busy waiting, for homogeneous and heterogeneous VMs, across lu, sp, ft, ua, bt, cg, ep, mg.]
8 LB on Blocking-based Workloads
- Blocking synchronization
  - Tasks go to sleep if they fail to acquire the lock
  - Avoids wasted CPU cycles
- vCPU stacking
  - vCPUs belonging to one VM pile up on the same CPU
  - No parallelism + long sync latency
- LB exacerbates the vCPU stacking issue
- Since blocking vCPUs frequently switch between READY and RUNNING states, they are more likely to become victims of work-stealing-based load balancing. Gradually, stolen vCPUs pile up on a few CPUs.
[Figure: CPU time (%) split into running and blocked, under PIN and LB, across blackscholes, bodytrack, canneal, dedup, facesim, raytrace, streamcluster, swaptions, vips, x264.]
9 Related Work
- Fairness in multicore systems
  - [Li-PPoPP09] - no VM-level fairness
- Minimizing sync latency in SMP VMs
  - [Sukwong-Eurosys11] - avoids vCPU stacking
  - [Kim-ASPLOS13], [Weng-HPDC11] - not effective for user-level synchronization
  - Pause loop exit (PLE) - needs hardware support
- Spin detection
  - [Wells-PACT06] - store-based spin detection, not accurate for apps with different store rates, e.g., LU in the NAS parallel benchmark
- Flex: non-intrusive, lightweight, and applicable to different implementations
10 Flex for Fairness and Efficiency
- Flexible vCPU weight (FlexW)
  - Monitors VM CPU consumption
  - Calculates fair shares based on VM weights
  - Adjusts vCPU weights to compensate for the difference
- Flexible vCPU scheduling (FlexS)
  - Stops spinning vCPUs to avoid wasted CPU cycles
  - Switches the preempted vCPU with one on another CPU that is doing useful work
  - Ensures that no vCPUs from the same VM stack on one CPU
11 FlexW Design
- Determine the fair share
  - $P$ - number of shared CPUs, $w_i$ - VM weight
  - Ideally fair share according to generalized processor sharing (GPS):
    $S_{i,GPS}(t_1,t_2) = P \cdot \frac{w_i}{\sum_j w_j} \cdot (t_2 - t_1)$
- Adjust VM weights
  - $w_i^r$ - real-time VM weight
  - For each VM, calculate the lag:
    $lag_i(t_1,t_2) = \frac{S_{i,GPS}(t_1,t_2) - S_i(t_1,t_2)}{S_{i,GPS}(t_1,t_2)}$
  - Compensate for the lag with real-time weights:
    $w_i^r = w_i^r + w_i \cdot lag_i(t_1,t_2)$
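The three FlexW formulas can be traced with a short worked example. This is a minimal sketch under stated assumptions: the function names, the weight value 256, the 30 ms interval, and the consumption figures are made up for illustration, and the code is not taken from the Flex implementation.

```python
def gps_fair_share(weights, num_cpus, interval):
    """S_{i,GPS}(t1, t2) = P * w_i / sum_j(w_j) * (t2 - t1)."""
    total = sum(weights.values())
    return {vm: num_cpus * w / total * interval for vm, w in weights.items()}

def update_realtime_weights(weights, rt_weights, consumed, num_cpus, interval):
    """One adjustment round: compute each VM's lag, then compensate it."""
    fair = gps_fair_share(weights, num_cpus, interval)
    new_rt = {}
    for vm, w in weights.items():
        # lag_i = (S_{i,GPS} - S_i) / S_{i,GPS}; positive when the VM is behind.
        lag = (fair[vm] - consumed[vm]) / fair[vm]
        # w_i^r = w_i^r + w_i * lag_i
        new_rt[vm] = rt_weights[vm] + w * lag
    return new_rt

# Hypothetical numbers: two equal-weight VMs on 4 CPUs over a 30 ms interval.
weights = {"vm3": 256, "vm4": 256}
rt_weights = dict(weights)               # real-time weights start at the VM weights
consumed = {"vm3": 45.0, "vm4": 75.0}    # CPU-ms actually received (assumed)

fair = gps_fair_share(weights, num_cpus=4, interval=30)
new_rt = update_realtime_weights(weights, rt_weights, consumed, 4, 30)
print(fair)    # fair share in CPU-ms per VM
print(new_rt)  # the VM that fell behind gets a larger real-time weight
```

In this example both VMs have a fair share of 60 CPU-ms; the VM that received 45 has a lag of 0.25 and its real-time weight rises, while the VM that received 75 has a lag of -0.25 and its real-time weight falls.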
12 FlexS Design
- Identifying busy-waiting vCPUs
  - Non-intrusive identification without application knowledge
- Common pattern in different spin implementations
  - Spin loops contain only a few instructions
  - Spin loops are executed many times
- Spin loops show high branches per instruction (BPI) and a low branch misprediction rate (BMPR)
[Figure: BPI and BMPR of solo, spinning, and non-spinning vCPUs for lu, sp, cg, and povray.]
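The BPI/BMPR heuristic can be expressed as a simple classifier over hardware counter samples. The sketch below is an assumption-laden illustration: the slides give no threshold values, so `BPI_THRESHOLD` and `BMPR_THRESHOLD` are hypothetical, as are the `CounterSample` type and the sample numbers.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the slides do not state the values Flex uses.
BPI_THRESHOLD = 0.2
BMPR_THRESHOLD = 0.01

@dataclass
class CounterSample:
    """Per-vCPU counter deltas collected over one sampling period."""
    instructions: int
    branches: int
    branch_misses: int

def is_spinning(sample, bpi_threshold=BPI_THRESHOLD, bmpr_threshold=BMPR_THRESHOLD):
    """Flag a vCPU as busy-waiting when BPI is high and BMPR is low."""
    if sample.instructions == 0 or sample.branches == 0:
        return False  # no activity, nothing to classify
    bpi = sample.branches / sample.instructions
    bmpr = sample.branch_misses / sample.branches
    return bpi > bpi_threshold and bmpr < bmpr_threshold

# A tight spin loop: many well-predicted branches among few instructions.
spin = CounterSample(instructions=1_000_000, branches=330_000, branch_misses=100)
# Ordinary computation: fewer branches, more mispredictions.
work = CounterSample(instructions=1_000_000, branches=150_000, branch_misses=6_000)

print(is_spinning(spin), is_spinning(work))
```

Requiring both conditions reflects the slide's observation that spin loops are short (high BPI) and repeat the same path many times (low BMPR); either signal alone could also match ordinary branch-heavy code.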
13 FlexS Design (cont.)
- Eliminating busy-waiting time
  - Periodically update a vCPU's BPI and BMPR
  - A busy-waiting vCPU voluntarily yields the CPU
  - Find a sibling vCPU to complete the unfinished time slice
  - Switch the two vCPUs to avoid vCPU stacking and run queue weight changes
14 Practical Considerations
- Starvation: a VM demanding less than its share will have an ever-increasing real-time weight
  - Solution: reset the real-time weight every 10s
- Infeasible weight: peak CPU demand is less than the fair share
  - Solution: use the peak demand as the fair share
- False positives in identifying spinning vCPUs
  - Solution: reset BPI and BMPR every 10s
- Inter-CPU locking overhead due to vCPU migrations
  - Solution: only try twice when looking for siblings to switch (the power of two choices)
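The bounded sibling search in the last item can be sketched as follows. This is a hypothetical illustration of "only try twice": the run-queue representation, the function name `find_sibling_to_switch`, and the random choice of which CPUs to probe are assumptions, not the logic of Flex's load_balance_switch().

```python
import random

MAX_TRIES = 2  # "only try twice" from the slide

def find_sibling_to_switch(yielding_cpu, vm, run_queues, is_spinning, rng=random):
    """Probe at most MAX_TRIES other CPUs for a waiting vCPU of the same VM
    that is not spinning; return (cpu, vcpu) or None if none is found.

    run_queues maps a CPU id to a list of (vm_name, vcpu_id) tuples.
    """
    candidates = [cpu for cpu in run_queues if cpu != yielding_cpu]
    probes = rng.sample(candidates, min(MAX_TRIES, len(candidates)))
    for cpu in probes:  # each probe stands for one remote run-queue lock
        for vcpu in run_queues[cpu]:
            if vcpu[0] == vm and not is_spinning(vcpu):
                return cpu, vcpu
    return None

# Hypothetical state: the spinning vCPU of vmA on CPU 0 wants to yield.
run_queues = {0: [], 1: [("vmA", 1)], 2: [("vmB", 0)]}
found = find_sibling_to_switch(0, "vmA", run_queues, is_spinning=lambda v: False)
print(found)
```

Capping the number of probes bounds the inter-CPU locking cost per yield; when no suitable sibling turns up within two tries, the caller simply gives up on the switch for this round.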
15 Implementation
- Implement Flex in Xen's credit scheduler (weight -> credit)
- FlexW in the system-wide csched_acct() routine: adjusts VM credit refill based on real-time weights, invoked every 30ms
- FlexS in the per-CPU schedule() function: adds load_balance_switch() to exchange work with sibling vCPUs
- Identify spinning vCPUs in vcpu_acct(), when Xen charges credit to the currently running vCPU
16 Evaluation Methodology
- Questions: VM-level fairness? Parallel performance?
- Workloads
  - NAS Parallel Benchmarks (OpenMP, busy-waiting sync)
  - PARSEC (Pthreads, blocking sync)
  - Background interfering loops: isolate from cache contention
- Scheduling strategies for comparison
  - Xen default credit scheduler
  - Balance+cap+CO - [Sukwong-Eurosys11]
  - Demand+cap - [Kim-ASPLOS13]
17 VM-level Fairness
- Heterogeneous VMs: 1-vCPU, 2-vCPU, 3-vCPU, 4-vCPU
- Each running a while(1) loop
- Relative lag = $\frac{S_{i,GPS}(t_1,t_2) - S_i(t_1,t_2)}{S_{i,GPS}(t_1,t_2)}$ (lower is better)
- Flex: significant improvement over Xen, with no more than 5% unfairness
[Figure: relative lag (%) of Xen and Flex vs. number of VMs.]
18 VM Differentiation
- Flex realizes proportional share among VMs
[Figure: proportional CPU share per VM (legend includes a 3-vCPU VM and a 2-vCPU VM) for VM weights 3:2:1, 3:1:2, 2:1:3, 2:3:1, 1:2:3, 1:3:2.]
19 Parallel Performance
- Challenge: Flex allocates less CPU time to the 4-vCPU VM than Xen
- Observation: FlexW alone does NOT guarantee good performance
- Reason: imbalance wastes CPU time
- Results: FlexW+FlexS performs close to Xen and Balance+cap+CO
[Figure: placement of Loop and NAS vCPUs on pCPU0-pCPU3; normalized runtime (lower is better) of Xen, Balance+cap+CO, FlexW, and FlexW+FlexS for lu, sp, ft, ua, bt, cg, ep, mg.]
20 Parallel Performance
- Expected: Flex performs better than Xen
- Reason: Flex allocates more CPU time to the 3-vCPU VM than Xen
[Figure: placement of NAS and Loop vCPUs on pCPU0-pCPU3; normalized runtime (lower is better) of Xen, FlexW, Balance+cap+CO, and FlexW+FlexS for lu, sp, ft, ua, bt, cg, ep, mg.]
21 Mix of Parallel Workloads
- Results: Flex outperforms Balance+cap by 30.4%
[Figure: placement of LU and NAS vCPUs on pCPU0-pCPU3; normalized runtime (lower is better) of Xen, Balance+cap, and Flex for foreground NAS benchmarks lu, sp, ft, ua, bt, cg, ep, mg with background lu; a label marks dynamic task assignment.]
22 Overhead
- FlexW overhead: VM weight adjustment
  - Overhead increases with the number of VMs
  - Performed by the idle VM, so it does not affect parallel performance
- FlexS overhead: vCPU stealing
  - Constant overhead that does not increase with the number of VMs
  - Frequency of schedule(): 30ms
  - Less than 1% overhead
[Figure: execution time (microseconds) of the system-wide csched_acct() and the per-CPU schedule() functions under Xen and Flex.]
23 Conclusions & Future Work
- Fairness-efficiency tradeoff
  - Straightforward solutions to unfairness lead to poor efficiency
- Flex: a holistic solution
  - Adaptively adjusts weights for fairness
  - Flexibly schedules vCPUs to minimize wasted work
  - Problem: NOT quite effective for apps with dynamic task assignment
- Future work
  - Cross-layer application-cloud coordination
24 Thank you! Questions?
25 Backup Slides Begin Here
26 Parallel Performance (blocking)
- Observation: both Flex and Demand+cap improve performance
- Reason: avoiding vCPU stacking helps a lot
- Conclusion: Flex does not incur much penalty for blocking-sync-based apps
[Figure: placement of Loop and PARSEC vCPUs on pCPU0-pCPU3; bar chart comparing Xen, FlexW, Demand+cap, and FlexW+FlexS for blackscholes, bodytrack, canneal, dedup, facesim, raytrace, streamcluster, swaptions, vips, x264.]
Chapter 6: CPU Scheduling Silberschatz, Galvin and Gagne 2013 Chapter 6: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Thread Scheduling Multiple-Processor Scheduling Real-Time
More informationOperating Systems CMPSCI 377 Spring Mark Corner University of Massachusetts Amherst
Operating Systems CMPSCI 377 Spring 2017 Mark Corner University of Massachusetts Amherst Multilevel Feedback Queues (MLFQ) Multilevel feedback queues use past behavior to predict the future and assign
More informationGPU Consolidation for Cloud Games: Are We There Yet?
GPU Consolidation for Cloud Games: Are We There Yet? Hua-Jun Hong 1, Tao-Ya Fan-Chiang 1, Che-Run Lee 1, Kuan-Ta Chen 2, Chun-Ying Huang 3, Cheng-Hsin Hsu 1 1 Department of Computer Science, National Tsing
More informationApproaches to Performance Evaluation On Shared Memory and Cluster Architectures
Approaches to Performance Evaluation On Shared Memory and Cluster Architectures Peter Strazdins (and the CC-NUMA Team), CC-NUMA Project, Department of Computer Science, The Australian National University
More informationRecap. Run to completion in order of arrival Pros: simple, low overhead, good for batch jobs Cons: short jobs can stuck behind the long ones
Recap First-Come, First-Served (FCFS) Run to completion in order of arrival Pros: simple, low overhead, good for batch jobs Cons: short jobs can stuck behind the long ones Round-Robin (RR) FCFS with preemption.
More informationMultitasking and scheduling
Multitasking and scheduling Guillaume Salagnac Insa-Lyon IST Semester Fall 2017 2/39 Previously on IST-OPS: kernel vs userland pplication 1 pplication 2 VM1 VM2 OS Kernel rchitecture Hardware Each program
More informationThe RCU-Reader Preemption Problem in VMs
The RCU-Reader Preemption Problem in VMs Aravinda Prasad1, K Gopinath1, Paul E. McKenney2 1 2 Indian Institute of Science (IISc), Bangalore IBM Linux Technology Center, Beaverton 2017 USENIX Annual Technical
More informationCharacterizing Multi-threaded Applications based on Shared-Resource Contention
Characterizing Multi-threaded Applications based on Shared-Resource Contention Tanima Dey Wei Wang Jack W. Davidson Mary Lou Soffa Department of Computer Science University of Virginia Charlottesville,
More informationChapter 5: CPU Scheduling. Operating System Concepts 8 th Edition,
Chapter 5: CPU Scheduling Operating System Concepts 8 th Edition, Hanbat National Univ. Computer Eng. Dept. Y.J.Kim 2009 Chapter 5: Process Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms
More informationPreserving I/O Prioritization in Virtualized OSes
Preserving I/O Prioritization in Virtualized OSes Kun Suo 1, Yong Zhao 1, Jia Rao 1, Luwei Cheng 2, Xiaobo Zhou 3 and Francis C. M. Lau 4 1 University of Texas at Arlington, {kun.suo, yong.zhao, jia.rao}@uta.edu
More informationUniprocessor Scheduling. Basic Concepts Scheduling Criteria Scheduling Algorithms. Three level scheduling
Uniprocessor Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Three level scheduling 2 1 Types of Scheduling 3 Long- and Medium-Term Schedulers Long-term scheduler Determines which programs
More informationCPU Scheduling. The scheduling problem: When do we make decision? - Have K jobs ready to run - Have N 1 CPUs - Which jobs to assign to which CPU(s)
CPU Scheduling The scheduling problem: - Have K jobs ready to run - Have N 1 CPUs - Which jobs to assign to which CPU(s) When do we make decision? 1 / 31 CPU Scheduling new admitted interrupt exit terminated
More informationMediaTek CorePilot 2.0. Delivering extreme compute performance with maximum power efficiency
MediaTek CorePilot 2.0 Heterogeneous Computing Technology Delivering extreme compute performance with maximum power efficiency In July 2013, MediaTek delivered the industry s first mobile system on a chip
More informationExample: CPU-bound process that would run for 100 quanta continuously 1, 2, 4, 8, 16, 32, 64 (only 37 required for last run) Needs only 7 swaps
Interactive Scheduling Algorithms Continued o Priority Scheduling Introduction Round-robin assumes all processes are equal often not the case Assign a priority to each process, and always choose the process
More information