Towards Fair and Efficient SMP Virtual Machine Scheduling

1 Towards Fair and Efficient SMP Virtual Machine Scheduling
Jia Rao and Xiaobo Zhou
University of Colorado, Colorado Springs

2 Executive Summary
Problem: unfairness and inefficiency in consolidating SMP VMs
- Existing VM schedulers favor VMs with more virtual CPUs (vCPUs)
- Fairness mechanisms hurt parallel performance
Flex is a scheduling framework that:
- Adaptively adjusts vCPU weights for VM-level fairness
- Flexibly schedules vCPUs to minimize unnecessary spinning
Results: within 5% of the ideally fair allocation, 30%+ performance improvement for parallel workloads, ~1% overhead in Xen

3 SMP VM Consolidation
Abundant hardware parallelism in data centers
SMP VMs are prevalent in the cloud
- 26 out of 29 instance types in Amazon EC2 have more than two vCPUs
- Support parallel applications
Heterogeneous consolidation is common, e.g., Amazon EC2

4 Unfair VM CPU Allocation
CPU allocation
- NOT proportional to VM weights
- VMs with more vCPUs gain an advantage
- Common issue in all hypervisors
[Figure: a 3-vCPU VM and a 4-vCPU VM sharing pCPU0 to pCPU3; fair share = two CPU cores. Bar chart of normalized CPU consumption under Xen, KVM, and VMware; labeled values: 1.59, 1.42.]

5 Causes of Unfairness
Per-CPU scheduling
- Independent scheduler on each CPU
- Each allocates CPU based on relative vCPU weights
- Per-vCPU weight depends on the VM weight and the number of vCPUs
- Scalable, but makes VM-level fairness hard
Allocations on vCPUs => VM-level fairness IFF the total weight is the same on each CPU
[Figure: vCPUs of equal-weight VMs placed on pCPU0 to pCPU3; box size reflects per-vCPU weight.]
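The effect described on this slide can be reproduced with a small calculation. The sketch below is illustrative only and rests on assumptions that are not in the slides: the per-vCPU weight is taken to be the VM weight divided by its vCPU count, every vCPU is always runnable, and the VM names, weights, and placement are hypothetical.

```python
from collections import defaultdict


def per_cpu_allocation(vm_weights, placement):
    """Return the CPU time (in cores) each VM receives when every pCPU
    independently shares its time in proportion to per-vCPU weights.

    vm_weights: {vm_name: vm_weight}
    placement:  {pcpu_id: [vm_name, ...]}, one list entry per vCPU on that pCPU
    """
    # Count the vCPUs of each VM across all pCPUs.
    vcpu_count = defaultdict(int)
    for vms in placement.values():
        for vm in vms:
            vcpu_count[vm] += 1

    allocation = defaultdict(float)
    for vms in placement.values():
        # Assumption: per-vCPU weight = VM weight / number of vCPUs.
        vcpu_weights = [vm_weights[vm] / vcpu_count[vm] for vm in vms]
        total = sum(vcpu_weights)
        for vm, weight in zip(vms, vcpu_weights):
            allocation[vm] += weight / total
    return dict(allocation)


# Hypothetical setup: two equal-weight VMs on four pCPUs.
# VM "A" has 4 vCPUs (one per pCPU); VM "B" has 2 vCPUs (on pCPU0 and pCPU1).
weights = {"A": 256, "B": 256}
placement = {0: ["A", "B"], 1: ["A", "B"], 2: ["A"], 3: ["A"]}
shares = per_cpu_allocation(weights, placement)
```

With equal VM weights the fair share is two cores each, yet in this setup VM A ends up with 8/3 cores and VM B with 4/3 cores, because the total weight differs between pCPUs.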

6 Existing Solutions
Capping VM CPU consumption (Cap)
- Requires pre-calculation of the fair share
- Non-work-conserving
Load balancing (LB)
- Tries to achieve equal weights on CPUs
- Balance weight vs. balance run queue length
Introduce significant inefficiencies to parallel applications

7 Cap on Busy-waiting-based Workloads
Busy-waiting synchronization
- Tasks stay in a busy loop waiting for lock release
- Avoids context switches
Lock holder preemption (LHP)
- Preemption of vCPUs holding locks
- Long synchronization latency
Capping exacerbates the LHP issue in virtualized environments: since spinning wastes CPU cycles, capping may mistakenly preempt vCPUs that are doing useful work.
[Figure: CPU time (%) split into useful work and busy waiting for homogeneous and heterogeneous VMs; benchmarks lu, sp, ft, ua, bt, cg, ep, mg; annotation: dynamic task assignment.]

8 LB on Blocking-based Workloads
Blocking synchronization
- Tasks go to sleep if they fail to acquire the lock
- Avoids wasted CPU cycles
vCPU stacking
- vCPUs belonging to one VM pile up on the same CPU
- No parallelism + long sync latency
LB exacerbates the vCPU stacking issue: since blocking vCPUs frequently switch between READY and RUNNING states, they are more likely victims of work-stealing-based load balancing. Gradually, stolen vCPUs pile up on a few CPUs.
[Figure: CPU time (%) split into running and blocked under PIN and LB; benchmarks blackscholes, bodytrack, canneal, dedup, facesim, raytrace, streamcluster, swaptions, vips, x264.]

9 Related Work
Fairness in multicore systems
- [Li-PPoPP09]: no VM-level fairness
Minimizing sync latency in SMP VMs
- [Sukwong-Eurosys11]: avoids vCPU stacking
- [Kim-ASPLOS13], [Weng-HPDC11]: not effective in user-level synchronization
- Pause loop exit (PLE): needs hardware support
Spin detection
- [Wells-PACT06]: store-based spin detection, not accurate for apps with different store rates, e.g., LU in the NAS parallel benchmark
Flex: non-intrusive, lightweight and applicable to different implementations

10 Flex for Fairness and Efficiency
Flexible vCPU weight (FlexW)
- Monitors VM CPU consumption
- Calculates fair shares based on VM weights
- Adjusts vCPU weights to compensate for the difference
Flexible vCPU scheduling (FlexS)
- Stops spinning vCPUs to avoid wasted CPU cycles
- Switches the preempted vCPU with one on another CPU that is doing useful work
- Ensures that no vCPUs from the same VM stack on one CPU

11 FlexW Design
Determine the fair share
- $P$: number of shared CPUs; $w_i$: VM weight
- Ideally fair share according to generalized processor sharing (GPS):
  $S_{i,GPS}(t_1,t_2) = \frac{w_i}{\sum_j w_j}\,(t_2 - t_1)\,P$
Adjust VM weights
- $w_i^r$: real-time VM weight
- For each VM, calculate the lag:
  $lag_i(t_1,t_2) = \frac{S_{i,GPS}(t_1,t_2) - S_i(t_1,t_2)}{S_{i,GPS}(t_1,t_2)}$
- Compensate the lag with real-time weights:
  $w_i^r = w_i^r + w_i \cdot lag_i(t_1,t_2)$
[Truncated pseudocode fragment: if FAIR WIND]
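A minimal sketch of one FlexW adjustment round, written directly from the three formulas on this slide. The function names, the VM names, the 30 ms interval, and the consumption figures are hypothetical; the slides do not show the actual Xen code, and a real scheduler would likely also bound the resulting weights.

```python
def gps_fair_share(weights, num_cpus, interval):
    """GPS fair share S_{i,GPS} = w_i / sum_j(w_j) * (t2 - t1) * P."""
    total_weight = sum(weights.values())
    return {vm: w / total_weight * interval * num_cpus
            for vm, w in weights.items()}


def relative_lag(fair_share, consumed):
    """lag_i = (S_{i,GPS} - S_i) / S_{i,GPS}; positive means under-served."""
    return {vm: (fair_share[vm] - consumed[vm]) / fair_share[vm]
            for vm in fair_share}


def adjust_realtime_weights(realtime_weights, weights, lag):
    """w_i^r = w_i^r + w_i * lag_i."""
    return {vm: realtime_weights[vm] + weights[vm] * lag[vm]
            for vm in weights}


# Hypothetical round: two equal-weight VMs sharing 4 CPUs for 30 ms.
weights = {"vm_a": 256, "vm_b": 256}
fair = gps_fair_share(weights, num_cpus=4, interval=30.0)   # CPU-ms per VM
consumed = {"vm_a": 80.0, "vm_b": 40.0}                     # measured CPU-ms
lag = relative_lag(fair, consumed)
realtime = adjust_realtime_weights(dict(weights), weights, lag)
```

In this round each VM's fair share is 60 CPU-ms. vm_a over-consumed, so its lag is negative and its real-time weight drops below 256; vm_b was under-served, so its real-time weight rises by the same amount.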

12 FlexS Design
Identifying busy-waiting vCPUs
- Non-intrusive identification without application knowledge
- Common pattern in different spin implementations:
  - Spin loops contain a few instructions
  - Spin loops are executed many times
- Spin loops show a high branch per instruction (BPI) and a low branch misprediction rate (BMPR)
[Figures: BPI and BMPR of solo, spinning, and non-spinning vCPUs for lu, sp, cg, and povray.]
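The BPI/BMPR heuristic can be sketched as a simple classifier over hardware counter samples. Everything concrete below is an assumption made for illustration: the slides give no threshold values, so BPI_THRESHOLD and BMPR_THRESHOLD are placeholders, and PmuSample is a hypothetical container for per-vCPU counter deltas.

```python
from dataclasses import dataclass


@dataclass
class PmuSample:
    """Counter deltas for one vCPU over one sampling period (hypothetical)."""
    instructions: int
    branches: int
    branch_misses: int


# Placeholder thresholds; the slides do not state the values Flex uses.
BPI_THRESHOLD = 0.25
BMPR_THRESHOLD = 0.001


def branch_per_instruction(sample):
    """BPI: fraction of retired instructions that are branches."""
    return sample.branches / sample.instructions if sample.instructions else 0.0


def branch_misprediction_rate(sample):
    """BMPR: fraction of branches that were mispredicted."""
    return sample.branch_misses / sample.branches if sample.branches else 0.0


def looks_like_spinning(sample):
    """Flag a vCPU as busy-waiting when BPI is high and BMPR is low."""
    if sample.instructions == 0 or sample.branches == 0:
        return False  # no activity to classify
    return (branch_per_instruction(sample) >= BPI_THRESHOLD
            and branch_misprediction_rate(sample) <= BMPR_THRESHOLD)


tight_loop = PmuSample(instructions=1_000_000, branches=330_000, branch_misses=10)
compute = PmuSample(instructions=1_000_000, branches=120_000, branch_misses=3_000)
```

Here tight_loop resembles a short spin loop (many well-predicted branches) and is flagged, while compute has a lower branch density and more mispredictions and is not.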

13 FlexS Design (cont.)
Eliminating busy-waiting time
- Periodically update a vCPU's BPI and BMPR
- A busy-waiting vCPU voluntarily yields the CPU
- Find a sibling vCPU to complete the unfinished time slice
- Switch the two vCPUs to avoid vCPU stacking and run queue weight changes

14 Practical Considerations
Starvation
- A VM demanding less than its share will have an ever-increasing real-time weight
- Solution: reset the real-time weight every 10s
Infeasible weight -> peak CPU demand less than the fair share
- Solution: use the peak demand as the fair share
False positives in identifying spinning vCPUs
- Solution: reset BPI and BMPR every 10s
Inter-CPU locking overhead due to vCPU migrations
- Solution: only try twice when looking for siblings to switch (the power of two choices)
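The bounded sibling search can be sketched as follows. This is an illustration under assumptions, not Flex's code: the run-queue representation, the function name, and the choice to probe randomly selected pCPUs are hypothetical; only the limit of two tries and the requirement that the sibling be doing useful work come from the slides.

```python
import random


def find_sibling_to_switch(vm, my_cpu, run_queues, rng, max_tries=2):
    """Probe at most max_tries other pCPUs for a non-spinning sibling vCPU.

    run_queues: {pcpu_id: [(vm_name, vcpu_id, is_spinning), ...]}
    Returns (pcpu_id, vcpu_id) of a sibling to switch with, or None.
    """
    candidates = [cpu for cpu in run_queues if cpu != my_cpu]
    # Probing only a couple of run queues keeps inter-CPU locking cheap.
    for cpu in rng.sample(candidates, min(max_tries, len(candidates))):
        for owner, vcpu_id, is_spinning in run_queues[cpu]:
            if owner == vm and not is_spinning:
                return cpu, vcpu_id
    return None


# Hypothetical state: vm1's vCPU 0 is spinning on pCPU0 and wants to yield.
run_queues = {
    0: [("vm1", 0, True)],
    1: [("vm2", 0, False)],
    2: [("vm1", 1, False)],
}
target = find_sibling_to_switch("vm1", 0, run_queues, random.Random(0))
```

With only two other pCPUs both are probed, so the sibling on pCPU2 is found. On a larger machine the search may return None after two probes, in which case the spinning vCPU would simply yield without a switch.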

15 Implementation
Implement Flex in Xen's credit scheduler (weight -> credit)
- FlexW in the system-wide csched_acct() routine: adjusts VM credit refill based on real-time weights, invoked every 30ms
- FlexS in the per-CPU schedule() function: adds load_balance_switch() to exchange work with sibling vCPUs
- Identify spinning vCPUs in vcpu_acct() when Xen charges credit to the currently running vCPU

16 Evaluation Methodology
Questions: VM-level fairness and parallel performance?
Workloads
- NAS Parallel Benchmarks (OpenMP, busy-waiting sync)
- PARSEC (Pthreads, blocking sync)
- Background interfering loops, to isolate the results from cache contention
Scheduling strategies for comparison
- Xen default credit scheduler
- Balance+cap+CO: [Sukwong-Eurosys11]
- Demand+cap: [Kim-ASPLOS13]

17 VM-level Fairness
Heterogeneous VMs: 1-vCPU, 2-vCPU, 3-vCPU, 4-vCPU, each running a while(1) loop
Relative lag = $\frac{S_{i,GPS}(t_1,t_2) - S_i(t_1,t_2)}{S_{i,GPS}(t_1,t_2)}$ (lower is better)
Flex: significant improvement over Xen, with no more than 5% unfairness
[Figure: relative lag (%) vs. number of VMs for Xen and Flex.]

18 VM Differentiation
Flex realizes proportional share among VMs
[Figure: proportional CPU share per VM (legend as transcribed: vCPU VM, 3-vCPU VM, 2-vCPU VM) under VM weights 3:2:1, 3:1:2, 2:1:3, 2:3:1, 1:2:3, 1:3:2.]

19 Parallel Performance
Challenge: Flex allocates less CPU time to the 4-vCPU VM than Xen
Observation: FlexW alone does NOT guarantee good performance
Reason: imbalance wastes CPU time
Results: FlexW+FlexS performs close to Xen and balance+cap+CO
[Figure: placement of Loop and NAS vCPUs on pCPU0 to pCPU3; normalized runtime (lower is better) under Xen, Balance+cap+CO, FlexW, and FlexW+FlexS for lu, sp, ft, ua, bt, cg, ep, mg.]

20 Parallel Performance
Expected: Flex performs better than Xen
Reason: Flex allocates more CPU time to the 3-vCPU VM than Xen
[Figure: placement of NAS and Loop vCPUs on pCPU0 to pCPU3; normalized runtime (lower is better) under Xen, FlexW, Balance+cap+CO, and FlexW+FlexS for lu, sp, ft, ua, bt, cg, ep, mg.]

21 Mix of Parallel Workloads
Results: Flex outperforms balance+cap by 30.4%
[Figure: placement of LU and NAS vCPUs on pCPU0 to pCPU3; normalized runtime (lower is better) of the foreground NAS benchmarks lu, sp, ft, ua, bt, cg, ep, mg with background lu, under Xen, Balance+cap, and Flex; annotation: dynamic task assignment.]

22 Overhead
FlexW overhead: VM weight adjustment
- Overhead increases with the number of VMs
- Performed by the idle VM, not affecting parallel performance
FlexS overhead: vCPU stealing
- Constant overhead that does not increase with the number of VMs
- Frequency of schedule(): 30ms; less than 1% overhead
[Figure: execution time (microseconds) of the system-wide csched_acct() and the per-CPU schedule() function under Xen and Flex.]

23 Conclusions & Future Work
Fairness-efficiency tradeoff
- Straightforward solutions to unfairness lead to poor efficiency
Flex: a holistic solution
- Adaptively adjusts weights for fairness
- Flexibly schedules vCPUs to minimize wasted work
Problem: NOT quite effective for apps with dynamic task assignment
Future work
- Cross-layer application-cloud coordination

24 Thank you! Questions?

25 Backup Slides Begin Here

26 Parallel Performance (blocking)
Observation: both Flex and demand+cap improve performance
Reason: avoiding vCPU stacking helps a lot
Conclusion: Flex does not incur much penalty on blocking-sync-based apps
[Figure: placement of Loop and PARSEC vCPUs on pCPU0 to pCPU3; results under Xen, FlexW, Demand+cap, and FlexW+FlexS for blackscholes, bodytrack, canneal, dedup, facesim, raytrace, streamcluster, swaptions, vips, x264.]


More information

Performance Prediction and Evaluation of Parallel Applications in KVM, Xen, and VMware

Performance Prediction and Evaluation of Parallel Applications in KVM, Xen, and VMware Performance Prediction and Evaluation of Parallel Applications in KVM, Xen, and VMware Cheol-Ho Hong 1, Beom-Joon Kim 2, Young-Pil Kim 1, Hyunchan Park 1, and Chuck Yoo 1 1 Korea University, Seoul, South

More information

RAMP-White / FAST-MP

RAMP-White / FAST-MP RAMP-White / FAST-MP Hari Angepat and Derek Chiou Electrical and Computer Engineering University of Texas at Austin Supported in part by DOE, NSF, SRC,Bluespec, Intel, Xilinx, IBM, and Freescale RAMP-White

More information

Our Many-core Benchmarks Do Not Use That Many Cores Paul D. Bryan Jesse G. Beu Thomas M. Conte Paolo Faraboschi Daniel Ortega

Our Many-core Benchmarks Do Not Use That Many Cores Paul D. Bryan Jesse G. Beu Thomas M. Conte Paolo Faraboschi Daniel Ortega Our Many-core Benchmarks Do Not Use That Many Cores Paul D. Bryan Jesse G. Beu Thomas M. Conte Paolo Faraboschi Daniel Ortega Georgia Institute of Technology HP Labs, Exascale Computing Lab {paul.bryan,

More information

Systematic Cooperation in P2P Grids

Systematic Cooperation in P2P Grids 29th October 2008 Cyril Briquet Doctoral Dissertation in Computing Science Department of EE & CS (Montefiore Institute) University of Liège, Belgium Application class: Bags of Tasks Bag of Task = set of

More information

CPU Scheduling. CSE 2431: Introduction to Operating Systems Reading: Chapter 6, [OSC] (except Sections )

CPU Scheduling. CSE 2431: Introduction to Operating Systems Reading: Chapter 6, [OSC] (except Sections ) CPU Scheduling CSE 2431: Introduction to Operating Systems Reading: Chapter 6, [OSC] (except Sections 6.7.2 6.8) 1 Contents Why Scheduling? Basic Concepts of Scheduling Scheduling Criteria A Basic Scheduling

More information

Chapter 6: CPU Scheduling. Operating System Concepts 9 th Edition

Chapter 6: CPU Scheduling. Operating System Concepts 9 th Edition Chapter 6: CPU Scheduling Silberschatz, Galvin and Gagne 2013 Chapter 6: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Thread Scheduling Multiple-Processor Scheduling Real-Time

More information

What s An OS? Cyclic Executive. Interrupts. Advantages Simple implementation Low overhead Very predictable

What s An OS? Cyclic Executive. Interrupts. Advantages Simple implementation Low overhead Very predictable What s An OS? Provides environment for executing programs Process abstraction for multitasking/concurrency scheduling Hardware abstraction layer (device drivers) File systems Communication Do we need an

More information

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme SER1815BU DRS Advancements: What's New and What Is Being Cooked Up in Resource Management Land VMworld 2017 Thomas Bryant, VMware, Inc - @kix1979 Maarten Wiggers, VMware, Inc Content: Not for publication

More information

Frequently asked questions from the previous class survey

Frequently asked questions from the previous class survey CS 370: OPERATING SYSTEMS [CPU SCHEDULING] Shrideep Pallickara Computer Science Colorado State University L15.1 Frequently asked questions from the previous class survey Could we record burst times in

More information

Implementing Efficient and Scalable Flow Control Schemes in MPI over InfiniBand

Implementing Efficient and Scalable Flow Control Schemes in MPI over InfiniBand Implementing Efficient and Scalable Flow Control Schemes in MPI over InfiniBand Jiuxing Liu and Dhabaleswar K. Panda Computer Science and Engineering The Ohio State University Presentation Outline Introduction

More information

Set Cooperative Cache for Virtual Machine Relocation

Set Cooperative Cache for Virtual Machine Relocation International Conference on Logistics Engineering, Management and Computer Science (LEMCS 2015) Set Cooperative Cache for Virtual Machine Relocation Cong Hu Anhui Electric Power Corporation, Information

More information

Optimizing Replication, Communication, and Capacity Allocation in CMPs

Optimizing Replication, Communication, and Capacity Allocation in CMPs Optimizing Replication, Communication, and Capacity Allocation in CMPs Zeshan Chishti, Michael D Powell, and T. N. Vijaykumar School of ECE Purdue University Motivation CMP becoming increasingly important

More information

VARIABILITY IN OPERATING SYSTEMS

VARIABILITY IN OPERATING SYSTEMS VARIABILITY IN OPERATING SYSTEMS Brian Kocoloski Assistant Professor in CSE Dept. October 8, 2018 1 CLOUD COMPUTING Current estimate is that 94% of all computation will be performed in the cloud by 2021

More information

Software-Controlled Transparent Management of Heterogeneous Memory Resources in Virtualized Systems

Software-Controlled Transparent Management of Heterogeneous Memory Resources in Virtualized Systems Software-Controlled Transparent Management of Heterogeneous Memory Resources in Virtualized Systems Min Lee Vishal Gupta Karsten Schwan College of Computing Georgia Institute of Technology {minlee,vishal,schwan}@cc.gatech.edu

More information

A Simulation Framework to Evaluate Virtual CPU Scheduling Algorithms

A Simulation Framework to Evaluate Virtual CPU Scheduling Algorithms 2013 IEEE 33rd International Conference on Distributed Computing Systems Workshops A Simulation Framework to Evaluate Virtual CPU Scheduling Algorithms Cuong Pham, Qingkun Li, Zachary Estrada, Zbigniew

More information

KEYNOTE FOR

KEYNOTE FOR THE OS SCHEDULER: A PERFORMANCE-CRITICAL COMPONENT IN LINUX CLUSTER ENVIRONMENTS KEYNOTE FOR BPOE-9 @ASPLOS2018 THE NINTH WORKSHOP ON BIG DATA BENCHMARKS, PERFORMANCE, OPTIMIZATION AND EMERGING HARDWARE

More information

Process- Concept &Process Scheduling OPERATING SYSTEMS

Process- Concept &Process Scheduling OPERATING SYSTEMS OPERATING SYSTEMS Prescribed Text Book Operating System Principles, Seventh Edition By Abraham Silberschatz, Peter Baer Galvin and Greg Gagne PROCESS MANAGEMENT Current day computer systems allow multiple

More information

Chapter 6: CPU Scheduling. Operating System Concepts 9 th Edition

Chapter 6: CPU Scheduling. Operating System Concepts 9 th Edition Chapter 6: CPU Scheduling Silberschatz, Galvin and Gagne 2013 Chapter 6: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Thread Scheduling Multiple-Processor Scheduling Real-Time

More information

Operating Systems CMPSCI 377 Spring Mark Corner University of Massachusetts Amherst

Operating Systems CMPSCI 377 Spring Mark Corner University of Massachusetts Amherst Operating Systems CMPSCI 377 Spring 2017 Mark Corner University of Massachusetts Amherst Multilevel Feedback Queues (MLFQ) Multilevel feedback queues use past behavior to predict the future and assign

More information

GPU Consolidation for Cloud Games: Are We There Yet?

GPU Consolidation for Cloud Games: Are We There Yet? GPU Consolidation for Cloud Games: Are We There Yet? Hua-Jun Hong 1, Tao-Ya Fan-Chiang 1, Che-Run Lee 1, Kuan-Ta Chen 2, Chun-Ying Huang 3, Cheng-Hsin Hsu 1 1 Department of Computer Science, National Tsing

More information

Approaches to Performance Evaluation On Shared Memory and Cluster Architectures

Approaches to Performance Evaluation On Shared Memory and Cluster Architectures Approaches to Performance Evaluation On Shared Memory and Cluster Architectures Peter Strazdins (and the CC-NUMA Team), CC-NUMA Project, Department of Computer Science, The Australian National University

More information

Recap. Run to completion in order of arrival Pros: simple, low overhead, good for batch jobs Cons: short jobs can stuck behind the long ones

Recap. Run to completion in order of arrival Pros: simple, low overhead, good for batch jobs Cons: short jobs can stuck behind the long ones Recap First-Come, First-Served (FCFS) Run to completion in order of arrival Pros: simple, low overhead, good for batch jobs Cons: short jobs can stuck behind the long ones Round-Robin (RR) FCFS with preemption.

More information

Multitasking and scheduling

Multitasking and scheduling Multitasking and scheduling Guillaume Salagnac Insa-Lyon IST Semester Fall 2017 2/39 Previously on IST-OPS: kernel vs userland pplication 1 pplication 2 VM1 VM2 OS Kernel rchitecture Hardware Each program

More information

The RCU-Reader Preemption Problem in VMs

The RCU-Reader Preemption Problem in VMs The RCU-Reader Preemption Problem in VMs Aravinda Prasad1, K Gopinath1, Paul E. McKenney2 1 2 Indian Institute of Science (IISc), Bangalore IBM Linux Technology Center, Beaverton 2017 USENIX Annual Technical

More information

Characterizing Multi-threaded Applications based on Shared-Resource Contention

Characterizing Multi-threaded Applications based on Shared-Resource Contention Characterizing Multi-threaded Applications based on Shared-Resource Contention Tanima Dey Wei Wang Jack W. Davidson Mary Lou Soffa Department of Computer Science University of Virginia Charlottesville,

More information

Chapter 5: CPU Scheduling. Operating System Concepts 8 th Edition,

Chapter 5: CPU Scheduling. Operating System Concepts 8 th Edition, Chapter 5: CPU Scheduling Operating System Concepts 8 th Edition, Hanbat National Univ. Computer Eng. Dept. Y.J.Kim 2009 Chapter 5: Process Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms

More information

Preserving I/O Prioritization in Virtualized OSes

Preserving I/O Prioritization in Virtualized OSes Preserving I/O Prioritization in Virtualized OSes Kun Suo 1, Yong Zhao 1, Jia Rao 1, Luwei Cheng 2, Xiaobo Zhou 3 and Francis C. M. Lau 4 1 University of Texas at Arlington, {kun.suo, yong.zhao, jia.rao}@uta.edu

More information

Uniprocessor Scheduling. Basic Concepts Scheduling Criteria Scheduling Algorithms. Three level scheduling

Uniprocessor Scheduling. Basic Concepts Scheduling Criteria Scheduling Algorithms. Three level scheduling Uniprocessor Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Three level scheduling 2 1 Types of Scheduling 3 Long- and Medium-Term Schedulers Long-term scheduler Determines which programs

More information

CPU Scheduling. The scheduling problem: When do we make decision? - Have K jobs ready to run - Have N 1 CPUs - Which jobs to assign to which CPU(s)

CPU Scheduling. The scheduling problem: When do we make decision? - Have K jobs ready to run - Have N 1 CPUs - Which jobs to assign to which CPU(s) CPU Scheduling The scheduling problem: - Have K jobs ready to run - Have N 1 CPUs - Which jobs to assign to which CPU(s) When do we make decision? 1 / 31 CPU Scheduling new admitted interrupt exit terminated

More information

MediaTek CorePilot 2.0. Delivering extreme compute performance with maximum power efficiency

MediaTek CorePilot 2.0. Delivering extreme compute performance with maximum power efficiency MediaTek CorePilot 2.0 Heterogeneous Computing Technology Delivering extreme compute performance with maximum power efficiency In July 2013, MediaTek delivered the industry s first mobile system on a chip

More information

Example: CPU-bound process that would run for 100 quanta continuously 1, 2, 4, 8, 16, 32, 64 (only 37 required for last run) Needs only 7 swaps

Example: CPU-bound process that would run for 100 quanta continuously 1, 2, 4, 8, 16, 32, 64 (only 37 required for last run) Needs only 7 swaps Interactive Scheduling Algorithms Continued o Priority Scheduling Introduction Round-robin assumes all processes are equal often not the case Assign a priority to each process, and always choose the process

More information