Analysis of Virtual Machine Scalability based on Queue Spinlock

Seunghyub Jeon, Seung-Jun Cha, Yeonjeong Jung, Jinmee Kim and Sungin Jung

Electronics and Telecommunications Research Institute, 218 Gajeong-ro, Yuseong-gu, Daejeon, 34129, Korea
{shjeon00, seungjunn, yjjeong, jinmee,

Abstract. To serve applications that need large amounts of memory and processing power, cloud providers offer instances with many cores, but performance has not scaled with the number of CPUs. Various locking mechanisms have been proposed to address this, and Linux kernel 4.2 provides a queue spinlock. In this paper, we use benchmarks to analyze the performance problems of the queue spinlock in a manycore virtual machine and suggest simple ways to mitigate them.

Keywords: queue spinlock, virtual machine scalability.

1 Introduction

Applications such as in-memory databases, big-data analytics, and deep learning are becoming increasingly popular. They need large amounts of memory and substantial processing power to process large data sets concurrently. To meet these requirements, cloud providers such as Amazon and Google have started offering the enterprise-class X1 instance with 128 cores [1] and the n1-standard-96 instance with 96 cores [2], respectively. However, the performance of a virtual machine does not increase in proportion to the number of cores [3][4]. One cause is the ticket spinlock in Linux, which generates cache-coherence traffic. Another is the set of performance anomalies of virtualized locks, such as the lock holder preemption (LHP) problem, the lock waiter preemption (LWP) problem, and the sleepy spinlock anomaly (SSA) [4]. Linux kernel 4.2 introduces the queue spinlock, which addresses the problems of the existing ticket spinlock.
A hardware VM (HVM) using PLE (Pause-Loop-Exit) scaled with the number of cores, but it still suffered extreme performance degradation in an overcommitted virtualized environment. A paravirtualized VM (PVM), on the other hand, degrades from 90 cores, but it performs better than HVM in an overcommitted environment [5]. In this paper, we improve the scalability of PVM by adding a hypercall that invokes the PLE handler of HVM, and we use performance analysis to show the problems of HVM and PVM in the overcommitted state.

ISSN: ASTL Copyright 2017 SERSC

2 Background and Related Work

2.1 Paravirtualized Queue Spinlock

The queue spinlock [7] is a customized version of the MCS lock [6], modified to fit the existing Linux spinlock data structure. It eliminates cache-line bouncing by using a per-CPU structure, and it shows good results on the AIM7 benchmark under high contention [7]. The paravirtualized queue spinlock uses two hypercalls (pv_wait and pv_kick) to halt a vcpu instead of letting it busy-wait: pv_wait suspends the vcpu, and pv_kick wakes a suspended vcpu. pv_wait is normally called after spinning for SPIN_THRESHOLD, but it is called immediately if the previous lock waiter is already halted. This alleviates the sleepy spinlock anomaly somewhat. In Linux, pv_wait is implemented using the halt instruction.

2.2 Hardware VM using PLE

PLE (Pause-Loop-Exit) is a hardware feature that prevents a vcpu from wasting CPU time busy-waiting on a spinlock inside a virtual machine. PLE detects that a virtual CPU is spinning on a lock and traps to the host. The PLE handler then chooses the best candidate vcpu to run and performs a directed yield to it [8]. This mitigates the LHP problem by identifying a potential lock holder and boosting that vcpu.

3 Design and Implementation

To address the PVM LHP problem mentioned above in a simple way, we reuse the existing PLE handler of HVM. For this purpose, we added a hypercall to KVM that calls the PLE handler (kvm_vcpu_on_spin) directly. We call this slightly modified version PVM-SWPLE.

Fig. 1. Difference between PVM, HVM and PVM-SWPLE
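The waiting policy described in Sections 2.1 and 3 can be sketched as a small decision function. This is a toy model in Python, not kernel code: the names Action, pvm_wait_action and pvm_swple_wait_action are hypothetical, the value of SPIN_THRESHOLD is assumed for illustration, and the sketch assumes that PVM-SWPLE keeps the same trigger condition as PVM and only changes what happens once spinning stops.

```python
from enum import Enum

# Assumed spin budget for illustration; the paper names SPIN_THRESHOLD
# but does not state its value.
SPIN_THRESHOLD = 1 << 15


class Action(Enum):
    SPIN = "keep busy-waiting in the guest"
    HALT = "pv_wait: halt the vcpu until pv_kick wakes it"
    HYPERCALL = "hypercall that runs the host PLE handler"


def _should_stop_spinning(spins, prev_waiter_halted):
    # Stop at once if the previous waiter is already halted,
    # otherwise only after the spin budget is used up.
    return prev_waiter_halted or spins >= SPIN_THRESHOLD


def pvm_wait_action(spins, prev_waiter_halted):
    """Paravirtualized queue spinlock: halt once spinning is pointless."""
    if _should_stop_spinning(spins, prev_waiter_halted):
        return Action.HALT
    return Action.SPIN


def pvm_swple_wait_action(spins, prev_waiter_halted):
    """PVM-SWPLE (assumed same trigger): hypercall replaces the halt."""
    if _should_stop_spinning(spins, prev_waiter_halted):
        return Action.HYPERCALL
    return Action.SPIN
```

With these definitions, pvm_swple_wait_action(SPIN_THRESHOLD, False) returns Action.HYPERCALL where pvm_wait_action would return Action.HALT, which is the only difference the model captures.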

Fig. 1 shows the difference between PVM, HVM, and PVM-SWPLE. Like PVM, PVM-SWPLE spins for SPIN_THRESHOLD while waiting to acquire the lock, but it then issues the hypercall instead of the halt instruction. PVM-SWPLE differs slightly from HVM in that PVM-SWPLE sleeps after yielding, whereas HVM boosts another candidate and returns to the VM.

4 Evaluation

4.1 Experimental setup

To measure the scalability of a virtual machine, we used an IBM x3950 X6 server with eight Xeon E (2.3 GHz, 15 cores) processors. Each virtual machine has 120 vcpus, the same as the number of host cores, and 64 GBytes of memory. The benchmark is MOSBENCH gmake, which measures Linux kernel build time; tmpfs is used to reduce the impact of I/O. Experiments were conducted on HVM, PVM, and PVM-SWPLE, and the kernel build time was measured while increasing the number of cores and virtual machines.

4.2 Experiment results

Fig. 2 shows the experiment results. For HVM, performance increases in proportion to the number of vcpus at VM=1, but it collapses from 30 vcpus at VM=2 and is worst at VM=4. For PVM, performance degrades from 90 cores at VM=1 and saturates from 60 cores at VM=2 and VM=4. For the proposed PVM-SWPLE, performance is similar to that of HVM at VM=1; at VM=2 it degrades from 90 vcpus but remains better than both HVM and PVM.

Fig. 2. Performance comparison according to the number of cores and virtual machines.
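How far the multi-VM configurations are overcommitted can be estimated with simple arithmetic on the setup figures. The sketch below is only an illustration: the helper name overcommit_ratio is hypothetical, and it assumes that every VM keeps all 120 of its vcpus runnable, which the text does not state for every data point.

```python
def overcommit_ratio(num_vms, vcpus_per_vm, physical_cores):
    """Return runnable vcpus per physical core (1.0 means no overcommit)."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return num_vms * vcpus_per_vm / physical_cores


# Host from the setup: eight processors with 15 cores each.
PHYSICAL_CORES = 8 * 15
# Each VM is configured with as many vcpus as the host has cores.
VCPUS_PER_VM = 120

for num_vms in (1, 2, 4):
    ratio = overcommit_ratio(num_vms, VCPUS_PER_VM, PHYSICAL_CORES)
    print(f"VM={num_vms}: {ratio:.1f} vcpus per physical core")
```

Under that assumption the ratio is 1.0 at VM=1, 2.0 at VM=2 and 4.0 at VM=4, so only the multi-VM runs force vcpus to share physical cores.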

Fig. 3 shows the number of VM_EXIT occurrences, which we collected to find the cause of the HVM performance collapse at VM=2. The number of VM_EXITs increases dramatically above 30 cores in HVM and above 90 cores in PVM-SWPLE, which is consistent with the points of degradation in Fig. 2. PLE exits surge because the PLE handler performs lock holder boosting, and the vcpu then returns to the VM and spins on the same lock again.

Fig. 3. VM_EXIT count at VM=2

Fig. 4 is a snapshot of a perf profiling call graph [9] in HVM (at 45 vcpus) and is further evidence that too many VM_EXITs hurt performance: processing the PLE handler takes up a large share of execution time.

Fig. 4. Snapshot of perf profiling.

5 Conclusion

In this paper, we proposed PVM-SWPLE, which applies the PLE handler of HVM to PVM, and examined its performance. PVM-SWPLE improves performance by inheriting the advantages of both. However, it is still not scalable in overcommitted environments because it does not sufficiently solve LHP. Scalability should be improved further by reducing the number of VM_EXITs, using the characteristics of the queue spinlock to identify the lock holder and yield to it.

Acknowledgments. This work was supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. B , Research on High Performance and Scalable Manycore Operating System).

References

1. Amazon EC2 Instance Types,
2. Google Machine Types,
3. Seung-Jun Cha, et al. Virtual-Machine Scalability Evaluations in the Clouds of Manycore, KIISE winter conference,
4. Kashyap, et al. "Scalability in the Clouds!: A Myth or Reality?" Proceedings of the 6th Asia-Pacific Workshop on Systems. ACM,
5. SeungHyub Jeon, et al. Performance Experiments of Manycore Virtual machine in the Overcommitted Clouds, KIISE winter conference,
6. M. Mellor-Crummey and M. L. Scott. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst., 9(1):21-65,
7. qspinlock: Introducing a 4-byte queue spinlock implementation,
8. K.T. Raghavendra, Virtual cpu scheduling techniques for Kernel Based Virtual Machine (KVM), CCEM,
9. perf: Linux profiling with performance counters,

Remote Direct Storage Management for Exa-Scale Storage

Remote Direct Storage Management for Exa-Scale Storage , pp.15-20 http://dx.doi.org/10.14257/astl.2016.139.04 Remote Direct Storage Management for Exa-Scale Storage Dong-Oh Kim, Myung-Hoon Cha, Hong-Yeon Kim Storage System Research Team, High Performance Computing

More information

IN the cloud, the number of virtual CPUs (VCPUs) in a virtual

IN the cloud, the number of virtual CPUs (VCPUs) in a virtual IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 28, NO. 7, JULY 2017 1811 APPLES: Efficiently Handling Spin-lock Synchronization on Virtualized Platforms Jianchen Shan, Xiaoning Ding, and Narain

More information

Performance Optimization on Huawei Public and Private Cloud

Performance Optimization on Huawei Public and Private Cloud Performance Optimization on Huawei Public and Private Cloud Jinsong Liu Lei Gong Agenda Optimization for LHP Balance scheduling RTC optimization 2 Agenda

More information

Advance Operating Systems (CS202) Locks Discussion

Advance Operating Systems (CS202) Locks Discussion Advance Operating Systems (CS202) Locks Discussion Threads Locks Spin Locks Array-based Locks MCS Locks Sequential Locks Road Map Threads Global variables and static objects are shared Stored in the static

More information

Multiprocessor Systems. Chapter 8, 8.1

Multiprocessor Systems. Chapter 8, 8.1 Multiprocessor Systems Chapter 8, 8.1 1 Learning Outcomes An understanding of the structure and limits of multiprocessor hardware. An appreciation of approaches to operating system support for multiprocessor

More information

KVM PERFORMANCE OPTIMIZATIONS INTERNALS. Rik van Riel Sr Software Engineer, Red Hat Inc. Thu May

KVM PERFORMANCE OPTIMIZATIONS INTERNALS. Rik van Riel Sr Software Engineer, Red Hat Inc. Thu May KVM PERFORMANCE OPTIMIZATIONS INTERNALS Rik van Riel Sr Software Engineer, Red Hat Inc. Thu May 5 2011 KVM performance optimizations What is virtualization performance? Optimizations in RHEL 6.0 Selected

More information

Towards Fair and Efficient SMP Virtual Machine Scheduling

Towards Fair and Efficient SMP Virtual Machine Scheduling Towards Fair and Efficient SMP Virtual Machine Scheduling Jia Rao and Xiaobo Zhou University of Colorado, Colorado Springs http://cs.uccs.edu/~jrao/ Executive Summary Problem: unfairness and inefficiency

More information

Preemptable Ticket Spinlocks: Improving Consolidated Performance in the Cloud

Preemptable Ticket Spinlocks: Improving Consolidated Performance in the Cloud Preemptable Ticket Spinlocks: Improving Consolidated Performance in the Cloud Jiannan Ouyang Department of Computer Science University of Pittsburgh Pittsburgh, PA 526 ouyang@cs.pitt.edu John R. Lange

More information

Lecture 9: Multiprocessor OSs & Synchronization. CSC 469H1F Fall 2006 Angela Demke Brown

Lecture 9: Multiprocessor OSs & Synchronization. CSC 469H1F Fall 2006 Angela Demke Brown Lecture 9: Multiprocessor OSs & Synchronization CSC 469H1F Fall 2006 Angela Demke Brown The Problem Coordinated management of shared resources Resources may be accessed by multiple threads Need to control

More information

Cycle accurate transaction-driven simulation with multiple processor simulators

Cycle accurate transaction-driven simulation with multiple processor simulators Cycle accurate transaction-driven simulation with multiple processor simulators Dohyung Kim 1a) and Rajesh Gupta 2 1 Engineering Center, Google Korea Ltd. 737 Yeoksam-dong, Gangnam-gu, Seoul 135 984, Korea

More information

250P: Computer Systems Architecture. Lecture 14: Synchronization. Anton Burtsev March, 2019

250P: Computer Systems Architecture. Lecture 14: Synchronization. Anton Burtsev March, 2019 250P: Computer Systems Architecture Lecture 14: Synchronization Anton Burtsev March, 2019 Coherence and Synchronization Topics: synchronization primitives (Sections 5.4-5.5) 2 Constructing Locks Applications

More information

Virtualized Testbed Development using Openstack

Virtualized Testbed Development using Openstack , pp.742-746 http://dx.doi.org/10.14257/astl.2015.120.147 Virtualized Testbed Development using Openstack Byeongok Kwak 1, Heeyoung Jung 1, 1 Electronics and Telecommunications Research Institute (ETRI),

More information

Dynamic Monitoring Tool based on Vector Clocks for Multithread Programs

Dynamic Monitoring Tool based on Vector Clocks for Multithread Programs , pp.45-49 http://dx.doi.org/10.14257/astl.2014.76.12 Dynamic Monitoring Tool based on Vector Clocks for Multithread Programs Hyun-Ji Kim 1, Byoung-Kwi Lee 2, Ok-Kyoon Ha 3, and Yong-Kee Jun 1 1 Department

More information

Adaptive Lock. Madhav Iyengar < >, Nathaniel Jeffries < >

Adaptive Lock. Madhav Iyengar < >, Nathaniel Jeffries < > Adaptive Lock Madhav Iyengar < miyengar@andrew.cmu.edu >, Nathaniel Jeffries < njeffrie@andrew.cmu.edu > ABSTRACT Busy wait synchronization, the spinlock, is the primitive at the core of all other synchronization

More information

The RCU-Reader Preemption Problem in VMs

The RCU-Reader Preemption Problem in VMs The RCU-Reader Preemption Problem in VMs Aravinda Prasad1, K Gopinath1, Paul E. McKenney2 1 2 Indian Institute of Science (IISc), Bangalore IBM Linux Technology Center, Beaverton 2017 USENIX Annual Technical

More information

KVM Weather Report. Red Hat Author Gleb Natapov May 29, 2013

KVM Weather Report. Red Hat Author Gleb Natapov May 29, 2013 KVM Weather Report Red Hat Author Gleb Natapov May 29, 2013 Part I What is KVM Section 1 KVM Features KVM Features 4 KVM features VT-x/AMD-V (hardware virtualization) EPT/NPT (two dimensional paging) CPU/memory

More information

Lightweight caching strategy for wireless content delivery networks

Lightweight caching strategy for wireless content delivery networks Lightweight caching strategy for wireless content delivery networks Jihoon Sung 1, June-Koo Kevin Rhee 1, and Sangsu Jung 2a) 1 Department of Electrical Engineering, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon,

More information

Linux kernel synchronization. Don Porter CSE 506

Linux kernel synchronization. Don Porter CSE 506 Linux kernel synchronization Don Porter CSE 506 The old days Early/simple OSes (like JOS): No need for synchronization All kernel requests wait until completion even disk requests Heavily restrict when

More information

A Design of Building Group Management Service Framework for On-Going Commissioning

A Design of Building Group Management Service Framework for On-Going Commissioning , pp.84-88 http://dx.doi.org/10.14257/astl.2014.49.18 A Design of Building Group Management Service Framework for On-Going Commissioning Taehyung Kim 1, Youn Kwae Jeong 1 and Il Woo Lee 1, 1 Electronics

More information

Scalable Locking. Adam Belay

Scalable Locking. Adam Belay Scalable Locking Adam Belay Problem: Locks can ruin performance 12 finds/sec 9 6 Locking overhead dominates 3 0 0 6 12 18 24 30 36 42 48 Cores Problem: Locks can ruin performance the locks

More information

Real-Time Scalability of Nested Spin Locks. Hiroaki Takada and Ken Sakamura. Faculty of Science, University of Tokyo

Real-Time Scalability of Nested Spin Locks. Hiroaki Takada and Ken Sakamura. Faculty of Science, University of Tokyo Real-Time Scalability of Nested Spin Locks Hiroaki Takada and Ken Sakamura Department of Information Science, Faculty of Science, University of Tokyo 7-3-1, Hongo, Bunkyo-ku, Tokyo 113, Japan Abstract

More information

Network Intrusion Forensics System based on Collection and Preservation of Attack Evidence

Network Intrusion Forensics System based on Collection and Preservation of Attack Evidence , pp.354-359 http://dx.doi.org/10.14257/astl.2016.139.71 Network Intrusion Forensics System based on Collection and Preservation of Attack Evidence Jong-Hyun Kim, Yangseo Choi, Joo-Young Lee, Sunoh Choi,

More information

Optimization of thread affinity and memory affinity for remote core locking synchronization in multithreaded programs for multicore computer systems

Optimization of thread affinity and memory affinity for remote core locking synchronization in multithreaded programs for multicore computer systems Optimization of thread affinity and memory affinity for remote core locking synchronization in multithreaded programs for multicore computer systems Alexey Paznikov Saint Petersburg Electrotechnical University

More information

Enhancing Linux Scheduler Scalability

Enhancing Linux Scheduler Scalability Enhancing Linux Scheduler Scalability Mike Kravetz IBM Linux Technology Center Hubertus Franke, Shailabh Nagar, Rajan Ravindran IBM Thomas J. Watson Research Center {mkravetz,frankeh,nagar,rajancr}@us.ibm.com

More information

Performance and Optimization Issues in Multicore Computing

Performance and Optimization Issues in Multicore Computing Performance and Optimization Issues in Multicore Computing Minsoo Ryu Department of Computer Science and Engineering 2 Multicore Computing Challenges It is not easy to develop an efficient multicore program

More information

Spinlocks. Spinlocks. Message Systems, Inc. April 8, 2011

Spinlocks. Spinlocks. Message Systems, Inc. April 8, 2011 Spinlocks Samy Al Bahra Devon H. O Dell Message Systems, Inc. April 8, 2011 Introduction Mutexes A mutex is an object which implements acquire and relinquish operations such that the execution following

More information

CPSC/ECE 3220 Summer 2017 Exam 2

CPSC/ECE 3220 Summer 2017 Exam 2 CPSC/ECE 3220 Summer 2017 Exam 2 Name: Part 1: Word Bank Write one of the words or terms from the following list into the blank appearing to the left of the appropriate definition. Note that there are

More information

The Design and Implementation of a BLE-based WebD2D Service for Android Smartphone

The Design and Implementation of a BLE-based WebD2D Service for Android Smartphone , pp.1-5 http://dx.doi.org/10.14257/astl.2017.146.01 The Design and Implementation of a BLE-based WebD2D Service for Android Smartphone Do-Hyung Kim 1, Seok-Jin Yoon 1, Hyung-Seok Lee 1 and Jae-Ho Lee

More information

Deep Learning Based Real-time Object Recognition System with Image Web Crawler

Deep Learning Based Real-time Object Recognition System with Image Web Crawler , pp.103-110 http://dx.doi.org/10.14257/astl.2016.142.19 Deep Learning Based Real-time Object Recognition System with Image Web Crawler Myung-jae Lee 1, Hyeok-june Jeong 1, Young-guk Ha 2 1 Department

More information

CFS-v: I/O Demand-driven VM Scheduler in KVM

CFS-v: I/O Demand-driven VM Scheduler in KVM CFS-v: Demand-driven VM Scheduler in KVM Hyotaek Shim and Sung-Min Lee (hyotaek.shim, sung.min.lee@samsung.com) Software R&D Center, Samsung Electronics 2014. 10. 16 Problem in Server Consolidation 2/16

More information

GLocks: Efficient Support for Highly- Contended Locks in Many-Core CMPs

GLocks: Efficient Support for Highly- Contended Locks in Many-Core CMPs GLocks: Efficient Support for Highly- Contended Locks in Many-Core CMPs Authors: Jos e L. Abell an, Juan Fern andez and Manuel E. Acacio Presenter: Guoliang Liu Outline Introduction Motivation Background

More information

Algorithms for Scalable Synchronization on Shared Memory Multiprocessors by John M. Mellor Crummey Michael L. Scott

Algorithms for Scalable Synchronization on Shared Memory Multiprocessors by John M. Mellor Crummey Michael L. Scott Algorithms for Scalable Synchronization on Shared Memory Multiprocessors by John M. Mellor Crummey Michael L. Scott Presentation by Joe Izraelevitz Tim Kopp Synchronization Primitives Spin Locks Used for

More information

Multiprocessor System. Multiprocessor Systems. Bus Based UMA. Types of Multiprocessors (MPs) Cache Consistency. Bus Based UMA. Chapter 8, 8.

Multiprocessor System. Multiprocessor Systems. Bus Based UMA. Types of Multiprocessors (MPs) Cache Consistency. Bus Based UMA. Chapter 8, 8. Multiprocessor System Multiprocessor Systems Chapter 8, 8.1 We will look at shared-memory multiprocessors More than one processor sharing the same memory A single CPU can only go so fast Use more than

More information

Multiprocessor Systems. COMP s1

Multiprocessor Systems. COMP s1 Multiprocessor Systems 1 Multiprocessor System We will look at shared-memory multiprocessors More than one processor sharing the same memory A single CPU can only go so fast Use more than one CPU to improve

More information

CHAPTER 16 - VIRTUAL MACHINES

CHAPTER 16 - VIRTUAL MACHINES CHAPTER 16 - VIRTUAL MACHINES 1 OBJECTIVES Explore history and benefits of virtual machines. Discuss the various virtual machine technologies. Describe the methods used to implement virtualization. Show

More information

CS377P Programming for Performance Multicore Performance Synchronization

CS377P Programming for Performance Multicore Performance Synchronization CS377P Programming for Performance Multicore Performance Synchronization Sreepathi Pai UTCS October 21, 2015 Outline 1 Synchronization Primitives 2 Blocking, Lock-free and Wait-free Algorithms 3 Transactional

More information

Micro-sliced Virtual Processors to Hide the Effect of Discontinuous CPU Availability for Consolidated Systems

Micro-sliced Virtual Processors to Hide the Effect of Discontinuous CPU Availability for Consolidated Systems Micro-sliced Virtual Processors to Hide the Effect of Discontinuous CPU Availability for Consolidated Systems Jeongseob Ahn, Chang Hyun Park, and Jaehyuk Huh Computer Science Department, KAIST {jeongseob,

More information

Time Stamp based Multiple Snapshot Management Method for Storage System

Time Stamp based Multiple Snapshot Management Method for Storage System Time Stamp based Multiple Snapshot Management Method for Storage System Yunsoo Lee 1, Dongmin Shin 1, Insoo Bae 1, Seokil Song 1, Seungkook Cheong 2 1 Dept. of Computer Engineering, Korea National University

More information

Nested Virtualization Update From Intel. Xiantao Zhang, Eddie Dong Intel Corporation

Nested Virtualization Update From Intel. Xiantao Zhang, Eddie Dong Intel Corporation Nested Virtualization Update From Intel Xiantao Zhang, Eddie Dong Intel Corporation Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED,

More information

Reactive Synchronization Algorithms for Multiprocessors

Reactive Synchronization Algorithms for Multiprocessors Synchronization Algorithms for Multiprocessors Beng-Hong Lim and Anant Agarwal Laboratory for Computer Science Massachusetts Institute of Technology Cambridge, MA 039 Abstract Synchronization algorithms

More information

An Efficient Flow Table Management Scheme for SDNs Based On Flow Forwarding Paths

An Efficient Flow Table Management Scheme for SDNs Based On Flow Forwarding Paths , pp.88-93 http://dx.doi.org/10.14257/astl.2016.135.23 An Efficient Flow Table Management Scheme for SDNs Based On Flow Forwarding Paths Dongryeol Kim, Byoung-Dai Lee Kyonggi university, Department of

More information

Chapter 5 C. Virtual machines

Chapter 5 C. Virtual machines Chapter 5 C Virtual machines Virtual Machines Host computer emulates guest operating system and machine resources Improved isolation of multiple guests Avoids security and reliability problems Aids sharing

More information

Virtual Machines. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Virtual Machines. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University Virtual Machines Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Today's Topics History and benefits of virtual machines Virtual machine technologies

More information

Virtual Machines Disco and Xen (Lecture 10, cs262a) Ion Stoica & Ali Ghodsi UC Berkeley February 26, 2018

Virtual Machines Disco and Xen (Lecture 10, cs262a) Ion Stoica & Ali Ghodsi UC Berkeley February 26, 2018 Virtual Machines Disco and Xen (Lecture 10, cs262a) Ion Stoica & Ali Ghodsi UC Berkeley February 26, 2018 Today s Papers Disco: Running Commodity Operating Systems on Scalable Multiprocessors, Edouard

More information

Virtualization. ...or how adding another layer of abstraction is changing the world. CIS 399: Unix Skills University of Pennsylvania.

Virtualization. ...or how adding another layer of abstraction is changing the world. CIS 399: Unix Skills University of Pennsylvania. Virtualization...or how adding another layer of abstraction is changing the world. CIS 399: Unix Skills University of Pennsylvania April 6, 2009 (CIS 399 Unix) Virtualization April 6, 2009 1 / 22 What

More information

The complete license text can be found at

The complete license text can be found at SMP & Locking These slides are made distributed under the Creative Commons Attribution 3.0 License, unless otherwise noted on individual slides. You are free: to Share to copy, distribute and transmit

More information

Priority Inheritance Spin Locks for Multiprocessor Real-Time Systems

Priority Inheritance Spin Locks for Multiprocessor Real-Time Systems Priority Inheritance Spin Locks for Multiprocessor Real-Time Systems Cai-Dong Wang, Hiroaki Takada, and Ken Sakamura Department of Information Science, Faculty of Science, University of Tokyo 7-3-1 Hongo,

More information

Advanced Topic: Efficient Synchronization

Advanced Topic: Efficient Synchronization Advanced Topic: Efficient Synchronization Multi-Object Programs What happens when we try to synchronize across multiple objects in a large program? Each object with its own lock, condition variables Is

More information

CS370 Operating Systems

CS370 Operating Systems CS370 Operating Systems Colorado State University Yashwant K Malaiya Fall 2017 Lecture 27 Virtualization Slides based on Various sources 1 1 Virtualization Why we need virtualization? The concepts and

More information

Accelerating String Matching Algorithms on Multicore Processors Cheng-Hung Lin

Accelerating String Matching Algorithms on Multicore Processors Cheng-Hung Lin Accelerating String Matching Algorithms on Multicore Processors Cheng-Hung Lin Department of Electrical Engineering, National Taiwan Normal University, Taipei, Taiwan Abstract String matching is the most

More information

Xen. past, present and future. Stefano Stabellini

Xen. past, present and future. Stefano Stabellini Xen past, present and future Stefano Stabellini Xen architecture: PV domains Xen arch: driver domains Xen: advantages - small surface of attack - isolation - resilience - specialized algorithms (scheduler)

More information

Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( )

Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( ) Systems Group Department of Computer Science ETH Zürich Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Today Non-Uniform

More information

Fairlocks - A High Performance Fair Locking Scheme

Fairlocks - A High Performance Fair Locking Scheme Fairlocks - A High Performance Fair Locking Scheme Swaminathan Sivasubramanian, Iowa State University, swamis@iastate.edu John Stultz, IBM Corporation, jstultz@us.ibm.com Jack F. Vogel, IBM Corporation,

More information

CPSC/ECE 3220 Summer 2018 Exam 2 No Electronics.

CPSC/ECE 3220 Summer 2018 Exam 2 No Electronics. CPSC/ECE 3220 Summer 2018 Exam 2 No Electronics. Name: Write one of the words or terms from the following list into the blank appearing to the left of the appropriate definition. Note that there are more

More information

P2FS: supporting atomic writes for reliable file system design in PCM storage

P2FS: supporting atomic writes for reliable file system design in PCM storage LETTER IEICE Electronics Express, Vol.11, No.13, 1 6 P2FS: supporting atomic writes for reliable file system design in PCM storage Eunji Lee 1, Kern Koh 2, and Hyokyung Bahn 2a) 1 Department of Software,

More information

A Repository Framework for Self-Growing Robot Software

A Repository Framework for Self-Growing Robot Software A Repository Framework for Self-Growing Robot Software Hyung-Min Koo, In-Young Ko Information and Communications University (ICU) 119 Munjiro, Yuseong-gu, Daejeon, 305-732, Korea {hyungminkoo, iko}@icu.ac.kr

More information

Characterizing the Performance and Scalability of Many-core Applications on Virtualized Platforms

Characterizing the Performance and Scalability of Many-core Applications on Virtualized Platforms Characterizing the Performance and Scalability of Many-core Applications on Virtualized Platforms Xiang Song, Haibo Chen and Binyu Zang {xiangsong, hbchen, byzang}@fudan.edu.cn Parallel Processing Institute,

More information

A Mobile Device Classification Mechanism for Efficient Prevention of Wireless Intrusion

A Mobile Device Classification Mechanism for Efficient Prevention of Wireless Intrusion A obile Device Classification echanism for Efficient Prevention of Wireless Intrusion Hyeokchan Kwon 1, Sin-Hyo Kim 1, 1 Electronics and Telecommunications Research Institue, 218 Gajeong-ro, Yuseong-gu,

More information

COMPUTER ARCHITECTURE. Virtualization and Memory Hierarchy

COMPUTER ARCHITECTURE. Virtualization and Memory Hierarchy COMPUTER ARCHITECTURE Virtualization and Memory Hierarchy 2 Contents Virtual memory. Policies and strategies. Page tables. Virtual machines. Requirements of virtual machines and ISA support. Virtual machines:

More information

VIRTUALIZATION: IBM VM/370 AND XEN

VIRTUALIZATION: IBM VM/370 AND XEN 1 VIRTUALIZATION: IBM VM/370 AND XEN CS6410 Hakim Weatherspoon IBM VM/370 Robert Jay Creasy (1939-2005) Project leader of the first full virtualization hypervisor: IBM CP-40, a core component in the VM

More information

Today. SMP architecture. SMP architecture. Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( )

Today. SMP architecture. SMP architecture. Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( ) Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Systems Group Department of Computer Science ETH Zürich SMP architecture

More information
