Reducing CPU usage of a Toro Appliance

Matias E. Vara Larsen
matiasevara@gmail.com

Who am I?
- Electronic Engineer from Universidad Nacional de La Plata, Argentina
- PhD in Computer Science, Université Nice Sophia Antipolis, Nice, France
- Citrix, Cambridge
- Silicon-Gears, Barcelona (current job)
- In my free time, I develop the Toro kernel

What is Toro?
- It is a kernel for the Intel x86-64 architecture
- It is written in Free Pascal (yes, it is)
- It provides a simple API for user applications, i.e., it is application-oriented
- It is compiled together with the user application into a single image, i.e., a library-OS-like design
- It runs together with the user application in ring 0 and shares its memory space
- It can run either on bare metal or on top of a hypervisor/emulator, e.g., Hyper-V, KVM, QEMU, VirtualBox

What is Toro?
[Diagram: the user application uses the Toro kernel and is linked with it; compilation generates an image that runs on bare metal or on a hypervisor.]
- The same image can run in all hypervisors/emulators

Current state of Toro kernel
[Diagram: layered stack, top to bottom]
- User application
- TORO kernel
  - Scheduler: cooperative multithreading
  - Memory: flat (up to 512 GB)
  - Virtual filesystem: ext2
  - Network: TCP/IP stack
  - Device drivers: ne2000, e1000, Virtio, IDE disk
- Hypervisor, e.g., KVM, VirtualBox, VMware, QEMU, Hyper-V
- Hardware (x86-64)

What is this talk about?
- The CPU usage of a Toro guest is too high
- The CPU usage is high even when the guest is idle
- This is undesirable in production because the CPU is a shared resource
- High CPU usage can lead to increased ready time and processor queuing of the virtual machines on the host [1]
[1] VMware vSphere 5.1 Documentation Center, Solutions for Consistently High CPU Usage

What is this talk about?
[Screenshot: output of top for a QEMU guest that runs an HTTP server in Toro]

Understanding the Problem
- The main issue is idle loops, i.e., loops that keep checking a condition and consume a lot of CPU while doing so
- I identified several cases in which idle loops were used:
  - Spin-locks: to get mutual exclusion on a shared resource in a multicore system
  - Scheduler loop: when there is no thread in the ready state, i.e., 2 cases
  - System threads: threads that keep checking a condition when they are idle

Proposal
- Avoid idle loops, because they are a bad programming habit
- Relax the CPU during idle loops [1,2,3]:
  1) Spin-locks
  2) Scheduler
  3) Thread polling a variable
[1] Benefiting Power and Performance Sleep Loops
[2] Idle Thread Talk, G. Somlo
[3] Intel Manual

Partial Solution
1) Spin-locks: use the pause instruction.
Essentially, the pause instruction delays the next instruction's execution for a finite period of time. By delaying the execution of the next instruction, the processor is not under demand, and parts of the pipeline are no longer being used, which in turn reduces the power consumed by the processor.
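The spin-wait with pause can be sketched in Free Pascal as follows. This is a minimal user-space illustration, not Toro's actual spin-lock: the names CpuRelax, TryAcquire, Acquire and Release, and the constants LOCK_FREE and LOCK_BUSY, are hypothetical, and the inline pause is only emitted when compiling for x86.

```pascal
program SpinLockPause;
{$mode objfpc}

const
  LOCK_FREE = 0;  // hypothetical encoding of the lock word
  LOCK_BUSY = 1;

// Hint to the CPU that we are inside a spin-wait loop.
// On non-x86 targets this sketch simply does nothing.
procedure CpuRelax;
begin
  {$IF DEFINED(CPUX86_64) or DEFINED(CPUI386)}
  asm
    pause
  end;
  {$ENDIF}
end;

// Atomically take the lock if it is free; True means the caller now owns it.
function TryAcquire(var Flag: LongInt): Boolean;
begin
  Result := InterlockedCompareExchange(Flag, LOCK_BUSY, LOCK_FREE) = LOCK_FREE;
end;

// Spin until the lock is taken, relaxing the core on every failed attempt.
procedure Acquire(var Flag: LongInt);
begin
  while not TryAcquire(Flag) do
    CpuRelax;
end;

procedure Release(var Flag: LongInt);
begin
  InterlockedExchange(Flag, LOCK_FREE);
end;

var
  Flag: LongInt = LOCK_FREE;
begin
  Acquire(Flag);
  WriteLn('second attempt while held: ', TryAcquire(Flag));
  Release(Flag);
  WriteLn('attempt after release: ', TryAcquire(Flag));
end.
```

The only change relative to a plain busy-wait is the CpuRelax call inside the retry loop; the locking protocol itself stays the same.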

Partial Solution
2) Scheduler: use the halt (HLT) instruction when the scheduler is idle.
HLT stops instruction execution and places the processor in a HALT state. An enabled interrupt (including NMI and SMI), a debug exception, the BINIT# signal, the INIT# signal, or the RESET# signal will resume execution. If an interrupt (including NMI) is used to resume execution after a HLT instruction, the saved instruction pointer (CS:EIP) points to the instruction following the HLT instruction. HLT is a privileged instruction.
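A rough sketch of the "halt when nothing is ready" rule, in Free Pascal. All names here (RING0, HaltCore, PickReady, ScheduleOnce, TThreadState) are made up for illustration and are not Toro's scheduler. Because HLT is privileged, the real instruction is only compiled when the hypothetical RING0 symbol is defined; the sti before hlt is also an assumption, added so that an interrupt can wake the core. The default build replaces the halt with a counter so the sketch runs in user space.

```pascal
program HaltWhenIdle;
{$mode objfpc}

type
  TThreadState = (tsReady, tsSuspended);

var
  HaltCount: Integer = 0;

{$IFDEF RING0}
// Kernel build (hypothetical RING0 define): enable interrupts, then halt
// until the next one arrives. hlt faults when executed outside ring 0.
procedure HaltCore; assembler; nostackframe;
asm
  sti
  hlt
end;
{$ELSE}
// User-space stand-in: only count how often the core would have halted.
procedure HaltCore;
begin
  Inc(HaltCount);
end;
{$ENDIF}

// Index of the first ready thread, or -1 when no thread is ready.
function PickReady(const States: array of TThreadState): Integer;
var
  I: Integer;
begin
  Result := -1;
  for I := 0 to High(States) do
    if States[I] = tsReady then
      Exit(I);
end;

// One scheduler pass: return the thread to run, or halt the core if none.
function ScheduleOnce(const States: array of TThreadState): Integer;
begin
  Result := PickReady(States);
  if Result < 0 then
    HaltCore;
end;

var
  Mixed: array[0..1] of TThreadState = (tsSuspended, tsReady);
  AllSuspended: array[0..1] of TThreadState = (tsSuspended, tsSuspended);
  Chosen: Integer;
begin
  Chosen := ScheduleOnce(Mixed);
  WriteLn('chosen: ', Chosen, ', halts: ', HaltCount);
  Chosen := ScheduleOnce(AllSuspended);
  WriteLn('chosen: ', Chosen, ', halts: ', HaltCount);
end.
```

The point of the sketch is that the scheduler no longer spins while looking for a ready thread: when the search fails, the core sleeps until an interrupt makes the search worth repeating.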

Partial Solution
3) A system thread that polls a variable:
- This is a hard case because the scheduler has to figure out when a thread is doing idle work
- Proposal: provide an API that lets the thread tell the scheduler when it is doing idle work

Partial Solution
- SysThreadSwitch(IdleFlag: Boolean) tells the scheduler when it can schedule a new thread
- When IdleFlag = True, the scheduler knows that the thread is doing idle work, e.g., polling a variable. The scheduler then does the following:
  1. Checks how long the thread has been idle
  2. If this is longer than some constant, sets the thread's state to idle
  3. Checks whether all threads in the system are idle and no thread is in the ready state; in that case, the scheduler halts the core
- SysThreadActive() tells the scheduler that the thread has work to do: the scheduler stops counting the idle time and the thread's state becomes ts_ready
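The three scheduler steps can be simulated in plain Free Pascal. This is only a sketch under stated assumptions, not the Toro implementation: ThreadSwitch and ThreadActive stand in for SysThreadSwitch and SysThreadActive, the clock is a simulated tick counter, IDLE_THRESHOLD is an arbitrary value for the unnamed constant, and HaltCore counts halts instead of executing hlt.

```pascal
program IdleAccounting;
{$mode objfpc}

const
  IDLE_THRESHOLD = 3;  // hypothetical: idle ticks before a thread counts as idle

type
  TThreadState = (tsReady, tsIdle);
  TThread = record
    State: TThreadState;
    IdleSince: Integer;  // tick at which idle polling started, -1 if not idle
  end;

var
  Threads: array[0..1] of TThread;
  Tick: Integer = 0;     // simulated clock
  Halts: Integer = 0;

// Stand-in for halting the core (hlt is privileged).
procedure HaltCore;
begin
  Inc(Halts);
end;

function AllIdle: Boolean;
var
  I: Integer;
begin
  Result := True;
  for I := Low(Threads) to High(Threads) do
    if Threads[I].State <> tsIdle then
      Exit(False);
end;

// The thread yields; IdleFlag = True means it was only polling.
procedure ThreadSwitch(var T: TThread; IdleFlag: Boolean);
begin
  if not IdleFlag then
    Exit;
  if T.IdleSince < 0 then
    T.IdleSince := Tick;                       // start counting idle time
  if Tick - T.IdleSince > IDLE_THRESHOLD then  // steps 1 and 2
    T.State := tsIdle;
  if AllIdle then                              // step 3
    HaltCore;
end;

// The thread has work again: stop counting and mark it ready.
procedure ThreadActive(var T: TThread);
begin
  T.IdleSince := -1;
  T.State := tsReady;
end;

begin
  ThreadActive(Threads[0]);
  ThreadActive(Threads[1]);
  // Both threads poll without finding work for five ticks.
  while Tick <= 4 do
  begin
    ThreadSwitch(Threads[0], True);
    ThreadSwitch(Threads[1], True);
    Inc(Tick);
  end;
  WriteLn('halts while polling: ', Halts);
  ThreadActive(Threads[0]);  // e.g., a packet arrived
  WriteLn('all idle after wake-up: ', AllIdle);
end.
```

In this simulation the core is halted only once every thread has polled for longer than the threshold, and a single ThreadActive call is enough to make the system count as busy again.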

Example: ProcessNetworkPackets()
    ...
    while True do
    begin
      Packet := SysNetworkRead;
      if Packet = nil then
      begin
        // executes when idle: the thread tells the scheduler that it is idle
        SysThreadSwitch(True);
        Continue;
      end else
      begin
        // executes when not idle: the thread tells the scheduler that it is not idle
        SysThreadActive;
      end;
      EthPacket := Packet.Data;
      ...

Experimentation
- Run a web server in Toro as a QEMU guest
- 2 cores, but only one used
- 512 MB, ~256 MB per core
- Generate N HTTP requests and then stop; repeat this every X time
- Measure the CPU usage of the QEMU process on the host with top
- Run the experiment and compare the following cases:
  1. Toro with and without the improvements
  2. Toro and Apache, each running as a QEMU guest

[Plots: Toro guest without improvement versus Toro guest with improvement]
Thanks to Cesar Bernardini! mesarpe@gmail.com

[Plots: Toro guest (2 cores, 512 MB) versus Linux Ubuntu guest (2 cores, 512 MB)]
Thanks to Cesar Bernardini! mesarpe@gmail.com

Take-away lessons
- CPU usage of VMs becomes very important in production because the CPU is a shared resource
- The presented solution reduced CPU usage by half; however, a complete power management solution must also scale the CPU, i.e., use processor P-states
- Some solutions may depend on the hypervisor and its ability to emulate some instructions, e.g., monitor/mwait

Questions?

Thanks! www.torokernel.io matiasevara@gmail.com