Virtual Machines: Disco and Xen (Lecture 10, cs262a). Ion Stoica & Ali Ghodsi, UC Berkeley, February 26, 2018


Today's Papers
Disco: Running Commodity Operating Systems on Scalable Multiprocessors, Edouard Bugnion, Scott Devine, and Mendel Rosenblum, SOSP '97 (https://amplab.github.io/cs262a-fall2016/notes/disco.pdf)
Xen and the Art of Virtualization, P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, SOSP '03 (www.cl.cam.ac.uk/research/srg/netos/papers/2003-xensosp.pdf)

What is a Virtual Machine? Virtual Machine: a fully protected and isolated copy of the underlying physical machine's hardware (definition by IBM). Virtual Machine Monitor (aka hypervisor): a thin layer of software between the hardware and the operating system, virtualizing and managing all hardware resources.

VM Classification: Type 1 vs Type 2. Type 1 (bare-metal): VMM implemented directly on the physical hardware; the VMM performs scheduling and allocation of the system's resources. E.g., IBM VM/370, Disco, VMware's ESX Server, Xen. (Figure source: http://www.ibm.com/developerworks/library/mw-1608-dejesus-bluemix-trs/index.html)

VM Classification: Type 1 vs Type 2. Type 2 (hosted): VMM built completely on top of a host OS; the host OS provides resource allocation and a standard execution environment to each guest OS. E.g., User-mode Linux (UML), UMLinux.


Virtual Machines: An Old Idea. CP/CMS (Control Program/Cambridge Monitor System), 1968: time sharing that provided each user with a single-user OS, allowing users to concurrently share a computer. IBM System/370, 1972: ran multiple OSes. Hot research topic in the '60s and '70s; entire conferences were devoted to VMs.

Why Virtual Machines? Multiplex an expensive machine. Ability to run multiple OSes targeted for different workloads (IBM System/360). Ability to run new OS versions side by side with a stable older version: no problem if the new OS breaks compatibility for existing apps; just run them in a VM on the old OS.

What Happened? Interest died out in the '80s (why?). More powerful, cheaper machines: a new OS could be deployed on a different machine. More powerful OSes (e.g., Unix running on workstations, such as Sun workstations in the late '80s): no need for VMs to provide multi-user support.

New Motivation (1990s). Multiprocessors in the market; innovative hardware. Hardware development was faster than system software: customized OSes were late, incompatible, and possibly buggy. Commodity OSes were not suited for multiprocessors: they did not scale, due to lock contention and the memory architecture, and they did not isolate/contain faults (more processors, more failures).

Two Approaches. (1) Modify the OS to handle multiprocessor machines (Hive, Hurricane, Cellular IRIX): innovative, single system image, but a large effort that can take a long time. (2) Hard-partition the machine into independent failure units (OS-light), e.g., the Sun Enterprise 10000: partial single system image, but cannot dynamically adapt the partitioning.

Disco's Solution. A virtual machine monitor between the hardware and multiple copies of a commodity OS. Virtualization used to mean making a single resource appear as multiple resources; Disco makes multiple resources appear like a single resource. Trades off performance against development cost.

Disco. Extend a modern OS to run efficiently on shared-memory multiprocessors with minimal OS changes. A VMM built to run multiple copies of the Silicon Graphics IRIX operating system on the Stanford FLASH multiprocessor. IRIX: a Unix-based OS. Stanford FLASH: a cache-coherent NUMA multiprocessor.

Disco Architecture

Challenges Facing Virtual Machines. Overhead: trapping and emulating privileged instructions of the guest OS; access to I/O devices; replication of memory in each VM. Resource management: lack of information to make good policy decisions. Communication and sharing: hard to communicate between standalone VMs.

Disco's Interface. Processors: emulates the MIPS R10000 processor (all instructions, the MMU, and the trap architecture), with extensions to support common processor operations such as enabling/disabling interrupts and accessing privileged registers. Physical memory: contiguous, starting at address 0. I/O devices: physical devices are multiplexed by Disco, with special abstractions for SCSI disks and network interfaces: virtual disks for VMs and a virtual subnet across all virtual machines.

Disco Implementation. A multi-threaded shared-memory program, with attention to NUMA memory placement, cache-aware data structures, and IPC patterns. Disco's code is copied to each FLASH processor; the copies communicate using shared memory.

Virtual CPUs. Direct execution on the real CPU: set the real CPU's registers to those of the virtual CPU and jump to the virtual CPU's current PC. Disco maintains a data structure for each virtual CPU for trap emulation (trap examples: page faults, system calls, bus errors). A scheduler multiplexes virtual CPUs on the real processors.
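The trap-and-emulate path above can be sketched as follows. This is an illustrative model, not Disco's actual code: the class and trap names are invented, and only the trap-handling path is modeled (direct execution is elided).

```python
# Hypothetical sketch of Disco-style trap emulation. Each virtual CPU
# carries a shadow register file that the monitor saves/restores, plus
# emulation of privileged operations on behalf of the guest OS.

class VirtualCPU:
    def __init__(self):
        # Shadow of the real CPU state for this vCPU (program counter,
        # status register, privileged state like interrupt masks).
        self.regs = {"pc": 0, "sr": 0}
        self.privileged = {"interrupts_enabled": False}

    def handle_trap(self, trap, monitor):
        # Disco intercepts traps (page faults, system calls, privileged
        # instructions) and emulates their effect on the vCPU state.
        if trap == "enable_interrupts":
            # Emulate the privileged instruction instead of executing it.
            self.privileged["interrupts_enabled"] = True
        elif trap == "page_fault":
            # The monitor services the fault, then resumes the guest.
            monitor.faults += 1
        else:
            raise ValueError(f"unhandled trap: {trap}")

class Monitor:
    def __init__(self, n_vcpus):
        self.vcpus = [VirtualCPU() for _ in range(n_vcpus)]
        self.faults = 0

    def run(self, vcpu_id, trap):
        # Direct execution proceeds until a trap; only the trap path
        # is modeled here.
        self.vcpus[vcpu_id].handle_trap(trap, self)

m = Monitor(2)
m.run(0, "enable_interrupts")
m.run(1, "page_fault")
```

The key point the sketch captures: privileged state lives in a per-vCPU structure owned by the monitor, so a guest's "privileged" instruction never touches the real hardware directly.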

Virtual Physical Memory. Extra level of indirection: physical-to-machine address mappings. Each VM sees contiguous physical addresses starting from 0; Disco maps these physical addresses to the 40-bit machine addresses used by FLASH. When the OS inserts a virtual-to-physical mapping into the TLB, Disco translates the physical address into the corresponding machine address.
[Figure: in a traditional OS, the TLB maps virtual addresses to physical addresses; under Disco, the guest's virtual-to-physical mapping is combined with the pmap's physical-to-machine mapping, so the real TLB maps virtual addresses directly to machine addresses.]

Virtual Physical Memory (cont'd). To quickly compute the corrected TLB entry, Disco keeps a per-VM pmap data structure with one entry per VM physical page. Each TLB entry is tagged with an address space identifier (ASID) to avoid flushing the TLB on MMU context switches; the TLB is still flushed when scheduling a different virtual CPU on a physical processor. A software TLB caches recent virtual-to-machine translations.
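The two-level translation can be made concrete with a small model. The pmap name comes from the paper; the dict-based layout, class names, and page numbers below are illustrative assumptions.

```python
# Illustrative model of Disco's extra level of address translation.
# The guest OS installs virtual->physical TLB entries; Disco rewrites
# the physical page to the machine page (via the per-VM pmap) before
# loading the real TLB.

PAGE = 4096

class DiscoVM:
    def __init__(self, pmap):
        self.pmap = pmap      # per-VM: guest physical page -> machine page
        self.real_tlb = {}    # (asid, virtual page) -> machine page

    def guest_tlb_insert(self, asid, vpage, ppage):
        # Trap on the guest's TLB write: translate physical -> machine,
        # then install the corrected entry tagged with the guest's ASID.
        mpage = self.pmap[ppage]
        self.real_tlb[(asid, vpage)] = mpage

    def translate(self, asid, vaddr):
        # What the hardware does on a hit in the real TLB.
        mpage = self.real_tlb[(asid, vaddr // PAGE)]
        return mpage * PAGE + vaddr % PAGE

# The guest believes its physical memory is contiguous from 0; the
# machine pages backing it need not be contiguous at all.
vm = DiscoVM(pmap={0: 17, 1: 42})
vm.guest_tlb_insert(asid=3, vpage=8, ppage=1)
```

After the insert, a guest access to virtual page 8 resolves directly to machine page 42 with no per-access cost for the extra indirection, which is the point of rewriting entries at TLB-insert time.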

NUMA Memory Management: dynamic page migration and replication. Pages frequently accessed by one node are migrated; read-shared pages are replicated among all nodes; write-shared pages are not moved, since maintaining consistency would require remote accesses anyway. The migration and replication policy is driven by the cache-miss-counting facility provided by the FLASH hardware.
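The migrate-vs-replicate decision above can be sketched as a simple policy function. The threshold, function name, and data layout are invented for illustration; in Disco the miss counts come from FLASH's hardware counting facility.

```python
# Hedged sketch of a miss-count-driven placement policy for one page.

def place_page(miss_counts, writers, threshold=64):
    """miss_counts: node -> remote-miss count for this page.
    writers: set of nodes that write the page.
    Returns ("migrate", node), ("replicate", nodes), or ("leave", None)."""
    hot = [n for n, c in miss_counts.items() if c >= threshold]
    if not hot:
        return ("leave", None)          # not worth touching
    if writers:
        # Write-shared pages stay put: keeping replicas coherent would
        # itself generate remote traffic.
        if len(hot) == 1 and len(writers) == 1 and hot[0] in writers:
            return ("migrate", hot[0])  # single writer/accessor: safe to move
        return ("leave", None)
    if len(hot) == 1:
        return ("migrate", hot[0])      # one frequent accessor: move page to it
    return ("replicate", sorted(hot))   # read-shared: copy to each hot node
```

A usage example: a page with heavy read misses from two nodes and no writers gets replicated; the same access pattern with writers is left alone.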

Transparent Page Replication. 1. Two different virtual processors of the same virtual machine logically read-share the same physical page, but each virtual processor accesses a local copy. 2. A memmap maintains an entry for each machine page that records which virtual pages reference it; this is used during TLB shootdown.* (*Processors flush their TLBs when another processor restricts access to a shared page.)

Disco Memory Management

Transparent Page Sharing Global buffer cache that can be transparently shared between virtual machines

Transparent Page Sharing over NFS. 1. The monitor's networking device remaps the data page from the source's machine address to the destination's. 2. The monitor remaps the data page from the driver's buffer cache to the client's buffer cache.

Impact. Revived VMs for the next 20 years; VMs are now commodity: every cloud provider, and virtually every enterprise, uses VMs today. VMware was the successful commercialization of this work: founded by the authors of Disco in 1998, with $6B in revenue today. It initially targeted developers; the killer application turned out to be workload consolidation and infrastructure management in enterprises.

Summary. The Disco VMM hides NUMA-ness from a non-NUMA-aware OS. The Disco VMM is low effort: only 13K LoC. Moderate overhead due to virtualization: only 16% overhead for uniprocessor workloads, and a system with eight virtual machines can run some workloads 40% faster.

Xen

Xen Motivation. Performance: goal of running 100 VMs simultaneously. Functionality.

Virtualization vs. Paravirtualization. Full virtualization: expose hardware as being functionally identical to the physical hardware (e.g., Disco, VMware). The guest cannot access the hardware directly, some privileged instructions are challenging to emulate (e.g., traps), and it is hard to provide real-time guarantees. Paravirtualization: expose a similar architecture, not 100% identical to the physical hardware (e.g., Xen). Better performance, but requires modifications to the OS; still no modifications to applications.

The Cost of Porting an OS to Xen. Privileged instructions, page table access, the network driver, and the block device driver: <2% of the code base.

Xen Terminology Guest OS: OS hosted by VM Domain: VM within which a guest operating system executes Guest OS and domain analogous to program and process Hypervisor: VMM

Xen's Architecture. Domain0 hosts application-level management software, e.g., for the creation/deletion of virtual network interfaces and block devices.

CPU. x86 supports 4 privilege levels (rings): 0 for the OS and 3 for applications. Xen runs in ring 0 and downgrades the guest OS to ring 1. System-call and page-fault handlers are registered with Xen; fast handlers for most other exceptions don't involve Xen.

CPU Scheduling: Borrowed Virtual Time (BVT) scheduling. A fair-sharing scheduler providing effective isolation between domains; however, it allows temporary violations of fair sharing to favor recently-woken domains. Reduced wake-up latency improves interactivity.
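The "temporary violation" idea can be sketched in a few lines. This is a minimal model of BVT, not Xen's scheduler code: field names and the time-slice bookkeeping are illustrative, and real BVT has more machinery (context-switch allowance, warp limits).

```python
# Minimal sketch of Borrowed Virtual Time scheduling. Each domain
# accumulates virtual time as it runs; the scheduler always picks the
# smallest *effective* virtual time, which is the actual virtual time
# minus a temporary "warp" granted to recently-woken domains.

class Domain:
    def __init__(self, name, weight=1, warp=0):
        self.name = name
        self.vtime = 0.0        # accumulated virtual time
        self.weight = weight    # larger weight -> vtime advances slower
        self.warp = 0           # currently borrowed virtual time
        self.max_warp = warp    # warp allowance granted on wake-up

    def effective_vtime(self):
        return self.vtime - self.warp

def pick_next(domains):
    # Fair sharing, except warped domains temporarily jump the queue.
    return min(domains, key=lambda d: d.effective_vtime())

def run(dom, slice_ms):
    dom.vtime += slice_ms / dom.weight
    dom.warp = 0                # borrowing is temporary

def wake(dom):
    dom.warp = dom.max_warp     # borrow virtual time to run soon

a = Domain("batch", weight=1)
b = Domain("interactive", weight=1, warp=100)
run(a, 10); run(b, 10)          # both at virtual time 10
wake(b)                         # b's effective vtime drops to -90
```

Because b only *borrows* virtual time, its actual vtime is unchanged; over longer horizons each domain still receives its weighted fair share.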

Times and Timers. Times: real time since machine boot (always advances, regardless of the executing domain); virtual time (only advances within the context of the domain); and wall-clock time. Each guest OS can program timers for both real time and virtual time.

Control Transfer: Hypercalls and Events. Hypercalls: synchronous calls from a domain to Xen, allowing domains to perform privileged operations via software traps (similar to system calls). Events: asynchronous notifications from Xen to domains, replacing device interrupts.

Exceptions Memory faults and software traps Virtualized through Xen s event handler

Memory. Physical memory: reserved at domain creation time and statically partitioned among domains. Virtual memory: the OS allocates a page table from its own reservation and registers it with Xen, giving up all direct write privileges on the page table; all subsequent updates must be validated by Xen. Guest OSes typically batch updates to amortize the cost of hypercalls.
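The batching point can be made concrete. Xen's real interface is the mmu_update hypercall, which takes a list of updates; the queue/flush structure, class names, and validation rule below are a sketch of the idea, not Xen's code.

```python
# Sketch of why guests batch page-table updates under paravirtualization:
# each hypercall has a fixed trap cost, so N validated updates submitted
# in one call amortize that cost.

class Hypervisor:
    def __init__(self):
        self.page_table = {}
        self.hypercalls = 0

    def mmu_update(self, updates):
        # One trap into the hypervisor validates the whole batch.
        self.hypercalls += 1
        for vpage, mpage in updates:
            if mpage < 0:                       # stand-in validation check
                raise ValueError("invalid machine frame")
            self.page_table[vpage] = mpage

class GuestOS:
    def __init__(self, xen, batch_size=4):
        self.xen, self.queue, self.batch_size = xen, [], batch_size

    def set_pte(self, vpage, mpage):
        # The guest may not write its page table directly; it queues
        # the update and flushes the batch with a single hypercall.
        self.queue.append((vpage, mpage))
        if len(self.queue) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.queue:
            self.xen.mmu_update(self.queue)
            self.queue = []

xen = Hypervisor()
guest = GuestOS(xen)
for v in range(8):
    guest.set_pte(v, v + 100)   # 8 updates cost only 2 hypercalls
```

The same structure explains the trade-off: batching lowers overhead but delays when an update becomes visible, so a guest must flush before any access that depends on the new mapping.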

Data Transfer: I/O Rings. Zero-copy semantics: data is transferred to/from domains via Xen through a circular ring of buffer descriptors.
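The ring structure can be modeled in miniature. Field names below are illustrative (loosely following the req_prod/rsp_prod convention of Xen's shared rings); "zero-copy" is modeled by passing buffer references rather than data through the ring.

```python
# Illustrative model of an I/O descriptor ring shared between a domain
# and Xen. Requests and responses share one circular buffer; each side
# advances its own producer/consumer indices, and a serviced request's
# slot is reused for its response.

class IORing:
    def __init__(self, size=8):
        assert size & (size - 1) == 0       # power of two for cheap masking
        self.ring = [None] * size
        self.mask = size - 1
        self.req_prod = self.req_cons = 0   # requests: domain -> Xen
        self.rsp_prod = self.rsp_cons = 0   # responses: Xen -> domain

    # Guest side: publish a request descriptor pointing at a buffer.
    def put_request(self, buf_ref):
        assert self.req_prod - self.rsp_cons < len(self.ring), "ring full"
        self.ring[self.req_prod & self.mask] = ("req", buf_ref)
        self.req_prod += 1

    # Xen side: consume a request and overwrite its slot with a response.
    def service_one(self):
        assert self.req_cons < self.req_prod, "no pending requests"
        _, buf_ref = self.ring[self.req_cons & self.mask]
        self.ring[self.req_cons & self.mask] = ("rsp", buf_ref)
        self.req_cons += 1
        self.rsp_prod += 1

    # Guest side: collect a completed response.
    def get_response(self):
        assert self.rsp_cons < self.rsp_prod, "no responses"
        kind, buf_ref = self.ring[self.rsp_cons & self.mask]
        self.rsp_cons += 1
        return buf_ref

ring = IORing()
ring.put_request("page@0x4000")   # descriptor references a data page
ring.service_one()
```

Only the small descriptor travels through the ring; the data page itself stays in place and is remapped or DMA'd, which is what makes the path zero-copy.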

History and Impact. Released in 2003 (University of Cambridge). The authors founded XenSource, acquired by Citrix for $500M in 2007. Widely used today: dom0 support is in Linux's mainline kernel, and AWS EC2 is based on Xen.

Discussion: OSes, VMs, Containers
[Figure: side-by-side software stacks on hardware comparing four structures: microkernels (apps with their bins/libs atop OS services and a microkernel, e.g., Mach); extensible OSes (apps atop kernel extensions, e.g., SPIN, or apps with library OSes atop Exokernel); VMs (apps and bins/libs inside guest OSes atop a hypervisor); and containers (apps with their own bins/libs sharing a host OS).]