Virtualization and Virtual Machines. CS522 Principles of Computer Systems Dr. Edouard Bugnion

Similar documents
Virtualization. Pradipta De

Virtual Machines. To do. q VM over time q Implementation methods q Hardware features supporting VM q Next time: Midterm?

Virtualization. Starting Point: A Physical Machine. What is a Virtual Machine? Virtualization Properties. Types of Virtualization

Virtualization. ! Physical Hardware Processors, memory, chipset, I/O devices, etc. Resources often grossly underutilized

Introduction to Virtual Machines. Carl Waldspurger (SB SM 89 PhD 95) VMware R&D

COMPUTER ARCHITECTURE. Virtualization and Memory Hierarchy

Virtual Machines Disco and Xen (Lecture 10, cs262a) Ion Stoica & Ali Ghodsi UC Berkeley February 26, 2018

Virtualization. Operating Systems, 2016, Meni Adler, Danny Hendler & Amnon Meisels

Disco: Running Commodity Operating Systems on Scalable Multiprocessors

Virtualization. Dr. Yingwu Zhu

Virtualization and memory hierarchy

The Architecture of Virtual Machines Lecture for the Embedded Systems Course CSD, University of Crete (April 29, 2014)

The Challenges of X86 Hardware Virtualization. GCC- Virtualization: Rajeev Wankar 36

DISCO and Virtualization

CSE 120 Principles of Operating Systems

OS Virtualization. Why Virtualize? Introduction. Virtualization Basics 12/10/2012. Motivation. Types of Virtualization.

Virtual Machines. Part 2: starting 19 years ago. Operating Systems In Depth IX 1 Copyright 2018 Thomas W. Doeppner. All rights reserved.

Virtualization. Adam Belay

Overview of System Virtualization: The most powerful platform for program analysis and system security. Zhiqiang Lin

G Disco. Robert Grimm New York University

references Virtualization services Topics Virtualization

Virtual Machine Monitors (VMMs) are a hot topic in

Spring 2017 :: CSE 506. Introduction to. Virtual Machines. Nima Honarmand

Background. IBM sold expensive mainframes to large organizations. Monitor sits between one or more OSes and HW

Knut Omang Ifi/Oracle 6 Nov, 2017

System Virtual Machines

Nested Virtualization and Server Consolidation

Advanced Operating Systems (CS 202) Virtualization

VIRTUALIZATION: IBM VM/370 AND XEN

Chapter 5 C. Virtual machines

System Virtual Machines

Module 1: Virtualization. Types of Interfaces

COS 318: Operating Systems. Virtual Machine Monitors

Distributed Systems COMP 212. Lecture 18 Othon Michail

Micro VMMs and Nested Virtualization

Cloud Computing Virtualization

Linux and Xen. Andrea Sarro. andrea.sarro(at)quadrics.it. Linux Kernel Hacking Free Course IV Edition

Virtualization History and Future Trends

Virtual Machines. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Virtualization, Xen and Denali

CS 350 Winter 2011 Current Topics: Virtual Machines + Solid State Drives

CprE Virtualization. Dr. Yong Guan. Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University

W4118: virtual machines

CS5460: Operating Systems. Lecture: Virtualization. Anton Burtsev March, 2013

CS370 Operating Systems

What is KVM? KVM patch. Modern hypervisors must do many things that are already done by OSs Scheduler, Memory management, I/O stacks

CSCE 410/611: Virtualization!

CS370: Operating Systems [Spring 2017] Dept. Of Computer Science, Colorado State University

CS533 Concepts of Operating Systems. Jonathan Walpole

Server Virtualization Approaches

COLORADO, USA; 2 Usov Aleksey Yevgenyevich - Technical Architect, RUSSIAN GOVT INSURANCE, MOSCOW; 3 Kropachev Artemii Vasilyevich Manager,

CSCE 410/611: Virtualization

Virtual Machines. 2 Disco: Running Commodity Operating Systems on Scalable Multiprocessors([1])

Intel s Virtualization Extensions (VT-x) So you want to build a hypervisor?


CS-580K/480K Advanced Topics in Cloud Computing. VM Virtualization II

Virtual Machine Systems

Learning Outcomes. Extended OS. Observations Operating systems provide well defined interfaces. Virtual Machines. Interface Levels

CS 550 Operating Systems Spring Introduction to Virtual Machines

Introduction to Virtualization

Operating Systems 4/27/2015

24-vm.txt Mon Nov 21 22:13: Notes on Virtual Machines , Fall 2011 Carnegie Mellon University Randal E. Bryant.

Xen and the Art of Virtualization. CSE-291 (Cloud Computing) Fall 2016

Disco. CS380L: Mike Dahlin. September 13, This week: Disco and Exokernel. One lesson: If at first you don t succeed, try try again.

CS 5600 Computer Systems. Lecture 11: Virtual Machine Monitors

Virtual Machine Monitors!

Lecture 4: Extensibility (and finishing virtual machines) CSC 469H1F Fall 2006 Angela Demke Brown

Virtual Virtual Memory

Multiprocessor Scheduling. Multiprocessor Scheduling

Today s Papers. Virtual Machines Background. Why Virtualize? EECS 262a Advanced Topics in Computer Systems Lecture 19

Virtual Machine Security

CHAPTER 16 - VIRTUAL MACHINES

Lecture 7. Xen and the Art of Virtualization. Paul Braham, Boris Dragovic, Keir Fraser et al. 16 November, Advanced Operating Systems

Performance & Scalability Testing in Virtual Environment Hemant Gaidhani, Senior Technical Marketing Manager, VMware

Concepts. Virtualization

Fast access ===> use map to find object. HW == SW ===> map is in HW or SW or combo. Extend range ===> longer, hierarchical names

CSC 5930/9010 Cloud S & P: Virtualization

Πποχωπημένη Κατανεμημένη Υπολογιστική

Modern systems: multicore issues

COSC6376 Cloud Computing Lecture 14: CPU and I/O Virtualization

Xen and the Art of Virtualization. Nikola Gvozdiev Georgian Mihaila

Lecture 5: February 3

The Future of Virtualization

CLOUD COMPUTING IT0530. G.JEYA BHARATHI Asst.Prof.(O.G) Department of IT SRM University

Virtualization. Part 1 Concepts & XEN

Introduction to Cloud Computing and Virtualization. Mayank Mishra Sujesha Sudevalayam PhD Students CSE, IIT Bombay

I/O virtualization. Jiang, Yunhong Yang, Xiaowei Software and Service Group 2009 虚拟化技术全国高校师资研讨班

CS5460: Operating Systems

Virtualisation: The KVM Way. Amit Shah

I/O and virtualization

CSE543 - Computer and Network Security Module: Virtualization

A Survey on Virtualization Technologies

CSCI 8530 Advanced Operating Systems. Part 19 Virtualization

Mach External pager. IPC Costs. Why the difference? Example of IPC Performance. First generation microkernels were slow Mach, Chorus, Amoeba

EE 660: Computer Architecture Cloud Architecture: Virtualization

0x1A Great Papers in Computer Security

Nested Virtualization Update From Intel. Xiantao Zhang, Eddie Dong Intel Corporation

CIS Operating Systems CPU Mode. Professor Qiang Zeng Spring 2018

CSE Memory Hierarchy Design Ch. 5 (Hennessy and Patterson)

CHAPTER 16 - VIRTUAL MACHINES

Transcription:

Virtualization and Virtual Machines CS522 Principles of Computer Systems Dr. Edouard Bugnion

Virtualization and Virtual Machines 2

This week Introduction, definitions, A short history of virtualization On VMM construction Case study Resource management in virtualized systems Case study VMM construction for non-virtualizeable architectures 3

Virtualization Virtualization application of fundamental principles of computer systems Abstraction & Modularity Layering & Indirection Interpretation Motivation reduce complexity in computer systems Enable automation Virtual machines are only one example of virtualization 4

Definition Layer that exports the same abstraction as the layer that it relies upon 5

Definition Layer that exports the same Virtual X abstraction as the layer that it relies upon Hides the physical names from the abstraction (isolation) Virtualization enforces modularity Relies on combination of: X (physical) multiplexing aggregation emulation 6

Virtualization within an OS [Salzer/Kasshoek ch 5.1] Threads Virtual X Virtual CPUs Virtual memory Most significant innovation in computing since von Neumann architecture Sockets, pipes Virtual links,... Virtual disks X (physical) Volumes, RAID,... 7

3 Virtualization Mechanisms Multiplexing Expose one resource multiple times Isolation through indirection Often with hardware support E.g. Virtual memory. 8

3 Virtualization Mechanisms Multiplexing Expose one resource multiple times Isolation through indirection Often with hardware support E.g. Virtual memory. Aggregation Expose multiple resources as a single resource Often with enhanced capabilities (availability, performance) E.g. RAID, link aggregation, 9

3 Virtualization Mechanisms Multiplexing Expose one resource multiple times Isolation through indirection Often with hardware support E.g. Virtual memory. Aggregation Expose multiple resources as a single resource Often with enhanced capabilities (availability, performance) E.g. RAID, link aggregation, Emulation Use software to emulate a virtual resource different than the physical resource E.g. RAM disk, virtual tape, byte code, 10

Architectural support for virtualization When the resource is architected (i.e. designed) to enable virtualization in a straightforward manner without having to rely on generalized emulation (big hammer) Attribute of the underlying hardware architecture Applies to various elements of computer system processors (Intel VT-x, AMD-v, ARM v7) memory management unit (e.g. TLB, EPT) i/o devices,... 11

Closing thought Two forms of layering Create a new abstraction Export the same abstraction (=virtualization) Many forms of virtualization: of a computer system, a disk, a database, a broadcast domain, an application server,. When should you virtualize (rather than create a new abstraction)? What should you virtualize? 12

Next: Virtualization and Virtual Machines 13

delete 14

Virtualization by multiplexing (1/3) Expose a resource among multiple virtual entities E.g., CPU, memory, machine, link Isolation generally ensured through indirection (which hides physical names) Examples: OS abstractions of thread (virtual CPU), address space (virtual memory), pipes (virtual links),... Storage abstractions of volumes / LUNs Virtual Machine Monitor: abstraction of virtual machine (virtual computer) 15

Virtualization by aggregation (2/3) Aggregate multiple physical resources into a single virtual resource Typically with enhanced availability and scalability properties Examples: Storage (RAID 0, RAID 1, RAID 1+0, RAID 5) Networking : Link aggregation, L4-L7 load balancing, proxies,... Applications (active/passive clusters) 16

Virtualization by emulation (3/3) Use software to emulate a virtual resource different than the underlying physical resource Example: RAM disk, virtual tapes,... binary translation, byte-code runtimes,... virtual devices 17

Virtual Machines a historical perspective CS522 Principles of Computer Systems Dr. Edouard Bugnion

Virtualization and Virtual Machines 2

Popek/Goldberg [1974] A virtual machine is an efficient, isolated duplicate of the real machine The Virtual Machine Monitor is in complete control of the system uses the hardware so that VMM programs show at worst minor decreases in speed 3

Early Virtual Machines Early mainframe era Few platforms! allowed sharing of resources Limited OS functionality! enabled rapid innovation Hardware support for virtualization Solid set of principles and requirements [Popek/Goldberg 1974] 4

Early Virtual Machines Early mainframe era Few platforms! allowed sharing of resources Limited OS functionality! enabled rapid innovation Hardware support for virtualization Solid set of principles and requirements [Popek/Goldberg 1974] Rejected in favor of operating systems Innovation shifted to deliver new capabilities within OS (e.g., IBM MVS) Loss of architectural support for virtualization By mid-1990 s: virtual machines were largely viewed as a relic of early computing era 5

The Return of Virtual Machines Why consider bringing them back? When the OS fails Missing functionality Poorly adapted to hardware Too complex and slow to innovate Because the OS is special There can only be one (!) Near monopolies 6

The Return of Virtual Machines Why consider bringing them back? When the OS fails Missing functionality Poorly adapted to hardware Too complex and slow to innovate Because the OS is special There can only be one (!) App OS HW App Near monopolies Innovate below the OS 7

Next: On VMM construction 8

Virtualization VMM construction CS522-Principles of Computer Systems Dr. Edouard Bugnion

VMM multiplex and emulate Multiplex CPU and memory Each VM has its own virtual CPU/core P=core; V=core Each VM has its own virtual physical memory P=physical memory; V=physical memory Emulate Sensitive instructions executed by the VM ( trap-and-emulate ) I/O devices: virtual disk, virtual keyboard, virtual screen, virtual NIC, 2

Review OS/App layering Key observation application executes (unprivileged) instructions directly on the CPU Applica+on$layer$(user4mode)$$ API (e.g. POSIX+x86)! $ OS$layer$(kernel4mode)$ $ HW/SW interface (i.e. ISA)! CPU$ Fig: see Saltzer/Kaashoek 2.16. 3

Adding a layer of indirection Virtual CPU == virtualize the HW/SW interface (in particular ISA) Standard virtualization trick expose the same abstraction V as the underlying resource P and everything else will still work Applica+on$layer$(user4mode)$$ API (e.g. POSIX+x86)! $ OS$layer$(kernel4mode)$ $ HW/SW interface (e.g. x86)! VMM$ HW/SW interface (e.g. x86)! x86$ 4

Multiple VMs space/time multiplexing Multiplexing in space and in time N VMs with n vcpu each on M hardware core N * n can be greater than M! scheduling similar to OS VM are isolated... core% core% core% core% VMM$ core% core% core% core% x86$ 5

Efficiency through direct execution When P=x86 && V=x86; use x86 whenever possible How / when? Application (user-mode) $ API (e.g. POSIX+x86)! OS (kernel-mode) $ $ $ HW/SW interface (e.g. x86)! $$$VMM $ $ $ $ HW/SW interface (e.g. x86)! x86$ $ 6

Popek / Goldberg Theorem [1974] Defines a property of an ISA Classification of instructions Privileged kernel mode only Sensitive -- semantics vary (other than trap) based on privilege level Theorem -- A VMM using only direct execution is possible only if all sensitive instructions are privileged. Intuition: construst a VMM that: Runs guest kernel in user mode trap-and-emulate of all sensitive instructions Details in the paper 7

Virtualizing physical memory Challenge efficient multiplexing of physical memory 8

Virtualizing physical memory Challenge efficient multiplexing of physical memory Two levels of indirection: VA! gpa (managed by OS) gpa! hpa (managed by VMM) OS TLB TLB maps virtual to (host-)physical VMM address Two possible answers Shadow page tables. Nested page tables. 9

Shadow Page Tables (no architectural support)............... cr3...... VM............... VMM cr3......... VMM area 10

Nested page tables (aka architectural support) Hardware implementation that simultaneously walks through two page table structures 2 hardware registers (P-cr3 and DP-cr3) determine roots Available on all current Intel and AMD processors 11

type I and type II VMMs type I VMMs allocate and schedule physical resources (cpu, mem, io) among the virtual machines. type II VMMs rely on a separate host operating system for resource scheduling. Efficient resource scheduling is fundamental to any layered design Specific complexities with virtual machines 12

type I and type II VMMs type I VMMs allocate and schedule physical resources (cpu, mem, io) among the virtual machines. type II VMMs rely on a separate host operating system for resource scheduling. Efficient resource scheduling is fundamental to any layered design Specific complexities with virtual machines Examples of current tools type I architectures Xen (open-source) VMware vsphere Microsoft Hyper-V type II architectures KVM (Linux host) (open-source) VMware Workstation (Windows host) and Fusion (OX X host) Parallels (Windows and OS X hosts) 13

Wrap-up 14

Virtualization Case study Memory resource management in Disco CS522 Principles of Computer Systems Dr. Edouard Bugnion

Memory Resource Management in Disco Disco: Running Commodity Operating Systems on Scalable Multiprocessors [SOSP 97, TOCS 97, SIGOPS HOF 08] Early paper that returned interest in VMM Technical contributions Memory resource management (ccnuma optimizations) Approach influenced solutions found in today s commercial products 2

Stanford FLASH + Disco IRIX& IRIX& IRIX& IRIX& SPLASHOS& Disco& PE PE PE PE PE PE PE PE Interconnect CC NUMA Multiprocessor Disco highly-scalable Virtual Machine Monitor for a scalable ccnuma machine Memory Resource Management goals: (1) minimize memory overheads; (2) optimize locality 3

Memory virtualization using shadow page tables Virtual address Guest-physical address OS TLB Host-physical address (aka machine address) VMM Key building block of VMMs: Multiplexing : of physical memory among virtual machines Indirection: inserting virtual --> host-physical mappings in TLB Resource management - allocation of guest-physical to host-physical mappings 4

Memory resource management Known challenge when memory is over-committed VMM controls additional level of indirection the guest-physical --> machine mappings can change mappings at any time can force them to be read-only 5

Double swapping problem VMM is under memory pressure, swaps page X Later, OS is under memory pressure, decides to swap page X VMM must first (i) swap out another page, and (ii) swap page X back in Only then can OS swap X out. OS VMM HW 6

Global memory efficiency Workload split in N virtual machines of a virtual cluster Processes can be partitioned File system buffer cache is replicated, with mostly identical content How to transparently share the buffer caches without involving the OS? Buffer cache BC BC BC BC VMM HW HW 7

Transparent memory sharing Solution: transparently share physical memory between virtual machines Identify know identical pages by: Interposing on DMA Interposing on network traffic 8

Interpose on DMA shared base disk N copies of nearly identical filesystems OS OS On SCSI READ to the base disk: Associate page with disk offset Protect the page; share it read-only Avoid subsequent SCSI READs Delta Disk Disco Copy-on-write (memory) Triggered by page fault Copy-on-write (disk) Triggered by SCSI write Shared base disk 9

Background VM networking Modern OS have built-in networking stacks OS OS tcp ip eth udp tcp ip eth udp Disco Virtual Ethernet NIC per VM Software Ethernet switch between VMs Virtual machines communicate like real machines Memory-based Disco Ethernet switch 10

Interpose on network traffic N NFS clients + 1 NFS sever Disco s in-memory Ethernet switch NFS Server Client Copies small fragments Remaps page-size fragments Operating system modification IRIX s NFS code copy using hypercall Disco Memory-based Ethernet switch! Transparently share filesystem buffer caches of NFS clients and servers. Example: NFS Read reply 11

Memory resource management today As important as ever VM memory size increases VM density increases VM migration increases Limitation of Disco s mechanisms: Applies only to pages that are a-priori identical because of I/O interposition Does not solve the double buffering problem Key improvements : Carl Waldspurger, Memory Resource Management in ESX Server, OSDI 2002 12

Thank you 13

Virtualization Case study Bringing virtualization to x86 CS522 Principles of Computer Systems Dr. Edouard Bugnion

Welcome back to 1998-99 Bringing Virtualization to the x86 architecture with VMware Workstation 1.0 ACM System Software Award 2009 [TOCS 2012] 2

Challenges x86 architecture is not virtualizable 17 instructions violate Popek/Goldberg criteria Need to run unmodified OS (Windows!) x86 architecture is of daunting complexity Many protection mechanisms and execution modes None virtualizable (save for v8066) Diversity of peripherals With vendor-specific drivers Little documentation or source Need for a simple user experience No reinstall 3

Two contributions How to create a type II VMM running on top of a protected commodity host operating systems Addresses the device diversity and user experience challenges How to virtualize the x86 architecture (which fails the Popek/ Goldberg criteria) Addresses pragmatically the x86 challenges Principle: focus on requirements of supported guest operating systems 4

#1: Simplicity and diversity Solution: Operate VMM as an type II hypervisor extension of an existing (host) operating system Challenge: constraints from the host operating system Address space Exception handlers, 5

VMware Hosted Architecture User space Any App System space Host OS CPU 6

VMware Hosted Architecture User space Any App VMware System space Host OS CPU Host Context 7

VMware Hosted Architecture User space Any App VMX (App) App Guest OS App VM System space Host OS Kernel driver VMM CPU Host Context VMM context 8

#2: Virtualizing the x86 architecture Challenge can t use classic technique trap-and-emulate (direct execution) is not sufficient 17 sensitive, non-privileged instructions Need to support binary compatibility Observation most x86 sensitive instructions are only sensitive when executing system software Solution Use trap-and-emulate when possible, i.e. when there are no sensitive instructions Use binary translation when necessary and faithfully emulate the semantics of all instructions (including the sensitive ones) 9

Basics of Dynamic Binary Translation Translator convert basic block of VM instructions into a executable sequence OS Convert sensitive instructions Translation cache buffer holding many cached translations Dispatch function jumps into translation cache VMM Challenge performance suitable for VMM 10

Most instructions are translated 1:1 Source Instructions Translation Cache... t1 t2 t3 t4 mov 12(%ebp), %eax inc 8(%eax) call 0(%ebx) 1: mov 12(%ebp), %eax 2: inc 8(%eax) 3: mov %eax, <gs>:bt.tmp_eax 4: mov 0(%ebx), %eax 5: push #t4 6: mov %eax, <gs>:vcpu.eip 7: mov <gs>:bt.tmp_eax, %eax 8: jmp fastdispatch No software relocation, sandboxing or SFI required Rely on hardware segmentation for protection... Use <gs> to access VMM area 11

Decision algorithm Direct execution only possible in constrained situations Running untrusted code that cannot enable/disable interrupts (or has interrupts enabled) Running either protected mode or v8086 mode (not real mode) All segments are reversible CPU version of segment matches memory copy 12

Key decision algorithm what is system? System = when the possibly sensitive instructions are actually cpl=0 iopl=3 DE Non-rev. segments real mode sensitive Direct execution only possible outside of system modes, i.e when: Running untrusted, user-level code that cannot enable/disable interrupts Running in protected mode or in v8086 mode Running with all segments reversible Details in the paper 13

Virtualization since 2005: Intel VT-x and AMD-v Available on all current 64-bit processors Duplicate the 4 protection rings Meets Popek/Goldberg criteria Used by all virtualization solutions today De9Privileged" 3" user" 2" 1" 0" kernel" De9Privileged" 3" user" 2" 1" 0" kernel" Virtual Machines Privileged" Privileged" 3" 2" 1" 0" user" 2" 1" 0" Host"/"VMM" Host/ VMM 14

Thank you. 15