Advanced Operating Systems (CS 202) Virtualization

Similar documents
Xen and the Art of Virtualization. CSE-291 (Cloud Computing) Fall 2016

Hardware- assisted Virtualization

Lecture 7. Xen and the Art of Virtualization. Paul Braham, Boris Dragovic, Keir Fraser et al. 16 November, Advanced Operating Systems

CS 5600 Computer Systems. Lecture 11: Virtual Machine Monitors

CSE 120 Principles of Operating Systems

Virtual Machines. Part 2: starting 19 years ago. Operating Systems In Depth IX 1 Copyright 2018 Thomas W. Doeppner. All rights reserved.

Virtualization. Starting Point: A Physical Machine. What is a Virtual Machine? Virtualization Properties. Types of Virtualization

Virtualization. ! Physical Hardware Processors, memory, chipset, I/O devices, etc. Resources often grossly underutilized

Virtual Machine Monitors (VMMs) are a hot topic in

Chapter 5 C. Virtual machines

Background. IBM sold expensive mainframes to large organizations. Monitor sits between one or more OSes and HW

OS Virtualization. Why Virtualize? Introduction. Virtualization Basics 12/10/2012. Motivation. Types of Virtualization.

Virtualization. Operating Systems, 2016, Meni Adler, Danny Hendler & Amnon Meisels

Xen and the Art of Virtualization. Nikola Gvozdiev Georgian Mihaila

Cloud Computing Virtualization


COMPUTER ARCHITECTURE. Virtualization and Memory Hierarchy

Module 1: Virtualization. Types of Interfaces

CS370 Operating Systems

[537] Virtual Machines. Tyler Harter

Virtual Machines Disco and Xen (Lecture 10, cs262a) Ion Stoica & Ali Ghodsi UC Berkeley February 26, 2018

Virtualization, Xen and Denali

Spring 2017 :: CSE 506. Introduction to. Virtual Machines. Nima Honarmand

Introduction to Virtual Machines. Carl Waldspurger (SB SM 89 PhD 95) VMware R&D

Intel VMX technology

24-vm.txt Mon Nov 21 22:13: Notes on Virtual Machines , Fall 2011 Carnegie Mellon University Randal E. Bryant.

CS-580K/480K Advanced Topics in Cloud Computing. VM Virtualization II

DISCO and Virtualization

COSC6376 Cloud Computing Lecture 14: CPU and I/O Virtualization

Xen and the Art of Virtualization

Virtualization. Pradipta De

Virtualization. Adam Belay

references Virtualization services Topics Virtualization

Overview of System Virtualization: The most powerful platform for program analysis and system security. Zhiqiang Lin

Linux and Xen. Andrea Sarro. andrea.sarro(at)quadrics.it. Linux Kernel Hacking Free Course IV Edition

The Challenges of X86 Hardware Virtualization. GCC- Virtualization: Rajeev Wankar 36

Micro VMMs and Nested Virtualization

CS 350 Winter 2011 Current Topics: Virtual Machines + Solid State Drives

CS370: Operating Systems [Spring 2017] Dept. Of Computer Science, Colorado State University

What is KVM? KVM patch. Modern hypervisors must do many things that are already done by OSs Scheduler, Memory management, I/O stacks

Virtualization and memory hierarchy

CS 550 Operating Systems Spring Introduction to Virtual Machines

CSC 5930/9010 Cloud S & P: Virtualization

Virtual Machines. To do. q VM over time q Implementation methods q Hardware features supporting VM q Next time: Midterm?

Intel s Virtualization Extensions (VT-x) So you want to build a hypervisor?

Virtual Virtual Memory

Knut Omang Ifi/Oracle 6 Nov, 2017

Lecture 5: February 3

Intel Virtualization Technology Roadmap and VT-d Support in Xen

Making Nested Virtualization Real by Using Hardware Virtualization Features

Advanced Systems Security: Virtual Machine Systems

Operating Systems 4/27/2015

VIRTUALIZATION: IBM VM/370 AND XEN

Virtual Memory. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University

Computer Architecture Background

CHAPTER 16 - VIRTUAL MACHINES

COSC 6385 Computer Architecture. Virtualizing Compute Resources

Multiprocessor Scheduling. Multiprocessor Scheduling

Virtualization and Virtual Machines. CS522 Principles of Computer Systems Dr. Edouard Bugnion

Pre-virtualization internals

Xen is not just paravirtualization

The Architecture of Virtual Machines Lecture for the Embedded Systems Course CSD, University of Crete (April 29, 2014)

Nested Virtualization and Server Consolidation

Nested Virtualization Friendly KVM

COS 318: Operating Systems. Virtual Machine Monitors

G Xen and Nooks. Robert Grimm New York University

I/O virtualization. Jiang, Yunhong Yang, Xiaowei Software and Service Group 2009 虚拟化技术全国高校师资研讨班

Master s Thesis! Improvement of the Virtualization Support in the Fiasco.OC Microkernel! Julius Werner!

To EL2, and Beyond! connect.linaro.org. Optimizing the Design and Implementation of KVM/ARM

Virtualization History and Future Trends

CprE Virtualization. Dr. Yong Guan. Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University

Fast access ===> use map to find object. HW == SW ===> map is in HW or SW or combo. Extend range ===> longer, hierarchical names

Network device virtualization: issues and solutions

Server Virtualization Approaches

Today s Papers. Virtual Machines Background. Why Virtualize? EECS 262a Advanced Topics in Computer Systems Lecture 19

Cross-architecture Virtualisation

CS533 Concepts of Operating Systems. Jonathan Walpole


Xen and the Art of Virtualization

Background. IBM sold expensive mainframes to large organiza<ons. Monitor sits between one or more OSes and HW

Virtual Machines. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Task Scheduling of Real- Time Media Processing with Hardware-Assisted Virtualization Heikki Holopainen

1 Virtualization Recap

Fast access ===> use map to find object. HW == SW ===> map is in HW or SW or combo. Extend range ===> longer, hierarchical names

I/O and virtualization

Virtualization with XEN. Trusted Computing CS599 Spring 2007 Arun Viswanathan University of Southern California

VMMS: DISCO AND XEN CS6410. Ken Birman

Xen and the Art of Virtualiza2on

Part 1: Introduction to device drivers Part 2: Overview of research on device driver reliability Part 3: Device drivers research at ERTOS

KVM/ARM. Linux Symposium Christoffer Dall and Jason Nieh

CS370 Operating Systems

CSCI 8530 Advanced Operating Systems. Part 19 Virtualization

CS 152 Computer Architecture and Engineering

CLOUD COMPUTING IT0530. G.JEYA BHARATHI Asst.Prof.(O.G) Department of IT SRM University

Introduction Construction State of the Art. Virtualization. Bernhard Kauer OS Group TU Dresden Dresden,

Nested Virtualization Update From Intel. Xiantao Zhang, Eddie Dong Intel Corporation

Virtualization. Virtualization

Chapter 5 (Part II) Large and Fast: Exploiting Memory Hierarchy. Baback Izadi Division of Engineering Programs

System Virtual Machines

Virtualisation: The KVM Way. Amit Shah

Transcription:

Advanced Operating Systems (CS 202) Virtualization

Virtualization One of the natural consequences of the extensibility research we discussed What is virtualization and what are the benefits? 2

Virtualization motivation Cost: multiplex multiple virtual machines on one hardware machine Cloud computing, data center virtualization Why not processes? Why not containers? Heterogeneity: Allow one machine to support multiple OS s Maintaining compatibility Other: security, migration, energy optimization, customization, 3

How do we virtualize? Create an operating system to multiplex resources among operating systems! Exports a virtual machine to the Operating systems Called a hypervisor of Virtual Machine Monitor 4

VIRTUALIZATION MODELS 5

Two types of hypervisors Type 1: Native (bare metal) Hypervisor runs on top of the bare metal machine e.g., KVM Type 2: Hosted Hypervisor is an emulator e.g., VMWare, virtual box, QEMU 6

Hybrid organizations Some hybrids exist, e.g., Xen Mostly bare metal VM0/Dom0 to keep device drivers out of VMM 7

Stepping back some history IBM VM 370 (1970s) Microkernels (late 80s/90s) Extensibility (90s) SIMOS (late 90s) Eventually became VMWare (2000) Xen, Vmware, others (2000s) Ubiquitous use, Cloud computing, data centers, Makes computing a utility 8

Full virtualization Idea: run guest operating systems unmodified However, supervisor is the real privileged software When OS executes privileged instruction, trap to hypervisor who executes it for the OS This can be very expensive Also, subject to quirks of the architecture Example, x86 fails silently if some privileged instructions execute without privilege E.g., popf 9

Example: Disable Interrupts Guest OS tries to disable interrupts the instruction is trapped by the VMM which makes a note that interrupts are disabled for that virtual machine Interrupts arrive for that machine Buffered at the VMM layer until the guest OS enables interrupts. Other interrupts are directed to VMs that have not disabled them

Binary translation--making full virtualization practical Use binary translation to modify OS to rewrite silent failure instructions More aggressive translation can be used Translate OS mode instructions to equivalent VMM instructions Some operations still expensive Cache for future use Used by VMWare ESXi and Microsoft Virtual Server Performance on x86 typically ~80-95% of native 11

Binary Translation Example Guest OS Assembly do_atomic_operation: cli mov eax, 1 xchg eax, [lock_addr] test eax, eax jnz spinlock mov [lock_addr], 0 sti ret Translated Assembly do_atomic_operation: call [vmm_disable_interrupts] mov eax, 1 xchg eax, [lock_addr] test eax, eax jnz spinlock mov [lock_addr], 0 call [vmm_enable_interrupts] ret 12

Paravirtualization Modify the OS to make it aware of the hypervisor Can avoid the tricky features Aware of the fact it is virtualized Can implement optimizations Comparison to binary translation? Amount of code change? 1.36% of Linux, 0.04% for Windows 13

Hardware supported virtualization (Intel VT-x, AMD-V) Hardware support for virtualization Makes implementing VMMs much simpler Streamlines communication between VM and OS Removes the need for paravirtualization/binary translation EPT: Support for shadow page tables More later 14

NUTS AND BOLTS 15

What needs to be done? Virtualize hardware Memory hierarchy CPUs Devices Implement data and control transfer between guests and hypervisor We ll cover this by example Xen paper Slides modified from presentation by Jianmin Chen 16

Xen Design principles: Unmodified applications: essential Full-blown multi-task O/Ss: essential Paravirtualization: necessary for performance and isolation

Xen

Implementation summary 19

Xen VM interface: Memory Memory management Guest cannot install highest privilege level segment descriptors; top end of linear address space is not accessible Guest has direct (not trapped) read access to hardware page tables; writes are trapped and handled by the VMM Physical memory presented to guest is not necessarily contiguous

Two Layers of Virtual Memory Virtual address à physical address Physical address à machine address Guest OS s View of RAM 0xFFFFFFFF Host OS s View of RAM Guest App s View of RAM 0xFFFF Page 2 0xFF Page 0 Page 0 Page 3 Page 2 Page 1 Page 1 Page 3 0x00 Page 0 Known to the guest OS 0x0000 Page 3 Page 2 Unknown to the guest OS 0x00000000 Page 1

Guest s Page Tables Are Invalid Guest OS page tables map virtual page numbers (VPNs) to physical frame numbers (PFNs) Problem: the guest is virtualized, doesn t actually know the true PFNs The true location is the machine frame number (MFN) MFNs are known to the VMM and the host OS Guest page tables cannot be installed in cr3 Map VPNs to PFNs, but the PFNs are incorrect How can the MMU translate addresses used by the guest (VPNs) to MFNs? 22

Shadow Page Tables Solution: VMM creates shadow page tables that map VPN à MFN (as opposed to VPNàPFN) Guest Page Table Physical Memory Virtual Memory 64 Page 3 48 Page 2 32 Page 1 16 Page 0 0 VPN PFN 00 (0) 01 (1) 01 (1) 10 (2) 10 (2) 11 (3) 11 (3) 00 (0) Shadow Page Table VPN MFN 00 (0) 10 (2) 01 (1) 11 (3) 10 (2) 00 (0) 11 (3) 01 (1) 64 Page 3 48 Page 2 32 Page 1 16 Page 0 0 Machine Memory 64 Page 3 48 Page 2 32 Page 1 16 Page 0 0 Maintained by the guest OS Invalid for the MMU Maintained by the VMM Valid for the MMU 23

Building Shadow Tables Problem: how can the VMM maintain consistent shadow pages tables? The guest OS may modify its page tables at any time Modifying the tables is a simple memory write, not a privileged instruction Thus, no helpful CPU exceptions :( Solution: mark the hardware pages containing the guest s tables as read-only If the guest updates a table, an exception is generated VMM catches the exception, examines the faulting write, updates the shadow table 24

More VMM Tricks The VMM can play tricks with virtual memory just like an OS can Balooning: The VMM can page parts of a guest, or even an entire guest, to disk A guest can be written to disk and brought back online on a different machine! Deduplication: The VMM can share read-only pages between guests Example: two guests both running Windows XP 25

Xen VM interface: CPU CPU Guest runs at lower privilege than VMM Exception handlers must be registered with VMM Fast system call handler can be serviced without trapping to VMM Hardware interrupts replaced by lightweight event notification system Timer interface: both real and virtual time

Details: CPU Frequent exceptions: Software interrupts for system calls Page faults Allow guest to register a fast exception handler for system calls that can be accessed directly by CPU in ring 1, without switching to ring-0/xen Handler is validated before installing in hardware exception table: To make sure nothing executed in Ring 0 privilege. Doesn t work for Page Fault

Xen VM interface: I/O I/O Virtual devices exposed as asynchronous I/O rings to guests Event notification replaces interrupts

Details: I/O 1 Xen does not emulate hardware devices Exposes device abstractions for simplicity and performance I/O data transferred to/from guest via Xen using shared-memory buffers Virtualized interrupts: light-weight event delivery mechanism from Xen-guest Update a bitmap in shared memory Optional call-back handlers registered by O/S

Details: I/O 2 I/O Descriptor Ring:

OS Porting Cost Number of lines of code modified or added compared with original x86 code base (excluding device drivers) Linux: 2995 (1.36%) Windows XP: 4620 (0.04%) Re-writing of privileged routines; Removing low-level system initialization code

Control Transfer Guest synchronously call into VMM Explicit control transfer from guest O/S to monitor hypercalls VMM delivers notifications to guest O/S E.g. data from an I/O device ready Asynchronous event mechanism; guest O/S does not see hardware interrupts, only Xen notifications

Event notification Pending events stored in per-domain bitmask E.g. incoming network packet received Updated by Xen before invoking guest OS handler Xen-readable flag may be set by a domain To defer handling, based on time or number of pending requests Analogous to interrupt disabling

Data Transfer: Descriptor Ring Descriptors are allocated by a domain (guest) and accessible from Xen Descriptors do not contain I/O data; instead, point to data buffers also allocated by domain (guest) Facilitate zero-copy transfers of I/O data into a domain

Network Virtualization Each domain has 1+ network interfaces (VIFs) Each VIF has 2 I/O rings (send, receive) Each direction also has rules of the form (<pattern>,<action>) that are inserted by domain 0 (management) Xen models a virtual firewall+router (VFR) to which all domain VIFs connect

Network Virtualization Packet transmission: Guest adds request to I/O ring Xen copies packet header, applies matching filter rules E.g. change header IP source address for NAT No change to payload; pages with payload must be pinned to physical memory until DMA to physical NIC for transmission is complete Round-robin packet scheduler

Network Virtualization Packet reception: Xen applies pattern-matching rules to determine destination VIF Guest O/S required to exchange unused page frame for each packet received Xen exchanges packet buffer for page frame in VIF s receive ring If no receive frame is available, the packet is dropped Avoids Xen-guest copies; requires pagealigned receive buffers to be queued at VIF s receive ring

Disk virtualization Domain0 has access to physical disks Currently: SCSI and IDE All other domains: virtual block device (VBD) Created & configured by management software at domain0 Accessed via I/O ring mechanism Possible reordering by Xen based on knowledge about disk layout

Disk virtualization Xen maintains translation tables for each VBD Used to map requests for VBD (ID,offset) to corresponding physical device and sector address Zero-copy data transfers take place using DMA between memory pages pinned by requesting domain Scheduling: batches of requests in round-robin fashion across domains

Evaluation

Microbenchmarks Stat, open, close, fork, exec, etc Xen shows overheads of up to 2x with respect to native Linux (context switch across 16 processes; mmap latency) VMware shows up to 20x overheads (context switch; mmap latencies) UML shows up to 200x overheads Fork, exec, mmap; better than VMware in context switches

VT-x : Motivation To solve the problem that the x86 instructions architecture cannot be virtualized. Simplify VMM software by closing virtualization holes by design. Ring Compression Non-trapping instructions Excessive trapping Eliminate need for software virtualization (i.e paravirtualization, binary translation). 42

CPU Virtualization with VT-x 43

VMX Virtual Machine Extensions define processor-level support for virtual machines on the x86 platform by a new form of operation called VMX operation. Kinds of VMX operation: root: VMM runs in VMX root operation non-root: Guest runs in VMX non-root operation Eliminate de-privileging of Ring for guest OS. 44

VT-x Pre VT-x Post VMM ring de-privileging of guest OS Guest OS aware its not at Ring 0 VMM executes in VMX root-mode Guest OS de-privileging eliminated Guest OS runs directly on hardware 45

VMX Transitions Transitions between VMX root operation and VMX non-root operation. Kinds of VMX transitions: VM Entry: Transitions into VMX non-root operation. VM Exit: Transitions from VMX non-root operation to VMX root operation. Registers and address space swapped in one atomic operation. 46

VM 1 VM 2 VM n VMX Non-Root Operation Ring 3 Ring 3 Ring 3 Ring 0 Ring 0 Ring 0 VM Entry VM Exit VMC S 1 VMC S 2 VMC S n VMX Root Operation vmlaunch / vmresume VMX Transitions Ring 3 Ring 0 47

VMCS: VM Control Structure Data structure to manage VMX non-root operation and VMX transitions. Specifies guest OS state. Configured by VMM. Controls when VM exits occur. 48

VMCS: VM Control Structure The VMCS consists of six logical groups: Guest-state area: Processor state saved into the gueststate area on VM exits and loaded on VM entries. Host-state area: Processor state loaded from the hoststate area on VM exits. VM-execution control fields: Fields controlling processor operation in VMX non-root operation. VM-exit control fields: Fields that control VM exits. VM-entry control fields: Fields that control VM entries. VM-exit information fields: Read-only fields to receive information on VM exits describing the cause and the nature of the VM exit. 49

CPU Virtualization with VT-x 50 Source: [2]

MMU Virtualization with VT-x 51

VPID: Motivation First generation VT-x forces TLB flush on each VMX transition. Performance loss on all VM exits. Performance loss on most VM entries Guest page tables not modified always Better VMM software control of TLB flushes is beneficial. 52

VPID: Virtual Processor Identifier 16-bit virtual-processor-id field in the VMCS. Cached linear translations tagged with VPID value. No flush of TLBs on VM entry or VM exit if VPID active. TLB entries of different virtual machines can all co-exist in the TLB. 53

Virtualizing Memory in Software Three abstractions of memory: 0 4GB Current Guest Process Guest OS Virtual Address Spaces 0 4GB Virtual RAM Virtual Frame Buffer Virtual Devices Virtual ROM Physical Address Spaces 0 4GB RAM Devices Frame Buffer ROM Machine Address Space 54

Shadow Page Tables VMM maintains shadow page tables that map guest-virtual pages directly to machine pages. Guest modifications to V->P tables synced to VMM V->M shadow page tables. Guest OS page tables marked as read-only. Modifications of page tables by guest OS -> trapped to VMM. Shadow page tables synced to the guest OS tables 55

Virtual CR3 Guest Page Table Guest Page Table Guest Page Table Real CR3 Shadow Page Table Shadow Page Table Set CR3 by guest OS (1) Shadow Page Table

Virtual CR3 Guest Page Table Guest Page Table Guest Page Table Real CR3 Shadow Page Table Shadow Page Table Set CR3 by guest OS (2) Shadow Page Table 57

Drawbacks: Shadow Page Tables Maintaining consistency between guest page tables and shadow page tables leads to an overhead: VMM traps Loss of performance due to TLB flush on every world-switch. Memory overhead due to shadow copying of guest page tables. 58

Nested / Extended Page Tables Extended page-table mechanism (EPT) used to support the virtualization of physical memory. Translates the guest-physical addresses used in VMX non-root operation. Guest-physical addresses are translated by traversing a set of EPT paging structures to produce physical addresses that are used to access memory. 59

Nested / Extended Page Tables 60

Nested / Extended Page Tables 61 Source: [4]

Advantages: EPT Simplified VMM design. Guest page table modifications need not be trapped, hence VM exits reduced. Reduced memory footprint compared to shadow page table algorithms. 62

Disadvantages: EPT TLB miss is very costly since guest-physical address to machine address needs an extra EPT walk for each stage of guest-virtual address translation. 63