Virtualization, Xen and Denali Susmit Shannigrahi November 9, 2011 Susmit Shannigrahi () Virtualization, Xen and Denali November 9, 2011 1 / 70
Introduction Virtualization is the technology that allows two or more OSes to run on one piece of hardware. Virtualization = Abstraction. More powerful computers mean less utilization. Virtualization is not new; it dates back to the 1960s. CP-40 for the S/360 was the first operating system that implemented complete virtualization. The goal was to test other OSes for the S/360. S/360-370 mainframes are still used.
Why Save hardware cost. Save maintenance and running costs. Properly utilize hardware. Get the best of all OSes. Maintain legacy software. Good testing platform.
Types Too many keywords and too much terminology. Can be grouped into three main categories: Full Virtualization, Hardware-Assisted Virtualization, Paravirtualization.
Terminology Hypervisor / Virtual Machine Monitor (VMM). Rings 0..3.
Normal OS Execution
Bare Metal Virtualization Note that there is no underlying OS. The hypervisor communicates directly with the hardware.
Bare Metal Virtualization - I/O - I Virtual machines share the I/O devices: Ethernet, hard drives, etc. The hypervisor must have low-level drivers to communicate with the devices. The hypervisor must emulate each shared device.
Bare Metal Virtualization - I/O - II Assign individual devices to specific virtual machines. This is called partitioning. I/O devices can be accessed directly from VMs using native drivers. Less intervention by the VMM improves I/O performance.
Hosted Virtualization Each virtual machine has access to a limited set of I/O devices. The host provides an emulated view of the actual hardware. Some hardware cannot be emulated, so generic hardware is offered instead.
Hosted Virtualization I/O is fairly complex here.
Hosted Virtualization - Benefits and Drawbacks Benefits: Ease of installation and configuration. Runs on almost all hardware. Drawbacks: No RTOS support. Performance penalty.
Full Virtualization - I Insert a Virtual Machine Monitor (VMM) between the hardware and the OS. The VMM is common to all OSes running on that hardware. In this approach, the VMM is installed directly on bare metal.
Full Virtualization - Benefits and Drawbacks Benefits: Improved I/O performance. Supports real-time OSes because deterministic performance is possible. Can run generic OSes and RTOSes in parallel. Drawbacks: More drivers need to be written and incorporated into the VMM. More difficult to install and configure.
Hardware Assist Instead of implementing the VMM in software, move it to hardware. Directly trap system calls and send to hardware. Needs newer processors with Intel VT-x or AMD-V.
Hardware Assist - Benefits and Drawbacks Benefits: Closer to hardware, better performance. Drawbacks: Implemented in hardware, little room for flexibility.
Paravirtualization The software interface presented to virtual machines is close (but not identical) to the underlying hardware. Paravirtualization provides specially defined hooks for tasks. The guest operating system is modified and ported to the paravirtualization API.
Virtualization Comparison
References http://www.vmware.com/files/pdf/vmware_paravirtualization.pdf http://zone.ni.com/devzone/cda/tut/p/id/8709 http://en.wikipedia.org/wiki/hardware_virtualization
Xen and the Art of Virtualization Paul Barham et al., SOSP '03
Xen and requirements for concurrent performance Xen: a high-performance, resource-managed virtual machine monitor. VMs are isolated from one another. Commodity machines running commodity OSes. The performance overhead of virtualization should be low.
Features Some source modifications are needed for Xen. Dynamic instantiation of OSes. Multiple OSes on a shared system is the easiest way to go. System administration is a nightmare for such a complex system. Not enough isolation.
Isolation Support for isolation at the operating system level. Multiplexes physical resources on a per-OS basis. Reduces interference between OSes. A bit more heavyweight.
Xen - approach Unmodified binary support needed. Full OS support needed. But full virtualization was never supported on x86. Workarounds are possible, but complex. Solution: paravirtualization.
VM interfaces Memory management, CPU, I/O devices.
Memory Management Most difficult part. Requires hypervisor support and modifications in each guest OS. x86 does not have a software-managed TLB. Complete TLB flush for each OS. In Xen, guest OSes are responsible for the hardware page tables. Xen acts as a proxy to the page tables and the TLB.
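The proxy role described above can be illustrated with a small sketch. It assumes a simplified model in which the guest submits a batch of page table updates and the hypervisor applies them only if the guest owns every referenced frame. The class `Hypervisor`, the method `apply_page_table_updates`, and the ownership table are hypothetical names for illustration, not Xen's actual interface.

```python
class Hypervisor:
    """Toy model of a hypervisor that validates guest page table updates."""

    def __init__(self, frame_owner):
        # frame number -> domain that owns the frame (illustrative data)
        self.frame_owner = dict(frame_owner)
        # domain -> {virtual page -> frame}
        self.page_tables = {}

    def apply_page_table_updates(self, domain, updates):
        """Validate a whole batch first, then apply it.

        `updates` is a list of (virtual_page, frame) pairs. Nothing is
        applied if any update refers to a frame the domain does not own.
        """
        for _vpage, frame in updates:
            if self.frame_owner.get(frame) != domain:
                raise PermissionError(
                    f"domain {domain} does not own frame {frame}")
        table = self.page_tables.setdefault(domain, {})
        for vpage, frame in updates:
            table[vpage] = frame
        return len(updates)
```

Batching the updates mirrors the idea that the guest manages its own page tables while every write still passes through a validation step.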
CPU Generally, the OS is the most privileged entity. Under virtualization, it should have lower privilege than the VMM. Guest OSes run at a lower privilege level than Xen (ring 1 on x86), while applications stay in ring 3. Privileged instructions are paravirtualized: they are validated and executed by Xen. Exception handlers are registered with Xen, which copies the exception stack frame onto the guest OS stack. Safety is ensured by validating the exception handlers.
Device I/O Abstraction, not emulation. All I/O via Xen, shared memory, asynchronous. High performance, Xen does validation.
LOC cost for porting OSes Linux: 2995. XP: 4620.
Control and Management Policies are separate from mechanisms. Policy decisions are made in software on guest OSes.
Control and Data transfer Domain to Xen => hypercall. Xen to domain => asynchronous event notifications.
Subsystem Virtualization - CPU Scheduling Borrowed Virtual Time (BVT) scheduling. With BVT scheduling, thread execution time is tracked in terms of virtual time. The scheduler dispatches the runnable thread with the earliest effective virtual time (EVT). However, a latency-sensitive thread is allowed to warp back in virtual time. It borrows virtual time from its own future CPU allocation to gain preference.
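The dispatch rule above can be sketched in a few lines. This is a minimal illustration of the BVT idea, not Xen's scheduler: the names `Domain`, `pick_next`, `run`, and `simulate` are hypothetical, and warp time limits and context-switch allowances are left out.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    name: str
    weight: int = 1        # larger weight => virtual time advances more slowly
    avt: float = 0.0       # actual virtual time accumulated so far
    warp: float = 0.0      # how far this domain may warp back
    warping: bool = False  # True while the domain is latency-sensitive

    def evt(self) -> float:
        """Effective virtual time: AVT minus the warp while warping."""
        return self.avt - (self.warp if self.warping else 0.0)


def pick_next(domains):
    """Dispatch the runnable domain with the earliest EVT."""
    return min(domains, key=lambda d: d.evt())


def run(domain, ticks):
    """Charge `ticks` of real CPU time, scaled by the domain's weight."""
    domain.avt += ticks / domain.weight


def simulate(domains, slices, ticks=10):
    """Run `slices` scheduling decisions and return the dispatch order."""
    order = []
    for _ in range(slices):
        chosen = pick_next(domains)
        run(chosen, ticks)
        order.append(chosen.name)
    return order
```

Because warping only shifts the EVT and the AVT keeps advancing, a warped domain is preferred briefly and then falls behind again, which is the sense in which the time is borrowed.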
Subsystem Virtualization - Time Guest OSes are provided with real time, virtual time, and wall-clock time. A guest OS can set two timers, one for real time and one for virtual time.
Subsystem Virtualization - Physical Memory Initial allocation at the time of creation. Statically partitioned between guests. A maximum reservation can also be specified. Memory allocation is sparse. The physical-to-logical mapping is maintained by the guest OS.
Network Xen provides a virtual firewall-router (VFR). Each VM has virtual interfaces connected to the VFR. To transmit a packet, the guest OS enqueues a buffer descriptor on the transmit ring. Xen copies the descriptor and the packet header. Transmission uses scatter-gather DMA. Received packets are written directly to a page frame (frame exchange).
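The transmit ring mentioned above is a producer-consumer queue of descriptors. The following is a minimal sketch that models only the request direction with two free-running indices. `TxRing`, `req_prod`, and `req_cons` are illustrative names, and a real ring lives in memory shared between the guest and Xen rather than in a Python list.

```python
class TxRing:
    """Fixed-size circular ring of buffer descriptors."""

    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size
        self.req_prod = 0  # advanced by the guest when it enqueues
        self.req_cons = 0  # advanced by the consumer when it dequeues

    def enqueue(self, descriptor):
        """Guest side: place a descriptor on the ring, or fail if full."""
        if self.req_prod - self.req_cons == self.size:
            return False
        self.slots[self.req_prod % self.size] = descriptor
        self.req_prod += 1
        return True

    def dequeue(self):
        """Consumer side: take the oldest descriptor, or None if empty."""
        if self.req_cons == self.req_prod:
            return None
        descriptor = self.slots[self.req_cons % self.size]
        self.req_cons += 1
        return descriptor
```

Only descriptors travel through the ring; the packet payload stays in the guest's buffer, which is what makes the scatter-gather DMA path possible.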
Disk VMs access persistent storage through virtual block devices. Only the host OS has unchecked access to the hardware. Xen services batches of requests from competing VMs in round-robin (RR) order.
Performance Evaluation Dell 2650 2.4 GHz Xeon server. 2 GB RAM. Broadcom Tigon 3 Gigabit Ethernet NIC. Hitachi 146 GB 10K RPM SCSI disk. Linux 2.4.21 i686 kernel. Red Hat 7.2 on an ext3 file system.
Relative performance SPEC INT = CPU scheduling, OSDB = database benchmark (PostgreSQL), dbench = file system, WEB99 = web hosting (Apache 1.3).
Network Performance 400 MB transfer. Socket buffer size = 128 KB. Uses ttcp.
Scalability Target: 100 VMs. RH7.2 clients. Reservation of 64 MB on boot. Minimizing page usage reduces the memory footprint to 6.2 MB. Allowing swap reduces it further to 4.2 MB. Xen keeps 20 kB of state per domain.
Optimizing Network Virtualization in Xen
Goal Retain the Xen architecture. Redefine the virtual network interfaces. Optimize the implementation of the data transfer path between guest and driver domains. Support for guest OSes to effectively utilize advanced virtual memory features.
Network I/O architecture in Xen The host OS uses native drivers to access I/O directly. Guest OSes use virtual interfaces. Zero-copy data transfer between the backend and the virtual interfaces. Remaps the physical page into the target domain.
Stock performance - Xen Much higher overhead. 60% of the time is spent on the transfer from the physical to the virtual interface.
Virtual Interface Optimization - New I/O architecture Checksum offloading, scatter-gather I/O, TCP segmentation offloading. The guest can transmit network packets much larger than the network MTU. The offload driver takes care of functionality not supported by the NIC. For functionality the NIC does support, packets are handed over to it.
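The division of labour between the offload driver and the NIC can be sketched as a simple fallback. This is an illustration under stated assumptions, not the paper's driver: `transmit`, `segment`, and the `nic_supports_tso` flag are hypothetical names, and headers and checksums are ignored.

```python
def segment(payload: bytes, max_segment: int):
    """Split a large payload into segments of at most `max_segment` bytes."""
    if max_segment <= 0:
        raise ValueError("max_segment must be positive")
    return [payload[i:i + max_segment]
            for i in range(0, len(payload), max_segment)]


def transmit(payload: bytes, max_segment: int, nic_supports_tso: bool):
    """Return the list of buffers handed to the NIC.

    If the NIC supports segmentation offload, the large packet is passed
    through unchanged. Otherwise the software offload driver segments it.
    """
    if nic_supports_tso:
        return [payload]
    return segment(payload, max_segment)
```

Either way the guest sends one large packet, so the per-packet cost of crossing the virtual interface is paid once rather than once per MTU-sized segment.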
Advantages Reduces network processing overhead by handing some of the work to the NIC. Most of the overhead is in the transfer from the logical to the physical interface. Bulk transfer reduces per-packet overhead.
I/O channel optimization 30-40% of execution time is spent in the Xen VMM. Page remapping, ownership transfer. Three address remaps and two memory allocations for each packet received. Two remaps for each packet transmitted.
Alternate approach - Transmit path optimization The I/O channel is augmented with an out-of-band header channel using shared pages. The guest passes the headers to the driver. The driver examines the header and, if possible, constructs the packet from unmapped fragments. The network driver uses only physical addresses, so passing unmapped pages is safe. The NIC must support gather DMA.
Alternate approach - Receive path optimization Xen uses page remapping to avoid data copies. Multiple small packets: copying is cheaper. Even doing all transfers by data copy results in a small improvement. Data copy is done through shared memory.
Virtual memory modifications Introduces superpage mappings (contiguous virtual addresses mapped to contiguous physical addresses) and global page table mappings. Xen and the guest OSes were modified to support these. Improves performance.
Evaluation - Transmit Xen, Xen-driver, and Xen-driver-opt all achieve 3760 Mb/s. CPU utilization: 40%, 46%, and 43%, respectively.
Evaluation - Receive Native: 2508 Mb/s, Xen-driver: 1738 Mb/s, Xen-driver-opt: 2343 Mb/s. Guest: Xen: 820 Mb/s, Xen-opt: 970 Mb/s.
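The receive figures above are easier to compare as relative gains. The short calculation below uses only the throughput numbers from the slide. The dictionary keys and the helper `relative_gain` are names chosen here for illustration.

```python
# Receive throughput in Mb/s, as reported on the slide.
receive_mbps = {
    "native": 2508,
    "xen_driver": 1738,
    "xen_driver_opt": 2343,
    "xen_guest": 820,
    "xen_guest_opt": 970,
}


def relative_gain(before, after):
    """Fractional improvement of `after` over `before`."""
    return (after - before) / before


driver_gain = relative_gain(receive_mbps["xen_driver"],
                            receive_mbps["xen_driver_opt"])
guest_gain = relative_gain(receive_mbps["xen_guest"],
                           receive_mbps["xen_guest_opt"])
driver_vs_native = receive_mbps["xen_driver_opt"] / receive_mbps["native"]

print(f"driver domain gain: {driver_gain:.1%}")
print(f"guest domain gain: {guest_gain:.1%}")
print(f"optimized driver domain vs native: {driver_vs_native:.1%}")
```

With these inputs the optimized driver domain gains roughly 35% over the unoptimized one and reaches about 93% of native throughput, while the guest domain gains roughly 18%.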
Evaluation - High Level Interface Shows the overheads (in millions of CPU cycles per second). High-level interfaces are useful.
Evaluation - I/O channel optimization I/O channel with high-level interfaces. Improves performance from 2794 Mb/s to 3239 Mb/s. Reduced overhead.
Denali: Lightweight Virtual Machines for Distributed and Networked Applications
High level picture Many independent, untrusted servers on a single machine. Aggressive use of paravirtualization. Should scale up by an order of magnitude compared to existing virtual machine systems. Prototype VMM, prototype guest OS.
Lightweight protection domain Strong isolation. Scales to many protected domains. Rapid swapping of services. Sharing across domains is infrequent.
Denali - VMM Isolation: The unit of protection is the service, not the user. No sharing except over the network. Data is private to each VM. File system and network stacks live in the guest OS. Each VM has private namespaces. No global namespace.
Denali - performance isolation Fairness in resource sharing across services. Does not provide any performance guarantee. Virtual hardware devices within the VMM act as queues of resources. Exposes hardware resources to all VMs.
Denali - Design Uses paravirtualization for scaling. No idle loop for VMs; an idle instruction, similar to sleep, is used instead. A VM remains unscheduled after executing the idle instruction. The second issue is virtual interrupt dispatching. Interrupt dispatch is asynchronous: interrupts are queued until the VM runs. An interrupt means that something happened, not that it just happened.
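The idle instruction and the queued interrupts described above can be sketched together. This is a toy model under stated assumptions, not the Denali implementation: `VM`, `VMM`, `idle_until_interrupt`, `post_interrupt`, and `schedule` are hypothetical names.

```python
from collections import deque


class VM:
    def __init__(self, name):
        self.name = name
        self.pending = deque()  # interrupts queued while the VM is not running
        self.idle = False
        self.handled = []

    def idle_until_interrupt(self):
        """Yield the CPU instead of spinning in an idle loop."""
        if not self.pending:
            self.idle = True


class VMM:
    def __init__(self, vms):
        self.vms = vms

    def post_interrupt(self, vm, irq):
        """Queue the interrupt; it is delivered when the VM next runs."""
        vm.pending.append(irq)
        vm.idle = False

    def schedule(self):
        """One pass over the VMs; idle VMs consume no CPU time."""
        ran = []
        for vm in self.vms:
            if vm.idle:
                continue
            # Deliver everything that accumulated as one batch.
            while vm.pending:
                vm.handled.append(vm.pending.popleft())
            ran.append(vm.name)
            vm.idle_until_interrupt()
        return ran
```

Skipping idle VMs entirely is what lets many mostly idle services share one machine, and batching the queued interrupts avoids a context switch per event.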
Paravirtualization in Denali Enables Denali to remove unnecessary components. Does not expose virtual memory hardware. Each VM uses a single address space. Exposes a small number of generic I/O devices. Currently supports a NIC, timer, console, and keyboard.
Yakima VMM
Yakima VMM Uses OSKit for hardware abstraction. Round-robin (RR) CPU scheduling at its core. Emulates an Ethernet subnet. Statically allocates physical memory. Supervisor VM for booting the VMM.
Ilwaco guest OS Based on FreeBSD. Uses the BSD kernel. Console I/O via printf and scanf.
Benchmarks 1700 MHz P4 with 256 KB L2 cache. 1 GB RAM. Intel Gigabit Ethernet card. MTU of 1500 bytes.
Context switch time
Packet processing overhead 100-byte and 1400-byte UDP packets.
TCP/IP performance Latency increases with the number of virtual machines.
Discussions Xen and Denali have different goals. Both have their design trade-offs. Xen is well established, while Denali is more of a proof of concept. In either case, performance improvement is an active area of research.
Thank You