Virtual machine architecture and KVM analysis
D97942011 陳彥霖  B96902030 郭宗倫

A virtual machine monitor (VMM) serves as an interface between hardware and software: no matter what hardware lies underneath, software can ignore its dependency on any specific machine and simply consume the hardware as a resource through the VMM. Besides serving as an interface, the VMM must also schedule and synchronize the VMs and manage memory among them; with today's increasingly diverse operating systems, the VMM keeps growing heavier.

CPU virtualization

To maintain this interface and give each VM the illusion that it is running on a real machine, the VMM must be able to trap privileged operations issued by a VM before they reach the hardware. When an application issues a privileged operation (e.g., an IO request), the guest OS, which runs deprivileged in ring 3, would try to contact the device directly; the hardware would then raise a fault for the privileged operation attempted from ring 3, and the system would crash. This is why a VMM is needed. A typical approach works as follows: upon detecting that a VM has issued a privileged operation from an unprivileged ring, the CPU traps it and notifies the VMM, which disables that VM's interrupts and handles the operation itself. This works under the assumption that the CPU provides trap semantics the VMM can rely on when using the CPU to execute a VM; such a CPU is called virtualizable.
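This trap-and-emulate control flow can be pictured as a dispatch loop. The sketch below is purely illustrative: every name in it (struct vm, run_guest, emulate_privileged_insn, handle_interrupt) is hypothetical and stands in for whatever mechanism the CPU and VMM actually provide.

    /* Illustrative trap-and-emulate dispatch loop; all names hypothetical. */
    struct vm;                               /* opaque per-VM state */
    struct trap { int reason; unsigned long insn; };
    enum { TRAP_PRIVILEGED_INSN, TRAP_EXTERNAL_INTERRUPT };

    extern struct trap run_guest(struct vm *vm);   /* run until the CPU traps */
    extern void emulate_privileged_insn(struct vm *vm, unsigned long insn);
    extern void handle_interrupt(struct vm *vm);

    void vmm_dispatch_loop(struct vm *vm)
    {
        for (;;) {
            struct trap t = run_guest(vm);   /* execute guest until a trap */
            if (t.reason == TRAP_PRIVILEGED_INSN)
                emulate_privileged_insn(vm, t.insn);  /* perform it safely */
            else if (t.reason == TRAP_EXTERNAL_INTERRUPT)
                handle_interrupt(vm);        /* service it, then resume guest */
        }
    }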
However, not every CPU architecture provides such semantics, so other techniques appear: paravirtualization, fast binary translation, hardware-assisted virtualization, and full CPU emulation.

Memory virtualization

Each VM has its own page tables, which map guest addresses to what the guest believes is physical memory; the VMM maintains shadow page tables that map these onto the host physical memory of the real machine. While the traditional virtual memory mechanism in each guest OS maps virtual memory to guest physical memory, the VMM uses the shadow page tables to distribute pages of real memory to each VM as its "physical" memory. The VMM also performs extra work to raise performance, such as KSM.

KSM (Kernel Same-page Merging): each VM has its own pages; when two VMs hold identical pages, KSM merges the two pages into one, which is then shared by both VMs.
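On Linux, KSM only scans memory regions that userspace has explicitly marked as mergeable. A minimal sketch of how a userspace VMM might register a guest-memory region with KSM; the madvise(2) flag MADV_MERGEABLE is the real interface, while the region size here is an arbitrary choice for illustration:

    #include <stdlib.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 64 * 1024 * 1024;       /* hypothetical guest RAM size */
        void *guest_ram = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (guest_ram == MAP_FAILED)
            return 1;

        /* Ask the kernel's KSM daemon to scan this region and merge any
         * pages with identical contents into one shared copy-on-write
         * page. KSM itself is switched on via /sys/kernel/mm/ksm/run. */
        madvise(guest_ram, len, MADV_MERGEABLE);
        return 0;
    }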
IO virtualization

The guest OS sends IO operations to a virtual IO device, which may actually be backed by 1) a channel processor that manipulates the hardware directly, 2) drivers in the host OS that handle the operations on the VM's behalf, or 3) drivers inside the VMM.

1) Channel processor. This has the lowest overhead: the VMM trusts the channel processor and maps it into the VM, allowing the VM to read and write devices directly.
(Figure: VM → devices; the channel processor performs the operations.)

2) Drivers in the host OS. The VMM traps the IO operation and asks the host OS to perform it. This offers the best compatibility, since a general-purpose host OS has drivers for most IO devices, but the overhead is high.
(Figure: VM → VMM → host OS → devices.)

Xen has a special privileged VM called dom0 that manages the other VMs, called domU (create, destroy, migrate, save, restore), and controls the assignment of IO to devices; CPU scheduling and memory access remain under the hypervisor's control. The hypervisor forwards IO requests from a domU to dom0, which is privileged to contact the devices. Because the work is split between the VMM and dom0, which still contains an entire OS, the resulting hypervisor stack is heavy. A simplified sketch of this split-driver idea follows.
(Figure: VM → VMM → dom0 → devices.)
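The sketch below illustrates the frontend/backend split with a deliberately simplified shared-memory request ring. Everything in it (struct io_ring, struct io_req, the indices, the commented-out notify call) is invented for illustration; the real Xen interface uses grant tables, event channels, and a more elaborate ring protocol.

    /* Hypothetical single-producer/single-consumer request ring, loosely
     * modeled on Xen's split-driver idea. */
    #include <stdint.h>

    #define RING_SIZE 32

    struct io_req { uint64_t sector; uint32_t nbytes; uint8_t write; };

    struct io_ring {
        volatile uint32_t prod, cons;    /* free-running producer/consumer */
        struct io_req req[RING_SIZE];    /* lives in memory shared by both VMs */
    };

    /* Frontend (domU) queues a request, then would notify the backend. */
    int frontend_submit(struct io_ring *r, struct io_req rq)
    {
        if (r->prod - r->cons == RING_SIZE)
            return -1;                   /* ring full */
        r->req[r->prod % RING_SIZE] = rq;
        __sync_synchronize();            /* publish the request before the index */
        r->prod++;
        /* notify_backend();  -- hypothetical event-channel kick */
        return 0;
    }

    /* Backend (dom0) drains requests and drives the real device. */
    void backend_poll(struct io_ring *r)
    {
        while (r->cons != r->prod) {
            struct io_req rq = r->req[r->cons % RING_SIZE];
            /* ... submit rq to the physical device via the host driver ... */
            (void)rq;
            r->cons++;
        }
    }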
3) Drivers in the VMM. The VM sends IO operations down, where they are either trapped or directly translated and handed to the VMM, which then processes them against the device itself. This requires a heavy VMM containing all the needed drivers, and such a VMM has low security.
(Figure: VM → VMM → devices.)

IO performance is largely determined by the number of context switches: as the examples above show, the more components involved in handling an IO operation, the lower the performance. To win back that overhead and ease implementation, devices such as the x86 PC keyboard controller and the IDE disk controller are giving way to channel-like IO device designs; with these, the virtual machine's device drivers can communicate directly with the IO device without the overhead of trapping into the VMM.

Approaches when the CPU is not virtualizable

Binary translation
Since privileged operations cannot be trapped, the hypervisor scans the VM's memory and rewrites all the sensitive privileged instructions; VMware implements this technique. The guest OS is unaware of the translation, so no modification is needed; the implementation is complex, but it already performs far better than fully emulating the CPU.

Paravirtualization
The guest OS is modified (e.g., its driver layer) so that it contacts the hypervisor before starting any privileged operation; such a guest OS is aware of the hypervisor, which enables better scheduling and IO handling. Xen and KVM apply paravirtualization.

Hardware-assisted virtualization
Intel's VT-x and AMD's AMD-V extend the ISA and add a new CPU mode, distinct from user mode and kernel mode, called guest mode. When a process running in guest mode executes a privileged operation, the CPU traps the instruction and returns control to the VMM, instead of simply raising an interrupt and causing a fault.
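Whether a given x86 CPU provides these extensions can be checked from userspace with the CPUID instruction. A small example using GCC/Clang's <cpuid.h>; the bit positions are the architecturally defined ones (CPUID leaf 1, ECX bit 5 for Intel VMX; CPUID leaf 0x80000001, ECX bit 2 for AMD SVM):

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* CPUID leaf 1: ECX bit 5 is set when Intel VT-x (VMX) is available. */
        if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 5)))
            puts("Intel VT-x (VMX) supported");

        /* CPUID leaf 0x80000001: ECX bit 2 is set when AMD-V (SVM) is available. */
        if (__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 2)))
            puts("AMD-V (SVM) supported");

        return 0;
    }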
Kernel-based Virtual Machine (KVM)

IO virtualization in KVM
KVM applies hardware-assisted virtualization. A VM runs its non-IO code in guest mode; when the guest OS issues a privileged operation, the CPU traps it and switches into kernel mode, where KVM checks the exit: if it is an external event (a shadow page table fault or an external interrupt), the kernel resumes the guest code after handling it; if it is an IO operation, the kernel returns to user mode, and the user-mode side initiates the IO operation on behalf of the guest code.
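This three-way split between guest mode, kernel mode, and user mode is visible directly in KVM's userspace API. Below is a minimal, self-contained sketch in the spirit of the well-known /dev/kvm example programs: it creates a VM whose few bytes of guest code write one byte to port 0x3f8 and halt, and the user-mode loop emulates the IO when KVM_RUN exits with KVM_EXIT_IO. The ioctls are the real KVM API; the guest code bytes, the load address 0x1000, and the port number are arbitrary choices, and error handling is omitted for brevity.

    #include <fcntl.h>
    #include <linux/kvm.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>

    int main(void)
    {
        int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
        int vmfd = ioctl(kvm, KVM_CREATE_VM, 0UL);

        /* Guest code: mov $0x3f8,%dx; mov $'A',%al; out %al,(%dx); hlt */
        const uint8_t code[] = { 0xba, 0xf8, 0x03, 0xb0, 'A', 0xee, 0xf4 };

        /* One page of guest "physical" memory at guest address 0x1000. */
        void *mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        memcpy(mem, code, sizeof(code));
        struct kvm_userspace_memory_region region = {
            .slot = 0, .guest_phys_addr = 0x1000,
            .memory_size = 0x1000, .userspace_addr = (uint64_t)mem,
        };
        ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &region);

        int vcpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0UL);
        int mmap_size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, NULL);
        struct kvm_run *run = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, vcpufd, 0);

        /* Start in real mode with a flat code segment, %rip at our code. */
        struct kvm_sregs sregs;
        ioctl(vcpufd, KVM_GET_SREGS, &sregs);
        sregs.cs.base = 0; sregs.cs.selector = 0;
        ioctl(vcpufd, KVM_SET_SREGS, &sregs);
        struct kvm_regs regs = { .rip = 0x1000, .rflags = 0x2 };
        ioctl(vcpufd, KVM_SET_REGS, &regs);

        for (;;) {
            ioctl(vcpufd, KVM_RUN, NULL);      /* enter guest mode */
            switch (run->exit_reason) {
            case KVM_EXIT_IO:                  /* trapped OUT: emulate in user mode */
                if (run->io.direction == KVM_EXIT_IO_OUT && run->io.port == 0x3f8)
                    putchar(*((char *)run + run->io.data_offset));
                break;
            case KVM_EXIT_HLT:                 /* guest executed hlt: done */
                return 0;
            }
        }
    }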
CPU virtualization in KVM
A hypervisor requires many components beyond the ability to virtualize the CPU and memory, for example: a memory manager, a process scheduler, an IO stack, device drivers, a security manager, a network stack, and so on. In fact, a hypervisor is really a specialized operating system, differing from its general-purpose peers only in that it runs virtual machines rather than applications. Since the Linux kernel already includes the core features a hypervisor requires, and has been hardened into a mature and stable enterprise platform by over 15 years of support and development, it is more efficient to build on that base than to write all the required components, such as a memory manager and scheduler, from the ground up. Where the traditional approach includes all the necessary components (OS, drivers) in the VMM, or provides an environment or agent to help the VMM (as in the Xen figure above, where a special VM helps the VMM handle IO operations from the other VMs, an approach applied by VMware and Xen), KVM chooses another way to do the job.
Not wanting to reinvent the wheel, and rejecting the idea of including every necessary component in the VMM, KVM implements the VMM as a loadable module of the Linux kernel, which is already very good at managing memory, scheduling processes, and processing IO. KVM also views each VM as a process, so that all VMM-level work is delegated to the Linux OS: every virtual machine is a regular Linux process scheduled by the standard Linux scheduler, and its memory is allocated by the Linux memory allocator, with its knowledge of NUMA and its integration with the scheduler.
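One practical consequence: because a VM (and each of its vCPUs) is just an ordinary Linux process or thread, the standard process-control APIs apply to it unchanged. A small sketch, assuming it is called from within a vCPU thread; the CPU number and nice value are arbitrary:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <sys/resource.h>

    /* Pin the calling (vCPU) thread to CPU 0 and lower its priority,
     * using the same APIs that work for any Linux process or thread. */
    void tune_vcpu_thread(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(0, &set);
        sched_setaffinity(0, sizeof(set), &set);  /* 0 = calling thread */
        setpriority(PRIO_PROCESS, 0, 10);         /* renice like any process */
    }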
References
1. Virtual Machine Monitors: Current Technology and Future Trends, by Mendel Rosenblum and Tal Garfinkel
2. The Architecture of Virtual Machines, by James E. Smith and Ravi Nair
3. I/O for Virtual Machine Monitors: Security and Performance Issues, by Paul A. Karger and David R. Safford
4. KVM: the Linux Virtual Machine Monitor, by Avi Kivity, Yaniv Kamay, Dor Laor, Uri Lublin, and Anthony Liguori
5. KVM: Kernel Based Virtual Machine, by Red Hat
6. A Survey on Virtual Machine Security, by Jenni Susan Reuben
7. KVM: Kernel-based Virtualization Driver, white paper by Qumranet
8. Xen and the Art of Virtualization, by Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield