LINUX-KVM
The need for KVM
- x86 was originally unfriendly to virtualization
  - No hardware provisions for virtualization
  - Some instructions behave differently depending on the privilege context (e.g. popf)
  - Performance suffered under trap-and-emulate
  - The CISC nature of the instruction set complicates instruction replacement
- Early approaches to x86 virtualization
  - Binary translation (e.g. VMware)
    - Executes substitute code in place of privileged guest code
    - May require substantial replacements to preserve the illusion
  - CPU paravirtualization (e.g. Xen)
    - Needs modifications in the guest
    - Hypervisor provides replacement services (hypercalls)
    - Raises the abstraction level for better performance
What is KVM?
- Introduced to make the x86 hardware virtualization extensions (Intel VT or AMD-V) available to user space
- Uses Linux as a bare-metal hypervisor
- Open source
- kvm.ko: a loadable kernel module that provides the core virtualization infrastructure; part of mainline Linux
- Modern hypervisors must do many things that operating systems already do
  - Scheduler, memory management, I/O stacks
- KVM patch
  - Driver for the hardware virtualization extensions to x86
  - The driver adds a device file, /dev/kvm, which exposes virtualization functions to userspace
- Each VM is a process on the host; a vcpu is a thread in that process
  - All commands that work on a typical process can be used on it
KVM+QEMU
- QEMU: open source machine emulator and virtualizer
  - Used with accelerators in the form of hypervisors such as KVM or Xen
- QEMU without virtualization extensions
  - Runs entirely in user space using its built-in binary translator (Tiny Code Generator)
  - More overhead than using the CPU virtualization extensions
  - Inefficient and slow
General KVM Arch.
- VMs are created by opening a device node: open("/dev/kvm")
- Guest has its own memory, separate from the userspace process that created it
- KVM API: a set of ioctl()s used to create and control VMs through file descriptors
  - System ioctls
    - Query and set global attributes of the KVM system
    - Create VMs (KVM_CREATE_VM)
  - VM ioctls
    - Query and set attributes of a particular VM
    - Create vcpus for a VM (KVM_CREATE_VCPU)
  - vcpu ioctls
    - Query and set attributes that control a single vcpu (e.g. KVM_GET_REGS reads the general-purpose registers of a vcpu)
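The three ioctl levels above can be walked in a few lines. The following is a minimal sketch, not a full VMM: it assumes an x86-64 Linux host, hard-codes the request numbers from <linux/kvm.h> for that architecture, and sets up no guest memory. The function name probe_kvm and the returned dictionary are illustrative choices, not part of the KVM API.

```python
import fcntl
import os
import struct

# ioctl request numbers from <linux/kvm.h>, spelled out for x86-64 Linux.
KVMIO = 0xAE
KVM_GET_API_VERSION = (KVMIO << 8) | 0x00  # system ioctl
KVM_CREATE_VM = (KVMIO << 8) | 0x01        # system ioctl, returns a VM fd
KVM_CREATE_VCPU = (KVMIO << 8) | 0x41      # VM ioctl, returns a vcpu fd
KVM_GET_REGS = 0x8090AE81                  # vcpu ioctl: _IOR(KVMIO, 0x81, struct kvm_regs)
KVM_REGS_SIZE = 18 * 8                     # x86-64 kvm_regs: 16 GPRs + rip + rflags


def probe_kvm(path="/dev/kvm"):
    """Open the device, create one VM and one vcpu, read the vcpu registers.

    Returns None if the device cannot be opened, otherwise a dict holding
    whatever was gathered before the first failing ioctl (hypothetical helper).
    """
    try:
        kvm_fd = os.open(path, os.O_RDWR | os.O_CLOEXEC)
    except OSError:
        return None

    fds = [kvm_fd]
    info = {}
    try:
        # System level: query a global attribute, then create a VM.
        info["api_version"] = fcntl.ioctl(kvm_fd, KVM_GET_API_VERSION, 0)
        vm_fd = fcntl.ioctl(kvm_fd, KVM_CREATE_VM, 0)
        fds.append(vm_fd)

        # VM level: create vcpu number 0 for that VM.
        vcpu_fd = fcntl.ioctl(vm_fd, KVM_CREATE_VCPU, 0)
        fds.append(vcpu_fd)

        # vcpu level: read the general-purpose registers into a buffer.
        regs = bytearray(KVM_REGS_SIZE)
        fcntl.ioctl(vcpu_fd, KVM_GET_REGS, regs)
        values = struct.unpack("18Q", regs)
        info["rip"] = values[16]     # field order: rax..r15, rip, rflags
        info["rflags"] = values[17]
    except OSError:
        pass  # keep the partial result
    finally:
        for fd in reversed(fds):
            os.close(fd)
    return info


if __name__ == "__main__":
    print(probe_kvm())
```

Each returned file descriptor is the handle for the next level, which is why the VM ioctl is issued on vm_fd and the vcpu ioctl on vcpu_fd rather than on the /dev/kvm descriptor.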
Guest Execution Loop
- KVM runs guest code in the guest (non-root) execution mode added by Intel VT and AMD-V; the guest keeps its own set of privilege rings inside that mode
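The userspace half of the guest execution loop can be sketched as a dispatch on exit reasons. This is a simulation only: the list passed to run_guest is a hypothetical stand-in for the exit reason a real VMM would read after each KVM_RUN ioctl returns, and the handler bookkeeping is illustrative. The three exit-reason codes are the values defined in <linux/kvm.h>.

```python
# Exit-reason codes as defined in <linux/kvm.h>.
KVM_EXIT_IO = 2
KVM_EXIT_HLT = 5
KVM_EXIT_MMIO = 6


def run_guest(exit_reasons):
    """Simulate the userspace dispatch loop around KVM_RUN.

    exit_reasons is a hypothetical sequence standing in for the reason
    reported each time the guest leaves guest mode and control returns
    to userspace. Returns a count of how many exits of each kind were
    handled before the guest halted.
    """
    handled = {"io": 0, "mmio": 0, "hlt": 0}
    for reason in exit_reasons:        # one iteration per KVM_RUN return
        if reason == KVM_EXIT_IO:
            handled["io"] += 1         # emulate the port I/O, then re-enter
        elif reason == KVM_EXIT_MMIO:
            handled["mmio"] += 1       # emulate the memory-mapped access
        elif reason == KVM_EXIT_HLT:
            handled["hlt"] += 1        # guest executed hlt: stop the loop
            break
        else:
            raise ValueError(f"unhandled exit reason {reason}")
    return handled


if __name__ == "__main__":
    print(run_guest([KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_IO, KVM_EXIT_HLT]))
```

Exits that the kernel module can resolve on its own never reach this loop; only those that need device emulation or a decision from the VMM come back to userspace.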
Virtualizing the MMU
- Two levels of indirection are required: gva -> gpa -> hpa (the MMU alone can handle one)
- Shadow page tables (gva -> hpa)
  - No extra hardware support needed
  - Start empty; built incrementally as faults are reported to the host
  - Guest page tables and shadow page tables must be kept consistent, which adds overhead
    - KVM write-protects the guest memory pages that it shadows
  - Memory overhead from shadow copies of the guest page tables
- EPT/NPT hardware support
  - An EPT/NPT-enabled MMU can translate both levels of indirection
  - gva -> gpa is maintained by the guest and gpa -> hpa by KVM; both the guest page tables and the nested page tables are exposed to the hardware
  - Eliminates the need to maintain shadow page tables and keep them synchronized
  - Guest page table modifications need not be trapped, so VM exits are reduced
  - A TLB miss is very costly: for an m-level EPT and an n-level guest page table, the 2D walk takes mn + m + n page references
    - Can be reduced by using large page sizes
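The mn + m + n figure can be checked by counting references step by step. The sketch below assumes the usual 2D walk with no paging-structure caches: every guest page-table pointer is a guest-physical address that must itself be translated through the EPT before it can be read. The function name nested_walk_refs is illustrative.

```python
def nested_walk_refs(guest_levels, ept_levels):
    """Count memory references for one TLB miss under nested paging.

    guest_levels is n (levels in the guest page table) and ept_levels is m
    (levels in the EPT/NPT). Assumes no caching of intermediate entries.
    """
    refs = 0
    for _ in range(guest_levels):
        refs += ept_levels  # translate the gpa of this guest table via the EPT
        refs += 1           # read the guest page-table entry itself
    refs += ept_levels      # translate the final gpa of the data page
    return refs


if __name__ == "__main__":
    print(nested_walk_refs(4, 4))  # 4 KiB pages at both levels -> 24
    print(nested_walk_refs(3, 3))  # one level fewer on each side -> 15
```

With four levels on each side the walk costs 24 references, against 4 for a native walk. Large pages drop a level from a table, so using them in both the guest and the EPT brings the count down to 15.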
Comparison paging
- Kernbench (CPU throughput)
- Sources:
  - http://www.linux-kvm.org/images/c/c8/kvmforum2008%24kdf2008_21.pdf
  - http://www.vmware.com/pdf/perf_esx_intel-ept-eval.pdf
Virtualization I/O
- Full device emulation
  - No changes to the guest required
  - Complex and inefficient
Effect of full device emulation
- Source: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5708625
Virtualization I/O
- Para-virtualized devices
  - VirtIO
  - Requires special guest drivers
  - I/O emulation is pushed into the kernel instead of being done through system calls from QEMU
- Direct I/O (pass-through)
  - A device is assigned entirely to the guest
  - Near-native speeds
  - VM migration is difficult
  - Address translation is required: an IOMMU validates DMA requests from the device
  - SR-IOV
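The role of the IOMMU in pass-through can be illustrated with a toy model. This is a conceptual sketch, not how any real IOMMU is programmed: the ToyIOMMU class, its method names, and the device name "nic0" are all hypothetical. It shows only the two jobs named above, translating the address a device uses for DMA and rejecting requests that fall outside what was mapped for that device.

```python
PAGE_SIZE = 4096


class ToyIOMMU:
    """Per-device page mapping from device-visible addresses to host addresses."""

    def __init__(self):
        # (device id, device-visible page number) -> host page number
        self._maps = {}

    def map_page(self, device_id, io_addr, host_addr):
        """Allow device_id to reach one host page through io_addr."""
        if io_addr % PAGE_SIZE or host_addr % PAGE_SIZE:
            raise ValueError("addresses must be page aligned")
        self._maps[(device_id, io_addr // PAGE_SIZE)] = host_addr // PAGE_SIZE

    def translate(self, device_id, io_addr):
        """Validate and translate one DMA address issued by device_id."""
        key = (device_id, io_addr // PAGE_SIZE)
        if key not in self._maps:
            # The device touched memory it was never given: block the DMA.
            raise PermissionError(f"DMA fault: {device_id} at {io_addr:#x}")
        return self._maps[key] * PAGE_SIZE + io_addr % PAGE_SIZE


if __name__ == "__main__":
    iommu = ToyIOMMU()
    iommu.map_page("nic0", 0x1000, 0x7F000)
    print(hex(iommu.translate("nic0", 0x1234)))  # 0x7f234
```

Because the mapping is keyed by device, a guest that owns one device cannot use it to read or write host memory belonging to another guest.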
Comparison I/O
Managing KVM - Libvirt
- Guests are monitored and managed through the libvirt API
- Each host runs the libvirt daemon, which provides secure remote management APIs
- The libvirt daemon maintains guest configurations across reboots and is the central point for setting up networking and storage pools
- virsh: command line interface
- virt-manager: graphical tool
- Cloud stacks, which also integrate with libvirt, can be used for data center and cloud management
- Enables cloning, migration, and overcommitting
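The guest configurations that libvirt keeps are XML domain definitions. As a rough sketch of what one looks like, the snippet below builds a deliberately incomplete definition with the Python standard library. The helper name minimal_domain_xml and the guest name "demo-guest" are hypothetical, and a usable guest would also need disk, network, and other device elements that are omitted here.

```python
import xml.etree.ElementTree as ET


def minimal_domain_xml(name, memory_mib, vcpus):
    """Build a skeletal libvirt domain definition for a KVM guest."""
    domain = ET.Element("domain", {"type": "kvm"})
    ET.SubElement(domain, "name").text = name
    ET.SubElement(domain, "memory", {"unit": "MiB"}).text = str(memory_mib)
    ET.SubElement(domain, "vcpu").text = str(vcpus)
    os_elem = ET.SubElement(domain, "os")
    # "hvm" requests full hardware virtualization for the guest.
    ET.SubElement(os_elem, "type", {"arch": "x86_64"}).text = "hvm"
    return ET.tostring(domain, encoding="unicode")


if __name__ == "__main__":
    print(minimal_domain_xml("demo-guest", 1024, 2))
```

A complete definition saved to a file is what a command such as virsh define takes as input.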
THANK YOU!