KVM in OpenStack
Dexin (Mark) Wu
Agenda
Overview
CPU
Memory
Storage
Network
Architecture Overview
[Diagram: nova-api receives REST API calls; nova-scheduler, nova-conductor, and nova-compute communicate via the DB and RPC calls; nova-compute's libvirt driver talks to libvirt, which controls Qemu/KVM through the QMP monitor; Cinder attaches storage through the qemu disk driver; Neutron connects the guest's tap device to the network switch/router]
Agenda: CPU
KVM/Qemu Model
Each VM is a qemu process
Each vcpu is a qemu thread
Reuses kernel facilities
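This model is easy to verify from the host; a minimal sketch (process names and output format vary by distro):

  # each VM is one qemu process; each vcpu is a thread (LWP) of that process
  ps -eLf | grep qemu
  # or inspect the thread tree of one guest (hypothetical PID 12345)
  pstree -p 12345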
Cgroup
Weight: quota:cpu_shares (relative share, no hard limit)
Bandwidth control: quota:cpu_period, quota:cpu_quota (a vcpu can't run more than 'quota' µs in each period)
Example hierarchy (cpu_shares and resulting CPU %):
  Root (100%)
    Gold (3072 shares, 60%): A (1024, 30%), B (1024, 30%)
    Silver (2048 shares, 40%): C (1024, 20%), D (1024, 20%)
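A minimal sketch of setting these knobs through flavor extra specs (the flavor names here are hypothetical; the quota:* keys are the ones listed above):

  # relative weights, matching the Gold/Silver hierarchy above
  nova flavor-key gold.flavor set quota:cpu_shares=3072
  nova flavor-key silver.flavor set quota:cpu_shares=2048
  # hard cap: at most 40 ms of CPU in every 100 ms period (values in µs)
  nova flavor-key capped.flavor set quota:cpu_period=100000 quota:cpu_quota=40000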
CPU topology
[Diagram: a two-socket host; each socket has four cores, each core with private L1 I/D and L2 caches and two hardware threads (cpu0-cpu15 overall), plus an L3 cache shared per socket; each socket with its local memory forms a NUMA node (node 0 and node 1), and memory on the other socket is remote]
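The host side of this picture can be inspected with standard tools; a quick sketch (output depends on the machine):

  numactl --hardware   # NUMA nodes, their cpus and local memory
  lscpu                # sockets, cores per socket, threads per core, cache sizes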
vcpu topology
Benefit
  Remove licensing restrictions
  Improve performance by working with vcpu pinning
Implemented in Juno
* hw:cpu_sockets=nn - preferred number of sockets to expose to the guest
* hw:cpu_cores=nn - preferred number of cores to expose to the guest
* hw:cpu_threads=nn - preferred number of threads to expose to the guest
* hw:cpu_max_sockets=nn - maximum number of sockets to expose to the guest
* hw:cpu_max_cores=nn - maximum number of cores to expose to the guest
* hw:cpu_max_threads=nn - maximum number of threads to expose to the guest
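A minimal sketch of using these keys (hypothetical flavor name; the guest then sees 1 socket x 4 cores x 2 threads):

  nova flavor-key m1.large set hw:cpu_sockets=1 hw:cpu_cores=4 hw:cpu_threads=2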
vnuma
Benefit
  Increase the effective utilization of compute resources
Implemented in Juno: virt-driver-numa-placement.rst
* hw:numa_nodes=nn - number of NUMA nodes to expose to the guest
* hw:numa_mempolicy=preferred|strict - memory allocation policy
* hw:numa_cpus.0=<cpu-list> - mapping of vcpus N-M to NUMA node 0
* hw:numa_cpus.1=<cpu-list> - mapping of vcpus N-M to NUMA node 1
* hw:numa_mem.0=<ram-size> - mapping N GB of RAM to NUMA node 0
* hw:numa_mem.1=<ram-size> - mapping N GB of RAM to NUMA node 1
Qemu and libvirt dependencies:
  -object memory-backend-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node0 \
  -numa node,nodeid=0,cpus=0,memdev=ram-node0
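A minimal sketch of a two-node guest using the keys above (hypothetical flavor name; note that Nova's implementation interprets hw:numa_mem.N in MB):

  nova flavor-key numa.flavor set hw:numa_nodes=2 \
      hw:numa_cpus.0=0,1 hw:numa_cpus.1=2,3 \
      hw:numa_mem.0=2048 hw:numa_mem.1=2048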
Other Features
vcpu pinning
  Approved in Kilo: virt-driver-cpu-pinning.rst
  Dedicated CPUs: forbid overcommit of CPU
vcpu hotplug
  'live-resize' proposed, but not approved yet
  virsh command: virsh setvcpus <domain> <count> --live
  Auto-online the new vcpu in the guest: udev rule or guest agent
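For vcpu pinning, the Kilo spec exposes a policy key; a minimal sketch (hypothetical flavor name):

  # dedicated host cores per vcpu, i.e. no CPU overcommit for this flavor
  nova flavor-key pinned.flavor set hw:cpu_policy=dedicated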
Agenda: Memory
Physical memory virtualization
Guest physical memory is mapped into qemu's virtual address space
The mapping is maintained in memory slots
Qemu uses malloc or mmap to allocate the memory
Reuses kernel memory features: overcommit, hugepages, KSM
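The resulting layout can be inspected at runtime; a quick sketch (<qemu-pid> is a placeholder):

  # guest RAM is ordinary anonymous memory of the qemu process
  grep VmRSS /proc/<qemu-pid>/status
  # memory slots and address spaces, from the qemu monitor
  (qemu) info mtree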
Memory Hugepage
Approved in Kilo: virt-driver-large-pages.rst
Benefit
  Increase TLB hit ratio
  Smaller page table footprint
Why not THP? No hard guarantees
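A minimal sketch of wiring this up (the flavor key comes from the Kilo spec; the host-side step shown is runtime-only):

  # reserve 1024 x 2 MB huge pages on the host
  echo 1024 > /proc/sys/vm/nr_hugepages
  # ask Nova to back the guest with huge pages (hypothetical flavor name)
  nova flavor-key hugepage.flavor set hw:mem_page_size=large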
Memory Balloon (1)
Memory overcommit: inflate the balloon in one guest to reclaim pages for others, deflate to return them
Balloon device is added by default
Missing: an overcommit manager
[Diagram: three guests, each with a qemu balloon that inflates/deflates]
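Without an overcommit manager, inflate/deflate can still be driven by hand from the qemu monitor; a quick sketch:

  # shrink the guest to 512 MB (inflates the balloon), then check
  (qemu) balloon 512
  (qemu) info balloon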
Memory Balloon (2)
Guest memory stats query
  More detailed and accurate
  Driven by polling instead of asynchronous queries, so not real time
Nova support available in Juno: CONF.libvirt.mem_stats_period_seconds
Ceilometer support available in Kilo
[Diagram: a polling thread in qemu periodically queries the guest balloon driver for memory stats; clients synchronously fetch the last update]
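A minimal sketch of the Nova side in nova.conf on the compute node (10 is the default period; 0 disables polling):

  [libvirt]
  mem_stats_period_seconds = 10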
Memory Hotplug
Added in qemu 2.1
Libvirt support is under development
Qemu commands
  (qemu) object_add memory-backend-ram,id=ram1,size=1G
  (qemu) device_add pc-dimm,id=d1,memdev=ram1
Auto-online via udev
  SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}=="0", ATTR{online}="1"
  SUBSYSTEM=="memory", ACTION=="add", TEST=="state", ATTR{state}=="offline", ATTR{state}="online"
Agenda: Storage
Storage Architecture
Frontend: IDE, SCSI, Virtio
Image format: Raw, Qcow2, QED, VMDK
Backend: file, host device, ceph, glusterfs, sheepdog, iscsi
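A minimal sketch of one combination, file backend + qcow2 format + virtio frontend (paths illustrative):

  qemu-img create -f qcow2 disk.qcow2 20G
  qemu-system-x86_64 -drive file=disk.qcow2,if=virtio,format=qcow2 ...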
Cache Mode

  Cache Mode     Host Page Cache   Guest Disk Cache         Semantics
  none           No                Yes                      direct
  directsync     No                No                       direct + flush
  writeback      Yes               Yes                      writeback
  writethrough   Yes               No                       writeback + flush
  unsafe         Yes               Yes, but flush ignored   writeback - flush

direct = O_DIRECT; flush = fdatasync or fsync
Configuration: disk_cachemodes = file=directsync,block=none
Is writeback safe?
  Data lost on power failure vs. data corruption
  Guest FS barriers
  Live migration
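Each mode ends up as the cache= property of the qemu drive; a quick sketch:

  qemu-system-x86_64 -drive file=disk.qcow2,if=virtio,cache=none ...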
I/O throttling
Why not cgroup?
Exposed via Cinder QoS specs
Currently missing online update support
Newer qemu re-implements throttling with a leaky-bucket algorithm and supports bursts
Missing cluster-level I/O throttling
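A minimal sketch of the Cinder side (names and limits are hypothetical; consumer=front-end makes qemu enforce the limits):

  cinder qos-create iops-limit consumer=front-end read_iops_sec=1000 write_iops_sec=500
  cinder qos-associate <qos-spec-id> <volume-type-id>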
Discard
Return freed blocks to the storage
Two underlying specifications
  ATA TRIM command
  SCSI UNMAP command
Nova configuration: hw_disk_discard=unmap
Image metadata: hw_scsi_model=virtio-scsi
Issued from the guest: fstrim, or mount option '-o discard'
Supported in: file, qcow2, rbd, glusterfs, sheepdog, iscsi
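A minimal sketch of the full path (hw_disk_bus=scsi is an assumption beyond the slide, so the disk sits on the virtio-scsi controller):

  # image side: give the guest a SCSI disk whose controller supports UNMAP
  glance image-update --property hw_scsi_model=virtio-scsi \
                      --property hw_disk_bus=scsi <image-id>
  # guest side: release blocks freed by the filesystem
  fstrim -v /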
Virtio SCSI
vhba
  Improves scalability
  Enables advanced SCSI features
  Disks recognized as 'sda', not 'vda'
vhost-scsi
  Better performance
  No format driver support
  Disallows live migration
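A minimal sketch of the corresponding qemu command line (illustrative image path):

  qemu-system-x86_64 \
    -device virtio-scsi-pci,id=scsi0 \
    -drive file=disk.qcow2,if=none,format=qcow2,id=drive0 \
    -device scsi-hd,drive=drive0,bus=scsi0.0 ...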
Other features
Snapshot: quiesced-image-snapshots-with-qemu-guestagent.rst
drive-mirror: storage live migration
Multi-queue virtio-disk
Agenda: Network
Network
vhost-net
  Fewer context switches
  Zero-copy transmit
vhost-net + macvtap + SR-IOV
  Live migration
Multi-queue virtio NIC
  Scales performance as the vcpu count increases
vhost-user
  Approved in Kilo
  Userspace equivalent of vhost-net
  Used with a userspace switch
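A minimal sketch of a multi-queue virtio NIC backed by vhost-net (4 queues; vectors = 2 * queues + 2):

  qemu-system-x86_64 \
    -netdev tap,id=net0,vhost=on,queues=4 \
    -device virtio-net-pci,netdev=net0,mq=on,vectors=10 ...
  # inside the guest, enable the extra queues (hypothetical NIC name)
  ethtool -L eth0 combined 4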
Reference
http://www.slideshare.net/meituan/kvmopt-osforce-27669119
http://www.linux-kvm.org/wiki/images/7/7b/kvm-forum-2013-openstack.pdf
http://www.linux-kvm.org/wiki/images/f/f6/01x07a-vhost.pdf
http://www.virtualizemydc.ca/2014/01/26/understanding-vnuma-virtual-non-uniform-memory-access/
http://www.searchtb.com/2012/12/%e7%8e%a9%e8%bd%accpu-topology.html
http://www.virtualopensystems.com/en/solutions/guides/snabbswitch-qemu/
http://log.amitshah.net/wp-content/uploads/2014/11/virt-6-7-centos-dojo.pdf
Thanks! Email: wudx@awcloud.com