Changpeng Liu, Cloud Software Engineer
Piotr Pelpliński, Cloud Software Engineer
Agenda
- Introduction to VirtIO and vhost
- SPDK vhost architecture
- Use cases for vhost
- Benchmarks
- Next steps
QEMU VirtIO
VirtIO
A guest VM (Linux*, Windows*, FreeBSD*, etc.) runs virtio front-end drivers; the hypervisor (e.g. QEMU/KVM) provides the matching virtio back-end drivers and device emulation, with virtqueues carrying requests between the two.
- Paravirtualized driver specification
- Common mechanisms and layouts for device discovery, I/O queues, etc.
- virtio device types include: virtio-net, virtio-blk, virtio-scsi, virtio-gpu, virtio-rng, virtio-crypto
QEMU VirtIO SCSI: I/O Processing
The guest application submits I/O through the guest kernel; QEMU emulates the device and issues the I/O to the host kernel via AIO.
1. Guest adds the I/O to the virtqueue
2. I/O is processed by QEMU
3. I/O is issued to the host kernel
4. Host kernel pins memory
5. Device executes the I/O
6. Guest receives a completion interrupt
vhost (kernel)
vhost
- Separate process for I/O processing
- vhost protocol for communicating guest VM parameters:
  - memory regions
  - number of virtqueues
  - virtqueue locations
The guest VM (Linux*, Windows*, FreeBSD*, etc.) keeps its virtio front-end drivers unchanged; the virtio back-end drivers and device emulation move out of the hypervisor (e.g. QEMU/KVM) into a vhost target, which can live in the kernel or in userspace.
Kernel vhost: I/O Processing
1. Guest adds the I/O to the virtqueue
2. Guest writes the virtio doorbell
3. KVM wakes the vhost kernel thread
4. Host kernel pins memory
5. Device executes the I/O via AIO
6. Guest receives a completion interrupt
vhost (userspace)
SPDK vhost Architecture
QEMU's virtio-scsi front end connects to the SPDK vhost target (built on DPDK vhost) over a UNIX domain socket, with eventfd used for doorbells and interrupts. The virtqueues live in guest VM memory that is shared with the host process, so SPDK can read and write them directly.
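Wiring this up typically takes two steps: start the SPDK vhost target with a controller exposed on a socket, then point QEMU's vhost-user device at that socket. A sketch of the 2017-era commands (the binary path, RPC method names, and the `vhost.0` controller name are assumptions that vary across SPDK versions, so check the release you use):

```shell
# Host: start the SPDK vhost target and expose one malloc-backed SCSI LUN.
./app/vhost/vhost -S /var/tmp -m 0x3 &
./scripts/rpc.py construct_malloc_bdev 64 512           # 64 MB RAM disk, 512 B blocks
./scripts/rpc.py construct_vhost_scsi_controller vhost.0
./scripts/rpc.py add_vhost_scsi_lun vhost.0 0 Malloc0

# Guest: back guest RAM with shared hugepages (so SPDK can map it),
# then attach a vhost-user-scsi device to the SPDK socket.
qemu-system-x86_64 -enable-kvm -m 4G \
  -object memory-backend-file,id=mem,size=4G,mem-path=/dev/hugepages,share=on \
  -numa node,memdev=mem \
  -chardev socket,id=spdk_vhost0,path=/var/tmp/vhost.0 \
  -device vhost-user-scsi-pci,id=scsi0,chardev=spdk_vhost0 \
  -drive file=guest.img,format=qcow2
```

The `share=on` memory backend is the load-bearing option: without shared guest memory, the SPDK process cannot see the virtqueues at all.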
SPDK vhost: I/O Processing
1. Guest adds the I/O to the virtqueue
2. SPDK vhost polls the virtqueue and picks up the I/O
3. Device executes the I/O
4. Guest receives a completion interrupt
No doorbell write, kernel wakeup, or QEMU involvement is needed on the I/O path: SPDK discovers new requests by polling shared memory.
SPDK vhost Layers
- JSON RPC for configuration
- vhost layer (DPDK rte_vhost), connected to QEMU
- SCSI-to-bdev translation layer
- Block Device Layer (bdev), with pluggable drivers: NVMe, malloc (RAM disk), Ceph RBD, Linux AIO
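The JSON RPC layer drives configuration at runtime using standard JSON-RPC 2.0 framing. As an illustration, a request creating a malloc (RAM disk) bdev might look like the following; the method and parameter names are assumptions from that era's RPC surface and have since been renamed (the present-day equivalent is `bdev_malloc_create`):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "construct_malloc_bdev",
  "params": { "total_size": 64, "block_size": 512 }
}
```

In practice these requests are issued through the `rpc.py` helper script rather than written by hand.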
Comparison with Existing Solutions
- QEMU virtio-scsi target: the guest kernel's virtio_scsi driver talks to a virtio-scsi-pci device emulated inside QEMU, which submits I/O to the host kernel NVMe driver.
- vhost kernel target: the guest driver talks to a vhost-scsi-pci device; QEMU configures the host kernel's LIO vhost target over ioctls, and I/O flows through the host kernel NVMe driver.
- vhost userspace target: the guest driver talks to a vhost-user-scsi-pci device; QEMU configures SPDK vhost over a UNIX domain socket, and I/O is handled entirely in userspace by the vhost-user, SCSI, and NVMe polled-mode driver layers.
Use Case: VM Ephemeral Storage
Increased efficiency yields greater VM density.
I/O path: VM → SPDK vhost (SCSI) → block device abstraction layer (BDAL) → Blobstore blob bdev → NVMe bdev → SPDK NVMe driver → Intel SSD for Datacenter.
Use Case: VM Remote Storage
Enables disaggregation and migration of VMs using remote storage.
I/O path: VM → SPDK vhost (SCSI) → block device abstraction layer (BDAL) → NVMe-oF bdev → NVMe-oF initiator → NVMe-oF target → Intel SSD for Datacenter.
Use Case: VM Ceph Storage
Potential for innovation in data services such as caching and deduplication.
I/O path: VM → SPDK vhost (SCSI) → data services (cache, deduplication) → block device abstraction layer (BDAL) → Ceph bdev → Ceph RBD driver → Ceph cluster backed by Intel SSD for Datacenter.
Benchmarks
[Charts: cores used for VM I/O processing, and I/O per second, comparing QEMU VirtIO, kernel vhost, and SPDK vhost]
System configuration: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20 GHz, 22 cores per socket, 44 cores total, HT off; 8x Samsung 8 GB DDR4-2400; 12x Intel SSD DC P3700 Series 1.5 TB, FW 8DV101H0; DPDK 17.02; host: Fedora 25, kernel 4.8.15-300; guest: Ubuntu 16.04, kernel 4.4.0-59-generic, mq enabled; fio 2.2.10.
fio workload: blocksize=4k, iodepth=512, iodepth_batch=128, iodepth_low=256, ioengine=libaio, size=10g, ramp_time=10, group_reporting, thread, numjobs=1, direct=1, rw=randread
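For reproduction, the fio options listed above correspond to a job file along these lines (reconstructed from the listed parameters; the job name and the target device path are placeholders that must be filled in for the system under test):

```ini
[global]
ioengine=libaio
direct=1
thread=1
group_reporting=1
ramp_time=10
size=10g

[randread]
filename=/dev/sdX   ; placeholder: virtio-scsi disk inside the guest
rw=randread
blocksize=4k
iodepth=512
iodepth_batch=128
iodepth_low=256
numjobs=1
```

Running the same job inside the guest against each of the three back ends is what produces the cores-used and IOPS comparisons charted above.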
Next Steps
- VFIO support
- Support for the vhost-blk protocol
- Live migration
- Performance tuning, including:
  - multiqueue
  - completion event coalescing
- Integration software
Summary
- Big improvement in VM I/O efficiency
- Works with unmodified guest VMs
- Enables adding data services, and with them added value
Notices and Disclaimers
Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Learn more at intel.com, or from the OEM or retailer. No computer system can be absolutely secure.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance.
Intel, the Intel logo, Xeon, and others are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. © 2017 Intel Corporation.