Software and hardware support for Network Virtualization, part 2
Knut Omang, Ifi/Oracle
20 Oct, 2015

Overview
- Introduction to virtualization (virtual machines)
- Aspects of network virtualization:
  - Virtual network infrastructure, interfaces, adapters
  - Network interface attach points (PCI, PCIe)
  - Software emulation of a network interface
  - Paravirtualized network interfaces
  - Hardware support for sharing a network adapter (SR/IOV)
- Use cases, challenges, risks and tradeoffs

PCI (Peripheral Component Interconnect)
- DMA (Direct Memory Access) support for devices
- New, more compact physical design
- Standardized, extensible software interface!
- 3 address space types:
  - Config space
  - I/O ports (ISA compat++)
  - Memory mapped I/O (MMIO)
- Config space has standardized layout, standardized semantics

DMA (Direct Memory Access)
- DMA engines in the PCI infrastructure
- DMA engines on each device
  - Typically programmed via registers accessible from BAR space
- Uses DMA addresses
- Operations: read, write, atomic (PCIe 3.0)
- In simple systems (x86, older x86_64): DMA addr == physical addr
- Modern x86_64: IOMMU/DMAR (later)
- Can in principle write almost everywhere in memory!
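
Because the config space layout is standardized, generic code can identify any device without knowing its driver. The following is a minimal sketch, assuming the first 64 bytes of a type 0 config header are available as raw bytes (on Linux they can be read from the `config` file of a device under /sys/bus/pci/devices/). The function name `parse_config_header` and all sample values are made up for illustration.

```python
import struct

def parse_config_header(cfg: bytes) -> dict:
    """Decode a few standardized fields of a PCI type 0 config header."""
    if len(cfg) < 64:
        raise ValueError("need at least the 64-byte standard header")
    # Offsets 0x00..0x07: vendor ID, device ID, command, status (little endian)
    vendor, device, command, status = struct.unpack_from("<HHHH", cfg, 0x00)
    header_type = cfg[0x0E]
    bars = struct.unpack_from("<6I", cfg, 0x10)  # BAR0..BAR5 at 0x10..0x27
    return {
        "vendor_id": vendor,
        "device_id": device,
        "command": command,
        "status": status,
        "has_capability_list": bool(status & 0x10),  # status bit 4
        "header_type": header_type & 0x7F,
        "multi_function": bool(header_type & 0x80),
        "bars": list(bars),
        "cap_ptr": cfg[0x34] & 0xFC,  # start of the capability list
    }

# Hypothetical sample header, not taken from a real device
sample = bytearray(64)
struct.pack_into("<HHHH", sample, 0x00, 0x1234, 0x5678, 0x0006, 0x0010)
struct.pack_into("<I", sample, 0x10, 0xFEB00000)
sample[0x34] = 0x40

info = parse_config_header(bytes(sample))
print(hex(info["vendor_id"]), hex(info["device_id"]), hex(info["cap_ptr"]))
```

The same decoding works for every PCI device, which is what lets an operating system (or a hypervisor presenting a virtual device) enumerate hardware generically.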

Communication between driver and PCI device
(sequence diagram: device driver, PCI device, memory)
- Driver to device: MMIO read/write request
  - Device performs side effect of write/read, if any
  - Device to driver: MMIO read/write response
- Device to memory: DMA memory read request, DMA memory read response
  - Ordinary memory read/write, happens transparently to driver/CPU
  - Ex.: read a request from a queue in memory, handle it, then write the response
- Device to driver: interrupt from device
  - Something happened that needs attention from the driver (ex. DMA finished)
  - Interrupt handler for the requested interrupt is invoked

PCI capabilities
- Linked list of information describing extra capabilities, such as:
  - Message Signaled Interrupts (MSI, MSI-X)
  - Power management
  - PCI Express
- Example from lspci -vvv:
    ...
    Capabilities: [40] MSI-X: Enable+ Count=2 Masked
        Vector table: BAR=1 offset=00000000
        PBA: BAR=1 offset=00000800
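
The capability list can be walked with a few lines of code. The sketch below assumes a 256-byte config space image as input. The list starts at the pointer stored at offset 0x34, and each entry holds a capability ID byte followed by a next-pointer byte. The sample image is synthetic, and `walk_capabilities` is an illustrative name, not an existing API.

```python
# A few well-known capability IDs
CAP_NAMES = {0x01: "Power Management", 0x05: "MSI", 0x10: "PCI Express", 0x11: "MSI-X"}

def walk_capabilities(cfg: bytes):
    """Yield (offset, capability ID) for each entry in the capability list."""
    status = int.from_bytes(cfg[0x06:0x08], "little")
    if not status & 0x10:          # status bit 4: capability list present
        return
    ptr = cfg[0x34] & 0xFC         # pointers are dword aligned
    seen = set()                   # guard against a malformed, looping list
    while ptr and ptr not in seen:
        seen.add(ptr)
        yield ptr, cfg[ptr]
        ptr = cfg[ptr + 1] & 0xFC  # next pointer, 0 terminates the list

# Synthetic config space: MSI-X at 0x40, power management at 0x50, PCIe at 0x60
cfg = bytearray(256)
cfg[0x06] = 0x10
cfg[0x34] = 0x40
cfg[0x40:0x42] = bytes([0x11, 0x50])
cfg[0x50:0x52] = bytes([0x01, 0x60])
cfg[0x60:0x62] = bytes([0x10, 0x00])

caps = [(hex(off), CAP_NAMES.get(cid, hex(cid)))
        for off, cid in walk_capabilities(bytes(cfg))]
print(caps)
```

The `[40]` in the lspci example is exactly such an offset: the MSI-X capability structure of that device sits at config space offset 0x40.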

PCI Express
- Most PCs still running today have this
- Software compatible with PCI, with extensions:
  - PCI Express is a PCI capability
  - Extended config space: 256 bytes -> 4096 bytes
  - Extended capabilities: new capability list
- Completely different hardware:
  - Different physical interfaces
  - Serial, point-to-point
  - May define a hierarchy of domains and switches
  - 1, 4, 8 or 16 lanes, different speeds...

A PCI Express based system
(figure)

PCI Express x1 and x16 vs PCI
(figure)

PCI Express capability
(figure)

PCI Express Ethernet
(figure)

Implementing an emulated device
- Device emulation and driver code run in the same process
- Access to config and BAR spaces through traps:
  - Memory protection, signal handler in emulation code
- I/O threads for DMA
- Signals for interrupts
- Benefit: can use existing OS driver in guest, no modification necessary
- Drawback: performance; must implement irrelevant hardware features to satisfy the driver
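
The trap-based structure of an emulated device can be sketched as follows. This is a toy model, not how any particular emulator is written: the register offsets, the class `EmulatedDevice` and the callback `raise_irq` are all invented for illustration. Each guest access to BAR space is assumed to have been trapped already and is dispatched to `mmio_read` or `mmio_write`.

```python
class EmulatedDevice:
    """Toy device: trapped guest MMIO accesses end up in mmio_read/mmio_write."""
    REG_STATUS = 0x00    # read-only: 1 after a completed transfer
    REG_DMA_ADDR = 0x08  # guest address of the buffer to fetch
    REG_DMA_LEN = 0x10   # length of the buffer
    REG_DOORBELL = 0x18  # any write starts the transfer

    def __init__(self, guest_memory: bytearray, raise_irq):
        self.mem = guest_memory
        self.raise_irq = raise_irq
        self.regs = {self.REG_STATUS: 0, self.REG_DMA_ADDR: 0, self.REG_DMA_LEN: 0}
        self.received = []

    def mmio_read(self, offset: int) -> int:
        return self.regs.get(offset, 0)

    def mmio_write(self, offset: int, value: int) -> None:
        if offset == self.REG_DOORBELL:
            self._do_dma()                      # side effect of the write
        elif offset in (self.REG_DMA_ADDR, self.REG_DMA_LEN):
            self.regs[offset] = value
        # writes to read-only or unknown registers are silently ignored

    def _do_dma(self) -> None:
        addr = self.regs[self.REG_DMA_ADDR]
        length = self.regs[self.REG_DMA_LEN]
        self.received.append(bytes(self.mem[addr:addr + length]))  # "DMA read"
        self.regs[self.REG_STATUS] = 1
        self.raise_irq()                        # tell the guest driver we are done

irqs = []
guest_mem = bytearray(4096)
guest_mem[0x100:0x105] = b"hello"
dev = EmulatedDevice(guest_mem, lambda: irqs.append("irq"))
dev.mmio_write(EmulatedDevice.REG_DMA_ADDR, 0x100)
dev.mmio_write(EmulatedDevice.REG_DMA_LEN, 5)
dev.mmio_write(EmulatedDevice.REG_DOORBELL, 1)
print(dev.received, irqs)
```

Even this tiny example needs three trapped register writes to move one buffer, which hints at where the performance drawback comes from.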

Paravirtualized I/O support
- Use existing framework: PCI
- Implement a new device type
- Ex. virtio:
  - Shared memory queues between hypervisor and guest
  - Optimized for the virtualization scenario: reduce copying, limit the number of traps
  - Common transport for several driver types
- Benefit: performance
- Drawback: guest OS must be aware (virtio drivers must be installed in guest)
- Still some software overhead compared to bare metal

Paravirt example: Virtio based Ethernet (Qemu)
(figure)
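
The idea of limiting traps with a shared memory queue can be illustrated with a toy model. This is not the actual virtio ring layout; the class `SharedQueue` and its method names are assumptions made for this sketch. The guest posts several buffers with plain memory writes and then notifies the hypervisor once.

```python
from collections import deque

class SharedQueue:
    """Toy stand-in for a shared-memory queue between guest and hypervisor."""

    def __init__(self):
        self.avail = deque()  # buffers posted by the guest
        self.used = deque()   # completions written back by the host
        self.kicks = 0        # number of notifications, i.e. traps

    # Guest side
    def add_buffer(self, buf: bytes) -> None:
        self.avail.append(buf)  # ordinary memory write, no trap

    def kick(self) -> None:
        self.kicks += 1         # the only operation that traps
        self._host_process()

    # Host side
    def _host_process(self) -> None:
        while self.avail:       # drain everything posted so far
            buf = self.avail.popleft()
            self.used.append(len(buf))

q = SharedQueue()
for pkt in (b"a" * 64, b"b" * 128, b"c" * 256):
    q.add_buffer(pkt)
q.kick()
print(q.kicks, list(q.used))
```

Three packets are handed over with a single trap, whereas a register-level emulated device would typically trap several times per packet.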

Device assignment (device passthrough)
- A system can have multiple devices of each type
- Can we dedicate a device to a specific virtual machine?
  - Device description and access passed through to guest
  - Guest loads a driver for that device and runs happily
  - Bare metal performance, no software overhead?
- Great, simple idea, any buts?

Device assignment (device passthrough)
- Config space: device numbers, BAR addresses?
- DMA:
  - Addressing: GPA != HPA (guest physical vs. host physical address)
  - Memory overcommit: memory of the VM might be on disk
  - Security: a device can (in principle) write everywhere
    - Anywhere in global memory...
    - Manipulate other devices?
- Moving a VM with a passed through device?
- Interrupts:
  - Security: denial of service attack from a VM?
  - Routing: traps required
- Need a lot of devices if many VMs: enough PCIe slots?

IOMMU (I/O Memory Management Unit(s))
- Extra level of protection and translation between I/O device and memory
- Intel calls this DMAR (DMA Remapping) units
- VT-d on Intel, AMD-Vi on AMD
- Allows device to use GPA
- Protects memory against malicious driver code in guest
- Also interrupt remapping

How to deal with memory overcommit
- Disallow:
  - Memory used for DMA must be pinned
  - Worst case: all memory for a guest must be pinned
  - x86: a guest OS might not care to tell what memory is used for DMA
- Hardware to handle page faults:
  - PCI Express extended capability PRI (Page Request Interface)
  - Few devices implement it (yet..)
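
The translation and protection role of the IOMMU can be sketched as a per-device page table from guest physical to host physical pages. The model below is deliberately simplified (single-level table, 4 KiB pages); `ToyIommu` and `IommuFault` are names invented for this example. A DMA to a page without a mapping is rejected, which is also why pages used for DMA must stay mapped (pinned) unless the hardware can handle page faults.

```python
PAGE_SIZE = 4096

class IommuFault(Exception):
    """Raised when a device issues DMA to an address without a mapping."""

class ToyIommu:
    """Per-device table from guest physical pages to host physical pages."""

    def __init__(self):
        self.tables = {}  # device name -> {guest page number: host page number}

    def map_page(self, device: str, gpa: int, hpa: int) -> None:
        self.tables.setdefault(device, {})[gpa // PAGE_SIZE] = hpa // PAGE_SIZE

    def translate(self, device: str, gpa: int) -> int:
        table = self.tables.get(device, {})
        page = gpa // PAGE_SIZE
        if page not in table:
            raise IommuFault(f"{device}: DMA to unmapped GPA {gpa:#x}")
        return table[page] * PAGE_SIZE + gpa % PAGE_SIZE

iommu = ToyIommu()
iommu.map_page("nic0", gpa=0x1000, hpa=0x7F000)

host_addr = iommu.translate("nic0", 0x1234)  # inside the mapped page
try:
    iommu.translate("nic0", 0x5000)          # outside what the VM was given
    blocked = False
except IommuFault:
    blocked = True
print(hex(host_addr), blocked)
```

The guest driver programs the device with guest physical addresses, and the device never reaches host memory that was not explicitly mapped for it.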

Cross access to other devices?
- A device can (in principle) DMA into another device's BAR space(s)
- Depends on PCIe bridges and switches
  - A bridge may support ACS (Access Control Services)
- Linux with VFIO uses a concept of IOMMU groups
  - A group consists of all devices that are considered within the same domain
  - If two devices can access each other without limitations, they are within the same domain

ACS (Access Control Services)
- Optional PCI Express extended capability
- Describes how a bridge/switch handles cross access
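
The grouping rule stated above (devices that can reach each other without limitations belong together) amounts to computing connected components. The sketch below models only that rule with a small union-find. It is not how Linux actually derives IOMMU groups from the PCIe topology, and the device names and the `unrestricted_pairs` input are hypothetical.

```python
def iommu_groups(devices, unrestricted_pairs):
    """Group devices that can access each other without access control."""
    parent = {d: d for d in devices}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving
            d = parent[d]
        return d

    for a, b in unrestricted_pairs:
        parent[find(a)] = find(b)          # merge the two domains

    groups = {}
    for d in devices:
        groups.setdefault(find(d), []).append(d)
    return sorted(sorted(g) for g in groups.values())

devices = ["nic0", "nic1", "gpu0", "nvme0"]
# Assumption: nic0 and nic1 sit behind a switch that does not enforce ACS
pairs = [("nic0", "nic1")]
groups = iommu_groups(devices, pairs)
print(groups)
```

In this model nic0 and nic1 end up in one group, so they would have to be assigned to the same VM (or to none), while gpu0 and nvme0 can be assigned independently.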

Device assignment: Sharing devices?
- SR/IOV: Single Root I/O Virtualization
  - PCI Express extension
- A physical device may support a number of virtual functions
  - Number of active virtual functions can be configured dynamically
  - Programmable in config space via PCI Express extended capability
- A virtual function may be assigned to a VM
- All functions still share resources, but the sharing is implemented in hardware

SR/IOV PCIe extended capability
(figure)
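
On Linux, the number of active virtual functions is normally set through the `sriov_numvfs` and `sriov_totalvfs` files in the device's sysfs directory rather than by writing config space directly. A minimal sketch, assuming a driver that supports SR/IOV is bound to the device; the helper `set_num_vfs` is illustrative, and the demo runs against a temporary directory standing in for the real sysfs entry.

```python
import tempfile
from pathlib import Path

def set_num_vfs(device_dir: Path, num_vfs: int) -> None:
    """Set the number of active virtual functions via sysfs-style files."""
    total = int((device_dir / "sriov_totalvfs").read_text())
    if not 0 <= num_vfs <= total:
        raise ValueError(f"device supports 0..{total} VFs, got {num_vfs}")
    numvfs = device_dir / "sriov_numvfs"
    current = int(numvfs.read_text())
    if current == num_vfs:
        return
    if current != 0 and num_vfs != 0:
        # The kernel refuses nonzero -> nonzero changes; disable the VFs first
        numvfs.write_text("0")
    numvfs.write_text(str(num_vfs))

# Demo against a fake directory; on a real system device_dir would be
# the directory of the physical function under /sys/bus/pci/devices/
with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp)
    (fake / "sriov_totalvfs").write_text("8\n")
    (fake / "sriov_numvfs").write_text("0\n")
    set_num_vfs(fake, 4)
    result = (fake / "sriov_numvfs").read_text()
print(result)
```

After the write succeeds on real hardware, the virtual functions show up as additional PCI devices that can be passed through to VMs individually.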

Can the IOMMU become a bottleneck?
- Potential issues:
  - Page table memory
  - Number of entries in the IOMMU's caches (sizing problem in chipsets)
  - Pure translation performance?
- Solutions:
  - Use big pages for GVA to GPA
    - Contiguous memory allocator, huge pages, ...
  - Get help from devices: ATS (Address Translation Services)

ATS (Address Translation Services)
- PCI Express extended capability
- Allows a device to optionally aid and offload the IOMMU
- Protocol for communication between IOMMU and device:
  - Request a translation
  - Pre-translated DMA request from device (tagged to bypass IOMMU)
  - Invalidation protocol: IOMMU sends request to device to invalidate
- Security implications?
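
The three parts of the ATS protocol (translation request, pre-translated DMA, invalidation) can be modeled with a device-side translation cache. This is a conceptual sketch at page granularity, not the PCIe message format; `ToyIommu`, `AtsDevice` and the page numbers are made up.

```python
class ToyIommu:
    """Holds the authoritative page mappings and counts lookups."""

    def __init__(self, mappings):
        self.mappings = dict(mappings)  # guest page -> host page
        self.lookups = 0

    def translate(self, page: int) -> int:
        self.lookups += 1
        return self.mappings[page]

class AtsDevice:
    """Device with its own translation cache, as ATS allows."""

    def __init__(self, iommu: ToyIommu):
        self.iommu = iommu
        self.cache = {}

    def dma(self, page: int) -> int:
        if page not in self.cache:
            # Translation request to the IOMMU
            self.cache[page] = self.iommu.translate(page)
        # Pre-translated request: no IOMMU lookup needed
        return self.cache[page]

    def invalidate(self, page: int) -> None:
        # Invalidation request from the IOMMU after a mapping change
        self.cache.pop(page, None)

iommu = ToyIommu({1: 100, 2: 200})
dev = AtsDevice(iommu)
first = [dev.dma(1), dev.dma(1), dev.dma(2)]
lookups_before = iommu.lookups  # only two lookups for three DMAs

iommu.mappings[1] = 150         # mapping changes...
dev.invalidate(1)               # ...so the cached entry must be dropped
after = dev.dma(1)
print(first, lookups_before, after, iommu.lookups)
```

The security question on the slide is visible here: the IOMMU has to trust the device to mark as pre-translated only addresses it actually obtained through a translation request, and to honor invalidations.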

Use cases for virtualization
- Server consolidation
- Cloud services
- Software appliances
- Emulating unavailable platforms
- Development and testing
- Demonstration and showcasing
- ...

How does a VM communicate?
- Mostly out on the network (with external hosts..?)
- Mostly with other VMs on the same server
- A mix..
- How will that affect performance?
  - Passthrough vs software only

Moving VMs around (VM migration)
- Avoid downtime
- Live migration?
  - Move a running VM
  - Copy (some) state while the machine is running
  - Minimize delay when execution is moved
- Live migration and network interfaces:
  - Emulate the same hardware on the new machine
  - Copy state?
  - What about network state (packets on the wire, addressing, routing..)?
  - What if the device was passed through?

VM performance on a NUMA machine?
- Locality between CPU/core/thread and memory
  - CPU affinity
  - Cache affinity
- Passthrough: CPU closeness to device
- In cases of contention: which VM to move? How to detect?
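
Returning to live migration: "copy (some) state while the machine is running" is commonly realized as iterative pre-copy, which the slides do not name explicitly, so treat the following as one possible approach. The simulation below is a sketch under simplifying assumptions: the set of pages dirtied in each round is given as input, page contents are not actually modified, and `precopy_migrate` and its parameters are invented names.

```python
def precopy_migrate(memory, dirty_per_round, max_rounds=10, stop_threshold=2):
    """Simulate iterative pre-copy migration.

    memory: dict of page number -> content on the source
    dirty_per_round: iterable of sets; pages the running guest dirties
                     while the corresponding copy round is in progress
    Returns (destination memory, copy rounds while running, pages copied
    while the VM is paused).
    """
    dest = {}
    to_send = set(memory)              # first round: everything
    dirty_iter = iter(dirty_per_round)
    rounds = 0
    while len(to_send) > stop_threshold and rounds < max_rounds:
        for page in to_send:           # copy while the guest keeps running
            dest[page] = memory[page]
        to_send = set(next(dirty_iter, set()))  # dirtied pages must be resent
        rounds += 1
    # Stop-and-copy: the guest is paused only for the remaining pages
    downtime_pages = len(to_send)
    for page in to_send:
        dest[page] = memory[page]
    return dest, rounds, downtime_pages

memory = {i: f"page{i}" for i in range(8)}
dirty = [{0, 1, 2, 3}, {1, 2}]
dest, rounds, downtime_pages = precopy_migrate(memory, dirty)
print(rounds, downtime_pages, dest == memory)
```

Only the last small set of pages is copied while the VM is stopped, which is what keeps the delay short. State held inside a passed through device is not covered by this kind of memory copying, which is one reason passthrough complicates migration.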

Security.

Summary
- Goal: understanding some of the challenges and trade-offs in providing fast network access for virtual machines
- Need to understand the technology base
- Many roads to network access for VMs
  - Performance: depends on where to communicate
  - Each has its pros and cons
- In some cases software can be made faster!
- But sometimes hardware support is the only viable solution
- Migration: software only easier to move?