Agenda
- Network Function Virtualization
- I/O Virtualization Mechanisms
- Vhost-net Details
- NFV with Virtio: Issues
- Workaround
- Proposed Architecture
Network Function Virtualization
Limitations of dedicated hardware:
- Dynamic launch of new network services is restricted by the existing hardware
- New infrastructure means increased capital investment and increased power requirements, raising the cost of energy
- Rapidly evolving hardware shortens hardware lifecycles and compresses the procure-design-integrate-deploy cycle
What NFV offers:
- Allows dynamic network service launch, reducing the time to deploy new network services
- The same infrastructure can be used and shared for hosting multiple services, reducing capital investment and the cost of energy
- Offers agility and flexibility: services scale up or down quickly based on demand
Network functions (data plane and control plane: firewall, VPN, etc.) are consolidated using IT virtualization technology. Each network function runs with its own OS inside a hypervisor-managed VM on standard hardware.
[Figure: several network functions, each with its own OS, hosted as VMs on a hypervisor over shared hardware]
[Figure: deployment comparison - ordinary VMs behind a shared NIC and hypervisor switch; network-function VMs (NFVMs) in the same setup; and a dedicated network appliance in the packet path]
I/O Virtualization Mechanisms
- Full virtualization: guest I/O accesses trap into the hypervisor, which performs device emulation; the guest OS runs unmodified
- Paravirtualization: the guest OS runs para-drivers that talk to well-defined hypervisor interfaces backed by device emulation
- Virtio (paravirtualization): front-end drivers in the guest OS exchange buffers over virtqueues with back-end drivers in the hypervisor's device-emulation layer
VirtIO networking with QEMU:
- Each guest sees a private VirtIO network device on the PCI bus; a VirtIO network driver is needed in the guest
- The back-end driver lives in QEMU
- A standard tun/tap interface is used by QEMU to bridge guest traffic into the host network stack (tap -> bridge -> eth0, on top of the Linux kernel/KVM)
Vhost-net:
- Vhost-net is a module in the Linux kernel that transfers buffers between host and guest
- It bypasses QEMU for virtqueue operations: QEMU still sets the device up, but the kernel (vhost -> tun/tap -> bridge -> eth0) moves the data
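As a rough sketch of that division of labor: the user-space side (normally QEMU) opens /dev/vhost-net, claims it, and points queue 0 at a tap device, after which the kernel owns the datapath. This is a minimal illustration only; a real setup also programs the memory table, vring addresses, and kick/call eventfds, and all error handling is omitted here.

    #include <fcntl.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/if.h>
    #include <linux/if_tun.h>
    #include <linux/vhost.h>

    int main(void)
    {
        /* tap device that will back the guest's virtio-net queues */
        int tap = open("/dev/net/tun", O_RDWR);
        struct ifreq ifr;
        memset(&ifr, 0, sizeof(ifr));
        ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR;
        ioctl(tap, TUNSETIFF, &ifr);

        /* hand the virtqueue datapath to the kernel */
        int vhost = open("/dev/vhost-net", O_RDWR);
        ioctl(vhost, VHOST_SET_OWNER);                  /* bind to this process */

        struct vhost_vring_file backend = { .index = 0, .fd = tap };
        ioctl(vhost, VHOST_NET_SET_BACKEND, &backend);  /* queue 0 <-> tap */
        return 0;
    }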
Vhost-net Details
[Figure: vhost-net datapath - the guest VNIC's RX/TX virtqueues are serviced by the vhost thread in the Linux kernel/KVM, which connects to a tap device on a bridge, behind the Ethernet driver and physical NIC]
Vhost initialization:
- QEMU issues ioctls from user space; they land in vhost_dev_ioctl() in kernel space
- A worker thread is created per vhost instance via kthread_create()
- This thread is responsible for handling both Rx and Tx jobs for its vhost device
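The per-instance worker follows the usual kthread pattern; below is a condensed sketch of what the vhost core does when ownership is set (names follow the kernel's, structure simplified).

    /* condensed sketch: one worker kthread per vhost device */
    static long vhost_dev_set_owner(struct vhost_dev *dev)
    {
        struct task_struct *worker;

        worker = kthread_create(vhost_worker, dev,
                                "vhost-%d", current->pid);
        if (IS_ERR(worker))
            return PTR_ERR(worker);

        dev->worker = worker;
        wake_up_process(worker);   /* starts servicing rx/tx work items */
        return 0;
    }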
RX Packet Flow
RX path, hardirq to softirq:
1. A hardware interrupt fires; the interrupt handler calls napi_schedule(), which raises NET_RX_SOFTIRQ (raise_softirq_irqoff)
2. On irq_exit(), do_softirq() runs the pending softirq routines, including net_rx_action()
3. net_rx_action() invokes the driver's NAPI poll routine (here dpaa_eth_poll -> qman_p_poll_dqrr -> priv_rx_default_dqrr)
4. The poll routine hands each received skb to the stack via netif_receive_skb()
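In driver terms the split looks like the sketch below: a generic hardirq/NAPI pair, with hypothetical nic_* helpers standing in for the DPAA-specific calls named above.

    #include <linux/interrupt.h>
    #include <linux/netdevice.h>

    struct nic_priv {
        struct napi_struct napi;
        /* device registers, rings, ... */
    };

    /* hardirq half: mask RX interrupts, then defer to NAPI */
    static irqreturn_t nic_irq(int irq, void *data)
    {
        struct nic_priv *priv = data;

        nic_disable_rx_irq(priv);      /* hypothetical register write */
        napi_schedule(&priv->napi);    /* raises NET_RX_SOFTIRQ */
        return IRQ_HANDLED;
    }

    /* softirq half: net_rx_action() calls this with a work budget */
    static int nic_poll(struct napi_struct *napi, int budget)
    {
        struct nic_priv *priv = container_of(napi, struct nic_priv, napi);
        struct sk_buff *skb;
        int work = 0;

        while (work < budget && (skb = nic_dequeue_rx(priv)) != NULL) {
            netif_receive_skb(skb);    /* hands the skb to the stack */
            work++;
        }
        if (work < budget) {
            napi_complete(napi);       /* ring drained */
            nic_enable_rx_irq(priv);   /* take interrupts again */
        }
        return work;
    }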
RX path, bridge to tap (still in softirq context):
5. netif_receive_skb() hands the skb to the bridge: br_handle_frame -> br_handle_frame_finish -> br_forward -> br_dev_queue_push_xmit
6. The bridge transmits on its tap port: dev_queue_xmit -> dev_hard_start_xmit -> tun_net_xmit
7. tun_net_xmit() queues the skb on the tap socket (skb_queue_tail) and wakes up anyone waiting in tun_chr_poll - here, the vhost thread
Vhost worker thread loop:
- Loop until kthread_should_stop() returns true
- If the work list is empty, call schedule() and sleep until work is queued
- Otherwise dequeue a work item (e.g. Rx processing) and run it, rescheduling between items when needed
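A simplified sketch of that loop, close in structure to the kernel's vhost_worker() of this era (locking shown, bookkeeping such as flush sequence numbers omitted):

    static int vhost_worker(void *data)
    {
        struct vhost_dev *dev = data;
        struct vhost_work *work;

        for (;;) {
            set_current_state(TASK_INTERRUPTIBLE);
            if (kthread_should_stop()) {
                __set_current_state(TASK_RUNNING);
                break;
            }
            spin_lock_irq(&dev->work_lock);
            if (list_empty(&dev->work_list)) {
                spin_unlock_irq(&dev->work_lock);
                schedule();               /* sleep until work is queued */
                continue;
            }
            work = list_first_entry(&dev->work_list,
                                    struct vhost_work, node);
            list_del_init(&work->node);
            spin_unlock_irq(&dev->work_lock);

            __set_current_state(TASK_RUNNING);
            work->fn(work);               /* e.g. handle_rx()/handle_tx() */
            if (need_resched())
                schedule();               /* stay fair between work items */
        }
        return 0;
    }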
Vhost Rx processing (handle_rx):
- vhost_worker() dispatches the queued Rx work to handle_rx()
- The socket being polled refers to the tap socket
- For each packet, sock_recvmsg() calls tun_recvmsg()/tun_do_read(), which copies the packet into guest Rx buffers; the used list is then updated
- The loop returns when sock_len is 0 (tap queue drained) or the total bytes received reach the vhost budget, in which case the work is re-queued
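In outline it looks like the sketch below; peek_head_len() and VHOST_NET_WEIGHT are the kernel's names, while the other helpers are our condensations of several calls.

    /* runs in the vhost worker thread */
    static void handle_rx(struct vhost_net *net)
    {
        size_t total_len = 0;

        for (;;) {
            /* size of the next packet queued on the tap socket */
            int sock_len = peek_head_len(net->tap_sock);
            if (!sock_len)
                break;                        /* tap queue drained */

            /* pull free descriptors from the guest's available ring and
             * copy the packet in; recvmsg ends up in tun_recvmsg() */
            copy_packet_to_guest(net, sock_len);

            /* publish the filled buffers on the used ring, signal guest */
            add_used_and_signal(net);

            total_len += sock_len;
            if (total_len >= VHOST_NET_WEIGHT) {
                requeue_rx_work(net);         /* fairness budget hit: yield */
                break;
            }
        }
    }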
Guest Rx refill (virtio-net driver):
- When more than half of the Rx buffers are free, try_fill_recv() posts fresh empty buffers to the ring
- If buffer allocation fails, refill_work is scheduled (schedule_delayed_work) to retry the refill later
- When fewer packets than the NAPI budget were received, napi_complete() is called
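Put together, the guest's poll routine looks roughly like this (simplified against the virtio-net driver of this era; receive_ready_buffers() is a hypothetical stand-in for the receive loop, and vi->num/vi->max count posted buffers):

    /* guest-side NAPI poll (simplified) */
    static int virtnet_poll(struct napi_struct *napi, int budget)
    {
        struct virtnet_info *vi =
            container_of(napi, struct virtnet_info, napi);
        int received = receive_ready_buffers(vi, budget);

        /* refill once fewer than half of the rx buffers remain posted */
        if (vi->num < vi->max / 2) {
            if (!try_fill_recv(vi, GFP_ATOMIC))
                /* allocation failed: retry later from process context */
                schedule_delayed_work(&vi->refill, 0);
        }

        if (received < budget)
            napi_complete(napi);    /* drained: re-enable callbacks */
        return received;
    }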
TX Packet Flow
TX path, guest side:
- Traffic from the guest OS reaches the virtio-net driver
- If the Tx ring is full, the driver reports congestion to the network stack
- Otherwise it writes a descriptor into the available ring
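As a sketch of the guest-side transmit hook (add_buf_to_tx_ring() is a hypothetical condensation of the driver's descriptor setup):

    static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
    {
        struct virtnet_info *vi = netdev_priv(dev);

        /* post the skb as a descriptor chain on the available ring */
        if (add_buf_to_tx_ring(vi, skb) < 0) {
            /* ring full: report congestion so the stack stops sending */
            netif_stop_queue(dev);
            return NETDEV_TX_BUSY;
        }
        virtqueue_kick(vi->svq);   /* notify the host, if it wants kicks */
        return NETDEV_TX_OK;
    }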
TX kick path:
- start_xmit() calls virtqueue_kick()
- virtqueue_kick_prepare() decides whether the host actually needs a notification; if not, return
- Otherwise virtqueue_notify() issues an iowrite16() to the device's notify register, which traps into KVM and is dispatched by kvm_io_bus_write()
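virtqueue_kick() itself is just this check-then-notify pair (structure as in the virtio core; the comments on the trap path are our annotations):

    void virtqueue_kick(struct virtqueue *vq)
    {
        /* the host may suppress notifications while it is already
         * polling the ring; kick_prepare checks for that */
        if (virtqueue_kick_prepare(vq))
            /* for legacy virtio-pci this is an iowrite16() of the
             * queue index to the notify port; the write VM-exits into
             * KVM, where kvm_io_bus_write() routes it, e.g. to an
             * ioeventfd that wakes the vhost worker */
            virtqueue_notify(vq);
    }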
NFV with VirtIO
[Figure: packet flooding and packet drop with NFVMs behind a shared NIC and hypervisor switch, contrasted with a dedicated network appliance]
Issues
Vhost starvation under flooding:
- During packet flooding on the physical NIC, NAPI processing runs continuously in softirq context
- The vhost thread fails to schedule because it has lower priority than softirq work
- The tap queue (500 entries) fills up, and packets keep dropping at the tap level
Workaround
Congestion-aware NAPI processing:
- The NAPI handler checks for congestion on the vhost/tap path as it processes each packet
- No congestion: the buffer is sent on to the bridge/tap as before
- Congestion: hardware interrupts (HWI) are disabled and the handler stops feeding the stack, so the vhost Rx processing gets CPU time
- Once the Rx ring drains below a lower watermark, hardware interrupts are enabled again
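A sketch of such a poll loop, reusing the generic nic_* helpers from the RX-flow sketch; downstream_congested(), rx_ring_fill(), and the low_watermark field are our hypothetical names for the checks this workaround adds.

    /* congestion-aware NAPI poll (sketch) */
    static int nic_poll(struct napi_struct *napi, int budget)
    {
        struct nic_priv *priv = container_of(napi, struct nic_priv, napi);
        struct sk_buff *skb;
        int work = 0;

        while (work < budget) {
            if (downstream_congested(priv)) {
                /* tap queue full: stop polling and leave hardware
                 * interrupts masked, so excess traffic is dropped in
                 * hardware and the vhost thread can run and drain;
                 * the congestion-clear path (not shown) rearms polling */
                napi_complete(napi);
                return work;
            }
            skb = nic_dequeue_rx(priv);
            if (!skb)
                break;
            netif_receive_skb(skb);        /* bridge -> tap -> vhost */
            work++;
        }

        if (work < budget) {
            if (rx_ring_fill(priv) < priv->low_watermark) {
                napi_complete(napi);
                nic_enable_rx_irq(priv);   /* safe to take interrupts again */
            }
            /* else: stay in polling mode; net_rx_action() calls us again */
        }
        return work;
    }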
Behavior with the workaround:
- When the tap queue fills, a congestion notification is sent to the bridge and the bridge stops forwarding
- The NIC driver disables hardware interrupts, so during packet flooding the excess packets are dropped at the hardware rather than at the tap
- The vhost thread can now schedule even during packet flooding, and queued packets get delivered
[Figure: egress rate (0-1000) vs. ingress rate (0-2500), comparing the current design with the modified design]
Proposed Architecture
- A NIC driver instance dynamically controlled by the vhost thread (a sketch of the registration interface follows this list)
- The vhost thread registers with the NIC driver, providing: an Rx function handle and a buffer-depletion handler
- During registration the driver returns to vhost: a Tx function handler and a buffer-pool management handle
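The slides do not spell the interface out, so the following is a purely hypothetical sketch of what such a registration contract could look like; every name here is invented for illustration.

    /* hypothetical vhost <-> NIC registration contract */
    struct vhost_nic_ops {
        /* driver -> vhost: push a received frame straight into the
         * guest's rx virtqueue */
        void (*rx)(void *vhost_ctx, void *buf, size_t len);
        /* driver -> vhost: the hardware buffer pool has run dry */
        void (*buf_depleted)(void *vhost_ctx);
    };

    struct nic_vhost_handle {
        /* vhost -> driver: transmit a guest frame on the wire */
        int   (*tx)(void *nic_ctx, void *buf, size_t len);
        /* vhost -> driver: buffer-pool management (seeding/returns) */
        void *(*buf_acquire)(void *nic_ctx, size_t size);
        void  (*buf_release)(void *nic_ctx, void *buf);
        void  *nic_ctx;
    };

    /* vhost passes its callbacks in; the driver fills out its handle */
    int nic_register_vhost(struct net_device *dev,
                           const struct vhost_nic_ops *ops, void *vhost_ctx,
                           struct nic_vhost_handle *out);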
[Figure: initialization - the vhost thread registers with the new network acceleration interface driver and receives the Tx handler and buffer-pool management handler in return; the driver instantiates its resources (BD-ring creation, buffer-pool creation, PCD block) and wires the Ethernet port's Rx/Tx FQ/BD rings to the guest's Rx/Tx rings through KVM]
[Figure: runtime - the buffer pool is seeded by vhost; while buffers are available, packets are DMAed into pool buffers and sent to the guest's Rx ring; when no Rx buffers are left in the pool, buffer acquisition fails and packets are dropped at the hardware]
Conclusions:
- Vhost-net interfaces can get saturated at high traffic rates, and there is no mechanism to communicate that saturation information to the host
- With a more integrated guest/host virtio infrastructure, it is possible to avoid system saturation and increase throughput
- Hardware acceleration can be leveraged for vhost-net interfaces