Service Edge Virtualization - Hardware Considerations for Optimum Performance

Executive Summary

This whitepaper provides a high-level overview of Intel-based server hardware components and their impact on virtualization. It is intended as a guide for software developers and system integrators, to help them choose open-standard networking platforms for service edge applications.
Choosing the right hardware

Over the past several years, the focus of server virtualization has increasingly shifted to the networking space. Both wireless and wireline networking device vendors have created virtual versions of their dedicated hardware devices. This has also provided opportunities for many independent software vendors (ISVs) and new entrants to deliver networking services in innovative ways, made possible by advances in generic x86 Intel architecture-based hardware platform designs. Features such as Intel virtualization technologies (VT-x, VT-d, VT-c), Intel TXT, AES-NI, SR-IOV, PCI Express pass-through, DPDK and QuickAssist make it possible to run many workloads that previously required specialized ASICs and Network Processor Units (NPUs) on traditional off-the-shelf Intel-based servers. As such, a growing number of workloads can now take advantage of Intel virtualization technologies and provide even lower-cost alternatives to traditional proprietary networking gear.

However, virtualization brings several unintended problems. One key problem is mapping virtual workloads to hardware, i.e. CPU, memory, I/O and storage. It is very common to see software vendors use a dual-socket server as the reference platform for their application. In some cases, however, a standard dual-socket server is overkill for a specific application or workload, does not offer the most appropriate configuration, or is simply not optimized for network function scale-up or scale-out. Customers are then left to determine how best to optimize hardware and software to match their desired workloads and throughput. Each component that makes up a system must be considered, including motherboard design, BIOS, IPMI, CPU, integrated Intel VT features, memory mapping and memory type, PCIe lane mapping, integrated I/O and storage.

Wireless and wireline networks have already taken great advantage of virtualization.
In the Radio Access Network (RAN) specifically, virtualization has given service providers the means to replace many services, such as the Evolved Packet Core (EPC) consisting of functions like the PCRF, MME, SGW and PDN GW, or Deep Packet Inspection (DPI) in the GiLAN, with Virtual Network Functions (VNFs) running on standard servers. In wireline data networks, virtualization has caused even greater disruption; many services that ran on dedicated equipment at customer premises are now replaced by a single box that consolidates those services, and in some cases customers can move those services into the cloud.
Hardware considerations

NFV CPU to PCIe lane mapping

With regard to non-uniform memory access (NUMA), all modern multi-socket x86 architectures support NUMA in hardware. A Virtual Machine (VM) should be deployed within a single NUMA node; workload performance degrades if the VM has to cross NUMA node boundaries. NUMA node memory affinity is typically handled by operating systems and hypervisors, but some applications may require it to be configured manually.

Memory bank population

In a multiprocessor system it is important to pay attention to how memory DIMMs are populated. A balanced configuration can be optimized for density and/or performance. It is relatively easy to work out how to configure servers for maximum memory capacity based on the number of DIMM slots available and the type of memory allowed, but it is not always clear which factors to consider for performance optimization. The main factors are enumerated below:

DIMM Speed: The faster the memory, the less time memory requests have to wait before they can be executed.

Rank: The impact of rank count depends on the application. More ranks can help applications parallelize memory requests, but for applications that need low latency this can result in lower performance.

CAS Latency: This relates to DRAM response time: the number of clock cycles the memory controller has to wait before it can issue a column address strobe (CAS). It goes hand in hand with DIMM speed. When selecting DIMMs, pay attention to CAS latency as well as DIMM speed, and to the resulting memory latency.

ECC or non-ECC? For networking and mission-critical services, ECC memory is an essential feature for server reliability. ECC support is mandatory in most data center, cloud and enterprise IT infrastructures, as it is a crucial element in providing enterprise customers with acceptable service levels.

Memory Type

UDIMMs: UDIMMs are not a good option for applications that require low latency. Memory controllers access the memory on the DIMM by interfacing with each DRAM individually, resulting in lower performance and capacity.
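To see how the DIMM speed and CAS latency factors above combine, first-word latency can be estimated as the CAS cycle count divided by the memory clock (half the transfer rate, since DDR transfers twice per clock). A quick sketch, with illustrative DIMM figures:

```python
def first_word_latency_ns(transfer_rate_mts: int, cas_cycles: int) -> float:
    """Estimate first-word latency in nanoseconds.

    DDR memory transfers twice per clock, so one clock cycle lasts
    2000 / transfer_rate_mts nanoseconds (transfer rate in MT/s).
    """
    return cas_cycles * 2000 / transfer_rate_mts

# Illustrative (not exhaustive) DIMM configurations:
for rate, cl in [(2133, 15), (2400, 17)]:
    print(f"DDR4-{rate} CL{cl}: {first_word_latency_ns(rate, cl):.2f} ns")
```

Note that a faster DIMM with a higher CAS latency can end up with nearly the same absolute latency as a slower DIMM with a lower one, which is why both numbers have to be considered together.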
RDIMMs: In the RDIMM case, the memory controller interfaces with a register on the DIMM, and the register then communicates with the actual DRAM. This adds an extra clock cycle but allows for better latency and density than UDIMMs.

LRDIMMs: For LRDIMMs, the register is replaced by an isolation memory buffer (iMB, by Inphi). The memory buffer re-drives all of the data, command, address and clock signals from the host memory controller and provides them to the multiple ranks of DRAM. LRDIMMs support rank multiplication, where multiple physical DRAM ranks appear to the host controller as a single logical rank of a larger size. However, the relatively high cost of LRDIMMs is a major limitation.

Intel Virtualization Technologies (VT)

Intel VT provides hardware offload features for CPU virtualization, memory virtualization, I/O virtualization, Intel graphics virtualization, and virtualization of security and network functions.

Intel VT-x: HW assist for virtualization

VT-x, previously codenamed Vanderpool, provides CPU and chipset features to enable virtualization. The following is a brief explanation of key CPU and memory virtualization features:

Intel Flex Priority: Improves Virtual Machine access to the Task Priority Register (TPR). This improves interrupt performance and eliminates the overhead that most VMs incur with regard to guest TPR accesses. It is designed to accelerate VMM interrupt handling and thereby improve overall performance.

Intel Flex Migration: This allows the Virtual Machine Monitor (VMM) to abstract available processor features and report a consistent set of features to the guest OS (VM). Thanks to Intel Flex Migration, a VM can seamlessly migrate from one generation of Intel processor to the next, or even across different families of Intel processors.

CPUID Virtualization: This allows a VMM to virtualize the CPUID instruction. With this feature, a VM can be migrated to machines/processors that support different features.
Virtual Processor IDs (VPID): With this feature, VMMs can support multiple VMs more efficiently. With VPID active, the translation look-aside buffer (TLB) is not flushed on VM entry or exit. Performance benefits are workload dependent.
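The effect of VPID tagging can be illustrated with a toy software model of a TLB: entries are keyed by (VPID, virtual page), so switching between VMs does not require discarding another VM's cached translations. All names and values here are illustrative, not real hardware structures:

```python
class TaggedTLB:
    """Toy model of a VPID-tagged TLB: no flush needed on VM switch."""

    def __init__(self):
        self.entries = {}  # (vpid, virt_page) -> phys_page

    def insert(self, vpid, virt_page, phys_page):
        self.entries[(vpid, virt_page)] = phys_page

    def lookup(self, vpid, virt_page):
        # A hit requires both the page and the VPID tag to match,
        # so one VM can never use another VM's cached translations.
        return self.entries.get((vpid, virt_page))

tlb = TaggedTLB()
tlb.insert(vpid=1, virt_page=0x10, phys_page=0xAA)  # VM 1's translation
tlb.insert(vpid=2, virt_page=0x10, phys_page=0xBB)  # VM 2, same virtual page
# Switching between VM 1 and VM 2 keeps both entries live -- no flush required.
```

Without the VPID tag, the first field of the key would be absent and every VM switch would have to empty the dictionary, which is the cost VPID avoids.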
Guest Pre-emption Timer: With this feature, the VMM can preempt execution of a guest OS. It is a programmable timer that can be set to cause a VM exit when the timer expires. It also allows the VMM to add quality of service (QoS) features in specific implementations.

Descriptor Table Exiting: Allows a VMM to protect a guest from internal attack. This is a useful feature for VMs and security agents to enhance security.

Pause-Loop Exiting (PLE): Ensures that no single vCPU running a VM locks up or holds a spin-lock indefinitely. With this feature spin-locks are detected, and PLE allows the VMM to exit the lock-holding VM.

Extended Page Tables (EPT): This allows virtualization of physical memory. When EPT is in use, certain physical addresses are treated as VM-physical addresses and are not used to access memory directly. Instead, VM-physical addresses are translated by navigating a set of EPT paging structures to produce the physical addresses that are used to access memory.

Intel VT for Directed I/O (VT-d)

This feature is implemented in the chipset. When Intel VT-d is enabled, the guest OS can choose to use either the traditional approach or, as needed, pass-through devices. In pass-through mode, the PCI* device is not allocated by the hypervisor; instead, the device can be allocated directly by a VM, which then sees the physical PCI device. Intel VT for Directed I/O provides VMM software with the following capabilities:

- I/O device assignment: for flexibly assigning I/O devices to VMs and extending the protection and isolation properties of VMs for I/O operations.
- DMA remapping: for supporting address translation for Direct Memory Accesses (DMA) from devices.
- Interrupt remapping: for supporting isolation and routing of interrupts from devices and external interrupt controllers to the appropriate VMs.
- Interrupt posting: for supporting direct delivery of virtual interrupts from devices and external interrupt controllers to virtual processors.
- Reliability: for recording and reporting, to system software, DMA and interrupt errors that may otherwise corrupt memory or impact VM isolation.
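The EPT mechanism described earlier amounts to a second stage of address translation: a guest-physical address produced by the guest's own page tables is translated again through the EPT structures into a host-physical address. A toy model, with flat dictionaries standing in for the real multi-level paging structures and arbitrary illustrative page numbers:

```python
# Toy two-stage translation: guest page table first, then EPT.
GUEST_PAGE_TABLE = {0x1000: 0x5000}  # guest-virtual page -> guest-physical page
EPT = {0x5000: 0x9000}               # guest-physical page -> host-physical page

def translate(guest_virtual: int) -> int:
    """Resolve a guest-virtual address to a host-physical address."""
    page, offset = guest_virtual & ~0xFFF, guest_virtual & 0xFFF
    guest_physical = GUEST_PAGE_TABLE[page]  # stage 1: guest's own tables
    host_physical = EPT[guest_physical]      # stage 2: EPT walk
    return host_physical | offset

print(hex(translate(0x1234)))  # -> 0x9234
```

The point of doing the second stage in hardware is that the guest manages its own page tables normally, while the VMM only maintains the EPT mapping; no shadow page tables or guest exits are needed on ordinary memory accesses.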
Intel Virtualization Technology for Connectivity (Intel VT-c)

Intel Virtualization Technology (Intel VT) for Connectivity (Intel VT-c) enables lower CPU utilization, reduced system latency, and improved networking throughput. VT-c is a key feature of many server-class Intel network controllers. Intel VT-c consists of the following main virtualization technologies:

Virtual Machine Device Queues (VMDq): VMDq offloads the packet-sorting burden from the VMM to the network controller to accelerate network I/O throughput. This hardware-assist feature of Intel networking silicon improves VM networking performance and lowers VMM CPU utilization.

Single-Root I/O Virtualization (SR-IOV): SR-IOV is part of a PCI Special Interest Group (PCI-SIG) specification. SR-IOV provides a standard method for sharing a single PCIe device concurrently among multiple VMs. The more interesting feature, though, is the ability to create multiple virtual functions from a single PCIe physical function. Each virtual function gets a slice of the physical function and can be allocated to a VM using standard interfaces.

Intel TXT

Intel Trusted Execution Technology (Intel TXT) provides a hardware-based security foundation on which to build and maintain a chain of trust that protects the platform from software-based attacks. In virtualization, Intel TXT can help prevent the spread of infected VMs from one machine to another. For example, in the case of a compromised VM, Intel TXT can prevent migration of that VM to another machine by creating "trusted pools", thus limiting VM migration.

Intel AES-NI

AES-NI is a set of six instructions that accelerate the Rijndael (AES) algorithm in hardware. For networking applications, this feature can encrypt and decrypt messages and data without consuming precious CPU cycles. All major hypervisors support AES-NI and expose it for use in guest OSs.
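On Linux, the SR-IOV virtual function count described above is typically controlled through the kernel's sysfs interface. A minimal sketch, assuming a hypothetical NIC at PCI address 0000:3b:00.0 (the actual write requires root on SR-IOV capable hardware):

```python
from pathlib import Path

def sriov_numvfs_path(pci_addr: str) -> Path:
    # Control file exposed under sysfs by SR-IOV capable drivers.
    return Path("/sys/bus/pci/devices") / pci_addr / "sriov_numvfs"

def enable_vfs(pci_addr: str, num_vfs: int) -> None:
    """Create num_vfs virtual functions on the given physical function."""
    ctl = sriov_numvfs_path(pci_addr)
    # Most drivers require resetting the VF count to 0 before changing it.
    if ctl.read_text().strip() != "0":
        ctl.write_text("0")
    ctl.write_text(str(num_vfs))

# enable_vfs("0000:3b:00.0", 4)  # would create 4 VFs on real hardware
```

Once created, each VF appears as its own PCI device and can be handed to a VM via VT-d pass-through, exactly the combination the VMDq/SR-IOV discussion above describes.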
Intel DDIO

Intel Data Direct I/O Technology allows Intel network adapters to talk directly with the processor cache, giving I/O devices direct access to cache instead of going through main memory first. Intel DDIO is available on all Intel Xeon E5 processors and has no hardware, OS or hypervisor dependency. Application developers and system integrators that want to take advantage of this feature need to make sure I/O devices and add-in cards, i.e. Ethernet, RAID controllers etc., are built with DDIO technology. For many NFV use cases, DDIO can greatly improve performance, as data from the NIC is available sooner for processing and service chaining.

Intel CMT/CAT

Intel Cache Monitoring Technology / Cache Allocation Technology (CMT/CAT) is available on all Intel Xeon Processor D and Intel Xeon Processor E5-2600 v4 (and some v3) SKUs. CMT/CAT allows hypervisors to determine cache utilization by application, providing a level playing field for multiple VNFs or VMs running on the same processor. CMT can be used to monitor cache occupancy per VNF or VM, whereas CAT allows control of a CPU's shared last-level cache (LLC). With CMT and CAT, applications can implement QoS by metering and monitoring VNFs or VMs, i.e. taking action if a VNF is using less than or more than x% of the LLC. This technology also provides a way to detect noisy or aggressive VNFs or VMs and to implement alarms and actions.

DPDK

The Data Plane Development Kit (DPDK) is probably the most important software for transforming a generic Intel processor into a packet processing engine. DPDK is an absolute must-have for any networking application that requires 10 Gbps or higher line-rate throughput. In virtualized environments, DPDK can be used to accelerate virtual switches (vSwitch) as well as applications and VNFs. DPDK has very strong open source community support via http://dpdk.org, http://openvswitch.org and https://fd.io. For SDN and NFV deployments, DPDK-accelerated Open vSwitch, or OVS-DPDK, is a key component for applications that require line-rate performance. Fast Data I/O (FD.io), which leverages Vector Packet Processing (VPP) technology from Cisco, takes this a step further with virtual switch and virtual router functionality based on DPDK. This is good news for application developers trying to build high-bandwidth, low-latency, line-rate applications.

Intel QuickAssist Technology

Intel QuickAssist Technology (QAT) provides hardware offload for crypto and compression functions. It is typically implemented via an onboard chipset or an add-in PCIe adapter card. Some Intel System-on-a-Chip (SoC) processors, including Intel Xeon based SoCs, have embedded QAT support. The current generation of Intel QAT supports SR-IOV with up to 32 virtual functions. Each virtual function can be assigned to an individual VNF or VM, bypassing the hypervisor and virtual switch and resulting in line-rate performance for crypto and compression applications.
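DPDK applications like those above back their packet buffer (mbuf) pools with hugepage memory that must be reserved up front. A rough sizing sketch; the 2 KB data room and per-mbuf overhead figures are illustrative assumptions, not DPDK constants:

```python
import math

HUGEPAGE_2MB = 2 * 1024 * 1024  # size of one 2 MB hugepage in bytes

def hugepages_needed(n_mbufs: int, mbuf_size: int = 2048 + 128) -> int:
    """Estimate how many 2 MB hugepages an mbuf pool needs.

    Assumes ~2 KB of packet data room plus ~128 B of headroom/metadata
    per mbuf, and ignores the pool's own bookkeeping structures.
    """
    return math.ceil(n_mbufs * mbuf_size / HUGEPAGE_2MB)

# e.g. a pool of 8192 mbufs:
print(hugepages_needed(8192))  # -> 9
```

The reserved pages would then be made available at boot or runtime (for example via the kernel's hugepages sysfs knobs) before the DPDK application starts; under-provisioning hugepages is a common cause of mempool allocation failures.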
Advantech Platforms designed for Network Service Edge virtualization

Advantech understands the need for hardware designs that are optimized for network-specific workloads. Customers planning to deploy SDN and NFV applications can rely on Advantech for application-ready platforms leveraging all the features of Intel processors, chipsets and LAN controllers. Each platform is designed with the following key ingredients:

- Latest generation Intel Xeon processors
- Intel VT-x, VT-d, VT-c, AES-NI, TXT, CMT/CAT, DDIO
- Balanced DDR4 memory
- Balanced PCIe lane mapping and IO pinning
- PCIe IO with SR-IOV
- Intel QuickAssist Technology
- DPDK
- Next Gen CPU

Figure 1: Platform Selection Guide
FWA-3260
- Intel Xeon Processor D System-on-Chip, up to 16 cores with 1.5 MB last-level cache per core
- 4 x DDR4 ECC UDIMMs/RDIMMs, up to 2400 MHz and up to 128 GB
- 4 server-class GbE ports implemented by an Intel i350 Ethernet controller with advanced LAN bypass
- 2 GbE management ports
- 2 x 10GbE SFP+ ports
- 2 Network Mezzanine Card (NMC) bays for PCIe gen. 3 based port expansion with 1GbE, 10GbE and 40GbE ports
- 1 x PCIe x8 full-height/half-length add-on card
- Two 2.5" SATA HDDs/SSDs and two M.2 SSD sockets
- IPMI 2.0 compliant remote management (optional)

FWA-5020
- Single or dual Intel Xeon E5-2600 v4 processor(s), up to 145 W TDP
- DDR4 2400 MHz ECC memory, up to 512 GB (CPU SKU dependent)
- 4 x GbE with LAN bypass, 2 x GbE for management (SKU dependent)
- 2 x 10GbE SFP+ NICs (SKU dependent)
- Up to 4 x NMC (Network Mezzanine Card) slots for a wide range of GbE, 10GbE and 40GbE NMCs with or without advanced LAN bypass
- 2 x 2.5" SATA HDDs/SSDs
- IPMI 2.0 compliant remote management
- Advanced platform reliability and serviceability
- 2 x internal CLC PCIe cards (dual DH8955) supported (SKU dependent)

FWA-6520
- 2 x Intel Xeon E5-2600 v3/v4 processors, up to 145 W TDP
- DDR4 1866/2133 ECC memory, up to 512 GB
- PCIe gen. 3 support
- Up to 8 x NMC (Network Mezzanine Card) slots for a wide range of GbE, 10GbE and 40GbE NMCs with or without advanced LAN bypass
- 2 x PCIe x16 slots supporting FH/HL add-on cards
- 2 x 2.5" removable external SATA HDDs/SSDs
- IPMI 2.0 compliant remote management
- Advanced platform reliability and serviceability